SFTP Walkthrough

Prerequisites

  • SFTP Connection

SFTP File Formats

SFTP extractions support CSV, Excel, and JSON files, which can be mapped and loaded into one of our operational data stores, such as Snowflake, Redshift, or Google BigQuery. To load other file types into cloud storage, such as Google Cloud Storage or S3, use the "other" source file type.

We also support reading the above formats from compressed files, such as zip files. To read compressed files, select the Is Compressed checkbox.
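To illustrate what reading a supported format from a compressed file involves, here is a minimal sketch using only Python's standard library. It is not Rivery's implementation; the function name `read_csv_from_zip` and the sample file `orders.csv` are hypothetical and exist only for this example.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_bytes: bytes, member_name: str) -> list:
    """Return the rows of one CSV member inside a zip archive as dictionaries."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member_name) as raw:
            # ZipFile.open yields bytes, so wrap it for the text-based csv module.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Build a small archive in memory so the example needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("orders.csv", "id,amount\n1,9.50\n2,12.00\n")

rows = read_csv_from_zip(buffer.getvalue(), "orders.csv")
print(rows)
```

Each row comes back as a dictionary keyed by the CSV header, which is roughly the shape a mapping step would work from.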

Excel files have additional settings, which are described in the Excel File Filters section after the extraction methods.

Extraction Methods

In each SFTP river, you can choose how your files are extracted. You can use the default "Run all" to pull every file that matches a given pattern, or select only the files that were modified within a given time interval.

In the following example, every file matching the full file path Root/test/*.json that was modified up to 2020-07-08 00:00:00 will be extracted.
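The selection described above can be sketched in Python as a filter over a file listing. This is an illustration under stated assumptions, not Rivery's code: the listing, the function name `select_modified_between`, and the choice of an inclusive start and an exclusive end are all hypothetical.

```python
from datetime import datetime
from fnmatch import fnmatchcase

# Hypothetical listing: (full path, last-modified time) pairs.
listing = [
    ("Root/test/a.json", datetime(2020, 7, 1, 9, 30)),
    ("Root/test/b.json", datetime(2020, 7, 8, 0, 0)),
    ("Root/test/c.csv", datetime(2020, 7, 2, 12, 0)),
    ("Root/other/d.json", datetime(2020, 7, 3, 8, 0)),
]


def select_modified_between(entries, pattern, start=None, end=None):
    """Keep paths that match the pattern and fall inside [start, end)."""
    selected = []
    for path, modified in entries:
        if not fnmatchcase(path, pattern):
            continue
        if start is not None and modified < start:
            continue
        # Assumption: the end of the interval is exclusive.
        if end is not None and modified >= end:
            continue
        selected.append(path)
    return selected


selected = select_modified_between(
    listing, "Root/test/*.json", end=datetime(2020, 7, 8, 0, 0, 0)
)
print(selected)
```

Whether the boundary timestamps themselves are included is not stated in this article, so check the behavior against a real run before relying on it.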


Supported wildcards in File Pattern:

Pattern    Meaning
*          matches everything
?          matches any single character
[seq]      matches any character in seq
[!seq]     matches any character not in seq
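These four wildcards have the same meanings as the Unix shell-style patterns in Python's standard `fnmatch` module, so that module can be used to try patterns out locally. This is a sketch under the assumption that the two behave alike; the article does not say which matching library Rivery uses, and the file names below are made up.

```python
from fnmatch import fnmatchcase

# Hypothetical file names used only to exercise each wildcard.
files = [
    "contracts2020111312315151.csv",
    "report.json",
    "data1.csv",
    "dataA.csv",
    "data10.csv",
]


def select(pattern, names):
    """Return the names that match a shell-style pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(select("*contracts*.csv", files))  # '*' matches everything
print(select("data?.csv", files))        # '?' matches any single character
print(select("data[0-9].csv", files))    # '[seq]' matches any character in seq
print(select("data[!0-9].csv", files))   # '[!seq]' matches any character not in seq
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every operating system.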


Notes:

  • If you'd like to include underscores '_' within the Prefix or File Pattern, they must be written in the format [!_].
    • example: *test_test.xlsx
      File Pattern: *test[!_]test.xlsx
  • For File Patterns that use a prefix, you may need to add an asterisk before the prefix as well.
    • example: contracts2020111312315151.csv (where you want all CSV files containing the word contracts)
      File Pattern: *contracts*.csv

In this scenario, the file excel_test1.xlsx will be extracted from the source.


If you wish to pull files based on their names rather than their modified time, choose the extract method "incremental run: by template".

With the template, you can filter different sets of files with a given date or other differentiating string in their name. For example:


With this setup you will extract the files highlighted in the following image:
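The idea of selecting files by a date embedded in their names can be sketched as follows. The template syntax shown here is Python's `strftime` notation, which is an assumption made for the example and not Rivery's template syntax; the file names and the function `names_for_range` are hypothetical as well.

```python
from datetime import date, timedelta


def names_for_range(template: str, start: date, end: date) -> list:
    """Expand a strftime-style template into one file name per day in [start, end)."""
    names = []
    day = start
    while day < end:
        names.append(day.strftime(template))
        day += timedelta(days=1)
    return names


# Hypothetical files present on the server.
available = [
    "sales_20200705.csv",
    "sales_20200706.csv",
    "sales_20200707.csv",
    "sales_20200708.csv",
    "inventory_20200706.csv",
]

wanted = set(names_for_range("sales_%Y%m%d.csv", date(2020, 7, 6), date(2020, 7, 8)))
selected = [name for name in available if name in wanted]
print(selected)
```

Only the names that carry a date inside the requested range are picked up, regardless of when the files were last modified.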


Please Note:

The Start Date won't be advanced if a River run is unsuccessful.

If you don't want this default setting, click More Options and check the box to advance the start date even if the River run is unsuccessful (Not recommended).
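The rule for advancing the Start Date can be written as a small decision function. This is only a sketch of the behavior described above; the function and parameter names are hypothetical, and it assumes a successful run moves the Start Date to the end of the interval that was just processed.

```python
from datetime import datetime


def next_start_date(current_start, run_end, run_succeeded, advance_on_failure=False):
    """Return the Start Date to use for the next run.

    By default the date only moves forward after a successful run.
    advance_on_failure mirrors the (not recommended) More Options checkbox.
    """
    if run_succeeded or advance_on_failure:
        return run_end
    return current_start


start = datetime(2020, 7, 1)
end = datetime(2020, 7, 8)

print(next_start_date(start, end, run_succeeded=True))
print(next_start_date(start, end, run_succeeded=False))
print(next_start_date(start, end, run_succeeded=False, advance_on_failure=True))
```

Keeping the default means a failed run is retried over the same interval, so no files are skipped.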


Excel File Filters

Rivery supports extracting both old and new Excel file types (xls and xlsx extensions).

In the following image:

Select Is Compressed if the Excel file is compressed, for example with zip or gzip.

Usually, the first row is the header, but you can select a different row. If you do, and your dataset starts after that row, set "start loading rows from row number" to the row immediately after the header.

Here is an example of a header starting at row 3:
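The relationship between the header row and the first data row can be sketched with plain Python lists standing in for a worksheet. The sheet contents and the function `rows_to_records` are hypothetical; row numbers are 1-based, as in a spreadsheet.

```python
def rows_to_records(sheet_rows, header_row, start_row):
    """Turn worksheet rows into dictionaries keyed by the header row.

    header_row and start_row are 1-based, like spreadsheet row numbers.
    """
    header = sheet_rows[header_row - 1]
    return [dict(zip(header, row)) for row in sheet_rows[start_row - 1:]]


# Hypothetical sheet: two rows of preamble, the header at row 3, data from row 4.
sheet = [
    ["Exported report", None, None],
    [None, None, None],
    ["column_1", "column_2", "column_3"],
    [1, "a", 10.5],
    [2, "b", 20.0],
]

records = rows_to_records(sheet, header_row=3, start_row=4)
print(records)
```

With the header at row 3, loading starts from row 4, which skips the preamble rows above the header.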

When you finish setting the filters, press Source automapping.

If you wish to extract only some of the columns, you can remove the fields you don't want by pressing the 'x' button.

Here is an example of the target mapping when column_2 was left out:

Built-in Actions After Loading to Target

You can choose one of our built-in actions after loading the files to your target:


1. Remain in original place - Do nothing.

2. Move to archive path - Archive the file into a different path in your SFTP.

3. Deleted - Delete the file.
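The three actions can be sketched on a local temporary directory, which stands in for the SFTP server here. The function `apply_post_load_action` and the action names "remain", "archive", and "delete" are hypothetical labels for this example, not Rivery settings.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def apply_post_load_action(file_path: Path, action: str, archive_dir: Optional[Path] = None) -> None:
    """Apply one of the three post-load actions to a loaded file."""
    if action == "remain":
        return  # Leave the file where it is.
    if action == "archive":
        if archive_dir is None:
            raise ValueError("archive_dir is required for the archive action")
        archive_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(file_path), str(archive_dir / file_path.name))
    elif action == "delete":
        file_path.unlink()
    else:
        raise ValueError(f"unknown action: {action}")


with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp) / "inbox"
    archive = Path(tmp) / "archive"
    inbox.mkdir()
    for name in ("keep.csv", "move.csv", "drop.csv"):
        (inbox / name).write_text("id\n1\n")

    apply_post_load_action(inbox / "keep.csv", "remain")
    apply_post_load_action(inbox / "move.csv", "archive", archive)
    apply_post_load_action(inbox / "drop.csv", "delete")

    remaining = sorted(p.name for p in inbox.iterdir())
    archived = sorted(p.name for p in archive.iterdir())

print(remaining, archived)
```

Archiving keeps a copy available for reprocessing, while deleting does not, so the archive option is the safer of the two when the source files are not stored elsewhere.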

