FTP Walkthrough
Prerequisites
- FTP Connection
FTP File Formats
The available formats for extractions in FTP are CSV, Excel, and JSON. Files in these formats can be mapped and loaded into one of our operational data stores, such as Snowflake, Redshift, or Google BigQuery. To load other types of files into cloud storage, such as Google Cloud Storage or S3, use the "other" source file type.
The Excel type is explained in more detail after the extraction methods.
Extraction Methods
In each FTP river, you can choose how your files are extracted. The default, "Run all", pulls every file that matches a given pattern. Alternatively, you can pull only the files that were modified within a given time interval:
In the following example, every file that matches the full file path Root/test/*.json and was modified up to 2020-07-08 00:00:00 will be extracted.
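As a rough illustration of the modified-time filter described above, here is a minimal Python sketch. It is not Rivery's implementation: the function name `modified_between`, the file listing, and the choice to treat both interval bounds as inclusive are assumptions made for the example. Note also that Python's `fnmatchcase` lets `*` match `/`, which may differ from how the product treats folders.

```python
from datetime import datetime
from fnmatch import fnmatchcase


def modified_between(files, pattern, start=None, end=None):
    """Return paths that match `pattern` and were modified within [start, end].

    `files` is an iterable of (path, modified_datetime) pairs.
    A bound of None means "no limit" on that side (an assumption).
    """
    selected = []
    for path, modified in files:
        if not fnmatchcase(path, pattern):
            continue
        if start is not None and modified < start:
            continue
        if end is not None and modified > end:
            continue
        selected.append(path)
    return selected


# Hypothetical listing, for illustration only.
listing = [
    ("Root/test/a.json", datetime(2020, 7, 1, 12, 0, 0)),
    ("Root/test/b.json", datetime(2020, 7, 9, 8, 30, 0)),
    ("Root/test/c.csv", datetime(2020, 7, 2, 9, 0, 0)),
]

picked = modified_between(
    listing, "Root/test/*.json", end=datetime(2020, 7, 8, 0, 0, 0)
)
print(picked)
```

Only `Root/test/a.json` is selected here: `b.json` was modified after the end of the interval, and `c.csv` does not match the pattern.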
Supported wildcards in File Pattern:
Pattern | Meaning |
---|---|
`*` | matches everything |
`?` | matches any single character |
`[seq]` | matches any character in seq |
`[!seq]` | matches any character not in seq |
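The four meanings in the table correspond to shell-style wildcards (`*`, `?`, `[seq]`, `[!seq]`), which Python's standard `fnmatch` module also implements. The sketch below uses that module to show how each wildcard selects file names. Whether Rivery matches patterns in exactly the same way is an assumption, and the file names and the `select_files` helper are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(names, pattern):
    """Return the names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical file names, for illustration only.
files = [
    "data.json",
    "report1.json",
    "report2.json",
    "reportA.json",
    "notes.txt",
]

print(select_files(files, "*.json"))            # '*' matches everything
print(select_files(files, "report?.json"))      # '?' matches one character
print(select_files(files, "report[12].json"))   # '[seq]' matches 1 or 2
print(select_files(files, "report[!12].json"))  # '[!seq]' matches anything but 1 or 2
```

This sketch covers only the wildcards in the table and does not model any product-specific rules for special characters.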
Notes:
- To include an underscore '_' in the Prefix or File Pattern, write it in the format [!_].
  - Example file: *test_test.xlsx
  - File Pattern: *test[!_]test.xlsx
- For File Patterns that use a prefix, you may need to add an asterisk before the prefix as well.
  - Example file: contracts2020111312315151.csv (where you want all CSV files whose names contain the word contracts)
  - File Pattern: *contracts*.csv
In this scenario, the file excel_test1.xlsx will be extracted from the source.
To pull files based on their names rather than their modified time, choose the extraction method "Incremental run: by template".
With the template, you can filter different sets of files with a given date or other differentiating string in their name. For example:
With this setup you will extract the files highlighted in the following image:
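To make the idea of filtering by a date in the file name concrete, here is a minimal Python sketch. The template syntax shown (`strftime`-style codes such as `%Y%m%d`) is an assumption chosen for illustration and is not necessarily the syntax Rivery uses. The function name `files_for_template` and the file names are hypothetical.

```python
from datetime import datetime


def files_for_template(names, template, start, end):
    """Return names that fit `template` and whose embedded date is in [start, end].

    `template` is a strftime/strptime-style string, e.g. "sales_%Y%m%d.csv".
    Names that do not fit the template are skipped.
    """
    selected = []
    for name in names:
        try:
            stamp = datetime.strptime(name, template)
        except ValueError:
            continue  # name does not follow the template
        if start <= stamp <= end:
            selected.append(name)
    return selected


# Hypothetical file names, for illustration only.
names = [
    "sales_20200706.csv",
    "sales_20200707.csv",
    "sales_20200709.csv",
    "other_20200707.csv",
]

chosen = files_for_template(
    names,
    "sales_%Y%m%d.csv",
    start=datetime(2020, 7, 7),
    end=datetime(2020, 7, 8),
)
print(chosen)
```

Because the date comes from the name itself, a file is selected or skipped regardless of when it was last modified on the server.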
Excel File Filters
Rivery supports both the old and new Excel file types (.xls and .xlsx extensions).
In the following image:
Select "Is compressed" if the Excel file is compressed, for example with zip or gzip.
Usually the first row is the header, but you can select a different row. If you do, and your dataset starts after that row, set "start loading rows from row number" to the row immediately after the header.
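For readers who want to see what unpacking a compressed file involves, here is a minimal Python sketch using only the standard library. It is an illustration, not a description of what Rivery does internally. The `decompress` helper is hypothetical, and it takes the compression type as an explicit argument because an .xlsx file is itself a ZIP-based container, so guessing from the file contents would be unreliable.

```python
import gzip
import io
import zipfile


def decompress(data, compression):
    """Return the raw bytes of a file compressed with gzip or zip.

    For zip archives, the first member is returned (an assumption:
    the archive is expected to hold a single Excel file).
    """
    if compression == "gzip":
        return gzip.decompress(data)
    if compression == "zip":
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            first_member = archive.namelist()[0]
            return archive.read(first_member)
    raise ValueError(f"unsupported compression: {compression!r}")


# Stand-in bytes; a real run would use the content of an .xls/.xlsx file.
payload = b"placeholder for Excel file bytes"

gz_bytes = gzip.compress(payload)

zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as archive:
    archive.writestr("excel_test1.xlsx", payload)
zip_bytes = zip_buffer.getvalue()

print(decompress(gz_bytes, "gzip") == payload)
print(decompress(zip_bytes, "zip") == payload)
```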
Here is an example of a header starting at row 3:
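The relationship between the header row and the first data row can also be sketched in Python. The example below models a sheet as a plain list of rows and uses 1-based row numbers, as a spreadsheet does. The function `rows_to_records` and the sample sheet are hypothetical and only illustrate the two settings.

```python
def rows_to_records(rows, header_row=1, start_row=None):
    """Turn sheet rows into dicts keyed by the header row.

    `header_row` and `start_row` are 1-based row numbers.
    If `start_row` is omitted, data starts right after the header.
    """
    if start_row is None:
        start_row = header_row + 1
    header = rows[header_row - 1]
    return [dict(zip(header, row)) for row in rows[start_row - 1:]]


# Hypothetical sheet: two rows of preamble, header at row 3, data from row 4.
sheet = [
    ["Exported report", None, None],
    [None, None, None],
    ["column_1", "column_2", "column_3"],
    [1, "a", 10.5],
    [2, "b", 20.0],
]

records = rows_to_records(sheet, header_row=3, start_row=4)
print(records)
```

With the header at row 3, the first data row is row 4, so the two preamble rows never reach the target.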
When you finish setting the filters, press the Source automapping button.
To extract only some of the columns, remove the fields you don't want by pressing the 'x' button next to each one.
Here is an example of the target mapping when column_2 was left out:
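In code terms, leaving a field out of the mapping amounts to dropping that key from every record before loading. A minimal Python sketch, with a hypothetical `drop_fields` helper and made-up sample records:

```python
def drop_fields(records, excluded):
    """Return copies of the records without the excluded field names."""
    excluded = set(excluded)
    return [
        {key: value for key, value in record.items() if key not in excluded}
        for record in records
    ]


# Hypothetical source records, for illustration only.
source = [
    {"column_1": 1, "column_2": "a", "column_3": 10.5},
    {"column_1": 2, "column_2": "b", "column_3": 20.0},
]

mapped = drop_fields(source, ["column_2"])
print(mapped)
```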
Built-in Actions After Loading to Target
You can choose one of our built-in actions after loading the files to your target:
1. Remain in original place - Do nothing.
2. Move to archive path - Archive the file into a different path in your FTP.
3. Deleted - Delete the file.
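The three actions map naturally onto basic FTP operations. The sketch below shows one way they could be expressed with an object that offers `rename` and `delete` methods, as Python's `ftplib.FTP` does. It is an illustration under assumptions, not Rivery's implementation: the action names, the `apply_post_load_action` function, and the `RecordingFTP` stand-in (used so the example runs without a server) are all hypothetical.

```python
import posixpath


def apply_post_load_action(ftp, path, action, archive_dir=None):
    """Apply a post-load action and return the file's new path (None if deleted).

    `ftp` is any object with rename(fromname, toname) and delete(filename),
    such as an ftplib.FTP connection.
    """
    if action == "remain":
        return path  # do nothing
    if action == "archive":
        if archive_dir is None:
            raise ValueError("archive_dir is required for the archive action")
        target = posixpath.join(archive_dir, posixpath.basename(path))
        ftp.rename(path, target)
        return target
    if action == "delete":
        ftp.delete(path)
        return None
    raise ValueError(f"unknown action: {action!r}")


class RecordingFTP:
    """Stand-in for a real connection; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def rename(self, fromname, toname):
        self.calls.append(("rename", fromname, toname))

    def delete(self, filename):
        self.calls.append(("delete", filename))


ftp = RecordingFTP()
print(apply_post_load_action(ftp, "Root/test/a.json", "remain"))
print(apply_post_load_action(ftp, "Root/test/a.json", "archive", "Root/archive"))
print(apply_post_load_action(ftp, "Root/test/b.json", "delete"))
print(ftp.calls)
```

With a real server, `ftp` would be an authenticated `ftplib.FTP` instance instead of the recording stand-in.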