Amazon S3 Walkthrough

Article Summary

A guide for getting data from Amazon S3 to Rivery.

Prerequisites

An Amazon S3 connection.

How to pull data from Amazon S3 using Rivery

First, select 'Create New River' from the top right of the Rivery screen.

Choose 'Data Source to Target' as your river type.

In the 'General Info' tab, name your river, describe it and choose a group.
Next, navigate to the 'Source' tab.

Find Amazon S3 in the list of data sources (under Storage) and select it.

 

1. Under Source Connection, select the connection you created, or create a new one.

2.  Select the desired bucket name from the list.

3. Choose an extract method:

Run all - returns data from all time periods.

Incremental run: by modified timestamp

  • Pulls data in the date range between the start and end date provided, including the end date.
  • You must select a start date.

Leaving the end date empty will pull data up to the current time of the River's run.

Please Note:

The Start Date won't be advanced if a River run is unsuccessful.

If you don't want this default setting, click More Options and check the box to advance the start date even if the River run is unsuccessful (Not recommended).


For these extract methods, fill in the file path prefix and the file pattern you want to filter by (a minimal sketch of this filtering appears below, after the template options).

Incremental run: by template - Templates give you the option to run over folders and load the files according to the folder order. Choose your template type and write your template structure.

  • Timestamp template - Use {} with the appropriate timestamp parts to define the folder format.
  • Epoch time template - Use {e} (for an epoch) or {ee} (for an epoch in milliseconds) to define the folder so it can be run by epoch time. Enter the desired start value (required) and end value (optional).
  • Note: This method is valid for whole folders and is not available for individual files.
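
As a rough illustration of the filtering described above, here is a minimal sketch using boto3 directly, outside of Rivery; the bucket name, prefix, pattern, and dates are hypothetical, and the template prefixes at the end only suggest how a timestamp or epoch folder template might expand:

```python
# Minimal sketch of "Incremental run: by modified timestamp" style filtering
# using boto3 directly. Bucket, prefix, pattern, and dates are hypothetical.
from datetime import datetime, timezone
from fnmatch import fnmatch

import boto3

s3 = boto3.client("s3")

bucket = "my_bucket"                    # hypothetical bucket
prefix = "folder1/folder2/"             # file path prefix to filter by
pattern = "*.csv"                       # file pattern to filter by
start = datetime(2024, 1, 1, tzinfo=timezone.utc)  # start date (required)
end = datetime.now(timezone.utc)        # empty end date -> current run time

matching_keys = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        # Keep files whose name matches the pattern and whose LastModified
        # falls inside [start, end] (end date inclusive).
        name = obj["Key"].rsplit("/", 1)[-1]
        if fnmatch(name, pattern) and start <= obj["LastModified"] <= end:
            matching_keys.append(obj["Key"])

print(matching_keys)

# For "Incremental run: by template", the prefix would instead be built from
# the folder template, e.g. a timestamp-based or epoch-based folder
# (the formats below are hypothetical):
run_date = datetime(2024, 1, 1, tzinfo=timezone.utc)
timestamp_prefix = f"logs/{run_date:%Y}/{run_date:%m}/{run_date:%d}/"
epoch_prefix = f"logs/{int(run_date.timestamp())}/"
```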

4. Select the desired handling of the files after pulling (see the sketch after this list):

  • Remain in original place
  • Move to archive path: choose the container name and the archived folder path (optional).
  • Delete
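
Outside of Rivery, these options correspond to leaving the object alone, copying it to an archive location and deleting the original, or deleting it outright. A minimal boto3 sketch with hypothetical bucket and key names:

```python
# Sketch of the after-pulling options using boto3; all names are hypothetical.
import boto3

s3 = boto3.client("s3")

source_bucket = "my_bucket"
archive_bucket = "my_archive_bucket"   # the container chosen for the archive
archive_prefix = "archived/"           # optional archived folder path
key = "folder1/folder2/file.csv"

# "Move to archive path": copy the object to the archive location, then delete it.
s3.copy_object(
    Bucket=archive_bucket,
    CopySource={"Bucket": source_bucket, "Key": key},
    Key=archive_prefix + key,
)
s3.delete_object(Bucket=source_bucket, Key=key)

# "Delete": remove the object without archiving it.
# s3.delete_object(Bucket=source_bucket, Key=key)

# "Remain in original place": take no action on the source object.
```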

5. After choosing the target, select the desired file type: CSV or JSON.

Note: The supported JSON format is JSON Lines (jsonlines) only.
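
To make that restriction concrete, JSON Lines means one JSON object per line rather than a single top-level JSON array. A small sketch of the two accepted layouts, using hypothetical records:

```python
# Illustration of CSV vs. JSON Lines output; the records are hypothetical.
import csv
import json

records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# CSV: a header row followed by one row per record.
with open("example.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "name"])
    writer.writeheader()
    writer.writerows(records)

# JSON Lines: one JSON object per line (a single JSON array is not supported).
with open("example.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```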

Known Issues 

  • A suffix ("000", "001", and so on) will be added to the file name for every 5 gigabytes of data.
  • In order to load the data to the Amazon S3 target as a CSV file:
  1. Choose a temporary target (not Amazon S3): Snowflake, BigQuery, Redshift, etc. A connection to the chosen target isn't necessary, so you can press Cancel if a pop-up window appears.
  2. Choose the file type "CSV".
  3. Switch your target back to Amazon S3.
  • The path structure:  my_bucket/folder1/folder2/file.csv*  
  1. Bucket = my_bucket
  2. PREFIX = folder1/folder2/file.csv
  3. FILENAME = *
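
As a rough illustration of that breakdown, the split below treats everything before the wildcard as the prefix and the wildcard part as the filename; the helper is hypothetical and only mirrors the example above:

```python
# Hypothetical helper mirroring the path breakdown above: the prefix is
# everything up to the wildcard, and the filename is the wildcard part.
def split_s3_path(path: str) -> tuple[str, str, str]:
    bucket, _, rest = path.partition("/")
    star = rest.find("*")
    if star == -1:
        return bucket, rest, ""
    return bucket, rest[:star], rest[star:]

print(split_s3_path("my_bucket/folder1/folder2/file.csv*"))
# ('my_bucket', 'folder1/folder2/file.csv', '*')
```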


Important
For compressed files, only .gzip is supported (.zip is not supported at this time).
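
If you need to produce a supported compressed file, here is a quick sketch using Python's standard gzip module (the file names are hypothetical):

```python
# Compress a CSV into the supported gzip format (zip archives are not supported).
import gzip
import shutil

with open("example.csv", "rb") as src, gzip.open("example.csv.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)
```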
