Redshift Walkthrough
  • 6 Minutes to read
  • Dark
    Light
  • PDF

Redshift Walkthrough

  • Dark
    Light
  • PDF

This guide will walk you through the steps needed to integrate data from a Redshift database into a cloud target using Rivery.

The following topics will be covered:

Configure Redshift as a Source

Create an AWS Policy for the Rivery User

  1. Go to IAM in the AWS Console -> Policies

  2. Select the Create Policy button.

  3. In the Policy Creation wizard, go to the JSON tab, and paste the policy below into that tab. Please don’t forget to replace the bolded areas with your current settings as set earlier in this guide.

    Replace the <RiveryFileZoneBucket> with the S3 bucket name you created earlier.

    { "Version": "2012-10-17", "Statement": [ { "Sid": "ManagingFZPolicy", "Effect": "Allow", "Action": [ "s3:ReplicateObject", "s3:PutObject", "s3:GetObject", "s3:ListBucketMultipartUploads", "s3:RestoreObject", "s3:GetObjectTagging", "s3:ListBucket", "s3:DeleteObject", "s3:GetBucketLocation", "s3:ListMultipartUploadParts" ], "Resource": [ "arn:aws:s3:::<RiveryFileZoneBucket>", "arn:aws:s3:::<RiveryFileZoneBucket>/*" ] }, { "Sid": "HeadingBuckets", "Effect": "Allow", "Action": [ "s3:ListAllMyBuckets", "s3:HeadBucket" ], "Resource": "*" } ] }
    
    Shell
  4. After pasting the policy, click on Review Policy.

  5. Name the policy. (i.e. RiveryFileZoneBucketPolicy). Click on Create Policy.

Create a user for Rivery

We recommend having a specific Rivery user in AWS that has its own permissions for interfacing with the platform.

How to create a user for Rivery:

  1. Sign into the AWS Console.

  2. Go to IAM in the AWS Console -> Users -> Add User.

  3. In the wizard, set your username (i.e: RiveryUser), and select the Programmatic Access type. Click Next .

    image4.png

  4. In Permissions, choose the Attach Existing Policy Directly option. From here, search and choose the FileZone policy you've created above.image.png

  5. In the Review Screen - press Next.

  6. Once complete, you will get the user’s AWS credentials (aws_access_key_id, aws_secret_access_key) and the user's ARN.

  7. Save these credentials in a safe place! You will need these later on in Rivery.


Connect To Redshift in the Rivery Console

To connect to your Redshift Database, you must first set up your S3 file zone and database credentials.

Define your S3 File Zone connection

Rivery needs an S3 bucket to be a FileZone before your data is loading from Redshift.

You can find more information on how to create an S3 connection in Rivery here, or use an existing S3 connection if you already have one.

Define your Redshift connection

  1. In Rivery’s main menu, Go to Connections -> Create Connection, or create your new connection in the Source Tab, by clicking Create New Connection.

  2. In the pop-out window, define your credential details: Please be advised about whitelisting Rivery's IPs on your Database Firewall.
    Please note - In EU server there are 4 IPs to whitelist in the Database Firewall:   


    3.1 Define your Host, Port, Database, Username, and Password.

    3.2 Select the S3 file zone connection you've created:3.3 Select a bucket to load the data into.

Recommended:

Using SSH Tunneling in the Redshift Connection


Using a hybrid method of data processing, Rivery can connect to servers using an SSH Tunneling.

To use the SSH Tunneling option in Rivery, please configure your SSH Tunnel server in your network. This server will have the ability to get SSH connections from Rivery IPs, and will be able to connect to your database server via the internal IP or address. Rivery connects to that SSH server using a private key.

After you've configured SSH tunneling, you can set it up in the Rivery Connection:

image

Under the SSH Options section, in the Redshift Connection Pop-up, you can define the following:

  • SSH Hostname
  • SSH Port (in most of the connection, that should be 22)
  • The Username Rivery should connect via the SSH to the server
  • Password (optional)
  • SSH Pem Key (Recommended)
  • And if you have also a PemPassword for that key file.

Pull Redshift Data Into a Target

Using Rivery, you can pull data from your Redshift tables and send that data into your target database.

First select 'Create New River' from the top right of the Rivery screen.

image

Choose 'Data Source to Target' as your river type.

In the 'General Info' tab, name your river and give it a description. Next, navigate to the 'Source' tab.

Find Redshift in the list of data sources and select it:

Define a Source Connection (this will be the connection created earlier in the process). If you do not yet have a Redshift connection in your Rivery account, you can create a new connection here by clicking 'Create New Connection.'

Next, choose your River mode.

image

  • Multi-Tables : Load multiple tables simultaneously from Redshift to your target.
  • Custom Query : Create a custom query and load it into your target.
  • Union Tables : Merge several tables with the same metadata into a single table.
  • Legacy River: Choose a single source table to load into a single target.


Database Migration

For a detailed walkthrough of Multi-Tables mode, please refer to the Rivery Database Migration docs.


Pulling Data Using a Custom Query

You may also use the Custom Query in order to define a query data pull from Rivery. You may use any query that works in the source, using a specific SELECT query without any other statements. Rivery isn’t compatible using multi-statements or SQL Script in the custom query field.

image


Extract Method

Using Rivery, you can pull your data incrementally or pull the entirety of the data that exists in the table:

image

  • All: Fetch all data in the table using Chunks.

  • Incremental: Gives you the option to run over a column in a table or your custom query top SELECT. You can run over it by filtering Start and End dates or Epoch times

    Moreover, you may choose to run over the date using Daily, Monthly, Weekly, or Yearly chunks.

    Please define the incremental Field to be used in the Incremental Field section. After choosing the incremental field, choose the Incremental Type and the dates/values you would like to fetch.

    Note: Rivery will manage the increments over the runs using the Maximum value in the data. This means you will always get the entire data since the last run, which prevents data holes. You just need to configure your river once.

    Recommended : Define your incremental field in Rivery over a field with an Index or Partitions key in the table.

    image


Limit and Auto Mapping

After defining the extract method, you may choose a limit of top N rows to fetch. Rivery will set your Schema using the Auto Mapping feature. You can also choose fields you want to fetch in the Mapping table and add fields on your own.

image


Union Tables Mode

For like-named tables with identical schemas, the Union Tables river mode can be used to ingest these tables in a single process. Simply enter the table prefix using '*' as a wildcard and then

image

You can click 'Search Tables' below to return a list of tables in the database.

image

Next, choose your extraction method and map the columns by clicking 'Auto Mapping.' Note - the column names and data types are expected to be identical in all tables that are selected to load through the Union Tables mode.

image


Legacy River Mode

This river mode allows for a load of a single source table into a single target table.

image

In the above screenshot, there are steps to define the source table to pull (Rivery will auto-detect available schemas and tables), the extraction method to use, and options for filters or row limits on the data pull.


Multi-Table Mode

Load multiple tables simultaneously from SQL Server to your target. There are two Default Extraction Modes: Standard Extraction and Log-Based.

image.png

On the 'Table Settings' tab you are able to edit the following:

  • Change the loading mode
  • Change the extraction method. If 'Incremental' is selected, you can then define which field will be used to define the increment.
  • Filter by expression that will be used as a WHERE clause to fetch the selected data from the table.

Schedule The River


Once the creation of the river is complete, navigate to the 'Schedule' tab and click 'Schedule Me.'

image

Choose the frequency at which to schedule the river.

image

To notify certain users about a river failure or warning, enable notifications below:

image

You can edit your {Mail_Alert_Group} in the Variables page (find this in the left-hand pane of the browser).

Monitor The River


During the river run, or after the run has complete, you can monitor the river in its 'Activities' tab.

image

In this tab, you can monitor each of the status of the current river run. For the Multi-Table mode, you can monitor at the table-level (see above).

By toggling between 'Run View' and 'Target View' you can see the river results grouped either by time of run or by target location.

Check out the Targets section to find out how to load the data to your target data warehouse.



Type Mapping

We match Redshift data types to Rivery data types as we extract your data.

The following table shows the mapping of Redshift data types to Rivery-compatible types:

Redshift TypeRivery Type
INTERVALTIMESTAMP
DOUBLE_PRECISIONFLOAT
FLOATFLOAT
CHARSTRING
DATEDATE
VARCHARSTRING
TIMESTAMPTIMESTAMP
ENUMSTRING
BOOLEANBOOLEAN
ARRAYSTRING
arraySTRING
BYTEASTRING
BITBOOLEAN
REALFLOAT
BIGINTINTEGER
TEXTSTRING
SMALLINTINTEGER
NUMERICFLOAT
INTEGERINTEGER
CIDRSTRING
DropEnumTypeSTRING
INETSTRING
MACDDRSTRING
OIDSTRING
TSVECTORSTRING
UUIDSTRING
DATERANGEINTEGER
INT4RANGEINTEGER
INT8RANGEINTEGER
JSONSTRING
JSONBSTRING
NUMRANGEFLOAT
TSRANGEFLOAT
HSTORESTRING
SUPERJSON



Was this article helpful?