Setting up Amazon Redshift as a Target
  • 4 Minutes to read
  • Dark
    Light
  • PDF

Setting up Amazon Redshift as a Target

  • Dark
    Light
  • PDF

Getting started with Rivery and Redshift:

This guide will show you how to create your first Amazon Redshift cluster, how to create an S3 bucket, and how to get the right credentials for using Redshift with Rivery. By the end of this guide, you will be able to quickly setup your new Redshift cluster in Rivery.

Prerequisites

Before you use this guide, please make sure you’ve signed up for AWS and that you have a console admin user (super user).

If you don’t have one of these prerequisites, you can start here .

Rivery uses an S3 bucket for loading source data, and from there the data is sent into Redshift. Therefore, you need to create a user in the Amazon Platform Console that will have access to the relevant bucket and Redshift cluster.

Custom Filezone in Amazon Redshift

Rivery comes with its own FileZone by default, but if you wish to use your own S3 bucket as a FileZone before loading data into Redshift, follow our step-by-step tutorial.

Create a User for Rivery

We recommend having a specific Rivery user in AWS that has its own permissions for interfacing with the platform.

How to create a user for Rivery:

  1. Sign into the AWS Console

  2. Go to IAM in the AWS Console -> Users -> Add User

  3. In the wizard, set your username (i.e: RiveryUser), and select the Programmatic Access type. Click Next .
    set_user_details.PNG

  4. In Permissions, choose the Attach Existing Policy Directly option. From here, search and choose the FileZone policy you've created above.

  5. image.png

  6. In the Review Screen - press Next

  7. Once complete, you will get the user’s AWS credentials (aws_access_key_id, aws_secret_access_key) and the user's ARN .

  8. Save these credentials in a safe place! You will need these later on in Rivery.

Create a Redshift Cluster

  1. Now, let’s create a Redshift cluster for Rivery If you don’t have one, you can create one by following these instructions .

  2. Open your cluster and authorize access to the cluster by following these tasks.

Create your Schema and User in the Redshift Database

In the Cluster creation wizard in AWS, you must define the database where Rivery will upload data.

  1. Connect to your Redshift cluster with the credentials set in the cluster creation step with your preferred SQL management tool.

    We usually use Aginity, SQL Workbench/J or DBeaver.

    Note - the Redshift cluster must be accessible externally. To test this, connect to the cluster from a local machine using one of the above tools.

    You can find your host address in the Redshift cluster details. Choose the URL from the tool you use.

    image6.png

  2. Create the user and schema - In the SQL Management tool, run the following commands (replace the rivery_user_name and my_schema_name with your relevant names):

    CREATE USER <rivery_user_name> with password '<myPwd>' CREATEUSER;
    CREATE SCHEMA IF NOT EXISTS <my_schema_name> authorization <rivery_user_name>;
    

Security Group Access

  1. Whitelist our IPs.

  2. Login in to your AWS account.

  3. Search for the Amazon Redshift service in the search bar.
    search_amazon.PNG

  4. Click on clusters in the left side panel.
    clusters.PNG

  5. Under clusters, click on the cluster you've previously created.

  6. Click on the properties tab.
    Under Network and security settings search for VPC security group and click on the security group.

  7. Click on the Inbound rules tab and on the Edit inboud rules button
    edit_inboundrules.PNG

  8. In the new window click on Add rule.

  9. Under type select Redshift and for the Source enter one of the IPs required above, repeat steps 7-8 untill you have whitelisted all the IPs.

  10. Click on Save rules.
    ips.PNG

Configure your Redshift in Rivery

How to setup Redshift for use in Rivery:

  1. Log into Rivery.

  2. Next, set your bucket as the default S3 bucket in Rivery:

    • In the main menu, go to Variables
    • Set your {aws_file_zone} variable value to the s3 bucket name that you’ve created. This will be saved automatically by clicking anywhere on the screen.
    • If you don’t have the { aws_file_zone } variable, you should add a new variable with that name and the S3 bucket name that you’ve created as the value. Press on + Add Variable .

variables.PNG

  1. Let’s create a new connection for your Redshift instance:

    • Go to Connections.
    • Select New Connection .
    • From the source list, choose Amazon Redshift.

ama_icon.PNG
Now, enter your credentials information from the Redshift cluster and database (ie. host address, database, username and password of riveryusername ) and for the Rivery user that you’ve created.
redshift_connection.PNG
* Choose the file zone region from the drop down list (Default: us-east-1 (N.Virginia)).
* Choose the desired authentication method. There are 3 options to choose from:
1. AWS Keys :
aws_keys.PNG
Enter the credentials information that you created earlier in this guide
* AWS_access_key_id
* AWS_secret_access_key
2. IAM Role - Automatic :
Follow the detailed instructions in the connection window to create the Role ARN Key
3. IIAM Role - Manual :
Follow the detailed instructions in the connection window to create the Role ARN Key

  • You can test your connection by clicking the Test Connection button.
  • Give your connection a name and click Save .
  1. You can now use this connection in any river that has Redshift as the database target.

Type Mapping

We match Redshift data types to Rivery data types as we extract your data.

The following table shows the mapping of Redshift data types to Rivery-compatible types:

MongoDB Type Rivery Type
INTERVAL TIMESTAMP
DOUBLE_PRECISION FLOAT
FLOAT FLOAT
CHAR STRING
DATE DATE
VARCHAR STRING
TIMESTAMP TIMESTAMP
ENUM STRING
BOOLEAN BOOLEAN
ARRAY STRING
array STRING
BYTEA STRING
BIT BOOLEAN
REAL FLOAT
BIGINT INTEGER
TEXT STRING
SMALLINT INTEGER
NUMERIC FLOAT
INTEGER INTEGER
CIDR STRING
DropEnumType STRING
INET STRING
MACDDR STRING
OID STRING
TSVECTOR STRING
UUID STRING
DATERANGE INTEGER
INT4RANGE INTEGER
INT8RANGE INTEGER
JSON STRING
JSONB STRING
NUMRANGE FLOAT
TSRANGE FLOAT
HSTORE STRING
SUPER JSON

Note:
If you want to use SUPER datatypes, simply change the mapping from Object to JSON and the SUPER datatype will be generated.

Further Reading

Check out the Best Practices, Considerations, and Requirements for Using Redshift and Rivery article when using Redshift as a target in Rivery.


Was this article helpful?