Setting up Amazon Redshift as a Target


This guide shows you how to create your first Amazon Redshift cluster, create an S3 bucket, and obtain the right credentials for using Redshift with Rivery. By the end of this guide, you will be able to quickly set up your new Redshift cluster in Rivery.

Prerequisites

Please make sure you've signed up for AWS as a Superuser before using this guide.

Rivery uses an S3 bucket to stage source data, which is then loaded into Redshift. Therefore, you need to create a user in the AWS Management Console with access to the relevant bucket and Redshift cluster.

These are the requirements:

  • Bucket
  • Policy
  • Rivery User in AWS

Bucket

A bucket is an object container. To store data in Amazon S3, you must first create a bucket and specify a bucket name as well as an AWS Region. Then you upload your data as objects to that bucket in Amazon S3. Each object has a key (or key name) that serves as the object's unique identifier within the bucket.
Begin by logging in to the AWS console, opening the S3 service, and creating a new bucket with a unique name in your preferred AWS Region.
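If you prefer to script this step rather than click through the console, here is a minimal sketch using boto3 (an assumption; any S3 client works, and AWS credentials must already be configured locally). The bucket name and Region below are placeholders:

```python
# Sketch: create the Rivery file-zone bucket programmatically.
# Bucket names are globally unique, so pick your own.
BUCKET = "my-rivery-filezone"   # hypothetical placeholder name
REGION = "us-east-1"

def create_bucket(s3_client, bucket: str, region: str):
    """Create an S3 bucket in the given region."""
    kwargs = {"Bucket": bucket}
    # us-east-1 is the default Region and must NOT be passed
    # as a LocationConstraint; every other Region requires it.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return s3_client.create_bucket(**kwargs)

# To run for real:
#   import boto3
#   create_bucket(boto3.client("s3", region_name=REGION), BUCKET, REGION)
```

The client is passed in as a parameter so the Region logic can be checked without touching AWS.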

Policy

A bucket policy is a resource-based policy that allows you to grant access permissions to your bucket and the objects contained within it.
Now that you've created a bucket, let's create a policy to grant the necessary permissions:


Here's the policy's code:

{
 "Version":"2012-10-17",
 "Statement":[
   {
    "Sid":"RiveryManageFZBucket",
    "Effect":"Allow",
    "Action":[
    "s3:GetBucketCORS",
    "s3:ListBucket",
    "s3:GetBucketLocation"
     ],
    "Resource":"arn:aws:s3:::<RiveryFileZoneBucket>"
   },
   {
    "Sid":"RiveryManageFZObjects",
    "Effect":"Allow",
    "Action":[
      "s3:ReplicateObject",
      "s3:PutObject",
      "s3:GetObjectAcl",
      "s3:GetObject",
      "s3:PutObjectVersionAcl",
      "s3:PutObjectAcl",
      "s3:ListMultipartUploadParts"],
    "Resource":"arn:aws:s3:::<RiveryFileZoneBucket>/*"
  },
  {
     "Sid":"RiveryHeadBucketsAndGetLists",
     "Effect":"Allow",
     "Action":"s3:ListAllMyBuckets",
     "Resource":"*"
  }
 ]
}
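If you want to apply this policy from code instead of the console, the sketch below renders the policy above with your real bucket name and shows (commented out) how it could be attached to the Rivery user as an inline IAM policy via boto3's `put_user_policy`. The user, policy, and bucket names are placeholders:

```python
import json

# The policy document from this guide, with <RiveryFileZoneBucket>
# left as a placeholder to be substituted.
POLICY_TEMPLATE = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "RiveryManageFZBucket",
            "Effect": "Allow",
            "Action": ["s3:GetBucketCORS", "s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>",
        },
        {
            "Sid": "RiveryManageFZObjects",
            "Effect": "Allow",
            "Action": [
                "s3:ReplicateObject", "s3:PutObject", "s3:GetObjectAcl",
                "s3:GetObject", "s3:PutObjectVersionAcl", "s3:PutObjectAcl",
                "s3:ListMultipartUploadParts",
            ],
            "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>/*",
        },
        {
            "Sid": "RiveryHeadBucketsAndGetLists",
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "*",
        },
    ],
}

def render_policy(bucket: str) -> str:
    """Substitute the real bucket name into the policy document."""
    return json.dumps(POLICY_TEMPLATE).replace("<RiveryFileZoneBucket>", bucket)

# To attach for real (names are hypothetical):
#   import boto3
#   boto3.client("iam").put_user_policy(
#       UserName="rivery", PolicyName="RiveryFileZone",
#       PolicyDocument=render_policy("my-rivery-filezone"))
```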

Rivery User in AWS

Now, to connect to the Amazon S3 Source and Target (described in the following section) in the Rivery console, you must first create an AWS user for Rivery:

Rivery's S3 Connection

Follow our Amazon S3 connection tutorial; scroll down to the Connection Procedure section.



Create a Redshift Cluster

  1. Now, let’s create a Redshift cluster for Rivery. If you don’t have one, you can create one by following these instructions.

  2. Open your cluster and authorize access to the cluster by following these tasks.
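The cluster can also be created programmatically. The sketch below builds the parameters that boto3's Redshift `create_cluster` call accepts; all identifiers and the password are hypothetical placeholders, and the smallest node type is assumed for illustration:

```python
# Sketch: parameters for creating a Redshift cluster with boto3.
def cluster_params(cluster_id: str, db_name: str,
                   master_user: str, master_password: str) -> dict:
    return {
        "ClusterIdentifier": cluster_id,
        "DBName": db_name,            # database Rivery will load data into
        "NodeType": "dc2.large",      # small node type, placeholder choice
        "ClusterType": "single-node",
        "MasterUsername": master_user,
        "MasterUserPassword": master_password,
        "PubliclyAccessible": True,   # Rivery must reach the cluster externally
    }

# To run for real:
#   import boto3
#   boto3.client("redshift").create_cluster(
#       **cluster_params("rivery-demo", "dev", "admin", "<myPwd>"))
```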

Create your Schema and User in the Redshift Database

In the Cluster creation wizard in AWS, you must define the database where Rivery will upload data.

  1. Using your preferred SQL management tool, connect to your Redshift cluster with the credentials you set during cluster creation.

    We usually use Aginity, SQL Workbench/J or DBeaver.

    Note - the Redshift cluster must be accessible externally. To test this, connect to the cluster from a local machine using one of the above tools.

    You can find your host address in the Redshift cluster details; choose the URL format that matches the tool you use.


  2. Create the user and schema - In the SQL management tool, run the following commands (replace <rivery_user_name>, <myPwd>, and <my_schema_name> with your own values):

    CREATE USER <rivery_user_name> with password '<myPwd>' CREATEUSER;
    CREATE SCHEMA IF NOT EXISTS <my_schema_name> authorization <rivery_user_name>;
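If you script this step, the sketch below renders the two statements above with your own names substituted (the example names are hypothetical) and notes how they could be executed with any PostgreSQL-compatible driver:

```python
# Sketch: render the user/schema DDL from this guide with real names.
def render_ddl(user: str, password: str, schema: str) -> list:
    return [
        f"CREATE USER {user} WITH PASSWORD '{password}' CREATEUSER;",
        f"CREATE SCHEMA IF NOT EXISTS {schema} AUTHORIZATION {user};",
    ]

# To execute for real with e.g. psycopg2 (an assumption; any
# PostgreSQL-compatible driver works against Redshift):
#   import psycopg2
#   conn = psycopg2.connect(host="<cluster-endpoint>", port=5439,
#                           dbname="dev", user="admin", password="<myPwd>")
#   with conn, conn.cursor() as cur:
#       for stmt in render_ddl("rivery_user", "<myPwd>", "rivery_schema"):
#           cur.execute(stmt)
```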
    

Security Group Access

  1. Whitelist our IPs.

  2. Log in to your AWS account.

  3. Search for the Amazon Redshift service in the search bar.

  4. Click Clusters in the left-side panel.

  5. Under clusters, click on the cluster you've previously created.

  6. Click the Properties tab.
    Under Network and security settings, find VPC security group and click the security group.

  7. Click the Inbound rules tab, then click the Edit inbound rules button.

  8. In the new window click on Add rule.

  9. Under Type, select Redshift, and for Source, enter one of the IPs required above. Repeat steps 8-9 until you have whitelisted all the IPs.

  10. Click Save rules.
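The inbound-rule steps above can also be scripted. This sketch builds the rule structure that boto3's EC2 `authorize_security_group_ingress` expects; the security-group ID and CIDR below are placeholders (203.0.113.0/24 is a documentation-only range), and 5439 is Redshift's default port:

```python
# Sketch: one Redshift inbound rule for a whitelisted IP.
def redshift_ingress_rule(cidr: str, port: int = 5439) -> dict:
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr, "Description": "Rivery"}],
    }

# To apply for real, repeat once per Rivery IP:
#   import boto3
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId="sg-0123456789abcdef0",            # placeholder
#       IpPermissions=[redshift_ingress_rule("203.0.113.10/32")])
```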



Rivery's Redshift Connection

You can connect to Redshift using one of two methods:

  • Login Credentials
  • SSH Tunnel

Login Credentials

  1. Whitelist our IPs.
  2. Choose your Connection Name.
  3. Enter Host.
  4. Fill in the Port Number.
  5. Enter your Username and Password.
  6. Use the Test Connection function to verify your connection.
    If the connection succeeds, you can now use this connection in Rivery.
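Before (or alongside) the Test Connection button, you can sanity-check that the host and port are reachable at all with a plain TCP probe; a stdlib-only sketch, where the hostname is a placeholder and 5439 is Redshift's default port:

```python
import socket

def is_reachable(host: str, port: int = 5439, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failure, refusal, and timeout
        return False

# Example (placeholder endpoint):
#   is_reachable("my-cluster.abc123.us-east-1.redshift.amazonaws.com")
```

A `False` result usually points at the security group or the "Publicly accessible" cluster setting rather than the credentials.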

SSH Tunnel

To connect via SSH Tunnel, follow our step-by-step tutorial.

Filezone

Rivery allows you to create a custom FileZone to manage your data in your own S3 service, so your data is stored in your own bucket.

To use Filezone, follow the guidelines listed below:

  1. Toggle the Custom File Zone to true.

  2. Click FileZone Connection and select the previously configured Amazon S3 connection.

  3. Choose a Default Bucket from the drop-down list.

  4. Use the Test Connection function to verify your connection.
    If the connection was successful, click Save.




Type Mapping

We match Redshift data types to Rivery data types as we extract your data.

The following table shows the mapping of Redshift data types to Rivery-compatible types:

Amazon Redshift Type    Rivery Type
INTERVAL                TIMESTAMP
DOUBLE PRECISION        FLOAT
FLOAT                   FLOAT
CHAR                    STRING
DATE                    DATE
VARCHAR                 STRING
TIMESTAMP               TIMESTAMP
ENUM                    STRING
BOOLEAN                 BOOLEAN
ARRAY                   STRING
BYTEA                   STRING
BIT                     BOOLEAN
REAL                    FLOAT
BIGINT                  INTEGER
TEXT                    STRING
SMALLINT                INTEGER
NUMERIC                 FLOAT
INTEGER                 INTEGER
CIDR                    STRING
DropEnumType            STRING
INET                    STRING
MACADDR                 STRING
OID                     STRING
TSVECTOR                STRING
UUID                    STRING
DATERANGE               INTEGER
INT4RANGE               INTEGER
INT8RANGE               INTEGER
JSON                    STRING
JSONB                   STRING
NUMRANGE                FLOAT
TSRANGE                 FLOAT
HSTORE                  STRING
SUPER                   JSON

Note:
If you want to use the SUPER data type, change the mapping from Object to JSON and the SUPER data type will be generated.

