Setting up Amazon Redshift as a Target
This guide will show you how to create your first Amazon Redshift cluster, how to create an S3 bucket, and how to get the right credentials for using Redshift with Rivery. By the end of this guide, you will be able to quickly set up your new Redshift cluster in Rivery.
Prerequisites
Please make sure you've signed up for AWS as a Superuser before using this guide.
Rivery uses an S3 bucket for loading source data; from there, the data is sent into Redshift. Therefore, you need to create a user in the AWS Management Console that has access to the relevant bucket and Redshift cluster.
These are the requirements:
- Bucket
- Policy
- Rivery User in AWS
Bucket
A bucket is an object container. To store data in Amazon S3, you must first create a bucket and specify a bucket name as well as an AWS Region. Then you upload your data as objects to that bucket in Amazon S3. Each object has a key (or key name) that serves as the object's unique identifier within the bucket.
Let's begin by logging in to AWS, opening the S3 service, and creating a new bucket.
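If you prefer to script this step, here is a minimal sketch using Python and boto3. The bucket name and region below are placeholder assumptions, not Rivery requirements:

```python
import boto3

# Placeholder values - replace with your own bucket name and region.
BUCKET_NAME = "my-rivery-filezone-bucket"
REGION = "us-east-1"

s3 = boto3.client("s3", region_name=REGION)

# us-east-1 is the default region and must not be passed as a LocationConstraint.
if REGION == "us-east-1":
    s3.create_bucket(Bucket=BUCKET_NAME)
else:
    s3.create_bucket(
        Bucket=BUCKET_NAME,
        CreateBucketConfiguration={"LocationConstraint": REGION},
    )
```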
Policy
A bucket policy is a resource-based policy that allows you to grant access permissions to your bucket and the objects contained within it.
Now that you've created a bucket, let's create a policy to grant the necessary permissions:
Here's the policy's code:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RiveryManageFZBucket",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketCORS",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>"
    },
    {
      "Sid": "RiveryManageFZObjects",
      "Effect": "Allow",
      "Action": [
        "s3:ReplicateObject",
        "s3:PutObject",
        "s3:GetObjectAcl",
        "s3:GetObject",
        "s3:PutObjectVersionAcl",
        "s3:PutObjectAcl",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>/*"
    },
    {
      "Sid": "RiveryHeadBucketsAndGetLists",
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    }
  ]
}
Rivery User in AWS
Now, in order to connect to the Amazon S3 source and target (described in the following section) in the Rivery console, you must first create a Rivery user in AWS.
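As an alternative to clicking through the IAM console, here is a sketch of the same steps with boto3. The user name, policy name, and policy file name are assumptions; the policy document is the one shown above with `<RiveryFileZoneBucket>` replaced by your bucket name:

```python
import json
import boto3

iam = boto3.client("iam")

# Assumed names - use whatever fits your account conventions.
USER_NAME = "rivery-user"
POLICY_NAME = "RiveryFileZoneAccess"

# Load the policy shown above (with <RiveryFileZoneBucket> replaced).
with open("rivery_policy.json") as f:
    policy_document = json.load(f)

iam.create_user(UserName=USER_NAME)

# Attach the policy inline to the new user.
iam.put_user_policy(
    UserName=USER_NAME,
    PolicyName=POLICY_NAME,
    PolicyDocument=json.dumps(policy_document),
)

# Create access keys for the Rivery S3 connection; store them securely.
keys = iam.create_access_key(UserName=USER_NAME)["AccessKey"]
print(keys["AccessKeyId"], keys["SecretAccessKey"])
```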
Rivery's S3 Connection
Follow our Amazon S3 connection tutorial by scrolling down to the Connection Procedure section.
Create a Redshift Cluster
Now, let's create a Redshift cluster for Rivery. If you don't have one, you can create one by following these instructions.
Open your cluster and authorize access to it by following these tasks.
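For reference, a cluster can also be created programmatically. The sketch below uses boto3 with assumed values for the identifier, node type, credentials, and region; it is not a substitute for the sizing and network choices covered in the AWS instructions linked above:

```python
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")  # assumed region

# All values below are placeholder assumptions.
redshift.create_cluster(
    ClusterIdentifier="rivery-cluster",
    NodeType="dc2.large",
    NumberOfNodes=2,
    MasterUsername="admin_user",
    MasterUserPassword="ChangeMe123!",
    DBName="rivery_db",
    PubliclyAccessible=True,  # Rivery must be able to reach the cluster externally
)
```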
Create your Schema and User in the Redshift Database
In the Cluster creation wizard in AWS, you must define the database where Rivery will upload data.
Using your preferred SQL management tool, connect to your Redshift cluster with the credentials set in the cluster creation step.
We usually use Aginity, SQL Workbench/J, or DBeaver.
Note - the Redshift cluster must be accessible externally. To test this, connect to the cluster from a local machine using one of the above tools.
You can find your host address in the Redshift cluster details. Choose the URL that matches the tool you use.
Create the user and schema - In the SQL management tool, run the following commands (replace rivery_user_name, my_schema_name, and myPwd with your own values):
CREATE USER <rivery_user_name> WITH PASSWORD '<myPwd>' CREATEUSER;
CREATE SCHEMA IF NOT EXISTS <my_schema_name> AUTHORIZATION <rivery_user_name>;
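To confirm the cluster really is reachable externally (per the note above), you can also run these statements from a short Python script. psycopg2 is one common driver choice, and every connection value below is a placeholder assumption:

```python
import psycopg2

# Placeholder connection details - copy the real host from the cluster details page.
conn = psycopg2.connect(
    host="rivery-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="rivery_db",
    user="admin_user",
    password="ChangeMe123!",
)
conn.autocommit = True

with conn.cursor() as cur:
    # Same statements as above, with example names filled in.
    cur.execute("CREATE USER rivery_user WITH PASSWORD '<myPwd>' CREATEUSER;")
    cur.execute("CREATE SCHEMA IF NOT EXISTS rivery_schema AUTHORIZATION rivery_user;")

conn.close()
```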
Security Group Access
1. Log in to your AWS account.
2. Search for the Amazon Redshift service in the search bar.
3. Click Clusters in the left side panel.
4. Under Clusters, click the cluster you've previously created.
5. Click the Properties tab.
6. Under Network and security settings, find the VPC security group and click the security group.
7. Click the Inbound rules tab, then click the Edit inbound rules button.
8. In the new window, click Add rule.
9. Under Type, select Redshift, and under Source, enter one of the IPs required above. Repeat steps 8-9 until you have whitelisted all the IPs.
10. Click Save rules.
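These console steps can also be scripted. Below is a minimal sketch using boto3's EC2 client; the security group ID, region, and IP list are placeholder assumptions (use your cluster's VPC security group and Rivery's published IPs):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

SECURITY_GROUP_ID = "sg-0123456789abcdef0"  # placeholder - your cluster's security group
RIVERY_IPS = ["203.0.113.10/32", "203.0.113.11/32"]  # placeholders - Rivery's published IPs

# Add one Redshift (port 5439) inbound rule per IP.
ec2.authorize_security_group_ingress(
    GroupId=SECURITY_GROUP_ID,
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 5439,
            "ToPort": 5439,
            "IpRanges": [{"CidrIp": ip, "Description": "Rivery"}],
        }
        for ip in RIVERY_IPS
    ],
)
```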
Rivery's Redshift Connection
You can connect to Redshift using one of two methods:
- Login Credentials
- SSH Tunnel
Login Credentials
- Whitelist our IPs.
- Choose your Connection Name.
- Enter Host.
- Fill in the Port Number.
- Enter your Username and Password.
- Use the Test Connection function to see if your connection is up to the task.
If the connection succeeded, you can now use this connection in Rivery.
SSH Tunnel
To connect via SSH Tunnel, follow our step-by-step tutorial.
Filezone
Rivery allows you to create a custom FileZone to manage your data in your own S3 service. Your data will be stored in your bucket.
To use Filezone, follow the guidelines listed below:
1. Toggle Custom File Zone to true.
2. Click FileZone Connection and select the previously configured Amazon S3 connection.
3. Choose a Default Bucket from the drop-down list.
4. Use the Test Connection function to see if your connection is up to the task.
5. If the connection was successful, click Save.
Type Mapping
We match Redshift data types to Rivery data types as we extract your data.
The following table shows the mapping of Redshift data types to Rivery-compatible types:
| Amazon Redshift Type | Rivery Type |
| --- | --- |
| INTERVAL | TIMESTAMP |
| DOUBLE_PRECISION | FLOAT |
| FLOAT | FLOAT |
| CHAR | STRING |
| DATE | DATE |
| VARCHAR | STRING |
| TIMESTAMP | TIMESTAMP |
| ENUM | STRING |
| BOOLEAN | BOOLEAN |
| ARRAY | STRING |
| BYTEA | STRING |
| BIT | BOOLEAN |
| REAL | FLOAT |
| BIGINT | INTEGER |
| TEXT | STRING |
| SMALLINT | INTEGER |
| NUMERIC | FLOAT |
| INTEGER | INTEGER |
| CIDR | STRING |
| DropEnumType | STRING |
| INET | STRING |
| MACADDR | STRING |
| OID | STRING |
| TSVECTOR | STRING |
| UUID | STRING |
| DATERANGE | INTEGER |
| INT4RANGE | INTEGER |
| INT8RANGE | INTEGER |
| JSON | STRING |
| JSONB | STRING |
| NUMRANGE | FLOAT |
| TSRANGE | FLOAT |
| HSTORE | STRING |
| SUPER | JSON |
Note:
If you want to use the SUPER data type, simply change the mapping from Object to JSON, and the SUPER data type will be generated.