Amazon Redshift as a Source Connection
  • 4 Minutes to read
  • Dark
    Light
  • PDF

Amazon Redshift as a Source Connection

  • Dark
    Light
  • PDF

Article Summary

Introduction

This document describes the process of integrating Amazon Redshift as a Source within Rivery. By integrating Amazon Redshift with Rivery, users can easily extract their data warehoused in Redshift, transform it as needed, and load it into Targets for further analysis or operational use.

Prerequisites

Before using this guide, ensure the following prerequisites are met:

AWS Superuser Account:

Make sure you have signed up for AWS as a Superuser.

S3 Bucket, Policy, and Rivery User in AWS:

To enable Rivery to load source data into Redshift, you must create an S3 bucket, configure a bucket policy, and set up a Rivery user in the Amazon Platform Console. These are the essential requirements for the setup.

S3 Bucket

A bucket is an object container. To store data in Amazon S3, you must first create a bucket and specify a bucket name as well as an AWS Region. Then you upload your data as objects to that bucket in Amazon S3. Each object has a key (or key name) that serves as the object's unique identifier within the bucket.
Let's begin by logging into AWS and searching for Buckets:

Note:
This is a tour of the console. Please hover over the rippling dots and read the notes attached to follow through.
For a fullscreen view, click the green dot in the upper left corner.

Policy

A bucket policy is a resource-based policy that allows you to grant access permissions to your bucket and the objects contained within it.
Now that you've created a bucket, let's create a policy to grant the necessary permissions:


Here's the policy's code:

    {
     "Version":"2012-10-17",
     "Statement":[
       {
        "Sid":"RiveryManageFZBucket",
        "Effect":"Allow",
        "Action":[
        "s3:GetBucketCORS",
        "s3:ListBucket",
        "s3:GetBucketLocation"
         ],
        "Resource":"arn:aws:s3:::<RiveryFileZoneBucket>"
       },
       {
        "Sid":"RiveryManageFZObjects",
        "Effect":"Allow",
        "Action":[
          "s3:ReplicateObject",
          "s3:PutObject",
          "s3:GetObjectAcl",
          "s3:GetObject",
          "s3:PutObjectVersionAcl",
          "s3:PutObjectAcl",
          "s3:ListMultipartUploadParts"],
        "Resource":"arn:aws:s3:::<RiveryFileZoneBucket>/*"
      },
      {
         "Sid":"RiveryHeadBucketsAndGetLists",
         "Effect":"Allow",
         "Action":"s3:ListAllMyBuckets",
         "Resource":"*"
      }
     ]
    }

Rivery User in AWS

Now, in order to connect to the Amazon S3 Source and Target (described in the following section) in Rivery console, you must first create an AWS Rivery user:

Rivery's S3 Connection

Follow our Amazon S3 connection tutorial by scrolling down to Connection Procedure section.


AWS Permissions Setup:

Create a Redshift Cluster

  1. If you don't already have a Redshift cluster, you can create one by following these instructions.
  2. Once your cluster is created, you need to authorize access to the cluster.

Create Your User and Schema within the Redshift Database

In the Cluster creation wizard in AWS, you must define the database where Rivery will upload data. Follow these steps:

  1. Connect to your Redshift cluster using the credentials set during cluster creation, using your preferred SQL management tool (e.g., Aginity, SQL Workbench/J or DBeaver).
    Make sure the Redshift cluster is externally accessible. Obtain the host address from the Redshift cluster details.
    image6.png
  2. Create the user and schema by running the following SQL commands (replace "Rivery User" and "schema name" with your desired names):
        CREATE USER <Rivery User> with password '<myPwd>' CREATEUSER;
        CREATE SCHEMA IF NOT EXISTS <schema name> authorization <Rivery User>;
    
  3. Grant the necessary permissions to the user by running the following commands:
    GRANT SELECT on ALL TABLES IN SCHEMA <schema name> TO <Rivery User>;
    GRANT EXECUTE on ALL FUNCTIONS IN SCHEMA <schema name> TO <Rivery User>;
    GRANT USAGE on SCHEMA <schema name> TO <Rivery_User>
    

Replace "Rivery User" with the user previously created and "schema name" with the schema you grant Rivery permission to utilize. Repeat this process for every schema you wish for Rivery to access and utilize data.

Please Note:
Rivery conducts a review of the Access Control Lists (ACLs) to ensure that these permissions are replicated across all tables. The current Grant permissions found within the raw ACLs are structured as shown below:

rivery=arwdRxtDPA/rivery,group test_group=arwdxtD/rivery,user1_name=wx/rivery,user1_name=x/rivery

In this context, the string "arwdRxtDPA" denotes the permissions granted to the Rivery user. Each character within this string corresponds to a specific action, with 'a' representing INSERT, 'r' representing SELECT, 'w' representing UPDATE, and 'd' representing DELETE.
Rivery interprets this string, generates grant statements, and subsequently executes them individually on the respective tables.

Whitelist Our IPs

To whitelist Rivery's IPs, follow these steps:

  1. Log in to your AWS account and search for the "Amazon Redshift" service in the AWS Console.
    search_amazon.PNG
  2. Access the "clusters" section in the left sidebar.
    image.png
  3. Select the cluster you've created.
  4. Navigate to the "properties" tab and look for "VPC security group" under "Network and security settings." Click on the Security Group ID.
  5. In the "Inbound rules" tab, click "Edit inbound rules."
    image.png
  6. Add a rule of type "Redshift" and specify the required source IP addresses as outlined in the documentation.
  7. Save the rules to whitelist the specified IPs.
    image.png

Connection Procedure

  1. Type in the Connection Name.
  2. Enter your Host name.
  3. Fill in your Port.
  4. Enter your Username and Password.
  5. In the File Zone Connection section, make sure to choose the S3 connection that was previously established, and then select the associated S3 Bucket.
  6. Use the Test Connection function to see if your connection is up to the task.
  7. If the connection succeeded, you can now use this connection in Rivery.

image.png

SSL Options (Optional)

Key File

To establish an SSL connection and authenticate your Redshift user, you have the option to use a key file (.pem).

image.png

SSH (Optional)

Secure Shell Protocol (SSH) is used to provide secure and encrypted access to internal databases from Rivery servers in order to retrieve data.

Configure your SSH Tunnel server in your network to use SSH Tunneling via Rivery. This server will be able to connect to your database server via the internal IP or address, and will accept SSH connections from Rivery IPs. Rivery uses a key file to connect to the SSH server.

image.png


Was this article helpful?