Amazon Redshift as a Source Connection

3 Minutes to read

Print
Share
Dark
Light
PDF

Amazon Redshift as a Source Connection

3 Minutes to read

Print
Share
Dark
Light
PDF

Article summary

Did you find this summary helpful?

Thank you for your feedback!

Introduction

This document describes the process of integrating Amazon Redshift as a Source within Rivery. By integrating Amazon Redshift with Rivery, users can easily extract their data warehoused in Redshift, transform it as needed, and load it into Targets for further analysis or operational use.

Rivery offers flexible options for data management. You can securely store your data in your own Custom File Zone, giving you control over your designated storage. Alternatively, you can use Rivery's default Managed File Zone, which requires no setup. For organizations needing greater control over data storage and management, the Custom File Zone is available. In both cases, a bucket serves as a repository for objects, where each object represents a file along with its associated metadata.

Prerequisite

AWS Superuser Account:

Make sure you have signed up for AWS as a Superuser.

AWS Permissions Setup:

Create a Redshift Cluster

If you don't already have a Redshift cluster, you can create one by following these instructions.
Once your cluster is created, you need to authorize access to the cluster.

Create Your User and Schema within the Redshift Database

In the Cluster creation wizard in AWS, you must define the database where Rivery will upload data. Follow these steps:

Connect to your Redshift cluster using the credentials set during cluster creation, using your preferred SQL management tool (e.g., Aginity, SQL Workbench/J or DBeaver).
Make sure the Redshift cluster is externally accessible. Obtain the host address from the Redshift cluster details.

Create the user and schema by running the following SQL commands (replace "Rivery User" and "schema name" with your desired names):

    CREATE USER <Rivery User> with password '<myPwd>' CREATEUSER;
    CREATE SCHEMA IF NOT EXISTS <schema name> authorization <Rivery User>;

Grant the necessary permissions to the user by running the following commands:

GRANT SELECT on ALL TABLES IN SCHEMA <schema name> TO <Rivery User>;
GRANT EXECUTE on ALL FUNCTIONS IN SCHEMA <schema name> TO <Rivery User>;
GRANT USAGE on SCHEMA <schema name> TO <Rivery_User>

Replace "Rivery User" with the user previously created and "schema name" with the schema you grant Rivery permission to utilize. Repeat this process for every schema you wish for Rivery to access and utilize data.

Please Note:
Rivery conducts a review of the Access Control Lists (ACLs) to ensure that these permissions are replicated across all tables. The current Grant permissions found within the raw ACLs are structured as shown below:

rivery=arwdRxtDPA/rivery,group test_group=arwdxtD/rivery,user1_name=wx/rivery,user1_name=x/rivery

In this context, the string "arwdRxtDPA" denotes the permissions granted to the Rivery user. Each character within this string corresponds to a specific action, with 'a' representing INSERT, 'r' representing SELECT, 'w' representing UPDATE, and 'd' representing DELETE.
Rivery interprets this string, generates grant statements, and subsequently executes them individually on the respective tables.

Whitelist Our IPs

To whitelist Rivery's IPs, follow these steps:

Log in to your AWS account and search for the "Amazon Redshift" service in the AWS Console.
Access the "clusters" section in the left sidebar.
Select the cluster you've created.
Navigate to the "properties" tab and look for "VPC security group" under "Network and security settings." Click on the Security Group ID.
In the "Inbound rules" tab, click "Edit inbound rules."
Add a rule of type "Redshift" and specify the required source IP addresses as outlined in the documentation.
Save the rules to whitelist the specified IPs.

Connection Procedure

Type in the Connection Name.
Enter your Host name.
Fill in your Port.
Enter your Username and Password.
In the File Zone Connection section, you can whether to use the Custom File Zone that was previously established, and then select the associated S3 Bucket.
Use the Test Connection function to see if your connection is up to the task.
If the connection succeeded, you can now use this connection in Rivery.

SSL Options (Optional)

Key File

To establish an SSL connection and authenticate your Redshift user, you have the option to use a key file (.pem).

SSH (Optional)

Secure Shell Protocol (SSH) is used to provide secure and encrypted access to internal databases from Rivery servers in order to retrieve data.

Configure your SSH Tunnel server in your network to use SSH Tunneling via Rivery. This server will be able to connect to your database server via the internal IP or address, and will accept SSH connections from Rivery IPs. Rivery uses a key file to connect to the SSH server.

Was this article helpful?

What's Next

Amazon Redshift Walkthrough

Table of contents

Introduction
Prerequisite
Connection Procedure