Configuring Amazon Redshift as a Target
  • 6 Minutes to read
  • Dark
    Light
  • PDF

Configuring Amazon Redshift as a Target

  • Dark
    Light
  • PDF

Article summary

Introduction

This guide will walk you through the steps of setting up your first Amazon Redshift cluster and acquiring the essentials needed for integration with Rivery.

Important Note:

This connector is not compatible with Redshift Serverless.

Prerequisites

Before using this guide, ensure the following prerequisites are met:

Select a Source

Setting up a Target requires first selecting a Source and creating a connection to it.

Amazon Redshift Connection

Ensure that you have established a connection with Amazon Redshift as your Target.


Configuration Process

  1. Click the curved arrow next to Schema on the right side of the row. After the refresh is complete, click the row and choose the Schema where the data will be stored.

  2. Enter the Table Name.

  3. Select a Distribution Method (Note that this option is available only when you select Custom Report, Custom Query or Legact River as the River Mode.):

    • All - Involves distributing a complete copy of the table to every node in the cluster. Best suited for tables that are rarely updated or are relatively static (slow moving). It's not typically beneficial for small tables, as the cost of redistribution during queries is low.
    • Even - Involves the leader node distributing data rows across slices in a round-robin manner, without considering the values in any specific column. This method is ideal for tables that are not involved in joins. It's also a suitable choice when neither KEY nor ALL distribution methods are clearly advantageous.
    • Key - Rows are allocated based on the values in a designated column. The leader node places rows with matching values in the same node slice. Particularly effective when distributing a pair of tables on their joining keys. This ensures that rows with matching values from the common columns are stored together physically, facilitating efficient join operations.
      It's important to remember that when using the Key Distribution Method, you must choose a single key column to perform the slicing.

    image.png

    Please Note:
    When using Multi-Tables or Predefined Reports as your River Mode, you have the option to choose a Distribution Method. This can be done in the Table Settings, which is accessible by clicking on a particular table in the Schema tab.
    image.png

  4. Set the Loading Mode.
    Please Note:
    Amazon Redshift offers the flexibility to choose different Merge methods. For more information about these options, please refer to the Amazon Redshift Upsert-Merge Loading Mode Options documentation.

image.png

  1. In the Additional Options menu, the following options are available:
  • Truncate Columns - is designed to handle instances where the length of an array exceeds the maximum VARCHAR length allowed in Redshift. Since Redshift's array type has limited flexibility and can change between different data loads, Rivery ensures compatibility by converting arrays into VARCHAR(max) type. However, if an array surpasses the maximum VARCHAR length, the Truncate Columns option, found under Advanced Options in the Target tab, can be used to truncate the array data to fit within Redshift's maximum size constraints.
  • Compression Update - updates the column compression in the target table during data loading. This action occurs only if the target table has not been previously created.
  • Keep Schema-Binding Views - ensures that all schema-binding views remain intact when employing upsert-merge or overwrite methods. If this option is not selected, any schema-binding views that depend on the target table will be dropped.
  • Add Rivery Metadata - designed to enhance the Target table by automatically including three columns: Rivery_last_update, Rivery_river_id, and Rivery_run_id. This feature also provides the flexibility to incorporate more metadata fields by utilizing expressions.
    Please Note:
    When the Source is in Multi Table mode, this option becomes available.

image.png

5.If you have configured a Custom File Zone, select a Bucket and specify a path where your data will be stored. Additionally, establish a timeframe for period partitioning within a FileZone folder.
Note: You can instruct Rivery to divide the data according to the data insertion day, the Day/Hour, or the Day/Hour/Minute. This means that Rivery will produce data files from your sources under folders that correspond to the correct partition you've selected.
6. Any Source to Target River can now send data to your Amazon Redshift Bucket.

image.png


Type Mapping

We match Redshift data types to Rivery data types as we extract your data.

The following table shows the mapping of Redshift data types to Rivery-compatible types:

Amazon Redshift TypeRivery Type
INTERVALTIMESTAMP
DOUBLE_PRECISIONFLOAT
FLOATFLOAT
CHARSTRING
DATEDATE
VARCHARSTRING
TIMESTAMPTIMESTAMP
ENUMSTRING
BOOLEANBOOLEAN
ARRAYSTRING
arraySTRING
BYTEASTRING
BITBOOLEAN
REALFLOAT
BIGINTINTEGER
TEXTSTRING
SMALLINTINTEGER
NUMERICFLOAT
INTEGERINTEGER
CIDRSTRING
DropEnumTypeSTRING
INETSTRING
MACDDRSTRING
OIDSTRING
TSVECTORSTRING
UUIDSTRING
DATERANGEINTEGER
INT4RANGEINTEGER
INT8RANGEINTEGER
JSONSTRING
JSONBSTRING
NUMRANGEFLOAT
TSRANGEFLOAT
HSTORESTRING
SUPERJSON

Note:
If you want to use SUPER datatypes, simply change the mapping from Object to JSON and the SUPER datatype will be generated.

Scenarios of Data Type Conflicts

In instances where a datatype mismatch occurs, for example, when the datatype of an existing column in the source table is modified, the datatype of the corresponding column in the target table will be determined based on certain defined rules, thereby resolving the conflict. These rules ensure that the most comprehensive data type is chosen to maintain data integrity and avoid loss of information.

Examples of Data Type Conflicts

The table below outlines various scenarios of data type conflicts and the resolution rules. The Inferred Type column represents the chosen data type in the target table when there's a conflict.

Conflicting TypesInferred TypeExample Scenario
TIMESTAMP vs. DATETIMESTAMPIf a column in the source, initially of TIMESTAMP type, is changed to DATE, it will be converted into TIMESTAMP in the target.
SMALLINT vs. BOOLEANSMALLINTA conflict between SMALLINT and BOOLEAN types in the source column will lead to the target column adopting the SMALLINT type.
INTEGER vs. (SMALLINT / BOOLEAN)INTEGERIf an INTEGER is updated to SMALLINT or BOOLEAN in the source, it will be promoted to INTEGER in the target.
BIGINT vs. (INTEGER / SMALLINT / BOOLEAN)BIGINTIf a column of BIGINT type is changed to SMALLINT, BOOLEAN, or INTEGER in the source, it will be elevated to BIGINT in the target.
DECIMAL vs. (INTEGER / BIGINT / SMALLINT / BOOLEAN)DECIMALA column that is INTEGER or DECIMAL in the source will be transformed into DECIMAL.
DOUBLE PRECISION vs. (INTEGER / BIGINT / SMALLINT / BOOLEAN)DOUBLE PRECISIONIf there is a conflict between DOUBLE PRECISION and SMALLINT in the source column, the target column will adopt DOUBLE PRECISION.
REAL vs. (DOUBLE PRECISION / DECIMAL / INTEGER / BIGINT / SMALLINT / BOOLEAN)REALIf a column of DOUBLE PRECISION type is altered to REAL type in the source, it will remain as REAL in the target.
FLOAT vs. (INTEGER / BIGINT / SMALLINT / DECIMAL / REAL / DOUBLE PRECISION / BOOLEAN)FLOATA source column initially of FLOAT type that is updated to INTEGER will remain as FLOAT in the target.
VARCHAR vs. ALL DATATYPESVARCHARIf a column in the source was VARCHAR type and has been updated to INTEGER type, the target column will still retain the VARCHAR type.
OBJECT vs. ALL DATATYPESOBJECTIf a source column is initially set as OBJECT type and later changed to DATE or INTEGER, it will still be treated as OBJECT type in the target.

Please Note:

VARCHAR and OBJECT have higher priority and will override other data types in conflicts.


Was this article helpful?