PostgreSQL as Target

Overview


Unlike analytics data warehouses, PostgreSQL is in most cases used as an operational database. This means the data in PostgreSQL is stored row-based (as it is in every operational RDBMS) rather than column-based. This has its own advantages and disadvantages, especially when aggregating over massive amounts of data. However, running as a row-based database makes better use of key columns and indexes, along with JOIN clauses.
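For illustration, here is how a row-based layout rewards key columns and indexes on JOINs. This is a minimal sketch; the table, column, and index names are hypothetical and not part of Rivery's output:

-- Hypothetical tables; an index on the join key lets Postgres avoid full scans.
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        VARCHAR(100)
);

CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers (customer_id),
    amount      NUMERIC
);

CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- The JOIN below can use idx_orders_customer_id instead of scanning orders:
SELECT c.name, SUM(o.amount) AS total_amount
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.name;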


Create PostgreSQL as a Target in a River

In your Data Source to Target River, select RDS/Aurora PostgreSQL in the Target step.

From here, you can input the Schema and Table to write into, with options to load the data using Upsert-Merge, Append, or Overwrite.

Loading Modes

Overwrite - This will TRUNCATE the existing records in the Target Table and INSERT the new records.
Append - This will INSERT INTO the Target Table (matching on columns), appending the new records while keeping all existing records.
Upsert-Merge - Contains two merge methods (see the sketch after this list):

  • Delete-Insert - DELETE and then INSERT INTO the table by the keys defined in the column mapping. This mode deletes existing rows (when keys match between the new and existing data) and then inserts the new data. This merge method does not preserve duplicates; it creates an index for better performance and is recommended for ongoing merge runs.
  • Insert on Conflict - Performs an upsert using INSERT INTO with an ON CONFLICT clause, based on the primary keys selected in the table mapping.
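As a rough sketch of what the two merge methods translate to in SQL (the exact statements Rivery issues are internal; the table and key names below are hypothetical):

-- Delete-Insert: delete matching rows by key, then insert the new batch.
DELETE FROM target_table t
USING tmp_staging s
WHERE t.id = s.id;

INSERT INTO target_table (id, name, amount)
SELECT id, name, amount
FROM tmp_staging;

-- Insert on Conflict: a single-statement upsert on the primary key.
INSERT INTO target_table (id, name, amount)
SELECT id, name, amount
FROM tmp_staging
ON CONFLICT (id) DO UPDATE
SET name   = EXCLUDED.name,
    amount = EXCLUDED.amount;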

Loading Unstructured Data into a PostgreSQL Table

Postgres has had powerful JSON extraction functions since version 9.4. To avoid hitting limitations, and to support dynamic loading of data from unstructured sources (especially sources with dynamic keys, such as MongoDB), any unstructured column in the mapping is kept as a json data type in the target table. This means the table will contain "structured" data types, like varchar, integer, or date, alongside unstructured data represented as JSON, such as ARRAYs and RECORDs (objects).


To query JSON columns in the target table, Postgres provides a vast number of JSON functions.

Here is an example of handling JSON data in Postgres:

 

-- The actions table has an action_calc json column with the following value:
-- {"name": "click", "amount": 4}

SELECT action_calc -> 'name' AS action_name
FROM actions;

-- Result: "click" (returned as a json value)
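Since -> returns a json value, the ->> operator is often more convenient when a plain text or numeric result is needed. A minimal follow-up on the same (hypothetical) actions table:

-- ->> extracts the value as text, which can then be cast to other types:
SELECT action_calc ->> 'name'               AS action_name,  -- 'click' as text
       (action_calc ->> 'amount')::INTEGER  AS amount        -- 4 as an integer
FROM actions;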

How does Rivery handle data loads and table metadata?

Rivery manages table metadata updates alongside the data loading itself.

Here is the complete flow of the loading process into the target database:


Staging Table Loading


  1. Creating a table with a tmp_ prefix and the correct metadata.
  2. Loading the data into the tmp_ table using aws_s3.table_import_from_s3 (for RDS/Aurora) or a COPY statement (for other deployments).
  3. Flattening the data (if the source contains json types) using JSON functions (as above) and casting with the <column>::<type> syntax, as sketched below.
  4. Auto-scaling integer columns to the right integer data type based on their data length (SMALLINT -> INTEGER -> BIGINT -> NUMERIC).
  5. Dropping the old tmp_ table with the json_data field (if applicable).
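As a sketch of the flattening and casting step (step 3 above), assuming a tmp_ staging table with a single json_data column; all names here are hypothetical:

-- Flatten the json_data column into typed columns using ->> and ::<type> casts:
CREATE TABLE tmp_actions_flat AS
SELECT (json_data ->> 'name')::VARCHAR    AS name,
       (json_data ->> 'amount')::BIGINT   AS amount,
       (json_data ->> 'created')::DATE    AS created,
       json_data -> 'tags'                AS tags    -- arrays/objects stay as json
FROM tmp_actions;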

Load from STG Table to Target Table

In this stage, all of the following steps are performed in a single transaction (relying on Postgres's ability to run DDL and DML inside one transaction); a sketch of such a transaction follows the list:


  1. Defining and updating any metadata changes on the target table, excluding dropping columns.
  2. Beginning a transaction with a time limit:
    • If the target table does not exist, create it using a CREATE TABLE AS SELECT clause.
    • Add/drop/alter columns in the target table.
    • Create any primary key/index needed for data uniqueness (in Merge - Insert on Conflict) or for target merge performance (in Merge - Delete-Insert).
    • Keep any key the user set in the Overwrite loading mode with the "purge keys" option on.
    • Load the data into the target table using the selected loading mode (as above).
  3. Committing the transaction.
  4. Dropping the tmp_ tables used in the process.
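A rough illustration of such a transaction (the actual statements Rivery runs are internal; the names and the choice of Insert on Conflict mode below are assumptions for the sketch):

BEGIN;

-- Create the target table from the staging table if it does not exist yet:
CREATE TABLE IF NOT EXISTS target_table AS
SELECT * FROM tmp_staging WHERE 1 = 0;   -- copies structure only, no rows

-- Apply metadata changes, e.g. a column that newly appeared in the source:
ALTER TABLE target_table ADD COLUMN IF NOT EXISTS amount BIGINT;

-- Ensure the unique index needed by ON CONFLICT exists:
CREATE UNIQUE INDEX IF NOT EXISTS target_table_id_uidx ON target_table (id);

-- Load using the selected loading mode, here Insert on Conflict:
INSERT INTO target_table
SELECT * FROM tmp_staging
ON CONFLICT (id) DO UPDATE SET amount = EXCLUDED.amount;

COMMIT;

DROP TABLE IF EXISTS tmp_staging;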


Considerations and Best Practices in Postgres

Case Sensitivity in Postgres


TL;DR - Rivery relies on Postgres's default lower-case folding of identifiers.

Rivery's mechanism assumes the default case handling in Postgres, which folds unquoted identifiers to lower case. This lower-case assumption is the basis for all creation, updating, and management of schemas, tables, columns, views, etc. through Rivery. Please make sure your metadata doesn't contain the same name in two different cases.
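A minimal illustration of the folding behavior (table names are hypothetical):

-- Unquoted identifiers are folded to lower case:
CREATE TABLE MyEvents (id INTEGER);    -- actually creates the table "myevents"
SELECT * FROM MYEVENTS;                -- works: also folded to "myevents"

-- Quoted identifiers preserve case and refer to a different object:
CREATE TABLE "MyEvents" (id INTEGER);  -- creates a second, distinct table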




