PostgreSQL as a Target
Introduction
Unlike analytics data warehouses, PostgreSQL is primarily used as an operational database. It stores data in a row-based format, like other operational RDBMSs, rather than the column-based format typical of analytics warehouses. This row-based structure carries trade-offs, particularly when processing large volumes of data, but it provides optimized performance for key-based lookups, indexes, and JOIN operations.
Establish a Connection
Follow our step-by-step guide to create a connection and find everything you need.
Steps to Configure PostgreSQL as Target
- In your Data Source to Target River, select the RDS/Aurora PostgreSQL option in the Target Step.
- Input the Schema and Table to write into.
- Choose the loading mode: Upsert-Merge, Append, or Overwrite.
Loading Modes
- Overwrite: TRUNCATE existing records in the Target Table and INSERT new records.
- Append: INSERT INTO the Target Table (matching columns by name), appending new records while retaining existing ones.
- Upsert-Merge: Offers two methods, sketched below:
- Delete-Insert: Deletes existing rows (when keys match between new and existing data) and inserts new data. This method avoids duplication, creates an index for better performance, and is recommended for ongoing merge runs.
- Insert on Conflict: Uses INSERT INTO and ON CONFLICT clauses based on primary keys selected in the table mapping.
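For illustration, here is a minimal SQL sketch of the two Upsert-Merge methods. The table names target_table and tmp_target_table and the key column id are hypothetical, not Rivery's actual generated names:
-- Delete-Insert: delete rows whose keys match the staging data, then insert.
DELETE FROM target_table t
USING tmp_target_table s
WHERE t.id = s.id;
INSERT INTO target_table
SELECT * FROM tmp_target_table;
-- Insert on Conflict: requires a primary key or unique index on the key column.
INSERT INTO target_table (id, amount)
SELECT id, amount
FROM tmp_target_table
ON CONFLICT (id) DO UPDATE SET amount = EXCLUDED.amount;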
Loading Unstructured Data into PostgreSQL
PostgreSQL has provided powerful JSON extraction functions since version 9.4. To manage dynamic loading from unstructured data sources (such as MongoDB), any unstructured column in the mapping is kept as a json data type in the target table. The table will thus contain both structured data types (e.g., varchar, integer, date) and unstructured json data (e.g., arrays and nested records).
Example of handling JSON data in PostgreSQL:
-- The actions table has an action_calc json column with the following value:
-- {"name": "click", "amount": 4}
SELECT action_calc ->> 'name' AS action_name
FROM actions;
-- Result: 'click'
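Note that the -> operator returns a json value, while ->> returns text; ->> is the one to use when casting extracted fields to structured types. A small example against the same hypothetical actions table:
-- Extract the numeric field as text and cast it to an integer:
SELECT (action_calc ->> 'amount')::integer AS amount
FROM actions;
-- Result: 4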
Rivery's Data Load and Table Metadata Handling
Rivery manages table metadata updates along with data loading. Below is the complete process for loading data in Rivery:
Staging Table Loading
- Create a staging table with a tmp_ prefix and the correct metadata.
- Load data into the tmp_ table using aws_s3.table_import_from_s3 (for RDS/Aurora) or the COPY statement (for others).
- Flatten the data (if the source contains JSON types) using JSON functions and casting with the :: syntax (see the sketch after this list).
- Auto-scale integer columns to the appropriate data type based on their data length (SMALLINT -> INTEGER -> BIGINT -> NUMERIC).
- Drop the old tmp_ table with the json_data field (if applicable).
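A minimal sketch of this staging flow, assuming a hypothetical staging table tmp_actions with a single json_data column, an S3 file already exported by the river, and an IAM role granting S3 access (Rivery's actual generated names and options may differ):
-- 1. Load the raw file into the staging table (RDS/Aurora):
SELECT aws_s3.table_import_from_s3(
  'tmp_actions', 'json_data', '(format csv)',
  aws_commons.create_s3_uri('my-bucket', 'path/to/file.csv', 'us-east-1')
);
-- 2. Flatten the JSON into typed columns using :: casting:
CREATE TABLE tmp_actions_flat AS
SELECT (json_data ->> 'name')::varchar    AS name,
       (json_data ->> 'amount')::integer  AS amount
FROM tmp_actions;
-- 3. Auto-scale an integer column whose values outgrew its type:
ALTER TABLE tmp_actions_flat ALTER COLUMN amount TYPE bigint;
-- 4. Drop the staging table that held the raw json_data field:
DROP TABLE tmp_actions;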
Load from Staging Table to Target Table
The following steps are performed in a single transaction (leveraging PostgreSQL's ability to run DDL and DML inside one transaction); a sketch follows this list:
- Define and update any metadata changes on the target table, excluding dropping columns.
- Begin transaction with a time limit.
- If the target table does not exist, create it using a CREATE TABLE ... AS SELECT statement.
- Add/Drop/Alter columns in the target table.
- Create any primary key/index needed for data uniqueness (Upsert-Merge: Insert on Conflict) or for target merge performance (Upsert-Merge: Delete-Insert).
- In the Overwrite loading mode, retain any key set by the user when the “purge keys” option is enabled.
- Load data into the target table using the specified loading mode.
- Commit the transaction.
- Drop the temporary tables used in the process.
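A minimal sketch of this transaction, again with the hypothetical names target_table and tmp_target_table, a key column id, and an assumed 60-second limit (the actual statements Rivery issues depend on the mapping and loading mode):
BEGIN;
SET LOCAL statement_timeout = '60s';
-- Create the target table on the first run:
CREATE TABLE IF NOT EXISTS target_table AS
SELECT * FROM tmp_target_table WHERE false;
-- Apply metadata changes, e.g. a newly mapped column:
ALTER TABLE target_table ADD COLUMN IF NOT EXISTS amount integer;
-- Key/index for uniqueness (Insert on Conflict) or merge performance (Delete-Insert):
CREATE UNIQUE INDEX IF NOT EXISTS target_table_id_idx ON target_table (id);
-- Load according to the chosen mode (Insert on Conflict shown here):
INSERT INTO target_table (id, amount)
SELECT id, amount FROM tmp_target_table
ON CONFLICT (id) DO UPDATE SET amount = EXCLUDED.amount;
COMMIT;
-- Clean up the staging table:
DROP TABLE tmp_target_table;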
Considerations and Best Practices in PostgreSQL
Case Sensitivity
Rivery relies on PostgreSQL's default behavior of folding unquoted identifiers to lower case. This case handling is fundamental to how Rivery creates, updates, and manages schemas, tables, columns, views, and other objects. Ensure your metadata does not contain names that differ only by case.
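For example, PostgreSQL folds unquoted identifiers to lower case, while quoted identifiers preserve their case and become distinct objects:
-- Unquoted identifiers are folded to lower case:
CREATE TABLE Actions (id integer);    -- creates a table named "actions"
SELECT * FROM ACTIONS;                -- also resolves to "actions"
-- A quoted identifier preserves case and creates a separate table:
CREATE TABLE "Actions" (id integer);  -- "Actions" now coexists with "actions"
Avoid quoted, mixed-case names in the target schema so that Rivery's lower-case handling resolves objects unambiguously.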