PostgreSQL Log-Based Overview
  • 4 Minutes to read
  • Dark
    Light
  • PDF

PostgreSQL Log-Based Overview

  • Dark
    Light
  • PDF

What is Log-Based Extraction?

Rivery's Log-Based extraction method provides a real-time stream of any changes made to the databases and tables configured, eliminating the need to implement and maintain incremental fields or retrieve data via select queries. It also allows you to retrieve deleted rows and schema changes from the database.

How Does Log-Based Extraction Work?

The Change Data Capture (CDC) mechanism is based on a Write-Ahead Log (WAL) provided by the PostgreSQL database. The logical decoding feature of PostgreSQL is available in versions 10 and higher. In PostgreSQL, the logical decoding mechanism creates a consistent and easy-to-understand format in one place. The logical decoding mechanism in PostgreSQL, on the other hand, is based on converting the data written to the WAL using an output plugin that converts the WAL transaction rows into a dedicated format. To be able to consume data from the database, this output plugin must be installed on the PostgreSQL server and the logical decoding configuration must be enabled. See the PostgreSQL configuration docs for more information on setting up a PostgreSQL database for Rivery Log-Based extraction.

Rivery uses the output plugin to continuously pull new rows in the WAL in order to pull data using the CDC architecture. In order to get historical data from the database, Rivery cannot rely on the entire log history already existing prior to setting up a river. This is because PostgreSQL normally keeps track of the WAL and purges it after a certain amount of time.

Rivery uses the Overwrite loading mode to create a full snapshot (or migration) of the chosen table(s) in order to align the data and metadata as it was on the first run. After the migration is complete, Rivery takes the existing WAL records and performs an Upsert-merge to the target table(s), while continuing to fetch new records from the log as they are created.

Rivery's PostgreSQL connector reads WAL records and generates change events in the FileZone files for row-level INSERT, UPDATE, and DELETE commands. Each file represents a set of database operations performed over a period of time. The data from the log is continuously streamed into the FileZone path determined in the river (with timeframes of no more than 15 minutes) and pushed into the target by the river's scheduled frequency. This method saves the data first in the FileZone, and then it can be pushed into the target DWH at any time.

FileZone is covered in further detail in the Target documentation.

Architecture Diagram

How to Enable Log-Based Extraction?

After you've established a connection, go to the Source tab and do the following:

  1. Choose the Multi-Tables as the River mode.
  2. Select the Log Based as the extraction method.
  3. A brief reminder appears, encouraging you to check your connection and set up your Source and Target, which will happen next. Select 'Got It' to proceed.
  4. Choose Log Based Version from the Advanced Options drop-down menu.
  5. Set the name of the custom replication slot (optional).
  6. Set a custom publication name (optional).
  7. Turn the 'Enable Log' toggle to true at the bottom of the page.
  8.  A new window pops up, guiding you on the following steps. Select 'Target'.
  9.  You'll be directly sent to the Target tab, where you can choose your data target.
  10.  Define your Target Connection, then choose a Database and a Schema to load into.
  11. To make sure the operation is complete, click the Enable Log toggle once more.
  12.  To navigate to the Schema tab, click Schema.
  13.  Click 'Show Me My Schemas'.
  14.  A minimum of one Schema must be chosen. 
  15. To use CDC, the table must contain a key, which is defined as the Source Column - id.
  16.  Navigate to the 'Enable Log' toggle and select the number of hours you want to run the River, then click 'I'm Sure'.
  17.  Wait for the Initiation Log to complete its processing.
  18.  You're all set to run your log-based River now.
  19. Following the completion of the River Run, return to the Schema tab and check that the Table you selected is Streaming.



Log Configurations

Custom Replication Slot

Rivery allows you to set a custom replication slot name, which is the name of the PostgreSQL logical decoding slot that was created for streaming changes from a specific plug-in for a specific database/schema.

If none is provided, a default one will be created: Rivery_<account id>_<river id>_source.

You can run a query in your database to see the replication slots and data on it:

select * from pg_replication_slots;


Custom Publication Name

This is the name of the PostgreSQL publication that will be used to track changes in real-time. If this field is left blank, the following will be used as the default: publication_<account id>_<river id>_source

Note: 

  • You must make sure that a custom publication already exists before using it.
  • If you're using a custom publication, make sure it has permission to update the tables you want, or it'll only update data for the default tables.

You can run a query in your database to see the supported tables:

select * from pg_publication_tables;



Type Mapping

While extracting your data, we match PostgreSQL data types to Rivery data types.

The mapping of PostgreSQL data types to Rivery-compatible types is shown in the table below:

PostgreSQL TypeRivery Type
BOOLEANBOOLEAN
BYTEASTRING
TOAST-Not Supported-
CHARSTRING
VARCHARSTRING
TEXTSTRING
JSONJSON
JSONBJSON
DATESTRING
FLOAT4STRING
FLOAT8STRING
INT2INTEGER
INT4INTEGER
INT8INTEGER
INTERVALSTRING
NUMERICFLOAT
TIMESTRING
TIMESTAMPSTRING
TIMESTAMPTZSTRING
UUIDSTRING
XMLSTRING


Limitations

  • The 'TOAST' file format is not supported.
  • Any special character will be replaced with an underscore. As a result, if you wish to edit the table name, go to: 

         1. The 'Schema' tab

         2. Select a Table

         3. Click 'Table Settings'

         4. Choose 'Edit' to change the table name manually.




Was this article helpful?