- 5 Minutes to read
PostgreSQL Log-Based Configuration
- 5 Minutes to read
- Make sure you're running PostgreSQL 10 or higher.
- Pgoutput plugin.
- Server and Database permissions are required to change server configuration (depending on the PostgreSQL cloud/on-prem provider).
Notes on PostgreSQL Server Architecture
- DDL changes are not supported by Logical Decoding. Rivery is unable to report DDL change events to the database as a result of this.
- Only the primary servers in the cluster can use logical decoding replication slots. It won't work with warm or hot standby replicas. Log fetching will stop and the River will fail if the primary server fails or becomes detached.
- Only UTF-8 character encoding databases are supported.
Configuring PostgreSQL on Amazon RDS / Aurora
Make sure to enable logical decoding for a PostgreSQL RDS instance following the PostgreSQL documentation.
AWS RDS PostgreSQL v10+ comes pre-configured with pgoutput and logical replication, so all you have to do is enable them as follows:
To get started, go to the AWS RDS console.
Click Services in the upper left corner, then RDS.
Under Parameter Groups, select Create Parameter Group.
Select the parameter group family that corresponds to your PostgreSQL server version.
On Aurora cluster, choose DB cluster parameter group as the parameter group type. On RDS, choose DB paramter group.
Give it a name (for example, Postgresql-rivery-cdc), and click Create.
Enter the parameter group you just created. We can now change and enable some configurations in the parameter group associated with the current server.
The following parameters must be modified:
rds.logical_replication = 1 rds.log_retention_period = 10080 # 7 days wal_sender_timeout = 0
Click Save Changes.
Navigate to the Databases section and select the CDC Postgres server you want, then click Modify.
In the DB parameter group section, select the parameter group created in step 3.
Select Continue, then Apply Immediately under Modification Scheduling.
You may need to reboot the writer db instnace in order to apply the new parameters values
- Validate - Execute the following query as the DB master user to ensure that the wal level parameter is set to logical:
Permissions for Replication Slots Configuration
The REPLICATION role must be granted permission to the Rivery user in order for the replication slot to be enabled.
If the server does not already have a REPLICATION role, you can create one with the following SQL command (
CREATE ROLE <replication_role_name> REPLICATION LOGIN;
After creating the role, create a new Rivery login (if one does not already exist for Rivery's Log-Based extraction):
CREATE USER rivery WITH <replication_role_name>
If the user for Rivery already exists, grant the user the new role:
ALTER USER rivery WITH <replication_role_name>;
Configure Replica Identity to Each Non-keyed or Unique Table
It's necessary to set the replication identity of the tables in order to ensure that all of the table's old values are retrieved from the Write-Ahead Log (WAL) in Postgresql during DELETE events. There are four different types of REPLICA IDENTITY in Postgres that control how events are written to the logical replication stream:
- DEFAULT - The previous values for a table's primary key columns are only present in UPDATE and DELETE events; in the case of UPDATE, only the primary columns with changed values are present.
- NOTHING - The UPDATE and DELETE events will not include any information about the previous value of any table column.
- FULL - The previous values of all the table's columns will be stored in the UPDATE and DELETE events.
- INDEX - UPDATE and DELETE events will contain the previous values of the columns contained in the index definition named index name; however, only the indexed columns with changed values will be present in the case of UPDATE.
Tables must be set to REPLICA IDENTITY FULL in cases of DELETE and UPDATE events. Any non-key column will be written to the WAL as a NULL value if the DELETE event is not performed. Run the following query to set the REPLICA IDENTITY to FULL:
ALTER TABLE <schema>.<table> REPLICA IDENTITY FULL;
Configuring Log Base in an On-Premise Database
Run the following command in your server to set up PostgreSQL Log Based in an on-premise database:
# REPLICATION wal_level = logical max_wal_senders = 4 max_sender_timeout = 0 max_replication_slots = 4
As Rivery creates replication slot and publication automatically in Postgresql database, you may want or need to create replication or publication dedicated to specific rivers.
Creating new publication
If wanted, you may create publication for certain tables in your database. Only changes from those tables will be replicated by Rivery. You must use CREATE privilages in order to preform this action. By default, the following publication will be created if no specific name provided:
publication_<account id>_<river id>_source.
Rivery uses in such cases the ability to create publication for ALL TABLES and uses the next command automatically on first run:
CREATE PUBLICATION publication_<account id>_<river id>_source FOR ALL TABLES;
If needed, you can run a query in your database to see the supported tables:
Log in to a PostgreSQL client as a super user.
Create publication for your tables.
CREATE PUBLICATION rivery_publication FOR TABLE movies, films, actors;
In order to remove or update the tables list under publication:
ALTER PUBLICATION rivery_publication ADD/DROP TABLE my_table;
optional: You may also choose to create a publication for specific operation of specific tables:
CREATE PUBLICATION rivery_insert_only FOR TABLE movies WITH (publish = 'INSERT');
- Verify your tables are on publication:
SELECT * FROM pg_publication_tables;
- After creating your publication, set its name in the river:
Custom Replication Slots
Rivery allows you to set a custom replication slot name, which is the name of the PostgreSQL logical decoding slot that was created for streaming changes from a specific plug-in for a specific database/schema.
If no replication slot is provided, a default one will be created:
Rivery_<account id>_<river id>_source
You can run a query in your database to see the replication slots and data on it:
select * from pg_replication_slots;
In order to create a new replication slot (You must create a new replication slot after you created a publication):
- Log in to a PostgreSQL client using super user credentials.
- Run the next command:
SELECT pg_create_logical_replication_slot('rivery_pgoutput', 'pgoutput');
- Set replication slot name in the river:
Connect to Rivery
Follow the PostgreSQL Connection documentation to connect to the Rivery console.