- 6 Minutes to read
Configuring PostgreSQL on Amazon RDS / Aurora
- 6 Minutes to read
- Make sure you're running PostgreSQL 10 or higher.
- Pgoutput plugin.
- Server and Database permissions are required to change server configuration (depending on the PostgreSQL cloud/on-prem provider).
Notes on PostgreSQL Server Architecture
- DDL changes are not supported by Logical Decoding. Rivery is unable to report DDL change events to the database as a result of this.
- Only the primary servers in the cluster can use logical decoding replication slots. It won't work with warm or hot standby replicas. Log fetching will stop and the River will fail if the primary server fails or becomes detached.
- Only UTF-8 character encoding databases are supported.
PostgreSQL setup for Amazon RDS / Aurora
Make sure to enable logical decoding for a PostgreSQL RDS instance following the PostgreSQL documentation.
AWS RDS PostgreSQL v10+ comes pre-configured with pgoutput and logical replication, so all you have to do is enable them as follows:
To get started, go to the AWS RDS console.
Click Services in the upper left corner, then RDS.
Under Parameter Groups, select Create Parameter Group.
Select the parameter group family that corresponds to your PostgreSQL server version.
On Aurora cluster, choose DB cluster parameter group as the parameter group type. On RDS, choose DB paramter group.
Give it a name (for example, Postgresql-rivery-cdc), and click Create.
Enter the parameter group you just created. We can now change and enable some configurations in the parameter group associated with the current server.
The following parameters must be modified:
rds.logical_replication = 1 rds.log_retention_period = 10080 # 7 days wal_sender_timeout = 0
Click Save Changes.
Navigate to the Databases section and select the CDC Postgres server you want, then click Modify.
In the DB parameter group section, select the parameter group created in step 3.
Select Continue, then Apply Immediately under Modification Scheduling.
You may need to reboot the writer db instnace in order to apply the new parameters values
- Validate - Execute the following query as the DB master user to ensure that the wal level parameter is set to logical:
Configuring the RDS Superuser Role in Amazon RDS
The 'rds_superuser' is a special user in Amazon RDS for PostgreSQL that is used for maintenance operations on a database instance. The 'rds_superuser' role often used for administrative tasks such as creating and managing databases, users, and tables.
In this section, we will outline the steps to configure the rds_superuser role in Amazon RDS for PostgreSQL.
You will require the following before you begin:
- Access to an Amazon Aurora PostgreSQL database
- Access to a client that supports the PostgreSQL protocol, such as the 'psql' command-line tool.
- Set the current session's role to rds_superuser. This role grants the superuser privileges required to perform administrative tasks on the RDS instance.
To set the role, connect to the RDS instance using an administrative account and run the following command:
SET ROLE rds_superuser;
- Create a new role named replication_role with the REPLICATION LOGIN option. This role will allow the user to connect to the RDS instance and replicate data.
To create the role, run the following command:
CREATE ROLE replication_role WITH REPLICATION LOGIN;
- After creating the role, we need to create a user account named Rivery with a password and the REPLICATION option. This user will connect to the RDS instance and perform replication tasks.
To create the user, run the following command:
CREATE USER Rivery WITH PASSWORD 'my-secret-pw' REPLICATION;
- Finally, we will grant the replication_role to the Rivery user. This grants the necessary permissions for the user to perform replication tasks.
To grant the role, run the following command:
GRANT replication_role TO Rivery;
In Amazon Aurora PostgreSQL, the 'rds_superuser' role is a strong role that needs to be handled with caution.
Permissions for Replication Slots Configuration
To enable replication slots, the REPLICATION role must be granted permission to the Rivery user. If the server does not already have a REPLICATION role, create one with the following SQL command:
Please note that the <replication_role_name> must be substituted with a role name of your choice.
CREATE ROLE <replication_role_name> REPLICATION LOGIN;
Once the role has been created, execute the following SQL command to create a new login for Rivery's Log-Based extraction if it doesn't already exist:
CREATE USER rivery WITH <replication_role_name>;
In case the Rivery user already exists, use the following SQL command to assign the new role to the user:
ALTER USER rivery WITH <replication_role_name>;
Configure Replica Identity to Each Non-keyed or Unique Table
It's necessary to set the replication identity of the tables in order to ensure that all of the table's old values are retrieved from the Write-Ahead Log (WAL) in Postgresql during DELETE events. There are four different types of REPLICA IDENTITY in Postgres that control how events are written to the logical replication stream:
- DEFAULT - The previous values for a table's primary key columns are only present in UPDATE and DELETE events; in the case of UPDATE, only the primary columns with changed values are present.
- NOTHING - The UPDATE and DELETE events will not include any information about the previous value of any table column.
- FULL - The previous values of all the table's columns will be stored in the UPDATE and DELETE events.
- INDEX - UPDATE and DELETE events will contain the previous values of the columns contained in the index definition named index name; however, only the indexed columns with changed values will be present in the case of UPDATE.
Tables must be set to REPLICA IDENTITY FULL in cases of DELETE and UPDATE events. Any non-key column will be written to the WAL as a NULL value if the DELETE event is not performed. Run the following query to set the REPLICA IDENTITY to FULL:
ALTER TABLE <schema>.<table> REPLICA IDENTITY FULL;
Configuring Log Base in an On-Premise Database
Run the following command in your server to set up PostgreSQL Log Based in an on-premise database:
# REPLICATION wal_level = logical max_wal_senders = 4 max_sender_timeout = 0 max_replication_slots = 4
As Rivery creates replication slot and publication automatically in Postgresql database, you may want or need to create replication or publication dedicated to specific rivers.
Creating new publication
If wanted, you may create publication for certain tables in your database. Only changes from those tables will be replicated by Rivery. You must use CREATE privilages in order to preform this action. By default, the following publication will be created if no specific name provided:
publication_<account id>_<river id>_source.
Rivery uses in such cases the ability to create publication for ALL TABLES and uses the next command automatically on first run:
CREATE PUBLICATION publication_<account id>_<river id>_source FOR ALL TABLES;
If needed, you can run a query in your database to see the supported tables:
Log in to a PostgreSQL client as a super user.
Create publication for your tables.
CREATE PUBLICATION rivery_publication FOR TABLE movies, films, actors;
In order to remove or update the tables list under publication:
ALTER PUBLICATION rivery_publication ADD/DROP TABLE my_table;
optional: You may also choose to create a publication for specific operation of specific tables:
CREATE PUBLICATION rivery_insert_only FOR TABLE movies WITH (publish = 'INSERT');
- Verify your tables are on publication:
SELECT * FROM pg_publication_tables;
- After creating your publication, set its name in the river:
Custom Replication Slots
Rivery allows you to set a custom replication slot name, which is the name of the PostgreSQL logical decoding slot that was created for streaming changes from a specific plug-in for a specific database/schema.
If no replication slot is provided, a default one will be created:
Rivery_<account id>_<river id>_source
You can run a query in your database to see the replication slots and data on it:
select * from pg_replication_slots;
In order to create a new replication slot (You must create a new replication slot after you created a publication):
- Log in to a PostgreSQL client using super user credentials.
- Run the next command:
SELECT pg_create_logical_replication_slot('rivery_pgoutput', 'pgoutput');
- Set replication slot name in the river:
Connect to Rivery
Follow the PostgreSQL Connection documentation to connect to the Rivery console.