- 7 Minutes to read
- Print
- DarkLight
- PDF
MySQL CDC Configuration
- 7 Minutes to read
- Print
- DarkLight
- PDF
This is a step-by-step guide on how to configure MySQL CDC.
Prerequisites
MySQL 5.6+ or MySQL 8.0+.
MySQL server instance running.
Permission to 'GRANT' and 'CREATE USER' in MySQL.
if you wish to use CDC from a Read Replica - GTID must be enabled.
Binlog vs. GTID
MySQL replication is a process that automatically copies data from one MySQL database server (the master) to one or more MySQL database servers (the slaves).
You can track these changes in MySQL in 2 ways:
Binlog
GTID
Binlog
The binary log is a collection of log files containing information about data changes made to a MySQL server instance. Starting the server with the --log-bin option enables the log.
MySQL 3.23.14 introduced the binary log. It contains all data-updating statements. Unless row-based logging is used, it also contains statements that could have potentially updated it (for example, a DELETE that matched no rows). Statements are saved as "Events" that describe the modifications. The binary log also includes information on how long each statement that updates data ended up taking.
Other metadata in the binary log include:
Information about the server's current state that is required to correctly reproduce statements
Error messages
Metadata required for binary log maintenance (for example, Rotate Events).
GTID
Global transaction identifiers (GTIDs) are unique identifiers generated for committed MySQL transactions. GTIDs can be used to simplify troubleshooting of Binlog replication.
GTIDs are unique across all DB instances in a replication configuration. GTIDs make replication configuration easier because they eliminate the need to refer to log file positions. GTIDs also simplify tracking replicated transactions and determining whether the source instance and replicas are consistent.
GTIDs are unique identifiers for transactions that occur on a server within a cluster. Using GTIDs streamlines replication and makes it easier to determine whether the primary and replica servers are consistent.
When Should You Use Binlog or GTID?
When you have more than one instance (for example, cluster replica), use GTID.
Otherwise, Binlog should be used.
Configuring Mysql CDC
Configuring CDC on a Self-Managed MySQL Server
You'll need access to the server's configuration files if the MySQL database is hosted on a self-managed MySQL server.
Check to see if the server's 'binlog' is enabled.
Run the following SQL statement to see if the 'binlog' is enabled in the MySQL server:
SELECT variable_value as "BINARY LOGGING STATUS (log-bin) ::";
FROM information_schema.global_variables WHERE variable_name='log_bin';
If the outcome is 'OFF,' make the following changes to your MySQL configuration file (my.cnf):
Set binlog_format to ROW
Set binlog_row_image to FULL
Set expire_logs_days to 10
Set log_bin to mysql-bin
Make sure your server has enough storage to keep your database 'binlog' for ten days.
Configuring CDC on Amazon RDS/Aurora MySQL
If you're using Amazon RDS, configuring these properties can be done through the RDS Console.
How to Create a Parameter Group:
Log in to the Amazon RDS console.
Go to Parameter Groups and then Create Parameter Group.
Create a parameter group to use with the database:
Under Parameter group family, select the relevant MySQL version.
Click Create after selecting a suitable Group Name and Description.
Set the relevant parameters:
Choose a new group from the drop-down menu.
Click on Edit Parameters.
Set the following relevant parameters in the parameter group:
Set binlog_format to ROW.
b. Set binlog_row_image to Full.
Make sure you have enabled automated backups with at least 1 day retention, as follows:
Set the Duration of Your 'binlog' Retention Period
Rivery suggests a 10-day 'binlog' retention period.
Due to out-of-sync scenarios, this is required to limit the amount of resync (migration) required from the database.
According to the Amazon RDS documentation, the default strategy is to remove the log as quickly as possible, so you'll need to increase the retention period from the database's default setting.
To check the current state of the bin log retention period, do the following:
1. Connect to your MySQL using Workbench.
2. Run the following command:
mysql> call mysql.rds_show_configuration;
+------------------------+-------+------------------------------------------------------------------------------------------------------+
| name | value | description |
+------------------------+-------+------------------------------------------------------------------------------------------------------+
| binlog retention hours | NULL | binlog retention hours specifies the duration in hours before binary logs are automatically deleted. |
+------------------------+-------+------------------------------------------------------------------------------------------------------+
1 row in set (0.06 sec)
Query OK, 0 rows affected (0.06 sec)
If the 'binlog retention hours' parameter is NULL or 0, the database will erase the log as quickly as possible.
Run the following command to increase this configuration value:
call mysql.rds_set_configuration('binlog retention hours', 168);
Caution:
Increased binary log retention hours demands more disk space in your source Amazon RDS MySQL database since the number of log files saved at one time grows.
After you've established the retention period, keep an eye on the DB instance's storage usage to make sure the binary logs you want to keep aren't taking up too much space.
Create User for Replication
Create a User for CDC Permissions
To allow CDC to read from MySQL, you must create a user with specific replication permissions. These permissions ensure the CDC process can read the binary log and track changes.
Log in to MySQL as an administrative user.
Create a new user (or modify an existing one) for CDC. Run the following SQL commands:
CREATE USER 'cdc_user'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE, REPLICATION CLIENT, SELECT ON *.* TO 'cdc_user'@'%';
This grants the user the necessary permissions to:
REPLICATION SLAVE: Read the binary logs on the replica.
REPLICATION CLIENT: Execute commands related to replication.
SELECT: Read data from the database tables being monitored by CDC.
Flush privileges to apply the changes:
FLUSH PRIVILEGES;
Configuring CDC from a MySQL Replica
This section details the steps required to configure Change Data Capture (CDC) from a MySQL Read Replica, specifically when using Global Transaction Identifiers (GTID) for tracking transactions. GTIDs help maintain transactional consistency and ensure that CDC can capture changes accurately and in the correct order.
Prerequisites
CDC must be configured on Mysql (Self-Managed/Amazon RDS)
Create a user for CDC replication.
GTID must be enabled to use CDC from a Read Replica.
Please Note:
To enable MySQL GTIDs for Amazon RDS / Aurora MySQL, refer to the Amazon documentation for instructions on enabling this feature in your Amazon Aurora MySQL database.ur content goes here
Why GTID Permissions Are Needed
GTID is a critical feature for ensuring that CDC reads data changes correctly and efficiently from a MySQL Read Replica. Here's why GTID is essential:
Transaction Consistency: GTIDs ensure that each transaction has a unique identifier. This allows CDC to capture and process the exact sequence of changes made to the database, avoiding issues like data loss or duplication.
Improved Failover and Resilience: In the event of a failure or restart of the CDC process, GTIDs enable the process to resume from the exact transaction it last processed, preventing data inconsistency during recovery.
Configuring GTID-Based CDC from a MySQL Replica
To set up CDC with GTID on a MySQL Read Replica, follow the steps below:
1. Verify GTID Configuration on the Replica
GTID-based replication must be enabled on your MySQL replica. Check the current GTID settings and ensure the following options are enabled:
gtid_mode
: This must be set toON
to enable GTID replication.enforce_gtid_consistency
: This should also be set toON
to ensure consistency of GTIDs across all transactions.
Run the following commands to verify these settings:
SHOW VARIABLES LIKE 'gtid_mode';
SHOW VARIABLES LIKE 'enforce_gtid_consistency';
If either of these settings is not enabled, modify the MySQL configuration (my.cnf
or my.ini
) and restart the replica server.
2. Create a User for CDC with GTID Permissions
To allow CDC to read from the MySQL Read Replica, you must create a user with specific replication permissions. These permissions ensure the CDC process can read the binary log and track changes via GTIDs.
Log in to MySQL as an administrative user.
Create a new user (or modify an existing one) for CDC. Run the following SQL commands:
CREATE USER 'cdc_user'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE, REPLICATION CLIENT, SELECT ON *.* TO 'cdc_user'@'%';
This grants the user the necessary permissions to:
REPLICATION SLAVE: Read the binary logs on the replica.
REPLICATION CLIENT: Execute commands related to replication.
SELECT: Read data from the database tables being monitored by CDC.
Flush privileges to apply the changes:
FLUSH PRIVILEGES;
3. Enable GTID-Based Replication
Ensure that GTID replication is enabled on the MySQL replica. The following settings must be configured:
gtid_mode:
ON
enforce_gtid_consistency:
ON
These settings can be adjusted in the MySQL configuration file (my.cnf
or my.ini
) and the server must be restarted for changes to take effect.
gtid_mode = ON
enforce_gtid_consistency = ON
4. Verify the CDC Process
Once the user is configured and GTID replication is enabled, monitor the CDC process to ensure that it is reading and capturing changes from the replica without issues.
Check the CDC logs to confirm that the CDC process is reading the binary logs in the correct sequence and processing transactions accurately.
To verify the GTID value, execute the following command on the MySQL Server:
PL/SQL
SHOW MASTER STATUS;
Here's an example of the expected output:
Additional Considerations
Performance: Enabling GTID-based CDC may affect performance, especially on high-throughput systems. It is recommended to test the configuration in a staging environment before applying it to production systems.
Storage: Storing binary logs for extended periods (as required by CDC) will consume additional disk space. Regularly monitor the disk usage to ensure that the system has sufficient resources.
Replication Consistency: Ensure the replica is properly configured and synced with the master to prevent replication lag or inconsistencies during CDC reads.