- 4 Minutes to read
- Print
- DarkLight
- PDF
MySQL CDC Configuration
- 4 Minutes to read
- Print
- DarkLight
- PDF
This is a step-by-step guide on how to configure MySQL CDC.
Prerequisites
- MySQL 5.6+ or MySQL 8.0+.
- MySQL server instance running.
- Permission to 'GRANT' and 'CREATE USER' in MySQL.
- A connection to the Master / Primary Database is necessary; Read Replicas are not supported.
Configuring CDC on a MySQL On-Prem Server
You'll need access to the server's configuration files if the MySQL database is hosted on an on-premise server.
Check to see if the server's 'binlog' is enabled.
Run the following SQL statement to see if the 'binlog' is enabled in the MySQL server:
SELECT variable_value as "BINARY LOGGING STATUS (log-bin) ::"
FROM information_schema.global_variables WHERE variable_name='log_bin';
If the outcome is 'OFF,' make the following changes to your MySQL configuration file (my.cnf):
- Set binlog_format to ROW
- Set binlog_row_image to FULL
- Set expire_logs_days to 10
- Set log_bin to mysql-bin
- Make sure your server has enough storage to keep your database 'binlog' for ten days.
Configuring CDC on Amazon RDS
If you're using Amazon RDS, configuring these properties can be done through the RDS Console.
How to Create a Parameter Group:
- Log in to the Amazon RDS console.
- Go to Parameter Groups and then Create Parameter Group.
- Create a parameter group to use with the database:
- Under Parameter group family, select the relevant MySQL version.
- Click Create after selecting a suitable Group Name and Description.
- Set the relevant parameters:
- Choose a new group from the drop-down menu.
- Click on Edit Parameters.
- Set the following relevant parameters in the parameter group:
- Set binlog_format to ROW.
b. Set binlog_row_image to Full.
- Set binlog_format to ROW.
- Make sure you have enabled automated backups with at least 1 day retention, as follows:
Set the Duration of Your 'binlog' Retention Period
Rivery suggests a 10-day 'binlog' retention period.
Due to out-of-sync scenarios, this is required to limit the amount of resync (migration) required from the database.
According to the Amazon RDS documentation, the default strategy is to remove the log as quickly as possible, so you'll need to increase the retention period from the database's default setting.
To check the current state of the bin log retention period, do the following:
1. Connect to your MySQL using Workbench.
2. Run the following command:
mysql> call mysql.rds_show_configuration;
+------------------------+-------+------------------------------------------------------------------------------------------------------+
| name | value | description |
+------------------------+-------+------------------------------------------------------------------------------------------------------+
| binlog retention hours | NULL | binlog retention hours specifies the duration in hours before binary logs are automatically deleted. |
+------------------------+-------+------------------------------------------------------------------------------------------------------+
1 row in set (0.06 sec)
Query OK, 0 rows affected (0.06 sec)
If the 'binlog retention hours' parameter is NULL or 0, the database will erase the log as quickly as possible.
Run the following command to increase this configuration value:
call mysql.rds_set_configuration('binlog retention hours', 168);
Increased binary log retention hours demands more disk space in your source Amazon RDS MySQL database since the number of log files saved at one time grows.
After you've established the retention period, keep an eye on the DB instance's storage usage to make sure the binary logs you want to keep aren't taking up too much space.
Create User for Replication
In order to get the 'binlog', create a replication user for Rivery or change an existing user with the following grants and permissions.
To create a user for Rivery, run the following command:
For MySQL versions prior to 8.0:
CREATE USER 'rivery'@'%' IDENTIFIED BY 'password';
For MySQL versions 8.0 and later:
CREATE USER 'rivery'@'localhost' IDENTIFIED WITH mysql_native_password BY '< password >';
Permissions for Version 2 (V2) Log-Based
Grant the user with the right permissions for Version 2 and set the necessary privileges.
Note:
The same privileges can be applied to an existing user.
Set the <database> to the appropriate database(s) on your server, or set it to * to all databases:
GRANT SELECT, SHOW DATABASES, REPLICATION SLAVE, RELOAD, REPLICATION CLIENT on *.* TO 'rivery'@'%';
To enable the permissions on the user, run the following command:
FLUSH PRIVILEGES;
Make sure you're connected to Rivery with the user you just configured.
Binlog vs. GTID
MySQL replication is a process that automatically copies data from one MySQL database server (the master) to one or more MySQL database servers (the slaves).
You can track these changes in MySQL in two ways:
- Binlog
- GTID
Binlog
The binary log is a collection of log files containing information about data changes made to a MySQL server instance. Starting the server with the--log-bin option enables the log.
MySQL 3.23.14 introduced the binary log. It contains all data-updating statements. Unless row-based logging is used, it also contains statements that could have potentially updated it (for example, aDELETE that matched no rows). Statements are saved as "Events" that describe the modifications. The binary log also includes information on how long each statement that updates data ended up taking.
Other metadata in the binary log include:
- Information about the server's current state that is required to correctly reproduce statements
- Error messages
- Metadata required for binary log maintenance (for example, Rotate Events).
GTID
Global transaction identifiers (GTIDs) are unique identifiers generated for committed MySQL transactions. GTIDs can be used to simplify troubleshooting of Binlog replication.
GTIDs are unique across all DB instances in a replication configuration. GTIDs make replication configuration easier because they eliminate the need to refer to log file positions. GTIDs also simplify tracking replicated transactions and determining whether the source instance and replicas are consistent.
GTIDs are unique identifiers for transactions that occur on a server within a cluster. Using GTIDs streamlines replication and makes it easier to determine whether the primary and replica servers are consistent.
When Should You Use Binlog or GTID?
When you have more than one instance (for example, cluster replica), use GTID.
Otherwise, Binlog should be used.