- 2 Minutes to read
SQL Server CDC Walkthrough
- 2 Minutes to read
How Does Change Data Capture Extraction Work?
SQL Server Change Data Capture (CDC) was created to simplify the extract, transform, and load processes. It also records data changes such as inserts, deletes, and updates, but it provides more details than SQL Server Change Tracking. The mechanism for capturing and the information captured are distinct.
The information is retrieved in Change Data Capture by querying the online transaction log on a regular basis. The procedure is asynchronous. Database performance is unaffected, and performance overhead is lower than with other solutions (e.g. using triggers).
SQL Server CDC requires no schema changes to existing tables, no columns for timestamps are added to the tracked (source) tables, and no triggers are created.
It collects data and stores it in tables known as change tables.
When you apply CDC to a table, SQL Server automatically creates a second table to track all DML (inserts, updates, and deletes) to the table. SQL Server internally tracks all DML to the base table by reading committed transactions from the transaction log, similar to how replication is implemented. This log reading occurs in the background and adds no additional work to the originating transaction.
A 'Sequence' CDC Deployment
Discrepancies in transaction records can arise when two users simultaneously execute identical transactions, causing conflicts in the timestamp field.
Recognizing this challenge, Rivery has implemented a "sequence" Change Data Capture (CDC) mechanism to tackle this issue.
Rivery has enhanced each emitted record from the database by incorporating two extra metadata fields: '__transaction_id' and '__transaction_order'.
The '__transaction_id' field serves as a unique identifier for each transaction, ensuring that no two transactions share the same identifier. This uniqueness allows for precise identification and differentiation between transactions, mitigating conflicts arising from identical timestamps.
Furthermore, the '__transaction_order' field denotes the order in which the transactions were emitted from the database. By incorporating this field, the sequencing of transactions can be accurately maintained, enabling downstream systems such as Apache Kafka or AWS Kinesis to process and order transactions correctly.
The inclusion of these metadata fields guarantees that the ordering of transactions is preserved throughout the River. As a result, smooth and accurate transaction flows can be achieved, resolving the discrepancies that previously arose from transactions with identical timestamps.
The additional fields are depicted in this table:
- Producers - The process of reading the changes made to the source database.
- Kafka - Streaming service to ensure that changes are delivered securely, and credibly.
- Consumers - The process of reading the Kafka Stream and saving the changes as files.
- FileZone - The storage that holds the files that will be loaded into the target datastore.
- DWH - The target Data Warehouse into which the changes will be loaded.
How to Enable Change Data Capture Extraction?
This is a console tour for using SQL Server CDC extraction mode. Please hover over the rippling dots and read the notes attached to follow through.
After you've established a connection, go to the Create tab in the main menu and do the following:
Version 2 is a newer version for using SQL Server Log-Based extraction mode; version 1 will be deprecated in the near future.