SQL Server CDC Walkthrough
  • 2 Minutes to read
  • Dark
    Light
  • PDF

SQL Server CDC Walkthrough

  • Dark
    Light
  • PDF

Article summary

How Does Change Data Capture Extraction Work?

SQL Server Change Data Capture (CDC) was created to simplify the extract, transform, and load processes. It also records data changes such as inserts, deletes, and updates, but it provides more details than SQL Server Change Tracking. The mechanism for capturing and the information captured are distinct.

The information is retrieved in Change Data Capture by querying the online transaction log on a regular basis. The procedure is asynchronous. Database performance is unaffected, and performance overhead is lower than with other solutions (e.g. using triggers).

SQL Server CDC requires no schema changes to existing tables, no columns for timestamps are added to the tracked (source) tables, and no triggers are created.
It collects data and stores it in tables known as change tables.

When you apply CDC to a table, SQL Server automatically creates a second table to track all DML (inserts, updates, and deletes) to the table. SQL Server internally tracks all DML to the base table by reading committed transactions from the transaction log, similar to how replication is implemented. This log reading occurs in the background and adds no additional work to the originating transaction.

Please Note:

  • A CDC table must have a Primary Key to be tracked by the CDC mechanism in SQL Server, as required by SQL Server.
  • Rivery does not support the @index_name option.

A 'Sequence' CDC Deployment

Discrepancies in transaction records can arise when two users simultaneously execute identical transactions, causing conflicts in the timestamp field.
Recognizing this challenge, Rivery has implemented a "sequence" Change Data Capture (CDC) mechanism to tackle this issue.

Rivery has enhanced each emitted record from the database by incorporating two extra metadata fields: '__transaction_id' and '__transaction_order'.

The '__transaction_id' field serves as a unique identifier for each transaction, ensuring that no two transactions share the same identifier. This uniqueness allows for precise identification and differentiation between transactions, mitigating conflicts arising from identical timestamps.

Furthermore, the '__transaction_order' field denotes the order in which the transactions were emitted from the database. By incorporating this field, the sequencing of transactions can be accurately maintained, enabling downstream systems such as Apache Kafka or AWS Kinesis to process and order transactions correctly.

The inclusion of these metadata fields guarantees that the ordering of transactions is preserved throughout the River. As a result, smooth and accurate transaction flows can be achieved, resolving the discrepancies that previously arose from transactions with identical timestamps.

The additional fields are depicted in this table:
image.png

For further details about Change Data Capture (CDC) Metadata Fields, please refer to our Database Overview document.

Architecture Diagram

image.png

  • Producers - The process of reading the changes made to the source database.
  • Kafka - Streaming service to ensure that changes are delivered securely, and credibly.
  • Consumers - The process of reading the Kafka Stream and saving the changes as files.
  • FileZone - The storage that holds the files that will be loaded into the target datastore.
  • DWH - The target Data Warehouse into which the changes will be loaded.


How to Enable Change Data Capture Extraction?

This is a console tour for using SQL Server CDC extraction mode. Please hover over the rippling dots and read the notes attached to follow through.

After you've established a connection, go to the Create tab in the main menu and do the following:


Was this article helpful?