SQL Server Log-Based Overview
  • 25 Apr 2022
  • 2 Minutes to read
  • Dark
    Light
  • PDF

SQL Server Log-Based Overview

  • Dark
    Light
  • PDF

What is Log-Based Extraction?

Change Data Capture (or CDC), is an efficient and fast way to fetch data continuously from databases, based on the database transaction log. The Log-Based extraction method in Rivery provides a real-time stream of any change made on the databases and tables configured, which means there is no need for implementing and maintaining any incremental field or getting data by select queries, and also gives the ability to fetch deleted rows and schema changes from the database.

How is Log-Based Extraction configured in SQL Server?

The Microsoft SQL Server log connector is based on the logs provided by the change data capture feature provided by SQL Server Standard (since SQL Server 2016 SP1+) or Enterprise edition. SQL Server capture process monitors all databases and tables the user is interested in, and stores the changes into specifically created CDC tables. The SQL Server CDC tables are used and managed by built-in stored-procedures provided by Microsoft SQL Server.

Like all other databases, you need to specifically enable logging for Microsoft SQL Server to make sure the CDC process works. 

Note
SQL Server requires that  CDC be turned on for each table individually.

How does Log-Based Extraction work in Rivery?

In order to pull data using the Change Data Capture architecture, Rivery continuously pulls new rows in written in the CDC tables, using the output plugin. Rivery cannot rely on the entire log history already existing prior to setting up a river in order to get the historical data from the database. This is because SQL Server normally maintains and truncates the CDC tables after some period of time. 

In order to align the data and the metadata as it is the first run, Rivery makes a full snapshot (or migration) of the chosen table(s) using the Overwrite loading mode. After the migration ends successfully, Rivery takes the existing CDC records and makes an Upsert-merge to the target table(s), continuing to fetch new records from the log as they are created.

The Microsoft SQL Server connector in Rivery reads the CDC tables and produces change events for row-level INSERT, UPDATE, and DELETE commands into the FileZone files. Each file represents a group of operations made in the database in a specific timeframe. The data from the log is streamed constantly (by timeframes of maximum 15 minutes) into the FileZone path determined in the river and pushed into the target by the river's scheduled frequency. This approach means the data is saved, first and foremost, in the FileZone, and then can be pushed anytime into the target DWH.

For more information regarding the FileZone configuration in Rivery, you can refer to the target configuration documentation.

Architecture Diagram -

Limitations

For table names, CDC only allows the following characters: 'A-Z', 'a-z', '0-9', the underscore '_' character, and the hyphen '-' character.


Was this article helpful?