CDC 'Point in Time' Position

Introduction

In this document, we explore the mechanics of Change Data Capture (CDC), a mechanism that monitors source database logs and captures any modifications made to the source data. We also cover the CDC Point in Time Position feature, which helps users understand the operational details of a River's streaming process. The feature also supports data recovery and synchronization: using the position stored in the CDC log, users can locate and retrieve data from a particular point in history.

Prerequisite

Before proceeding with CDC Point in Time Position setup, ensure you have established a functioning CDC connection for your specific database. If you haven't done so, refer to the following documentation for CDC setup instructions:

Glossary

  • Initial Migration: The process of transferring historical data from the source database to the target data warehouse.
  • Streaming Process: The CDC-based River's active retrieval of changes from the source database log.
  • Table Status: The specific status associated with each selected table, with different states detailed in the Table Configuration Options Document.
  • Detected Tables: All tables identified during the stream enablement process with a 'waiting for migration' table status.

New River Setup Guide

After configuring the Source, schema, and Target settings in a CDC-based River, follow these steps to enable the streaming process:

  1. Activate the "Enable Stream" toggle located at the bottom of your screen.
  2. Rivery will prompt for the desired sync options:

image.png

Automated Sync

Automated Sync is the recommended choice for initial setup and low-touch stream management. The process is as follows:

  1. Enable the 'Enable Stream' UI toggle.

  2. Rivery establishes a CDC connector (sink) if the enablement process is successful.

  3. The CDC connector continuously fetches changes made to your database from the moment the CDC process is enabled.

  4. If a full migration is required, Rivery initiates a one-time migration process concurrently with the CDC connector establishment and CDC log-based changes retrieval. Note that the duration of the initial migration process depends on the size of your database's migrated tables.

  5. The historical data from the initial migration is stored in a managed/custom filezone as per the user's Target connection definition.

  6. Rivery replicates all historical data from the filezone to the user's DWH target table(s).

  7. After migration, all changes captured since the CDC connector was established, along with all future changes, are streamed to the user's DWH target table(s) based on the River schedule. If the user opts to skip the migration process, the first load and any subsequent changes will stream directly from the CDC connector to the target table(s).

image.png
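
The ordering described in the steps above (historical data first, then the changes captured since the connector was established) can be illustrated with a minimal Python sketch. This is not Rivery's implementation: `TargetTable`, `automated_sync`, and the shape of the change records are hypothetical names chosen for illustration, and the target table is modeled as an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class TargetTable:
    """Hypothetical stand-in for a DWH target table, keyed by primary key."""
    rows: dict = field(default_factory=dict)

    def apply_change(self, change: dict) -> None:
        # Assumed change shape: {"op": ..., "id": ..., "row": ...}
        if change["op"] == "delete":
            self.rows.pop(change["id"], None)
        else:  # "insert" or "update"
            self.rows[change["id"]] = change["row"]


def automated_sync(snapshot: dict, captured_changes: list,
                   skip_migration: bool = False) -> TargetTable:
    """Load the historical snapshot first (unless skipped), then replay the
    changes captured since the connector was established, in log order."""
    target = TargetTable()
    if not skip_migration:
        target.rows.update(snapshot)       # initial migration (steps 4-6)
    for change in captured_changes:        # streamed changes (step 7)
        target.apply_change(change)
    return target


snapshot = {1: {"name": "a"}, 2: {"name": "b"}}
changes = [
    {"op": "update", "id": 1, "row": {"name": "a2"}},
    {"op": "insert", "id": 3, "row": {"name": "c"}},
    {"op": "delete", "id": 2},
]
result = automated_sync(snapshot, changes)
print(result.rows)
```

Replaying the captured changes only after the snapshot is loaded is what keeps the target consistent: a change captured during the migration is applied on top of the migrated row rather than being overwritten by it.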

Reinitialize Sync

Reinitialize Sync is recommended in the event of a database failure, corrupted log, or other scenarios requiring a log resync. When selected, Rivery will point the River’s log position to the source database's current position and capture changes from this point onward.

Manual Sync

This option grants full control over the streaming process, particularly the River's log position. Use it when you want to start retrieving updates from the database log at a particular point, or when you want to restore data from a specific point within the River. When enabled, Rivery retrieves data from the user-provided log position once the River is executed, following the established schedule. Changes retrieved from the log are replicated to the target Data Warehouse (DWH) after the initial migration (or immediately if the migration process is skipped).

Please be aware that improper usage of this option can lead to data loss. Prior to utilizing this feature, please ensure that you genuinely intend to load data starting from a specific point in time and that no changes before the provided position need to be retrieved.
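
To make the data-loss risk concrete, here is a small Python sketch of how a manually chosen position splits a log into retrieved and skipped changes. It rests on several assumptions: log positions are modeled as plain integers (real positions are database-specific), the helper `changes_from_position` is hypothetical, and the entry at the chosen position is treated as included, which the text above does not specify.

```python
def changes_from_position(log: list, start_position: int) -> tuple:
    """Split a log into (retrieved, skipped) relative to a manual start position.

    Assumption: each entry is a (position, description) pair, and the entry
    AT start_position is included in the retrieved set.
    """
    retrieved = [entry for entry in log if entry[0] >= start_position]
    skipped = [entry for entry in log if entry[0] < start_position]
    return retrieved, skipped


# Illustrative log with made-up positions and descriptions.
log = [
    (100, "insert order 1"),
    (101, "update order 1"),
    (102, "insert order 2"),
    (103, "delete order 1"),
]

retrieved, skipped = changes_from_position(log, start_position=102)
print("retrieved:", retrieved)
print("skipped:  ", skipped)
```

In this sketch the entries at positions 100 and 101 are never streamed to the target, so they reach it only if the initial migration happens to include their effect.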

Follow these steps to manually configure the position:

  1. Activate the 'Enable Stream' UI toggle.

  2. Rivery will establish a CDC connector based on the user's manual configuration, provided the specified position exists.

  3. The CDC connector will continuously retrieve any database changes from the moment the CDC process is enabled.

  4. If the user opts for a full migration, Rivery will initiate a one-time migration process concurrently with the establishment of the CDC connector and the retrieval of changes. Please note that the initial migration process will extend the River's runtime, depending on the size of the tables being migrated.

  5. The historical data from the initial migration will be stored in a managed or custom filezone, as defined by the user's Target connection configuration.

  6. Rivery will replicate all historical data from the filezone to the user's designated Data Warehouse (DWH) target table(s).

  7. Following the completion of the migration process, any changes captured from the time of CDC connector establishment and all future changes will be streamed to the user's DWH target table(s) based on the River's schedule. If the user chooses to skip the migration process, the initial load and all future changes will be streamed directly from the CDC connector to the target table(s).

image.png


Existing River Setup Guide

Following the initial setup of the streaming process, every active River maintains a log position that advances as changes occur in the user's database.
To check the current log position, follow these steps:

  1. Go to the Source tab.
  2. Access the Advanced Options at the bottom of your screen.
  3. Select the "Check Stream Position" button:

image.png

The latest CDC log position will be displayed:

image.png

Please note that the "Check Stream Position" button is visible only after the first River run.


If the streaming process of an existing River is disabled, or you wish to change the River's position mode, choose one of the following options:

Automated Sync

As previously stated, this option is advisable for managing streams with minimal manual intervention. Once activated within an existing River, the Rivery CDC connector automatically retrieves updates starting from the most recent stream position. These updates will either be pushed immediately or deferred until after the initial migration process (if chosen).

Reinitialize Sync

This option is suggested for situations where a database failure, log corruption, or any other circumstance necessitates a log re-synchronization. When activated within an established river, Rivery will reset the log position of the existing river and initialize it by aligning the CDC connector's log position with the current position in the user's database. It will then capture changes from this point onward.

Any updates retrieved from the log will either be pushed immediately or deferred until after the initial migration process (if chosen).
Please note that improper use of this option can lead to data loss.

To reinitialize the synchronization process, follow these steps:

  1. Deactivate the 'Enable Stream' UI toggle.

  2. This action disables Rivery's CDC connector.

  3. The last known CDC position (where Rivery ceased to retrieve changes) is established as the most recent River position, denoted as 'X' in the diagram.

  4. Reactivate the 'Enable Stream' UI toggle.

  5. Rivery re-establishes the CDC connector, disregarding the current known log position ('X') and replacing it with the latest available log position from the user's database.

  6. The CDC connector continuously fetches any changes from the user's database starting from the moment of reactivation.

  7. If the user opts to execute a full migration, Rivery initiates a one-time migration process concurrently with CDC connector re-establishment and change retrieval. Please note that an initial migration process will extend the duration of your river run, depending on the size of your database's migrated tables.

  8. Historical data from the initial migration is stored in a managed or custom filezone, as per the user's Target connection definition.

  9. Rivery replicates all historical data from the filezone to the user's Data Warehouse (DWH) target table(s).

  10. After migration, all changes captured from the time of CDC connector re-establishment and all subsequent changes are streamed to the user's DWH target table(s) based on the River's schedule. If the user chooses to skip the migration process, the initial load and all future changes will be directly streamed from the CDC connector to the target table(s).

image.png

Manual Sync

This option grants full control over the streaming process, especially regarding the River's log position. Use it when you wish to retrieve database log changes from a specific starting point or restore the River's data from a particular point in time. When activated within an existing River, Rivery discards the current River log position and sets it to the log position provided by the user. Changes from this point will be pushed immediately or deferred until after the initial migration process (if chosen).

Please be aware that improper use of this option can result in data loss. Before utilizing it, ensure that you are proficient in obtaining the database log position.

To perform a manual synchronization, follow these steps:

  1. Disable the 'Enable Stream' UI toggle.
  2. The Rivery CDC connector (sink) will be deactivated.
  3. The last known CDC connector position (where Rivery ceased to retrieve changes) is recorded as the current River position, denoted as 'X' in the diagram.
  4. Enable the 'Enable Stream' UI toggle again.
  5. Rivery will re-establish the CDC connector, disregarding the current known log position ('X') and replacing it with the position specified by the user, represented as 'Y' in the diagram.
  6. The CDC connector will continuously retrieve any changes from your database from the moment the CDC process is re-enabled.
  7. If you opt to execute a full migration, Rivery will commence a one-time migration process in parallel with CDC connector re-establishment and change retrieval. Please note that an initial migration process will extend the duration of your river run, depending on the size of your database's migrated tables.
  8. The historical data from the initial migration will be stored in a managed or custom filezone in accordance with the user's Target connection definition.
  9. Rivery will replicate all historical data from the filezone to the user's Data Warehouse (DWH) target table(s).
  10. After migration, all changes captured from the time of CDC connector re-establishment and all subsequent changes will be streamed to the user's DWH target table(s) based on the River's schedule. If the user decides to skip the migration process, the initial load and any future changes will be directly streamed from the CDC connector to the target table(s).

image.png
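
The difference between the three modes comes down to which log position the re-established connector starts from. The following Python sketch summarizes that choice as described in this guide. It is an interpretation rather than Rivery's code: `SyncMode` and `resolve_start_position` are hypothetical names, positions are modeled as integers, and the fallback for a River with no stored position is an assumption based on the new River description.

```python
from enum import Enum
from typing import Optional


class SyncMode(Enum):
    AUTOMATED = "automated"
    REINITIALIZE = "reinitialize"
    MANUAL = "manual"


def resolve_start_position(mode: SyncMode,
                           stored_position: Optional[int],
                           source_current_position: int,
                           user_position: Optional[int] = None) -> int:
    """Return the log position the connector would start from.

    stored_position is the last known River position ('X'),
    user_position is the manually supplied position ('Y').
    """
    if mode is SyncMode.AUTOMATED:
        # Continue from the most recent stream position; assume a River
        # without one starts at the source's current position.
        if stored_position is not None:
            return stored_position
        return source_current_position
    if mode is SyncMode.REINITIALIZE:
        # Discard 'X' and align with the source database's current position.
        return source_current_position
    if mode is SyncMode.MANUAL:
        if user_position is None:
            raise ValueError("Manual Sync requires a user-provided position")
        # Discard 'X' and use 'Y' instead.
        return user_position
    raise ValueError(f"Unknown sync mode: {mode}")


# Example with made-up positions: X = 120, source currently at 150, Y = 90.
for mode in SyncMode:
    print(mode.value, resolve_start_position(mode, 120, 150, user_position=90))
```

With Reinitialize Sync in this example, changes between positions 120 and 150 are not streamed, which is why both Reinitialize Sync and Manual Sync carry a data-loss warning.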

