Oracle CDC Overview

Introduction

Rivery uses Oracle LogMiner to provide Change Data Capture (CDC) for Oracle databases. Rivery's own LogMiner-based solution captures and processes changes in the Oracle database reliably and precisely. Because CDC captures table inserts, updates, and deletes, applications can consume data changes in real time.

This document explains Oracle CDC, its benefits, and how to set it up in Rivery.

Oracle Change Data Capture (CDC)

What is CDC?

CDC, or Change Data Capture, is a technique for identifying and capturing data changes in database tables. Instead of querying an entire table to find modifications, CDC extracts only the changed data, which reduces resource usage and improves application performance.
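To make the idea concrete, here is a minimal Python sketch of the consuming side of CDC. It assumes a hypothetical change-event shape (`ChangeEvent` with `op`, `key`, and `row`), not Rivery's actual record format. The replica is updated from the captured inserts, updates, and deletes alone, without rescanning the source table.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ChangeEvent:
    # Hypothetical event shape for illustration only; not Rivery's record format.
    op: str                               # "insert", "update" or "delete"
    key: int                              # primary-key value of the changed row
    row: Optional[Dict[str, Any]] = None  # new row image (None for deletes)


def apply_changes(replica: Dict[int, Dict[str, Any]],
                  events: List[ChangeEvent]) -> Dict[int, Dict[str, Any]]:
    """Apply captured changes to a replica without rescanning the source table."""
    for event in events:
        if event.op in ("insert", "update"):
            replica[event.key] = dict(event.row)  # store a copy of the new row image
        elif event.op == "delete":
            replica.pop(event.key, None)          # ignore deletes for unknown keys
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return replica


# Example usage: three changes are applied; unchanged rows are never read.
replica = {1: {"name": "Ann"}, 2: {"name": "Bob"}, 3: {"name": "Cy"}}
events = [
    ChangeEvent("update", 2, {"name": "Bobby"}),
    ChangeEvent("delete", 3),
    ChangeEvent("insert", 4, {"name": "Dee"}),
]
print(apply_changes(replica, events))
# {1: {'name': 'Ann'}, 2: {'name': 'Bobby'}, 4: {'name': 'Dee'}}
```

The work done is proportional to the number of changes, not the size of the table, which is where the reduced load comes from.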

Benefits of CDC

  • Real-time data integration: CDC allows for the immediate capture and propagation of data changes, enabling real-time data warehousing and analytics.

  • Reduced load on the source database: By capturing only the changes, CDC minimizes the impact on the source database, improving overall system performance.

  • Reliable data synchronization: CDC ensures that the target system receives a consistent and accurate representation of data changes.

How CDC Works

The Rivery CDC solution continuously reads the database's redo log, which records every data modification. Oracle writes a redo entry for each change and a commit record when the transaction commits. Rivery extracts these entries through the LogMiner interface. When Archivelog mode is enabled, Oracle copies filled online redo logs to archived redo logs, which preserve the history of changes made to the data.
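The commit-driven flow described above can be pictured with a simplified Python model. The `RedoRecord` fields are loosely modeled on the SCN, XID, and OPERATION columns that LogMiner exposes in `V$LOGMNR_CONTENTS`; the function name and the buffering strategy are illustrative assumptions, not Rivery's implementation. Changes are held per transaction and released only when that transaction's commit record is read, and rolled-back work is dropped.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class RedoRecord(NamedTuple):
    # Simplified redo entry, loosely modeled on LogMiner's SCN, XID and OPERATION columns.
    scn: int
    xid: str
    operation: str  # "INSERT", "UPDATE", "DELETE", "COMMIT" or "ROLLBACK"
    data: str = ""


def committed_changes(records: List[RedoRecord]) -> List[Tuple[int, str, str, str]]:
    """Emit DML changes only for committed transactions, in commit order.

    Each emitted tuple is (commit_scn, xid, operation, data).
    """
    pending: Dict[str, List[RedoRecord]] = defaultdict(list)
    emitted: List[Tuple[int, str, str, str]] = []
    for rec in sorted(records, key=lambda r: r.scn):
        if rec.operation in ("INSERT", "UPDATE", "DELETE"):
            pending[rec.xid].append(rec)  # buffer until the transaction ends
        elif rec.operation == "COMMIT":
            for change in pending.pop(rec.xid, []):
                emitted.append((rec.scn, change.xid, change.operation, change.data))
        elif rec.operation == "ROLLBACK":
            pending.pop(rec.xid, None)    # discard rolled-back work
    return emitted


# Example usage: tx1 commits, tx2 rolls back.
records = [
    RedoRecord(100, "tx1", "INSERT", "orders id=1"),
    RedoRecord(101, "tx2", "UPDATE", "orders id=7"),
    RedoRecord(102, "tx1", "UPDATE", "orders id=1"),
    RedoRecord(103, "tx2", "ROLLBACK"),
    RedoRecord(104, "tx1", "COMMIT"),
]
print(committed_changes(records))
# [(104, 'tx1', 'INSERT', 'orders id=1'), (104, 'tx1', 'UPDATE', 'orders id=1')]
```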

CDC Point in Time Position Feature

The CDC "Point in Time" Position feature exposes the operational details of a River's streaming process, including its exact position in the CDC log. This position is essential for data recovery and synchronization: it lets you locate and retrieve data from a specific point in history. For additional information, refer to our documentation.
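As an illustration of why an exact position matters, the sketch below resumes a stream from a saved position. The position is modeled as a hypothetical `(scn, order)` pair because several changes can share one SCN; resuming on the SCN alone could skip or repeat the changes within it. This format is an assumption for illustration, not the format of Rivery's CDC log position.

```python
from typing import List, Optional, Tuple

# Hypothetical position format: (scn, order of the change within that SCN).
Position = Tuple[int, int]
# Hypothetical change shape: (scn, order, payload).
Change = Tuple[int, int, str]


def resume_from(changes: List[Change],
                saved: Optional[Position]) -> Tuple[List[str], Optional[Position]]:
    """Return the payloads after `saved`, plus the position to store next."""
    ordered = sorted(changes, key=lambda c: (c[0], c[1]))
    # Tuple comparison orders by SCN first, then by order within the SCN.
    todo = [c for c in ordered if saved is None or (c[0], c[1]) > saved]
    new_position = (todo[-1][0], todo[-1][1]) if todo else saved
    return [c[2] for c in todo], new_position


# Example usage: the River stopped after the first change at SCN 200.
changes = [(201, 0, "c"), (200, 0, "a"), (200, 1, "b")]
print(resume_from(changes, (200, 0)))  # (['b', 'c'], (201, 0))
```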

A 'Sequence' CDC Deployment

Discrepancies in transaction records can arise when two users execute transactions simultaneously: the records carry identical values in the timestamp field, so the timestamp alone cannot determine which change came first.
To address this, Rivery has implemented a "sequence" Change Data Capture (CDC) mechanism.

Rivery adds two metadata fields to each record emitted from the database: '__transaction_id' and '__transaction_order'.

The '__transaction_id' field serves as a unique identifier for each transaction, ensuring that no two transactions share the same identifier. This uniqueness allows for precise identification and differentiation between transactions, mitigating conflicts arising from identical timestamps.

The '__transaction_order' field records the order in which transactions were emitted from the database. This field preserves the sequence of transactions, so downstream systems such as Apache Kafka or Amazon Kinesis can process and order transactions correctly.

Together, these metadata fields preserve transaction order throughout the River, resolving the discrepancies that arise when transactions share identical timestamps.

The additional fields are depicted in this table:
[Table: CDC metadata fields '__transaction_id' and '__transaction_order']

For further details about Change Data Capture (CDC) Metadata Fields, please refer to our Database Overview document.
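A downstream consumer might use these fields roughly as follows. This Python sketch keeps the latest version of each row and breaks timestamp ties with `__transaction_order`. The `id` and `updated_at` field names are hypothetical, and the code assumes `__transaction_order` is a number that increases with emission order, as described above; check the Database Overview document for the exact field semantics.

```python
from typing import Any, Dict, List


def latest_per_key(records: List[Dict[str, Any]], key: str = "id") -> Dict[Any, Dict[str, Any]]:
    """Keep the most recent version of each row.

    Ranks records by (updated_at, __transaction_order): when two records share a
    timestamp, the higher __transaction_order wins. Assumes both values are
    comparable and that __transaction_order increases with emission order.
    """
    latest: Dict[Any, Dict[str, Any]] = {}
    for rec in records:
        rank = (rec["updated_at"], rec["__transaction_order"])
        current = latest.get(rec[key])
        if current is None or rank > (current["updated_at"], current["__transaction_order"]):
            latest[rec[key]] = rec
    return latest


# Example usage: two transactions update row 1 with the same timestamp.
records = [
    {"id": 1, "status": "paid", "updated_at": "2024-05-01T10:00:00",
     "__transaction_id": "tx-b", "__transaction_order": 2},
    {"id": 1, "status": "new", "updated_at": "2024-05-01T10:00:00",
     "__transaction_id": "tx-a", "__transaction_order": 1},
]
print(latest_per_key(records)[1]["status"])  # paid
```

Arrival order in the list does not matter: the tie is resolved by `__transaction_order`, not by the order in which records happen to be read.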

Oracle CDB and PDB in Integration with CDC

In real-time integration with Change Data Capture (CDC), Oracle's Container Database (CDB) and Pluggable Database (PDB) architecture plays a central role. This section gives an overview of the CDB-PDB architecture and how it supports real-time data integration through Change Data Capture.

Oracle CDB-PDB Architecture

Container Database (CDB)

A Container Database is the top-level database that holds one or more Pluggable Databases, together with the root container (CDB$ROOT) and the seed (PDB$SEED). It provides centralized, shared infrastructure for managing database resources. For real-time data integration, this matters because all PDBs in a CDB share the CDB's redo logs, which gives Change Data Capture a single, unified source of changes.

Pluggable Database (PDB)

Pluggable Databases reside within the Container Database and operate as separate, fully functional databases. Each PDB has its own data dictionary, tablespaces, and schemas, enabling a multitenant architecture. Real-time integration with CDC relies on this isolation to capture and process changes at a granular, per-PDB level.

Integration with Change Data Capture (CDC)

The CDB-PDB architecture is well suited to CDC. Change Data Capture can be enabled for each PDB independently, according to its specific requirements. Because multiple PDBs operate independently, the architecture scales to capture and process changes across many databases, which makes it suitable for large-scale integration scenarios.
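Because all PDBs in a CDB write to the CDB's shared redo logs, per-PDB capture amounts to scoping one change stream by container. The Python sketch below illustrates that idea under stated assumptions: each change is a hypothetical `(container_name, change)` pair, the PDB names are made up, and `route_by_pdb` is not a Rivery or Oracle API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def route_by_pdb(events: Iterable[Tuple[str, str]],
                 enabled_pdbs: Set[str]) -> Dict[str, List[str]]:
    """Split one CDB-wide change stream into per-PDB streams.

    Each event is a hypothetical (container_name, change) pair. Changes from
    PDBs that have not enabled CDC are skipped; order within each PDB is kept.
    """
    streams: Dict[str, List[str]] = defaultdict(list)
    for container, change in events:
        if container in enabled_pdbs:
            streams[container].append(change)
    return dict(streams)


# Example usage: only SALES_PDB has CDC enabled.
events = [
    ("SALES_PDB", "ins orders 1"),
    ("HR_PDB", "upd emp 9"),
    ("SALES_PDB", "del orders 2"),
]
print(route_by_pdb(events, {"SALES_PDB"}))
# {'SALES_PDB': ['ins orders 1', 'del orders 2']}
```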

CDC Setup in Oracle CDB-PDB

In Oracle's multitenant architecture, reading redo logs for a Pluggable Database (PDB) requires a dedicated user for Rivery, which must be created by an administrator connected as 'sys' with the 'sysdba' role.
This user must be able to use the LogMiner API to read redo logs for the PDBs, which is how Rivery streams changes.

For setup guidance, see the Oracle CDC Configuration document. For more information about Container Databases (CDBs) and Pluggable Databases (PDBs), consult Oracle's documentation.

Working with Oracle CDC in Rivery

Rivery's Oracle CDC integration builds on the LogMiner capabilities of Oracle Database. It lets users set up data replication tasks that capture real-time changes from Oracle Database and propagate them to their chosen Targets.

To set up Oracle CDC in Rivery, follow these general steps:

  1. Oracle CDC Configuration: Enable Archivelog mode in your Oracle Database. CDC depends on this mode because it preserves filled redo logs as archived logs, keeping all data changes available for retrieval.

  2. Connect to Oracle Database: Establish a connection between Rivery and the Oracle Database containing the source data.

  3. Enable CDC: When you set up a River with the Oracle CDC extraction mode, you will:

     • Enable CDC in Oracle
     • Select a schema for CDC
     • Choose how often the River runs
     • Ensure that each table has a key, which CDC requires
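The prerequisites from these steps can be checked before a River is created. The Python sketch below assumes you have already queried `LOG_MODE` from `V$DATABASE` (which reports values such as `ARCHIVELOG` or `NOARCHIVELOG`) and collected each table's key columns into a dictionary; `cdc_readiness` is a hypothetical helper, not part of Rivery.

```python
from typing import Dict, List


def cdc_readiness(log_mode: str, table_keys: Dict[str, List[str]]) -> List[str]:
    """Return problems that would block CDC; an empty list means ready.

    log_mode: value of V$DATABASE.LOG_MODE, e.g. 'ARCHIVELOG' or 'NOARCHIVELOG'.
    table_keys: hypothetical mapping of table name -> key columns, gathered beforehand.
    """
    problems: List[str] = []
    if log_mode.upper() != "ARCHIVELOG":
        problems.append(f"database is in {log_mode} mode; enable ARCHIVELOG")
    for table, keys in sorted(table_keys.items()):
        if not keys:
            problems.append(f"table {table} has no key")
    return problems


# Example usage: archiving is off and one table lacks a key.
print(cdc_readiness("NOARCHIVELOG", {"ORDERS": ["ORDER_ID"], "AUDIT_LOG": []}))
# ['database is in NOARCHIVELOG mode; enable ARCHIVELOG', 'table AUDIT_LOG has no key']
```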

Current Stream Position in Your Database

To check the current stream position, run the following query on the database. It returns the database's current System Change Number (SCN):

SELECT CURRENT_SCN
FROM V$DATABASE;
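If you check the position from a script, the same query can be run through any Python DB-API 2.0 cursor, such as one from the python-oracledb driver. The `scn_gap` helper and the idea of comparing against a River's saved SCN are illustrative assumptions; an SCN difference counts change numbers, not elapsed time.

```python
from typing import Any

CURRENT_SCN_SQL = "SELECT CURRENT_SCN FROM V$DATABASE"


def current_scn(cursor: Any) -> int:
    """Run the position query on a DB-API 2.0 cursor and return the SCN."""
    cursor.execute(CURRENT_SCN_SQL)
    row = cursor.fetchone()
    return int(row[0])


def scn_gap(cursor: Any, river_scn: int) -> int:
    """Hypothetical helper: SCNs the database has advanced past a River's saved position."""
    return max(current_scn(cursor) - river_scn, 0)


# Example usage with python-oracledb (not executed here; connection details are placeholders):
# import oracledb
# with oracledb.connect(user="...", password="...", dsn="...") as conn:
#     print(current_scn(conn.cursor()))
```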
