MongoDB Log-Based Overview
What is Log-Based Extraction?
Rivery's Log-Based extraction method provides a real-time stream of any changes made to the databases and tables configured, eliminating the need to implement and maintain incremental fields or retrieve data via select queries. It also allows you to retrieve schema changes from the database.
How Does Log-Based Extraction Work?
Rivery uses a Change Data Capture (CDC) architecture: it continuously pulls new change events from MongoDB Change Streams in order to retrieve data.
CDC is a fast and effective method of continuously fetching data from a database by reading its transaction log instead of querying the tables directly.
Rivery first uses the Overwrite loading mode to take a full snapshot (or migration) of the chosen table(s), so that the target's data and metadata match the source as of the first run. After the migration is complete, Rivery takes the 'Change Stream' records that already exist and performs an Upsert-Merge into the target table(s), while continuing to fetch new records from the log as they are created.
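The snapshot-then-merge sequence described above can be sketched in a few lines of Python. This is only an illustration, not Rivery's implementation: the function names, the in-memory `target` dictionary, and the simplified event shape (loosely modeled on MongoDB change events, with a `fullDocument` field) are all assumptions.

```python
from typing import Any, Dict, Iterable

Document = Dict[str, Any]


def load_snapshot(documents: Iterable[Document]) -> Dict[Any, Document]:
    """Overwrite-style initial load: rebuild the target from scratch, keyed by _id."""
    return {doc["_id"]: dict(doc) for doc in documents}


def upsert_merge(target: Dict[Any, Document], events: Iterable[Document]) -> None:
    """Apply change events in order: insert unseen keys, replace existing ones."""
    for event in events:
        doc = event["fullDocument"]  # assumed to carry the full post-change document
        target[doc["_id"]] = dict(doc)


# Hypothetical data: a two-document snapshot, then two change events that
# were recorded while the snapshot was being taken.
snapshot = [{"_id": 1, "name": "Ada"}, {"_id": 2, "name": "Bob"}]
buffered_events = [
    {"operationType": "update", "fullDocument": {"_id": 2, "name": "Robert"}},
    {"operationType": "insert", "fullDocument": {"_id": 3, "name": "Cy"}},
]

target = load_snapshot(snapshot)
upsert_merge(target, buffered_events)
print(sorted(target.values(), key=lambda d: d["_id"]))
```

Keying the merge on `_id` is what makes replaying the same event twice harmless: the second application simply writes the same document again.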
Rivery's MongoDB connection reads the 'Change Stream' records and generates change events in FileZone files for row-level INSERT and UPDATE commands. Each file represents a set of database actions performed over a period of time. Data from the log is streamed continuously into the FileZone path configured in the River and pushed into the target at the River's scheduled frequency. Because the data is saved in the FileZone first, it can be pushed into the target DWH at any moment.
FileZone is covered in further detail in the Target documentation.
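To illustrate the idea that each file holds the actions performed over a period of time, the sketch below groups change events into fixed time windows. The actual FileZone file layout and window length are not described here, so the `ts` field (epoch seconds), the 60-second window, and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Event = Dict[str, Any]


def group_into_files(events: Iterable[Event], window_seconds: int = 60) -> Dict[int, List[Event]]:
    """Bucket events by the start of the time window their timestamp falls into."""
    files: Dict[int, List[Event]] = defaultdict(list)
    for event in events:
        window_start = event["ts"] - event["ts"] % window_seconds
        files[window_start].append(event)
    return dict(files)


# Hypothetical events with timestamps in epoch seconds.
events = [
    {"ts": 0, "operationType": "insert"},
    {"ts": 30, "operationType": "update"},
    {"ts": 61, "operationType": "insert"},
]

files = group_into_files(events)
for window_start, batch in sorted(files.items()):
    print(window_start, len(batch))
```

Each bucket stands in for one file; a scheduled load would then pick up whichever buckets have accumulated since the previous run.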
How to Enable Log-Based Extraction?
- A brief reminder appears, prompting you to check your connection and set up your Source and Target in the steps that follow. Select 'Got It' to proceed.
- At the bottom of the page, turn on the 'Enable Log' toggle.
- Some limitations apply when connecting to MongoDB (see the URI and SSH connection documentation for details):
|URI||If you're using Log-Based with Atlas, leave the analytics node out of the connection URI.|
Connecting to the Primary node of a MongoDB Atlas cluster is supported. Rivery can connect to the Analytics Node, but it cannot receive any change messages from it because of how Atlas implements that node.
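A quick way to check a connection string before using it for Log-Based extraction is sketched below. It assumes the analytics node is targeted through the `readPreferenceTags=nodeType:ANALYTICS` option in the URI; the function name and the example hosts and credentials are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def targets_analytics_node(uri: str) -> bool:
    """Return True if the URI's options include an analytics-node read preference tag."""
    query = urlsplit(uri).query
    for key, value in parse_qsl(query, keep_blank_values=True):
        if key.lower() == "readpreferencetags" and "nodetype:analytics" in value.lower():
            return True
    return False


# Hypothetical connection strings.
analytics_uri = (
    "mongodb+srv://user:secret@cluster0.example.mongodb.net/sales"
    "?readPreference=secondary&readPreferenceTags=nodeType:ANALYTICS"
)
primary_uri = "mongodb+srv://user:secret@cluster0.example.mongodb.net/sales?retryWrites=true"

print(targets_analytics_node(analytics_uri))
print(targets_analytics_node(primary_uri))
```

If the check returns `True`, removing the `readPreference` and `readPreferenceTags` options from the URI is the change this limitation calls for.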
- MongoDB will not send any change stream document larger than 16 MB. The limit applies to the whole event, including all change stream metadata, not only to the document itself.
- Any special character in a table name will be replaced with an underscore. If you wish to edit the resulting table name, go to:
1. The 'Schema' tab
2. Select a collection
3. Click 'Table Settings'
4. Choose 'Edit' to change the table name manually.
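The underscore replacement mentioned above can be previewed with a short Python sketch. Exactly which characters count as "special" is not specified here, so the rule below (anything other than ASCII letters, digits, and underscores) is an assumption, and `sanitize_table_name` is a hypothetical helper.

```python
import re


def sanitize_table_name(collection_name: str) -> str:
    """Replace every character outside [0-9A-Za-z_] with an underscore (assumed rule)."""
    return re.sub(r"[^0-9A-Za-z_]", "_", collection_name)


print(sanitize_table_name("orders-2024.eu"))  # orders_2024_eu
```

Running collection names through a check like this beforehand shows which tables may need a manual rename in 'Table Settings'.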