Database Migration

Rivery allows for a complete database migration in a matter of clicks. When connecting to a supported RDBMS source (MySQL, PostgreSQL, Microsoft SQL Server, or Oracle), you can create a Data Source to Target river using Multi-Table mode, which lets you migrate multiple tables to your target in a single river.

First, click Create New River and select Data Source to Target.

Set the Source

Name the river and move to the Source tab to choose your relational database source. Once you choose your source database, establish your connection. If you have already made a connection, select it from the connection dropdown. Otherwise, you can create one on the fly by clicking 'Create New Connection' and following the documentation for your respective database (MySQL, PostgreSQL, Microsoft SQL Server, Oracle).

Select Multi-Tables as your river mode.

Select the default extract method.

  • Standard Extraction - reads data directly from the table.
  • Log Based - continuously reads from the database logs to capture all data and metadata changes in the source database.
  • Change Tracking - an efficient mechanism that records which rows in your tables were modified.
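To make the difference concrete, here is a minimal sketch of the selection logic behind two of these methods. The row shapes, the `version` counter, and the function names are illustrative assumptions, not Rivery's implementation.

```python
# Toy rows with a per-row version counter, as a change-tracking-enabled
# database might maintain. All names here are illustrative.
rows = [
    {"id": 1, "name": "a", "version": 3},
    {"id": 2, "name": "b", "version": 7},
    {"id": 3, "name": "c", "version": 9},
]

def standard_extract(table):
    """Standard Extraction: read every row directly from the table."""
    return list(table)

def change_tracking_extract(table, last_synced_version):
    """Change Tracking: take only rows modified since the last sync,
    using the version counter maintained by the database."""
    return [r for r in table if r["version"] > last_synced_version]

print(len(standard_extract(rows)))            # all 3 rows
print(len(change_tracking_extract(rows, 5)))  # only the 2 rows changed since version 5
```

Log-Based extraction differs from both: instead of querying the table at all, it tails the database's transaction log to pick up inserts, updates, and deletes as they happen.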

Set the Target 

Next, navigate to the Target tab to select your target data warehouse.

Once a target is selected, establish a connection via the dropdown or create a new one.

Next, set the destination database and schema. Optionally, add a table prefix.

Set the default loading mode to be assigned to your tables. Note that this can be edited at the individual table level in the next step.

To learn more about Rivery loading modes and how they work, see Data Source to Target documentation.

If Log-Based Extraction is being used, select whether to include snapshot tables in the target. These tables will accompany each data table with a history of all changes made on the source database.

Map the Tables

Next, navigate to the Mapping tab. Select which schemas you would like to migrate.

Click into an individual schema to see the underlying tables. Select the tables you would like to load. 

Once a table is selected, it will appear with a status of 'Waiting for Migration' and its default Extract Method. Even though a default method is set, Rivery allows for both standard extraction and log-based extraction tables in the same river. 

Click the Edit button to further edit the configuration of a specific table.

In this view, you can edit the following:

  • Select or deselect specific columns
  • Set columns to be keys (this only applies when using Upsert-Merge loading mode)
  • Rename columns in the Target Column field
  • Cast fields to a different data type by clicking the Type value
  • Change the Mode of a field
  • Set fields to be a Cluster Key for partitioning
  • Add a Calculated Column (this will use the SQL syntax of your target data warehouse)

Click into the Table Settings menu to further edit the Loading Mode and Extract Method of the table.

For the table Loading Mode, you can choose between Upsert-Merge, Append, and Overwrite.

For the Extraction Method of the table, you can either extract all data in the table, load incrementally, or load changes from the database log. If Incremental is selected as the Extraction Method, you will be prompted to choose a field to define your increments. This can be either a running number or a date field.
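The incremental case can be sketched as follows: store the highest value of the increment field seen so far, and on each run fetch only rows beyond it. The table, field names, and function are assumptions for illustration only.

```python
from datetime import date

# Hypothetical source rows; "updated_at" plays the role of the increment field
# (it could equally be a running number such as an auto-increment id).
table = [
    {"order_id": 1, "updated_at": date(2023, 1, 1)},
    {"order_id": 2, "updated_at": date(2023, 1, 5)},
    {"order_id": 3, "updated_at": date(2023, 1, 9)},
]

def incremental_extract(rows, cursor_field, last_value):
    """Return rows whose increment field exceeds the last stored value,
    plus the new cursor value to store for the next run."""
    batch = [r for r in rows if last_value is None or r[cursor_field] > last_value]
    new_cursor = max((r[cursor_field] for r in batch), default=last_value)
    return batch, new_cursor

batch, cursor = incremental_extract(table, "updated_at", date(2023, 1, 3))
print([r["order_id"] for r in batch])  # [2, 3]
print(cursor)                          # 2023-01-09
```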

If Log-Based Extraction is desired, the logs of the database will have to be enabled. To do this, select 'Enable Log' at the bottom of the river. For more guidance on configuring the source database for Log-Based Extraction, see the docs for the corresponding source database: MySQL, MS SQL Server, MongoDB, or PostgreSQL.

Once 'Enable Log' is selected, Rivery will initiate the log sync, testing both the source and target configuration.

Once the logs have synced successfully, the river can be executed or scheduled.

Once all of your tables have been configured to your migration requirements, you can run your river and monitor the progress in the Activities tab. See the Data Source to Target Overview documentation for a walkthrough on scheduling and monitoring rivers.

Multi-Table River FAQs

What happens if an existing column is dropped on the source table?

In this case, the mapping will not change, but also will not error. Null values will populate the dropped field in the target table.

What if a new column is added to the source table?

Rivery will detect this and add it to the target table. Records that already existed in the target table before the new column was created will have null values for that column.
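The two schema-drift behaviors above can be sketched together: dropped source columns stay in the target mapping and are padded with nulls, while new source columns are appended to the mapping. The column names and helper below are illustrative assumptions, not Rivery's internals.

```python
def align_record(record, target_columns):
    """Shape a source record to the target mapping, padding columns that
    no longer exist at the source (dropped columns) with None."""
    return {col: record.get(col) for col in target_columns}

target_columns = ["id", "name", "email"]   # "email" was dropped at the source
row = {"id": 1, "name": "a", "city": "x"}  # "city" is new at the source

# New source columns are detected and added to the target mapping...
for col in row:
    if col not in target_columns:
        target_columns.append(col)

# ...and dropped columns are kept but populated with nulls.
print(align_record(row, target_columns))
# {'id': 1, 'name': 'a', 'email': None, 'city': 'x'}
```

Records loaded before `city` existed would likewise carry null values for it in the target.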

What if a data type of an existing column is changed in the source table?

The target column will be set as the "wider" data type. For example, if the field is changed from an integer to a string, the target data type will be changed to a string, as this data type can encapsulate integer values. If the column has been changed from a string to a date, it will remain a string in the target.
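The "wider type wins" rule can be sketched with a simple ordering. The width values below are an assumption for illustration, not Rivery's exact type lattice.

```python
# Illustrative widths: a string can encapsulate the textual form of the
# other types, so it sits at the top of this toy ordering.
WIDTH = {"int": 0, "float": 1, "date": 1, "string": 2}

def resolve_type(current, incoming):
    """Keep whichever type can encapsulate the other's values."""
    return current if WIDTH[current] >= WIDTH[incoming] else incoming

print(resolve_type("int", "string"))   # int -> string: widens to string
print(resolve_type("string", "date"))  # string -> date: stays string
```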

What kind of TIMESTAMP fields does Rivery output?

Rivery stores timestamps as NTZ, i.e. without the timezone. You can assume that the target will always match the source timestamp; the timezone is not taken into account.
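In other words, the wall-clock value is kept as-is and the zone offset is discarded rather than converted to UTC. A minimal sketch of that behavior:

```python
from datetime import datetime, timezone, timedelta

# A source timestamp carrying a UTC-5 offset (values are illustrative).
src = datetime(2023, 6, 1, 14, 30, tzinfo=timezone(timedelta(hours=-5)))

# NTZ storage: drop the zone, keep the same wall-clock time.
ntz = src.replace(tzinfo=None)

print(ntz)  # 2023-06-01 14:30:00
```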

How does Rivery handle records that are deleted in the source database?

Overall, Rivery does not 'hard' delete any data in targets. Deletes can be handled downstream using a Logic Step by querying the resulting table to take only records that are not flagged as deleted (e.g. if the source table includes a delete flag field such as IS_DELETED that is set to TRUE when the record is deleted in the source).

For multi-table rivers utilizing the Log-Based extraction method, a __DELETED field will automatically be added to the target output tables. In this case, handling deletes does not require a delete flag in the source tables, since Rivery reads from the binary logs instead of directly from the tables themselves.

In Log-Based rivers, Rivery will first perform a full migration to sync the source tables fully with the target tables. After this initial load, all ongoing data changes will be fetched via the log connection. The __DELETED field will appear in the output tables once the first run from the log connection is complete.
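The downstream filter itself is trivial and works the same whether the flag is a source column like IS_DELETED or the __DELETED column added by Log-Based rivers. The rows and helper below are an illustrative sketch, not a Rivery API.

```python
# Target rows after loading; deleted records are flagged, not removed.
target_rows = [
    {"id": 1, "__DELETED": False},
    {"id": 2, "__DELETED": True},
    {"id": 3, "__DELETED": False},
]

def live_rows(rows, flag="__DELETED"):
    """Return only records not flagged as deleted in the source."""
    return [r for r in rows if not r.get(flag)]

print([r["id"] for r in live_rows(target_rows)])  # [1, 3]
```

A Logic Step would express the same filter in the target warehouse's SQL dialect.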
