SQL Server Database Walkthrough

Article Summary

This guide will walk you through the steps needed to integrate data from a SQL Server database (on-prem or managed service) into a cloud target using Rivery.

For steps on how to connect Rivery to your SQL Server, see the standard extraction requirements and the log-based extraction instructions.

Pull SQL Server Data into a Target

Using Rivery, you can pull data from your SQL Server tables and send that data into your target database.

First, select 'Create New River' from the top right of the Rivery screen.

image

Choose 'Data Source to Target' as your river type.

In the 'General Info' tab, name your river and give it a description. Next, navigate to the 'Source' tab.

Find SQL Server in the list of data sources and select it:

image

Define a Source Connection (this will be the connection created earlier in the process). If you do not yet have a SQL Server connection in your Rivery account, you can create a new connection here by clicking 'Create New Connection.'

image

Next, choose your River mode.

image

  • Multi-Tables: Load multiple tables simultaneously from SQL Server to your target.
  • Custom Query: Create a custom query and load its results into your target.
  • Union Tables: Merge several tables with the same metadata into a single table.
  • Legacy River: Choose a single source table to load into a single target.

Multi-Tables mode

For a detailed walkthrough of Multi-Tables mode, please refer to the Rivery Database Migration docs.

Pulling data from a custom query

You can also use Custom Query mode to define the query that Rivery runs to pull data. Any query that works in the source is allowed, provided it is a single SELECT statement with no other statements. Rivery does not support multi-statement queries or SQL scripts in the custom query field.

image

Known Issues

TIMESTAMP columns might not be returned in the format expected by the target table's TIMESTAMP type, in which case they fail to load into the target table as TIMESTAMP.
If you get an error for a TIMESTAMP column that failed to load into the target table, try converting its format in your custom query as follows:
convert(varchar, <DATE_COLUMN_NAME>, 127) as <DATE_COLUMN_NAME>
Make sure this column is set to the TIMESTAMP type in the Source tab mapping.
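
As a minimal sketch of the workaround above, the following Python helper assembles a single SELECT statement that wraps chosen columns in the convert expression. The function name build_custom_query, the table dbo.orders, and the column names are hypothetical and used only for illustration; the printed string is what you would paste into the custom query field.

```python
def build_custom_query(table, columns, timestamp_columns=()):
    """Return one SELECT statement, converting the listed columns with style 127.

    All names passed in are illustrative; nothing here is part of Rivery's API.
    """
    ts = set(timestamp_columns)
    select_list = []
    for col in columns:
        if col in ts:
            # Same expression as in the Known Issues note.
            select_list.append(f"convert(varchar, {col}, 127) as {col}")
        else:
            select_list.append(col)
    return f"SELECT {', '.join(select_list)} FROM {table}"


query = build_custom_query(
    "dbo.orders",                   # hypothetical table
    ["order_id", "created_at"],     # hypothetical columns
    timestamp_columns=["created_at"],
)
print(query)
```

The helper emits exactly one statement and no semicolons, in line with the single-SELECT restriction of the custom query field.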

 


Extract Method:

Using Rivery, you can pull your data incrementally or pull the entirety of the data that exists in the table:

image

  • All: Fetch all data in the table, in chunks.
  • Incremental: Run over a column in the table, or in the top-level SELECT of your custom query, filtering it by Start and End dates or by Epoch times.

You can also choose to split the date range into Daily, Weekly, Monthly, or Yearly chunks.

Define the field to use in the Incremental Field section. After choosing the incremental field, choose the Incremental Type and the dates or values you would like to fetch.

Note: Rivery manages the increments across runs using the maximum value in the data. This means each run fetches all data added since the last run, which prevents gaps in the data. You only need to configure your river once.

Recommended: Choose an incremental field that is indexed or is a partition key in the table.
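
The idea of managing increments by the maximum value can be pictured with a short Python sketch: after each run, the next start value is taken from the largest value of the incremental field among the rows fetched. This only illustrates the concept and is not Rivery's implementation; next_start_value and the sample rows are hypothetical, and whether the boundary row is read again on the following run is not specified in this guide.

```python
def next_start_value(rows, incremental_field, current_start):
    """Return the start value for the next run.

    Uses the maximum value of the incremental field in the fetched rows;
    if nothing was fetched, the current start value is kept.
    """
    values = [row[incremental_field] for row in rows]
    return max(values) if values else current_start


# Hypothetical rows fetched by one run, with "updated_at" as the incremental field.
# ISO 8601 date strings compare correctly as plain strings.
rows = [
    {"id": 1, "updated_at": "2023-01-03"},
    {"id": 2, "updated_at": "2023-01-07"},
    {"id": 3, "updated_at": "2023-01-05"},
]
start = next_start_value(rows, "updated_at", "2023-01-01")
print(start)
```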

image

Note:
  • Start Date is mandatory.

  • Data can be retrieved for the date range specified between the Start and End dates.

  • If you leave the End Date blank, data is pulled up to the current time of the River's run.

  • Dates are in the UTC timezone.

  • The Start Date is not advanced if a River run is unsuccessful.
    If you don't want this default behavior, click More Options and check the box to advance the start date even if the River run is unsuccessful (not recommended).

  • Use the 'Last Days Back For Each Run' option to gather data from a specified number of days prior to the selected start date.
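
The notes above can be sketched in Python as follows. This is an illustration under stated assumptions, not Rivery's actual logic: the function names, the sample dates, and the choice of half-open daily chunks are all hypothetical. It shows a blank End Date defaulting to the time of the run, 'Last Days Back For Each Run' moving the start earlier, and the resulting range being split into daily chunks, all in UTC.

```python
from datetime import datetime, timedelta, timezone


def extraction_window(start, end=None, last_days_back=0, now=None):
    """Return the (start, end) range, in UTC, that a run would cover."""
    now = now or datetime.now(timezone.utc)
    window_start = start - timedelta(days=last_days_back)
    window_end = end if end is not None else now  # blank End Date -> time of the run
    return window_start, window_end


def daily_chunks(start, end):
    """Split [start, end) into consecutive chunks of at most one day."""
    chunks = []
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + timedelta(days=1), end)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end
    return chunks


run_time = datetime(2023, 1, 10, 12, 0, tzinfo=timezone.utc)  # hypothetical run time
start_date = datetime(2023, 1, 8, tzinfo=timezone.utc)        # hypothetical Start Date
window = extraction_window(start_date, last_days_back=1, now=run_time)
chunks = daily_chunks(*window)
for chunk_start, chunk_end in chunks:
    print(chunk_start.isoformat(), "->", chunk_end.isoformat())
```

With these sample values the window starts one day before the Start Date and ends at the run time, so the last chunk is shorter than a full day.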

Limit and Auto Mapping

After defining the extract method, you may choose a limit of top N rows to fetch. Rivery will set your Schema using the Auto Mapping feature. You can also choose fields you want to fetch in the Mapping table and add fields on your own.

image

 


Row Version

Row Version is commonly used to version-stamp table rows. The storage size is 8 bytes.
The row version data type is an incrementing integer that does not store a date or time.

Row Version can be used in two River modes:

  • Legacy River
  • Multi-Tables

We will use Legacy River for this process, but the technique is the same for Multi-Tables.

To use Row Version, complete these steps:

  1. Choose your Source Connection.
  2. Select Schema and Table Name.
  3. Click on Extract Method and select Incremental.
  4. Type in the Incremental Field Name.
  5. Choose Row Version in Incremental Type.
  6. Enter a zero as the Start Value.
    Note:
    The default value for 'Rows in Chunk' is 100,000.
  7. Click Run river.
  8. Connect to your Target.
    Note:
    The Row Version is indicated in Hexadecimal values.

When you run a River with Row Version again, the Start Value will be updated to the last row's end value.
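
To make the hexadecimal representation concrete, here is a small Python sketch that treats a row version as 8 raw bytes, formats it as hexadecimal, and picks the highest value seen in a run as the next Start Value. The helper names, the sample values, and the '0x' display prefix are assumptions made for illustration; this is not how Rivery itself stores the value.

```python
def rowversion_to_hex(value: bytes) -> str:
    """Format an 8-byte row version as a hexadecimal string."""
    if len(value) != 8:
        raise ValueError("a row version is expected to be 8 bytes long")
    return "0x" + value.hex().upper()


def rowversion_to_int(value: bytes) -> int:
    """Interpret the 8 bytes as an unsigned big-endian integer."""
    return int.from_bytes(value, byteorder="big")


# Hypothetical row versions read during one run.
fetched = [bytes.fromhex("00000000000007D1"), bytes.fromhex("00000000000007D3")]
last = max(fetched, key=rowversion_to_int)
next_start = rowversion_to_hex(last)
print(next_start, rowversion_to_int(last))
```

A Start Value of zero, as entered in the steps above, corresponds to eight zero bytes in this representation.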


 


Union Tables Mode

For like-named tables with identical schemas, the Union Tables river mode can be used to ingest these tables in a single process. Simply enter the table prefix using '*' as a wildcard.

image

You can click 'Search Tables' below to return a list of tables in the database.

image

Next, choose your extraction method and map the columns by clicking 'Auto Mapping.' Note: the column names and data types are expected to be identical in all tables selected for loading through the Union Tables mode.
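
The two requirements of this mode, a wildcard match on table names and identical columns across the matched tables, can be sketched in Python. The table names, column lists, and helper functions below are hypothetical; the sketch uses the standard fnmatch module, which also gives special meaning to '?' and '[...]', so it approximates the '*' wildcard described here rather than reproducing Rivery's matching.

```python
from fnmatch import fnmatchcase


def match_tables(pattern, table_names):
    """Return the table names that match a '*' wildcard pattern."""
    return [name for name in table_names if fnmatchcase(name, pattern)]


def schemas_identical(schemas):
    """Check that every table has the same column names and data types."""
    distinct = {tuple(columns) for columns in schemas.values()}
    return len(distinct) <= 1


# Hypothetical tables and their (column, type) lists.
schemas = {
    "sales_2022": [("id", "int"), ("amount", "decimal")],
    "sales_2023": [("id", "int"), ("amount", "decimal")],
    "customers": [("id", "int"), ("name", "nvarchar")],
}
selected = match_tables("sales_*", list(schemas))
ok = schemas_identical({name: schemas[name] for name in selected})
print(selected, ok)
```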

image

 


Legacy River Mode

This river mode allows for the load of a single source table into a single target table.

image

In the above screenshot, there are steps to define the source table to pull (Rivery will auto-detect available schemas and tables), the extraction method to use, and options for filters or row limits on the data pull.

 


Multi-Tables Mode

Load multiple tables simultaneously from SQL Server to your target. There are two Default Extraction Modes: Standard Extraction and Log-Based.

image.png

On the 'Table Settings' tab you are able to edit the following:

  • Change the loading mode

  • Change the extraction method. If 'Incremental' is selected, you can then define which field will be used to define the increment.

  • Filter by an expression, which is used as a WHERE clause when fetching the selected data from the table.

 

Advanced Options

When using Multi-Tables River mode in Standard or Change Tracking extraction mode, the Advanced Options section at the bottom contains checkboxes that can make the data inserted into the Target table easier to use:

  • Invalid Characters
  • Hidden Columns
  • Varbinary Values

Invalid Characters

When this option is ticked, any invalid characters are automatically replaced with underscores.

Hidden Columns

Hidden columns are extracted automatically by default; to disable this, tick the "Ignore Hidden Columns" checkbox.

Varbinary Values

This option allows you to export the column's Varbinary data as a hexadecimal string; the maximum length is 254 characters.
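
As a rough Python illustration of exporting binary data as a hexadecimal string with a 254-character cap: the function name is hypothetical, and truncating longer values is an assumption made for this sketch, since this guide states only the maximum length and not how longer values are handled.

```python
MAX_HEX_LENGTH = 254  # maximum string length stated above


def varbinary_to_hex(value: bytes, max_length: int = MAX_HEX_LENGTH) -> str:
    """Render binary data as hexadecimal text, capped at max_length characters.

    Truncating longer values is an assumption made for this sketch.
    """
    return value.hex().upper()[:max_length]


short = varbinary_to_hex(b"\x01\xab")
long_value = varbinary_to_hex(bytes(200))  # 200 bytes -> 400 hex characters before the cap
print(short, len(long_value))
```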

Deleted Rows

When this option is enabled, deleted rows are pulled as well. A deleted row's key fields keep their values, and all remaining fields are NULL.
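
The shape of a deleted row can be pictured with a minimal Python sketch, where None stands in for NULL. The function name, the sample row, and the choice of "id" as the key field are hypothetical.

```python
def deleted_row_record(row, key_fields):
    """Return the row as described for a deletion: keys kept, other fields None."""
    return {
        column: (value if column in key_fields else None)
        for column, value in row.items()
    }


# Hypothetical source row with "id" as the only key field.
row = {"id": 42, "name": "Alice", "amount": 10.5}
record = deleted_row_record(row, key_fields={"id"})
print(record)
```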

image.png


Schedule the River

Once the creation of the river is complete, navigate to the 'Schedule' tab and click 'Schedule Me.'

image

Choose the frequency at which to schedule the river.

image

To notify certain users about a river failure or warning, enable notifications below:

image

You can edit your {Mail_Alert_Group} in the Variables page (find this in the left-hand pane of the browser).

 

 

Monitor the River

During the river run, or after the run has completed, you can monitor the river in its 'Activities' tab.

image

In this tab you can monitor the status of the current river run. For the Multi-Tables mode, you can monitor at the table level (see above).

By toggling between 'Run View' and 'Target View' you can see the river results grouped either by time of run or by target location.

Check out the Targets section to find out how to load the data to your target data warehouse.

 

 

