- 7 Minutes to read
In Rivery, a Target is a location into which data is loaded or moved. Any storage system that can receive and store data qualifies, whether it's a database, a data warehouse, or another storage service. Connectors link Targets to Rivery, enabling Rivery to work with the Target and carry out operations such as loading data, building tables, and executing queries.
Rivery supports loading data into a variety of data storage and warehouse systems. You can set up connections to multiple targets as needed. For instructions on how to connect to specific types of Targets, refer to the relevant documentation:
The Process and Principles
Running Over Data Images
Setting up a Custom File Zone in Rivery is optional; by default, the platform uses its Managed File Zone, which requires no setup.
The main advantage of setting up a Custom File Zone is the ability for organizations to ensure that data is stored within their own file zones, as opposed to being stored in Rivery's Managed File Zone. This also enables organizations to use the Custom File Zone as a data lake, where raw data can be stored before it is loaded into a Target cloud data warehouse. Furthermore, organizations have the ability to define their own retention policies for data stored in the Custom File Zone. Rivery's Managed File Zone (default) retains data for a period of 48 hours.
Rivery offers 3 distinct loading modes:
- Upsert-Merge
- Append Only
- Overwrite
Rivery's Upsert-Merge method is a way to synchronize data between different Sources and Targets. It first checks if a record exists in the target table based on a specified key field (or set of keys). If the record does exist, this mode updates the existing record with the new data. If the record does not exist, the method creates a new record in the target table with the new data. This allows data to be kept up-to-date and consistent between different sources and destinations.
When using Snowflake, Azure SQL, or Databricks, there are multiple Merge methods for loading data.
Upsert-Merge Method Flowchart
1. Incoming data is loaded into a temporary table: This ensures the data is verified, cleaned, and transformed (if necessary) before it reaches the target table.
2. Records are matched on the key field(s): Each staged record is compared against the target table using the specified key field or set of keys.
3. The target table is merged with the temporary table: Records whose keys already exist in the target are updated with the new values, and records with no match are inserted as new rows.
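The merge flow above can be sketched in a small, self-contained example. This is an illustration only: it uses SQLite's `INSERT ... ON CONFLICT` upsert as a stand-in for a warehouse `MERGE` statement, and the table and column names are hypothetical, not Rivery internals.

```python
import sqlite3

# Illustrative Upsert-Merge sketch using an in-memory SQLite database.
# "tmp_stage" plays the role of the temporary (staging) table; "id" is
# the key field. All names here are made up for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE tmp_stage (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO target VALUES (1, 'old'), (2, 'keep')")

# Step 1: incoming data lands in the temporary table first.
conn.executemany("INSERT INTO tmp_stage VALUES (?, ?)",
                 [(1, "updated"), (3, "new")])

# Steps 2-3: merge on the key — update rows whose key exists, insert the
# rest. (The "WHERE true" avoids a SQLite upsert parsing ambiguity.)
conn.execute("""
    INSERT INTO target (id, name)
    SELECT id, name FROM tmp_stage WHERE true
    ON CONFLICT(id) DO UPDATE SET name = excluded.name
""")

rows = conn.execute("SELECT id, name FROM target ORDER BY id").fetchall()
print(rows)  # [(1, 'updated'), (2, 'keep'), (3, 'new')]
```

Note how row 2 is left untouched, row 1 is updated in place, and row 3 is inserted — the behavior that keeps source and target consistent.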
The append-only method ensures that data is added to a table or a dataset without modifying or deleting any existing data. When using this method, new data is appended to the end of the table or dataset, and the original data remains unchanged. This method is useful for maintaining a historical record of data changes and ensuring data integrity and consistency.
Append-Only Method Flowchart
1. Identifying the table: The first step in the Append Only method is choosing the table to which new data will be inserted.
2. Extracting new data: New data is then taken from the source. You can accomplish this by reading a file, running a database query, or using any other data retrieval technique. After that, the extracted data is ready for validation.
3. Validating the data: The extracted data is then checked for errors or inconsistencies to make sure it is in the right format. To guarantee the accuracy of the data and avoid any problems with the data being added to the database, this validation phase is crucial.
4. Appending data to the table: Once validated, the data is added to the end of the table as new rows. Existing rows are not changed or removed.
5. Storing the data: Because prior rows remain untouched, the table preserves a full history of the data over time.
6. Repeating the process: Each time new data arrives, the same steps run again, continuously adding rows to the table without modifying or deleting the data that is already there.
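The append-only steps can be sketched as follows. This is a minimal illustration with a hypothetical `events` table and a trivial null check standing in for the validation phase:

```python
import sqlite3

# Illustrative Append-Only sketch: new rows are validated and appended;
# existing rows are never updated or deleted. Names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, value INTEGER)")
conn.execute("INSERT INTO events VALUES ('2024-01-01', 10)")  # historical row

# Step 2: extract new data from the source (hard-coded here).
incoming = [("2024-01-02", 20), ("2024-01-03", None)]

# Step 3: validate — drop rows that fail a simple null check.
valid = [(ts, v) for ts, v in incoming if v is not None]

# Steps 4-5: append the validated rows; earlier rows stay unaltered.
conn.executemany("INSERT INTO events VALUES (?, ?)", valid)

rows = conn.execute("SELECT ts, value FROM events ORDER BY ts").fetchall()
print(rows)  # [('2024-01-01', 10), ('2024-01-02', 20)]
```

The original row survives untouched, which is what makes this mode suitable for keeping a historical record.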
The Overwrite method replaces the existing data in a table or dataset with new data. When using this method, the new data completely replaces the existing data, and the original data is permanently lost. This method is useful for updating and refreshing large data sets, but it should be used with caution as it can result in data loss if the new data is not accurate or complete.
When using the Overwrite method, it's important to back up the existing data before replacing it.
Overwrite Method Flowchart
1. Begin: In Rivery, you can configure a pipeline to use the Overwrite approach, meaning that when the River is executed, data from the source replaces the existing data in the target table or file.
2. The source table or file is identified and selected: The first step in the River is locating the source that holds the data that will overwrite the target. This might be a CSV file, a database table, or another kind of data source.
3. The target table or file is identified and selected: This determines which table or file will receive the data from the source and have its contents replaced.
4. Data is moved from the source to the target: The data is read from the source table or file, formatted as necessary, and written to the target.
5. The data in the target is replaced: Once the transfer completes, the data from the source replaces the old contents of the target table or file.
6. The River is executed and the data is overwritten: After all the configuration steps are complete, the pipeline runs. Once the operation is finished, the target table or file contains the new data from the source.
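The truncate-and-reload behavior above can be sketched compactly. The table names here are hypothetical, and SQLite's `DELETE` without a `WHERE` clause plays the role of a truncate:

```python
import sqlite3

# Illustrative Overwrite sketch: the target's contents are replaced
# wholesale by the source data. All names are made up for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO source VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.execute("INSERT INTO target VALUES (9, 'stale')")

# Truncate: remove all existing rows from the target. The old data is
# permanently lost — back it up first if you might need it.
conn.execute("DELETE FROM target")

# Reload: copy the source's rows into the now-empty target.
conn.execute("INSERT INTO target SELECT id, name FROM source")

rows = conn.execute("SELECT id, name FROM target ORDER BY id").fetchall()
print(rows)  # [(1, 'a'), (2, 'b')]
```

The stale row is gone entirely, which is why this mode calls for caution compared to Upsert-Merge or Append Only.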
Source To Target River: Loading Data Into a Target Table
Built on the core principles above, Source to Target Rivers are pipelines that extract data from Sources and load it into Targets, handling the Extract and Load phases of the ELT process. Depending on the pipeline configuration, data can be loaded using Upsert-Merge, Overwrite, or Append-Only mode, with the File Zone serving as the staging area for the load.