- 6 Minutes to read
- Print
- DarkLight
- PDF
Amazon Redshift Upsert-Merge Loading Mode Options
- 6 Minutes to read
- Print
- DarkLight
- PDF
Overview
Rivery allows for 3 distinct options when the Upsert-Merge loading mode option is selected in Amazon Redshift.
The Upsert-Merge option is used to update existing data in a table that may have changed in the source since the last run, as well as insert new records. While all three Upsert-Merge options result in this effect, they differ in the backend process, and have performance implications.
Switch - Merge
The Switch Merge method in Amazon Redshift is a way of merging data from two or more tables or streams into a single table or stream. It works by creating a new table or stream that contains the data from the other tables, and then switching the references to the new table. This can be useful for a number of purposes, such as:
Updating data in a table: By using the switch merge method, you can easily update the data in a table by replacing it with new data from another table.
Consolidating data from multiple tables: If you have data spread across multiple tables, the Switch Merge method can be used to consolidate it into a single table.
Performing "Upsert" operations: The Switch Merge method can be used to perform "upsert" operations, which insert new rows into a table or stream if they don't already exist, or update existing rows if they do.
Switch-Merge Method Flowchart
The Steps
- Select target table and merge table: The target table, which will be modified by the merge, and the merge table, which contains the data to be merged, are selected.
- Create temporary table: A temporary table is created to hold the merged data.
- Insert merged data into temporary table: The data from the merge table is inserted into the temporary table.
- Swap target table and temporary table: The original target table is swapped with the temporary table containing the merged data.
- Check for conflicts in merged data: Any conflicts or errors in the merged data are identified.
- Resolve conflicts in merged data: Any conflicts or errors are resolved.
- Drop temporary table: The temporary table is no longer needed, so it is dropped.
- Finish: The process is completed and the merged table is now the current table.
Example
In Amazon Redshift, the MERGE statement allows you to update or insert data into a table, depending on whether the data already exists. The SWITCH statement can be used to control the order in which the WHEN MATCHED and WHEN NOT MATCHED clauses are executed within the MERGE statement.
Here is an example of how you could use the SWITCH and MERGE statements together in Amazon Redshift:
CREATE TABLE my_table (
id INTEGER,
data VARCHAR
);
INSERT INTO my_table (id, data)
VALUES (1, 'A'), (2, 'B'), (3, 'C');
-- Update rows with even id
MERGE INTO my_table t
USING (
SELECT id, 'X' as data
FROM my_table
WHERE id % 2 = 0
) s
ON t.id = s.id
WHEN MATCHED THEN
UPDATE SET t.data = s.data
WHEN NOT MATCHED THEN
INSERT (id, data) VALUES (s.id, s.data);
-- Update rows with odd id
SWITCH
WHEN id % 2 = 1 THEN
MERGE INTO my_table t
USING (
SELECT id, 'Y' as data
FROM my_table
WHERE id % 2 = 1
) s
ON t.id = s.id
WHEN MATCHED THEN
UPDATE SET t.data = s.data
WHEN NOT MATCHED THEN
INSERT (id, data) VALUES (s.id, s.data);
END
In this example, the SWITCH statement is used to select between two blocks of MERGE statements. The first MERGE statement updates rows with even id values, and the second MERGE statement, which is executed when the SWITCH condition is met, updates rows with odd id values.
Please Note:
In cases of Switch-Merge (or other fallbacks to Switch-Merge due to schema changes), we replace the target table. Consequently, when replacing the target table, we must re-grant the original table's permissions.
Delete - Insert
The DELETE-INSERT method in Amazon Redshift allows you to update a table by deleting the outdated rows and then adding new rows containing the updated data. This can be useful when you want to replace all of the data in a table rather than only changing a few rows or columns.
The benefits of using Delete-Insert include the fact that a unique match key is not necessary and that changes to metadata, such as renaming columns or changing the data type, will not cause an error.
Delete-Insert Method Flowchart
The Steps
- Begin transaction: This step begins a new transaction, which allows you to execute multiple SQL statements as a single unit of work. Transactions are used to ensure data integrity and consistency.
- Delete rows in target table: In this step, all rows in the target table that match the condition specified in the WHERE clause are deleted.
- Insert new rows: In this step, the new rows are inserted into the target table.
- Commit transaction: This step commits the transaction, which makes the changes made during the transaction permanent.
Example
Here is an example of how the DELETE-INSERT method can be used in Amazon Redshift:
BEGIN;
DELETE FROM my_table WHERE id = 123;
INSERT INTO my_table (id, col1, col2) VALUES (123, 'new_val1', 'new_val2');
COMMIT;
In this example, any rows in the my table table with an id of 123 will be deleted. A new row will then be inserted with the revised values for col1 and col2.
It's vital to keep in mind that altering data in a table using the DELETE-INSERT method may take longer than using the UPDATE statement. This is due to the fact that it requires deleting and re-inserting data, which might eat up more resources than just updating already-existing data. However, there are times when you might wish to fully replace the data in a table, and in those cases, it can be helpful.
Merge
Based on whether the row already exists or not, the MERGE command lets you update or add rows to tables. It can be used to insert data into a table or update an existing one.
Merge Method Flowchart
The Steps
- Identify the Target Table: This is the table that the data from the source table will be merged into.
- Find the source table: This is the table that contains the data that will be merged into the target table.
- Determine the merge condition: This is the condition that specifies which rows from the source table should be merged into the target table.
- Execute the merge: This step performs the actual merge operation, using the target table, source table, and merge condition specified in the previous steps. The merge operation will insert any rows from the source table that do not exist in the target table and update any rows in the target table that match the merge condition with the corresponding rows from the source table.
Example
Here is an example of how to use the Merge method in Amazon Redshift:
MERGE INTO target_table t
USING (SELECT * FROM source_table WHERE column1 = 'value1') s
ON t.primary_key = s.primary_key
WHEN MATCHED THEN UPDATE SET t.column2 = s.column2
WHEN NOT MATCHED THEN INSERT (t.primary_key, t.column2) VALUES (s.primary_key, s.column2);
In this example, the MERGE statement updates the column2 value in the target_table for rows that have a matching primary_key value in the source_table. For rows in the source_table that do not have a matching primary_key value in the target_table, the MERGE statement inserts these rows into the target_table.