Setting up Google BigQuery as a Target
Overview
Before using this guide, please make sure you have signed up for the Google Cloud Platform and have a console admin user.
Prerequisites
- Enable BigQuery API
- Enable Google Cloud Storage JSON API
- An available BigQuery Dataset. Create a new BigQuery dataset if you do not already have one.
- Create a Google Cloud Storage bucket (only required if using Custom File Zone option)
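Before creating the dataset and bucket, it can help to sanity-check the names you plan to use. The sketch below encodes the published naming rules in a simplified form (helper names are hypothetical; BigQuery dataset IDs allow letters, numbers, and underscores up to 1024 characters, and Cloud Storage bucket names are 3-63 lowercase characters):

```python
import re

def valid_dataset_id(name: str) -> bool:
    """BigQuery dataset IDs may contain letters, numbers, and
    underscores, up to 1024 characters."""
    return bool(re.fullmatch(r"[A-Za-z0-9_]{1,1024}", name))

def valid_bucket_name(name: str) -> bool:
    """A conservative check for Cloud Storage bucket names:
    3-63 characters of lowercase letters, numbers, dashes, and dots,
    starting and ending with a letter or number."""
    return bool(re.fullmatch(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]", name))

print(valid_dataset_id("rivery_target"))   # True
print(valid_bucket_name("My_Bucket"))      # False: uppercase/underscore not allowed
```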
In the Rivery console, navigate to Connections > New Connection and select 'Google BigQuery (Target)' to create a new BigQuery connection.
Find your Google Cloud Platform Project ID and Project Number by navigating to your Google Cloud console home page.
Enter these values in the Rivery Connection screen, and select the region where your BigQuery dataset exists.
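The Project ID and Project Number can also be read from the gcloud CLI via `gcloud projects describe <PROJECT_ID> --format=json`. A small sketch parsing that output (the `projectId` and `projectNumber` field names match the real response; the sample values below are made up):

```python
import json

# Sample output shaped like `gcloud projects describe my-project --format=json`
# (the values below are placeholders for illustration).
describe_output = """
{
  "projectId": "my-sample-project",
  "projectNumber": "123456789012",
  "lifecycleState": "ACTIVE"
}
"""

info = json.loads(describe_output)
project_id = info["projectId"]
project_number = info["projectNumber"]
print(project_id, project_number)
```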
With Rivery you can either choose to leverage an automatically created service account and bucket for your account or set your own. This is determined by the 'Custom File Zone' toggle in the connection setup.
When the above toggle is left OFF, this implies that Rivery's service account and managed file zone will be used in the connection. When the toggle is ON, a Custom File Zone (user-managed Google Cloud Storage bucket) is used to stage the incoming data before inserting into the final Google BigQuery table(s).
Working with Rivery Default File Zone
Rivery automatically creates a dedicated service account and Google Cloud Storage bucket for your ease of use. The service account only has access to the dedicated bucket created. In this option, you will need to grant permission to Rivery's service account.
Grant permission to the service account
Rivery uploads your source data into a dedicated Google Cloud Storage bucket. Your BigQuery project then pulls in data from this storage bucket. For Rivery to access the BigQuery project, you will need to grant permission to the service account listed in the Connection screen to the relevant BigQuery project.
- Sign in to Google Cloud Platform Console. Make sure you are on the desired project.
- Go to IAM & Admin -> IAM.
- Click on +GRANT ACCESS.
- In the new window add the Rivery service account as the principal.
- Choose the BigQuery Admin role and click save.
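The console steps above amount to adding a binding to the project's IAM policy. A minimal sketch of that operation on a policy dictionary (the shape mirrors the IAM policy format; the service account address is a placeholder, use the one shown in the Rivery Connection screen):

```python
def grant_role(policy: dict, role: str, member: str) -> dict:
    """Add `member` to the binding for `role`, creating the binding
    if needed (mirrors what +GRANT ACCESS does in the console)."""
    for binding in policy.setdefault("bindings", []):
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            return policy
    policy["bindings"].append({"role": role, "members": [member]})
    return policy

# Placeholder principal for illustration.
policy = {"bindings": []}
grant_role(policy, "roles/bigquery.admin",
           "serviceAccount:rivery@example.iam.gserviceaccount.com")
print(policy["bindings"][0]["role"])  # roles/bigquery.admin
```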
Working with a Custom File Zone
Using the 'Custom File Zone' option means that Rivery will use a Google Cloud Storage bucket on your Google Cloud Platform project instead of a Rivery-managed one. To leverage this option, toggle the 'Custom File Zone' ON.
Create a Service Account User for Rivery
To access the relevant storage bucket and BigQuery project, you will need to first create a service account in the Google Cloud Platform Console and provide it access to the relevant bucket and the relevant BigQuery project.
In order to create a new service account in Google Cloud Platform, please make sure your user has the Service Account Admin role.
Now, let's create a service account in the GCP Console:
- Sign in to Google Cloud Platform Console.
- Go to IAM & Admin -> Service Accounts.
- Click on CREATE SERVICE ACCOUNT.
- Set your Service Account name (e.g., RiveryUser).
- Set Roles. This option requires BigQuery Admin and Storage Admin roles.
- Copy your Service Account ID / Email from the service account list.
- Mark the checkbox of your service account in the service account list, and from the dropdown choose 'Manage Keys'.
- Click on ADD KEY > Create new key and select the 'JSON' option. Your JSON secret key will be downloaded locally. Upload this key file to the Rivery connection.
- In the 'Default Bucket' input in the Rivery connection screen, enter the Google Cloud Storage bucket name to use as your Custom File Zone. Note: if Rivery cannot detect your bucket list, this may be a permissions issue; see the Known Issues section below. Otherwise, you can type the bucket name manually.
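The downloaded service-account key is a JSON file, so a quick sanity check before uploading it to Rivery could parse it and confirm the identifying fields. The key below is a made-up skeleton, not a real credential (real key files also carry a `private_key` field):

```python
import json

# Skeleton of a service-account key file; all values are placeholders.
key_json = """
{
  "type": "service_account",
  "project_id": "my-sample-project",
  "client_email": "riveryuser@my-sample-project.iam.gserviceaccount.com"
}
"""

key = json.loads(key_json)
assert key["type"] == "service_account"
print(key["client_email"], key["project_id"])
```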
Additional Options
Data Type Widening (Schema Shift)
When the data type of a column is changed in the Source table, the corresponding Target column will be adjusted to accommodate the "wider" data type by default.
For instance, if the original field changes from an Integer to a String, the target data type will be modified to a String, since a String can encapsulate integer values. On the other hand, if the column has been changed from a String to a Date, it will retain its String data type in the target.
The toggle located in the Additional Options section enables this behavior by default. If you would rather have the River fail in the case of a data type mismatch, set the toggle to OFF.
Please Note:
To receive notifications about these changes, it is advisable to set the "On Warning" toggle to true in the Settings tab.
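The widening rule described above can be sketched as a small lookup, assuming a simplified type set (the actual mapping Rivery applies may cover more types; this only illustrates the two cases from the example):

```python
# Widenings: a source type change (old, new) that the target can
# absorb by switching to a wider type. Anything not listed keeps
# the target's current type.
WIDENS_TO = {
    ("INTEGER", "STRING"): "STRING",  # a String can hold integer values
    ("INTEGER", "FLOAT"): "FLOAT",
}

def target_type(old: str, new: str) -> str:
    """Return the target column type after a source type change."""
    if old == new:
        return old
    return WIDENS_TO.get((old, new), old)

print(target_type("INTEGER", "STRING"))  # STRING
print(target_type("STRING", "DATE"))     # STRING (retained)
```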
Known Issues
- Sometimes the "Storage Admin" role does not include the "storage.buckets.get" permission by default.
In this case, you will need to duplicate the "Storage Admin" role as a custom role, make sure the custom role includes the storage.buckets.get permission, and then assign your service account this custom role instead of "Storage Admin" (see the "Create a Service Account User for Rivery" section of this document).
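The fix for the duplicated custom role boils down to making sure storage.buckets.get appears in its permission list. A sketch of that check (the permission names are real; the role payload shape is simplified):

```python
REQUIRED = "storage.buckets.get"

def ensure_permission(custom_role: dict, permission: str = REQUIRED) -> dict:
    """Append the permission to the custom role if it is missing."""
    perms = custom_role.setdefault("includedPermissions", [])
    if permission not in perms:
        perms.append(permission)
    return custom_role

# A duplicated "Storage Admin" role that is missing the permission.
role = {"title": "Storage Admin (custom)",
        "includedPermissions": ["storage.buckets.list", "storage.objects.get"]}
ensure_permission(role)
print(REQUIRED in role["includedPermissions"])  # True
```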
- Requirement for Consistent Locations for the Connection and Dataset ID in BigQuery:
When using BigQuery as a Target, it's crucial to ensure that the connection and Dataset ID originate from the same location (region). Failure to do so will result in a loading error from BigQuery: "Loading to BigQuery failed: BigQuery failed to load job. Error: Cannot read and write in different locations".
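A sketch of the consistency check implied by that error, assuming you can read the connection's region and the dataset's location (treating location strings such as US or europe-west2 as case-insensitive is a simplification here):

```python
def locations_match(connection_region: str, dataset_location: str) -> bool:
    """BigQuery cannot read and write across locations, so the
    connection's region and the dataset's location must agree."""
    return connection_region.strip().lower() == dataset_location.strip().lower()

print(locations_match("US", "us"))            # True
print(locations_match("US", "europe-west2"))  # False
```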