Datastore Walkthrough

You can extract Datastore data via Rivery using one of the following options:

  1. Query by Kind
  2. Query by GQL 

Query by Kind

This option is similar to the Datastore console, where it is possible to browse a selected kind in Datastore.

  1. Kind - Enter the name of the kind to pull the data from. The kind name is case sensitive, so make sure it exactly matches the name of the kind in Datastore.
  2. Extract method - Rivery can pull data using one of two extract methods:
    • All - Rivery pulls all the available data from the selected kind.
    • Incremental - Rivery pulls the data according to the chosen increments.
  3. If selecting Incremental:
    1. Incremental field - Filter the data by this field. Important: Make sure this field is indexed in the selected Datastore kind. If it is not, you won’t be able to filter the data by the chosen incremental field.
    2. Incremental type - Timestamp or Epoch:
      • Timestamp - if your field is a standard timestamp field in Datastore.
      • Epoch - if your field is an Integer field that represents epoch time.
    3. Start date & end date - Select the start datetime and end datetime. Leave the end date empty to pull the data up until the current moment. Use “Include end value” if you are running the river up to a specific end datetime and need the data at that exact end datetime to be included.
    4. Interval chunks size - Use the interval chunks size if you are pulling a large amount of data (over a long period of time) to reduce the amount of data pulled from Datastore in each request.
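
To illustrate how an interval chunk size breaks a long date range into smaller requests, here is a minimal Python sketch. The function name split_into_chunks and the one-day chunk size are hypothetical choices for illustration, and this is not Rivery's actual implementation.

```python
from datetime import datetime, timedelta

def split_into_chunks(start, end, chunk):
    """Yield (chunk_start, chunk_end) pairs that together cover [start, end)."""
    current = start
    while current < end:
        # The last chunk is shortened so it never runs past the end datetime.
        chunk_end = min(current + chunk, end)
        yield current, chunk_end
        current = chunk_end

start = datetime(2024, 1, 1)
end = datetime(2024, 1, 4, 12)
chunks = list(split_into_chunks(start, end, timedelta(days=1)))

for chunk_start, chunk_end in chunks:
    print(chunk_start.isoformat(), "->", chunk_end.isoformat())
```

Each pair would correspond to one smaller request instead of a single request for the whole period.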

Important: when running the river using the incremental extract method, each run of the river updates the start date with the end date of the current run. This allows the next run to start from where the last run stopped. 
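
The rollover of the start date between runs can be sketched as follows. This is a simplified model under stated assumptions: the function run_incremental is hypothetical, and an empty end date is treated as "the current moment", as described in the list above.

```python
from datetime import datetime

def run_incremental(start, now, end=None):
    """Return the window pulled by one run and the start date for the next run."""
    # An empty end date means "pull up until the current moment".
    effective_end = end if end is not None else now
    window = (start, effective_end)
    # The next run resumes from where this run stopped.
    next_start = effective_end
    return window, next_start

first_window, next_start = run_incremental(
    datetime(2024, 1, 1), now=datetime(2024, 1, 2)
)
second_window, next_start = run_incremental(
    next_start, now=datetime(2024, 1, 3)
)
print(first_window)
print(second_window)
```

The second window starts exactly where the first one ended, so consecutive runs leave no gap between them.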

Filters - Add filters to restrict the data pulled from the selected kind.

Note that the data can be filtered only by indexed columns.

For each filter, select its column, operator, and type, then enter the value to filter by.
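
The sketch below shows why each filter carries a type as well as a value: the entered text has to be cast before it can be compared with the stored property. The Filter class, the type names, and the sample entities are all hypothetical, and the filtering is applied to in-memory dictionaries rather than to Datastore.

```python
import operator
from dataclasses import dataclass

OPERATORS = {
    "=": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}
CASTS = {"string": str, "integer": int, "float": float}

@dataclass
class Filter:
    column: str
    op: str
    type: str
    value: str  # entered as text, cast according to `type` before comparing

    def matches(self, entity):
        if self.column not in entity:
            return False
        typed_value = CASTS[self.type](self.value)
        return OPERATORS[self.op](entity[self.column], typed_value)

entities = [
    {"name": "a", "price": 5},
    {"name": "b", "price": 15},
    {"name": "c", "price": 25},
]
filters = [
    Filter("price", ">=", "integer", "10"),
    Filter("price", "<", "integer", "25"),
]
# An entity is kept only if it satisfies every filter.
selected = [e for e in entities if all(f.matches(e) for f in filters)]
print(selected)
```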

Get ancestors - Check this option to return the ancestor information of each entity in the results. If an entity has more than one ancestor, only the first level is included.

More information about ancestors in Datastore can be found in this link.

 

Query by GQL

GQL, or Google Query Language, is a query language for the Google Cloud Datastore. It is used to retrieve and manipulate data held in Datastore, which is a NoSQL document database. GQL is similar to SQL in that it allows users to specify criteria for data selection and manipulation.

With the help of Query Parameters, you can use operators in your query to, for instance, exclude specific rows and columns from your Target table.

To do this, follow the instructions below:

  1. Include the value you intend to use in the list of Query Parameters.
  2. Enter the parameter's Key, Value, and Type in the appropriate fields.
  3. Insert a valid GQL query in the “Query” text box.
  4. If you wish to pull the ancestors of each entity, toggle the option to true.
  5. Run the River.


Here's an example:
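
The following is a hedged sketch of how a query and its parameters could fit together. It assumes that Query Parameters correspond to GQL named bindings written as @name, which is standard Datastore GQL syntax but is not confirmed here as the way Rivery substitutes values. The kind Task, its properties, and the helper missing_bindings are hypothetical.

```python
import re

# A GQL query that uses two named bindings.
query = "SELECT * FROM Task WHERE done = @is_done AND priority >= @min_priority"

# Hypothetical parameter list mirroring the Key, Value, and Type fields.
query_parameters = [
    {"key": "is_done", "value": "false", "type": "boolean"},
    {"key": "min_priority", "value": "4", "type": "integer"},
]

def missing_bindings(gql, parameters):
    """Return the binding names used in the query that have no parameter."""
    used = set(re.findall(r"@([A-Za-z_][A-Za-z0-9_]*)", gql))
    defined = {p["key"] for p in parameters}
    return sorted(used - defined)

print(missing_bindings(query, query_parameters))
```

A check like this can catch a binding that was used in the query but never added to the parameter list before the River runs.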


Please Note:

  • It is advised to first verify in the Datastore console that the query returns results by pulling a few sample tables from Datastore using GQL.
  • Running the River incrementally is not possible when using GQL.
  • Get Ancestors - The ancestors of each entity in the results will be displayed if this box is checked. An entity will only contain the first level if it has more than one ancestor.
  • GQL is typically used when users need to execute a complex query that aggregates and manipulates the data. In all other cases, it is advised to use the "Query by Kind" option.

 



Mapping attributes

Attribute mapping controls how Rivery pulls the data from the selected Datastore kind.

1. Get the attributes mapping of the data 

Click on auto mapping. Rivery will sample the data and will return a list of the columns in the selected kind or the given query. 

Each column shows its name, type, and mode (single value or array), along with an indication of whether the column is indexed in Datastore. A column that is an object lists all of its nested columns under its name.

Since the mapping is based on a sample of the data, a required column may be missing from the mapping results if it did not appear in the sample.

In that case, you can add the missing columns manually. Make sure each name exactly matches the name in Datastore and that the type matches as well.
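
To show why a sampled mapping can miss a column, here is a minimal Python sketch. The function infer_mapping and the sample entities are hypothetical, and the type names are Python's rather than Datastore's.

```python
def infer_mapping(sample):
    """Build {column: {"type": ..., "mode": ...}} from the sampled entities."""
    mapping = {}
    for entity in sample:
        for name, value in entity.items():
            is_array = isinstance(value, list)
            # For arrays, the type is taken from the first element.
            element = value[0] if (is_array and value) else value
            mapping.setdefault(name, {
                "type": type(element).__name__,
                "mode": "array" if is_array else "single",
            })
    return mapping

entities = [
    {"name": "a", "tags": ["x", "y"]},
    {"name": "b", "tags": ["z"]},
    {"name": "c", "tags": ["w"], "discount": 0.1},
]

# Only the first two entities are sampled, so "discount" is never seen.
mapping = infer_mapping(entities[:2])
print(sorted(mapping))

# The missing column is then added manually with a matching name and type.
mapping["discount"] = {"type": "float", "mode": "single"}
```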

2. Convert data types 

Rivery supports converting the data types of selected columns. For example, if one of the columns is an array or an object and you would like to load it as a String, change the type of this column to String. The column will be pulled from Datastore as a String and uploaded to the target table as a String as well. Any other data type can be changed to String in the same way.

Converting columns to String can be useful in order to fix mapping errors when trying to load the data to the target table. For example: if some columns are a Boolean type in the mapping results, but in some places in the data the columns are actually a String, you won’t be able to load this data and Rivery will return an error. A quick and easy solution for this is to convert these columns to String, and all the values will be loaded as String, no matter what the actual type is in the data source.  
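
The effect of converting a mixed-type column to String can be sketched as below. The helper names and sample rows are hypothetical, and the exact string representation Rivery produces (for example for Booleans or nested objects) is an assumption here.

```python
import json

def to_string(value):
    """Render any value as text; arrays and objects are serialized as JSON."""
    if value is None:
        return None
    if isinstance(value, (list, dict)):
        return json.dumps(value)
    return str(value)

def convert_columns(rows, columns):
    """Return a copy of rows with the given columns converted to strings."""
    return [
        {k: (to_string(v) if k in columns else v) for k, v in row.items()}
        for row in rows
    ]

rows = [
    {"id": 1, "active": True, "tags": ["x", "y"]},
    {"id": 2, "active": "yes", "tags": []},
    {"id": 3, "active": False, "tags": ["z"]},
]

converted = convert_columns(rows, {"active", "tags"})
# Before conversion the "active" column mixes bool and str values.
types_before = {type(r["active"]).__name__ for r in rows}
types_after = {type(r["active"]).__name__ for r in converted}
print(types_before, types_after)
```

After the conversion every value in the column has a single type, so the load no longer fails on a type mismatch.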

3. Select only part of the columns

Rivery supports changing the selection of the columns to pull from Datastore. Removing or adding columns to the list will affect the query sent to Datastore, and the results will contain the selected columns. 

Important notice: Since Datastore works with indexes and every query should have a supported index, it is not possible to select any combination of fields. If you select only some of the columns in the kind that don’t have an appropriate index in the Datastore, Rivery will return an error saying that it is not possible to pull these columns. When running the API with all the columns, no index is required as the query is based on the key of the index. 
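
As a rough illustration of how the column selection could change the query, the sketch below builds a GQL string from the selected columns. It assumes that a partial selection becomes a projection query, which in Datastore requires the projected properties to be indexed. The function build_query and the kind Task are hypothetical, and the queries Rivery actually sends are not documented here.

```python
def build_query(kind, selected, all_columns):
    """Return a GQL string for the selected columns of a kind."""
    if set(selected) == set(all_columns):
        # Selecting every column needs no projection.
        return f"SELECT * FROM {kind}"
    # A partial selection becomes a projection over the chosen properties.
    return f"SELECT {', '.join(selected)} FROM {kind}"

all_columns = ["name", "priority", "done"]
full_query = build_query("Task", all_columns, all_columns)
partial_query = build_query("Task", ["name", "priority"], all_columns)
print(full_query)
print(partial_query)
```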

Important: After any change in the Attributes mapping, Rivery will advise users to refresh the mapping of the target table (step 3). In order to load all the fields in the attributes mapping to the target table in step 1, you must refresh the mapping.


