Datastore Walkthrough

You can extract Datastore data via Rivery using one of the following options:

  1. Query by Kind
  2. Query by GQL 

Query by Kind

This option is similar to the Datastore console, where it is possible to browse a selected kind in Datastore.

  1. Kind - Enter the name of the kind to pull the data from. The kind name is case sensitive, so make sure you enter it exactly as it appears in Datastore.
  2. Extract method - Rivery can pull data using one of two extract methods:
    • All - Rivery pulls all the available data from the selected kind.
    • Incremental - Rivery pulls the data according to the chosen increments.
  3. If selecting Incremental, set the following:
    1. Incremental field - The field to filter the data by. Important: Make sure this field is indexed in the selected Datastore kind. If it is not, you won’t be able to filter the data by the chosen incremental field.
    2. Incremental type - Timestamp or Epoch:
      • Timestamp - if your field is a standard timestamp field in Datastore.
      • Epoch - if your field is an Integer field that represents epoch time.
    3. Start date & end date - Select the start datetime and end datetime. Leave the end date empty to pull the data up until the current moment. Use “Include end value” if the river runs up to a specific end datetime and the data at exactly that datetime must be included.
    4. Interval chunks size - Use the interval chunks size when pulling a large amount of data (over a long period of time) to reduce the amount of data pulled from Datastore in each request.

Important: When the river runs with the incremental extract method, each run updates the start date to the end date of that run. This allows the next run to start from where the last run stopped.
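The chunking and start-date behavior described above can be pictured with a small sketch. This is not Rivery's implementation; the function names are hypothetical, and the assumption that epoch values are in seconds is ours, so check the unit your field actually uses.

```python
from datetime import datetime, timedelta, timezone


def split_into_chunks(start, end, chunk):
    """Split the window [start, end) into consecutive sub-windows of at most `chunk`."""
    if chunk <= timedelta(0):
        raise ValueError("chunk must be positive")
    windows = []
    cursor = start
    while cursor < end:
        upper = min(cursor + chunk, end)  # the last window may be shorter
        windows.append((cursor, upper))
        cursor = upper
    return windows


def to_epoch_seconds(moment):
    """Convert an aware datetime to integer epoch seconds (assumed unit for Epoch fields)."""
    return int(moment.timestamp())


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 4, tzinfo=timezone.utc)
windows = split_into_chunks(start, end, timedelta(days=1))

for lower, upper in windows:
    # Each window would correspond to one smaller request against the incremental field.
    print(lower.isoformat(), "->", upper.isoformat())

# After a successful run, the saved start date becomes this run's end date.
next_start = windows[-1][1]
```

Smaller chunks mean more requests with less data in each; the right size depends on how densely the kind is populated over time.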

Filters - Add filters in order to filter the data to pull from the selected kind.

Note that the data can be filtered only by indexed columns.

For each filter, select the column, operator, and type, and enter the value to filter by.
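To make the column / operator / type / value combination concrete, here is a minimal sketch that applies such filters to plain dictionaries standing in for entities. The type names, operator set, and helper functions are assumptions for illustration only; they are not Rivery's or Datastore's API.

```python
import operator
from datetime import datetime

# Hypothetical operator and type names, chosen for illustration.
OPERATORS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}
CASTS = {
    "string": str,
    "integer": int,
    "float": float,
    "boolean": lambda raw: str(raw).strip().lower() == "true",
    "timestamp": datetime.fromisoformat,
}


def build_filter(column, op, value_type, raw_value):
    """Cast the raw text value to the selected type and return a filter tuple."""
    if op not in OPERATORS:
        raise ValueError(f"unsupported operator: {op}")
    if value_type not in CASTS:
        raise ValueError(f"unsupported type: {value_type}")
    return (column, op, CASTS[value_type](raw_value))


def matches(entity, filters):
    """Return True only if the entity satisfies every filter."""
    for column, op, value in filters:
        if column not in entity:
            return False  # an entity without the property cannot match
        if not OPERATORS[op](entity[column], value):
            return False
    return True


entities = [
    {"status": "open", "priority": 3},
    {"status": "open", "priority": 1},
    {"status": "closed", "priority": 5},
]
filters = [
    build_filter("status", "=", "string", "open"),
    build_filter("priority", ">=", "integer", "2"),
]
selected = [entity for entity in entities if matches(entity, filters)]
print(selected)
```

The type matters because the value is entered as text: comparing the string "2" with an integer property would not behave like comparing the integer 2.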

Get ancestors - Checking this option returns the ancestor information of each entity in the results. If an entity has more than one ancestor, the result will contain only the first level.

More information about ancestors in Datastore can be found in this link.

 

Query by GQL

This option is similar to querying by GQL in the Datastore console.

  1. Insert a valid GQL query in the “query text box”. It is recommended to first check that the query returns results in the Datastore console.
  2. Get ancestors - Checking this option returns the ancestor information of each entity in the results. If an entity has more than one ancestor, the result will contain only the first level.

More information about ancestors in Datastore can be found in this link.

 

When using GQL, it is not possible to run the river incrementally.

GQL is recommended for pulling small tables from Datastore and for testing the data.

Usually, GQL is used when users need to run an advanced query that manipulates and aggregates the data. Otherwise, it is recommended to use the “Query by Kind” option.
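For reference, queries of the kind that could be pasted into the query text box might look like the strings below. The kind name Task and its properties are hypothetical; replace them with your own, and confirm in the Datastore console that each query returns results. Queries that combine several filters and a sort order may need a matching composite index in Datastore.

```python
# Hypothetical kind and property names, shown as plain strings for illustration.
OPEN_HIGH_PRIORITY_TASKS = (
    "SELECT * FROM Task "
    "WHERE done = FALSE AND priority >= 4 "
    "ORDER BY priority DESC "
    "LIMIT 100"
)

# A projection that returns each distinct value of one property.
TASK_CATEGORIES = "SELECT DISTINCT category FROM Task"

for query in (OPEN_HIGH_PRIORITY_TASKS, TASK_CATEGORIES):
    print(query)
```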

 

Mapping attributes

The attributes mapping controls how Rivery pulls the data from the selected Datastore kind.

1. Get the attributes mapping of the data 

Click on auto mapping. Rivery will sample the data and will return a list of the columns in the selected kind or the given query. 

Each column contains its name, type, and mode (single value or array), as well as an indication of whether the column is indexed in Datastore. A column that is an object contains all of its nested columns under its name.

Since the mapping is based on a sample of the data, there is a chance that a required column won’t appear in the mapping results (because this column wasn’t in the sample).

However, you can add more columns manually - make sure each name is exactly the same as it is in Datastore and that the type matches.
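The sketch below illustrates why a sample-based mapping can miss columns. It infers a name, type, and mode for each column from a few dictionaries standing in for entities. The type labels and the infer_mapping helper are hypothetical and do not reflect how Rivery samples or names types.

```python
from datetime import datetime


def type_name(value):
    """Return an illustrative type label for a single value."""
    if isinstance(value, bool):  # checked before int, since bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, datetime):
        return "TIMESTAMP"
    if isinstance(value, dict):
        return "OBJECT"
    return "STRING"


def infer_mapping(sample):
    """Build {column: {type, mode, fields?}} from the sampled entities only."""
    mapping = {}
    for entity in sample:
        for name, value in entity.items():
            if name in mapping:
                continue  # the first occurrence in the sample decides the type
            is_array = isinstance(value, list)
            # For arrays, look at the first element; an empty array falls back to STRING.
            probe = value[0] if is_array and value else value
            column = {"type": type_name(probe), "mode": "array" if is_array else "single"}
            if isinstance(probe, dict):
                column["fields"] = infer_mapping([probe])  # nested columns under the object
            mapping[name] = column
    return mapping


sample = [
    {"name": "a", "tags": ["x", "y"], "address": {"city": "Paris", "zip": 75001}},
    {"name": "b", "score": 1.5},
]
mapping = infer_mapping(sample)
for column, details in mapping.items():
    print(column, details)

# A property that exists only on entities outside the sample is absent from the mapping.
print("deleted_at" in mapping)
```

Because the first sampled occurrence decides the type here, a column whose values vary in type across entities can end up with a mapping that does not fit all of the data.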

2. Convert data types 

Rivery supports converting the data types of selected columns. For example, if one of the columns is an array or an object and you’d like to load it in Rivery as a String, change the type of this column to String. The column will be pulled as a String from Datastore and uploaded to the target table as a String as well. Any other data type can also be changed to String.

Converting columns to String can be useful for fixing mapping errors when loading the data to the target table. For example, if a column is a Boolean type in the mapping results but in some places in the data it is actually a String, you won’t be able to load this data and Rivery will return an error. A quick and easy solution is to convert the column to String, so that all the values are loaded as String, no matter what the actual type is in the data source.
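As an illustration of why converting to String resolves mixed-type columns, the sketch below turns every value into text, serializing arrays and objects as JSON. The to_string helper and the exact string format are assumptions; the text Rivery actually produces may differ.

```python
import json


def to_string(value):
    """Convert any value to text; arrays and objects become JSON strings."""
    if value is None:
        return None  # keep missing values as missing
    if isinstance(value, str):
        return value
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (dict, list)):
        return json.dumps(value, sort_keys=True)
    return str(value)


# A column mapped as Boolean that sometimes holds other types.
rows = [{"active": True}, {"active": "yes"}, {"active": ["a", 1]}]
converted = [to_string(row["active"]) for row in rows]
print(converted)
```

Once every value has the same type, the load no longer depends on which type each individual entity happened to store.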

3. Select only part of the columns

Rivery supports changing the selection of the columns to pull from Datastore. Removing or adding columns to the list will affect the query sent to Datastore, and the results will contain the selected columns. 

Important notice: Since Datastore works with indexes and every query must have a supporting index, it is not possible to select an arbitrary combination of fields. If you select only some of the columns in the kind and they don’t have an appropriate index in Datastore, Rivery will return an error saying that it is not possible to pull these columns. When pulling all the columns, no index is required, as the query is based on the key of the index.

Important: After any change in the attributes mapping, Rivery will advise you to refresh the mapping of the target table (step 3). You must refresh that mapping in order to load all the fields from the attributes mapping in step 1 into the target table.


