Blueprint Components and Configuration
How to Configure a Connector

This document provides a step-by-step guide to setting up a YAML source configuration, using
GitHub's REST API as an example.

Connector Section

Connector Details

  name: GitHubConnector 
  type: rest
  • name: Identifies the connector (here, GitHubConnector).
  • type: Sets the connector type to rest, meaning it communicates through RESTful API calls.

Base URL

base_url: ''
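  • base_url: Sets the base URL for all API requests (for GitHub's REST API, https://api.github.com). Steps reference it through the {{%BASE_URL%}} placeholder.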

Default Headers

default_headers:
  Authorization: 'Basic {YOUR_AUTH}'
  X-GitHub-Api-Version: '{YOUR_X_GITHUB_API_VERSION}'
  User-Agent: '{YOUR_USER_AGENT}'
  • Authorization: Authenticates the request. The API expects the credentials in this header on every request, so it is set once here as a default.
  • X-GitHub-Api-Version: Specifies the version of the GitHub API to use.
  • User-Agent: Identifies the client making the request.
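The three default headers can be sketched in Python as follows. This assumes {YOUR_AUTH} is a standard HTTP Basic credential, that is, the Base64 encoding of username:token. The helper build_default_headers and all sample values (including the API version string) are hypothetical and only illustrate how the headers fit together.

```python
import base64


def build_default_headers(username: str, token: str,
                          api_version: str, user_agent: str) -> dict:
    """Return the connector's default headers (hypothetical helper).

    Assumes {YOUR_AUTH} is a standard HTTP Basic credential: the Base64
    encoding of "username:token".
    """
    credentials = f"{username}:{token}".encode("utf-8")
    auth = base64.b64encode(credentials).decode("ascii")
    return {
        "Authorization": f"Basic {auth}",      # fills {YOUR_AUTH}
        "X-GitHub-Api-Version": api_version,   # fills {YOUR_X_GITHUB_API_VERSION}
        "User-Agent": user_agent,              # fills {YOUR_USER_AGENT}
    }


if __name__ == "__main__":
    headers = build_default_headers("user", "pass", "2022-11-28", "my-blueprint-client")
    print(headers["Authorization"])  # Basic dXNlcjpwYXNz
```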

Variables Storages

variables_storages:
  - name: results dir
    type: file_system
    path: storage/results/filesystem
  - name: results memory
    type: memory
  • results dir: Storage for results in the filesystem, specified by path.
  • results memory: In-memory storage for temporary data during processing.

Variables Metadata

variables_metadata:
  page_number:
    storage_name: results memory
    format: json
  last_result:
    storage_name: results memory
    format: json
  final_output_file:
    storage_name: results dir
    format: json
  • page_number: Tracks the current page of results being processed.
  • last_result: Stores the results of the last API call.
  • final_output_file: Holds the final output data in a file.
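The Variables Storages and Variables Metadata settings above can be modeled with a small Python sketch: named storages are defined once, and each variable is bound to one of them with JSON as its format. The classes, the <name>.json file naming, and resolving the relative path under a caller-supplied root are hypothetical choices for illustration, not how the engine actually stores data.

```python
import json
import tempfile
from pathlib import Path


class MemoryStorage:
    """In-memory storage: values live only for the duration of the run."""

    def __init__(self):
        self._data = {}

    def write(self, name, value):
        self._data[name] = json.dumps(value)  # format: json

    def read(self, name):
        return json.loads(self._data[name])


class FileSystemStorage:
    """File-system storage: each variable is written to <path>/<name>.json."""

    def __init__(self, path):
        self.path = Path(path)
        self.path.mkdir(parents=True, exist_ok=True)

    def write(self, name, value):
        (self.path / f"{name}.json").write_text(json.dumps(value))  # format: json

    def read(self, name):
        return json.loads((self.path / f"{name}.json").read_text())


class Variables:
    """Routes each variable to the storage named in its metadata."""

    def __init__(self, storages, metadata):
        self._storages = storages  # storage name -> storage object
        self._metadata = metadata  # variable name -> storage name

    def set(self, name, value):
        self._storages[self._metadata[name]].write(name, value)

    def get(self, name):
        return self._storages[self._metadata[name]].read(name)


def build_variables(root):
    # Hypothetical: the relative path from the YAML is resolved under `root`.
    storages = {
        "results dir": FileSystemStorage(Path(root) / "storage/results/filesystem"),
        "results memory": MemoryStorage(),
    }
    metadata = {
        "page_number": "results memory",
        "last_result": "results memory",
        "final_output_file": "results dir",
    }
    return Variables(storages, metadata)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        variables = build_variables(tmp)
        variables.set("page_number", 1)
        variables.set("final_output_file", [{"message": "Initial commit"}])
        print(variables.get("final_output_file"))
        # Only final_output_file is written to disk; page_number stays in memory.
        print([p.name for p in Path(tmp).rglob("*.json")])
```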

Steps Section

Main Loop Step

- name: WhileLoopOverCommits
  description: Get all commits of a github repository - until end of them
  type: loop
  • name: Identifies the step as WhileLoopOverCommits.
  • description: Describes the purpose of the step.
  • type: Specifies that this step is a loop.

Loop Configuration

  type: while
  variable_name: page_number
  value: 1
  while_settings:
    operation_to_perform: add
    value_to_perform: 1
    max_iterations: 10000
  break_conditions:
    - name: BreakIfOutOfCommits
      condition:
        type: string_equal
        value: "[]"
      variable: "{{%last_result%}}"
  • type: Indicates a while loop.
  • variable_name: The loop uses page_number to keep track of the current page.
  • value: Initial value for page_number.
  • while_settings:
    • operation_to_perform: Operation to perform on each iteration (add).
    • value_to_perform: Amount to add to page_number each iteration.
    • max_iterations: Maximum number of iterations to prevent infinite loops.
  • break_conditions:
    • name: Identifies the break condition.
    • condition: Specifies the condition type (string_equal) and the value to compare against ("[]").
    • variable: The value the condition tests, here {{%last_result%}}. When the latest page comes back as an empty JSON array ("[]"), there are no more commits and the loop stops.
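The loop semantics described above (page_number starting at 1, incremented by 1, capped by max_iterations, and stopped by BreakIfOutOfCommits) can be simulated in Python. This is a minimal sketch of the assumed behaviour, not the engine's implementation: fetch_page is a hypothetical stand-in for the Pagination step, and the break condition is assumed to be checked after each page is fetched.

```python
import json
from typing import Callable, List


def run_while_loop(fetch_page: Callable[[int], str],
                   start: int = 1,
                   increment: int = 1,
                   max_iterations: int = 10000) -> List[int]:
    """Simulate WhileLoopOverCommits and return the page numbers requested.

    fetch_page(page) returns the response as JSON text, which plays the role
    of last_result. Assumption: the break condition runs after each fetch.
    """
    page_number = start                  # variable_name: page_number, value: 1
    requested = []
    for _ in range(max_iterations):      # max_iterations caps the loop
        last_result = fetch_page(page_number)
        requested.append(page_number)
        if last_result == "[]":          # BreakIfOutOfCommits: string_equal "[]"
            break
        page_number += increment         # operation_to_perform: add, value_to_perform: 1
    return requested


if __name__ == "__main__":
    fake_pages = {1: [{"sha": "a1"}], 2: [{"sha": "b2"}]}  # hypothetical data
    print(run_while_loop(lambda page: json.dumps(fake_pages.get(page, []))))  # [1, 2, 3]
```

Because string_equal is an exact text comparison, the loop only stops if the empty page is serialized exactly as "[]"; json.dumps of an empty list produces that string.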

Pagination Step

  - name: Pagination
    description: Retrieve a page of github repository commits
    endpoint: "{{%BASE_URL%}}/repos/{YOUR_REPOSITORY_OWNER}/{YOUR_REPOSITORY}/commits"
    http_method: GET
    type: rest
    query_params:
      page: "{{%page_number%}}"
    variables_output:
      - variable_name: last_result
        response_location: data
        variable_format: json
        overwrite_storage: true
      - variable_name: final_output_file
        response_location: data
        variable_format: json
        transformation_layers:
          - type: extract_json
            json_path: $.[*].commit
            from_type: json
  • name: Identifies the step as Pagination.
  • description: Describes the purpose of this step.
  • endpoint: Builds the request URL from the base URL ({{%BASE_URL%}}) and the repository owner and name placeholders.
  • http_method: Specifies the HTTP method (GET).
  • type: Indicates that this step is a REST API call (rest).
  • query_params: Adds the page query parameter, set to the current value of page_number.
  • variables_output:
    • last_result: Stores the response data. Because overwrite_storage is true, each new page replaces the previous one, so the break condition always checks the latest page.
    • final_output_file: Stores the commit data extracted from each response by its transformation_layers:
      • type: Type of transformation (extract_json).
      • json_path: JSON path of the data to extract ($.[*].commit, the commit object of every element in the response array).
      • from_type: Format of the input data (json).
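What the Pagination step does on each iteration can be approximated in Python: build the commits URL with the page query parameter, then keep only the commit object of each element, as $.[*].commit does. The helper names, the owner/repository values, and the trimmed sample response are hypothetical; real GitHub commit objects carry many more fields.

```python
import json
from urllib.parse import urlencode


def build_commits_url(base_url: str, owner: str, repo: str, page: int) -> str:
    # endpoint "{{%BASE_URL%}}/repos/{owner}/{repo}/commits" plus query_params page
    query = urlencode({"page": page})
    return f"{base_url}/repos/{owner}/{repo}/commits?{query}"


def extract_commits(response_text: str) -> list:
    # Same result as json_path $.[*].commit on a top-level JSON array:
    # the "commit" object of every element that has one.
    return [item["commit"] for item in json.loads(response_text)
            if isinstance(item, dict) and "commit" in item]


if __name__ == "__main__":
    print(build_commits_url("https://api.github.com", "octocat", "Hello-World", 2))
    # https://api.github.com/repos/octocat/Hello-World/commits?page=2
    sample = '[{"sha": "abc123", "commit": {"message": "Initial commit"}}]'  # hypothetical
    print(extract_commits(sample))
    # [{'message': 'Initial commit'}]
```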

Retry Strategy

This section describes how to add a retry strategy to a step so that, when the API responds with a specified error status code, the step is retried a set number of times. This lets the workflow recover from transient issues instead of failing on the first error.

Example YAML Code with Retry Strategy

  - description: Fetch all posts
    endpoint: "{{%BASE_URL%}}/posts"
    expected_status_codes:
      - 200
    method: GET
    name: FetchPosts
    type: rest
    retry_strategy:
      400:
        max_attempts: 2

Explanation of the Configuration

In this example, we define a step called FetchPosts that performs a GET request to fetch posts from the specified API endpoint. The key configuration here is the retry_strategy, which allows you to specify how the system should handle certain HTTP status codes by retrying the request.

Step Breakdown:

  • description: Provides a brief explanation of the step, in this case, "Fetch all posts."
  • endpoint: The API URL to which the request is sent. In {{%BASE_URL%}}/posts, the {{%BASE_URL%}} placeholder is filled in with the connector's base URL at run time.
  • expected_status_codes: Specifies which status codes indicate success. Here, the only expected status code is 200.
  • method: Defines the HTTP method used for the request. In this case, GET is used to fetch data.
  • name: This is the name of the step, which can be used for identification and reporting purposes.
  • type: Indicates that this step is a REST API call (rest).
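The {{%NAME%}} placeholders used in endpoint, query_params, and break conditions can be pictured as plain string substitution. The sketch below assumes exactly that; the resolve helper and the variables dictionary are hypothetical, and the real engine may resolve placeholders differently (for example, by reading variables from their configured storage).

```python
import re

# Matches {{%NAME%}}; single-brace placeholders such as {YOUR_REPOSITORY} are left alone.
PLACEHOLDER = re.compile(r"\{\{%(\w+)%\}\}")


def resolve(template: str, variables: dict) -> str:
    """Replace every {{%NAME%}} with str(variables[NAME]); unknown names raise KeyError."""
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


if __name__ == "__main__":
    variables = {"BASE_URL": "https://api.example.com", "page_number": 3}  # hypothetical values
    print(resolve("{{%BASE_URL%}}/posts", variables))  # https://api.example.com/posts
    print(resolve("{{%page_number%}}", variables))     # 3
```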

Retry Strategy Configuration:

  • retry_strategy: This defines how the system should handle specific error codes, enabling retries if a request fails.
    • 400: If the API returns an HTTP 400 (Bad Request) error, the retry strategy is triggered.
      • max_attempts: Set to 2, so the request is retried up to 2 times before the step gives up.

How It Works:

  1. If the API request to {{%BASE_URL%}}/posts fails and returns a 400 Bad Request status code, Rivery will automatically retry the step up to 2 times.
  2. If the API still returns a 400 after the 2 retries, the step will fail, and the workflow will stop or proceed based on your configuration for handling failures.
  3. If a successful 200 OK status code is returned, the retry mechanism will not trigger, and the workflow will continue as normal.
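The retry behaviour described in these steps can be sketched in Python. This is a minimal model under stated assumptions: send is a hypothetical callable returning a (status_code, body) pair, retry_strategy maps each status code to its max_attempts, and max_attempts is read as the number of retries after the first request, matching the description above. Giving each status code its own retry budget is a design choice of this sketch.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical stand-in for an HTTP response: (status_code, body).
Response = Tuple[int, str]


def call_with_retry(send: Callable[[], Response],
                    retry_strategy: Dict[int, int],
                    expected_status_codes: Iterable[int] = (200,)) -> Response:
    """Send a request, retrying on status codes listed in retry_strategy.

    retry_strategy maps a status code to max_attempts, interpreted here as the
    number of retries allowed after the initial request.
    """
    expected = set(expected_status_codes)
    retries_used: Dict[int, int] = {}
    while True:
        status, body = send()
        if status in expected:
            return status, body                   # success: no retry needed
        allowed = retry_strategy.get(status, 0)   # codes not listed are never retried
        used = retries_used.get(status, 0)
        if used >= allowed:
            raise RuntimeError(f"step failed with HTTP {status} after {used} retries")
        retries_used[status] = used + 1           # retry the same request


if __name__ == "__main__":
    replies = iter([(400, "Bad Request"), (200, '[{"id": 1}]')])  # hypothetical replies
    print(call_with_retry(lambda: next(replies), {400: 2}))
```

To retry on several codes, pass one entry per code, for example {500: 3, 429: 5}.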

Use Cases:

  • Handling temporary issues: A retry strategy is particularly useful when you're dealing with APIs that may intermittently return errors due to temporary issues, such as network problems or rate limits.
  • API reliability: By configuring retries, you can improve the reliability of your workflows, ensuring that transient errors don't cause the entire process to fail.

Customizing the Retry Strategy:

You can add entries under retry_strategy for other status codes, or raise max_attempts for more robust error handling. For example:

    500:
      max_attempts: 3  # Retry 3 times on 500 Internal Server Errors
    429:
      max_attempts: 5  # Retry 5 times if the API returns 429 Too Many Requests

This flexible configuration ensures that your workflows can handle different failure scenarios, making your data integration pipelines more resilient.


Summary

  • Connector Section: Configures the API connection details, authentication, headers, and storage for variables.
  • Steps Section: Defines a loop to paginate through GitHub commits, retrieve data, and manage the loop's conditions and output handling.

