Blueprint Components and Configuration

Blueprint and Copilot Now Available in Private Preview

We are excited to announce that our new Copilot and Blueprint features are now available to a select group of beta customers! This exclusive opportunity allows you to explore and test the latest capabilities of Blueprint before its official release.

If you're interested in joining the beta program, click here to request access.

How to Configure a Connector

This document provides a step-by-step guide to setting up a YAML source configuration, using GitHub's REST API as an example.

Connector Section

Connector Details

connector:
  name: GitHubConnector
  type: rest

  • name: Identifies the connector, in this case, GitHubConnector.
  • type: Specifies that the connector type is rest, indicating it will communicate using RESTful API calls.

Base URL

base_url: 'https://api.github.com'

  • base_url: Sets the base URL for all API requests.

Default Headers

default_headers:
  Authorization: 'Basic {YOUR_AUTH}'
  X-GitHub-Api-Version: '{YOUR_X_GITHUB_API_VERSION}'
  User-Agent: '{YOUR_USER_AGENT}'

  • Authorization: Authenticates the request. The API requires the credentials to be sent in the headers of every request.
  • X-GitHub-Api-Version: Specifies the version of the GitHub API to use.
  • User-Agent: Identifies the client making the request.
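Under the HTTP Basic scheme, the value that follows `Basic` is the Base64 encoding of `username:password`. The sketch below shows one way to produce a value for the `{YOUR_AUTH}` placeholder. The function name and the sample credentials are hypothetical, and which secret GitHub accepts in the password position should be confirmed against GitHub's own documentation.

```python
import base64


def basic_auth_value(username: str, secret: str) -> str:
    """Return the full value for an HTTP Basic Authorization header."""
    # HTTP Basic: Base64-encode "username:secret" as UTF-8 bytes.
    raw = f"{username}:{secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Hypothetical credentials, for illustration only.
header_value = basic_auth_value("octocat", "example-token")
print(header_value)
```

Only the part after `Basic ` would replace `{YOUR_AUTH}` in the configuration above.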

Variables Storages

variables_storages:
  - name: results dir
    type: file_system
    path: storage/results/filesystem
  - name: results memory
    type: memory

  • results dir: Storage for results in the filesystem, specified by path.
  • results memory: In-memory storage for temporary data during processing.

Variables Metadata

variables_metadata:
  page_number:
    storage_name: results memory
    format: json
  last_result:
    storage_name: results memory
    format: json
  final_output_file:
    storage_name: results dir
    format: json

  • page_number: Tracks the current page of results being processed.
  • last_result: Stores the results of the last API call.
  • final_output_file: Holds the final output data in a file.
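Each variable's `storage_name` has to match the `name` of an entry in `variables_storages`. The sketch below mirrors the two fragments above as plain Python data and checks that link. It is an illustration only, not how the platform validates configurations, and the function name is hypothetical.

```python
# Plain-data mirror of the variables_storages and variables_metadata fragments.
variables_storages = [
    {"name": "results dir", "type": "file_system", "path": "storage/results/filesystem"},
    {"name": "results memory", "type": "memory"},
]
variables_metadata = {
    "page_number": {"storage_name": "results memory", "format": "json"},
    "last_result": {"storage_name": "results memory", "format": "json"},
    "final_output_file": {"storage_name": "results dir", "format": "json"},
}


def storage_type_by_variable(storages, metadata):
    """Map each variable to the type of the storage it names."""
    types = {s["name"]: s["type"] for s in storages}
    resolved = {}
    for variable, meta in metadata.items():
        name = meta["storage_name"]
        if name not in types:
            raise KeyError(f"{variable!r} references undefined storage {name!r}")
        resolved[variable] = types[name]
    return resolved


resolved = storage_type_by_variable(variables_storages, variables_metadata)
print(resolved)
```

A misspelled `storage_name` would raise a `KeyError` here, which is the kind of mistake worth checking for before running a configuration.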

Steps Section

Main Loop Step

- name: WhileLoopOverCommits
  description: Get all commits of a GitHub repository until none remain
  type: loop

  • name: Identifies the step as WhileLoopOverCommits.
  • description: Describes the purpose of the step.
  • type: Specifies that this step is a loop.

Loop Configuration

loop:
  type: while
  variable_name: page_number
  value: 1
  while_settings:
    operation_to_perform: add
    value_to_perform: 1
    max_iterations: 10000
  break_conditions:
    - name: BreakIfOutOfCommits
      condition:
        type: string_equal
        value: "[]"
      variable: "{{%last_result%}}"

  • type: Indicates a while loop.
  • variable_name: The loop uses page_number to keep track of the current page.
  • value: Initial value for page_number.
  • while_settings:
    • operation_to_perform: Operation to perform on each iteration (add).
    • value_to_perform: Amount to add to page_number each iteration.
    • max_iterations: Maximum number of iterations to prevent infinite loops.
  • break_conditions:
    • name: Identifies the break condition.
    • condition: Specifies the condition type (string_equal) and the value to compare ("[]").
    • variable: Uses the variable last_result to determine if there are no more commits.
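The loop settings above can be read as ordinary control flow. The sketch below is one interpretation, not the platform's implementation. It assumes the break condition is evaluated after each request and before the counter is incremented, and `fetch_page` and `fake_fetch` are hypothetical stand-ins for the nested step.

```python
import json


def run_while_loop(fetch_page, start=1, step=1, max_iterations=10000):
    """Call fetch_page(page_number) until it returns an empty JSON array."""
    page_number = start                  # loop.value
    pages_fetched = []
    for _ in range(max_iterations):      # while_settings.max_iterations
        last_result = fetch_page(page_number)
        # break_conditions: string_equal comparison against "[]"
        if last_result == "[]":
            break
        pages_fetched.append(page_number)
        page_number += step              # operation_to_perform: add, value_to_perform: 1
    return pages_fetched


# Hypothetical stand-in for the API: three pages of data, then an empty page.
fake_pages = {1: [{"sha": "a"}], 2: [{"sha": "b"}], 3: [{"sha": "c"}]}


def fake_fetch(page_number):
    return json.dumps(fake_pages.get(page_number, []))


visited = run_while_loop(fake_fetch)
print(visited)
```

The `max_iterations` cap matters when the API never returns an empty page: the loop still ends after that many requests.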

Pagination Step

steps:
  - name: Pagination
    description: Retrieve a page of GitHub repository commits
    endpoint: "{{%BASE_URL%}}/repos/{YOUR_REPOSITORY_OWNER}/{YOUR_REPOSITORY}/commits"
    http_method: GET
    type: rest
    query_params:
      page: "{{%page_number%}}"
    variables_output:
      - variable_name: last_result
        response_location: data
        variable_format: json
        overwrite_storage: true
      - variable_name: final_output_file
        response_location: data
        variable_format: json
        transformation_layers:
          - type: extract_json
            json_path: $.[*].commit
            from_type: json

  • name: Identifies the step as Pagination.
  • description: Describes the purpose of this step.
  • endpoint: Constructs the API endpoint using the base URL and repository details.
  • http_method: Specifies the HTTP method (GET).
  • query_params: Adds query parameters to the API call (page number).
  • variables_output:
    • last_result: Stores the API response data in last_result.
    • final_output_file: Saves the extracted commit data in final_output_file.
  • transformation_layers (applied to final_output_file):
    • type: Type of transformation (extract_json).
    • json_path: JSON path to extract specific data ($.[*].commit).
    • from_type: Data format (json).
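Two mechanisms in this step can be illustrated in a few lines: substituting `{{%name%}}` placeholders, and what the JSONPath `$.[*].commit` selects from a list response. The sketch below is an approximation. The `render` and `extract_commits` helpers are hypothetical, the repository name and response body are made up, and keying the base URL as `BASE_URL` is an assumption about how the placeholder is resolved.

```python
import re

PLACEHOLDER = re.compile(r"\{\{%(\w+)%\}\}")


def render(template: str, variables: dict) -> str:
    """Replace each {{%name%}} placeholder with its value from variables."""
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


def extract_commits(page: list) -> list:
    """Select the "commit" field of every element, like $.[*].commit."""
    return [item["commit"] for item in page if "commit" in item]


# Hypothetical values; "octocat/hello-world" is only an example repository.
variables = {"BASE_URL": "https://api.github.com", "page_number": 2}
url = render("{{%BASE_URL%}}/repos/octocat/hello-world/commits", variables)
page_param = render("{{%page_number%}}", variables)

# Trimmed, made-up response shaped like a list of commit objects.
sample_page = [
    {"sha": "a1", "commit": {"message": "first"}},
    {"sha": "b2", "commit": {"message": "second"}},
]
commits = extract_commits(sample_page)
print(url, page_param, commits)
```

In this reading, `last_result` keeps the whole page so the break condition can test it, while `final_output_file` keeps only the extracted `commit` objects.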

Retry Strategy

This section describes how to configure a retry strategy for a step in the YAML. A retry strategy handles API error responses by retrying the step a specified number of times, so the workflow can recover from transient issues.

Example YAML Code with Retry Strategy

steps:
  - description: Fetch all posts
    endpoint: "{{%BASE_URL%}}/posts"
    expected_status_codes:
      - 200
    method: GET
    name: FetchPosts
    type: rest
    retry_strategy:
      400:
        max_attempts: 2

Explanation of the Configuration

In this example, we define a step called FetchPosts that performs a GET request to fetch posts from the specified API endpoint. The key configuration here is the retry_strategy, which allows you to specify how the system should handle certain HTTP status codes by retrying the request.

Step Breakdown:

  • description: Provides a brief explanation of the step, in this case, "Fetch all posts."
  • endpoint: The API URL to which the request is sent. The {{%BASE_URL%}} placeholder is replaced with the configured base URL at run time.
  • expected_status_codes: Specifies which status codes indicate success. Here, the only expected status code is 200.
  • method: Defines the HTTP method used for the request. In this case, GET is used to fetch data.
  • name: This is the name of the step, which can be used for identification and reporting purposes.
  • type: Indicates that this step is a REST API call (rest).

Retry Strategy Configuration:

  • retry_strategy: This defines how the system should handle specific error codes, enabling retries if a request fails.
    • 400: If the API returns an HTTP 400 (Bad Request) error, the retry strategy will trigger.
    • max_attempts: 2: This specifies that the system should retry the request up to 2 times before giving up.

How It Works:

  1. If the API request to {{%BASE_URL%}}/posts fails and returns a 400 Bad Request status code, Rivery will automatically retry the step up to 2 times.
  2. If the API still returns a 400 after the 2 retries, the step will fail, and the workflow will stop or proceed based on your configuration for handling failures.
  3. If a successful 200 OK status code is returned, the retry mechanism will not trigger, and the workflow will continue as normal.
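The behavior described in the steps above can be sketched as a small retry loop. This is an interpretation rather than Rivery's implementation. It assumes `max_attempts` counts retries made after the initial request, following the "retry up to 2 times" wording, and `run_with_retries` and `send` are hypothetical names.

```python
def run_with_retries(send, retry_strategy, expected_status_codes=(200,)):
    """Call send() and retry while the returned status code has retries left.

    Returns (final_status, total_calls).
    """
    retries_used = {}    # retries consumed so far, per status code
    calls = 0
    while True:
        status = send()
        calls += 1
        if status in expected_status_codes:
            return status, calls          # success: no retry needed
        rule = retry_strategy.get(status)
        used = retries_used.get(status, 0)
        if rule is None or used >= rule["max_attempts"]:
            return status, calls          # no rule, or retries exhausted: step fails
        retries_used[status] = used + 1


# Hypothetical API that answers 400 twice and then 200.
responses = iter([400, 400, 200])
result = run_with_retries(lambda: next(responses), {400: {"max_attempts": 2}})
print(result)
```

Under this reading, an API that keeps returning 400 is called three times in total (one initial request plus two retries) before the step fails, and a status code with no entry in `retry_strategy` fails immediately.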

Use Cases:

  • Handling temporary issues: A retry strategy is particularly useful when you're dealing with APIs that may intermittently return errors due to temporary issues, such as network problems or rate limits.
  • API reliability: By configuring retries, you can improve the reliability of your workflows, ensuring that transient errors don't cause the entire process to fail.

Customizing the Retry Strategy:

You can configure the retry strategy for different status codes or increase the number of retry attempts for more robust error handling. For example:

retry_strategy:
  500:
    max_attempts: 3  # Retry 3 times on 500 Internal Server Errors
  429:
    max_attempts: 5  # Retry 5 times if the API returns 429 Too Many Requests

This flexible configuration ensures that your workflows can handle different failure scenarios, making your data integration pipelines more resilient.

Summary

  • Connector Section: Configures the API connection details, authentication, headers, and storage for variables.
  • Steps Section: Defines a loop to paginate through GitHub commits, retrieve data, and manage the loop's conditions and output handling.
