- 5 Minutes to read
Blueprint Components and Configuration
Blueprint and Copilot Now Available in Private Preview
We are excited to announce that our new Copilot and Blueprint features are now available to a select group of beta customers! This exclusive opportunity allows you to explore and test the latest capabilities of Blueprint before its official release.
If you're interested in joining the beta program, click here to request access.
How to Configure a Connector
This document provides a step-by-step guide to setting up a YAML source configuration, using GitHub's REST API as an example.
Connector Section
Connector Details
connector:
  name: GitHubConnector
  type: rest
- name: Identifies the connector, in this case GitHubConnector.
- type: Specifies that the connector type is rest, indicating it will communicate using RESTful API calls.
Base URL
base_url: 'https://api.github.com'
- base_url: Sets the base URL for all API requests.
Default Headers
default_headers:
  Authorization: 'Basic {YOUR_AUTH}'
  X-GitHub-Api-Version: '{YOUR_X_GITHUB_API_VERSION}'
  User-Agent: '{YOUR_USER_AGENT}'
- Authorization: Used for authentication. The API requires the auth token to be a part of the headers for each request.
- X-GitHub-Api-Version: Specifies the version of the GitHub API to use.
- User-Agent: Identifies the client making the request.
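As a rough illustration of what the 'Basic {YOUR_AUTH}' placeholder typically holds, the Python sketch below builds the same three headers. It assumes standard HTTP Basic authentication, where the credential is the Base64 encoding of username:token. The function name build_default_headers and all sample values (username, token, API version, user agent) are hypothetical placeholders, not values taken from this guide.

```python
import base64


def build_default_headers(username: str, token: str,
                          api_version: str, user_agent: str) -> dict:
    """Return headers shaped like the default_headers block above."""
    # HTTP Basic auth: Base64-encode "username:token".
    credentials = f"{username}:{token}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {
        "Authorization": f"Basic {encoded}",
        "X-GitHub-Api-Version": api_version,
        "User-Agent": user_agent,
    }


# Hypothetical placeholder values, for illustration only.
headers = build_default_headers("octocat", "example-token",
                                "2022-11-28", "my-connector")
```

The encoded string after "Basic " is what would replace {YOUR_AUTH} under this assumption.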
Variables Storages
variables_storages:
  - name: results dir
    type: file_system
    path: storage/results/filesystem
  - name: results memory
    type: memory
- results dir: Storage for results in the filesystem, specified by path.
- results memory: In-memory storage for temporary data during processing.
Variables Metadata
variables_metadata:
  page_number:
    storage_name: results memory
    format: json
  last_result:
    storage_name: results memory
    format: json
  final_output_file:
    storage_name: results dir
    format: json
- page_number: Tracks the current page of results being processed.
- last_result: Stores the results of the last API call.
- final_output_file: Holds the final output data in a file.
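To make the relationship between variables and storages more concrete, here is a minimal Python sketch of the idea: each variable name maps to a storage, and reads and writes are routed accordingly. This is only a conceptual model, not how the platform is implemented. The helper names, the temporary directory standing in for storage/results/filesystem, and the one-file-per-variable naming are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical in-process stand-ins for the two storages defined above.
memory_storage: dict = {}
file_storage_dir = Path(tempfile.mkdtemp())  # stands in for storage/results/filesystem

# Mirrors variables_metadata: variable name -> name of the storage holding it.
variables_metadata = {
    "page_number": "results memory",
    "last_result": "results memory",
    "final_output_file": "results dir",
}


def write_variable(name: str, value) -> None:
    """Serialize value as JSON and write it to the storage mapped to name."""
    payload = json.dumps(value)
    if variables_metadata[name] == "results memory":
        memory_storage[name] = payload
    else:
        # Assumed naming scheme: one JSON file per variable.
        (file_storage_dir / f"{name}.json").write_text(payload)


def read_variable(name: str):
    """Read a variable back from whichever storage holds it."""
    if variables_metadata[name] == "results memory":
        return json.loads(memory_storage[name])
    return json.loads((file_storage_dir / f"{name}.json").read_text())


write_variable("page_number", 1)
write_variable("final_output_file", [{"message": "initial commit"}])
```

In this model, page_number disappears when the process ends, while final_output_file survives on disk, which matches the temporary versus final roles described above.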
Steps Section
Main Loop Step
- name: WhileLoopOverCommits
  description: Get all commits of a github repository - until end of them
  type: loop
- name: Identifies the step as WhileLoopOverCommits.
- description: Describes the purpose of the step.
- type: Specifies that this step is a loop.
Loop Configuration
loop:
  type: while
  variable_name: page_number
  value: 1
  while_settings:
    operation_to_perform: add
    value_to_perform: 1
    max_iterations: 10000
  break_conditions:
    - name: BreakIfOutOfCommits
      condition:
        type: string_equal
        value: "[]"
      variable: "{{%last_result%}}"
- type: Indicates a while loop.
- variable_name: The loop uses page_number to keep track of the current page.
- value: Initial value for page_number.
- while_settings:
  - operation_to_perform: Operation to perform on each iteration (add).
  - value_to_perform: Amount to add to page_number each iteration.
  - max_iterations: Maximum number of iterations to prevent infinite loops.
- break_conditions:
  - name: Identifies the break condition.
  - condition: Specifies the condition type (string_equal) and the value to compare ("[]").
  - variable: Uses the variable last_result to determine if there are no more commits.
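The loop settings above can be read as an ordinary while loop. The Python sketch below models that reading under a few assumptions: the break condition is checked right after each page is fetched, the empty page arrives as the literal string "[]", and the counter is incremented only when the loop continues. The function run_while_loop and the stubbed fake_fetch are hypothetical and exist only for this example.

```python
import json


def run_while_loop(fetch_page, start=1, step=1, max_iterations=10000):
    """Call fetch_page(page_number) until it returns "[]" or the cap is hit."""
    page_number = start  # loop.value
    collected = []
    for _ in range(max_iterations):  # max_iterations guards against infinite loops
        last_result = fetch_page(page_number)
        if last_result == "[]":  # break condition: string_equal against "[]"
            break
        collected.extend(json.loads(last_result))
        page_number += step  # operation_to_perform: add, value_to_perform: 1
    return collected, page_number


# Hypothetical stand-in for the API: two pages of data, then an empty page.
fake_pages = {1: '[{"sha": "a1"}, {"sha": "b2"}]', 2: '[{"sha": "c3"}]'}


def fake_fetch(page_number):
    return fake_pages.get(page_number, "[]")


commits, last_page = run_while_loop(fake_fetch)
```

With the stub above, the loop stops on the third page, the first one that returns an empty array. If the API never returned "[]", max_iterations would end the loop instead.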
Pagination Step
steps:
  - name: Pagination
    description: Retrieve a page of github repository commits
    endpoint: "{{%BASE_URL%}}/repos/{YOUR_REPOSITORY_OWNER}/{YOUR_REPOSITORY}/commits"
    http_method: GET
    type: rest
    query_params:
      page: "{{%page_number%}}"
    variables_output:
      - variable_name: last_result
        response_location: data
        variable_format: json
        overwrite_storage: true
      - variable_name: final_output_file
        response_location: data
        variable_format: json
        transformation_layers:
          - type: extract_json
            json_path: $.[*].commit
            from_type: json
- name: Identifies the step as Pagination.
- description: Describes the purpose of this step.
- endpoint: Constructs the API endpoint using the base URL and repository details.
- http_method: Specifies the HTTP method (GET).
- query_params: Adds query parameters to the API call (page number).
- variables_output:
  - last_result: Stores the API response data in last_result.
  - final_output_file: Saves the extracted commit data in final_output_file.
- transformation_layers:
  - type: Type of transformation (extract_json).
  - json_path: JSON path to extract specific data ($.[*].commit).
  - from_type: Data format (json).
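For readers who want to see what a single pagination request and the extract_json transformation amount to, the following Python sketch builds the request URL and applies a plain-Python equivalent of the JSON path. It assumes $.[*].commit selects the commit field of every element in the response array. The owner, repository, and sample response are hypothetical placeholders, and no network call is made.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.github.com"


def build_commits_url(owner: str, repo: str, page_number: int) -> str:
    """Build the endpoint plus the page query parameter."""
    query = urlencode({"page": page_number})
    return f"{BASE_URL}/repos/{owner}/{repo}/commits?{query}"


def extract_commits(response_data: list) -> list:
    """Plain-Python equivalent of applying $.[*].commit to a JSON array."""
    return [item["commit"] for item in response_data if "commit" in item]


# Hypothetical, trimmed-down response body for illustration.
sample_response = [
    {"sha": "a1", "commit": {"message": "Fix typo"}},
    {"sha": "b2", "commit": {"message": "Add tests"}},
]

url = build_commits_url("octocat", "hello-world", 2)
commits = extract_commits(sample_response)
```

Here last_result would correspond to the full sample_response, while final_output_file would receive only the extracted commit objects.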
Retry Strategy
This section describes how to configure a retry strategy in a step within the YAML to handle API response errors, ensuring that your workflow can recover from transient issues by retrying the step a specified number of times.
Example YAML Code with Retry Strategy
steps:
  - description: Fetch all posts
    endpoint: "{{%BASE_URL%}}/posts"
    expected_status_codes:
      - 200
    method: GET
    name: FetchPosts
    type: rest
    retry_strategy:
      400:
        max_attempts: 2
Explanation of the Configuration
In this example, we define a step called FetchPosts that performs a GET request to fetch posts from the specified API endpoint. The key configuration here is the retry_strategy, which allows you to specify how the system should handle certain HTTP status codes by retrying the request.
Step Breakdown:
- description: Provides a brief explanation of the step, in this case, "Fetch all posts."
- endpoint: The API URL to which the request is sent. The value {{%BASE_URL%}}/posts dynamically pulls the base URL.
- expected_status_codes: Specifies which status codes indicate success. Here, the only expected status code is 200.
- method: Defines the HTTP method used for the request. In this case, GET is used to fetch data.
- name: This is the name of the step, which can be used for identification and reporting purposes.
- type: Indicates that this step is a REST API call (rest).
Retry Strategy Configuration:
- retry_strategy: This defines how the system should handle specific error codes, enabling retries if a request fails.
  - 400: If the API returns an HTTP 400 (Bad Request) error, the retry strategy will trigger.
  - max_attempts: 2: This specifies that the system should retry the request up to 2 times before giving up.
How It Works:
- If the API request to {{%BASE_URL%}}/posts fails and returns a 400 Bad Request status code, Rivery will automatically retry the step up to 2 times.
- If the API still returns a 400 after the 2 retries, the step will fail, and the workflow will stop or proceed based on your configuration for handling failures.
- If a successful 200 OK status code is returned, the retry mechanism will not trigger, and the workflow will continue as normal.
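The behavior described above can be modeled in a few lines of Python. This is a sketch of the described semantics, not the platform's implementation: it assumes max_attempts counts retries made after the first call (so a value of 2 allows three calls in total), that retries are counted per status code, and that there is no delay between attempts. The names call_with_retry and make_sender are hypothetical.

```python
def call_with_retry(send_request, retry_strategy, expected_status_codes=(200,)):
    """Send a request, retrying on status codes listed in retry_strategy.

    Returns (status_code, total_calls). Assumes max_attempts counts retries
    made after the first call, matching the description above.
    """
    retries_used = {}  # status code -> retries already spent on it
    calls = 0
    while True:
        status = send_request()
        calls += 1
        if status in expected_status_codes:
            return status, calls
        allowed = retry_strategy.get(status, {}).get("max_attempts", 0)
        if retries_used.get(status, 0) >= allowed:
            return status, calls  # out of retries: the step fails
        retries_used[status] = retries_used.get(status, 0) + 1


def make_sender(statuses):
    """Hypothetical stub: returns the given statuses one call at a time."""
    remaining = iter(statuses)
    return lambda: next(remaining)


strategy = {400: {"max_attempts": 2}}
recovered = call_with_retry(make_sender([400, 200]), strategy)
exhausted = call_with_retry(make_sender([400, 400, 400, 400]), strategy)
```

In this model, a status code that is neither expected nor listed in the strategy fails immediately, without any retry.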
Use Cases:
- Handling temporary issues: A retry strategy is particularly useful when you're dealing with APIs that may intermittently return errors due to temporary issues, such as network problems or rate limits.
- API reliability: By configuring retries, you can improve the reliability of your workflows, ensuring that transient errors don't cause the entire process to fail.
Retry Strategy:
You can configure the retry strategy for different status codes or increase the number of retry attempts for more robust error handling. For example:
retry_strategy:
  500:
    max_attempts: 3 # Retry 3 times on 500 Internal Server Errors
  429:
    max_attempts: 5 # Retry 5 times if the API returns 429 Too Many Requests
This flexible configuration ensures that your workflows can handle different failure scenarios, making your data integration pipelines more resilient.
Summary
- Connector Section: Configures the API connection details, authentication, headers, and storage for variables.
- Steps Section: Defines a loop to paginate through GitHub commits, retrieve data, and manage the loop's conditions and output handling.