Configure Custom FileZone for Databricks SQL

Requirements

FileZone Types
  • S3 (IAM role connection type only)

Creating a custom FileZone in Rivery lets you manage your data in your own S3 bucket, in your own AWS account, as a staging area prior to loading data into Databricks SQL.

This means your data can be kept for at least 24 hours in your S3 bucket. You can also use the FileZone bucket or its objects as a base for other Hadoop or Spark operations run by Amazon EMR or by your other services.

Before you use this guide, please make sure you've signed up for AWS and that you have a console admin user.

If you don’t have one of these prerequisites, you can start here.

Rivery needs an S3 bucket to serve as a FileZone before your data is loaded into Databricks SQL.

Note: You can find the up-to-date documentation on S3 operations and getting started here.
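
If you want to build on the staged files from EMR, Spark, or another service, a short script can enumerate them. The following is a minimal sketch using Python and boto3; the bucket name and prefix are placeholders, since the actual layout depends on how your rivers are configured.

import boto3

BUCKET = "my-rivery-filezone"   # placeholder: your FileZone bucket name
PREFIX = "rivery/"              # placeholder: the prefix your rivers stage files under

s3 = boto3.client("s3")

# List the objects staged under the prefix so downstream jobs can pick them up
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])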

Create an S3 Bucket

  1. Go to S3 Management in the AWS Console.

  2. Click on Create Bucket.

  3. Give the bucket a name and choose the same region your Databricks SQL deployment runs in (in most cases, US East (N. Virginia)).
    Use the S3 wizard defaults by reviewing and following the wizard screens, then click Create Bucket. (You can also create the bucket with the AWS SDK, as sketched after these steps.)

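If you prefer to script the bucket creation instead of using the console wizard, here is a minimal sketch with Python and boto3. The bucket name is a placeholder, and the region should match your Databricks deployment; this is an alternative to the console steps above, not a Rivery requirement.

import boto3

REGION = "us-east-1"            # use the same region as your Databricks deployment
BUCKET = "my-rivery-filezone"   # placeholder: your FileZone bucket name

s3 = boto3.client("s3", region_name=REGION)

# us-east-1 is the default location and must not be passed as a LocationConstraint
if REGION == "us-east-1":
    s3.create_bucket(Bucket=BUCKET)
else:
    s3.create_bucket(
        Bucket=BUCKET,
        CreateBucketConfiguration={"LocationConstraint": REGION},
    )
print(f"Created bucket {BUCKET} in {REGION}")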

Configure a Custom FileZone for Databricks SQL in Rivery

Rivery uses an Amazon S3 bucket to upload your source data into and then pushes that data to Databricks SQL. Databricks SQL loads the data using COPY INTO with an assumed IAM role on AWS. Therefore, you need to create a role in AWS that has permissions on the relevant bucket and grants the Rivery AWS account permission to access it. Creating an AWS role is mandatory in this case in order to connect Databricks SQL with a custom FileZone.
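
To illustrate what that role looks like, here is a minimal sketch in Python and boto3 that creates a role Rivery can assume and scopes it to the FileZone bucket. The account ID, external ID, role name, and bucket name are placeholders; use the exact values Rivery shows you in its IAM role instructions (step 6 below).

import json
import boto3

BUCKET = "my-rivery-filezone"              # placeholder: your FileZone bucket
ROLE_NAME = "rivery-filezone-role"         # placeholder: any role name you prefer
RIVERY_ACCOUNT_ID = "<rivery-account-id>"  # placeholder: take from Rivery's IAM role instructions
EXTERNAL_ID = "<external-id>"              # placeholder: take from Rivery's IAM role instructions

iam = boto3.client("iam")

# Trust policy: lets the Rivery AWS account assume this role, scoped by an external ID
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{RIVERY_ACCOUNT_ID}:root"},
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
    }],
}

role = iam.create_role(
    RoleName=ROLE_NAME,
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Permissions policy: read/write/list access on the FileZone bucket only
bucket_access = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "s3:GetObject", "s3:PutObject", "s3:DeleteObject",
            "s3:ListBucket", "s3:GetBucketLocation",
        ],
        "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
    }],
}

iam.put_role_policy(
    RoleName=ROLE_NAME,
    PolicyName="rivery-filezone-access",
    PolicyDocument=json.dumps(bucket_access),
)

print("Role ARN:", role["Role"]["Arn"])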

  1. Open your Databricks SQL connection by going to Connections -> Create New Connection and choosing Databricks SQL.

  2. In the connection, check the Custom File Zone option.

  3. Choose an existing custom FileZone connection or create a new one.

  4. Choose the region your bucket is configured in.

  5. Under Credential Type, choose either IAM Role - Automatic or IAM Role - Manual.

  6. Follow the instructions for creating the IAM role for Rivery.

  7. Name your S3 File Zone Connection and Save.

  8. Now you can test your connection. (A short script for verifying bucket access outside Rivery is sketched after these steps.)

  9. After saving, choose your default bucket for your FileZone area.
    Use the bucket you created above.

  10. Save the Databricks connection.
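
Before or after testing the connection in Rivery, you can verify that the bucket itself is reachable with a short script. This is a minimal sketch using Python and boto3 with your own AWS credentials; the bucket name and test key are placeholders.

import boto3

BUCKET = "my-rivery-filezone"  # placeholder: the default FileZone bucket chosen above

s3 = boto3.client("s3")

# Confirm the bucket exists and is reachable with your credentials
s3.head_bucket(Bucket=BUCKET)

# Write and read back a small test object to confirm put/get permissions
s3.put_object(Bucket=BUCKET, Key="rivery-test/ping.txt", Body=b"ok")
obj = s3.get_object(Bucket=BUCKET, Key="rivery-test/ping.txt")
print(obj["Body"].read())  # b'ok'

# Clean up the test object
s3.delete_object(Bucket=BUCKET, Key="rivery-test/ping.txt")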

