Configure Custom FileZone for Databricks SQL



FileZone Types
  • S3 (IAM role connection type only)

Creating a custom FileZone in Rivery lets you manage your data in your own S3 service, in your own AWS account, as a staging area prior to loading data into Databricks SQL.

Your data is retained for at least 24 hours in your S3 bucket. You can also use the FileZone bucket or its objects as a base for other Hadoop or Spark operations via Amazon EMR, or for your other services.

Before you use this guide, please make sure you’ve signed up for AWS and you have a console admin user.

If you don’t have one of these prerequisites, you can start here.

Rivery needs an S3 bucket to serve as a FileZone before your data is loaded into Databricks SQL.

Note: You can find the up-to-date documentation of S3 operations and getting started here.

Create an S3 Bucket

  1. Go to S3 Management in the AWS Console.

  2. Click on Create Bucket.

  3. Give the bucket a name, and choose the same region your Databricks SQL workspace is in (in most cases, US East (N. Virginia)).
    Use the S3 wizard defaults by reviewing and following the wizard screens, and click on Create Bucket.
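If you prefer the AWS CLI to the console wizard, the same bucket can be created from the command line. This is a sketch; the bucket name and region below are placeholders, not values supplied by Rivery.

```shell
# Create the FileZone bucket (bucket name and region are placeholders).
# Buckets in us-east-1 omit the location constraint.
aws s3api create-bucket \
  --bucket my-rivery-filezone \
  --region us-east-1

# For any other region, add the location constraint, e.g.:
# aws s3api create-bucket \
#   --bucket my-rivery-filezone \
#   --region eu-west-1 \
#   --create-bucket-configuration LocationConstraint=eu-west-1
```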


Configure custom FileZone for Databricks in Rivery

Rivery uploads your source data to an Amazon S3 bucket and then pushes that data to Databricks SQL. Databricks SQL uses COPY INTO with an Assume Role mechanism on AWS. Therefore, you need to create a role in AWS that has permissions on the relevant bucket and grants the Rivery AWS account permission to access the bucket. Creating an AWS role is mandatory in this case in order to connect Databricks SQL with a custom FileZone.
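For the manual setup, the role and its bucket permissions can be created with the AWS CLI along these lines. This is a sketch: RIVERY_ACCOUNT_ID and EXTERNAL_ID are placeholders you must replace with the real values shown in the Rivery connection screen, and the bucket, role, and policy names are example names.

```shell
# Trust policy: lets the Rivery AWS account assume this role.
# RIVERY_ACCOUNT_ID and EXTERNAL_ID are placeholders - copy the
# actual values from the Rivery connection wizard.
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::RIVERY_ACCOUNT_ID:root" },
    "Action": "sts:AssumeRole",
    "Condition": { "StringEquals": { "sts:ExternalId": "EXTERNAL_ID" } }
  }]
}
EOF

aws iam create-role \
  --role-name rivery-filezone-role \
  --assume-role-policy-document file://trust-policy.json

# Permissions policy: read/write access to the FileZone bucket.
cat > s3-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "s3:GetObject", "s3:PutObject", "s3:DeleteObject",
      "s3:ListBucket", "s3:GetBucketLocation"
    ],
    "Resource": [
      "arn:aws:s3:::my-rivery-filezone",
      "arn:aws:s3:::my-rivery-filezone/*"
    ]
  }]
}
EOF

aws iam put-role-policy \
  --role-name rivery-filezone-role \
  --policy-name rivery-filezone-access \
  --policy-document file://s3-policy.json
```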

  1. Open your Databricks SQL Connection, by going to Connections->Create New Connection, and choose Databricks SQL.

  2. In the connection, check the Custom File Zone checkbox.

  3. Choose an existing custom FileZone connection, or create a new one.

  4. Choose the region your bucket is configured in.

  5. Under the Credential Type choose one of IAM Role - Automatic or IAM Role - Manual.

  6. Follow the instructions for creating the IAM role for Rivery.

  7. Name your S3 File Zone Connection and Save.

  8. Now you can test your connection.

  9. After saving, choose the default bucket for your FileZone area.
    Use the bucket you created above.

  10. Save the Databricks connection.
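As an optional sanity check before testing the connection in Rivery, you can confirm from your own AWS admin user that the bucket and the role exist (the bucket, role, and policy names below are the same example placeholders used earlier, not values defined by Rivery):

```shell
# Confirm the FileZone bucket is reachable.
aws s3 ls s3://my-rivery-filezone

# Confirm the role and its inline bucket policy exist.
aws iam get-role --role-name rivery-filezone-role
aws iam get-role-policy --role-name rivery-filezone-role \
  --policy-name rivery-filezone-access
```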
