Amazon Athena as a Target
  • 4 Minutes to read
  • Dark
    Light
  • PDF

Amazon Athena as a Target

  • Dark
    Light
  • PDF

This is a step-by-step tutorial for setting Amazon Athena as a Target.

Please note:
Amazon Athena is currently available in beta. It is only accessible in Logic Rivers.

Overview

Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon S3 using standard SQL. With a few actions in the AWS Management Console, you can point Athena at your data stored in Amazon S3 and begin using standard SQL to run ad-hoc queries and get results in seconds.

Prerequisite

Create an AWS User for Rivery

Create a user for Rivery and grant it permission to manage and read the FileZone bucket by following these steps:

  1. Sign in to the AWS Management Console and open the Amazon S3 console.

  2. Navigate to Users - > Add User.

image.png

  1. In the console, set your username, and check the Access type to Programmatic access.

image.png

  1. Click Next: Permissions.

  2. In the Set Permissions form select Attach Existing Policies Directly.

  3. Click Create Policy.

image.png

  1. Navigate to the JSON tab.

image.png

  1. Copy and paste the following policy:
    Note:
    Replace <RiveryFileZoneBucket> with the name of your S3 bucket.
{
 "Version":"2012-10-17",
 "Statement":[
   {
    "Sid":"RiveryManageFZBucket",
    "Effect":"Allow",
    "Action":[
    "s3:GetBucketCORS",
    "s3:ListBucket",
    "s3:GetBucketLocation"
     ],
    "Resource":"arn:aws:s3:::<RiveryFileZoneBucket>"
   },
   {
    "Sid":"RiveryManageFZObjects",
    "Effect":"Allow",
    "Action":[
      "s3:ReplicateObject",
      "s3:PutObject",
      "s3:GetObjectAcl",
      "s3:GetObject",
      "s3:PutObjectVersionAcl",
      "s3:PutObjectAcl",
      "s3:ListMultipartUploadParts"],
    "Resource":"arn:aws:s3:::<RiveryFileZoneBucket>/*"
  },
  {
     "SID": "AthenaAccess",
     "Effect": "Allow",
     "Action": [
        "athena:*",
        "glue:CreateDatabase",
        "glue:DeleteDatabase",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:UpdateDatabase",
        "glue:CreateTable",
        "glue:DeleteTable",
        "glue:BatchDeleteTable",
        "glue:UpdateTable",
        "glue:GetTable",
        "glue:GetTables",
        "glue:BatchCreatePartition",
        "glue:CreatePartition",
        "glue:DeletePartition",
        "glue:BatchDeletePartition",
        "glue:UpdatePartition",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:BatchGetPartition"],
     "Resource": [ "*" ]
   },
  {
     "Sid":"RiveryHeadBucketsAndGetLists",
     "Effect":"Allow",
     "Action":"s3:ListAllMyBuckets",
     "Resource":"*"
  }
 ]
}
  1. Select Review Policy.

  2. Give the Policy a name and click Create Policy.

  3. Using the rounded arrows on the upper right, refresh the list of policies, check the policy you just created, and select Next: Tags.

image.png

  1. Click Next: Tags, Next: Review and then Create User to complete the process.

  2. In the summary screen, you'll find the user's AWS credentials (Access key id and Secret access key), which you can download as a CSV file (this is the only time you'll be able to do so).

image.png

  1. The User should now be able to manage and read the bucket that was created for Rivery. Please double-check that the policy you created is linked to the user you created.

create-an-aws-iam-user_mceclip63.png


Create a Workgroup in Amazon Athena

Follow the Amazon Athena User Guide to create a Workgroup.

Please Note:
Make sure the Override client-side settings box is not checked.
image.png


Connection Procedure

  1. Enter the Rivery Console and select +Create from the menu on the upper left.

  2. Choose Create Logic River.

  3. Select SQL / DB Tranformation as the Logic step.

  4. Clicking the Target Section will open a drop-down menu where you can choose Amazon Athena.
    image.png

  5. Click the +New Connection button.

  6. Type in the Connection Name.

  7. Enter Your AWS Access key and AWS Access Secret.

  8. From the drop-down menu, choose your Region.

  9. Specify the name of your Workgroup.

  10. Use the Test Connection function to see if your connection is up to the task.
    If the connection succeeded, you can now use this connection in Rivery.

image.png


Configuration Process

  1. Now that you've established a Connection, enter your Connection Name.
  2. In the Source section, type your SQL query or SQL script.

Please Note:

  • The SQL script is not encrypted in any way. Please avoid using credentials in the script.
  • The SQL script runs according to user configurations. Users are responsible for any changes to the table, schema, or data that occur due to the script.

image.png

  1. in the Target section, select Table or Variable.
  • If you selected 'Variable', select the variable's type and then, from the drop-down list, choose the variable you want to use as a Target.
  • if 'Table' was your choice, move on to step 5.
  1. Click the Bucket Name section and pick the bucket you want to load into from the drop-down list.
  2. Enter the FileZone path from your S3 storage. Here's an example:

image.png

  1. Click the Schema section and select the schema you want from the drop-down list.
  2. Type in your Table Name.
  3. Select Overwrite as the loading mode.
    Note: The table will be reset after you run your River.
  4. Set the period Partition Type and Granularity for a bucket.
Please Note:

A new column with the selected granularity value will be added to the target table in Athena. Here's an example:
Assume you want to partition data in our mapping with a granularity of one day based on the "order_date" column. This column is of type DATE. After a successful River run on the selected Target table, you should see in the Athena console, a new column with the name "order_date_day" and the value of the day from "order_date." The granularity of this partitioned column will be just the day.
image.png

  1. Now simply click the Column Mapping button.
  2. Configure the Columns to your preferences as they will appear in your Target.

image.png

Note:

  • To add a new field to your Target, select +Add Field.
  • If a Bucket is selected next to a particular field, you would be able to specify the number of buckets to target that field by clicking Save and entering a Number of Buckets in a newly created section in the Target form.
    The range of defined buckets is 2 to 100.
  1. Click Save.
  2. You can now Select Run to run your River.

Was this article helpful?