Quick Guide to Converting a CSV File to Parquet

This is a step-by-step guide for using Rivery to convert a CSV file to Parquet in Amazon S3.

You will need the following to do so:

  • Bucket
  • Policy
  • Rivery User in AWS

Bucket

A bucket is an object container. To store data in Amazon S3, you must first create a bucket and specify a bucket name as well as an AWS Region. Then you upload your data as objects to that bucket in Amazon S3. Each object has a key (or key name) that serves as the object's unique identifier within the bucket.
Let's begin by logging in to the AWS console, searching for S3 Buckets, and creating a new bucket.

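The Python sketch below illustrates the naming step. It checks a name against a subset of the S3 bucket naming rules: 3 to 63 characters; only lowercase letters, digits, periods, and hyphens; starting and ending with a letter or digit; no adjacent periods; and not formatted as an IP address. It then builds the s3:// URI that identifies one object by bucket and key. The function names and the example bucket `my-rivery-bucket` are hypothetical. The check does not implement every AWS naming rule.

```python
import re

# Subset of the S3 general purpose bucket naming rules (not exhaustive):
# 3-63 chars, lowercase letters/digits/periods/hyphens, alphanumeric at both ends.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
# Names formatted like an IPv4 address are not allowed.
_IP_RE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes this subset of the S3 naming rules."""
    if not _BUCKET_RE.fullmatch(name):
        return False  # wrong length, bad characters, or bad first/last character
    if ".." in name:
        return False  # adjacent periods are not allowed
    if _IP_RE.fullmatch(name):
        return False  # must not look like an IP address
    return True


def object_uri(bucket: str, key: str) -> str:
    """Build the s3:// URI that identifies one object (bucket name + object key)."""
    if not is_valid_bucket_name(bucket):
        raise ValueError(f"invalid bucket name: {bucket!r}")
    return f"s3://{bucket}/{key.lstrip('/')}"
```

For example, `object_uri("my-rivery-bucket", "raw/orders.csv")` returns `s3://my-rivery-bucket/raw/orders.csv`. Here `raw/orders.csv` is the object key, which identifies the object uniquely within the bucket.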

Policy

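A policy is a JSON document that grants permissions. The policy below has no Principal element, so it is an identity-based IAM policy rather than a bucket policy: you attach it to the Rivery user to grant access to your bucket and the objects contained within it.
Now that you've created a bucket, let's create a policy that grants the necessary permissions.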


Here is the policy JSON; replace <RiveryFileZoneBucket> with the name of your bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RiveryManageFZBucket",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketCORS",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>"
    },
    {
      "Sid": "RiveryManageFZObjects",
      "Effect": "Allow",
      "Action": [
        "s3:ReplicateObject",
        "s3:PutObject",
        "s3:GetObjectAcl",
        "s3:GetObject",
        "s3:PutObjectVersionAcl",
        "s3:PutObjectAcl",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>/*"
    },
    {
      "Sid": "RiveryHeadBucketsAndGetLists",
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    }
  ]
}
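If you prefer to generate the policy instead of editing the placeholder by hand, the following Python sketch fills in the bucket name for you. It also makes the policy's structure explicit. Bucket-level actions target the bucket ARN. Object-level actions target `bucket/*`, meaning every key in the bucket. `s3:ListAllMyBuckets` is account-wide, so it uses `"*"`. The function name `build_rivery_policy` and the bucket name are hypothetical.

```python
import json


def build_rivery_policy(bucket: str) -> dict:
    """Return the policy above with <RiveryFileZoneBucket> replaced by `bucket`."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "RiveryManageFZBucket",
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketCORS",
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                ],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to every object key in the bucket.
                "Sid": "RiveryManageFZObjects",
                "Effect": "Allow",
                "Action": [
                    "s3:ReplicateObject",
                    "s3:PutObject",
                    "s3:GetObjectAcl",
                    "s3:GetObject",
                    "s3:PutObjectVersionAcl",
                    "s3:PutObjectAcl",
                    "s3:ListMultipartUploadParts",
                ],
                "Resource": f"{bucket_arn}/*",
            },
            {
                # Listing all buckets is not scoped to one bucket, so it needs "*".
                "Sid": "RiveryHeadBucketsAndGetLists",
                "Effect": "Allow",
                "Action": "s3:ListAllMyBuckets",
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # "my-rivery-bucket" is a hypothetical bucket name.
    print(json.dumps(build_rivery_policy("my-rivery-bucket"), indent=2))
```

The printed JSON can be pasted into the console's policy editor.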

Rivery User in AWS

To connect to Amazon S3 as a Source and Target in the Rivery console (described in the following section), you must first create an AWS user for Rivery and attach the policy above to it.
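If you script the user setup instead of using the console, a minimal sketch with the AWS SDK for Python (boto3) might look like the following. It makes three assumptions. First, `iam` is a client created with `boto3.client("iam")`. Second, you want a customer-managed policy, like the one the console's Create policy flow produces. Third, Rivery's S3 connection will use an access key pair; check the Rivery connection form to confirm which credentials it expects. The function, user, and policy names are hypothetical.

```python
import json
from typing import Any


def create_rivery_user(iam: Any, user_name: str, policy_name: str, policy: dict) -> dict:
    """Create an IAM user, attach a managed policy to it, and issue an access key.

    `iam` is expected to be a boto3 IAM client, e.g. boto3.client("iam").
    """
    # 1. Create the user that Rivery will authenticate as.
    iam.create_user(UserName=user_name)

    # 2. Create a customer-managed policy from the policy JSON.
    created = iam.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy),
    )
    policy_arn = created["Policy"]["Arn"]

    # 3. Attach the policy so the user receives its permissions.
    iam.attach_user_policy(UserName=user_name, PolicyArn=policy_arn)

    # 4. Issue an access key pair (assumption: the Rivery connection uses one).
    #    The secret is only returned at creation time, so store it securely.
    key = iam.create_access_key(UserName=user_name)["AccessKey"]
    return {
        "policy_arn": policy_arn,
        "access_key_id": key["AccessKeyId"],
        "secret_access_key": key["SecretAccessKey"],
    }
```

For example, you might call `create_rivery_user(boto3.client("iam"), "rivery-user", "RiveryFileZonePolicy", policy)`, where `policy` is the policy JSON from the previous section loaded as a dict.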



Converting with Rivery

After you have completed the AWS configuration, create a Rivery account and sign in to the Rivery console. You can then use Rivery to convert the CSV file to Parquet.


That's all there is to it; you've completed the quick guide and successfully converted the file.

