Bootstrap FreeKB - Amazon Web Services (AWS) - Parse JSON using Athena
Amazon Web Services (AWS) - Parse JSON using Athena


This assumes you are familiar with AWS Athena. If not, check out my article FreeKB - Amazon Web Services (AWS) - Getting Started with Athena.

Let's say you have a file named foo.json in one of your S3 Buckets.

 

And let's say foo.txt contains the following lines.

{id: 1, message:"Hello"}
{id: 2, message:"World"}

 

This assumes you have already created a database in Athena. If not, check out my article FreeKB - Amazon Web Services (AWS) - Create Database in AWS Athena. Let's say you created a database named "mydatabase".

 

Now let's select the database you just created (mydatabase in this example) and create a table.

  • Since the id key in our JSON contains an integer we create the id INT column
  • Since the message key in our JSON contains a string we create the message STRING column
  • ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' is used to parse each line in the file as JSON
  • STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' is used so that the incoming data is prosed as text
  • OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' is TBD
  • LOCATION 's3://my-bucket-abc123/' is used to process all of the files at and below the root directory in the S3 Bucket containing your JSON files
CREATE EXTERNAL TABLE IF NOT EXISTS parse_json (
    id INT, 
    message STRING)
ROW FORMAT SERDE
  'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION
  's3://my-bucket-abc123/';

 

Once the table has been created, the table and the columns in the table should be displayed in the left panel.

 

And the query result should contain each filed from each line in foo.json. Nice!

 




Did you find this article helpful?

If so, consider buying me a coffee over at Buy Me A Coffee



Comments


Add a Comment


Please enter 44bfb5 in the box below so that we can be sure you are a human.