

Streamline your data analytics with AWS Athena queries and Terraform


Source: dev.to

Part I of this article will detail how to streamline your data analytics with AWS Athena using Terraform Infrastructure as Code.

Requirements

  • AWS account
  • Terraform

Walkthrough

Configure AWS credentials with aws configure.

At the root of our project, create a file named provider.tf in which we will specify the aws provider. Later on we will also read our columns dynamically by declaring a variable that describes each column.

#provider.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.59.0"
    }
  }
}
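The required_providers block only pins the provider version; the provider itself still needs to be configured. A minimal sketch, assuming us-east-1 (swap in your own region) and credentials picked up from the aws configure step above:

#provider.tf (continued)
provider "aws" {
  # us-east-1 is an assumption; use whichever region you prefer.
  # Credentials are resolved from the profile created with `aws configure`.
  region = "us-east-1"
}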

Next, let's define an S3 bucket, which we will use as the source of our data, in a file named s3.tf.

#s3.tf
resource "aws_s3_bucket" "log_bucket" {
  bucket = "my-tf-log-bucket-source-athena"
}

Now we will define a file named athena.tf. In this file we are going to create an Athena workgroup, a database, a table, and our query.

For our aws_glue_catalog_table we expect the table to deserialize JSON objects.

Note: It is important that our JSON files contain no spaces, i.e. each record is minified onto a single line; otherwise the SerDe will not parse them well. An alternative could be the LazySimpleSerDe, which can convert from a byte stream.

#athena.tf
resource "aws_glue_catalog_database" "myservice_athena_db" {
  name = "myservice"
}

resource "aws_glue_catalog_table" "athena_table" {
 name          = "mytable"
 database_name = aws_glue_catalog_database.myservice_athena_db.name
 description   = "Table containing the results stored in S3 as source"


 table_type = "EXTERNAL_TABLE"


 storage_descriptor {
   location      = "s3://${aws_s3_bucket.log_bucket.bucket}/mydata"
   input_format  = "org.apache.hadoop.mapred.TextInputFormat"
   output_format = "org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat"


   ser_de_info {
     name                  = "s3-stream"
     serialization_library = "org.openx.data.jsonserde.JsonSerDe"


     parameters = {
       "ignore.malformed.json" = "TRUE"
       "dots.in.keys"          = "FALSE"
       "case.insensitive"      = "TRUE"
       "mapping"               = "TRUE"
     }
   }


   dynamic "columns" {
     for_each = var.columns


     iterator = column
     content {
       name = column.value.name
       type = column.value.type
     }
   }
 }
}

resource "aws_athena_workgroup" "test" {
  name = "example123123s"

  configuration {
    enforce_workgroup_configuration    = true
    publish_cloudwatch_metrics_enabled = true

    result_configuration {
      output_location = "s3://${aws_s3_bucket.log_bucket.bucket}/output/"

    }
  }
}

resource "aws_athena_named_query" "athena_query" {
  name      = "test_query"
  workgroup = aws_athena_workgroup.test.id
  database  = aws_glue_catalog_database.myservice_athena_db.name
  query     = "SELECT json_extract_scalar(person, '$.name') as name, product as source FROM myservice.mytable;"
}
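Optionally, a couple of outputs (not part of the original files) make it easier to find the created resources after the apply:

#outputs.tf (optional)
output "athena_workgroup_name" {
  value = aws_athena_workgroup.test.name
}

output "athena_named_query_id" {
  value = aws_athena_named_query.athena_query.id
}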

Our JSON example should look like this (but minified, with no spaces or line breaks, when stored in S3):

{
  "product": "Live JSON generator",
  "version": 3.1,
  "releaseDate": "2014-06-25T00:00:00.000Z",
  "demo": true,
  "person": {
    "id": 12345,
    "name": "John Doe",
    "phones": {
      "home": "800-123-4567",
      "mobile": "877-123-1234"
    },
    "email": [
      "[email protected]",
      "[email protected]"
    ],
    "dateOfBirth": "1980-01-02T00:00:00.000Z",
    "registered": true,
    "emergencyContacts": [
      {
        "name": "Jane Doe",
        "phone": "888-555-1212",
        "relationship": "spouse"
      },
      {
        "name": "Justin Doe",
        "phone": "877-123-1212",
        "relationship": "parent"
      }
    ]
  }
}
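For reference, here is the same document minified onto a single line, which is the shape the JsonSerDe expects for each record stored in S3:

{"product":"Live JSON generator","version":3.1,"releaseDate":"2014-06-25T00:00:00.000Z","demo":true,"person":{"id":12345,"name":"John Doe","phones":{"home":"800-123-4567","mobile":"877-123-1234"},"email":["[email protected]","[email protected]"],"dateOfBirth":"1980-01-02T00:00:00.000Z","registered":true,"emergencyContacts":[{"name":"Jane Doe","phone":"888-555-1212","relationship":"spouse"},{"name":"Justin Doe","phone":"877-123-1212","relationship":"parent"}]}}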

In our variables.tf we declare the columns variable that the dynamic block iterates over, and in terraform.tfvars we provide the actual column definitions.

#variables.tf
variable "columns" {
  type = list(object({
    name = string,
    type = string,
  }))
  default     = []
  description = "The columns in the table, each defined by its name and its type"
}

#terraform.tfvars
columns = [
  {
    name = "product"
    type = "string"
  },
  {
    # the sample data has version = 3.1, so a floating-point type is needed
    name = "version"
    type = "double"
  },
  {
    name = "releaseDate"
    type = "string"
  },
  {
    name = "demo"
    type = "boolean"
  },
  {
    name = "person"
    type = "string"
  }
]

After finishing our files, let's use Terraform to create our resources.

terraform init

[Screenshot: terraform init output]

terraform plan

[Screenshot: terraform plan output]

terraform apply

[Screenshot: terraform apply output]

Upload our data.json file through S3 commands or directly in the console:

aws s3 cp data.json s3://<<your_bucket_name>>/mydata/
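Alternatively, if you would rather have Terraform manage the sample data as well, an aws_s3_object resource can upload it during apply; a sketch, assuming data.json sits next to the .tf files:

#s3.tf (optional alternative to the CLI upload)
resource "aws_s3_object" "sample_data" {
  bucket = aws_s3_bucket.log_bucket.bucket
  key    = "mydata/data.json"
  source = "${path.module}/data.json"
  etag   = filemd5("${path.module}/data.json") # re-upload whenever the file changes
}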

Now, in the Athena service in our AWS account, we can see our database, table, and saved query; running the query should return a result like this.

[Screenshot: Athena query results]

The results are also saved in our output bucket.

[Screenshot: query output stored in the S3 bucket]
