
Explore AWS Athena, an interactive query service for analyzing data directly in Amazon S3, covering creating databases and tables, loading data from S3, and working with partitions, bucketing, and structured, unstructured, and semi-structured data.
Navigate the Athena interface to write and run SQL queries, view results, and save or download results. Manage data sources and recent queries, plus view execution stats and encryption settings.
Create a database in AWS Athena using the employees dataset housed in the history bucket, including employees, departments, countries, job history, jobs, and locations, and verify the database creation.
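As a rough sketch of this step outside the console, the statement can be built and submitted through the Athena API. The database name `employees_data` and the output location are assumptions for illustration. The `client` argument is expected to be a boto3 Athena client (`boto3.client("athena")`), passed in so the helper itself has no external dependency.

```python
def create_database_sql(name: str) -> str:
    """Build the DDL that creates a database only if it does not exist yet."""
    # Guard against accidental injection: allow letters, digits, underscores.
    if not name.replace("_", "").isalnum():
        raise ValueError(f"unsafe database name: {name!r}")
    return f"CREATE DATABASE IF NOT EXISTS {name}"


def run_query(client, sql: str, output_location: str) -> str:
    """Submit a statement to Athena and return its query execution id.

    `client` is assumed to be a boto3 Athena client; `output_location`
    is an S3 URI where Athena writes query results.
    """
    response = client.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# "employees_data" is a hypothetical name, not taken from the course.
print(create_database_sql("employees_data"))
print("SHOW DATABASES")  # running this afterwards verifies the creation
```

The query runs asynchronously, so a real script would poll `get_query_execution` with the returned id before checking the result.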
Learn to create an external table from an S3 bucket in the employees data database by defining columns and data types, configuring encryption, handling headers, and validating results.
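A minimal sketch of what such a table definition might look like, assuming a comma-separated file with one header row. The column names, types, database name, and S3 location below are illustrative assumptions, not the course's actual schema.

```python
# Hypothetical subset of columns for an employees CSV file.
COLUMNS = {
    "employee_id": "INT",
    "first_name": "STRING",
    "last_name": "STRING",
    "salary": "DOUBLE",
    "job_id": "STRING",
    "department_id": "INT",
}


def external_table_sql(database, table, columns, location, skip_header=True):
    """Build a CREATE EXTERNAL TABLE statement for a CSV folder in S3."""
    column_lines = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns.items())
    sql = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_lines}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )
    if skip_header:
        # Tell Athena to ignore the first line of each file (the header row).
        sql += "\nTBLPROPERTIES ('skip.header.line.count'='1')"
    return sql


print(external_table_sql(
    "employees_data", "employees", COLUMNS, "s3://example-bucket/employees/"
))
```

The location points at a folder rather than a single file, because Athena reads every object under that prefix as table data.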
Learn CTAS in AWS Athena by creating a table from a select statement that filters employees with salaries greater than or equal to 10,000, resulting in 19 records.
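The CTAS statement described here could look roughly like the following. The table and database names are assumptions; only the salary filter comes from the lecture summary.

```python
def ctas_filter_sql(target: str, source: str, min_salary: int) -> str:
    """Build a CREATE TABLE AS SELECT that keeps rows at or above a salary."""
    return (
        f"CREATE TABLE {target} AS\n"
        f"SELECT *\n"
        f"FROM {source}\n"
        f"WHERE salary >= {int(min_salary)}"
    )


# Hypothetical names; the threshold of 10,000 is the one used in the lecture.
print(ctas_filter_sql(
    "employees_data.high_salary_employees",
    "employees_data.employees",
    10000,
))

# A follow-up count is one way to confirm how many rows the new table holds.
print("SELECT count(*) FROM employees_data.high_salary_employees")
```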
Create and use views in AWS Athena, building a view from a template, concatenating names, and querying salary and job data.
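A view along these lines might be defined as sketched below. The view name and the column names are assumed for illustration; `concat` is the standard string function in Athena's SQL dialect.

```python
# Hypothetical view that joins first and last names into one column.
CREATE_VIEW_SQL = """\
CREATE OR REPLACE VIEW employees_data.employee_summary AS
SELECT
  concat(first_name, ' ', last_name) AS full_name,
  salary,
  job_id
FROM employees_data.employees"""

# Querying the view works like querying a table.
QUERY_VIEW_SQL = """\
SELECT full_name, salary, job_id
FROM employees_data.employee_summary
ORDER BY salary DESC"""

print(CREATE_VIEW_SQL)
print(QUERY_VIEW_SQL)
```

A view stores only the query text, so it adds no files to S3 and always reflects the current contents of the underlying table.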
Save a table and its data in JSON format, name the table employees_json, and download and verify the generated JSON file after executing the query.
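One way to write a table in a chosen format is a CTAS statement with a `format` property. The sketch below assumes a source table `employees_data.employees` and an example S3 location; the set of formats is the one listed in the course description.

```python
# Storage formats named in the course description.
SUPPORTED_FORMATS = {"TEXTFILE", "PARQUET", "JSON", "ORC", "AVRO"}


def ctas_format_sql(target: str, source: str, fmt: str, location: str) -> str:
    """Build a CTAS statement that writes the result in the given format."""
    fmt = fmt.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return (
        f"CREATE TABLE {target}\n"
        f"WITH (\n"
        f"  format = '{fmt}',\n"
        f"  external_location = '{location}'\n"
        f") AS\n"
        f"SELECT * FROM {source}"
    )


# Bucket name and database name are placeholders.
print(ctas_format_sql(
    "employees_data.employees_json",
    "employees_data.employees",
    "json",
    "s3://example-bucket/employees_json/",
))
```

Passing a different format name to the same helper produces the ORC, Avro, or Parquet variant of the table.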
Learn to store data in a row format, rename the table to match the format, view the data schema, and download the resulting file, which is not in a human-readable format.
Learn bucketing as a high-cardinality alternative to partitioning to speed queries by grouping records into a fixed number of buckets, e.g., four buckets.
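The idea behind bucketing can be illustrated with a hash followed by a modulo. This is only a conceptual sketch: it uses CRC32 as a stand-in hash, whereas Athena applies its own hash function internally, so the actual bucket numbers will differ.

```python
import zlib
from collections import defaultdict


def bucket_for(key: str, bucket_count: int = 4) -> int:
    """Map a key to one of `bucket_count` buckets using hash modulo count."""
    # CRC32 is used because it is deterministic across runs, unlike hash().
    return zlib.crc32(key.encode("utf-8")) % bucket_count


# Twenty made-up employee ids, a high-cardinality column.
employee_ids = [str(i) for i in range(100, 120)]

buckets = defaultdict(list)
for emp_id in employee_ids:
    buckets[bucket_for(emp_id)].append(emp_id)

for number in sorted(buckets):
    print(number, buckets[number])
```

Because the same key always lands in the same bucket, a query filtering on one employee id only needs to read one of the four files instead of all of them.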
Explore partitioning and bucketing in AWS Athena, using partitioned and bucketed tables for low and high cardinality fields. Build an employees table with four buckets to illustrate partitioning and bucketing.
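A table that combines both techniques might be created with a CTAS statement like the one sketched below. The choice of `department_id` as the low-cardinality partition column and `employee_id` as the high-cardinality bucket column is an assumption, as are the table names and S3 location.

```python
def partitioned_bucketed_ctas(target, source, location,
                              partition_col, bucket_col, bucket_count=4):
    """Build a CTAS that partitions on one column and buckets on another."""
    return (
        f"CREATE TABLE {target}\n"
        f"WITH (\n"
        f"  format = 'PARQUET',\n"
        f"  external_location = '{location}',\n"
        f"  partitioned_by = ARRAY['{partition_col}'],\n"
        f"  bucketed_by = ARRAY['{bucket_col}'],\n"
        f"  bucket_count = {int(bucket_count)}\n"
        f") AS\n"
        # Partition columns must come last in the SELECT list.
        f"SELECT {bucket_col}, first_name, last_name, salary, {partition_col}\n"
        f"FROM {source}"
    )


print(partitioned_bucketed_ctas(
    "employees_data.employees_bucketed",
    "employees_data.employees",
    "s3://example-bucket/employees_bucketed/",
    partition_col="department_id",
    bucket_col="employee_id",
))
```

Partitioning creates one S3 folder per distinct value, which suits columns with few values, while bucketing keeps the file count fixed no matter how many distinct values the column has.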
Insert records into the employees information table using insert statements and values. Avoid manual inserts by selecting from another table to reduce metadata in the history bucket.
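The two insert styles mentioned here can be sketched as follows. The table name `employees_info` and the sample row are made up for illustration. Each Athena INSERT writes new files to the table's S3 location, which is one reason a single INSERT ... SELECT is usually preferable to many single-row inserts.

```python
def sql_literal(value) -> str:
    """Render a Python value as a SQL literal (numbers, strings, NULL)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape single quotes by doubling them.
    return "'" + str(value).replace("'", "''") + "'"


def insert_values_sql(table: str, rows) -> str:
    """Build one INSERT statement covering all the given rows."""
    tuples = ",\n  ".join(
        "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} VALUES\n  {tuples}"


def insert_select_sql(target: str, source: str) -> str:
    """Build an INSERT that copies rows from another table."""
    return f"INSERT INTO {target}\nSELECT * FROM {source}"


# Hypothetical table and data.
print(insert_values_sql("employees_info", [(1, "O'Neil", 9000), (2, "Smith", None)]))
print(insert_select_sql("employees_info", "employees"))
```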
Delete records in AWS Athena by removing the underlying files from the S3 bucket, since Athena cannot delete individual rows from these tables with DML.
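Since a table's rows live in the S3 objects under its location, removing those objects removes the rows. The helper below is a sketch under that reading. It assumes `s3_client` is a boto3 S3 client (`boto3.client("s3")`), and the bucket and prefix in the comment are placeholders. It handles only the first page of results (up to 1,000 keys), which a real script would extend with pagination.

```python
def delete_table_data(s3_client, bucket: str, prefix: str) -> int:
    """Delete the objects under a table's S3 prefix; return how many."""
    listing = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    keys = [{"Key": obj["Key"]} for obj in listing.get("Contents", [])]
    if not keys:
        # Nothing stored under this prefix, so there is nothing to delete.
        return 0
    s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
    return len(keys)


# Example call with placeholder names (requires AWS credentials to run):
# delete_table_data(boto3.client("s3"), "example-bucket", "employees_info/")
```

The table definition stays in the catalog after this, so queries against it return no rows until new files arrive under the same prefix.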
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
In this course you will work with:
• Creating a database
• Creating tables
• Creating a table from a file
• Querying data from an S3 bucket
• CTAS (Create Table As Select)
• Partitions and bucketing
• Interacting with structured, unstructured, and semi-structured data
• Storing data in TEXTFILE, PARQUET, JSON, ORC, and AVRO formats
Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL. Most results are delivered within seconds. With Athena, there’s no need for complex ETL jobs to prepare your data for analysis. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.
Athena integrates out of the box with the AWS Glue Data Catalog, allowing you to create a unified metadata repository across various services, crawl data sources to discover schemas, populate your Catalog with new and modified table and partition definitions, and maintain schema versioning.
Amazon Athena uses Presto with ANSI SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Avro, and Parquet. Athena is ideal for interactive querying and can also handle complex analysis, including large joins, window functions, and arrays. Amazon Athena is highly available and executes queries using compute resources across multiple facilities and multiple devices in each facility. Amazon Athena uses Amazon S3 as its underlying data store, making your data highly available and durable.