
This course includes our updated coding exercises so you can practice your skills as you learn.
Explore the AWS data engineering landscape for the DEA-C01 exam, covering key services from EMR and Lake Formation to Redshift, Kinesis, Glue, and more, with a practical hands-on focus.
Access the course materials, slides, and exercise downloads for the AWS data engineer course at sundog-education.com/aws-data-engineer, with links in resources and optional updates via the mailing list.
Navigate Udemy's interface to ask questions, browse Q&A, and access transcripts and captions; optimize playback with speed and resolution controls for a smoother learning experience.
Explore data engineering fundamentals highlighted in the exam guide, covering data structures, modeling, sampling, Git, and SQL for transforming and querying data.
Master the three data types: structured, unstructured, and semi-structured. Understand their definitions, real-world examples, and how structure affects queryability and preparation for data engineering exams.
Explore the three Vs of data, volume, velocity, and variety, and how they shape batch versus real-time processing and the design of data engineering pipelines.
Compare data warehouses and data lakes to decide storage for analytics. Explain schema-on-write vs schema-on-read, ETL vs ELT, and the rise of data lakehouses.
Understand data mesh as a decentralized governance model where teams own their data and publish data products, guided by federated governance and central standards, using AWS Lake Formation and Glue.
Explore ETL and ELT from diverse sources to data warehouses or data lakes. Learn about data integrity, transformation options, and orchestrating pipelines with AWS Glue, EventBridge, and Step Functions.
Explore data sources from JDBC and ODBC to raw logs and APIs, including streaming options like Kafka and Kinesis. Compare CSV, JSON, Avro, and Parquet formats for analytics.
Explore data modeling with a star schema and its fact and dimension tables, and learn data lineage and schema evolution concepts in AWS contexts.
Apply indexing to avoid full table scans, partition data to reduce scanned volumes, and use compression formats such as gzip, lzop, bzip2, and zstandard for faster access and lower storage.
Learn data sampling techniques to create smaller, representative data sets for analysis, focusing on random sampling and stratified sampling, with notes on systematic sampling for consistent selection.
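As a rough illustration of the three sampling techniques named above, here is a minimal sketch using only the Python standard library. The function names, the `users` data set, and the `plan` field are hypothetical and chosen for this example only.

```python
import random
from collections import defaultdict

def random_sample(rows, k, seed=0):
    """Simple random sampling: every row has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(rows, k)

def stratified_sample(rows, key, fraction, seed=0):
    """Sample the same fraction from each stratum so group proportions are kept."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

def systematic_sample(rows, step, start=0):
    """Systematic sampling: take every step-th row from a fixed starting offset."""
    return rows[start::step]

# Hypothetical data: 90 "free" users and 10 "paid" users.
users = [{"id": i, "plan": "paid" if i % 10 == 0 else "free"} for i in range(100)]
sample = stratified_sample(users, key=lambda u: u["plan"], fraction=0.2)
```

With a 20% fraction, the stratified sample keeps the same 9:1 ratio of free to paid users as the full data set, which a small simple random sample does not guarantee.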
Explore data skew, including uneven partitioning and the celebrity problem, and learn monitoring and adaptive fixes like salting, sampling, and custom partitioning.
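The salting fix for the celebrity problem can be sketched as follows. This is a minimal illustration, not a production partitioner: the partition count, salt count, key format, and the use of CRC32 as the hash are all assumptions made so the example is reproducible.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical partition count
NUM_SALTS = 8       # hypothetical number of salt values

def partition_for(key: str) -> int:
    """Deterministic hash partitioner (CRC32 is used only for reproducibility)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

def salted_key(key: str, record_number: int) -> str:
    """Append a rotating salt so one hot key becomes several distinct keys."""
    return f"{key}#{record_number % NUM_SALTS}"

# 1,000 events that all share a single "celebrity" key.
events = ["celebrity"] * 1000

# Without salting, every event lands on the same partition.
unsalted = Counter(partition_for(k) for k in events)

# With salting, the same events spread over several partitions.
salted = Counter(partition_for(salted_key(k, i)) for i, k in enumerate(events))
```

The trade-off is that readers must query every salted variant of a key and combine the results.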
Explore data validation and profiling by assessing completeness, consistency, accuracy, and integrity across sources, including cross-field validation and foreign key checks, and plan for missing data.
Explore a quick SQL review of aggregation, grouping by fields, sorting, and pivoting, with examples of count, sum, avg, max, min, and the use of case statements and where clauses.
Master SQL join types—from inner and outer joins to cross join—using customer and payment examples to show on conditions, nulls, and data mismatches.
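To make the join behavior concrete, here is a small sketch that runs inner, left outer, and cross joins against an in-memory SQLite database from Python. The `customer` and `payment` tables and their rows are invented for illustration and are not the course's data set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE payment (payment_id INTEGER PRIMARY KEY,
                          customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    -- Payment 13 references a customer that does not exist (a data mismatch).
    INSERT INTO payment VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0), (13, 99, 1.0);
""")

# INNER JOIN keeps only rows that satisfy the ON condition in both tables.
inner_rows = conn.execute("""
    SELECT c.name, p.amount
    FROM customer c
    INNER JOIN payment p ON c.customer_id = p.customer_id
    ORDER BY p.payment_id
""").fetchall()

# LEFT JOIN keeps every customer; customers without payments get NULL amounts.
left_rows = conn.execute("""
    SELECT c.name, p.amount
    FROM customer c
    LEFT JOIN payment p ON c.customer_id = p.customer_id
    ORDER BY c.customer_id, p.payment_id
""").fetchall()

# CROSS JOIN pairs every customer with every payment (no ON condition).
cross_rows = conn.execute(
    "SELECT c.name, p.payment_id FROM customer c CROSS JOIN payment p"
).fetchall()
```

The orphaned payment disappears from the inner join, while the customer with no payments appears in the left join with a NULL (`None` in Python) amount.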
Master SQL regular expressions as a pattern-matching tool beyond LIKE, using the tilde (~) operator for case-sensitive matches and tilde star (~*) for case-insensitive matches, with metacharacters like caret (^), dollar ($), and pipe (|).
The following three coding exercises allow you to practice solving SQL problems in a hands-on environment!
However, we have heard feedback from some students that Udemy's SQL workspace can be unreliable at times, timing out even on simple queries. If you encounter technical problems, please let Udemy support know. You can visit https://support.udemy.com/hc/en-us/articles/229606768-Learning-With-Coding-Exercises and click on the "Contact Us" button. After some screening from a chatbot, you should be able to file a trouble ticket describing the problem.
Review of Git as a version control system with remote and local repositories, branches, and essential commands like pull, clone, add, commit, push, and merge.
Explore AWS storage technologies for data engineering, including S3, EBS, EFS, and Backup, with hands-on examples and exam-style quizzes to reinforce real-world decision making.
Explore the latest AWS console UI update with a bright white interface and rounded blue buttons, while preserving the same usability as the older gray, square-button design.
Set up a billing alarm in the AWS console to monitor monthly expenses and receive email alerts before charges exceed your budget.
Explore Amazon S3 as a scalable storage backbone, and learn about buckets, objects, keys, region-based hosting, archival with S3 Glacier, and use cases from backups to data lakes.
Create and configure an Amazon S3 bucket across regions, choose general purpose, set security options, enable S3-managed encryption, upload objects, organize into folders, and understand public versus pre-signed URLs.
Explore Amazon S3 security with user-based and resource-based controls, bucket policies, ACL options, cross-account access, public access settings, and object encryption.
Enable public access on an S3 bucket and create a bucket policy with the policy generator to allow s3:GetObject on all objects, demonstrated with the coffee.jpg image.
Enable bucket versioning in Amazon S3 to create file versions on each upload and roll back to earlier versions, with delete markers protecting against unintended deletes.
Enable bucket versioning in Amazon S3 and manage object versions with version IDs to track changes, roll back content, and perform permanent deletes.
Configure CRR or SRR in Amazon S3 by enabling versioning and granting read/write IAM permissions for asynchronous replication; use cases include compliance, lower latency, and cross-account log aggregation.
Learn that S3 replication copies only new objects once it is enabled, while Batch Replication handles existing objects. Delete markers can optionally be replicated, deletions with a version ID are not replicated, and replication is not chained across buckets.
Set up cross-region S3 replication by creating an origin and replica bucket, enabling versioning, and applying a replication rule; verify new objects, versions, and delete markers replicate.
Explore Amazon S3 storage classes from Standard to Intelligent-Tiering and the Glacier variants. Learn to assign or automate transitions with lifecycle configurations while understanding 11 nines of durability and availability by class.
Explore Amazon S3 storage classes from standard to Glacier, including Intelligent-Tiering, Standard-IA, One Zone-IA, and various Glacier tiers, and automate object transitions with lifecycle rules.
Explore the S3 Express One Zone storage class, a high-performance, single-AZ bucket type designed for latency-sensitive workloads and co-located compute, delivering up to about 10x the performance of S3 Standard with roughly 50% lower request costs.
Learn to implement Amazon S3 lifecycle rules to move objects between storage classes and expire or delete them. Use prefixes, tags, versioning, and analytics to optimize transitions.
Create and manage an Amazon S3 lifecycle rule to automate moving current and non-current object versions across storage classes, expire or delete objects, and delete markers or incomplete multi-part uploads.
Discover how S3 event notifications trigger on object events (created, removed, restored, replication), filter by prefix or suffix such as .jpg, and deliver to SNS, SQS, Lambda, or EventBridge with resource access policies.
Demonstrates setting up S3 event notifications, choosing S3 events such as object created, and routing to SQS, Lambda, or SNS, then testing with a sample upload to verify event delivery.
Explore Amazon S3 baseline performance, per-prefix throughput of 3,500 PUT and 5,500 GET requests per second, and techniques like multi-part uploads, Transfer Acceleration, and byte-range fetches.
Explore Amazon S3 object encryption: SSE-S3, SSE-KMS, SSE-C, and client-side encryption. Understand encryption in transit with HTTPS, and enforce it via bucket policies that require secure transport.
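A bucket policy that requires secure transport might look like the sketch below, built as a Python dictionary and serialized to JSON. The bucket name `example-bucket`, the statement ID, and the helper name are placeholders; the policy denies any request where the `aws:SecureTransport` condition key is false.

```python
import json

def require_tls_policy(bucket_name: str) -> dict:
    """Build a bucket policy that denies all S3 requests not sent over HTTPS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",  # placeholder statement ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                # Matches requests that did NOT use TLS.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }

# "example-bucket" is a hypothetical bucket name.
policy_json = json.dumps(require_tls_policy("example-bucket"), indent=2)
```

The resulting JSON can be pasted into the bucket policy editor; an explicit Deny takes precedence over any Allow statements.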
Create a bucket with default encryption and versioning, upload files, and explore switching between SSE-S3, SSE-KMS, and DSSE-KMS to understand encryption and key management in S3.
Learn how Amazon S3 default encryption works with SSE-S3 and how to switch to SSE-KMS. Discover how bucket policies can force encryption by denying unencrypted put requests.
Learn to use Amazon S3 access points to grant read/write access for finance and sales data, and read-only access for analytics, each with policy and DNS name for scalable security.
Explore Amazon S3 Tables as the heart of a data lakehouse, using Iceberg format for tabular data and enabling compatibility with Spark, Flink, Trino, and Hive.
Explore S3 tables replication for read-only cross-region or cross-account replicas, improve latency, centralize analytics, and meet compliance, with security, encryption, and strict access controls.
Explore S3 storage lens to analyze and optimize storage across your AWS organization, using the default dashboard and exported reports to identify cost efficiencies, data protection gaps, and usage trends.
Explore Elastic Block Store (EBS) volumes for EC2, including AZ binding, detach-and-attach capabilities, provisioning with capacity and IOPS, snapshot-based cross-AZ movement, and delete on termination for root and attached volumes.
Explore EBS volumes and their attachment to EC2 instances, including creating gp2 volumes, attaching them to the same AZ, and understanding delete on termination for root volumes.
Explore how EBS Elastic Volumes enable on-the-fly changes to size, type, and performance without downtime, including switching from gp2 to gp3 and setting IOPS and throughput.
Explore Amazon EFS, a managed NFS file system that Linux-based EC2 instances can mount across Availability Zones, with performance modes, throughput modes, storage classes, and lifecycle policies for cost-efficient data sharing.
Explore Amazon Elastic File System setup, selecting regional storage with backups and lifecycle tiers, enabling elastic throughput, and mounting the shared EFS across two EC2 instances in different Availability Zones.
Contrast EBS volumes and EFS file systems: EBS attaches to a single instance and is AZ-locked, while EFS is a network file system shared across AZs with mount targets.
Centralize and automate backups across AWS services with AWS Backup, enabling cross-region, cross-account protection, tag-based policies, scheduled or on-demand backups, and Vault Lock for write-once-read-many (WORM) protection.
Learn to create AWS Backup plans using templates, configure daily and monthly rules with retention and cold storage options, assign resources by tags, and understand backup, restore, and copy jobs.
Explore AWS database services from DynamoDB and RDS to DocumentDB, MemoryDB, Keyspaces for Apache Cassandra, Neptune, and Redshift, for data warehousing and analytics, with exam-style quizzes.
Explore DynamoDB, a fully managed NoSQL database with multi-AZ replication, auto-scaling, and low-latency performance for massive workloads; design primary keys using partition keys and sort keys.
Create and configure DynamoDB tables with partition keys and optional sort keys. Learn item creation, flexible attributes, and the concepts of provisioned capacity and secondary indexes.
See where DynamoDB fits: hot data at scale across mobile apps, gaming, log ingestion, web sessions, and metadata storage for S3 objects. Learn why it is a poor fit for joins or complex transactions.
Explain provisioned and on-demand DynamoDB capacity modes, including RCU and WCU calculations, throttling, burst capacity, exponential backoff, and the role of DAX.
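The RCU and WCU arithmetic can be sketched as below. It assumes the standard sizing rules: one RCU covers one strongly consistent read per second of an item up to 4 KB (an eventually consistent read costs half), and one WCU covers one write per second of an item up to 1 KB. The function names are made up for this example.

```python
import math

def read_capacity_units(reads_per_second, item_size_kb, strongly_consistent=True):
    """RCUs needed: item size rounds up to the next 4 KB block per read."""
    units_per_read = math.ceil(item_size_kb / 4)
    rcu = reads_per_second * units_per_read
    if not strongly_consistent:
        rcu = rcu / 2  # eventually consistent reads cost half
    return math.ceil(rcu)

def write_capacity_units(writes_per_second, item_size_kb):
    """WCUs needed: item size rounds up to the next 1 KB block per write."""
    return writes_per_second * math.ceil(item_size_kb)

# 10 strongly consistent reads/s of 4 KB items -> 10 RCUs
rcu_strong = read_capacity_units(10, 4)
# 16 eventually consistent reads/s of 12 KB items -> (16 * 3) / 2 = 24 RCUs
rcu_eventual = read_capacity_units(16, 12, strongly_consistent=False)
# 6 writes/s of 4.5 KB items -> 6 * 5 = 30 WCUs
wcu = write_capacity_units(6, 4.5)
```

Rounding up per item is the detail most often missed: a 4.5 KB write consumes 5 WCUs, not 4.5.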
Define RCU and WCU for your DynamoDB tables, compare on-demand and provisioned capacity, and use the capacity calculator with auto scaling to estimate costs and performance.
Master DynamoDB basic APIs, including PutItem, UpdateItem, conditional writes, GetItem, Query and Scan with projections and filters, batch operations, and PartiQL for SQL-style data access.
Master DynamoDB basic APIs with hands-on put, get, update, and batch delete, plus scan and query using hash key and sort key.
Learn how DynamoDB uses local and global secondary indexes to enable targeted queries, while noting creation timing and throughput implications.
Learn to create DynamoDB indexes by building a demo table with a local index at creation and a global index later, specifying partition and sort keys and projection options.
Explore PartiQL for DynamoDB, a SQL-like syntax to insert, update, select, and delete items, with batch operations and console editor demonstrations using the users and demo indexes tables.
Learn how DynamoDB Accelerator (DAX) provides a fully managed in-memory cache that solves the hot key problem with microsecond reads, TTL caching, multi-AZ clusters, and secure DynamoDB API integration.
Create and configure a DynamoDB Accelerator (DAX) cluster with chosen node types and sizes. Set up subnets, VPC, security groups, IAM roles, and monitor cache performance.
Discover how DynamoDB streams capture an ordered history of item-level changes (create, update, delete) and route them to Lambda or Kinesis for real-time analytics.
Enable DynamoDB streams on the users post table and configure a Lambda trigger with new and old images. Test updates, creates, and deletes, and review CloudWatch logs.
Enable Time to Live (TTL) in DynamoDB to automatically delete items after the epoch timestamp stored in an attribute such as expire_on. TTL deletes consume no WCUs, remove items from tables and indexes, and emit a DynamoDB Streams event.
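The TTL attribute holds a Unix epoch timestamp in seconds. A minimal sketch of computing it is shown below; the item fields other than `expire_on` and the 30-day lifetime are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def expire_on(created_at: datetime, days_to_live: int) -> int:
    """Return the expiry time as whole epoch seconds, the format TTL expects."""
    return int((created_at + timedelta(days=days_to_live)).timestamp())

# Hypothetical session item that should expire 30 days after creation.
created = datetime(2024, 1, 1, tzinfo=timezone.utc)
item = {
    "user_id": "u-123",     # hypothetical partition key
    "session": "abc",       # hypothetical attribute
    "expire_on": expire_on(created, 30),
}
```

Using a timezone-aware datetime avoids accidentally computing the timestamp in local time.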
Explore two patterns for using DynamoDB with S3: store large objects in S3 while keeping metadata in DynamoDB, enabling efficient queries and retrieval through metadata and S3 URLs.
Explore DynamoDB security, including VPC endpoints, IAM access control, encryption at rest with KMS, in transit with TLS, PITR backups, and fine-grained access via federated logins.
Explore Amazon RDS, a hosted relational database for small data, including Aurora, MySQL, PostgreSQL, and more, with ACID compliance, backups, read replicas, VPC isolation, and encryption.
Learn how relational databases use shared and exclusive locks to manage reads and writes, with explicit lock commands like FOR SHARE, FOR UPDATE, and LOCK TABLES, and understand deadlock risks.
Explore Amazon RDS best practices for operations and performance, including monitoring with CloudWatch, tuning I/O and backups, and testing failovers with proper RAM and indexing.
Amazon Aurora is a cloud-native AWS database compatible with PostgreSQL and MySQL, with auto-expanding storage up to 256 TB and fast failover.
Build and configure an Amazon Aurora database (MySQL-compatible) in production, tune instance classes, replicas, and endpoints, and explore read replica auto-scaling and global database options.
DocumentDB is AWS’s cloud-native, MongoDB-compatible NoSQL database. It is managed, highly available, with multi-AZ replication and storage that grows in 10 GB increments to handle millions of requests per second.
Amazon MemoryDB for Redis offers a Redis-compatible, durable in-memory database with ultra-fast performance, multi-AZ transaction logs, and scalable storage for web, mobile, gaming, and streaming applications.
Explore Amazon Keyspaces, a serverless, managed Cassandra service on AWS offering auto scaling, multi-AZ replication, CQL, low latency, thousands of requests per second, and on-demand and provisioned modes.
Explore Amazon Neptune, a fully managed graph database for highly connected data, like social networks. Handle billions of relationships with millisecond latency across availability zones and support knowledge graphs.
Discover the three query languages supported by Amazon Neptune (Gremlin, openCypher, and SPARQL), with example queries for graph data and exam prep.
Explore Amazon Timestream, a fully managed, serverless time series database that is fast, scalable, in-memory for recent data, cost-optimized for historical storage, with SQL compatibility and real-time analytics.
Explore Amazon Redshift, a fully managed petabyte-scale data warehouse. Experience fast OLAP analytics with MPP and columnar storage on pay-as-you-go pricing.
Query exabytes of unstructured data in S3 without loading it into Redshift, using Redshift Spectrum to join S3 data with Redshift tables and scale compute with MPP.
Learn how Redshift ensures durability with three data copies—original, cluster replica, and backups in S3—and supports vertical and horizontal scaling with seamless cutovers.
Discover Redshift distribution styles (AUTO, EVEN, KEY, and ALL) and how the leader node distributes rows across compute nodes and slices to balance load.
Learn the COPY command to load large external data into Redshift from S3. Explore UNLOAD for exporting, manifest files, encryption, compression, and VPC routing enhancements.
Explore how Redshift integrates with AWS services like S3, DynamoDB, EMR, and Glue, using COPY and VACUUM to manage data, and tune workload management (WLM) and concurrency scaling.
Learn how to resize a Redshift cluster with elastic resize for quick capacity, or use classic resize for node type changes, and apply snapshot-restore-resize to minimize downtime.
Explore Redshift RA3 decoupled compute and storage, data lake export to S3 in Parquet, cross-region sharing, and Redshift ML with SageMaker Autopilot for real-time predictions.
Explore Redshift security fundamentals, including using hardware security modules with client and server certificates, migrating to encrypted clusters, and managing access with GRANT and REVOKE commands for users and groups.
Redshift Serverless scales automatically and bills by RPUs, enabling easy spin-up for development, testing, or ad hoc analysis, with an IAM role, a VPC, JDBC/ODBC, and CloudWatch metrics.
Explore Redshift materialized views, a pre-computed, refreshable table-like structure that accelerates complex queries and dashboards, with explicit or auto refresh and the ability to stack views.
Share live, read-only data across Redshift clusters to isolate workloads and enable collaboration across groups and across development, test, and production environments via standard, AWS Data Exchange, or Lake Formation-managed datashares.
Learn how Redshift integrates AWS Lambda functions as user defined functions in SQL queries; register external functions, grant language and IAM permissions, and invoke Lambda to perform operations.
Learn how Redshift federated queries connect to RDS and Aurora databases to query live data in place, avoiding ETL, offloading computation, and expanding to data lakes via external schemas.
Explore Redshift system tables and views to monitor performance. Practice joining STL_QUERY with SVL_QLOG to analyze recent queries and compute execution time using DATEDIFF.
Run SQL against provisioned or serverless Redshift clusters with the secure Redshift Data API. Integrate via the AWS SDK, support asynchronous queries, and monitor activity with CloudTrail.
Learn how to migrate and transfer data with AWS, including tools like AWS Application Discovery Service, Database Migration Service, DataSync, Snow Family, and Transfer Family.
Plan cloud migrations with the AWS Application Discovery Service to map servers and dependencies, then rehost to AWS using the AWS Application Migration Service for minimal downtime.
Explore AWS DMS to migrate on-premises databases to AWS with continuous data replication using CDC. Understand homogeneous and heterogeneous migrations and use SCT for schema conversion when engines differ.
Explore AWS DMS from discovery and assessment to migration, including endpoint setup, schema conversion, and choosing provisioned or serverless replication for data migration or replication.
Explore how AWS DataSync synchronizes data between on-premises or other clouds and AWS storage services (S3, EFS, FSx), preserving metadata and permissions with an on-premises agent, and scheduled replication tasks.
Learn how AWS Snowball and Snowball Edge devices enable data migration and edge computing, with storage and compute optimized options and on-device processing with EC2 or Lambda.
Explore the AWS Snow Family by walking through a hands-on workflow for importing into Amazon S3 using Snowball Edge storage-optimized or compute-optimized devices, including pricing, security roles, and shipping.
Find and subscribe to third-party data with AWS Data Exchange, load datasets into S3 for analysis or SageMaker ML, and license data via Redshift and APIs.
Learn how AWS Transfer Family provides FTP, FTPS, or SFTP access to S3 or EFS via a fully managed, scalable service with pricing and Active Directory or LDAP authentication options.
Learn how computing resources power data processing and how EC2, AWS Lambda, AWS Serverless Application Model, and AWS Batch fit into data engineering, preparing you for the associate-level exam.
Explore how EC2 powers big data with on-demand, spot, and reserved instances, enables auto scaling for EMR and DynamoDB, and supports fault-tolerant checkpointing.
Explore AWS Graviton, Amazon's processor family powering EC2 general purpose, compute optimized, memory optimized, storage optimized, and accelerated computing instances with strong price performance, plus support in MSK, RDS, MemoryDB, and Fargate.
Explore how AWS Lambda enables serverless data processing, acting as glue between services like API Gateway, Cognito, Kinesis, and DynamoDB to transform and route data at scale.
Discover how AWS Lambda enables serverless code execution, with real-time file processing, ETL, stream processing, and scheduled tasks, plus triggers from S3, DynamoDB, and Kinesis.
Learn how Lambda links S3 to Amazon OpenSearch to ingest, transform, and visualize log data for near real-time analytics. Explore data pipelines, Redshift loading, and state tracking with DynamoDB.
Mount an EFS file system to Lambda in a VPC with EFS access points for fast, shared storage; compare ephemeral /tmp, Lambda layers, S3, and EFS options to choose the best fit.
Discover how AWS SAM streamlines building and deploying serverless apps with YAML templates, CloudFormation, and constructs for Lambda, API Gateway, and DynamoDB.
Learn to initialize a SAM app and deploy a hello world Lambda function behind API Gateway using the AWS SAM framework and SAM CLI.
Build a serverless API with AWS SAM and DynamoDB, including a DynamoDB table and get and post endpoints, plus local testing with SAM CLI and Docker.
Run batch jobs from Docker images with AWS Batch, which dynamically provisions EC2 or Spot instances and lets you schedule with CloudWatch Events or orchestrate via Step Functions.
Deploy and manage data processing applications in the cloud using Amazon ECS, ECR, and EKS, packaging with Docker and Kubernetes. Follow hands-on examples and reinforce learning with a quiz.
Discover how Docker containers package apps for consistent deployment across environments. Learn Docker images, repositories like Docker Hub and Amazon ECR, and AWS container services ECS, EKS, and Fargate.
Explore Amazon ECS architectures with the EC2 launch type and Fargate, including clusters, ECS agents, task roles, and EFS for persistent shared storage across AZs with ALB or NLB options.
Create and configure an Amazon ECS cluster, exploring Fargate and self-managed capacities, setting up instance roles, and provisioning an auto scaling group to register container instances for running tasks.
Create an ECS service on Fargate with an nginxdemos-hello task definition, map port 80, attach an Application Load Balancer, and scale tasks across AZs.
Amazon ECR, the Elastic Container Registry, stores and manages Docker images on AWS with private or public repositories, and integrates with ECS for EC2 image pulls.
Launch and manage Kubernetes clusters on AWS with Amazon EKS, offering EC2 or Fargate deployment, and support for managed or self-managed nodes with EBS, EFS, and FSx storage.
Create and manage an Amazon EKS cluster, configure networking and security, provision a managed node group or Fargate profile, and explore add-ons like the EBS CSI driver.
Explore AWS analytics tools such as AWS Glue, Lake Formation, Athena, EMR, OpenSearch, QuickSight, Kinesis, and Amazon Managed Streaming for Apache Kafka, and apply streaming and analytics to data lakes.
Explore AWS Glue, a serverless data catalog and crawler that discovers schema from S3 data and enables ETL with Spark for Athena, Redshift, and EMR.
Explore how AWS Glue integrates with Hive on EMR via the Glue data catalog as a metastore, enabling serverless ETL with Spark and dynamic frames.
Modify the Glue Data Catalog from ETL scripts by updating partitions and schema, or creating new tables, with the enableUpdateCatalog and updateBehavior options.
Run Glue jobs on a cron-style schedule or use job bookmarks to persist state. CloudWatch notifies on success or failure and triggers Lambda, SNS, EC2, Kinesis, or Step Functions.
Understand the Glue cost model, with per-second billing for crawlers and ETL jobs and a free tier for the Data Catalog, and review anti-patterns such as mixing multiple ETL engines, along with Spark-based streaming support.
Learn how AWS Glue flex jobs use a flex execution class. Save 30–35% with spare capacity for time-insensitive workloads, though Python shell, streaming, and machine learning jobs are not supported.
Explore AWS Glue Studio, a visual ETL tool for DAG-based workflows from sources like S3, Kinesis, Kafka, and JDBC, with transforms, partitioning, and a visual dashboard.
Explore AWS Glue data quality in Glue Studio, create manual or automatic rules with the Data Quality Definition Language, log results in CloudWatch, or fail the job.
Explore Glue DataBrew, a visual data preparation UI for pre-processing data from S3, data warehouses, or databases, with 250 transformations, nest to map, data quality rules, and security features.
Explore AWS Glue DataBrew to pre-process data with recipes, perform transformations, clean and filter, impute missing values, and scale features, then run interactive sessions or scheduled jobs on S3 data.
Explore how to handle personally identifiable information in DataBrew transformations by applying substitution, encryption (deterministic and probabilistic), masking, decryption, deletion, and hashing for secure data.
Learn how AWS Glue workflows orchestrate multi-job ETL processes inside AWS Glue, using scheduled, on-demand, or EventBridge triggers to run crawlers and jobs and update schema.
Learn how AWS Lake Formation, built on Glue, enables building and securing a data lake with data loading, transformations to Parquet or ORC, cataloging, and fine-grained access control.
Explore how Lake Formation uses data filters to implement column, row, and cell level security, configuring filters for database and table via console or API CreateDataCellsFilter.
Explore Amazon Athena, a serverless SQL interface for S3 data that queries data directly in place and supports CSV, JSON, ORC, Parquet, and Avro.
See how Athena queries S3 data via the Glue data catalog, using work groups for access and cost control, while leveraging columnar formats like ORC and Parquet to cut scans.
Improve Athena performance by using columnar formats like ORC or Parquet. Partition data in S3 by date and use the MSCK REPAIR TABLE command to add partitions later.
Enable ACID transactions in Athena by creating a table with table_type = 'ICEBERG', allowing safe concurrent row modifications and time travel with EMR, Spark, and Iceberg-compatible tools.
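Date partitioning in S3 usually means writing objects under Hive-style `key=value` prefixes, which is the layout MSCK REPAIR TABLE can discover. The sketch below builds such a key; the `logs` prefix, the file name, and the table name are hypothetical.

```python
from datetime import date

def partitioned_key(table_prefix: str, day: date, filename: str) -> str:
    """Build a Hive-style partitioned S3 key: year=YYYY/month=MM/day=DD."""
    return (
        f"{table_prefix}/year={day.year}/month={day.month:02d}"
        f"/day={day.day:02d}/{filename}"
    )

def repair_statement(table_name: str) -> str:
    """SQL to run in Athena so newly added partition folders are registered."""
    return f"MSCK REPAIR TABLE {table_name};"

key = partitioned_key("logs", date(2024, 3, 5), "part-0000.parquet")
sql = repair_statement("logs")
```

A query that filters on `year`, `month`, and `day` then scans only the matching prefixes instead of the whole table.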
Learn how Apache Iceberg enables ACID-compliant transactions on data lakes via Athena, Glue, EMR, and Spark, with schema evolution, time travel, partitioning, and efficient metadata management.
Explore Athena fine-grained access to AWS Glue Data Catalog, enabling IAM-based database and table level security for operations across regions beyond Lake Formation filters.
Learn how Apache Spark uses in-memory caching and a query optimizer to accelerate big data workloads with Spark SQL, MLlib, GraphX, and Structured Streaming on EMR and Hadoop.
Build a data lake in S3 by crawling JSON magazine reviews with Glue, infer a schema, and query in Athena to identify top reviewers.
Create a new table from a query with CTAS in Athena, optionally changing format and location (Parquet with Snappy or ORC) for more efficient queries on S3 data.
See how Spark streaming treats data as a dataset, perform one-hour window counts on S3 logs, and write results to MySQL via JDBC, with Kinesis and Redshift integration on EMR.
Explore running Apache Spark within Amazon Athena to interactively explore and prepare data in your S3 data lake via a built-in notebook, using serverless resources and API access.
Discover how Athena federated queries connect to data sources through Lambda connectors. Query sources include CloudWatch, DynamoDB, RDS, OpenSearch, and third-party databases, with Glue views, Secrets Manager, and cross-account support.
Explore Amazon EMR, a managed Hadoop framework on EC2 with master, core, and task nodes; run Spark, use EMR notebooks, and launch transient or long-running clusters.
Explore how EMR integrates with the AWS ecosystem, leveraging EC2, VPC, S3, CloudWatch, IAM, and CloudTrail, and compare HDFS and EMRFS storage for persistent data with S3.
Discover how EMR promises cost efficiency with hourly pricing and auto start/stop, and how managed scaling uses task and core nodes with EMRFS for Spark, Hive, or YARN workloads.
Learn EMR Serverless and EMR on EKS for running Spark, Hive, or Presto jobs with managed capacity. Configure initial capacity, IAM roles, and Spark parameters, and monitor with logs.
Explore how Kinesis Data Streams ingests big data with provisioned shards, while producers send records using partition keys and data blobs, and various consumers read them.
Explore how to produce data to Amazon Kinesis Data Streams using the SDK, KPL, and Kinesis Agent, plus third-party libraries, with batching, throughput management, and partition key strategies.
Learn how classic Kinesis Data Streams consumers read data with the SDK or CLI, using the KCL for coordination with DynamoDB, and delivery options like Firehose and Lambda.
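Kinesis maps each partition key to a shard by taking the MD5 hash of the key as a 128-bit integer and finding the shard whose hash key range contains it. The sketch below illustrates that idea under a simplifying assumption: the shards split the hash space evenly, which is not guaranteed after resharding. The function names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value

def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, interpreted as a 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose (assumed evenly split) range holds the key."""
    range_size = HASH_SPACE // shard_count
    return min(hash_key(partition_key) // range_size, shard_count - 1)

# Records with the same partition key always land on the same shard,
# which is what preserves per-key ordering.
first = shard_for("device-42", 4)
second = shard_for("device-42", 4)
```

This is also why a poorly distributed partition key produces hot shards: many records hash into the same range.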
Practice building a Kinesis data stream, compare on-demand and provisioned modes, scale with shards, and produce/consume data via SDK, KPL, or Lambda using the AWS CLI.
Explore Amazon Kinesis Data Streams enhanced fan-out, where SubscribeToShard pushes 2 MB per second per consumer per shard via HTTP/2 for scalable, low-latency data delivery, and its trade-offs.
Split hot shards to increase Kinesis throughput, then merge low-traffic shards to save costs; manage resharding to preserve record order, and leverage the KCL and auto scaling for capacity planning.
Handle producer-side duplicates in Kinesis Data Streams caused by network timeouts and retries. Apply deduplication with a unique record ID and ensure idempotent consumers to handle common retry scenarios.
Explore securing Kinesis Data Streams with IAM policies, encryption in transit via HTTPS endpoints, and encryption at rest with KMS. Assess VPC endpoints for private access and manual client-side encryption.
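An idempotent consumer can be sketched as follows: the producer embeds a unique ID in each record, and the consumer skips IDs it has already processed. The record shape (`record_id`, `amount`) and the in-memory set are assumptions for illustration; a real consumer would likely keep seen IDs in a durable store.

```python
class IdempotentConsumer:
    """Applies each record at most once, keyed by its unique record ID."""

    def __init__(self):
        self.seen_ids = set()  # in-memory only; an assumption for this sketch
        self.total = 0

    def process(self, record: dict) -> bool:
        """Return True if the record was applied, False if it was a duplicate."""
        if record["record_id"] in self.seen_ids:
            return False
        self.seen_ids.add(record["record_id"])
        self.total += record["amount"]
        return True

# The producer retried record "r-2" after a network timeout, so it appears twice.
records = [
    {"record_id": "r-1", "amount": 5},
    {"record_id": "r-2", "amount": 10},
    {"record_id": "r-2", "amount": 10},
]

consumer = IdempotentConsumer()
results = [consumer.process(r) for r in records]
```

Because the duplicate is skipped, the running total reflects each logical record once regardless of how many times the producer retried.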
Learn how Kinesis Data Firehose ingests data from streams, batches and optionally transforms it with Lambda, then delivers to S3, Redshift, OpenSearch, or Splunk with automatic scaling.
Troubleshoot and tune Kinesis Data Streams by addressing producer throughput, service and shard limits, hot shards, and partition key distribution, while enabling batching, retries, and enhanced fan-out for consumers.
Explore how Kinesis Data Analytics became Amazon Managed Service for Apache Flink, enabling streaming ETL and analytics via SQL or Flink APIs, with outputs to S3, Redshift, DynamoDB, or Lambda destinations.
Explore the Kinesis Analytics serverless cost model, paying only for resources consumed, with automatic scaling and schema discovery for inferred SQL columns and random cut forest for anomaly detection.
Explore Amazon MSK, a fully managed Apache Kafka service on AWS, and compare it with Kinesis while understanding its architecture, multi-AZ deployment, security, and monitoring for data engineering workloads.
Discover MSK Connect, a managed Kafka Connect service on AWS with auto-scaling workers and connectors like Amazon S3, Redshift, OpenSearch, and Debezium, enabling data flow from MSK to destinations.
Explore Amazon MSK Serverless, which provisions and scales Kafka resources automatically, lets you define topics and partitions, and applies IAM access control with a flexible pricing model.
Compare Amazon Kinesis Data Streams and Amazon MSK on limits, scaling, and security, including message size limits (1 MB vs. 10 MB), shards vs. partitions, and IAM, TLS, and Kafka ACL options.
Learn how OpenSearch Service, a fork of Elasticsearch built on Lucene, enables petabyte-scale analysis, dashboards, and fast ingestion via Kinesis; it uses indices, shards, and primary/replica replication.
Explore OpenSearch Service, comparing the fully managed domain with the serverless option, and learn security, availability, and AWS integrations for scalable search and analytics.
Explore OpenSearch storage tiers, including hot, UltraWarm, and cold, and automate index management with ISM, rollups, transforms, and cross-cluster replication using follower and leader indices for stability.
Optimize Amazon OpenSearch performance by reducing memory pressure on the JVM. Identify unbalanced shard allocations and excessive shard counts, then delete older unused indices or archive data to reclaim memory.
Learn about OpenSearch Serverless, introduced in 2023, with on-demand autoscaling, search and time series collections, OCUs for capacity, encryption with a KMS key, and a console showing managed and serverless sections.
Amazon QuickSight is AWS's cloud-powered visualization service for fast, ad-hoc analytics and dashboards, connecting to Redshift, Aurora, RDS, Athena, S3, OpenSearch, and more, with SPICE acceleration and robust security.
Understand QuickSight per-user pricing, annual and monthly options, SPICE capacity charges, QuickSight Q, embedded dashboards via the JavaScript SDK, and ML insights like anomaly detection, forecasting, and auto-narratives.
Master QuickSight calculated fields to create new metrics, apply level-aware and window calculations, and leverage pre-aggregation with pre-filter options for flexible insights.
Explore how Amazon EventBridge, AppFlow, and AWS Step Functions integrate into data pipelines to automate ETL, and review Simple Queue Service, Simple Notification Service, and Amazon MWAA as managed tools.
Explore how Amazon SQS uses queues with producers and consumers, offering standard and FIFO modes, at-least-once delivery, and scalable, low-latency messaging with batching limits.
Explore when to use Amazon Kinesis Data Streams versus Amazon SQS, contrasting multi-reader streaming with decoupled queues, retention, replay, ordering, and scaling options.
Explore dead letter queues in SQS, set maximum receives to detect failures, retain messages for 14 days, and use redrive to the source queue to debug, fix, and reprocess safely.
Configure an Amazon SQS dead-letter queue (DLQ) with DemoQueueDLQ, set retention and encryption, simulate poison pill failures, and redrive messages to the source queue.
Explore how Amazon SNS uses pub/sub to publish messages to a topic with multiple subscribers, delivering to email, SMS, HTTP endpoints, SQS, Lambda, or mobile platform endpoints.
Master the SNS and SQS fan-out pattern, using a single SNS topic to fan out to multiple SQS queues, with FIFO topics, message filtering, and cross-region delivery.
Learn to create and test an Amazon SNS topic, choose between standard and FIFO, configure access policies, and publish a test message to an email subscription.
Design and visualize multi-step workflows with AWS Step Functions, enabling external error handling, retries, and an audit history, with time delays between steps defined in JSON-based ASL.
Discover how AWS Step Functions orchestrates data pipelines with a state machine, featuring task, choice, wait, parallel, map, pass, succeed, and fail states.
Explore Amazon AppFlow, a fully managed integration service, connecting SaaS sources like Salesforce and Zendesk to destinations such as S3 and Redshift, with on-demand or scheduled transfers and data transformation.
Discover how Amazon EventBridge connects AWS services and external partners to schedule cron jobs, react to events, and route them to Lambda, SQS, SNS, or Step Functions.
Learn to set up Amazon EventBridge rules with event patterns to detect EC2 instance state changes (shutting-down or terminated) and route notifications to an SNS topic.
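As a rough illustration of the rule described above, the event pattern for EC2 state changes can be written as a Python dictionary that serializes to the JSON a rule expects. The `matches` function is a deliberately simplified local stand-in for EventBridge's matching logic (exact-value lists and nested objects only), and the sample instance ID is hypothetical.

```python
import json

# Event pattern: EC2 instances entering shutting-down or terminated.
EVENT_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["shutting-down", "terminated"]},
}


def matches(pattern, event):
    """Simplified matcher: each pattern list must contain the event's value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict):
                return False
            if not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


def sample_event(state):
    """Build a trimmed-down event with a hypothetical instance ID."""
    return {
        "source": "aws.ec2",
        "detail-type": "EC2 Instance State-change Notification",
        "detail": {"instance-id": "i-0123456789abcdef0", "state": state},
    }


print(json.dumps(EVENT_PATTERN, indent=2))
print(matches(EVENT_PATTERN, sample_event("terminated")))
print(matches(EVENT_PATTERN, sample_event("running")))
```

Fields that the pattern does not mention, such as the instance ID, are ignored, so the rule fires for any instance that reaches one of the listed states.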
Explore Amazon Managed Workflows for Apache Airflow (MWAA) and learn how it runs Python-defined DAGs in a scalable, VPC-hosted environment with S3 code, an Airflow web UI, and IAM-managed endpoints.
Explore security, identity, and compliance in data engineering, emphasizing least privilege and IAM policies to protect personally identifiable information, with AWS KMS, Secrets Manager, Macie, and EventBridge.
Learn the principle of least privilege by granting only the permissions required, with an S3 bucket access example restricted to data/reports and CSV files, and IAM Access Analyzer guidance.
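One way such a least-privilege policy might look is sketched below as a Python dictionary that serializes to IAM policy JSON. The bucket name, the statement ID, and the exact resource pattern are assumptions for illustration, not the policy used in the lecture. The `is_covered` helper only approximates IAM wildcard matching with `fnmatchcase`.

```python
import json
from fnmatch import fnmatchcase

BUCKET = "example-bucket"  # hypothetical bucket name

# Allow read access only to CSV objects under the data/reports/ prefix.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadReportCsvOnly",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/data/reports/*.csv",
        }
    ],
}


def is_covered(object_key):
    """Approximate check: does the Resource pattern cover this object key?"""
    resource = POLICY["Statement"][0]["Resource"]
    return fnmatchcase(f"arn:aws:s3:::{BUCKET}/{object_key}", resource)


print(json.dumps(POLICY, indent=2))
print(is_covered("data/reports/2024/q1.csv"))
print(is_covered("data/raw/q1.csv"))
```

Anything the statement does not explicitly allow, such as other prefixes, other file types, or write actions, stays denied by default.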
Explain data masking and anonymization to protect personally identifiable information, showing credit card masking and last four digits, and implementing masking policies in Redshift and Glue DataBrew.
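The credit card masking mentioned above can be sketched in a few lines of plain Python. This is an illustrative helper, not the Redshift or Glue DataBrew masking feature itself, and the card number is a made-up example.

```python
def mask_card_number(card_number):
    """Replace every digit except the last four with '*', keeping separators."""
    total_digits = sum(1 for ch in card_number if ch.isdigit())
    keep_from = total_digits - 4  # index of the first digit left visible
    digits_seen = 0
    masked = []
    for ch in card_number:
        if ch.isdigit():
            masked.append(ch if digits_seen >= keep_from else "*")
            digits_seen += 1
        else:
            masked.append(ch)  # spaces and dashes pass through unchanged
    return "".join(masked)


print(mask_card_number("4111-1111-1111-1234"))  # ****-****-****-1234
```

Keeping the last four digits lets support staff or users recognize a card without exposing the full number.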
Apply per-user, cryptographically secure salts to passwords before hashing to thwart rainbow table attacks and produce unique hashes. Rotate salts and verify by hashing with the user's salt using SHA-256.
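A minimal sketch of per-user salting with SHA-256, using only the Python standard library. The function names are illustrative. A single SHA-256 pass mirrors the description above; production systems commonly use a deliberately slow key derivation function instead.

```python
import hashlib
import hmac
import secrets


def hash_password(password, salt=None):
    """Return (salt, hex digest); generate a fresh random salt if none given."""
    if salt is None:
        salt = secrets.token_bytes(16)  # cryptographically secure, per user
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# Two users with the same password end up with different stored hashes.
salt_a, hash_a = hash_password("correct horse")
salt_b, hash_b = hash_password("correct horse")
print(hash_a != hash_b)
print(verify_password("correct horse", salt_a, hash_a))
print(verify_password("wrong guess", salt_a, hash_a))
```

Since each salt is random, a precomputed rainbow table built for unsalted SHA-256 hashes does not apply to any of the stored values.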
Learn how to prevent backups or replication to disallowed regions by enforcing geographic restrictions with AWS Organizations service control policies, IAM and S3 policies, and monitoring with CloudTrail and CloudWatch.
Explore IAM basics, including creating users and groups, assigning policies in JSON, and applying the least privilege principle to securely manage access to EC2, Elastic Load Balancing, and CloudWatch.
Create IAM users and groups, assign administrator access, and sign in as an IAM user using a custom account alias to avoid using the root account.
Enable multi-session support to sign into multiple AWS accounts from the browser with separate windows. Switch between account IDs in EC2 and EBS to view two accounts side by side.
Learn how IAM policies attach to groups and users, including inline and inherited policies, and master the policy structure: version, ID, statement, effect, principal, action, resource, and optional conditions.
Examine how IAM policies control access by attaching administrator and read-only permissions to users and groups. Learn to create, attach, and interpret policies using JSON and visual editors.
Implement strong password policies and enable multi-factor authentication to protect AWS accounts, using virtual MFA (Google Authenticator or Authy) or hardware security keys (YubiKey, Gemalto, SurePassID) across IAM users.
Define and enforce a password policy in the IAM console, including length and character requirements, then set up root account multi-factor authentication with an authenticator app via a QR code.
Explore IAM roles as the permissions conduit for AWS services, enabling an EC2 instance, Lambda, or CloudFormation to perform actions on your behalf using defined roles.
Learn how to create an AWS IAM role for EC2, attach policies like read-only access, verify permissions, and define trusted entities for secure EC2 access.
Learn how encryption in flight uses TLS/SSL and certificates to protect data between client and server, and explore server-side and client-side encryption with data keys and envelope encryption.
Learn how AWS KMS manages encryption keys, integrates with IAM, and audits API calls via CloudTrail; explore symmetric and asymmetric keys, key policies, cross-account use, and regional constraints.
Explore how AWS KMS manages AWS managed and customer keys, key policies, and cryptographic configurations, then encrypt and decrypt data using the CLI.
Amazon Macie uses machine learning and pattern matching to discover and protect PII in S3 buckets, reporting its findings through EventBridge and enabling SNS and Lambda integrations.
Explore how AWS Secrets Manager stores and rotates database credentials, encrypts them with KMS, automates secret generation with Lambda, and enables multi-region replication for disaster recovery.
Learn how AWS Secrets Manager securely stores, rotates, and retrieves secrets with database integrations (MySQL, PostgreSQL, Amazon Aurora, RDS) and region replication, plus flexible secret types and rotation via Lambda.
Learn how AWS WAF protects your web app at layer seven with Web ACLs, deployed on ALB, API Gateway, CloudFront, AppSync, or Cognito.
Discover how AWS Shield protects against DDoS attacks. Shield Standard provides free protection for layer 3/4 floods, while Shield Advanced adds 24/7 support and automatic layer 7 mitigation.
Learn security across AWS services with encryption in transit and at rest, IAM access controls, and private access via VPC endpoints for Kinesis, SQS, IoT, S3, and DynamoDB.
Explore AWS data security across RDS, Aurora, Lambda, Glue, VPC isolation, and security groups. Use KMS for encryption at rest, SSL for encryption in flight, and IAM authentication for databases.
Take a deep dive into security for AWS data engineering, covering EMR access control, IAM roles, Kerberos, and EMRFS encryption, plus OpenSearch Service, Redshift, Athena, and QuickSight security basics.
NEW: Updated for version 1.1 of the exam (2026) and now includes an additional practice exam
The AWS Certified Data Engineer Associate Exam (DEA-C01 or DE1-C01) is one of the most challenging associate-level certification exams you can take from Amazon Web Services, and even among the most challenging overall. Passing it tells employers in no uncertain terms that your knowledge of data pipelines is wide and deep. But even experienced technologists need to prepare heavily for this exam. This course sets you up for success by covering all of the data ingestion, transformation, and orchestration technologies on the exam and how they fit together.
Best-selling Udemy instructors Frank Kane and Stéphane Maarek have teamed up to deliver the most comprehensive and hands-on prep course we've seen. Together, they've taught over 4 million people around the world. This course combines Stéphane's depth on AWS with Frank's experience in wrangling massive data sets, gleaned during his 9-year career at Amazon itself.
The world of data engineering on AWS includes a dizzying array of technologies and services. Here is just a sampling of the topics we cover in depth:
Streaming and transforming data with Amazon Kinesis and Amazon Managed Streaming for Apache Kafka (MSK)
Queuing messages with Simple Queue Service (SQS)
Orchestrating data pipelines with Amazon AppFlow, Amazon EventBridge, AWS Step Functions, and Amazon Managed Workflows for Apache Airflow (MWAA)
Transitioning from small to big data with the AWS Database Migration Service (DMS), AWS DataSync, Snow Family, Transfer Family, and more
Storing massive data lakes with the Simple Storage Service (S3) and managing data lifecycles
Optimizing transactional queries with DynamoDB, DocumentDB, Keyspaces, and MemoryDB
Tying your big data systems together with AWS Lambda
Making unstructured data queryable with AWS Glue, Glue DataBrew, and Lake Formation
Processing data at unlimited scale with Elastic MapReduce, including Apache Spark
Applying advanced machine learning algorithms at scale with Amazon SageMaker
Searching and analyzing petabyte-scale data with Amazon OpenSearch Service (formerly Elasticsearch Service)
Querying S3 data lakes with Amazon Athena
Hosting massive-scale data warehouses with Redshift and Redshift Spectrum
Integrating smaller data with your big data, using the Relational Database Service (RDS)
Keeping your data secure with encryption, KMS, Macie, Secrets Manager, IAM, and more
Managing and governing your systems with CloudFormation, CloudTrail, CloudWatch, AWS Config, and more
Integrating data with generative AI applications using Bedrock, vector stores, Aurora, Kendra, and Knowledge Bases
Throughout the course, you'll have lots of opportunities to reinforce your learning with hands-on demos, two full-length practice exams, and additional practice questions. We'll also arm you with some valuable test-taking tips and strategies along the way.
Although this is an associate-level exam, it is one of the more challenging ones. AWS recommends having a few years of both data engineering experience and AWS experience before tackling it. This exam is not intended for AWS beginners.
You want to go into the AWS Certified Data Engineer Associate Exam with confidence, and that's what this course delivers. Hit the enroll button, and we're excited to see you in the course... and ultimately to see you get your certification!
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Instructor
My name is Stéphane Maarek, I am passionate about Cloud Computing, and I will be your instructor in this course. I teach about AWS certifications, focusing on helping my students improve their professional proficiencies in AWS.
I have already taught 3,000,000+ students and gotten 500,000+ reviews throughout my career in designing and delivering these certifications and courses!
With AWS becoming the centerpiece of today's modern IT architectures, I have decided it is time for students to learn how to be an AWS Data Analytics Professional. So, let’s kick start the course! You are in good hands!
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Instructor
Hey, I'm Frank Kane, and I'm also co-instructing this course. I spent nine years working for Amazon from the inside as a senior engineer and senior manager, and I'm best known for my top-selling courses in "big data", data analytics, machine learning, AI, Apache Spark, system design, and Elasticsearch.
I've been teaching on Udemy since 2015, where I've reached over one million students all around the world!
I've worked hard to keep this course up to date with the latest developments in AWS data engineering, and to make sure you're prepared for the latest version of this exam. Let's dive in and get you ready!
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
This course also comes with:
Lifetime access to all future updates
A responsive instructor in the Q&A Section
Udemy Certificate of Completion Ready for Download
A 30-day "No Questions Asked" Money Back Guarantee!
Join us in this course if you want to pass the AWS Certified Data Engineer - Associate Exam DEA-C01 and master the AWS platform!