
In this introduction video, you will learn the following topics:
1. Introduction and Common Terminologies
2. What does Resiliency Mean
3. Common Events versus One-time Events
4. How to Measure Availability - Request-based or Time-based
5. Fault Tolerance versus High-Availability
6. High-Availability is not Disaster Recovery!
7. Disaster Recovery Metrics (RTO/RPO)
8. Fail-over and Fail-back
9. Backup and Restore (Option 1)
10. Pilot Light (Option 2)
11. Warm Standby (Option 3)
12. Multi-region Active-Active (Option 4)
13. Cloud-based DR
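Topic 4 in the list above contrasts request-based and time-based availability. As a rough illustration only (not taken from the video), the sketch below computes both. The function names and the traffic and downtime figures are hypothetical, chosen to show that the two measures can give different answers for the same service.

```python
def request_based_availability(successful_requests: int, total_requests: int) -> float:
    """Fraction of requests that were served successfully."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return successful_requests / total_requests


def time_based_availability(downtime_minutes: float, total_minutes: float) -> float:
    """Fraction of the measurement window during which the service was up."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return (total_minutes - downtime_minutes) / total_minutes


# Hypothetical 30-day window: 1,000,000 requests with 500 failures,
# and 90 minutes of total downtime.
total_requests = 1_000_000
failed_requests = 500
minutes_in_window = 30 * 24 * 60  # 43,200 minutes

by_request = request_based_availability(total_requests - failed_requests, total_requests)
by_time = time_based_availability(90, minutes_in_window)

print(f"Request-based availability: {by_request:.4%}")
print(f"Time-based availability:    {by_time:.4%}")
```

With these assumed numbers the request-based figure is higher than the time-based one, which can happen when an outage falls in a low-traffic period.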
From midnight on October 19 to October 20, 2025, AWS's us-east-1 region faced one of its longest disruptions, lasting over 15 hours and impacting key services such as DynamoDB, EC2, and IAM. The incident also triggered cascading failures across other regions that rely on global services hosted in us-east-1.
In this video, we unpack what went wrong, how AWS managed the crisis, and the disaster recovery (DR) strategies every architect should apply to protect workloads from region-level and cross-region dependencies.
The last AWS outage exposed a critical weakness: regional independence is a myth for certain core services. We dive deep into the Control Plane vs. Data Plane architecture and identify four global dependencies, including IAM, Route 53, S3, and STS, that can quietly derail your multi-region failover. Learn the secret to Statically Stable Design and achieve true resilience.
Google Cloud accidentally deleted UniSuper's account, resulting in the loss of data and access to retirement and pension funds for over 500,000 customers. In this video, we delve into this issue and discuss disaster recovery strategies to prevent such scenarios.
In this lab, you will learn how to:
Deploy Multi-AZ Database (Highly Available)
Configure Continuous Backup and Snapshot
Create Table - Load and Query Data
Perform Failover
Configure Read Replica
In this lab, you will learn how to:
Launch Aurora Multi-AZ Configuration (Primary, Reader)
Create Table, Load and Query Data in Aurora
Perform same-region Failover
Convert to Global Database
Perform a managed failover to a Disaster Recovery region
Perform an unplanned failover to a Disaster Recovery region
Hi, and welcome to the Resilient Architectures on AWS with Practical Solutions course.
This course teaches you how to design and implement disaster recovery architectures that minimize downtime and data loss.
You'll deploy a multi-tier application and evaluate Backup and Restore, Pilot Light, Warm Standby, and Multi-Region Active-Active solutions.
Additionally, you'll learn strategies for recovering from malicious and accidental data loss, as well as regional failures.
The course also covers techniques for reducing recovery time (RTO) and data loss (RPO) using DynamoDB Global Tables and Aurora Global Database.
I am Chandra Lingam, and I am your instructor.
In this course, you will learn:
Key concepts and terminologies related to disaster recovery
The concept of "Everything Fails, All the Time" as espoused by Werner Vogels
The meaning of resiliency and availability, and how they differ
The distinction between fault tolerance and high availability and why high availability alone is not sufficient for disaster-proofing
Hands-on labs to apply the concepts learned
You will deploy a multi-tier web application using DynamoDB Global Table, Lambda, API Gateway, EC2, Elastic Load Balancer, and Route 53
Explore backup and restore options using AWS Backup, configure point-in-time recovery (PITR), schedule backups, and maintain copies in a second region
Simulate malicious data loss and corruption and learn how to recover the data
Handle the loss of a region using Backup and Restore
Configure the App in Pilot Light mode and observe how it minimizes data loss
Upgrade the infrastructure to Warm Standby and see how it helps reduce recovery time
Look at the multi-site active-active configuration for a zero-downtime solution
Learn how DR changes with relational databases such as RDS and Aurora
Configure point-in-time recovery, scheduled backups, and continuous replication
Perform both Managed Failover and Unplanned Failover to a DR Region using Aurora Global Database
This is an intermediate-level course. You need an AWS account with administrative access and should be familiar with EC2, ELB, IAM, and databases. I am looking forward to meeting you!
Happy Learning!
Chandra Lingam
Cloud Wave LLC