
In this lecture, you'll install and configure the AWS CLI and verify your setup.
You'll also get access to the course's GitHub repository, Lambda Durable Functions resources, and the official AWS documentation used throughout the course.
Discover why AWS Lambda Durable Functions exists and the problem it solves. We trace the evolution from stateless Lambda functions (no built-in state, manual retry logic, and a 15-minute hard limit), through Step Functions (powerful, but requiring you to learn a separate JSON/YAML language called ASL), and finally to Durable Functions: workflows written in plain JavaScript, with no new language to learn.
What is a Lambda Durable Function?
Hands-on from start to finish. You'll create a durable function in the AWS Console, paste in a working handler that uses withDurableExecution and context.step, invoke it, and watch the execution history in the console timeline.
Lambda Durable Functions has idempotency baked in through the --durable-execution-name flag. Submit the same execution name twice, and you get back the same execution, not two separate runs. This lecture demonstrates how the durable execution name acts as a global idempotency key for your entire workflow.
Not all invocations are equal. This lecture breaks down the difference between RequestResponse (synchronous) and Event (asynchronous) invocation for durable functions. You'll learn when synchronous invocation works (only when your total execution timeout is 15 minutes or less) and why longer workflows must use asynchronous invocation. We cover the exact CLI flags and the error you'll hit if you get it wrong: "You cannot synchronously invoke a durable function with an executionTimeout greater than 15 minutes."
One of the most common mistakes in Lambda Durable Functions is the hardest to debug. Because the handler always re-runs from the top on every replay, any code outside a step that produces a different value each time (like Date.now(), Math.random(), crypto.randomUUID(), or an API call) will return a different result on replay than it did on the first invocation, causing a NonDeterministicExecutionError. In this lecture, you'll see exactly why this breaks the checkpoint-replay model and learn the fix: move all non-deterministic code inside a context.step so the result is checkpointed and returned from cache on every subsequent replay.
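To make the replay problem concrete, here is a toy, synchronous model of checkpoint-replay. It is not the SDK: makeContext, runTwice, and the in-memory checkpoints array are hypothetical stand-ins, real steps are async and awaited, and the toy does not raise NonDeterministicExecutionError. It only shows the divergence that the real error guards against.

```javascript
const { randomUUID } = require("node:crypto");

// Toy checkpoint-replay model (hypothetical, not the SDK). Each ctx.step call gets
// a sequential ID; the first run executes the body and records its result, and a
// replay returns the recorded result without running the body again.
function makeContext(checkpoints) {
  let nextId = 0;
  return {
    step(name, fn) {
      const id = nextId++;
      if (id < checkpoints.length) return checkpoints[id]; // replay: cached result
      const result = fn();
      checkpoints.push(result); // first run: checkpoint the result
      return result;
    },
  };
}

// Broken: the ID is generated outside any step, so every replay makes a new one.
function brokenHandler(ctx) {
  const orderId = randomUUID();
  const saved = ctx.step("save-order", () => orderId);
  return { orderId, saved };
}

// Fixed: the ID is generated inside a step, so replay returns the checkpointed ID.
function fixedHandler(ctx) {
  const orderId = ctx.step("generate-id", () => randomUUID());
  const saved = ctx.step("save-order", () => orderId);
  return { orderId, saved };
}

// Runs the handler once, then replays it from the top against the same log.
function runTwice(handler) {
  const checkpoints = [];
  const first = handler(makeContext(checkpoints));
  const replay = handler(makeContext(checkpoints));
  return { first, replay };
}
```

In the broken version, the replayed handler holds a fresh orderId while the cached "save-order" result still refers to the original one, which is exactly the mismatch the checkpoint-replay model cannot tolerate.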
A subtle trap that looks completely harmless until your workflow replays. If you declare a let variable outside a step and assign to it from inside the step body, that assignment is silently lost on replay: the cached result is returned without running the body, leaving the variable undefined when the next step tries to use it. This lecture demonstrates the broken pattern (let x; await context.step('name', async () => { x = result; })) side-by-side with the correct one (const x = await context.step('name', async () => result)). The rule is simple: always return values from steps, never mutate the outer scope.
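A toy simulation of the outer-scope mutation trap, assuming a simplified replay engine: makeContext, runToCompletion, and the Suspend error are hypothetical, and ctx.wait here just ends the first invocation so the second one replays from the top. Steps are synchronous for brevity; the SDK's are async.

```javascript
// Toy checkpoint-replay model (hypothetical, not the SDK). ctx.wait suspends the
// invocation the first time it is reached; the next invocation replays from the top.
class Suspend extends Error {}

function makeContext(checkpoints) {
  let nextId = 0;
  return {
    step(name, fn) {
      const id = nextId++;
      if (id < checkpoints.length) return checkpoints[id]; // replay: body is skipped
      const result = fn();
      checkpoints.push(result);
      return result;
    },
    wait(name) {
      const id = nextId++;
      if (id < checkpoints.length) return; // wait already completed
      checkpoints.push(null);
      throw new Suspend(name); // end this invocation; resume later
    },
  };
}

// Re-invokes the handler until it finishes without suspending.
function runToCompletion(handler) {
  const checkpoints = [];
  for (;;) {
    try {
      return handler(makeContext(checkpoints));
    } catch (err) {
      if (!(err instanceof Suspend)) throw err;
    }
  }
}

// Broken: the step mutates an outer `let` and returns nothing. After the wait,
// the handler replays, the step body is skipped, and `user` stays undefined.
function brokenHandler(ctx) {
  let user;
  ctx.step("fetch-user", () => { user = { name: "Ada" }; });
  ctx.wait("cool-down");
  return ctx.step("greet", () => `Hello, ${user?.name}`);
}

// Fixed: the step returns the value, so replay hands back the checkpointed result.
function fixedHandler(ctx) {
  const user = ctx.step("fetch-user", () => ({ name: "Ada" }));
  ctx.wait("cool-down");
  return ctx.step("greet", () => `Hello, ${user.name}`);
}
```

The broken handler completes with "Hello, undefined" and the fixed one with "Hello, Ada": the only state that survives a replay is what a step returned.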
Durable workflows are just code, so regular JavaScript if/else branching works exactly as you'd expect, and this lecture shows you how. You'll build a media-type router where the workflow takes a different durable step depending on the incoming event: text content goes to Amazon Bedrock for analysis, images go to Amazon Rekognition, and unsupported types exit early without running any steps at all. Key insight: returning early before any context.step call is perfectly valid; no checkpoint is written, and no durable-operation charges are incurred.
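A sketch of the router's control flow, assuming a hypothetical mock context (makeRecordingContext) that only records which steps ran. The step bodies are placeholders, not real Bedrock or Rekognition calls, and routeMedia is an illustrative name.

```javascript
// Hypothetical mock context that only records which steps ran. The step bodies
// are placeholders for the Bedrock and Rekognition calls described above.
function makeRecordingContext() {
  const executed = [];
  return {
    executed,
    async step(name, fn) {
      executed.push(name);
      return fn();
    },
  };
}

// Plain if/else routing; each branch runs a different durable step.
async function routeMedia(event, ctx) {
  if (event.type !== "text" && event.type !== "image") {
    // Early return before any step: nothing to checkpoint.
    return { status: "unsupported", type: event.type };
  }
  if (event.type === "text") {
    const analysis = await ctx.step("analyze-text", async () => ({
      service: "bedrock",
      chars: event.body.length,
    }));
    return { status: "done", analysis };
  }
  const labels = await ctx.step("detect-labels", async () => ({
    service: "rekognition",
    key: event.key,
  }));
  return { status: "done", labels };
}
```

An unsupported event such as { type: "video" } returns without a single recorded step.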
What happens when a step throws?
By default, it retries 5 times with exponential backoff.
In the next few sections, we will go through the fundamentals of different durable operations.
Lambda Durable Functions gives you four distinct ways to pause a workflow, each designed for a different scenario.
context.wait pauses for a fixed duration (no compute charges during the pause).
context.waitForCondition repeatedly checks an external system until a condition is met (the polling pattern).
context.waitForCallback suspends the workflow indefinitely until an outside system explicitly signals it to resume.
context.invoke waits for another Lambda function to finish executing.
A deep-dive into context.waitForCondition.
In this AWS integration demo, you'll invoke Amazon Polly's StartSpeechSynthesisTask (an async API that returns immediately with a TaskId) inside a context.step, then use waitForCondition with createWaitStrategy to poll GetSpeechSynthesisTask every 5–15 seconds until the task status reaches "completed".
The full IAM policy for Polly + S3 is included.
This is the canonical pattern for any AWS service that returns a job ID first and completes asynchronously: Textract, Transcribe, Rekognition Video, Glue jobs, and more.
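Here is the shape of that job-ID polling loop as plain JavaScript. It is a sketch, not the SDK: in the real workflow waitForCondition and createWaitStrategy handle the waiting. pollUntil, waitDelaySeconds, and fakePollyTask are hypothetical names, the 1.5x growth factor is an assumption chosen to stay within the 5–15 second range above, and sleep is injected so the sketch runs instantly.

```javascript
// Hypothetical wait strategy matching the 5-15 s range above: start at 5 s,
// grow by 1.5x per attempt, cap at 15 s.
function waitDelaySeconds(attempt) {
  return Math.min(15, 5 * 1.5 ** (attempt - 1));
}

// Generic job-ID polling loop. `check` fetches the current state, `isDone` decides
// whether to stop, and `sleep` is injected so the sketch runs without real timers.
async function pollUntil(check, isDone, sleep, maxAttempts = 20) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const state = await check();
    if (isDone(state)) return state;
    await sleep(waitDelaySeconds(attempt));
  }
  throw new Error(`Condition not met after ${maxAttempts} checks`);
}

// Fake stand-in for GetSpeechSynthesisTask: reports progress, then completion.
function fakePollyTask(statuses) {
  let i = 0;
  return async () => ({ TaskStatus: statuses[Math.min(i++, statuses.length - 1)] });
}

// Treat "failed" as terminal too, otherwise a failed task would be polled forever.
const isTerminal = (s) => s.TaskStatus === "completed" || s.TaskStatus === "failed";
```

With the statuses scheduled, inProgress, inProgress, completed, the loop waits 5, 7.5, and 11.25 seconds before seeing "completed". Checking for "failed" as well as "completed" is the design choice that keeps a broken job from consuming the workflow's whole timeout.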
waitForCallback is how you integrate with anything outside AWS: a human reviewer, a third-party webhook, a payment gateway, or a mobile app.
Callbacks Using the waitForCallback Composite Operation.
With a callback waiting live, this lecture walks through sending its resolution from the command line.
The first capstone project, pulling together everything from Section 4. You'll build a complete human-in-the-loop pipeline.
The context.invoke operation lets you start another Lambda function and wait for it to finish. The durable function checkpoints and suspends (with no compute charges) until the invoked function finishes.
Sequential steps mean the total workflow time is the sum of every step's duration. This lecture introduces the concurrency model in Lambda Durable Functions: how multiple operations launch within the same Lambda invocation.
context.parallel runs several branches concurrently, where each branch is a different function doing a different job. You'll build an order pre-check workflow that simultaneously verifies inventory, validates payment, and confirms shipping availability, then collects all three results with result.getResults().
Key concepts: each branch receives its own isolated child context (ctx), branch functions are defined inline or as named functions, and all branches start in the same Lambda invocation.
You'll see the execution history showing all three steps running concurrently rather than one after another.
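A toy version of the order pre-check, assuming a hypothetical parallel helper built on Promise.all rather than the SDK's context.parallel (which returns a result object with getResults(); this sketch returns a plain array). The sleep calls are fake latencies, and all branches report into one shared finished list so the sketch can show completion order.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical child context handed to each branch. All branches share the
// `finished` list so the sketch can show the order in which steps completed.
function makeChildContext(finished) {
  return {
    async step(name, fn) {
      const result = await fn();
      finished.push(name);
      return result;
    },
  };
}

// Toy stand-in for context.parallel: all branches start at once, and results come
// back in branch order regardless of which branch finishes first.
async function parallel(branches) {
  const finished = [];
  const results = await Promise.all(branches.map((branch) => branch(makeChildContext(finished))));
  return { results, finished };
}

async function preCheckOrder(order) {
  return parallel([
    (ctx) => ctx.step("check-inventory", async () => { await sleep(30); return order.qty <= 5; }),
    (ctx) => ctx.step("validate-payment", async () => { await sleep(10); return order.cardValid; }),
    (ctx) => ctx.step("check-shipping", async () => { await sleep(20); return order.country === "US"; }),
  ]);
}
```

The steps finish in the order validate-payment, check-shipping, check-inventory, yet the results stay in branch order, and the total time is roughly the slowest branch rather than the sum of all three.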
context.parallel has a completionConfig option that controls how failures are handled across branches.
context.map applies the same operation to every item in an array concurrently, with each item getting its own isolated child context and its own named step.
You'll process an array of three orders (shoes, shirt, jacket) in parallel, with each item fulfilled in fulfill-0, fulfill-1, fulfill-2 steps that you can track individually in the execution history.
You'll learn the difference between parallel (heterogeneous branches, fixed list) and map (homogeneous operation, dynamic array).
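To contrast with the parallel sketch, here is a toy map: one mapper applied to a dynamic array. The map helper and its (ctx, item, index) mapper signature are assumptions for illustration, not the SDK's exact context.map signature; the fulfill-N step names follow the lecture.

```javascript
// Toy stand-in for context.map: runs the same mapper on every item concurrently,
// giving each item its own (hypothetical) child context. Completed step names are
// recorded in a shared history list, like entries in the execution history.
function makeChildContext(history) {
  return {
    async step(name, fn) {
      const result = await fn();
      history.push(name);
      return result;
    },
  };
}

async function map(items, mapper) {
  const history = [];
  const results = await Promise.all(
    items.map((item, index) => mapper(makeChildContext(history), item, index)),
  );
  return { results, history };
}

const orders = [{ sku: "shoes" }, { sku: "shirt" }, { sku: "jacket" }];

// Same operation for every item; the index makes each step name unique.
function fulfillAll(items) {
  return map(items, (ctx, order, index) =>
    ctx.step(`fulfill-${index}`, async () => ({ sku: order.sku, status: "fulfilled" })));
}
```

Adding a fourth order requires no code change, which is the practical difference from parallel, where each branch is written out by hand.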
Project and Code Walkthrough
context.runInChildContext groups multiple steps and wait operations under a single named logical unit, like a sub-workflow with its own isolated checkpoint counter. On replay, the entire child context is replayed as one atomic unit rather than step by step, making it more efficient and keeping execution history clean and collapsible. This lecture covers two reasons you'd reach for it directly.
First: grouping. You'll build an order processor where validation and charging run inside a process-order child context, the result surfaces as a single entry in the execution timeline, and the parent just receives the final return value.
Second: concurrency correctness. The replay model assigns sequential IDs to operations in the order they are called, but concurrent branches can resolve in a different order on each run. When two branches each have multiple chained steps sharing the parent counter, the IDs get mismatched on replay, and each step gets the wrong cached result. Wrapping each branch in its own child context gives it an isolated counter, so the parent only tracks two IDs (one per branch) regardless of internal execution order. The payoff at the end: context.parallel and context.map do this automatically, so you never need to call runInChildContext manually for parallel work.
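The ID mismatch can be reproduced in a few lines. This is a toy: the "<branch>/<n>" ID format and the sharedIds, childIds, and replay helpers are hypothetical, not the SDK's internal scheme. Two branches, A and B, each run a fetch and a save, and the replay sees a different call order than the first run.

```javascript
// Toy illustration of operation IDs (hypothetical scheme, not the SDK's format).
// Steps are named "<branch>.<step>"; the call order differs between runs because
// concurrent branches resolve in different orders.

// One counter shared by all branches: the ID depends on global call order.
function sharedIds(callOrder) {
  return Object.fromEntries(callOrder.map((name, i) => [name, i]));
}

// One counter per branch (child context): the ID depends only on the branch's own order.
function childIds(callOrder) {
  const counters = {};
  return Object.fromEntries(callOrder.map((name) => {
    const branch = name.split(".")[0];
    counters[branch] = (counters[branch] ?? 0) + 1;
    return [name, `${branch}/${counters[branch] - 1}`];
  }));
}

// Stores each step's result under its first-run ID, then looks results up by the
// IDs assigned during replay, the way a replay reads cached checkpoints.
function replay(assignIds, firstRun, replayRun) {
  const firstIds = assignIds(firstRun);
  const store = {};
  for (const name of firstRun) store[firstIds[name]] = `result of ${name}`;
  const replayIds = assignIds(replayRun);
  return Object.fromEntries(replayRun.map((name) => [name, store[replayIds[name]]]));
}

const firstRun = ["A.fetch", "B.fetch", "A.save", "B.save"];
const replayRun = ["B.fetch", "A.fetch", "B.save", "A.save"];
```

With the shared counter, B.fetch receives "result of A.fetch" on replay; with per-branch counters, every step gets its own result back.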
What actually happens when a step throws?
This lecture walks through the default error behavior.
Add a retryStrategy to any step and the SDK handles retry logic for you, no try/catch, no manual loops. This lecture covers createRetryStrategy with all its options: maxAttempts, initialDelay, maxDelay, backoffRate, and JitterStrategy.FULL (randomizes delay to prevent thundering herd).
You'll also learn the two ways to filter which errors are retryable: retryableErrorTypes (uses instanceof for class-based errors like NetworkError) and retryableErrors (matches on error message substrings or regex like /timeout/i). Non-matching errors bypass the strategy entirely and fail immediately.
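A minimal sketch of how those options could fit together, written as a plain function rather than the SDK's createRetryStrategy. Everything here is an assumption for illustration: makeRetryStrategy is a hypothetical name, fullJitter stands in for JitterStrategy.FULL, delays are in seconds, attemptCount means attempts made so far, and the backoff formula (initialDelay * backoffRate^(attemptCount - 1), capped at maxDelay) is a common choice, not necessarily the SDK's.

```javascript
// Hypothetical re-implementation of the options described above, for illustration
// only: the SDK's exact formula, defaults, and option names may differ. `fullJitter`
// stands in for JitterStrategy.FULL, and `random` is injectable for testing.
function makeRetryStrategy({
  maxAttempts = 3,
  initialDelay = 1, // seconds
  maxDelay = 60, // seconds
  backoffRate = 2,
  fullJitter = false,
  retryableErrorTypes = [], // matched with instanceof
  retryableErrors = [], // message substrings or RegExps
  random = Math.random,
} = {}) {
  const hasFilter = retryableErrorTypes.length > 0 || retryableErrors.length > 0;
  const matches = (error) =>
    retryableErrorTypes.some((Type) => error instanceof Type) ||
    retryableErrors.some((p) => (p instanceof RegExp ? p.test(error.message) : error.message.includes(p)));

  // attemptCount = attempts made so far (1 after the first failure).
  return (error, attemptCount) => {
    if (attemptCount >= maxAttempts) return { shouldRetry: false };
    if (hasFilter && !matches(error)) return { shouldRetry: false }; // non-matching: fail now
    const capped = Math.min(maxDelay, initialDelay * backoffRate ** (attemptCount - 1));
    // Full jitter picks a random delay between 0 and the capped backoff.
    return { shouldRetry: true, delay: fullJitter ? random() * capped : capped };
  };
}
```

For example, with maxAttempts 4, initialDelay 2, backoffRate 3, and maxDelay 10, a NetworkError is retried after 2, 6, and 10 seconds and then fails; an error whose message matches /timeout/i is also retried, and anything else fails on the first attempt.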
createRetryStrategy covers most cases, but sometimes you need different logic per error type and per attempt count at the same time.
A custom retry strategy is just a function (error, attemptCount) => { shouldRetry, delay }. This lecture builds one from scratch.
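A minimal custom strategy following the (error, attemptCount) => { shouldRetry, delay } shape. RateLimitError and ValidationError are hypothetical error classes, and the specific limits (6 attempts at 30 s for throttling, 1 s then 5 s otherwise) are illustrative choices. Delays are plain seconds here; the SDK's delay type may be a duration object instead.

```javascript
// Hypothetical error classes for illustration.
class RateLimitError extends Error {}
class ValidationError extends Error {}

// A custom strategy: (error, attemptCount) => { shouldRetry, delay }.
// Delays are plain seconds here; check the SDK's type for `delay`, which may be a
// duration object rather than a number.
function customRetryStrategy(error, attemptCount) {
  // Bad input will not fix itself, so fail immediately.
  if (error instanceof ValidationError) return { shouldRetry: false };

  // Throttling: back off for a fixed 30 s and allow up to 6 attempts in total.
  if (error instanceof RateLimitError) {
    return attemptCount < 6 ? { shouldRetry: true, delay: 30 } : { shouldRetry: false };
  }

  // Everything else: a quick first retry, a slower second one, then give up.
  if (attemptCount >= 3) return { shouldRetry: false };
  return { shouldRetry: true, delay: attemptCount === 1 ? 1 : 5 };
}
```

Because it is an ordinary function, a strategy like this can be unit-tested directly without running a workflow.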
The SDK ships two built-in shortcuts so you don't always have to configure from scratch.
The most important concept for non-idempotent operations.
By default, context.step uses AtLeastOncePerRetry. If Lambda crashes mid-step with no checkpoint saved, the step re-executes on replay. Safe for idempotent operations, dangerous for payments or SMS.
StepSemantics.AtMostOncePerRetry changes this: the SDK writes a START checkpoint before running the step body, so on replay it sees the START, skips re-execution, and throws StepInterruptedError instead.
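A toy simulation of the two semantics, assuming a hypothetical step record with a status field and a simulated crash after the side effect but before the result checkpoint. The semantics are plain strings here rather than the SDK's StepSemantics enum, and StepInterruptedError is a local stand-in class.

```javascript
// Toy stand-in for the SDK error of the same name.
class StepInterruptedError extends Error {}

// Toy model of step semantics (hypothetical, not the SDK). `record` is the step's
// checkpoint; `crash` simulates Lambda dying after the side effect but before the
// result checkpoint is written.
function runStep(record, semantics, body, crash = false) {
  if (record.status === "SUCCEEDED") return record.result; // replay: cached result
  if (record.status === "STARTED") {
    // Only AtMostOncePerRetry writes START, so only it can land here.
    throw new StepInterruptedError("Step started earlier but never finished");
  }
  if (semantics === "AtMostOncePerRetry") record.status = "STARTED"; // START checkpoint first
  const result = body();
  if (crash) throw new Error("Simulated crash before the result checkpoint");
  record.status = "SUCCEEDED";
  record.result = result;
  return result;
}

// Runs a "charge the card" step, crashes mid-step, then replays it.
function simulate(semantics) {
  let charges = 0;
  const chargeCard = () => ++charges;
  const record = {};
  try {
    runStep(record, semantics, chargeCard, true);
  } catch {
    // The first invocation died; the next one replays the step.
  }
  try {
    runStep(record, semantics, chargeCard);
    return { charges, outcome: "re-executed" };
  } catch (err) {
    if (err instanceof StepInterruptedError) return { charges, outcome: "interrupted" };
    throw err;
  }
}
```

Under AtLeastOncePerRetry the card is charged twice; under AtMostOncePerRetry it is charged once and the replay surfaces StepInterruptedError, leaving your code to decide how to reconcile.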
The four-scenario live experiment that makes semantics concrete
Checkpoints are stored as stringified JSON, and JSON does not know about JavaScript classes.
When a step returns an instance of your Order class, the checkpoint saves a plain JSON object. On replay, the SDK reads that JSON back, but the result is now a plain object: the data fields are still there, but the label() method and the Order prototype are gone.
Calling order.label() throws TypeError: order.label is not a function. This lecture demonstrates the problem live: an Order class with a label() method, returned from a step, works perfectly on the first invocation and crashes on replay. This is the exact scenario where SerDes is required.
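The problem needs no SDK to reproduce; a JSON round-trip is enough. The Order fields (id, item, qty) and the label() format are illustrative assumptions, and roundTrip stands in for what checkpointing does to a step result.

```javascript
class Order {
  constructor(id = "", item = "", qty = 0) {
    this.id = id;
    this.item = item;
    this.qty = qty;
  }
  label() {
    return `${this.id}: ${this.qty} x ${this.item}`;
  }
}

// What a checkpoint round-trip does to the value: stringify, store, parse.
const roundTrip = (value) => JSON.parse(JSON.stringify(value));

const original = new Order("o-1", "sneakers", 2);
const restored = roundTrip(original);
```

original.label() returns "o-1: 2 x sneakers", while restored.id is still "o-1" but restored.label() throws a TypeError because restored is no longer an Order.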
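A SerDes is an object with two methods, serialize and deserialize, and you supply the logic for both: serialize turns a step result into the string that gets checkpointed, and deserialize turns that string back into a value on replay.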
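A hand-written SerDes for the Order class, as a sketch. orderSerdes is a hypothetical name, and the methods are synchronous for brevity; the SDK's Serdes signatures may differ (for example, async methods or an extra context argument), so check its type definitions before copying this shape.

```javascript
class Order {
  constructor(id, item, qty) {
    this.id = id;
    this.item = item;
    this.qty = qty;
  }
  label() {
    return `${this.id}: ${this.qty} x ${this.item}`;
  }
}

// Hand-written SerDes for Order: serialize picks the fields to checkpoint,
// deserialize rebuilds a real Order so its methods exist again after replay.
// Synchronous for brevity; the SDK's Serdes signatures may differ (e.g. async).
const orderSerdes = {
  serialize(order) {
    return JSON.stringify({ id: order.id, item: order.item, qty: order.qty });
  },
  deserialize(data) {
    const { id, item, qty } = JSON.parse(data);
    return new Order(id, item, qty);
  },
};
```

Listing fields explicitly in serialize also controls exactly what reaches the checkpoint store, which matters if the class carries transient or sensitive properties.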
Writing serialize/deserialize manually for every class gets repetitive. createClassSerdes(MyClass) is the SDK's built-in shortcut. It generates the SerDes for you by calling new MyClass() and using Object.assign to copy parsed JSON fields onto the instance.
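A toy re-implementation of the approach described above, to show why it works and where it stops. classSerdes is a hypothetical name, not the SDK's createClassSerdes itself, and it is synchronous for brevity.

```javascript
class Order {
  // Defaults let the SerDes call `new Order()` with no arguments.
  constructor(id = "", item = "", qty = 0) {
    this.id = id;
    this.item = item;
    this.qty = qty;
  }
  label() {
    return `${this.id}: ${this.qty} x ${this.item}`;
  }
}

// Toy re-implementation of the approach described above (not the SDK's code):
// create an empty instance, then copy the parsed JSON fields onto it.
function classSerdes(Cls) {
  return {
    serialize: (value) => JSON.stringify(value),
    deserialize: (data) => Object.assign(new Cls(), JSON.parse(data)),
  };
}

const orderSerdes = classSerdes(Order);
```

Two consequences of this design: the constructor must be callable with no arguments and without side effects, and only the top-level object regains its class. Nested class instances and Date fields come back as plain objects and strings.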
A SerDes helper that preserves Date objects, so Date methods still work after replay.
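JSON.stringify turns a Date into an ISO-8601 string, and JSON.parse leaves it as a string. Here is a minimal helper, assuming you list the date fields explicitly. dateSerdes and the shipment fields are hypothetical names.

```javascript
// Hypothetical SerDes helper: JSON turns Date objects into ISO-8601 strings, so
// deserialize converts the listed fields back into Date instances.
function dateSerdes(dateFields) {
  return {
    serialize: (value) => JSON.stringify(value),
    deserialize: (data) => {
      const obj = JSON.parse(data);
      for (const field of dateFields) {
        if (obj[field] != null) obj[field] = new Date(obj[field]);
      }
      return obj;
    },
  };
}

const shipmentSerdes = dateSerdes(["createdAt", "deliverBy"]);
```

Naming the fields is a deliberate choice over a regex-based JSON reviver: it avoids turning an ordinary string that merely looks like a date into a Date object.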
SerDes is not just for class restoration. The serialize method can return any string, which means you can run the data through any transformation before checkpointing it. This lecture uses Node's built-in zlib to gzip-compress a 500-row Report object before it hits the checkpoint store: serialize runs gzipSync(json).toString('base64'), deserialize runs gunzipSync(Buffer.from(data, 'base64')).
Large step results (reports, API payloads, document content) can be compressed 60–80% before storage, directly cutting checkpoint storage costs.
The same pattern extends to encryption, S3 offload for very large payloads, or any custom wire format your downstream system requires.
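A runnable sketch of the compressing SerDes described above, using Node's built-in zlib. gzipSerdes and the report's row fields are illustrative assumptions; the methods are synchronous for brevity.

```javascript
const zlib = require("node:zlib");

// Compressing SerDes: JSON -> gzip -> base64 string on the way in, and the
// reverse on the way out.
const gzipSerdes = {
  serialize: (value) => zlib.gzipSync(JSON.stringify(value)).toString("base64"),
  deserialize: (data) => JSON.parse(zlib.gunzipSync(Buffer.from(data, "base64")).toString("utf8")),
};

// Hypothetical 500-row report with repetitive fields, which compress well.
const report = {
  rows: Array.from({ length: 500 }, (_, i) => ({ id: i, region: "eu-west-1", status: "OK", amount: i * 10 })),
};
const plain = JSON.stringify(report);
const packed = gzipSerdes.serialize(report);
```

Base64 expands the compressed bytes by about a third, so very small results may not shrink at all; the savings come from large, repetitive payloads like this report.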
When a distributed workflow touches multiple external systems, a failure midway through leaves everything in an inconsistent state. The Saga pattern solves this by building a compensation stack as steps succeed: each successful step pushes a corresponding undo operation. If any step throws, the catch block runs all compensations in reverse order.
This lecture walks through the Saga pattern demo.
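The compensation stack itself is ordinary JavaScript. Here is a minimal sketch, assuming each step is described by a hypothetical { name, action, compensate } object. In a durable function, each action and each compensation would typically be its own context.step so progress is checkpointed; this sketch uses plain async functions.

```javascript
// Runs steps in order. Each success pushes its compensation onto a stack; if a later
// step throws, the compensations run in reverse (LIFO) order. In a durable function,
// each action and compensation would typically be its own context.step; this sketch
// uses plain async functions.
async function runSaga(steps, log) {
  const compensations = [];
  try {
    for (const { name, action, compensate } of steps) {
      await action();
      log.push(`done:${name}`);
      compensations.push(async () => {
        await compensate();
        log.push(`undo:${name}`);
      });
    }
    return "committed";
  } catch (err) {
    log.push(`failed:${err.message}`);
    // Undo in reverse order: the most recent successful step is undone first.
    while (compensations.length > 0) await compensations.pop()();
    return "compensated";
  }
}
```

If reserve-stock and charge-payment succeed and book-shipping throws, the log ends with undo:charge-payment followed by undo:reserve-stock. Note that a compensation that itself throws stops this loop, so in production the compensations need their own retry strategy.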
AWS Lambda Durable Functions: From Zero to Hero
You're building a workflow that includes payment confirmation, human approval, and an external API call that takes hours. On regular Lambda, you reach for SQS, DynamoDB state tables, and Step Functions JSON, and end up with five services stitched together with glue code that breaks in ways you can't predict.
Until now, coordinating multi-step asynchronous processes on AWS meant manually stitching together SQS queues, maintaining custom state tables in DynamoDB, or wrestling with massive, unreadable Step Functions JSON/YAML definitions. Worse, you were constantly fighting the hard 15-minute execution limit of standard stateless Lambda functions.
AWS Lambda Durable Functions change everything.
This new execution model lets you write long-running, stateful workflows that can run for up to one full year entirely in code. Workflows automatically pause while waiting for payments, human approvals, callbacks, or external events, then resume exactly where they left off without losing state. They survive failures, handle retries natively, and incur no compute charges while suspended.
No orchestration glue code. No custom state tables. No workflow spaghetti. Just durable serverless workflows written using the Lambda programming model you already know.
This course takes you from the fundamentals of Durable Functions to building resilient, production-ready serverless systems.
Why Learn From This Course?
This isn't a shallow overview of the documentation.
The curriculum is built by an AWS-certified engineer whose AWS courses have been featured on freeCodeCamp and who contributed bug fixes directly to the official AWS Durable Functions SDK and docs.
You'll learn how Durable Functions behave in real-world environments, inspect execution histories and CloudWatch logs, understand replay behavior, troubleshoot failures, implement retries, and apply production-grade engineering patterns that go far beyond simple demos.
What You Will Build
QuantaSneaks Drop E-Commerce System: Build a distributed sneaker drop platform that coordinates multiple Lambda functions, manages workflow state, integrates AI risk scoring, and implements a real human-in-the-loop approval process.
When an order is rejected, the Saga pattern automatically compensates the payment.
What You'll Learn
• Checkpoint & Replay Internals
• Durable Operations & Workflow Design
• waitForCondition, waitForCallback & Heartbeats
• Parallel Execution & Map Operations
• Retry Strategies & Failure Recovery
• Idempotency & Execution Semantics
• Saga Pattern & Distributed Transactions
• Human-in-the-Loop Workflows
• Testing Durable Functions
• CloudWatch Observability & Execution History
• Infrastructure as Code with AWS CDK
Requirements
• Basic AWS knowledge (Lambda, IAM, CloudWatch)
• JavaScript fundamentals
• AWS Account (Free Tier is sufficient)
If you want to master one of AWS's most powerful new serverless capabilities, this course is for you. The window to learn it before it becomes mainstream is right now. Enroll and get ahead of it.