
This course includes our updated coding exercises so you can practice your skills as you learn.
See a demo
Man reist ja nicht um anzukommen, sondern um zu reisen
AI Engineer Bootcamp 1337 — Learn to Build Real AI Agents
3 free modules. A story-driven journey. And the price only goes up from here.
What you get right now — for free:
Module 1 — Python Foundations — everything you need to start building
Module 2 — LangChain — chains, RAG, memory, MCP, and a full AI assistant
Module 3 — LangGraph — stateful agents, HITL, time travel, multi-agent systems
This isn't just another coding course.
Every lesson is part of a narrative. You're not watching tutorials — you're living a story, solving real problems, building toward something bigger.
Coming soon (and the price rises with each new module):
Microsoft AutoGen / AG2 — multi-agent teams, GroupChat, CaptainAgent
Google ADK + Agent2Agent Protocol (A2A) — production-grade agents that talk to each other across frameworks
DSPy — and much more
Early access = lowest price. The longer you wait, the more you pay.
"I'm genuinely happy you chose this course — and truly grateful for any support you give it."
[Enroll now while it's free →]
Overview: A narrative introduction set in the year 2049—the dawn of Artificial General Intelligence (AGI).
Key highlights:
Follow the story of Triangle, a young developer from New Seoul-7 facing the overnight collapse of the junior tech market.
Discover the origins of his coding journey, starting from a foundational Python course he took at age 13.
Set the stage for "CTF Matrix Hack"—the exact course that laid the groundwork for everything to come.
A narrative introduction to the course. Triangle—a developer who wants to be on the side that patches vulnerabilities, not exploits them—discovers that every path in cybersecurity starts with one fundamental step: learning to code.
Key Highlights:
Direct Access: One author. No team, no marketing—your feedback goes straight to the source.
Community-Driven: Your comments, questions, and insights directly shape future lessons.
Leave Your Mark: Rate the course, drop a comment, ask questions, and share the link.
The first real lesson of "CTF Matrix Hack." Triangle opens the terminal and starts from absolute zero.
Key Highlights:
Variables: Name tags pointing to objects in memory, not physical boxes.
Four Basic Types: int, float, str, and bool.
Operators: Arithmetic, comparison, and logical operations.
Interaction & Formatting: Using f-strings for clean output and input() for user interaction.
Anti-Pattern: == vs is—a classic trap every beginner falls into.
Extra Deep Dive: Integer caching, the floating-point trap, None, inf, nan, and the mysterious ... (ellipsis).
Module 01 — Prologue | Lesson 1: Python Basics
This lesson is your first step into the Matrix. Before building AI agents, writing RAG pipelines, or orchestrating multi-agent systems — you need to speak the language. This lesson covers the Python fundamentals that every AI engineer relies on daily.
What you'll learn:
Variables as references — not boxes. Understand how Python actually stores objects in memory and why this matters when your code scales
Core data types — int, float, str, bool — with a practical cheat sheet and type casting rules
f-strings — the clean, fast, readable way to format output (and why concatenation is a bad habit)
Arithmetic and logical operators — including the tricky ones: //, %, **, and operator precedence
User input — how input() always returns a string and how to safely convert it
== vs is — one of the most common beginner mistakes, explained with memory diagrams
Extra deep dives for the curious:
CPython Integer Caching — why 256 is 256 is True but 257 is 257 is False
Float precision trap — why 0.1 + 0.2 != 0.3 and how to fix it with math.isclose()
Special values: None, float('inf'), float('nan')
Hands-on exercises:
Easy — Signal Decoding — variables, arithmetic, f-strings, boolean logic
Medium — Temperature Converter — input(), type casting, formulas, try/except
Hard — Hacking Calculator — full mini-app with loops, error handling, and type inspection
By the end of this lesson you'll have a solid mental model of how Python manages memory — a foundation that will matter when we get to agents, chains, and production systems.
Triangle learns how to make code think and automate repetitive tasks—the absolute foundation of every hacking script.
Key Highlights:
if / elif / else: Branching logic, ternary operators, and evaluation order.
Truthy and Falsy: What Python actually treats as False (spoiler: it's not just False).
The for loop: Iterating ranges, strings, and lists; using enumerate for index and value.
The while loop: Running until a condition breaks; avoiding infinite loop traps.
break and continue: Taking full control of your loop flow.
Anti-Pattern: Never write if is_active == True—it's a massive red flag.
Extra: The little-known for...else and while...else constructs for cleaner search patterns without flag variables.
Module 01 — Prologue | Lesson 2: Conditionals and Loops
The node is locked. To break through, your code needs to make decisions and repeat actions automatically. This lesson covers the control flow tools that form the backbone of every Python program — and every AI agent built on top of it.
What you'll learn:
if / elif / else — how Python evaluates conditions top to bottom and stops at the first match, with real branching logic examples
Ternary operator — writing simple conditions in a single readable line
Truthy and Falsy values — why 0, "", [], {}, and None all behave like False in conditions, and how to write idiomatic checks
The for loop — iterating over lists, strings, and ranges, plus enumerate for clean index access
The while loop — running until a condition is met, with proper exit logic to avoid infinite loops
break and continue — taking control mid-loop: stop searching when found, skip what doesn't fit
Common anti-patterns covered:
if x == True — why this signals inexperience and how to fix it
range(len(list)) — the clunky way to loop with an index, and why enumerate is better
if len(data) == 0 — versus the idiomatic if not data
Extra deep dives:
for...else and while...else — the hidden loop construct almost nobody uses, but should
try...else — separating dangerous code from safe continuation cleanly
Hands-on exercises:
Easy — Access Code Check — for loop with nested conditions, boolean logic
Medium — Password Hack — while loop with attempt counter, break on match
Hard — Password Generator — random, strength validation, while loop for regeneration
By the end of this lesson your code will stop running top to bottom like a script and start making decisions like a program. Next up — lists and tuples.
Triangle digs into the Matrix's data core and learns the two most essential sequence types in Python.
Key Highlights:
Lists: Ordered, mutable collections; append, pop, remove, sort.
Indexing & Slices: [start:stop:step], negative indices, reversing.
Tuples: Immutable sequences; when to use [] vs (); unpacking and value swapping.
List Comprehension: Filter and transform in a single readable line.
Anti-Pattern: Never mutate a list while iterating over it.
Extra: sort() vs sorted(), tuple unpacking in loops, print(*list, sep='\n').
Module 01 — Prologue | Lesson 3: Lists and Tuples
Data is flowing through the network — packets, coordinates, signal codes. To work with collections of data in Python, you need two fundamental structures: the list and the tuple. This lesson covers how they work under the hood, when to use each, and the patterns that separate clean code from amateur mistakes.
What you'll learn:
Lists — ordered, mutable collections with O(1) index access; how append, remove, pop, and sort behave under the hood and where they're fast or slow
Indexing and slicing — start:stop:step syntax, negative indices, reversing with [::-1], and extracting subsets without loops
Tuples — immutable sequences for fixed data: coordinates, configs, multiple return values, and dictionary keys
List comprehension — filtering and transforming collections in a single readable line, without manual loops
Unpacking — assigning tuple elements to variables in one line, including the a, b = b, a swap pattern
When to use which:
Use a list when the collection grows, shrinks, or gets sorted — logs, queues, results
Use a tuple when the data is fixed and shouldn't change — coordinates, function return values, hashable keys
Anti-patterns covered:
Mutating a list during iteration — why elements get skipped and how comprehension solves it cleanly
result = original.sort() — why this returns None and how sorted() differs
Extra — interview question:
sort() vs sorted() — in-place mutation versus returning a new list, a classic trap in technical interviews
Hands-on exercises:
Easy — Packet Filtering — sort a list, filter via comprehension, print count
Medium — Finding the Key Packet — index(), slicing around a found element, edge case handling
Hard — Traffic Analyzer — min, max, average, top-5 via slice, above-average filter using only built-ins
Next up — dictionaries and sets.
Triangle infiltrates the Matrix's data vault and learns how to map the network using fast lookups and unique collections.
Key Highlights:
Dictionaries: Instant O(1) lookups by key; safe access using .get().
Dict Comprehension: Filtering and transforming data into dictionaries in a single readable line.
Sets: Ensuring uniqueness and performing set theory operations (intersection, union, difference).
Anti-Pattern: The fatal trap of modifying a dictionary while iterating over it.
Extra Deep Dive: Hashability rules, frozenset, and the incredibly useful collections.defaultdict.
Module 01 — Prologue | Lesson 4: Dictionaries and Sets
The network has nodes, IDs, connections, and overlaps. To map it efficiently, you need structures that give you instant lookup by name and fast membership checks. This lesson covers Python's two hash-based structures — the dictionary and the set — and the patterns that make them powerful.
What you'll learn:
Dictionaries — hash tables with O(1) key access; how keys are hashed, why only hashable types qualify, and why insertion order is preserved since Python 3.7
Safe key access — why d[key] raises KeyError and how .get(key, default) protects you
keys(), values(), items() — iterating over a dictionary cleanly, and dict comprehension for building new mappings with filters
Sets — unique elements, O(1) membership checks, and the full set theory toolkit: intersection &, union |, difference -, symmetric difference ^
add and discard — mutating a set safely without raising errors on missing elements
When to use which:
dict — when you need to look up a value by name or key
set — when you need uniqueness or fast in checks with no associated value
list — when order matters and duplicates are allowed
Anti-patterns covered:
Mutating a dictionary during iteration — why Python raises RuntimeError and two correct alternatives: comprehension or iterating over list(data.keys())
{} for an empty set — this creates an empty dict, not a set; use set() instead
Hands-on exercises:
Easy — Node Map — update nested dict values, filter with dict comprehension
Medium — Network Analysis — set operations, build a mapping of which network each ID belongs to
Hard — Frequency Analyzer — word count from text, case normalization, top-5 with sorted(key=...)
Next up — functions.
Manual hacking is too slow. Triangle learns to write reusable commands—the core building blocks of any real script.
Key Highlights:
Functions: def, return, and naming conventions (always start with a verb).
Parameters & Arguments: Positional, keyword, defaults, *args, and **kwargs.
Scope (LEGB): Local → Enclosing → Global → Built-in; why global is a strict anti-pattern.
Lambda Functions: For clean one-liners in sorted(), filter(), and map().
Anti-Pattern: Mutable default arguments—one of Python's sneakiest traps.
Decorators: Wrap any function with logging, timing, or access checks using @.
Extra: Keyword-only arguments with * and positional-only arguments with /.
Module 01 — Prologue | Lesson 5: Functions
Manual, repetitive code is slow and fragile. Functions are how you turn a sequence of actions into a reusable, testable, nameable unit. This lesson covers everything from basic def syntax to decorators — the tools that power every Python library, framework, and AI agent you'll work with in this course.
What you'll learn:
def and return — defining functions, writing docstrings, and understanding why a function without return silently yields None
Parameter types — positional, keyword, default values, *args for variable positional arguments, and **kwargs for variable keyword arguments
Scope and the LEGB rule — how Python resolves variable names: Local → Enclosing → Global → Built-in, and why using global inside functions is an anti-pattern
Lambda functions — anonymous single-expression functions for sorted(key=...), filter(), and map(), and when to use a regular def instead
Decorators — wrapping functions to add logging, timing, and access control without modifying the original code
Anti-patterns covered:
Mutable default arguments — why def f(items=[]) is one of Python's most notorious traps: the list is created once at definition time and shared across all calls. Fix: use None as the sentinel and create the object inside the function
global variables — passing data via arguments and returning it via return is always cleaner
Extra pattern:
Functions are first-class objects in Python — they can be passed as arguments, stored in variables, and returned from other functions. This is the foundation of decorators and higher-order functions used throughout LangChain and LangGraph
Hands-on exercises:
Easy — Vulnerability Calculation — arithmetic, conditionals, default parameters
Medium — Packet Analyzer — returns a dictionary with min, max, avg, count, and filtered large packets; handles empty input
Hard — Logger Decorator — @log_call that prints function name, arguments, result, and execution time via time.time()
Next up — exceptions.
Deep in the Matrix, system agents start pushing back—and they show up as errors. Triangle learns to write resilient code that doesn't crash under pressure.
Key Highlights:
EAFP vs LBYL: Why Python favors "try it and catch the error" over "check before every step."
try / except / else / finally: The full block structure and when to strictly use each part.
Common Exceptions: ValueError, TypeError, ZeroDivisionError, KeyError, FileNotFoundError, and more.
raise: How to manually trigger exceptions for input validation.
Custom Exceptions: Building an exception hierarchy (HackError → AccessDeniedError) for meaningful error messages.
Anti-Pattern: The bare except: pass—Python's silent bug killer.
Extra Deep Dive: The exception hierarchy from BaseException, and exception chaining using raise ... from.
Module 01 — Prologue | Lesson 6: Exception Handling
Every real system fails. Files go missing, users enter garbage, network calls time out, APIs return unexpected data. This lesson teaches you to write code that survives failure — catching the right errors, communicating them clearly, and cleaning up resources no matter what happens.
What you'll learn:
try / except / else / finally — the full exception handling structure: risky code in try, specific handlers in except, success path in else, guaranteed cleanup in finally
EAFP over LBYL — Python's philosophy: act first and handle failure, rather than checking every precondition before acting. Fewer states, less redundancy, more idiomatic code
Common exception types — ValueError, TypeError, ZeroDivisionError, IndexError, KeyError, FileNotFoundError — when each occurs and how to handle them specifically
raise — throwing exceptions deliberately for validation logic, with clear, informative messages for the caller
Custom exceptions — building a module exception hierarchy with a base class and specific subclasses that carry structured data like error codes and context fields
Anti-patterns covered:
Bare except: — catches everything including KeyboardInterrupt and SystemExit, silently swallows bugs, and makes debugging nearly impossible. Always name the exception type
except Exception: pass — slightly better but still dangerous; at minimum, log the error or return a meaningful default
The rule: every except block must do something — log, return a fallback, or re-raise
Patterns you'll use throughout this course:
try / except blocks inside LangChain tools and AG2 agents to handle API failures gracefully
raise ValueError for input validation at agent boundaries
finally for closing files, database connections, and HTTP sessions
Hands-on exercises:
Easy — Safe Input — wrap input() calls with ValueError and ZeroDivisionError handlers, finally for logging
Medium — Robust Log Parser — parse a mixed list of strings, skip invalid entries, return structured stats with error count and average
Hard — Retry Decorator — @retry(max_attempts=3, delay=1) that logs each attempt, sleeps between retries, and re-raises after exhausting attempts
Next up — working with files.
Hacked data lives only in RAM—close the program and it's gone. Triangle learns how to save and load data that actually persists.
Key Highlights:
with open(): The only correct way to open files; auto-closes even on error.
File Modes: "r", "w", "a", "x", "r+", and exactly what they do.
Reading: read(), readlines(), and for line in f (the best approach for large files).
Writing: write(), writelines(), and why you must always append \n manually.
CSV: csv.DictWriter and DictReader for handling flat tabular data.
JSON: json.dump and json.load for complex, nested structures.
Anti-Pattern: Skipping encoding="utf-8"—the silent UnicodeDecodeError trap on Windows.
Extra Deep Dive: pathlib—modern, object-oriented path handling.
Module 01 — Prologue | Lesson 7: Working with Files
Data in memory disappears when the process ends. Files are how programs persist state, share results, and communicate across runs. This lesson covers the full toolkit for reading and writing text, CSV, and JSON — the three formats you'll encounter constantly when building AI pipelines, logging agent outputs, and storing configuration.
What you'll learn:
with open() — the only correct way to work with files: the context manager guarantees the file closes even if an exception occurs mid-read
File modes — r, w, a, x, r+: what each does, which ones create files, and why w is dangerous (it silently erases existing content)
Reading strategies — read() for small files, readlines() for a list of strings, for line in f for large files that shouldn't be loaded entirely into memory
Writing — write() and writelines(), the difference between overwrite and append, and why \n must be added manually
CSV with csv.DictWriter / csv.DictReader — reading and writing tabular data as dictionaries, and why newline="" is mandatory on Windows
JSON with json.dump / json.load — saving and loading nested structures with indent for readability and ensure_ascii=False for non-ASCII characters
Anti-pattern covered:
Omitting encoding — Python falls back to the system default: cp1251 on Windows, utf-8 on Linux. The same script breaks on a different machine. Always specify encoding="utf-8" explicitly
Patterns you'll use throughout this course:
Saving agent outputs and pipeline results to JSON
Reading configuration files for LangChain and ADK agents
Logging tool calls and responses to CSV for evaluation datasets
Hands-on exercises:
Easy — Operation Log File — write a list of log dicts to both .txt and .csv, handle PermissionError, read back and verify
Medium — Log Analyzer — parse the CSV, count action frequencies, extract successful entries, save a report to JSON
Hard — DataManager class — save(data, path) and load(path) that detect format by file extension and handle all IO errors gracefully
Next up — object-oriented programming.
Deep in the Matrix's core, simple scripts no longer cut it. Triangle learns to build sophisticated digital weapons using the full OOP toolkit.
Key Highlights:
The Four Pillars Encapsulation, Abstraction, Inheritance, and Polymorphism, plus the power of duck typing.
Classes Mastering class, __init__, and self; understanding attributes vs. methods.
Magic (Dunder) Methods Tapping into __str__, __repr__, __add__, __eq__, and the broader ecosystem.
Inheritance Implementing super().__init__(), method overriding, and isinstance() checks.
Composition vs. Inheritance "Has-a" vs. "is-a" relationships; why composition is often the superior architectural choice.
Class vs. Instance Attributes Navigating a frequent source of logic bugs.
Anti-Pattern The "God Class"—avoiding the trap of a single class that does everything.
Extra Tech Leveraging @property, __slots__, and @dataclass.
Module 01 — Prologue | Lesson 8: Object-Oriented Programming
Scripts get you started. Classes take you further. When your codebase grows — multiple agents, tools, memory systems, API clients — you need a way to bundle state and behavior into reusable, testable units. This lesson covers OOP from first principles to the patterns used in every major Python framework.
What you'll learn:
Classes and objects — __init__, self, instance attributes, and the difference between a blueprint and an instance
The four pillars — encapsulation (protect state), abstraction (hide complexity), inheritance (reuse without duplication), polymorphism (duck typing in Python: if it has the method, it works)
Dunder methods — __str__ for human-readable output, __repr__ for developer debugging, __add__ for operator overloading, __eq__ for equality — integrating your class into the Python ecosystem
Inheritance — super().__init__(), method overriding, and isinstance() for type checking across a class hierarchy
Class vs instance attributes — the subtle trap where assigning via self.x creates a new instance attribute instead of modifying the shared class attribute
Inheritance vs Composition:
Use inheritance (is-a) when a subclass genuinely is a specialized version of the parent — StealthVirus is a Virus
Use composition (has-a) when a class contains or uses another — VirusTracker has viruses. Composition is more flexible, less coupled, and easier to change without breaking dependent code
Favor composition — tighter inheritance hierarchies break easily when the parent changes
Anti-pattern covered:
God Class — one class that scans nodes, hacks systems, saves logs, sends emails, and renders UI. Violates the Single Responsibility Principle and makes every part untestable. Break it into focused classes: one class, one reason to change
Why this matters for AI engineering:
LangChain tools, LangGraph nodes, and ADK agents are all implemented as classes. Understanding __init__, method overriding, and composition is the prerequisite for reading and extending framework source code confidently.
Hands-on exercises:
Easy — Virus Class — __init__, attack(), toggle(), __str__
Medium — Inheritance + Tracker — Trojan(Virus) with disguise_level, VirusTracker with add, launch_attacks, get_stats
Hard — RPG System — Character base class with Warrior, Mage, Rogue subclasses and a Battle class that logs moves and declares a winner
Next up — the final mission.
Everything from nine lessons converges into one project. Build a complete, coherent system—not a collection of scripts.
Goal: A console app that manages network nodes, deploys viruses, analyzes the network, and logs everything.
Architecture: Node, Virus, StealthVirus, and MatrixCoreHacker controller—all connected.
New Tools: random, datetime, and PermissionError.
5 Build Steps: Data classes → controller → automation + analytics → file I/O → demo.
Quality Checklist: __str__/__repr__, with statement + encoding, specific except blocks, no code duplication, lambda usage, and sets.
Module 01 — Prologue | Lesson 9: Final Mission — Matrix Core Hacker
Nine lessons. One system. This is where everything comes together. Instead of isolated exercises, you'll build a complete, working program from scratch — a network hacking simulator that uses every concept from the module in its proper context.
What you'll build:
MatrixCoreHacker — a fully functional controller that manages a network of nodes, deploys viruses, runs probabilistic attacks, analyzes network state, and persists all data to files.
System architecture:
Node — data model with node_id, signal_strength, security_level, and is_hacked state
Virus — attack model with probabilistic attack(node) logic based on power vs. security level
StealthVirus(Virus) — subclass that multiplies attack chance by a stealth_factor, demonstrating inheritance and polymorphism
MatrixCoreHacker — main controller using a dict for nodes, list for viruses, set for active node tracking, and list for logs
Every module concept in use:
Data types — Node and Virus attributes
Conditions & loops — hacking logic, auto-hack
Lists, tuples — viruses, logs
Dictionaries, sets — nodes, active_nodes
Functions, lambda — methods, analytics filters
Exceptions — input validation, IO errors
Files, CSV, JSON — load_nodes_from_csv, save_report_json
OOP — Node, Virus, StealthVirus, MatrixCoreHacker
Build sequence:
Node and Virus data classes with __str__ and probabilistic attack()
StealthVirus subclass overriding attack() with stealth multiplier
MatrixCoreHacker controller with add_node, hack_node, auto_hack, and internal _log()
analyze_network() returning total, hacked count, active viruses, average signal and security
File IO — load_nodes_from_csv, save_logs, save_report_json — all wrapped in specific exception handlers
Quality requirements enforced throughout:
Every class has __str__ or __repr__
Every file operation uses with open() and encoding="utf-8"
No bare except: — every handler names a specific exception type
No duplication — shared logic lives in methods
snake_case for variables and functions, PascalCase for classes
This is the foundation. Web, AI agents, and automation start in the next module.
This is the closing narrative of the Matrix Coders Python course.
Triangle enters a new world of cutting-edge research and meets Joy, an architecture specialist who reveals the true future of the industry: the AI Engineer.
Triangle transitions from simple coding to mastering neural architectures, embracing a philosophy that defines the journey of an AI Engineer.
Triangle spends an all-nighter at the library diving into the world of autonomous agents, realizing that building them requires more than just code—it requires a perfectly tuned environment.
Triangle discovers that in 2026, an IDE is no longer just a text editor—it's a mission control center for a team of autonomous AI agents.
Module 02 — LangChain | Lesson 01: Agentic IDEs
The editor used to be a passive canvas. In 2025, it became a partner. This lesson introduces the four agentic IDEs reshaping how developers work — and the one workflow habit that keeps you in control of all of them.
The shift from classic to agentic:
A classic IDE autocompletes tokens, one file at a time, with the human driving every keystroke. An agentic IDE takes your intent — a feature description, a refactor goal, a test requirement — plans the task, edits multiple files, runs tests, and hands you a diff to review. You move from typing every line to reviewing what the agent produced.
Four tools worth knowing:
Cursor — VS Code fork with the deepest agent mode, tab autocomplete, and the strongest ecosystem
Antigravity — Google's agent IDE, built for long-running tasks and browser control
Kiro — spec-driven; write a markdown spec first, the agent generates the code from it
Trae — lightweight, fast inline edits, low ceremony for smaller tasks
The spec-driven loop:
The workflow that actually works: write a short spec (what, why, acceptance criteria) → agent proposes a plan → agent edits code and runs tests → you review the diff and accept or send corrections. The spec is the contract; the agent is the implementer.
Three habits of agentic coders:
Write a spec before you write code
Review every diff like it's a teammate's pull request
Measure output in shipped features, not lines written
The one rule: Never merge code you didn't read. The agent writes fast. You are still the one on call at 3 a.m.
The rest of this module covers the building blocks those agents use under the hood.
Joi introduces Triangle to uv, a blazingly fast Rust-based package manager. They explore how it replaces pip, venv, pyenv, and poetry with a single unified binary.
Key Highlights:
Why You Need uv Escaping the fragmented Python ecosystem and dramatically speeding up dependency installs.
Installation Setting up uv seamlessly across macOS, Linux, and Windows environments.
Key Commands Mastering uv init, uv add, uv run, and more.
Lockfiles and Reproducibility Understanding how uv.lock and pyproject.toml guarantee cross-platform reproducibility.
Anti-Pattern Naked pip installs without lockfiles—why unpinned dependencies break production.
Module 02 — LangChain | Lesson 02: uv — The Python Package Manager You Were Waiting For
Python tooling used to mean juggling pip, virtualenv, pip-tools, poetry, and pyenv as separate tools with separate configs. uv replaces all of them with one binary written in Rust — and it is an order of magnitude faster. This lesson gets you set up and shows you the four daily commands you'll use for the rest of the course.
Why it matters — the numbers:
10–100× faster than pip on every operation: install, resolve, sync
1 binary — no Python required to bootstrap it
Under 1 second for a warm install of a full stack
What uv replaces:
pip + requirements.txt — replaced by uv add + uv.lock
venv / virtualenv — uv manages environments automatically
pyenv — uv python install and uv python pin handle CPython versions per project
pipx — uv tool install drops CLIs into isolated envs and exposes them globally
The four daily commands:
uv init — creates a new project with pyproject.toml
uv add — installs a package, records it in the project file, updates the lockfile
uv run — executes a script inside the managed environment with the correct interpreter
uv sync — rebuilds the environment from uv.lock on a fresh clone
Reproducibility — why the lockfile matters:
pip with requirements.txt only pins direct dependencies. Transitive dependencies drift silently over time. uv.lock pins every transitive dependency with a hash — your laptop, CI pipeline, and production image all resolve to identical bytes. Commit it.
Migrating from pip:
uv pip install -r requirements.txt is a drop-in replacement — same interface, same flags, Rust resolver underneath. Migrate an existing project in an afternoon.
Three rules to remember:
One tool replaces pip, venv, poetry, and pyenv — remove the rest
Commit uv.lock — it is the source of truth for reproducibility
Prefer uv run over calling python directly — it always resolves the right interpreter and dependencies
Every script in this course runs inside uv.
Joi explores OpenRouter—the unified gateway to hundreds of LLMs. She learns what an API is, how to structure a project securely using uv and .env, and writes her very first script to send a HumanMessage to the AI.
Key Highlights:
APIs Explained Understanding the restaurant analogy for APIs and how REST communication works.
The Power of OpenRouter Why a single endpoint and API key beats installing separate SDKs for OpenAI, Anthropic, and Google.
Secure Environment Setup Initializing a project with uv and keeping API keys out of code using .env.
The First Request Writing a script with ChatOpenAI and HumanMessage to invoke a response directly from the terminal.
Anti-Pattern Hardcoding API keys—why committing keys directly into a script is a critical security flaw.
Module 02 — LangChain | Lesson 03: OpenRouter — One Key, Every Model
Every AI vendor has its own SDK, its own billing portal, its own rate limits, and its own response format. OpenRouter puts all of them behind a single OpenAI-compatible endpoint. One API key, one base URL, over two hundred models — and you change a model by changing a string.
How it works:
Your code calls the standard OpenAI SDK. You point base_url at OpenRouter. OpenRouter routes the request to the actual provider — OpenAI, Anthropic, Google, Mistral, or any hosted Llama variant — and returns a single unified JSON shape. Nothing else in your code changes when you switch vendors.
To switch to Claude: change model to "anthropic/claude-sonnet-4-5". To switch to Gemini: "google/gemini-flash". The SDK, the auth, the parsing code — none of it moves.
Four reasons to default to OpenRouter:
Vendor-neutral — no lock-in, swap models with a single string
One key, one bill — unified usage metrics across all providers
Automatic fallback — if a provider goes down, OpenRouter reroutes
Pay per token — no monthly minimum, no commitment
The one rule before writing any code:
Keys go in .env. Never hardcoded, never committed, never pasted in Slack. python-dotenv loads them at runtime — that's the only acceptable pattern.
ChatOpenAI + OpenRouter base_url is your universal LLM client for the rest of this course.
Triangle and Joi put theory into practice, comparing different prompting levels on the same problem. They discover how structural variations radically change model behavior and output quality.
Key Highlights:
Four Levels of Prompting Comparing Zero-shot, Role prompting, Few-shot, and Chain-of-Thought on a single prompt.
Prompt Templates Utilizing PromptTemplate and ChatPromptTemplate to build reusable, scalable inputs instead of string concatenation.
Advanced Templates + LCEL A first look at the | operator by chaining prompt, LLM, and parser.
Intent Classifier Using FewShotChatMessagePromptTemplate to route agent tasks based on user intent.
Anti-Pattern Hardcoded prompts versus templates—why string concatenation fails at scale.
Module 02 — LangChain | Lesson 04: Transformers — The Engine Under Every LLM
Before you prompt one more model, spend eight minutes on what is actually happening inside it. Three ideas — tokens, attention, and parallelism — explain why prompting works and why some prompts work better than others.
Tokens in, tokens out:
Every LLM is a next-token predictor. Your text is cut into sub-word fragments called tokens. Each token becomes a vector. The vectors pass through layers of attention. The final layer outputs a probability distribution over the next token — sample one, repeat. That loop is the entire show.
Why transformers won:
Before 2017, sequences were processed one token at a time by RNNs — early context faded, long inputs got slow, and GPUs sat mostly idle. The 2017 paper "Attention Is All You Need" replaced recurrence with a single operation: every token attends to every other token in parallel. GPUs loved it, scaling laws kicked in, and context windows went from 512 tokens to millions.
Why prompting works:
Prompting is not magic — it is context. The model predicts the next token given everything in the prompt: system message, examples, user question. Better context yields better predictions. That is the entire theory of prompt engineering in one sentence.
Four prompting styles, cheapest first:
Zero-shot — ask directly, no examples
Few-shot — show 2–5 examples to bias the output style
Chain-of-thought — add "think step by step" to surface reasoning
Role prompting — "you are a senior code reviewer" anchors tone and perspective
Reach for the cheapest style that works. Every token you add costs latency and money.
Three images to keep in your head:
Next-token prediction — the whole game
Attention — parallel context mixing across all tokens simultaneously
Prompt = context — the model can only work with what you give it
Joi explains the core architecture driving modern AI, exploring how Transformers overcome the limitations of recurrent networks and how Prompt Engineering acts as the steering wheel for these massive models.
Key Highlights:
The Transformer Architecture Understanding self-attention to see how models process context all at once.
The Birth of GPT The evolution from pre-training and fine-tuning to the crucial role of RLHF.
Prompt Engineering Strategies Mastering Zero-shot, Few-shot, Chain-of-Thought (CoT), and Role prompting to control model outputs.
The Golden Rule "Garbage in, Garbage out"—why precise context is non-negotiable.
Anti-Pattern Naked prompts without context—why contextless queries yield unpredictable results.
Module 02 — LangChain | Lesson 05: Prompt Engineering — From Ask to Template
A prompt is a program. This lesson turns the four prompting styles from theory into working code, then wraps them in reusable templates that scale to thousands of inputs without copy-paste.
Four techniques, same model:
Zero-shot — one message, no examples; fastest and cheapest, great for obvious tasks
Few-shot — pre-load 2–5 demonstrations before the real input; locks the output format by having the model mimic your examples
Chain-of-thought — add "think step by step" to surface reasoning; boosts accuracy on multi-step problems at the cost of extra tokens and latency
Role prompting — a system message puts the model in character; nudges tone and persona without changing the task
Pick the cheapest style that passes your evaluation. Combine them only when a simpler approach fails.
Templates replace copy-paste:
The moment you use the same prompt string twice, it should become a ChatPromptTemplate. Declare the messages once with curly-brace placeholders for role, text, or any variable — then invoke with a dictionary. Same template, a thousand different inputs, zero string formatting by hand.
MessagesPlaceholder for history:
Injecting a full chat history into a prompt without manual concatenation is what powers memory in LangChain agents. MessagesPlaceholder reserves a slot in the template that gets filled with prior messages at runtime — the same prompt shape holds yesterday's conversation and today's question.
FewShotChatMessagePromptTemplate for structured examples:
Few-shot with raw strings breaks as examples grow. The structured version treats each example as a typed human/AI message pair — declare the shape once, add examples freely, and the template renders them into proper message roles. Format drift disappears.
Three rules for prompts that scale:
Use the cheapest style that passes your eval
Copy-paste a prompt twice — make it a template
Inject history and examples with MessagesPlaceholder, never by hand
Next lesson wires the prompt into a full prompt | llm | parser pipeline.
Deep into the night, Triangle and Joi explore LangChain's core. They dive into LCEL (LangChain Expression Language), learning how to pipe models, prompts, and parsers together to build robust, asynchronous AI pipelines.
Key Highlights:
What is LangChain? Moving beyond single model calls to a framework of standard Runnable blocks.
LCEL Fundamentals The power of the | operator to create declarative, clean processing chains.
Four Levels of LCEL Progressing from manual API calls to dynamic prompts and integrated token streaming.
Behind the Pipe Why LCEL gives you asynchronicity, streaming, and batching out of the box.
Anti-Pattern Manual concatenation of prompts and responses instead of relying on the streamlined LCEL pipeline.
Module 02 — LangChain | Lesson 06: LCEL — The Pipe That Runs Every LangChain App
Three letters that turn scattered glue code into a single declarative pipeline. This lesson covers the one pattern — prompt | llm | parser — that runs almost every LangChain application in production, and the four execution modes it unlocks for free.
Why LCEL exists:
Without it, every chain requires the same boilerplate: build messages by hand, call the model, dig .content out of the response, repeat. With the pipe operator, all of that disappears. One line declares the chain. Streaming, batching, and async come with no extra code.
Four levels of the same chain:
Level 1 — raw call — pass a hand-built message list to the model, extract .content manually. Works, but repeats boilerplate in every chain you write
Level 2 — the pipe — ChatPromptTemplate describes the messages, StrOutputParser strips .content, the | operator connects them into a runnable chain
Level 3 — dynamic variables — parameterize role, style, and question in the template; invoke with a different dictionary each time; one chain, many behaviors
Level 4 — streaming — replace .invoke() with .stream() and iterate over chunks; tokens arrive as they are generated; this is what powers the typing effect in every chat UI
The Runnable protocol:
Every LCEL chain speaks the same four methods — write the chain once, choose the execution mode at call time:
invoke — single call, full result
stream — yields tokens as they are generated
batch — runs many inputs in parallel
ainvoke — async version for servers under load
Four ideas to take home:
The pipe moves data left to right, transforming at each step
Every chain is a Runnable — streaming and batching are free
Templates with variables turn one chain into many behaviors
Compose small pieces first; production features come without rewriting
Next lesson plugs memory and retrieval into the same chain — and the pattern does not change.
Triangle learns to tame unstructured text. Using Pydantic v2 and with_structured_output, he forces the LLM to output predictable, validated JSON objects, transforming the model from a text generator into a reliable data source.
Key Highlights:
Structure Out of Chaos Why raw text is bad for applications and how JSON schemas solve the problem.
Pydantic Basics Defining robust data models and nested structures using Python type hinting.
Modern Structured Output Using with_structured_output() versus the legacy PydanticOutputParser.
Function Calling API How modern models use native function calling to guarantee schema compliance.
Anti-Pattern Relying on fragile regular expressions to parse unstructured LLM text instead of native structured outputs.
Module 02 — LangChain | Lesson 07: Pydantic & Structured Output
Free-text replies are great for demos and terrible for production. When your downstream code needs a rating as an integer, a list of pros, or a boolean recommendation — you need typed, validated objects, not strings you parse with regex. This lesson shows the modern way to get them.
Define the shape first:
Everything starts with a Pydantic BaseModel. Field types are enforced. Descriptions are forwarded to the model as hints. This single declaration is both your Python type and the schema the LLM writes into — one source of truth for the whole pipeline.
Three paths to structured output:
Raw text + regex — ask for JSON in the prompt, parse it yourself, retry on failure. Fragile, verbose, and impossible to type correctly
PydanticOutputParser — injects format instructions into the prompt, parses the JSON reply into your BaseModel. Works on any model, including those without tool-calling support
with_structured_output — pass your BaseModel directly to the model; it uses native tool-calling under the hood and returns a typed object with no parser needed. This is the default in 2026
The payoff:
After .invoke(), you get a fully typed object back. .rating is an int. .pros is a list[str]. .recommend is a bool. No json.loads(), no try/except, no regex. That is the shape your downstream code — and every agent tool — actually wants.
When to use which:
Default to with_structured_output for any model that supports tool-calling
Fall back to PydanticOutputParser only when the model lacks tool-calling support
Never reach for regex on an LLM reply
Three rules to remember:
Never regex an LLM reply — define a schema instead
llm.with_structured_output(MyModel) is the one-liner default
Your BaseModel is the contract between the agent and the rest of your system
One BaseModel, one method call, a typed object on the other side. You just promoted your LLM from a chatbot to an API.
At 3:00 AM, Triangle tackles the hallucination problem. He builds a RAG (Retrieval-Augmented Generation) pipeline, letting the model "read" external documents by converting texts into numerical embeddings and querying a vector database.
Key Highlights:
Embeddings & Vector Search Converting meaning into mathematics to enable semantic similarity search rather than keyword search.
The RAG Architecture Breaking down the process: Indexing (chunking, embedding, storing) and Retrieval + Generation.
Chunking Matters Exploring RecursiveCharacterTextSplitter and understanding how chunk size prevents the "Lost in the Middle" problem.
Document Loaders & Chroma DB Using LangChain tools to parse PDFs, CSVs, and web pages into a local vector database.
Anti-Pattern Relying on fragile keyword search instead of robust semantic search for unstructured data.
Module 02 — LangChain | Lesson 08: RAG — Give Your LLM a Memory of Your Docs
An LLM does not know your private documents. RAG — Retrieval Augmented Generation — fixes that without fine-tuning or retraining. You build a searchable index of your files once; at query time the model retrieves the relevant passages and answers from real context.
Why not just stuff the whole document into the prompt?
Context windows are finite and expensive. Dumping every document into every call wastes tokens, hits context limits, and dilutes the model's attention. Retrieval finds the two or three chunks that actually matter for this specific question — higher accuracy, lower cost, faster response.
The six-step pipeline:
Load — document loaders read PDFs, Markdown, web pages, and return a list of Document objects
Chunk — RecursiveCharacterTextSplitter cuts documents into overlapping passages to avoid splitting mid-thought
Embed — OpenAIEmbeddings converts each chunk into a vector representation
Store — Chroma.from_documents() persists the vectors to disk; build once, reuse forever
Retrieve — at query time, embed the question and fetch the top-k most similar chunks
Answer — stuff the retrieved chunks into the prompt; the model answers from grounded context
The four knobs to tune first:
Chunk size — start at 800 characters; too small loses context, too large dilutes it
Overlap — 10–15% of chunk size prevents ideas from being cut in half
Top-k — 3 to 5 chunks covers most Q&A use cases
Embedding model — upgrade only when eval scores prove the improvement justifies the cost
Beyond naive similarity — two upgrades:
MMR (Maximal Marginal Relevance) — diversifies results so you don't retrieve three near-duplicate chunks
Contextual Compression — wraps the retriever with an LLM extractor that trims each chunk down to only the sentences relevant to the question; less noise, smaller prompt, better answers
The query chain in one LCEL expression:
A retriever is just a Runnable. It slots directly into a prompt | llm | parser chain via RunnablePassthrough. The full RAG pipeline — from question to grounded answer — is a single composable expression.
This pattern powers every document search tool, support bot, and knowledge base product on the market. Six steps, one chain, your LLM now answers from your documents.
As dawn approaches, Triangle realizes his RAG bot has severe amnesia. He learns to manage context windows, exploring Buffer, Window, and File memory strategies to give the LLM the ability to recall past conversations.
Key Highlights:
The Stateless Model Understanding why LLMs don't remember and how MessagesPlaceholder bridges the gap.
Three Memory Strategies Comparing Buffer Memory, Window Memory (trim_messages), and persistent File Memory.
Managing Token Budgets Why memory isn't infinite and how to prevent context window overflow.
RunnableWithMessageHistory Mastering the LangChain standard for automatically injecting session history into LCEL chains.
Anti-Pattern Unbounded buffer memory—why letting context grow infinitely leads to high costs and degraded performance.
Module 02 — LangChain | Lesson 09: Memory — Making Chat Feel Like Chat
LLMs are stateless. Every call forgets the previous one. Memory is the layer you add to make a conversation feel continuous — and to keep one user's history from leaking into another's. One LangChain wrapper, three storage strategies.
Three strategies — pick by lifespan and budget:
Buffer — store every message in memory; simplest to implement, but context grows without bound and dies on restart
Window — keep only the last N turns; constant token cost, but early facts are lost
Persisted — write to a JSON file, Redis, or SQL; survives restarts, scales to real users, and is the production default
The one primitive — RunnableWithMessageHistory:
Wrap any existing chain with this single wrapper. Give it the chain, a get_history(session_id) function that returns the store for that user, and the names of the input and history keys. It injects prior messages before each call and appends the new exchange after — with no changes to the chain itself.
Multi-user session routing:
At invoke time, pass a session_id in the config. The wrapper calls your get_history function, loads that user's message file, injects it into the MessagesPlaceholder, runs the chain, and saves the new turn. Every user gets their own isolated file — nothing leaks between sessions.
Three gotchas that will bite you:
Unbounded buffer memory is a token-cost time bomb — always cap the history length
Chat history is plaintext — never store secrets, tokens, or credentials in message history
Session ID must come from your auth layer, never from the client — otherwise any user can impersonate any other
One wrapper, one get_history function, one session ID per user. The model stays stateless; your application does not.
Triangle moves from static reasoning to dynamic action, giving his model "hands." He learns to use the @tool decorator and the ReAct architecture to build autonomous agents capable of performing math, checking time, and writing to files.
Key Highlights:
Why Tools are Needed Giving LLMs the ability to reach outside their isolated text generation environment.
Defining Tools Using the @tool decorator to turn ordinary Python functions into AI-accessible utilities.
The ReAct Cycle Understanding the Reason+Act loop where models continuously evaluate, tool-call, and observe.
Building an Agent Writing a manual ReAct loop to chain multiple tool calls together to solve complex queries.
Anti-Pattern Infinite agent loops—why failing to set max_iterations leads to runaway token consumption.
Module 02 — LangChain | Lesson 10: Tools & Agents — Let the LLM Use Your Functions
A chain is a one-shot pipeline. An agent is a chain that can decide, on its own, to call your Python functions, read the results, and keep going. This lesson turns your functions into tools and builds the decision loop that every LangGraph agent is built on.
The ReAct loop:
Every agent is a cycle — Reason, Act, Observe, Repeat. The model thinks about what it needs, calls a tool with typed arguments, reads the result, and decides again. The loop continues until the model produces a reply with no tool calls. That is the entire agent primitive.
Defining a tool:
The @tool decorator turns any function into a capability the model can call. The function name, docstring, and type hints all become the model's reference for when and how to use it. The docstring is the only thing the model sees — keep it tight and specific.
The manual loop — written once so you understand it:
Bind tools to the model with bind_tools. When the model decides to act, its reply carries tool_calls instead of plain text. Execute each call, append the result as a ToolMessage, invoke the model again. Repeat until the reply has no tool calls. That's the whole agent.
Skip the boilerplate in production:
create_react_agent from LangGraph ships the same loop, battle-tested, with streaming, checkpoints, and interrupts included. Three lines replace the hand-rolled for-loop. Use it unless you need custom control flow — which is what the rest of this course covers.
Four rules for sane agents:
Write tight docstrings — that is your prompt to the model
bind_tools returns a Runnable and drops into any LCEL chain
Let the model fill typed arguments — never parse tool inputs yourself
Cap max_iterations always — a stuck model will happily burn a thousand calls and your entire budget
The one safety rule:
Every agent loop needs an exit. Cap your iterations. Log every tool call. Those two habits separate a demo agent from a production one.
At 4:15 AM, Triangle discovers MCP, Anthropic's universal tool standard. Instead of rewriting tools for every new framework, he builds a single FastMCP server to expose his functions to LangChain, Cursor, and any other MCP-compatible client.
Key Highlights:
Write Once, Run Anywhere Understanding the Host-Client-Server Model Context Protocol architecture.
Building an MCP Server Using fastmcp to expose standalone tools (e.g., calculator, weather, currency converter) outside the LLM loop.
Connecting an MCP Client How LangChain dynamically converts MCP schema definitions into native StructuredTool objects.
Cross-Platform Agent Building a ReAct loop that uses tools securely running in a separate MCP server process.
Anti-Pattern Framework lock-in—developing duplicate tool implementations for LangChain, OpenAI, and Cursor instead of using one universal MCP server.
Module 02 — LangChain | Lesson 11: MCP — The Standard Plug for AI Tools
Every agent reinvents the same wire protocol for connecting to tools. MCP — Model Context Protocol — ends that. Publish your tools once and any MCP-aware client can use them: Claude Desktop, Cursor, your own LangChain agent, or any future host that adopts the standard.
Three roles — Host, Client, Server:
Host — the application the user sees: Claude Desktop, Cursor, an IDE, or your own app
Client — runs inside the host, speaks the MCP protocol over stdio or HTTP
Server — a small process you write that exposes tools, resources, and prompts; the host never calls your code directly, the protocol is the contract
FastMCP — a server in 20 lines:
The @mcp.tool() decorator feels identical to LangChain's @tool. Import FastMCP, decorate your functions, call mcp.run(). The framework handles the entire protocol — handshake, schema advertising, message framing. You write only business logic.
The client side — discover and call:
Open a stdio session against your server process, call list_tools() to discover what's exposed, call call_tool() with a name and arguments to invoke one. The discovered schemas can feed directly into bind_tools — your MCP server becomes agent-ready with no extra code.
Three primitives, not one:
Tools — actions the model can invoke
Resources — read-only data the host attaches to context: files, database rows, API responses
Prompts — reusable templates the host offers as slash-commands
Same decorator pattern, three different verbs in the protocol.
Four reasons to default to MCP:
Write your tool once — every MCP-aware host gets it automatically
Transport-agnostic — stdio, HTTP, and WebSockets all work
Self-describing — tools advertise their own schemas; no manual documentation
Language-agnostic — Python, TypeScript, and Rust clients all interoperate on the same wire
Host, client, server — three roles, one open standard. Your agent now speaks the same protocol as every major AI tool in the ecosystem.
Morning breaks. Triangle combines everything he's learned into a capstone project: a fully autonomous research assistant. By integrating web search, HTML cleaning, Pydantic validation, Chroma DB, and a multi-stage LLM fallback mechanism, he builds a miniature version of Perplexity AI.
Key Highlights:
The Architecture Building a multi-stage pipeline: Search → Validate → Save → Query → Fallback.
HTML Cleaning & Token Budgeting Using regex to format raw web pages to fit efficiently into the context window.
Validation with Few-Shot Using with_structured_output and examples to programmatically reject junk content.
Intelligent Note Retrieval Searching the local vector database first, then automatically falling back to live web searches when answers are missing.
Tying it All Together Creating a production-ready CLI loop that acts as an end-to-end research workflow.
Module 02 — LangChain | Lesson 12: Mini Perplexity — Putting It All Together
Every concept from this module — LCEL, structured output, memory, tools, and RAG — composed into one working application. A research assistant that searches the web, validates what it finds, persists vetted knowledge, and answers with citations.
The five-component architecture:
Search tool — a @tool-decorated wrapper around DuckDuckGo that returns typed snippets with title, URL, and text
Validator — with_structured_output forces the model to judge each snippet with a typed Verdict: relevant, reason, and summary. No regex, no string parsing
Notes store — every snippet that passes validation gets persisted to Chroma on disk, keyed by source URL
Retriever — on follow-up questions, the vector store returns the top-k vetted notes instead of hitting the web again
LCEL chain — prompt, model, history, and retriever wired into a single composable expression that returns a grounded answer with sources
How a conversation flows:
First turn: the agent searches, the validator filters five results down to the relevant ones, Chroma saves them, the chain answers with citations. Follow-up turn: no new search — the retriever pulls from notes already saved, the answer arrives faster and still cites its sources.
Four production patterns you just deployed:
Validate every external input with a schema — never trust raw web content directly in a prompt
Cache retrieved knowledge in a vector store — don't re-search what you already know
Compose with LCEL — every component speaks Runnable, so they wire together without glue code
Always carry sources alongside answers — citations are non-negotiable in any production RAG system
These four patterns cover eighty percent of what production RAG applications actually do. One small file, five big ideas, ready to ship.
After an intense fourteen-hour night, Triangle reflects on his journey. From raw API calls to fully functional autonomous agents, he reviews the foundational patterns of LangChain and sets his sights on the next frontier: multi-agent orchestration.
Key Highlights:
The Journey A recap of the thirteen foundational lessons and the progression of Triangle's AI engineering skills.
Core Toolkit A summary cheat sheet of the essential building blocks: LCEL, RAG, Memory, and Agents.
Key Patterns Map Quick-reference code snippets for the most important LangChain implementations.
The Architecture Wall Recognizing the limitations of a single, all-in-one generic ReAct agent.
What’s Next? A glimpse into LangGraph—transitioning from linear cycles to complex, stateful multi-agent workflows.
Joi reflects on the limitations of linear prompt chains, realizing that robust AI agents require a cyclical, dialogic mindset. Setting the stage for LangGraph, she prepares to evolve from building simple pipelines to crafting intelligent, self-correcting neural systems.
Key Highlights:
Breaking the Linear Paradigm Recognizing that "lines" and "rails" frequently collapse under complex user interactions.
The Shift to Graphs Adopting nodes and edges to build systems capable of recursive logic and self-evaluation.
From Pipelines to Nervous Systems Transitioning from structural skeletons into self-aware, active agents capable of reasoning.
Joi moves away from linear LCEL chains and introduces LangGraph. She learns how to build state machines with nodes, edges, and persistent state, creating her very first Hello Node graph.
Key Highlights:
When Linearity Fails Why static chains like prompt | llm | parser are insufficient for cyclic agent behaviors.
The Anatomy of a Graph Understanding State (the agent's backpack), Nodes (crossroads/actions), and Edges (conditional routing).
The Hello Node Building the foundational graph structure using StateGraph, START, and END.
Visualizing with Studio Setting up LangSmith to visually trace and debug the graph's execution.
Anti-Pattern Writing single linear scripts with nested if-else statements instead of modular, graph-based routing.
Module 03 — LangGraph | Lesson 01: Graphs vs Chains — Your First Hello Node
LCEL chains run on rails — elegant until you need a loop. Real agents live in a city: crossroads, U-turns, retries. LangGraph is the GPS. You stop writing pipelines and start building state machines.
Chain vs Graph:
A chain is a train — prompt | llm | parser, one direction, no rerouting. If the model returns a bad format or a tool crashes, the train stops. A graph is a navigator. Blocked? Reroute. Need to call a tool twice? Loop back. The agent's control flow becomes a first-class object you can see and edit.
The three pillars:
State — the agent's backpack; a TypedDict holding dialog, results, counters, and anything else the graph needs across steps
Nodes — plain Python functions that read the state and return a dict of updates; each node owns its fields and nothing else
Edges — navigation rules: add_edge for unconditional routing, add_conditional_edges for decisions; compile() makes the map drivable
The merge contract — return only what you changed:
When a node returns a dict, LangGraph merges it into the existing state rather than overwriting it. Return only the fields you touched; everything else is preserved. For lists, opt into append behavior with Annotated and operator.add as a reducer — that one line turns a messages field into an ever-growing conversation log.
LangGraph Studio — not optional:
Register your graph in langgraph.json, run langgraph dev, and open Studio. Nodes highlight as they execute, state is inspectable at every step, and changes hot-reload on save. Studio is how you understand your graph — print() does not scale past two nodes.
Four rules to start with:
Loops or branches → LangGraph; linear pipelines → LCEL
State is a TypedDict — typed, explicit, merged per node
Every node returns only the fields it changed
Debug in Studio, not in print()
State, nodes, edges — three primitives, infinite graphs. You just left the rails.
Joi addresses state overload by learning encapsulation in LangGraph. She implements strictly enforced InputState and OutputState schemas, alongside PrivateState to securely route sensitive intermediate data through the nodes, avoiding data leaks.
Key Highlights:
The Problem of Global State Why a monolithic, shared state turns into a messy "courtyard" and risks internal data leakage.
Encapsulation Layers Breaking down state into Input, Output, Overall, and Private schemas to enforce strict data contracts.
Building a Secure Graph Creating a graph that classifies queries and performs internal searches without leaking raw logs to the user.
TypedDict vs Pydantic When to use lightweight type definitions inside the graph versus robust run-time validation on the edges.
Anti-Pattern Dumping everything into a single shared state dictionary instead of separating input/output boundaries.
Module 03 — LangGraph | Lesson 02: State Schemas — Input, Output, and Private Channels
One dict for everything works for a toy. Real agents leak — the caller sees scratchpads, sub-nodes trip over keys that don't belong to them. LangGraph lets you declare exactly what comes in, what goes out, and what stays private.
Four schemas, one clean contract:
InputState — the public door; only the fields the caller is allowed to send
OutputState — the public window; only the fields the caller will ever see
OverallState — inherits both and adds internal working fields like category; the full internal dict
PrivateSearchState — a scratchpad visible only between two specific nodes; never enters the global state
How nodes use schemas:
Each node's type signature declares exactly what it reads and writes. node_classify receives InputState — it sees only query. It returns {"category": ...} which lands in OverallState. node_search reads OverallState and returns PrivateSearchState — raw_data is available to the next node but never surfaces to the caller. node_format consumes the private state and returns OutputState — only answer leaves the graph.
Compiling with schemas:
StateGraph takes three arguments: the internal dict (OverallState), input_schema=InputState, and output_schema=OutputState. The framework enforces the boundary — callers can only send InputState fields and only receive OutputState fields, regardless of what moves internally.
One dict vs four schemas:
A single giant state is fast to write and painful to maintain. Every key is visible to every node, scratchpads leak to callers, and refactoring breaks clients. Four schemas give you a clean public API, internal freedom to add working fields, and safe refactors. You pay two extra classes and save hours of debugging.
Four habits to apply to every graph:
InputState defines the caller contract
OutputState defines the caller payoff
OverallState inherits both and adds working fields
PrivateState stays scoped between the two nodes that need it
Public schema is a promise. Private state is a scratchpad. Narrow the public surface — your future self will thank you.
Joi realizes that a graph without tools is just a philosopher. She integrates external functionality using @tool and ToolNode, builds a ReAct loop, and connects to an MCP (Model Context Protocol) server to dynamically provide an agent with a calculator and weather data.
Key Highlights:
Brain in a Vacuum The three limitations of vanilla LLMs: halluciations in math, lack of reality access, and static cutoff dates.
The ReAct Loop Understanding Reasoning + Acting and how LangGraph handles it with ToolNode and tools_condition.
MessagesState Leveraging the built-in LangGraph state schema for conversational agents.
Model Context Protocol (MCP) Connecting to separate, scalable tool servers via FastMCP and langchain_mcp_adapters.
Anti-Pattern Writing manual rule-based loops to check for tool calls instead of using the robust, standard tools_condition.
Принято — таблиц больше не будет, только списки.
Module 03 — LangGraph | Lesson 03: Tools and MCP — The ReAct Loop
An agent without hands is a chatbot. This lesson gives the graph real tools — first a local ToolNode with a ReAct loop, then the same loop powered by tools from a remote MCP server over HTTP.
The ReAct loop — three moves:
agent_node — calls the LLM with bind_tools; the model returns either plain text or an AIMessage with tool_calls
tools_condition — a prebuilt conditional edge that routes to ToolNode when tool_calls is present, or to END when it is not
ToolNode — executes each tool call and returns ToolMessage results back into the message list
The edge from tools back to agent closes the cycle — loop until the model stops calling tools
Typed tool schemas:
Pass a Pydantic BaseModel as args_schema to @tool. The LLM sees the argument contract including field descriptions — it never has to guess the shape. Keep docstrings short and exact; the docstring is the only prompt the model reads when deciding whether to call the tool.
Wiring the cycle:
add_conditional_edges on the agent node with tools_condition as the router creates the branch. A plain add_edge from tools back to agent closes it. Two nodes, one conditional edge, one back-edge — that is the entire ReAct topology.
Local tools vs MCP tools:
Local @tool — lives in the same repo, zero latency, but gets duplicated across every agent you build
MCP server — runs behind an HTTP endpoint; any agent connects to it; update once and every client gets the change
MCP in two processes:
The FastMCP server decorates functions with @mcp.tool and runs over streamable HTTP on port 8000. The agent uses MultiServerMCPClient to fetch tools at runtime — await client.get_tools() returns LangGraph-compatible tool objects that drop straight into ToolNode. From there, the graph is identical to the local version.
The rule: Start local. Move to MCP the moment a second agent needs the same tool. Tools want to be services.
Joi upgrades her graph by executing independent nodes concurrently. She moves beyond static parallelism (Fan-out/Fan-in) and discovers the Send API, unlocking dynamic Map-Reduce scaling where the graph spawns parallel workers based on real-time data needs.
Key Highlights:
The Bottleneck of Sequential Logic Why forcing an agent to wait for independent tasks wastes time and resources.
Supersteps and Reducers How LangGraph intelligently executes parallel nodes and safely aggregates results using Annotated reducers.
Static Fan-out/Fan-in Running pre-defined branches simultaneously, like querying a web search and a database at the same time.
Dynamic Send API (Map-Reduce) Using Send objects to dynamically generate an unknown number of identical worker nodes on the fly.
Anti-Pattern Passing the entire global state into every spawned worker instead of using minimal, encapsulated private states.
Module 03 — LangGraph | Lesson 04: Send API — Dynamic Map-Reduce Parallelism
Static graphs assume you know how many workers you need. Real agents don't. The Send API lets you spawn workers at runtime — one per topic, all in parallel, all collected by a reducer — without touching the graph structure.
The shape — planner → N researchers → analyst:
The planner looks at the query and emits a list of topics. For each topic, LangGraph spawns a researcher node in parallel as a single super-step. When they all finish, the analyst fans them back in and writes one report. The number of researchers is determined by the data, not the code.
The reducer — the key trick:
Declare research as Annotated[list[str], operator.add] in OverallState. Every researcher returns a single-element list. The reducer appends each result as super-steps complete. Without the reducer, the last writer would silently overwrite all the others.
The Send object:
spawn_researchers is a router function that returns a list of Send objects — one per topic. Each Send carries the node name and a minimal WorkerState payload containing just that topic. LangGraph runs every Send in parallel within a single super-step. This is the entire mechanism: width is determined at runtime by how many Send objects the router returns.
Wiring fan-out and fan-in:
add_conditional_edges on the planner node takes the router function and a list of possible destination nodes. Every researcher flows into analyst on a plain add_edge. The analyst waits for all parallel branches to complete before running — fan-in is automatic.
Rate limiting without changing the graph:
Pass max_concurrency in the invoke config to cap how many workers run simultaneously. Protects against provider 429 errors without modifying a single node or edge.
Static fan-out vs Send API:
Static edges — pre-declared branches, fixed width, requires a refactor when the shape changes
Send API — router returns a list of Send objects at runtime; same code handles three topics today and thirty tomorrow
The rule: If you already know how many workers you need, you don't need Send. Width is data — let the graph decide.
Joi explains why linear agents are unsafe for production. She designs a guarded pipeline featuring custom conditional routing, a prompt injection firewall, and a peer-review loop where one LLM critiques the draft generated by another, incorporating a strict retry counter to prevent infinite loops.
Key Highlights:
The Open Gate Problem Why a naive linear pipeline blindly accepts malicious prompts and outputs unchecked responses.
Custom Conditional Edges Moving beyond tools_condition to route graph execution based on arbitrary state variables (like safety checks or quality approvals).
Guarding the Entry Implementing a guard_node combining heuristic keyword filtering and LLM classification to defeat prompt injection.
Reviewer Cycle Cultivating a "writer-reviewer" dynamic where poor drafts are iteratively corrected.
Anti-Pattern Coding an infinite loop without a retry counter, which inevitably crashes the system or exhausts the rate limit.
Module 03 — LangGraph | Lesson 05: Guarded Pipeline — Guard, Writer, Reviewer
Shipping an agent means defending it. This lesson builds a three-stage pipeline that guards the input against prompt injection, drafts a response, and has a second agent review it — with a bounded retry loop so it never runs forever.
Pipeline at a glance:
guard_node — fast phrase filter first, then an LLM classifier if nothing obvious is caught
writer_node — drafts from scratch or rewrites using reviewer feedback; increments retries on every call
reviewer_node — separate LLM call that returns APPROVED or REVISE: <notes>
route_after_review — loops back to writer only if retries < MAX_RETRIES; falls through to finish either way
Two-level guard — cheap before expensive:
A BLOCKED_PHRASES list catches obvious injection attempts ("ignore all", "jailbreak", "system prompt") before spending a single token. Only prompts that pass the filter reach the LLM classifier — which uses the stronger gpt-4o model, the one place where paying for the bigger model is justified.
Writer with memory of feedback:
The writer checks state.get("review_feedback"). If feedback exists, it rewrites addressing the notes. If not, it drafts fresh. Either way it increments retries — that counter is the only thing bounding the loop.
Reviewer as a separate agent:
The reviewer gets its own system prompt with explicit evaluation criteria. It parses the first word of the response to set is_approved, and stores the full response as review_feedback for the next writer pass if needed. One agent judges another — no human in the loop required.
Bounded retry routing:
After guard: is_safe → writer, otherwise → blocked
After review: is_approved → finish, retries < MAX_RETRIES → writer, else → finish with the best draft available
Single pass vs guarded pipeline:
Single pass — one LLM, no critic, bad drafts ship
Guard + Reviewer — input checked at two levels, draft reviewed, loop capped by MAX_RETRIES; cost is one extra LLM call per turn, payoff is measurable quality
Three habits from this lesson:
Guard every prompt at two levels — phrases first, classifier second
Separate writer from reviewer — agents judging agents catch what self-review misses
Bound every loop with a counter — retries < MAX_RETRIES is non-negotiable
Joi addresses the amnesia inherent in default LLM architectures by integrating persistent memory layers. She implements both short-term per-thread memory via Checkpointers and long-term cross-thread user profiles using Stores to craft an agent that genuinely remembers its users.
Key Highlights:
Amnesia by Default Why stateless LLM sessions destroy continuity and hinder complex tasks.
The Memory Dual-Layer Understanding the distinct roles of the short-term Checkpointer (per-thread) and the long-term Store (cross-thread).
Implementing Persistence Transitioning from volatile InMemory backends to robust persistent databases like SQLite, PostgreSQL, and Redis.
Smart Extraction Employing an LLM to actively curate and update structured user profiles dynamically rather than indiscriminately dumping all chat logs.
Anti-Pattern Bloating the Store with raw conversation logs instead of isolating and saving actionable semantic facts.
Module 03 — LangGraph | Lesson 06: Memory — Checkpointers and the Long-Term Store
Agents without memory are goldfish with API keys. LangGraph solves this with two distinct layers: a checkpointer for per-thread continuity, and a store for durable cross-session facts.
Two memories, two jobs:
Checkpointer — saves full graph state after each super-step, keyed by thread_id; enables pause, resume, time travel, and crash recovery
Store — key-value store keyed by user_id; holds semantic facts like name, preferences, and tech stack that survive across threads, sessions, and devices
RPG analogy — two saves, two scopes:
Quick save F5 = checkpointer, per thread_id
Player profile = store, per user_id
Delete all saves — the profile remains
Both are InMemory by default — both die on reboot
SqliteSaver — persistent short-term memory:
Open the connection as a context manager, pass it to compile(), done. The thread now survives a process restart. Pass the same thread_id on the next run and the checkpointer rehydrates the full message history from disk. Pass a new thread_id and you start fresh.
Reading the store — inject into the system prompt:
The call_model node receives the store as a third argument. It builds a namespace = ("memories", user_id) tuple, calls store.get(namespace, "profile"), and if a record exists, serializes it into the system prompt. The LLM now knows who it's talking to before the first message.
Writing the store — extract, don't dump:
The extract_memories node reads the last four messages, asks the LLM for structured facts, merges them into the existing profile with current.update(...), and writes back with store.put(namespace, "profile", current). Never dump raw logs — extract semantic facts only.
Backends by environment:
dev → InMemorySaver + InMemoryStore
local persist → SqliteSaver + Postgres/custom store
production → PostgresSaver + PostgresStore
high-QPS → RedisSaver + RedisStore
Note: there is no SqliteSaver-equivalent for the store — use Postgres for local persistent long-term memory.
The rule: The checkpointer is F5. The store is the profile. In production, both must be on disk.
Key Concepts:
Human-in-the-Loop (HITL) Pausing the graph before critical operations using interrupt() to request explicit user approval.
Time Travel Using get_state_history() to retrieve past checkpoints and update_state() to roll back to any previous state for replay or correction.
Conditional Routing Using Command(goto=...) to dynamically redirect the graph flow based on human feedback.
Anti-Pattern Running agents with full tool access without approval gates, creating a "ticking time bomb" scenario.
Module 03 — LangGraph | Lesson 07: Human-in-the-Loop and Time Travel
Autonomous is fine — until the agent is about to email your CEO. This lesson covers three primitives for putting a human in the loop and rewinding mistakes: interrupt(), Command, and update_state.
Why pause:
Some actions are cheap to retry. Sending money, posting content, deleting rows — those want a human to verify before the super-step commits. The checkpointer makes this possible: it freezes the entire graph state to disk the moment interrupt() fires.
interrupt() — pause inside a node:
Call interrupt(payload) anywhere inside a node. Execution halts, the payload is returned to the caller as result["__interrupt__"], and the graph waits. When the client calls invoke again with Command(resume=human_reply) and the same thread_id, the graph picks up on the very next line — interrupt() returns the human's value.
Command — typed control flow:
Command is a return value with two fields: goto sets the next node, update merges into state. The Literal annotation on the return type tells LangGraph which edges are reachable, so the graph can still be statically visualized even when routing is dynamic.
HITL pipeline — two gates:
review_research — interrupts after research; human approves or edits; Command goes to write_draft or loops back to research_topic
review_draft — interrupts after writing; human approves or edits; Command goes to finalize or loops back to write_draft
Time travel — inspect the past:
app.get_state_history(config) returns every checkpoint in the thread in reverse order. Each entry has a values dict and a config with a checkpoint_id — an address for that exact moment in time.
Fork a branch — update_state + invoke(None):
update_state(target.config, values={...}, as_node="node_name") rewrites a past checkpoint as if that node had produced different output. Invoke with the returned config and you run an alternate timeline forward. The original branch is not deleted — both live in the same thread's history.
Resume vs Fork:
Command(resume=...) — continue the same timeline; human supplies the interrupt's return value; one linear thread
update_state + invoke(None) — rewrite a past checkpoint and run forward; use for debugging, regression tests, and what-if analysis
The rule: Interrupt at the irreversible. Resume when approved. Fork when you need a do-over. Every checkpoint is an address. Every address is a second chance.
Key Concepts:
Subgraphs as Nodes Treating compiled StateGraph instances as single, reusable nodes within a larger parent graph.
State Encapsulation Each subgraph maintains its own private State (TypedDict), isolated from the parent's global state.
Wrapper Functions Using adapter functions to map data between the parent's State schema and the subgraph's private State schema.
Shared vs Private Keys Understanding how overlapping keys are automatically mapped, while private keys remain isolated within the subgraph.
Independent Checkpointers Option to compile subgraphs with their own checkpointers for independent history and time travel.
Module 03 — LangGraph | Lesson 08: Subgraphs — Composing Graphs Like Functions
When one graph gets too large, you split it. A subgraph is a compiled graph used as a node inside a parent graph — private state stays private, public contracts stay stable.
Why subgraphs:
A flat monolith puts every node in one schema and every team bumping elbows. Subgraphs let each unit own its own state, with wrapper nodes enforcing the boundary. Refactors become local. Teams ship in parallel.
The shape — wrapper as translator:
The parent graph owns OverallState. Each subgraph owns its own schema. A wrapper node translates parent fields into the subgraph's input, invokes it as a regular Python callable, and maps only the needed output keys back into parent state. This is where boundaries are enforced — the wrapper is the contract.
Private state never reaches the parent:
ResearchState has sources, notes, and summary. The parent only ever sees research_summary — the single key the wrapper chooses to return. sources and notes are implementation details that live and die inside the subgraph.
Building a subgraph:
build_research_subgraph() is a plain StateGraph(ResearchState) with three linear nodes — fetch, notes, summarize — that ends with g.compile(). The result is a runnable that looks identical to any other LangGraph app. The parent has no idea what's inside.
Wrapper node pattern — three lines:
Map parent state into a sub_in dict matching the subgraph's input schema
Call subgraph_app.invoke(sub_in)
Return only the keys the parent needs: {"research_summary": sub_out["summary"]}
Anything else is a leak.
Parent graph stays readable:
build_parent() knows two node names — research and write. Two edges, two compiled subgraphs hidden behind wrapper nodes. The orchestrator reads like a high-level spec, not an implementation.
Flat monolith vs subgraphs:
Flat monolith — one giant state, merge conflicts, every change ripples everywhere
Subgraphs — private state per unit, wrapper nodes enforce the boundary, teams ship independently
The rule: Map parent into sub. Run sub. Map sub back. Scale graphs like functions.
Key Concepts:
Supervisor Pattern A central coordinator agent that receives tasks and delegates them to specialized sub-agents, maintaining overall control.
Swarm Pattern A decentralized network of agents that communicate and transfer tasks directly to each other using handoff tools, without a central coordinator.
Handoff Tool A special tool that enables an agent to transfer control of a task to another agent, passing along the necessary context.
Scalability Trade-off Supervisors scale linearly with task complexity (more agents = more coordination overhead), while Swarms scale near-linearly (more agents = more parallelism).
Complexity Trade-off Supervisors are easier to debug and control due to centralized logic, while Swarms are more flexible and fault-tolerant but harder to manage.
Module 03 — LangGraph | Lesson 09: Multi-Agent — Supervisor and Swarm
One agent for everything is a junk drawer. LangGraph gives you two blueprints for building agent teams: Supervisor — a central router at the top — and Swarm — peers that hand off directly to each other.
Why teams:
Specialists beat generalists. A researcher with search tools, a writer with style, a critic with a rubric — each keeps a small focused prompt and does one job well.
Supervisor — central router:
create_supervisor takes a list of create_react_agent workers, a model for the routing LLM, and a system prompt describing who does what. The supervisor reads the message history, emits a tool call to the appropriate worker, receives the result, and routes again — until it decides to stop. All routing logic lives in one place and is fully auditable.
Swarm — peer-to-peer handoffs:
create_handoff_tool(agent_name=...) creates a tool that, when invoked, transfers control directly to a named peer. Each agent gets the handoff tools for the agents it can call. create_swarm assembles them into a graph with a default_active_agent as the entry point — no supervisor, no central router, turn order is emergent.
Supervisor vs Swarm:
Supervisor — central router; observable; easy to debug; higher token cost per task; start here for production
Swarm — peer-to-peer handoffs; emergent routing; cheaper per turn; harder to trace; reach for it when the workflow is genuinely peer-to-peer
When to pick what:
Linear or near-linear flow → supervisor
Open-ended research with many specialists → swarm
Strict SLAs and observability requirements → supervisor
Nest a swarm inside a supervisor's worker slot if you need both
The rule: A supervisor is a manager. A swarm is a Slack channel. Centralize routing when you need to answer why.
Joi and Triangle meet to discuss the final project: an AI Dev Team. They learn how to put all the LangGraph concepts together into a real working pipeline with strict separation of agent roles, RAG collections, and human validation.
Key Highlights:
Agent Pipeline Why one monolithic agent fails and how separating responsibilities across PM, Researcher, Architect, and Dev improves results.
Shared File System Memory Using the local file system instead of large state contexts for passing artifacts between agents.
Specialized RAG Collections Isolating code, research, and plans into separate ChromaDB collections for targeted querying.
Human-in-the-Loop Review Implementing interrupt() along with Command(goto=...) at every stage for human validation and rollback.
Parallel Research Orchestrating fan-out and fan-in data flows with the Send API for concurrent worker agents.
Module 03 — LangGraph | Lesson 10: Capstone — The Dev Team
Every primitive from the module, in one graph. A seven-agent pipeline — PM, Researcher, Architect, Backend, Tester, Frontend, DevOps — each writing one artifact, each followed by a human review gate.
The contract — one field per agent:
DevTeamState is a flat TypedDict with one key per agent: spec, research, architecture, backend_code, tests, frontend_code, devops_plan. Every agent reads what it needs and writes exactly its own field. The schema is the contract between humans and machines.
The review gate pattern — repeated seven times:
Every review node is identical in shape: interrupt with the current artifact, check the human reply, return Command(goto="next_agent") on approval, or Command(goto="same_agent", update={field: human}) on rejection to loop back with the edited version. Seven agents, seven identical gates, zero special cases.
The pipeline — linear but gated:
Agent runs → writes its field → hands to review gate
Review gate interrupts → human approves or edits → Command routes forward or back
Approved edge flows right; rejected edge loops back to the same agent
SqliteSaver persists every checkpoint — the thread survives days between reviews
Persistent HITL across days:
Close the laptop on day one after the spec is approved. Come back day two, invoke with Command(resume="approved") and the same thread_id — the checkpointer rehydrates the full state and the graph hands you the next artifact. No replay, no lost context.
What every primitive from the module contributed:
TypedDict state with one field per agent — clean schema
interrupt + Command — loop on reject, advance on approve
SqliteSaver — multi-day thread continuity
Linear super-steps with conditional back-edges — gated pipeline topology
One big prompt vs Dev Team pipeline:
One big prompt — fast, opaque, every mistake discovered at the end
Dev Team pipeline — seven small artifacts, human gates at each step, mistakes caught early and cheaply; higher token spend, lower rework cost
The rule: Small agents, small artifacts, cheap gates. Every artifact deserves a review. Every review deserves a checkpoint.
Joi and Triangle review their journey through the LangGraph module. They summarize the key concepts learned, from basic state graphs to multi-agent pipelines with Human-in-the-Loop, before preparing for the next module.
Key Highlights:
Course Recap Reviewing all major LangGraph concepts including conditional routing, parallelism, memory, and specialized agent networks.
The Final Project Reflecting on the assembly of a complete working pipeline utilizing RAG, Tavily, and the Send API.
Looking Forward Setting the stage for the transition from the LangGraph philosophy to AutoGen.
This lecture unpacks Harness Engineering: the emerging discipline that transforms a raw language model into a production-grade coding agent. From the nine core components every Harness must have, to a map of 30+ tools dominating 2026, Joi learns that architecture matters as much as the model itself.
Key Highlights:
The Harness Defined What separates a chatbot from an agent — and why LLM is the brain, but Harness is the body, hands, operating room, and assistant combined.
Harness ≠ Framework Why confusing building materials with a finished house is an expensive mistake — and where LangChain, LangGraph, and Claude Code each sit on the spectrum.
The 9 Components A deep dive into While-Loop Engine, Context Engineering, Tools + Registry, Skills, Subagent Management, Memory, Dynamic Prompt Assembly, Lifecycle Hooks, and Permissions.
Context Rot The silent killer of agentic systems — how a poor Harness lets the task drown in noise, and three techniques (compaction, retrieval, prefix caching) that fight it.
Guides vs Sensors Martin Fowler's two classes of lifecycle hooks: feedforward controls that act before (specs, AGENTS.md) and feedback controls that react after (linters, tests, compilers).
The Core Paradox The same Claude Opus scores 77% in OpenCode and 93% in Cursor. The only difference is the Harness — proving architecture outweighs model choice.
The 2026 Tool Map 30+ tools across CLI, AI-native IDEs, cloud autonomous agents, OSS BYOK, and the often-ignored Chinese players — each representing a distinct Harness philosophy.
Lecture Title: Pi Coding Agent: Installation, Setup, and First Run
Description:
This lecture walks you through everything you need to get the Pi coding agent up and running from scratch. You'll install Pi globally via npm, configure a provider (Anthropic, OpenRouter, Nvidia, or a free-tier option), set up auth.json and settings.json, create your first AGENTS.md, and run a real agentic session — all step by step.
Key Highlights:
Installation One command installs Pi globally: npm install -g @mariozechner/pi-coding-agent. Node.js 18+ is the only prerequisite.
Three Auth Options OAuth login for Claude/ChatGPT/Copilot subscribers, API key via environment variable, or the recommended ~/.pi/agent/auth.json for ongoing work.
Provider Choice Anthropic direct, OpenRouter (300+ models, one key), or Nvidia — each with a ready-to-use JSON config snippet. Free-tier models included.
settings.json Global vs. project-level config, thinking levels (minimal → xhigh), and compaction parameters explained.
Pi File Structure How ~/.pi/agent/ and .pi/ in the project root interact — and the hierarchical loading order for AGENTS.md.
First Session The official recommended first prompt (Summarize this repository), what the Footer numbers mean, and when to call /compact.
Five Launch Modes Interactive, print (-p), JSON, no-session, and resume (-c, -r) — Pi is a platform, not just a TUI.
Common Problems Windows bash issues, 401 auth errors, AGENTS.md not loading, context overflow, and infinite loops — with concrete fixes.
Lecture Title: Pi Architecture from the Inside: Four Packages, One Agent
Description:
This lecture cracks open the pi-mono monorepo and traces every prompt from your keyboard to the LLM and back. You'll understand why Pi tops Terminal-Bench 2.0 with just 4 tools and a ~200-token system prompt, how the ReAct loop works in ~40 lines of TypeScript, and why session branching is a superpower most developers ignore.
Key Highlights:
The Four-Package Monorepo pi-ai (LLM abstraction) → pi-agent-core (agent loop + events) → pi-tui (terminal UI) → pi-coding-agent (the complete product). Each layer is independently usable.
Cross-Provider Context Handoff pi-ai normalizes every LLM API into one Context JSON array — so you can start a conversation with Claude and finish it with GPT or Gemini without losing history.
The ReAct Loop Think → Act → Observe → Repeat, implemented in ~40 lines. Walk through a 4-turn example: reconnaissance, structure analysis, implementation, verification.
16 Event Hooks Every step of the loop emits typed events (text_delta, tool_execution_start, agent_end, …) — the foundation for Extensions, logging, and real-time monitoring.
4 Tools > 20 Tools Why fewer tools yield sharper decisions — and the benchmark proof: Pi (#1) vs. Claude Code and OpenCode on Terminal-Bench 2.0.
~200-Token System Prompt The full structure of Pi's system prompt revealed. No excessive rules — because modern models are smart enough. Context belongs in AGENTS.md.
Session Tree vs. Linear History JSONL branching lets you experiment without losing work. /fork, /clone, /tree — and how to resume any point in history.
Compaction How Pi automatically summarizes old context before the window fills, and when to trigger it manually with /compact.
Lecture Title: AGENTS.md — The Agent's Living Instruction File
Description:
This lecture shows you how to turn a generic AI model into a project-aware teammate using a single Markdown file. Backed by peer-reviewed research (arXiv:2602.11988), you'll learn the five-section anatomy of an ideal AGENTS.md, the five anti-patterns that make it worse than useless, and how to keep it alive across the project lifecycle.
Key Highlights:
What AGENTS.md Does Without it the agent is a smart stranger. With it — an experienced teammate who knows the stack, conventions, off-limits zones, and how to verify its own work.
The Science A February 2026 arXiv study: a well-written AGENTS.md raises task success by 18–34%. A poorly written one performs worse than having none at all.
The Hierarchy Global (~/.pi/agent/) → parent directories → project root → sub-directory. All files merge; the closest one wins on conflicts.
The Golden Rule Write only what the agent cannot discover on its own — commands, constraints, non-obvious architectural decisions, and environment quirks.
The Five-Section Template Stack & Environment / Commands / Architecture & Structure / Rules & Conventions / Context & Quirks — with real-world examples for Python (FastAPI), TypeScript (Fastify), and monorepos.
Five Anti-Patterns Too long, stating the obvious, stale context, contradictions, and missing commands — each one explained with a concrete bad example.
Dynamic Updates Ask the agent to update AGENTS.md mid-session, then /reload — no restart needed. Changes apply on the very next request.
The 150-Line Rule Why AGENTS.md must fit within 150–200 lines, and a prompt to have Pi generate the first draft for any project.
Lecture Title: Skills — Books on the Shelf: On-Demand Modular Instructions for Pi
Description:
This lecture introduces Pi Skills: self-contained capability packages that load into context only when relevant, keeping AGENTS.md lean and every request cost-efficient. You'll build a full multi-file skill with Python scripts, a Gantt chart generator, and prompt templates — then learn when Skills, AGENTS.md, and Slash Commands each belong.
Key Highlights:
The "Books on the Shelf" Model AGENTS.md is what the agent knows by heart. Skills are reference books it pulls only when the task matches — fewer tokens per request, less noise, better quality.
Skill Anatomy A SKILL.md file with YAML frontmatter (name, description, allowed-tools, metadata) plus optional scripts/, assets/, and references/ directories following the Agent Skills specification.
How Pi Selects Skills For every request Pi makes a fast secondary LLM call to match the request against each skill's description field — the single most important field to write well.
Storage Hierarchy ~/.pi/agent/skills/ (global) and .pi/skills/ (project), with arbitrary nesting by topic (api/, devops/, testing/) for large projects.
Hands-On: time-estimator Skill A complete worked example: task JSON → calculator.py (schedule) → gantt.py (matplotlib PNG) → report via a prompt template. Full source included.
assets/ and references/ Best practices for storing runtime media/templates vs. documentation — keeping skills self-contained and portable.
Manual Skill Invocation /skill:time-estimator — when and why to force-load a skill instead of relying on auto-detection.
Skills vs. AGENTS.md vs. Slash Commands A decision tree: always-needed info → AGENTS.md; task-type-triggered → Skill; explicit per-call action → Slash Command.
Curated Skill Libraries badlogic/pi-skills, VoltAgent/awesome-agent-skills (1000+ skills from Stripe, Cloudflare, Vercel, Figma, and more), and agent-skills.cc aggregator.
Lecture Title: Slash Commands (Prompt Templates) — Macros for Repetitive Tasks
Description:
This lecture teaches you to build your own library of /review, /test, /commit, /pr, and /debug commands — Markdown files that Pi transforms into callable slash commands with bash-style argument substitution. You'll also implement the sub-agent pattern: launching a child Pi process to isolate heavy analysis from the main context.
Key Highlights:
Two Types of Slash Commands Built-in Pi commands (/compact, /model, /reload) vs. Prompt Templates — your own Markdown files that become /filename commands.
Template Anatomy YAML frontmatter with description (shown in autocomplete) and argument-hint, followed by the prompt body. Intentionally minimalist.
Bash-Style Variables $1, $2, $@, ${@:2} — full argument substitution, demonstrated with /review, /component, and /commit examples.
Storage Locations ~/.pi/agent/prompts/ for personal global macros and .pi/prompts/ for project-specific commands that belong in the repo.
Ready-Made Template Library Six complete templates — explain.md, docs.md, review.md, test.md, debug.md, commit.md (Conventional Commits + auto-branch ai), pr.md (merge to test, push, PR description).
The Sub-Agent Pattern Launch pi --print --no-tools --thinking off -p "..." as a child process inside a skill. The child reviews code in a clean context; only a structured summary returns to the main agent — preserving the token buffer.
code-review-subagent Skill Full implementation: SKILL.md + assets/review-prompt.md + prompts/code-review-summary.md. Diagram of the main → child → summary pipeline included.
Advanced Frontmatter Unofficial model: and thinking: fields via the pi-prompt-template-model extension — switch to a cheap fast model for mechanical tasks automatically.
Lecture Title: Extensions API — TypeScript Hooks for Pi
Description:
This lecture crosses the line from content configuration into engine-level engineering. You'll write TypeScript Extensions that hook into Pi's full lifecycle — adding custom tools, slash commands, UI status indicators, safety guards, and even calling a LangGraph agent as a native Pi tool, all wired through the ExtensionAPI interface.
Key Highlights:
Extensions vs. Skills/Commands Skills and Slash Commands change what the model says. Extensions change how the agent itself works — middleware, custom nodes, and observability at the engine level.
Auto-Discovery Pi scans ~/.pi/agent/extensions/ and .pi/extensions/ for *.ts or */index.ts files automatically — no manual registration needed for local extensions.
ExtensionAPI: Five Blocks lifecycle (subscribe to all agent events), tools (register callable functions), commands (custom /commands + hotkeys), ui (status bar, notifications, editor buffers), provider (switch model/provider at runtime).
Quick Start: tool-logger A minimal real extension — counts tool calls per session and updates the Footer in real time. Three lifecycle events, ~15 lines of TypeScript.
Custom /todo Command A two-file extension (index.ts + todo.ts) that runs git grep for TODOs and loads results into Pi's editor buffer — with proper error handling for "no matches" vs. "git error".
Safety Guard Pattern Intercept every tool_call_start event, inspect the bash command, and call event.cancel() to block rm -rf and other dangerous commands before they execute.
Auto-Test Hook On every turn_end that included a write or edit, automatically run pytest -q and notify success or failure in the TUI — a feedback sensor in Martin Fowler's sense.
LangGraph Integration Register a Pi tool that POSTs to a local FastAPI server running a LangGraph agent — Pi calls the workflow, renders progress, and returns the result in the main TUI context.
This lecture explains why the next step after prompt engineering and context engineering is harness engineering: building a system that constrains, informs, validates, and corrects AI coding agents automatically. It introduces Archon as an open-source workflow engine for AI coding, shows how commands and YAML workflows work in practice, and connects the idea to real reliability gains in multi-step software development.
Key Highlights:
Three Eras of AI Interaction — Prompt Engineering focused on phrasing, Context Engineering focused on feeding the right repository knowledge, and Harness Engineering focuses on reliable systems around the agent.
What Archon Is — Archon is a workflow engine for AI coding agents and is presented as the first open-source harness builder for AI coding.
Commands as Atomic Units — In Archon, a command is a Markdown file in .archon/commands/ that gives the AI one focused task.
Workflows as Reproducible Pipelines — A workflow is a YAML file in .archon/workflows/ that executes nodes in dependency order and can combine planning, implementation, validation, review, and self-correction.
Fresh vs Shared Context — Archon supports context: fresh for independent verification steps and shared context for continuation steps.
Parallel Reviews — Multiple review nodes can run concurrently after validation, each in a fresh AI session.
Artifacts and Variables — Commands can use variables like $ARGUMENTS, $ARTIFACTS_DIR, and positional inputs such as $1, $2, $3.
Reliability Over Raw Chat — The lecture frames Archon as the layer above a coding agent that turns one-off prompting into repeatable engineering workflows.
By 2026, the word agent has come to mean everything: an LLM with tools, a node in a graph, a module in a multi-agent system, a coding assistant config. This lecture cuts through the marketing noise with a clean four-layer taxonomy. You will walk away with a clear map of the tooling landscape and a reliable answer to the question: which layer am I building, and which tool should I reach for?
Key Highlights:
Four System Layers — Model Layer (the LLMs themselves) → Agent Layer (one smart worker with tools) → Harness / Orchestration Layer (the system that manages agents and the process) → Product Layer (the CLI, IDE, or UI where it all lives).
Pi: Terminal Minimalism — A single-agent terminal harness. One developer, one project, deep inline editing with diffs and skills. Ideal for focused terminal work.
Archon: Command Center — YAML workflows that orchestrate coding agents across plan → implement → test → review → approve → PR. Available from CLI, Web UI, Slack, Telegram, and GitHub.
LangGraph: The Graph Orchestrator — Stateful multi-agent workflows with typed state, checkpoints, time-travel debugging, and LangSmith observability. Not coding-specific.
ADK: Enterprise Multi-Agent — Google's production platform for large-scale agent systems on Vertex AI and Google Cloud with A2A protocol support.
Pi vs Archon — Pi is your personal terminal interface; Archon is the automation layer above it. In this module, Pi acts as the assistant inside an Archon workflow.
Decision Framework — One developer in a terminal → Pi. Repeatable coding pipeline for a team → Archon. General agentic systems beyond code → LangGraph/ADK. Google Cloud enterprise context → ADK.
This lecture walks you through spinning up Archon in Docker, wiring it to Pi and OpenRouter, registering a practice project, and launching your first read-only workflow that builds a plan without touching a single file. You will leave with a working local Archon instance and a clear mental model of how Docker, environment variables, and YAML workflows connect.
Key Highlights:
Docker-First Setup — The primary practical path for this course. No local Node install required; one docker compose command starts everything.
Environment Variables That Matter — CLAUDE_USE_GLOBAL_AUTH=false, PI_CODING_AGENT_DIR, DEFAULT_AI_ASSISTANT=pi, and DATABASE_URL explained in plain terms.
Pi as the Default Assistant — How to wire Pi through OpenRouter as the underlying coding agent inside Archon, including the global .archon/config.yaml.
Registering a Project — How Archon maps a container-side path to your codebase, and why /.archon/projects/ is the right place to start.
plan_only Workflow — Your first YAML workflow: allowed_tools: [] keeps the agent read-only, worktree.enabled: false simplifies the learning scenario, and interactive: true lets you watch execution live in the Web UI.
Best Practices at Install Time — Separate experiments from real projects, start with read-only scenarios, verify API limits before bulk runs, and document every workflow.
This lecture takes you from a read-only planning workflow to a full three-node pipeline that generates code and verifies it automatically. You will build a DAG that plans a Python calculator, implements it in two files, and runs pytest in a deterministic bash node — no manual copy-pasting required. Along the way you will learn the five node types in Archon, how depends_on and context: fresh wire nodes together, and how Dockerfile.user extends the base image with Python and pytest.
Key Highlights:
DAG Fundamentals — Nodes, edges (depends_on), parallelism, and conditional loops: how Archon models a workflow as a directed acyclic graph.
Five Node Types — prompt, bash, script, loop, and approval — and when to use each one.
$plan.output Variable Injection — Pass the output of one node directly into the prompt of the next without any glue code.
context: fresh — Start a brand-new AI session at a node so it evaluates the situation independently, without inheriting previous history.
allowed_tools and output_format — Lock the agent to only Read and Write, and enforce a structured JSON response that Archon validates.
retry with on_error: all — Automatically retry the implement node up to three times on any error.
Dockerfile.user — Extend the base Archon image with Python and pytest using a two-file override pattern.
Hybrid Model — AI where generation is needed, deterministic shell where verifiability is needed.
A linear workflow is great when everything passes on the first try. But in real development, the agent writes code, tests fail, you copy the error back into chat, the agent tries again. This lecture moves that manual cycle into the workflow itself. You will build a loop node that automatically runs "fix → verify" iterations until pytest goes green — and you will learn every knob that controls it: until, until_bash, fresh_context, $LOOP_PREV_OUTPUT, and max_iterations.
Key Highlights:
Loop Node Model — Each iteration is a full AI run with tool access. The loop exits on the first satisfied condition: LLM completion signal, deterministic bash exit code 0, or max_iterations reached.
until: ALL_GREEN — How the agent signals completion with a structured tag inside its text response.
until_bash — A shell check that runs after every iteration and is completely independent of the LLM.
fresh_context: true — Why starting each iteration with a clean AI session prevents context drift and confirmation bias.
$LOOP_PREV_OUTPUT — Passing a summary of the previous iteration into the next prompt without bloating the context.
$ARTIFACTS_DIR — Storing test logs so the final report node can read them.
orchestrator.ts Patch — Why worktree.enabled: false needs a one-time patch to run the loop in the correct working directory.
Six Rules for Loop Nodes — Always set max_iterations, use deterministic until_bash, keep each iteration scoped, and store logs in artifacts.
This lecture adds three mechanisms that turn a basic workflow into a truly flexible system: reusable commands (atomic Markdown instructions for the AI), conditional node skipping with when:, and precise control over AI session context. You will build a two-workflow project around a Python calculator — one that checks completeness before running tests, and one that routes to a fix branch or a verify branch based on structured JSON output.
Key Highlights:
Commands as Atomic Units — A Markdown file in .archon/commands/ is the smallest reusable instruction. It can run standalone or be wired into any workflow node.
Variable Substitution — $ARGUMENTS, $1, $2, $ARTIFACTS_DIR, $WORKFLOW_ID, $BASE_BRANCH, $DOCS_DIR — all resolved before the AI sees the file.
when: Conditional Skipping — String, numeric, and compound expressions (&&, ||) evaluated before a node starts. Fail-closed: invalid expressions skip the node with a warning.
context: fresh vs context: shared — When to give the AI a clean slate and when to let it carry conversation history forward.
$nodeId.output and $nodeId.output.field — Reference any upstream node's output, including structured JSON fields, as long as depends_on is declared.
output_format for Reliable Branching — Structured JSON responses enable deterministic when: expressions against specific fields.
trigger_rule — Control how join nodes behave when some upstream nodes were skipped.
Two Practical Workflows — calc-verify (completeness check → conditional test run → report) and calc-smart (classify → route to fix or verify → summarize).
Automated tests confirm functional correctness, but they cannot answer "Is this the right approach?" or "Does this violate the team's architectural invariants?" This lecture adds formalized human control points — approval nodes — to your workflows, and teaches you how to branch a DAG based on task classification. You will build a full calculator workflow with a review gate, implement an AI rework loop on rejection, and add a BUG vs FEATURE routing pattern.
Key Highlights:
Three Layers of Trust — Functional correctness (tests), architectural correctness (team invariants), and business correctness (right problem, no side effects). Tests only cover the first layer.
Approval Node Mechanics — Workflow pauses, Archon notifies the reviewer, human approves or rejects. On rejection, on_reject.prompt triggers an AI rework cycle before the gate repeats.
capture_response: true — The reviewer's comment is stored as $review-gate.output and can be injected into downstream prompts.
$REJECTION_REASON — Automatically substituted into the on_reject prompt so the agent knows exactly what to fix.
interactive: true — Required at the workflow level for approval messages to appear in the Web UI chat.
Approval vs Interactive Loop — Use approval when the workflow should proceed by default and a human is only a gatekeeper. Use interactive loop for active AI–human dialogue.
Conditional DAG Branching — when: + output_format + trigger_rule: all_done to route BUG and FEATURE tasks through separate nodes while sharing a single review gate.
Fork Archon — The lecture references a pre-patched fork that includes Web UI support for the approval node type.
This lecture introduces the two primary SDKs provided by Anthropic: the low-level Anthropic SDK for direct API calls and the high-level Claude Agent SDK for building agentic frameworks. It covers their differences, use cases, and provides a quick start guide for implementing both in your AI startup.
Key Highlights:
Two SDKs, Two Roles Understanding the fundamental difference between manual API calls and an automated agent loop.
When to Use What A clear breakdown of scenarios to choose between the Anthropic SDK and Claude Agent SDK.
Under the Hood How the agent loop works and how Claude Agent SDK abstracts away the complexity of tool use and message history.
Message Types A guide to handling SystemMessage, AssistantMessage, and ResultMessage in the Agent SDK.
Model Selection An overview of the latest Claude models (Opus 4.7, Sonnet 4.6, Haiku 4.5) as of May 2026.
Comparison A side-by-side comparison of Anthropic SDK, Claude Agent SDK, and Google ADK.
This lecture dives deep into the Messages API, the foundational layer of the Anthropic SDK. It covers everything from making basic stateless requests to handling various stop reasons and streaming responses. You will also learn about the Batch API for cost-effective processing and Claude's robust multilingual support, bringing all these concepts together in a comprehensive EdTech content pipeline script.
Key Highlights:
Stateless Architecture Understanding how the Messages API works without server-side history and why max_tokens is mandatory.
Stop Reasons A complete guide to handling stop_reason events like end_turn, max_tokens, tool_use, and refusal.
Streaming Responses Implementing simple streaming and async Server-Sent Events (SSE) using FastAPI for real-time text delivery.
Batch Processing How to save 50% on API costs by processing up to 100,000 requests asynchronously.
Multilingual Capabilities Leveraging Claude's language auto-detection and prompting strategies for global applications.
EdTech Pipeline A real-world combo script demonstrating batch generation, streaming, stop reason handling, and multilingual tutoring.
This lecture explores the advanced capabilities of Anthropic's Claude models, including Extended and Adaptive Thinking, the Effort parameter, Structured Outputs, and Citations. You will learn how to configure these features to optimize performance, manage costs, and build reliable, context-aware AI applications.
Key Highlights:
Extended & Adaptive Thinking: How to enable Claude to "think aloud" and automatically adjust its reasoning depth.
Effort Parameter: A high-level setting to control reasoning depth without manual token budgeting.
Structured Outputs: Reliable methods for forcing Claude to return JSON using tools or system prompts.
Citations: A built-in mechanism to retrieve structured answers with precise source quotes.
This lecture explores the built-in (server-side) and client-side tools available in the Anthropic Claude SDK. You will learn how to implement and use tools like Web Search, Web Fetch, Code Execution, Advisor, Bash, Computer Use, and Text Editor to enhance Claude's capabilities, automate tasks, and perform complex data analysis.
Key Highlights:
Server-Side Tools: Web Search, Web Fetch, Code Execution, and Advisor tools that run directly on Anthropic's servers.
Client-Side Tools: Bash, Computer Use, and Text Editor tools for local execution and GUI automation.
Code Execution: Performing statistical analysis and generating visualizations using Python in a secure sandbox.
Dynamic Filtering: Optimizing web search results to reduce token usage and improve accuracy.
This lecture covers how the Claude language model handles context windows, strategies for managing long conversations (compaction), the Files API, PDF support, and image processing (Vision). It includes practical Python code examples for implementing these features using the Anthropic SDK.
Key Highlights:
Context Windows: Understanding context limits and strategies for tracking and managing token usage.
Compaction: Automatic and manual context compression techniques for long agent sessions.
Files API & PDF Support: Working with documents, text files, and images across multiple prompts.
Images and Vision: Processing visual data, performing OCR, and analyzing multiple images.
This lecture explains how the Agent Loop works within the Claude Agent SDK, which is the same loop used inside Claude Code. You will learn the 5-step process of prompt processing, tool execution, and context management. It covers how to use query() as the main entry point, handle different message types, manage results and errors, and configure agent options like permissions and budgets.
Key Highlights:
Agent Loop Steps: Understanding the 5 steps: receive prompt, evaluate, execute tools, repeat, and return result.
Message Types: Handling SystemMessage, AssistantMessage, UserMessage, StreamEvent, and ResultMessage.
Result Handling: Checking subtype for success or errors (like max turns or budget limits) before accessing results.
Agent Configuration: Using ClaudeAgentOptions to set allowed tools, reasoning effort, permission modes, and limits.
This lecture explores the 11 built-in tools provided by the Claude Agent SDK, which Claude uses autonomously. You will learn about file operations, command line execution, search tools, web utilities, and interactive prompts. It also covers permission management, parallel tool execution, and how to configure custom tools.
Key Highlights:
Built-in Tools: Exploring Read, Write, Edit, Bash, Glob, Grep, WebSearch, WebFetch, Monitor, AskUserQuestion, and ToolSearch.
Permissions: Understanding allowed_tools, disallowed_tools, and permission_mode to secure your agent.
Parallel Execution: How read-only tools run simultaneously while state-modifying tools run sequentially.
Custom MCP Tools: Naming conventions and using readOnlyHint for custom tools.
This lecture details how the Claude Agent SDK handles session management. You will learn the lifecycle of a session, how it stores conversation history (not the file system), and the differences between Continue, Resume, and Fork operations. It also covers using ClaudeSDKClient for automatic session tracking and managing session metadata (list, rename, tag).
Key Highlights:
Session Lifecycle: How the SDK automatically saves conversation history to disk.
Continue vs. Resume: Using continue_conversation=True for the most recent session vs. resume=session_id for specific or multiple sessions.
Forking Sessions: Creating independent conversation branches to test alternative approaches using fork_session().
ClaudeSDKClient: Using the client for continuous, multi-turn conversations within a single process.
This lecture covers the streaming capabilities of the Claude Agent SDK. It details the differences between single message and streaming input modes, how to handle partial messages for real-time streaming, the anatomy of StreamEvent objects, and advanced streaming input patterns such as dynamic context injection and interactive permissions. It includes practical Python code examples.
Key Highlights:
Streaming Input Modes: Understanding the differences between single message and streaming input modes.
Real-Time Streaming: Handling partial messages with include_partial_messages to stream responses as they are generated.
StreamEvent Anatomy: Extracting text and tracking tool calls using StreamEvent objects.
Advanced Patterns: Implementing dynamic context injection and interactive permissions via can_use_tool.
This lecture covers how to create and manage custom tools in the Agent SDK using an in-process MCP server. It explains the differences from Client SDK tool use, how to define tools with the @tool decorator, and how to set up tool annotations and permissions.
Key Highlights:
Why Custom Tools: Understanding when to use custom tools versus built-in tools.
The @tool Decorator: Defining tools, input schemas, and handling responses and errors.
In-process MCP Server: Creating a server with create_sdk_mcp_server and managing permissions via allowed_tools.
Tool Annotations: Using ToolAnnotations to control parallel execution and provide metadata hints.
This lecture covers the Model Context Protocol (MCP) capabilities in the Claude Agent SDK. It details the supported transports, server authentication, dynamic tool loading via tool_search, creating in-process and external FastMCP servers, and critical security considerations when working with MCP servers.
Key Highlights:
MCP Transports: Understanding stdio, HTTP, SSE, and in-process SDK MCP servers.
Authentication: Managing tokens and credentials for local and remote servers.
Dynamic Tool Loading: Utilizing ENABLE_TOOL_SEARCH to reduce context costs and improve accuracy.
Creating Servers: Building in-process servers with create_sdk_mcp_server versus external servers via FastMCP.
Security Warnings: Mitigating risks from local and remote MCP servers through permissions and best practices.
This lecture covers the subagent capabilities in the Claude Agent SDK. It details how Claude invokes subagents using the Agent tool, configuring AgentDefinition, managing context isolation, resuming subagents, handling parallel execution, and leveraging SubagentStart and SubagentStop hooks for monitoring and control.
Key Highlights:
The Agent Tool: How Claude automatically delegates tasks to subagents based on descriptions.
AgentDefinition: Full configuration of subagents including tools, models, MCP servers, and background execution.
Context Isolation: Understanding what subagents inherit and how parent and subagent communicate.
Resuming Subagents: Continuing a subagent's work with its full historical context.
Hooks & Parallelism: Using SubagentStart / SubagentStop for decision control and running subagents simultaneously.
This lecture provides an overview of the various hook events available in the Claude Agent SDK. You will learn how to intercept and modify tool executions, handle user prompts, manage agent lifecycles, and implement security measures using hooks like PreToolUse, PostToolUse, and more.
Key Highlights:
Hook Events: Understanding the core hook lifecycle events in the Python SDK.
Security & Modification: Using PreToolUse for short-circuit blocking and input modification.
Auditing & Context: Using PostToolUse and UserPromptSubmit to audit results and inject dynamic context.
Lifecycle Management: Archiving history with PreCompact and handling agent termination with Stop.
This lecture covers the creation and usage of Skills, Slash Commands, and Plugins in the Claude Agent SDK. You will learn how to structure reusable workflows, invoke them automatically or explicitly, build custom commands, and extend functionality using local plugins.
Key Highlights:
Skills: Structuring and using .claude/skills/*/SKILL.md for autonomous execution.
Slash Commands: Using built-in commands (/compact, /clear) and defining custom legacy commands.
Plugins: Extending the SDK with local plugins, namespaces, and the plugin.json manifest.
Integration: How the Skill tool works and how to load plugins programmatically.
This lecture covers structured output and file checkpointing in the Claude Agent SDK. You will learn how to enforce JSON Schema output, handle validation retries and errors, and use file checkpointing to track and rewind changes made by the agent.
Key Highlights:
Structured Output: Enforcing strict output formats using JSON Schema, Pydantic, and Zod.
Retry Logic: How the SDK handles validation failures and error_max_structured_output_retries.
File Checkpointing: Tracking file changes made by agent tools and restoring previous states.
Rewind Functionality: Reverting file changes without affecting the conversation history.
Install Hermes Agent
This course contains the use of artificial intelligence.
Not affiliated with Anthropic, LangChain, or NousResearch.
Welcome to the most complete AI Engineering Bootcamp :)
This is not a theory course. From day one, you write real code, build real agents, and ship real systems. Whether you're a developer, data scientist, or complete beginner — by the end you'll be able to design, build, and deploy production-grade AI systems confidently.
What you'll build:
MCP — connect agents to GitHub, databases, filesystems, and any external tool
Conversational AI assistants with memory, tools, and streaming
RAG pipelines that load PDFs, CSVs, web pages, and query them with LLMs
Multi-agent systems with orchestration patterns: Sequential, Parallel, Loop, Swarm, Supervisor
Autonomous agents using ReAct, MCTS, BeamSearch, and Tree of Thoughts
Personalized AI assistants powered by Hermes — with custom routing, tool & API integration, sub-agent orchestration, and multi-step task delegation
Cross-framework agent networks via A2A protocol — connecting ADK, LangChain, LangGraph, and AG2
Production-ready systems with guardrails, evaluation, observability, and HITL
Technologies covered:
Python — syntax, data types, functions, object-oriented programming, file handling, virtual environments.
LangChain — LCEL chains, RAG, memory, MCP, agents
LangGraph — stateful graphs, persistence, Time Travel, Send API, Subgraphs
Pi — skills, extensions, slash commands, session trees, sub-agent patterns, JSONL branching
Archon — YAML pipelines, harness engineering, deterministic multi-agent orchestration
Anthropic SDK & Claude Agent SDK — Client SDK, Tool Use, streaming, prompt caching, Claude Agent SDK, subagents, hooks, MCP
Hermes Agent — messaging routing, tool & API integration, sub-agent orchestration, multi-step task delegation, personalized assistants, workflow automation
AG2 (AutoGen) — GroupChat, CaptainAgent, ReasoningAgent, DocAgent
Google ADK — callbacks, plugins, artifacts, evaluation, UserSimulator
A2A Protocol — agent interoperability across all frameworks
Every module follows a hands-on structure:
each lesson has working code, real tasks, quizzes, and a final project that ties everything together.
By the end of this course, you won't just understand AI — you'll build it.
P.S.
The course is currently in early access mode — 3 modules are already available.
Get in now at the lowest price. As new modules are added, the price will increase. The earlier you enroll, the more you save.
Lock in your spot today before the next price bump.