
This course includes our updated coding exercises so you can practice your skills as you learn.
See a demo
Course Objectives: Develop LLM Powered Applications with LangChain
Agents
We also Cover:
LangChain Ecosystem: LangSmith, LangGraph
Prompt Engineering
Production
Target Audience:
Software Engineers
Data Scientists
Technical Product Managers
Anyone who is comftirable with Code
No AI/ML experience is needed
Prerequisites: This is NOT a beginner's course
Python knowledge
Git usage
Virtual Environments, environment variables
No AI/ML Knowledge is needed, we cover all here
Maximize learning by following the regular course order, using the resources, troubleshooting, and theory sections, and exploring prompts like chain-of-thought prompting and ReAct with the community on Discord and Discourse.
Build a hello world LangChain link chain that processes Elon Musk data with an llm to summarize and generate facts, while exploring prompt templates and prompts.
Phase 1: Preparing the Codebase
Cloning the Repository: Our first step was to get the source code. We navigated to the langchain-course GitHub repository, which serves as the central hub for all the course material. We copied the repository's HTTPS URL to our clipboard. Then, using our terminal, we executed the git clone command. This downloaded a complete copy of the project, including all its files and version history, onto our local machine.
Creating an Isolated Workspace: To ensure our work wouldn't interfere with the main course content and to start with a fresh slate, we created a new, independent "orphan" branch. We did this with the command git checkout --orphan project/hello-world. The --orphan flag was a key detail, as it created a branch with no parent and no previous commit history, effectively giving us a brand new, empty project within the same repository structure.
Cleaning the Branch: To finalize our clean start, we ran git rm -rf . which removed all the cloned files from our new branch's staging area. This left us with a completely empty directory, ready for our own project files.
Phase 2: Building Our Python Environment with uv
Introducing uv: We chose to use uv, a modern, high-performance Python package manager built in Rust. The instructor explained it as a blazingly fast alternative to traditional tools like pip and venv, capable of handling dependency resolution and virtual environments seamlessly.
Initializing the Project: We kickstarted our project by running uv init. This command automatically generated a foundational project structure for us, including:
A pyproject.toml file, which is the modern standard for configuring Python projects and defining dependencies.
A main.py file with some boilerplate "Hello, World!" code.
Installing Dependencies and Creating a Virtual Environment: We then began adding our necessary libraries. The first command, uv add LangChain, was pivotal. It not only fetched and installed the core LangChain library but also automatically detected that we didn't have a virtual environment and created a .venv directory for us. This crucial step ensures that all our project's packages are kept isolated from our global Python installation, preventing conflicts.
Adding Specific and Utility Packages: We continued by adding more packages, each for a specific purpose:
langchain-openai: We installed this as a separate package. The instructor explained that LangChain has modularized its integrations, so we only need to install the packages for the specific LLM providers we plan to use.
python-dotenv: This utility is essential for managing secrets. It allows us to load environment variables from a .env file, which is a best practice for handling sensitive data like API keys.
black and isort: To maintain clean and professional code, we installed these industry-standard code formatters. black handles code styling, and isort automatically organizes our import statements.
Phase 3: Securely Configuring API Keys and Ignoring Files
Managing Secrets with a .env File: We created a .env file at the root of our project. This file is specifically designed to hold our API keys and other sensitive credentials.
Generating the OpenAI API Key: We navigated to the OpenAI Platform website, went to the API Keys section, and generated a new secret key. The instructor issued a strong warning about the importance of keeping API keys private, treating them like passwords, and never committing them to public repositories.
Populating the .env File: We pasted our newly generated key into the .env file, assigning it to the variable OPENAI_API_KEY. The specific variable name is important, as the LangChain library looks for this exact name by default.
Using .gitignore: To prevent our secrets and unnecessary files from being tracked by Git, we created a .gitignore file. We copied a standard, comprehensive Python .gitignore template from a public GitHub repository and pasted it into our file. This ensures that files like our .env and the entire .venv directory are ignored by Git.
Phase 4: Verification and Finalization
Testing the Setup: To make sure everything was working together, we wrote a small test script in main.py. We imported the load_dotenv function and the os module. By calling load_dotenv(), we loaded the variables from our .env file into our session's environment. We then printed the value of os.environ.get("OPENAI_API_KEY").
Confirmation: When we ran the script, it successfully printed our API key to the terminal. This confirmed that our virtual environment was active, our packages were installed correctly, and our .env file was being loaded properly.
Committing Our Work: With the setup complete and verified, we cleaned up the test code, formatted our files with black and isort, and then committed all our new project files to our local branch with the descriptive commit message "environment setup". Finally, we pushed this new branch to the remote GitHub repository, making our work available there.
1. Setting the Stage: Preparing Our Code
We started in our main.py file, where we already had the load_dotenv() function set up. This ensures that our API keys from the .env file are loaded and available as environment variables for our application to use.
2. Understanding and Using Prompt Templates
What is a Prompt? We first established that a "prompt" is simply the text input we provide to a Large Language Model (LLM). The LLM processes this text and generates an output.
Introducing PromptTemplate: We imported PromptTemplate from langchain_core.prompts. We learned that this is a key LangChain abstraction that allows us to create reusable and dynamic prompts. Instead of hardcoding the entire input, a PromptTemplate lets us define a template with placeholders (parameters).
Dynamic and Reusable Prompts: We saw a practical example of a prompt template like: "I want you to write a cool, funny jingle for a {product} product."
This template allows us to programmatically insert different values for the {product} parameter.
We could run this once with product = "cat food", then again with product = "sports shoes", and a third time with product = "piano", getting a unique, context-specific output from the LLM each time without rewriting the entire prompt.
Essentially, PromptTemplate helps us format our inputs into the final string that gets sent to the LLM.
3. Interacting with LLMs via Chat Models
Importing ChatOpenAI: We then imported ChatOpenAI from langchain_openai. This class is a specific implementation of a "Chat Model."
The Role of Chat Models: We learned that Chat Models are the primary, standardized interface in LangChain for interacting with modern, conversational LLMs (like GPT-4, Claude 3, and Gemini).
We contrasted this with older LLMs that simply took a single string of text in and returned a single string out.
Modern chat models are more sophisticated; they are designed to handle conversational context and work best when given a list of structured messages (e.g., a System Message for instructions, a Human Message for user input, and an AI Message for the model's previous responses). The model then returns a new AI Message.
Exploring the Source Code: As a best practice for developers, we learned to look directly at the framework's source code to better understand its functionality. By holding Cmd (or Ctrl) and clicking on PromptTemplate and ChatOpenAI in our IDE, we were able to navigate to their class definitions and see their documentation and implementation details directly.
4. The Core Concept: LangChain Chains
Defining a Chain: We concluded by introducing the concept of a LangChain Chain. A chain is a sequence of components linked together to form a complete workflow. The defining characteristic of a chain is that the output of one step becomes the input of the next.
Building Complex Workflows: This "chaining" concept allows us to build powerful applications that go far beyond a single prompt-and-response interaction. We mapped out a hypothetical advanced workflow:
User Query: The process starts with input from a user.
Prompt Template: The query is formatted into a structured prompt.
Language Model: The formatted prompt is sent to an LLM to generate a response.
Output Parser: The LLM's raw text output is parsed into a structured format (like JSON).
External API / Tool Call: The structured data is used to call an external tool or API (e.g., a search engine).
Final LLM Call: The result from the API call is fed into another LLM to process it and generate the final, polished output for the user.
Summary:
Objective: We built our first LangChain chain to summarize a piece of text about Elon Musk.
Gathering Information: We started by copying a block of text about Elon Musk from Wikipedia and storing it in a Python variable called information.
Creating a Prompt Template: We defined a summary_template string that instructs the LLM what to do with the provided information. This template included a placeholder {information} to make it dynamic.
Initializing the PromptTemplate: We created an instance of the PromptTemplate class, passing it our summary_template and specifying that information is the input variable it should expect.
Initializing the ChatOpenAI Model: We created an instance of the ChatOpenAI class, which serves as the interface to the OpenAI API. We configured it to use the gpt-5 model and set the temperature to 0 for more deterministic and factual responses.
Building the Chain with LCEL: We used the LangChain Expression Language (LCEL) to create our chain. We "piped" (|) the summary_prompt_template into the llm (our chat model). This creates a runnable object where the formatted prompt from the first component becomes the input for the second component.
Invoking the Chain: We executed the chain using the .invoke() method, passing a dictionary with the information variable. This triggered the entire workflow:
The information text was inserted into our template.
The resulting prompt was sent to the OpenAI API.
The LLM generated a summary and two interesting facts based on our instructions.
Displaying the Result: Finally, we printed the response.content to the terminal, which displayed the structured output generated by our LangChain chain.
Summary:
Objective: We debugged our newly created LangChain chain to understand how data flows between its components and to inspect the objects involved in the process.
Setting Up the Debugger:
We set a breakpoint in our main.py file right after the chain.invoke() call.
We ran the Python script in debug mode, which paused the execution at our breakpoint.
Inspecting the Response Object:
We examined the response variable in the debugger. We discovered that its type is not a simple string, but rather an AIMessage object.
This AIMessage object is a special LangChain class that acts as a wrapper for the LLM's output.
Analyzing the AIMessage Object:
Content: We found that the main text generated by the LLM is stored within the .content attribute of the AIMessage object.
Response Metadata: We explored the response_metadata attribute, which contains valuable diagnostic information, such as:
The exact model name used (gpt-5-2025-08-07).
The token usage (e.g., input tokens, completion tokens, total tokens).
The finish_reason (e.g., stop), which tells us why the model stopped generating text.
This metadata is crucial for debugging, monitoring costs, and analyzing the performance of our LLM calls.
Conclusion: By stepping through the code with a debugger, we gained a much deeper understanding of how the PromptTemplate and ChatOpenAI model work together within the chain and what kind of structured data is passed between them.
Summary:
Objective: We learned how to switch from a proprietary, cloud-based LLM (like OpenAI's GPT-5) to a locally hosted, open-source model using Ollama and LangChain.
Installing and Setting Up Ollama:
We navigated to ollama.com and downloaded the application for our operating system.
After installing, we used the terminal to download a specific open-source model, gemma3:270m, using the command ollama pull gemma3:270m. This is a lightweight model suitable for running on a local machine.
We confirmed the model was downloaded successfully by running ollama list.
Running a Local Model:
We ran the command ollama run gemma3:270m in the terminal, which started an interactive chat session with the model, proving it was working correctly.
Integrating Ollama with LangChain:
Install the Package: We installed the necessary integration package with uv add langchain-ollama.
Modify the Code: In our main.py script, we made two simple changes:
We imported ChatOllama from the langchain_ollama library.
We replaced the ChatOpenAI instance with a ChatOllama instance, specifying the model="gemma3:270m". The rest of our chain's logic remained exactly the same.
Debugging and Observing the Difference:
We ran the script in debug mode to compare the output of the local model with the previous output from GPT-5.
We observed that while the local model was much faster, the quality of its response was lower—it provided a summary but failed to generate the two separate interesting facts as instructed.
Key Takeaway:
The tutorial highlighted a core strength of LangChain: its model-agnostic design. We can easily switch between different LLM providers (proprietary or open-source, local or cloud-based) by changing just a single line of code, without altering the rest of our application's workflow.
This demonstrates the trade-off: local models offer speed and cost savings, while larger, proprietary models often provide higher quality and better instruction-following capabilities.
Introduction to LangSmith: The tutorial begins by introducing LangSmith as a platform for tracing and observing LangChain applications to debug and monitor performance.
Account Setup: The speaker demonstrates how to sign up for a LangSmith account, highlighting the availability of a free "Developer" tier suitable for getting started.
Configuration: Key steps to enable tracing are covered:
Generating a new API key from the LangSmith dashboard.
Setting up the necessary environment variables in a .env file:
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=<your-generated-key>
LANGSMITH_PROJECT=<your-project-name> (e.g., "hello-world")
Automatic Tracing: Once the environment is configured, running the LangChain application automatically sends detailed execution traces to the specified project in LangSmith.
Analyzing Traces: The LangSmith dashboard provides a comprehensive view of each run:
A "waterfall" view breaks down the execution flow of a RunnableSequence into individual steps like PromptTemplate and the LLM call (ChatOllama or ChatOpenAI).
Detailed metrics for each step are available, including latency, time-to-first-token, token count, and status (success/failure).
Comparing LLM Performance: The tutorial shows how to switch between different LLMs (a local ChatOllama model and OpenAI's gpt-5) and use the traces to compare their performance, such as the significant difference in latency (1.4s vs. 16s).
Code and Resources: The final code is committed to a GitHub repository, and the speaker notes that public links to the demonstrated traces will be shared as resources.
The course compatible with V0.1.+
Summary
1. What is an AI Agent?
An AI Agent is a software system that uses a Large Language Model (LLM) as its "reasoning engine."
Its primary function is to decide which actions to take to accomplish a goal and then execute those actions.
2. The Key Difference: Agent vs. Chain
Chain: The sequence of actions (the control flow) is hard-coded and defined by the developer. An LLM might be used as a step in the chain, but it does not decide what happens next.
Agent: The control flow is dynamic and determined by the LLM itself. The agent uses reasoning to decide which tool or step is needed next to solve the problem.
3. The Power of Tools
The core concept of modern agents is equipping an LLM with a "toolkit."
Tools are pre-defined functions that give the agent abilities, such as:
Searching the web
Making an API call
Querying a database
Writing and executing code
This allows LLMs to go beyond text generation and interact with external systems, effectively giving them "superpowers" to automate complex tasks.
4. ReAct Agent Architecture
ReAct is a foundational agent architecture that stands for Reasoning and Acting.
It combines the LLM's ability to reason about a problem (often using Chain-of-Thought prompting) with its ability to act by using tools.
The process works in an iterative loop:
The agent thinks about the task.
It decides which action to take (which tool to use).
It uses the tool and gets an observation (the result).
It feeds the observation back into its thought process and repeats the cycle until the task is complete.
Summary
Project Goal: This section will focus on building a search agent using LangChain. This is an AI agent equipped with web searching capabilities.
ChatGPT Demo: The video demonstrates ChatGPT's built-in web search feature as a practical example.
The Query: A query is made to "search for 3 job postings for an AI engineer using LangChain in the Bay Area on LinkedIn and list their details."
Agent in Action: ChatGPT's agent first understands that it needs to search the web to answer the query. It performs the search, specifically targeting LinkedIn.
Grounding and Trust: The agent returns a list of job postings and, importantly, includes source links for each finding. This process is called "grounding" and is crucial for building user trust by allowing them to verify the information.
The Problem LLMs Solve: Standard Large Language Models (LLMs) are "text-in, text-out" and have static knowledge based on their training data. They lack real-time information.
The Solution (Tools): By giving an LLM "tools" (like web search), it can access up-to-date, external information, overcoming its inherent limitations.
Course Context: The video highlights that built-in search in chat applications is now standard but was a new concept when the course was first created, showing how fast the field is evolving. The upcoming section will teach how to build this functionality from scratch.
Summary
This video outlines the evolution of ReAct agents within the LangChain framework, from their inception to the latest version, and explains the structure of the upcoming course.
Initial Stage: LangChain ReAct Agent
Introduced in November 2022.
Relied on "ReAct Prompting," where the language model reasoned through actions and observations in a text-based format.
Second Stage: Tool Calling Agent
Evolved with the advent of native function-calling capabilities in large language models (LLMs).
Switched from prompt-based tool selection to structured function calling, making tool execution more reliable and efficient.
Third Stage: LangGraph ReAct Agent
Continued using function calling but was rebuilt on top of LangGraph.
LangGraph's low-level orchestration provided durable execution, persistence, and the fine-grained control necessary for production-grade applications.
Current Stage (v1.0): LangChain create_agent()
Introduced in LangChain v1.0.
Offers a clean, high-level interface for creating agents.
It is powered by the battle-tested LangGraph ReAct agent under the hood, combining simplicity with robustness.
Course Structure:
The course will start with the latest create_agent() function to quickly get users started.
It will then progressively work backward to the original ReAct implementation, explaining each evolutionary step.
The goal is to provide a deep understanding of not just how to use the agents, but how they work internally, what each iteration improved, and the core concepts like tool calling and LangGraph.
Summary
This video walks through setting up the environment for a Python project that uses LangChain and Tavily to create a search-enabled AI agent.
Project Setup:
The code for this section is available in the project/search-agent Git branch.
The project is initialized using the uv init command.
Installing Dependencies:
The following packages are installed using uv add:
langchain and langchain-openai (covered in previous videos).
langchain-tavily: A new LangChain integration package for the Tavily service.
tavily-python: The official Python SDK for Tavily.
python-dotenv: To load environment variables.
black and isort: For code formatting.
Introducing Tavily:
Tavily is a third-party service that provides a search engine for AI agents, allowing them to connect to the web.
It offers APIs for search, crawling, and extracting content.
It's a popular choice for web search integration and is featured in the official LangChain documentation.
Tavily provides a generous free tier of 1,000 API requests per month.
Environment Configuration (.env file):
The .env file should contain the following variables:
OPENAI_API_KEY
LANGSMITH_TRACING=true
LANGSMITH_API_KEY
LANGSMITH_PROJECT: Set to "Search Agent" for this section.
TAVILY_API_KEY: A new key obtained from the Tavily website. LangChain automatically looks for this specific variable name to authenticate with Tavily.
Initial Code (main.py):
The first step in the main.py file is to import and call load_dotenv to make the environment variables available to the application
Summary
This video provides an overview of setting up an AI agent using LangChain, focusing on the fundamental components: tools and a large language model (LLM).
Core Components of a LangChain Agent:
LLM (Large Language Model): The reasoning engine or the "brain" of the agent. In this example, ChatOpenAI is used.
Toolkit: A collection of tools that the agent can use to perform actions.
Creating a Tool:
A tool in LangChain is essentially a Python function that the agent can execute.
To create a custom tool, you define a regular Python function with type hints and a docstring.
The docstring is crucial as it provides a description of the tool's purpose, its arguments, and what it returns. This information is used by the LLM to decide when and how to use the tool.
The function is then decorated with the @tool decorator from langchain.tools to convert it into a LangChain tool.
Example: A Search Tool:
A simple search function is created that takes a query string and returns a static string: "Tokyo weather is sunny".
A docstring is added to describe the function as a tool that "searches over the internet."
Building the Agent:
Import necessary modules: create_agent, tool, HumanMessage, and ChatOpenAI.
Instantiate the LLM: An instance of ChatOpenAI is created.
Define the tools: The search function (now a tool) is placed into a list.
Create the agent: The create_agent function is called, passing the LLM and the list of tools as arguments.
Invoking the Agent:
The created agent is a "runnable," so it can be executed using the .invoke() method.
The input to the agent is a dictionary with a "messages" key. The value is a list of message objects, typically starting with a HumanMessage containing the user's query.
LangChain automatically handles casting a single HumanMessage into a list if one is not provided.
Summary
This video explains how to build and execute a basic LangChain agent by defining its core components: the LLM and its tools. It also demonstrates how to debug the agent's execution flow.
Required Imports:
create_agent from langchain.agents to build the agent.
tool decorator from langchain.tools to define a custom tool.
HumanMessage from langchain_core.messages to represent the user's input.
ChatOpenAI from langchain_openai to serve as the agent's LLM (reasoning engine).
Understanding LangChain Tools:
A tool is a function that an agent can execute to perform an action (e.g., call an API, search a database, run code).
To create a tool, define a Python function with clear type hints and a descriptive docstring that explains its purpose, arguments, and return value. The LLM uses this metadata to decide when to call the tool.
Use the @tool decorator to transform the Python function into a LangChain tool.
Building the Agent:
Define a Tool: A search function is created with a @tool decorator. It's a placeholder that prints the query and returns a static string "Tokyo weather is sunny".
Instantiate the LLM: An instance of ChatOpenAI is created.
Assemble Tools: The search tool is put into a list.
Create the Agent: The create_agent function is called, passing the LLM and the list of tools.
Executing and Debugging the Agent:
Run the Code: The agent is invoked with the query "What is the weather in Tokyo?". The search function is called, and its print statement confirms the LLM correctly extracted the query "weather in Tokyo".
Analyze the Trace in LangSmith:
The execution is orchestrated by LangGraph internally.
Step 1 (First LLM Call): The agent sends the user's query and the available tool's description to the LLM. The LLM's output is not a direct answer but a "function call" instruction, telling the agent to execute the search tool with the query "weather in Tokyo."
Step 2 (Tool Execution): The LangChain runtime executes the search tool. The output "Tokyo weather is sunny" is captured as a ToolMessage.
Step 3 (Second LLM Call): The agent sends the entire history (original query, LLM's decision to call the tool, and the tool's result) back to the LLM.
Step 4 (Final Answer): With all the necessary information, the LLM generates the final, user-facing answer: "The weather in Tokyo is currently sunny."
ReAct Agent Architecture Recap: This process demonstrates the core ReAct (Reason and Act) architecture. The LLM thinks and decides on an action (tool call). The tool is executed, and the result is returned as an observation. The LLM then uses this observation to decide the next step, which in this case is to finish by providing the final answer
Summary
Adding Real-World Search: The speaker integrates the Tavily API to give the agent real-world internet search capabilities.
Initial Implementation (Custom Tool):
He imports TavilyClient and initializes it. The client automatically uses the TAVILY_API_KEY from the environment variables.
The custom search function is modified to call tavily.search(query=query) instead of returning a static string.
Analyzing with LangSmith:
A simple weather query demonstrates that the tool now returns live data from the web.
A more complex query for job postings shows the agent making multiple, parallel tool calls with refined search terms (e.g., "LangChain Bay Area," "LangChain San Francisco").
Best Practice (Using Built-in Integrations):
The speaker explains that creating custom wrappers for popular services is not ideal, as the official integrations provided by vendors are better maintained and more feature-rich.
He installs the langchain-tavily package.
He replaces the custom search function and TavilyClient with the pre-built TavilySearch tool imported directly from langchain_tavily.
Benefits of Integration: The LangSmith trace shows that by using the official tool, the LLM can leverage advanced parameters like include_domains and search_depth, leading to more accurate and specific tool calls
Summary
Problem: By default, Large Language Models (LLMs) return unstructured text, which is hard to use programmatically in downstream applications (e.g., serializing, parsing, or rendering in a UI).
Solution: Enforce a structured output format, such as a Pydantic object, to ensure the LLM's response is predictable and easy to work with.
LangChain Implementation: The create_agent function in LangChain has a response_format argument. By passing a Pydantic class to this argument, you can define the exact schema the agent's output should follow.
Example in Video:
A Source Pydantic class is created to hold a source URL.
A nested AgentResponse Pydantic class is created, which includes an answer (string) and a sources (a list of Source objects).
This AgentResponse class is then passed to the response_format parameter when creating the agent.
Result: The agent's output dictionary now includes a structured_response key. The value is a populated Pydantic object matching the defined AgentResponse schema, containing both the text answer and a list of source URLs.
Underlying Mechanism: The video mentions that this feature works "like magic" by leveraging the LLM's function-calling capabilities under the hood.
Summary:
Structured output in LangChain allows agents to return data in a predictable, predefined format, such as Pydantic objects, JSON, or data classes, making it easier to use downstream in applications. There are two main implementation strategies for this:
ProviderStrategy: Used by default if the model natively supports structured output through its own API. This approach offloads the task of formatting the data to the model provider (e.g., OpenAI, Anthropic), which is generally more reliable.
ToolStrategy: Used as a workaround for models that do not natively support structured output but do support tool calling. LangChain forces the model to use a specific "tool" that matches the desired schema, effectively enforcing the structured format.
Both strategies support various schema types, including Pydantic models, data classes, typed dicts, and JSON schemas.
Summary
In this tutorial introduction, the instructor outlines a plan to demystify AI agents by stripping away the abstraction layers provided by frameworks like LangChain.
The lesson plan moves from "Layer 0," where high-level abstractions hide the internal logic, to manually implementing the agent loop using LangChain primitives. Following this, the tutorial will demonstrate a "raw" implementation without frameworks—writing JSON schemas manually—to highlight the value of LangChain's unified interface for model flexibility.
Finally, the course will cover the foundational "ReAct" method, building an agent from scratch using regular expressions instead of function calling. The instructor emphasizes hands-on coding using provided GitHub resources and notes that the tutorial will utilize tools such as Ollama and open-weight models like Qwen.
Summary
In this video, we introduce the practical project for this section: building an AI agent for an e-commerce hardware store that sells items like laptops and headphones.
Our goal is to calculate final prices based on specific discount tiers (Bronze, Silver, and Gold). We outline the creation of an agent that utilizes two specific tools—one to retrieve the base product price and another to determine the discount percentage based on the customer's tier.
While we acknowledge that a high-level LangChain abstraction like create_agent could handle this automatically, we are choosing to manually implement a lean version of the agent loop to better understand the underlying logic, as illustrated by our workflow diagram.
Summary
The tutorial demonstrates the initial setup for building a custom agent loop using LangChain. It begins by creating a new Python script and importing essential tools, including environment variable loaders, message types, and the init_chat_model utility, which allows for easy switching between different language model providers (like OpenAI, Anthropic, or Ollama) using just a string identifier. A specific local model, "qwen3:1.7b," is initialized for the demonstration.
Next, two custom functions are created and decorated with LangChain's @tool to make them usable by the language model. The first tool, get_product_price, simulates retrieving a price from an e-commerce catalog based on a product name. The second tool, apply_discount, calculates a final price after applying a specific discount tier (bronze, silver, or gold). Emphasis is placed on writing clear docstrings and type hints for these functions, as the @tool decorator parses this metadata and formats it correctly for the language model to understand the available tools. Finally, a run_agent function is outlined to contain the raw logic of the agent loop, and LangSmith's @traceable decorator is added to this function to monitor execution, allowing developers to track token usage, runtime, and other metrics in the LangSmith dashboard
Summary:
This video tutorial focuses on implementing the "run agent" function for a LangChain ReAct (Reasoning and Acting) loop. The process begins by creating a dictionary mapping tool names to their corresponding Python functions, which enables the agent to dynamically execute the correct tool based on the LLM's output. The LLM is then initialized using LangChain's init_chat_model, a flexible abstraction that allows developers to easily switch between providers like Ollama or OpenAI by simply changing a string.
The tutorial then demonstrates how to bind the list of tools to the initialized LLM using the bind_tools function. This crucial step ensures that the model is aware of the available tools and their descriptions, allowing models that support function calling to return tool execution requests instead of standard text responses.
Next, the video covers the creation of the agent's "brain" through prompt engineering. The prompt is structured as a list containing a system message and a human message. The system message defines the agent's persona ("helpful shopping assistant") and incorporates "defensive prompting"—a set of strict rules designed to guide the model's behavior and prevent hallucinations. These rules explicitly instruct the agent to never guess prices, to execute specific tools in a particular order (e.g., getting the price before applying a discount), and to never perform math itself, relying instead on the provided tools. Finally, the user's input question is appended as a human message, completing the prompt setup for the ReAct loop.
Summary:
This video demonstrates how to implement a custom reasoning and acting (ReAct) loop for a LangChain agent.
The process involves an iterative loop where the message history is sent to a large language model (LLM).
During each iteration, the LLM processes the input and returns either a final answer or a "tool call"—a request to execute a specific function.
If a tool call is detected, the loop extracts the target tool's name and arguments, executes the corresponding Python function, and captures the result as an "observation."
Both the LLM's decision (the thought/action) and the tool's result (the observation) are then appended to the message history. This updated history is fed back into the LLM in the next iteration, allowing the model to contextually build upon previous actions. This cycle continues until the LLM returns an answer without requesting any further tool calls, terminating the loop.
The tutorial also showcases how to debug this process and use LangSmith to trace the agent's execution flow, detailing the inputs, outputs, and token usage for each step.
The video concludes by noting that this implementation relies heavily on convenient LangChain abstractions and previews a future lesson on building the same logic entirely from scratch to highlight the problems LangChain solves.
Summary:
The video begins by reviewing the implementation of an agent loop using LangChain objects. It highlights the convenience and flexibility of LangChain, demonstrating how easily a model can be switched from Ollama to OpenAI simply by changing a string in the code. However, a practical demonstration shows that this ease of switching is not sufficient for production applications. When a different model (like GPT-5.2) is used, it may fail to provide a satisfactory answer, illustrating that a model must be capable and well-suited for the specific use case. The tutorial emphasizes the crucial importance of evaluating and benchmarking models before deploying them in an agent run. Finally, the video previews the next lesson, which will involve building the same agent loop from scratch without LangChain abstractions to highlight the extensive "heavy lifting" the framework provides for developers
Summary
The video demonstrates the process of implementing raw Agent Loop with function calling using the Ollama Python SDK by removing LangChain dependencies. Without LangChain's @tool decorator, which automatically converts Python functions into large language model (LLM) readable formats, the developer must manually define tools using a specific JSON schema. This schema requires explicit definitions of the function's name, description, parameters, and data types.
While Ollama's Python SDK can natively convert Python functions into tools, doing so requires strict adherence to Google-style docstrings—a requirement that is found in the source code but is poorly documented in Ollama's official documentation.
The broader challenge highlighted in the tutorial is that different LLM providers require different JSON schema structures. For example, Anthropic’s tool-calling schema differs significantly from Ollama’s. Manually writing and switching between these distinct, vendor-specific JSON schemas is time-consuming and increases development costs. Ultimately, this exercise illustrates the primary benefit of LangChain's tool abstraction, as it automatically parses type hints and descriptions to generate the correct, vendor-specific JSON schema out of the box.
Summary
The tutorial demonstrates how to replace LangChain with the raw Ollama Python SDK within a ReAct agent loop. The process involves manually handling tasks that LangChain previously abstracted away. First, an auxiliary function is created to interact with the raw ollama.chat client, and it is wrapped with a LangSmith @traceable decorator to ensure execution visibility.
Next, the tutorial updates how tools and messages are formatted. A manual dictionary is created to map tool names (strings) to their corresponding Python functions. Message formatting is updated from LangChain’s specific objects (like HumanMessage) to raw dictionaries specifying the role (e.g., user, system) and content, adhering to Ollama's specific naming conventions.
Finally, the agent loop logic is modified to parse tool calls directly from the Ollama response object. Because the raw Ollama response has a different structure than LangChain's standardized AI message objects—such as missing a tool call ID and nesting function names and arguments differently—the parsing logic must be updated. After executing the selected Python function, the result (observation) is appended to the message history as a raw dictionary with the role tool to continue the ReAct loop. The script is then executed, confirming that the agent successfully interacts with the tools and tracks properly in LangSmith.
Summary
The tutorial demonstrates how to replace LangChain with the raw Ollama Python SDK within a ReAct agent loop. The process involves manually handling tasks that LangChain previously abstracted away. First, an auxiliary function is created to interact with the raw ollama.chat client, and it is wrapped with a LangSmith @traceable decorator to ensure execution visibility.
Next, the tutorial updates how tools and messages are formatted. A manual dictionary is created to map tool names (strings) to their corresponding Python functions. Message formatting is updated from LangChain’s specific objects (like HumanMessage) to raw dictionaries specifying the role (e.g., user, system) and content, adhering to Ollama's specific naming conventions.
Finally, the agent loop logic is modified to parse tool calls directly from the Ollama response object. Because the raw Ollama response has a different structure than LangChain's standardized AI message objects—such as missing a tool call ID and nesting function names and arguments differently—the parsing logic must be updated. After executing the selected Python function, the result (observation) is appended to the message history as a raw dictionary with the role tool to continue the ReAct loop. The script is then executed, confirming that the agent successfully interacts with the tools and tracks properly in LangSmith.
Summary:
The tutorial introduces the ReAct prompt, highlighting it as a crucial foundational element in AI engineering that allows Large Language Models (LLMs) to function as reasoning engines.
This specific prompt, originally uploaded by LangChain co-founder Harrison Chase (hwchase17/react), powered the first LangChain agents. The video provides a walkthrough of finding the prompt on the LangChain Hub and breaks down its structure.
The prompt works by guiding the LLM through a specific format:
"Question," "Thought," "Action," "Action Input," and "Observation."
It includes placeholders to inject tool descriptions and names, enabling the model to decide which tools to use and how to apply them.
A key feature discussed is the "agent scratchpad," which acts as the agent's memory, storing the history of chosen tools, the reasoning behind those choices, and the resulting observations.
This continuous updating of context helps the agent iterate and focus on subsequent steps. Utilizing techniques like few-shot prompting and chain of thought, this prompt serves as the basis for the agent's execution loop, which will be implemented in a subsequent lesson without the use of function calling.
Summary:
The video demonstrates how to modify an existing Python script to transition an LLM agent from using a structured function-calling API to relying on a "raw ReAct prompt." The goal is to force the LLM to output its reasoning and tool selections as plain text, which will later need to be parsed.
To achieve this, the code imports the re module for future text parsing and the inspect module to programmatically extract information from Python functions. The previous JSON schema defining the tools is replaced with a simple dictionary mapping tool names to the actual functions.
A crucial new function, get_tool_descriptions, is introduced. This function dynamically generates a formatted string detailing each available tool by extracting its signature (arguments and return types) and docstring using the inspect module. This comprehensive string, along with a list of tool names, is then injected into an f-string representing the ReAct prompt. This prompt closely mirrors the original LangChain ReAct prompt, instructing the LLM on how to reason through steps and format its output.
Finally, the function responsible for calling the LLM API is updated; the parameter previously used to pass the tool schema is removed, ensuring the LLM relies entirely on the provided text prompt for instructions.
Summary
The tutorial demonstrates how to use a ReAct (Reasoning and Acting) prompt as the core reasoning engine for an AI agent loop, specifically showcasing how agents functioned before modern function-calling capabilities existed. Using the LangSmith playground, a prompt is set up with specific tools, tool descriptions, and a user question (asking for the price of a laptop after a discount).
When initially run through an older model like GPT-3.5 Turbo, the language model correctly identifies the tool to use and the input to provide. However, it hallucinates an "Observation" (inventing a price) because the actual code tool has not been executed yet. To fix this, a "stop sequence" (\nObservation) is introduced into the model's configuration. This instructs the language model to pause token generation immediately after outputting the tool and input, preventing hallucinations.
We show that this resulting string must then be parsed using regular expressions to extract the tool name and input so the application can execute the actual code. Once the loop finishes, the model generates a "Final Answer," which must also be parsed. The tutorial notes that early frameworks like LangChain handled this complex regular expression parsing and error handling internally.
Finally, the we prove that this foundational prompting technique works consistently across various models, successfully testing it on GPT-3.5 Turbo, GPT-4o, and newer GPT-4.5 models.
Summary
The video tutorial explains how to manually construct a reasoning process for an LLM agent using a Python script and Ollama, without relying on built-in tool-calling features. The instructor highlights removing the tools argument from the chat API call, emphasizing that the LLM's behavior is entirely driven by the provided prompt rather than an innate understanding of being an agent.
The core of this approach involves crafting a specific prompt structure—often referred to as a ReAct pattern—that instructs the model to format its output with explicit "Thought," "Action," and "Action Input" steps.
To manage the ongoing loop, the script combines these rules with the user's question and an iterative "scratchpad" into one large string, sending it as a single user message. A critical component of this manual setup is implementing a stop sequence (\nObservation:).
This specific sequence halts the LLM's text generation exactly before it attempts to provide an observation, preventing it from hallucinating tool results. This pause allows the local Python script to execute the actual tool (like calculating a price), retrieve real data, and inject that factual result back into the prompt for the LLM's next iteration.
Summary:
This tutorial demonstrates how to build an AI agent loop from scratch when native function calling is unavailable, relying instead on raw text generation and regular expressions.
The process involves parsing the language model's output to extract a "Final Answer" or, alternatively, an "Action" (tool name) and "Action Input" (tool arguments).
The tutorial highlights the fragility of this method, as it depends on the model strictly adhering to formatting instructions, and shows how to implement error handling for missing actions or hallucinations of non-existent tools.
Furthermore, it explains the necessity of parsing raw string inputs into callable arguments, including type casting (e.g., converting a string to a float). Finally, the video covers updating the agent's scratchpad with its own reasoning and the resulting observations from tool executions, allowing it to maintain context across iterations.
The manual implementation underscores the value of frameworks like LangChain, which handle these complex and error-prone parsing tasks under the hood.
Summary
This video provides a theoretical introduction to "tool calling," a capability in modern Large Language Models (LLMs) that is used interchangeably with the term "function calling."
Core Concept: What is Tool Calling?
Definition: It is the model's ability to produce a structured, machine-readable output (typically JSON) that specifies a function to call and the arguments to use.
Purpose: Instead of just generating plain text, the model can interact directly with external systems like APIs, databases, or other functions.
Standard Feature: While not all LLMs support it, tool calling is now a standard feature for most state-of-the-art models from major vendors like OpenAI, Google, and Anthropic.
The Process: How It Works
Define Tools: The developer provides the model with a list of available tools (functions), including their names, descriptions, and required parameters (schema).
User Request: The user sends a prompt to the model (e.g., "What's the weather in Paris?").
Model Decides: The model analyzes the request and determines that it needs to use an external tool to get the answer.
Function Call Output: The model outputs a structured JSON object containing the name of the function to call (get_weather) and the necessary arguments ("location": "Paris").
Execute & Respond: The application code parses this JSON, executes the actual get_weather function, gets the result (e.g., 14°C), and sends it back to the model.
Final Answer: The model uses this new information to generate a final, natural language response to the user (e.g., "It's currently 14°C in Paris.").
Key Advantages of Tool Calling
Structured & Reliable Integration:
Provides a reliable, schema-strict JSON output that is easy to parse, reducing errors compared to parsing plain text.
Enables robust integration with external tools and APIs.
Efficiency and Cost Savings:
Reduces token usage and latency because the model can directly output the function call without a lengthy chain-of-thought explanation.
Structured Output Generation:
Beyond calling tools, this feature can be used to force the LLM to return information in a structured, organized JSON format, which is useful for data extraction.
Improved on ReAct: Tool calling was developed as a more reliable and deterministic alternative to older methods like the ReAct prompt, which was often unreliable and difficult to parse correctly.
Main Drawback
Opaque Reasoning (Black Box): The model’s internal thought process for choosing a specific function is hidden from the developer. This can make it difficult to debug or understand why a model made a particular decision, as the intermediate "chain of thought" is not exposed.
What is Language Modeling?
Language modeling is the task of predicting the next word in a sentence.
It is similar to autocomplete or word suggestions we see in our day-to-day life.
The language model predicts the probability of the next word based on the previous words in the sentence.
Formal Definition of Language Modeling
Language modeling involves computing the probability distribution of the next word in a sequence of words.
The probability of the next word (x t+1) is calculated based on the sequence of words before it (X1, X2, ..., XT) and needs to be a part of the vocabulary (V).
Large Language Models: A Brief Overview
A large language model (LLM) is a language model trained on a huge amount of data.
LLMs are capable of predicting the probability of the next word with high accuracy.
They have gained immense popularity in recent times due to their ability to perform a variety of language-related tasks.
How Large Language Models Work
LLMs work by taking an input of words and predicting the probability of the next word.
They make their predictions based on the input provided and the probabilities learned during the training phase.
LLMs can sometimes generate output that is far-fetched from reality and simply not true due to the limitations of probability-based predictions.
What is a Prompt in AI Language Models?
A prompt is the input given to an AI model to produce an output.
It guides the model to understand the context and generate a meaningful response.
Components of a Prompt:
Instruction
The heart of the prompt that tells the AI model what task it needs to perform.
It sets the stage for the model's response, whether it's text summary, translation, or classification.
Context
Additional information that helps the AI model understand the task and generate more accurate responses.
For some tasks, context may not be necessary, but for others, it can significantly improve the model's performance.
Input Data
The information that the AI model will process to complete the task set in the prompt.
It could be a piece of text, image, or anything relevant to the task.
Output Indicator
Signals the AI model that we expect a response.
Sometimes implicit in the instruction, but sometimes explicitly stated.
Here are the key points we'll cover:
Large language models and their immense knowledge base
What is zero shot prompting?
An example of a zero shot prompt
Why zero shot prompts are popular among AI beginners
The limitations of zero shot prompting
With zero shot prompting, AI models can generate outputs for tasks they haven't been explicitly trained on, using their pre-existing knowledge to perform the task based on the information provided in the prompt. However, this kind of prompt comes with its own set of limitations, such as accuracy and scope.
In this video, we will explore the concept of Few Shot Prompting, a technique used in prompt engineering that allows AI models to generate or classify new data by presenting them with a small number of examples or shots of a particular task or concept along with a prompt or instruction. Here are the main points we will cover:
What is Few Shot Prompting?
Few Shot Prompting is a prompt engineering technique that involves presenting the AI model with a small number of examples or shots of a task or concept to generate or classify new data that is similar to the examples provided. It is particularly useful in scenarios where there is limited data available for a given task or domain where data may be scarce.
How Does Few Shot Prompting Work?
Few Shot Prompting works by providing the AI model with a few examples of a particular task or concept and a prompt or instruction on how to generate or classify new data similar to the examples provided. It can quickly adapt models to new tasks and domains by fine-tuning existing models without requiring a large amount of new data.
Case Study: Zero Shot, One Shot, and Few Shot Prompting in Action
We will demonstrate the effectiveness of zero shot, one shot, and few shot prompting techniques in generating text-to-text descriptions for Blue Willow, an open source AI tool that generates images from text prompts. By comparing the outputs generated by each technique, we will see which one performed better according to our task of generating a good description to paint a picture.
Introduction to Chain of Thought
Explanation of Chain of Thought's purpose in improving LLM reasoning abilities
How Chain of Thought allows models to decompose complex problems into manageable steps
Standard Prompting Limitations
Examples of insufficient answers with standard zero-shot prompting
Explanation of zero-shot prompting
Chain of Thought Prompting
Explanation of Chain of Thought as a new prompting technique
Examples of Chain of Thought's success in solving complex reasoning problems
Comparison to human problem-solving methods
Zero-Shot and Few-Shot Chain of Thought Prompting
Explanation of zero-shot Chain of Thought prompting
Explanation of few-shot Chain of Thought prompting
Benefits and limitations of each method
In this video, we will explore the ReAct Prompting technique, a powerful approach to prompt engineering that combines reasoning and acting to accomplish complex tasks. Here are the main points we will cover:
What is ReAct Prompting?
ReAct Prompting is a technique that allows language models to reason and act upon a task to generate an output.
It is based on the chain of thoughts that the model can generate to accomplish a task.
How Does ReAct Prompting Work?
ReAct Prompting involves breaking down a task into multiple steps, reasoning the steps, acting upon them, and then completing the entire task.
The model can derive an action by accessing external sources or APIs, allowing it to accomplish more complex tasks.
Case Study: ReAct Prompting in Action
We will look at a research paper that demonstrates the power of ReAct Prompting in action.
The paper shows how a language model was able to derive the correct answer to a complex question by reasoning and acting upon it.
How to Produce Prompts for Better LLM Responses
Incorporating Context for Coherent and Accurate Responses
Context provides relevance for generating better responses
Leaving context to the AI model could lead to off-topic or irrelevant responses
Defining a Clear Task for Unambiguous Results
Clear and precise task definitions provide better results
Ambiguous tasks can lead to confusion and lower performance
Example: Improving user experience of an e-commerce website
Iterating for Optimized Prompts
Iterations involve refining prompts and evaluating output
Refining prompts over iterations leads to optimized prompts
Taking the time to write effective prompts saves time in the long run
By following these low-hanging fruit tips for prompt engineering, you can improve the performance of your AI models and get better responses. Incorporating context, defining clear tasks, and iterating to refine prompts will lead to more accurate and relevant responses, saving you time and improving the overall performance of your AI models.
Summary
Intro to Context Engineering: It's defined as the art and science of filling an LLM's context window with the right information at each step of a task. It's the natural evolution of prompt engineering.
Prompt vs. Context Engineering: While prompt engineering uses static prompts, context engineering deals with dynamic information from multiple sources like user input, tool calls, history, and developer instructions.
Importance for AI Agents: Effective AI agents (especially coding agents like Claude Code and Cursor) are more than just "wrappers" around LLMs. Their performance heavily depends on sophisticated context engineering to provide the right information at the right time.
The "Garbage In, Garbage Out" Problem: The quality of an AI agent's output is directly tied to the quality of its context. Providing irrelevant, incorrect, or poorly structured context degrades performance.
Challenges in Long-Running Tasks: As agents perform complex, multi-step tasks, their context window grows. This can lead to:
Exceeding Context Limits: The context becomes too large for the model.
Performance Degradation: The agent's effectiveness decreases due to noise.
Increased Cost & Latency: Larger contexts are more expensive and slower to process.
Specific Context-Related Failures:
Context Poisoning: An early error or hallucination pollutes the entire subsequent process.
Context Confusion: Irrelevant information distracts the model.
Context Clash: Contradictory information within the context confuses the model.
A Skill for Everyone: Context engineering is crucial not only for developers building AI applications but also for users who interact with them. Users can get much better results by understanding how to provide better context.
Summary
This tutorial explains the critical role of system prompts in context engineering for large language models (LLMs). It moves beyond the generic advice to "write a good system prompt" by providing concrete examples and best practices.
First, the video highlights a valuable GitHub repository called "system-prompts-and-models-of-ai-tools," which contains leaked system prompts from various state-of-the-art AI agents like Claude Code, Cursor, and Devin. These examples demonstrate that top-tier AI tools use detailed and lengthy prompts, often hundreds of lines long, to guide the model's behavior effectively.
Next, the tutorial introduces the concept of "calibrating the system prompt" to find the "Goldilocks zone"—a balance between being too specific and too vague. A prompt that is too specific treats the LLM like a deterministic machine with hardcoded logic, making it rigid and unable to handle unexpected situations. Conversely, a prompt that is too vague provides insufficient guidance, leading to inconsistent and unreliable outputs.
The ideal "just right" prompt empowers the model rather than constricting it. It does this by:
Defining a clear identity and scope: Establishes boundaries for the agent's role (e.g., customer support, not sales).
Providing a reasoning framework: Offers high-level principles and guidelines instead of a rigid flowchart, allowing the model to apply general rules to specific situations.
Establishing clear boundaries and heuristics: Uses compressed, efficient language to guide decision-making (e.g., "choose the simplest solution") without wasteful or repetitive instructions.
By following these principles, developers can craft superior system prompts that leverage the LLM's strengths in pattern recognition and reasoning, leading to more robust and adaptable AI agents.
Summary
The video discusses the taxonomy of AI agents, specifically distinguishing between "Shallow Agents" (typically based on the ReAct architecture) and "Deep Agents."
Shallow Agents (ReAct)
The speaker explains that ReAct agents operate on a loop of reasoning, acting (using a tool), and observing. While effective for simple, short-term tasks, they are classified as "shallow" because they struggle with deep, complex research. The primary limitation is the context window; as the agent iterates through decisions and tool outputs, the context bloats. This leads to increased costs, slower performance, and "context rot" (hallucinations and confusion) when attempting long-running tasks.
Deep Agents
Deep Agents are defined by their ability to handle complex, long-horizon tasks that may take minutes, hours, or days to complete. They can manage interruptions, such as waiting for user input, without losing track of the objective. The speaker highlights two main categories of Deep Agents currently in production:
Deep Research Agents: Examples include Perplexity's deep research feature and the open-source GPT Researcher.
Coding Agents: Examples include Claude Code, Devin, and Cursor. These agents can write code, run tests, and debug autonomously, effectively mimicking a human software engineer.
The Role of the Application Layer
The speaker argues that current innovation is being driven by the application layer (how agents are architected) rather than exponential leaps in raw LLM capabilities. By abstracting and orchestrating LLMs correctly, developers can create machines capable of automating complex reasoning tasks.
Deep Agent Architecture
To solve the context bloat issue found in shallow agents, Deep Agents typically implement four key components:
Planning Tool: To outline the steps required for the task.
Sub-agents (Specialized Workers): To perform specific tasks in isolated contexts, allowing for concurrency without polluting the main context.
File System: To store intermediate results and shared state on disk rather than in the context window.
Massive System Prompt: To provide extensive instructions and persona definitions.
Summary
The video analyzes the architecture of "Deep Agents," a concept articulated by the LangChain team, with a specific focus on the "Planning Tool." Unlike standard Large Language Models that rely on implicit Chain of Thought reasoning, Deep Agents utilize explicit planning tools, typically structured as dynamic markdown to-do lists. These agents actively review and update their plans between execution steps, marking items as pending, in-progress, or completed. This method allows the agent to handle failures by intelligently steering the plan rather than blindly retrying tasks, as seen in earlier algorithms like ReAct.
Using Claude Code as a concrete example, the video demonstrates how the agent calls internal tools to update its task list, a process visible to the user even if the tool itself is internal. The speaker concludes that this approach mimics human problem-solving strategies, where breaking down complex projects into trackable tasks is essential for success and provides a sense of progress.
Summary
This section explores the "Sub Agents" capability within Deep Agent architecture, introducing the concept of Hierarchical Delegation. A Deep Agent can spawn specialized instances of itself—sub agents—each equipped with a unique system prompt, description, and specific set of tools tailored for focused tasks.
The video uses a real-world analogy of home repair to illustrate this concept. The narrator describes a noisy skylight that required fixing but lacked the skills and tools (like a specific ladder) to do it himself. He delegated the task to his father-in-law (the sub agent), who possessed the necessary skills (system prompt) and equipment (tools). Crucially, the narrator provided the instructions but did not oversee the actual work, receiving only the final result (a quiet roof).
This analogy maps to Context Isolation in AI. Sub agents operate within their own independent context windows. They execute their own tool-calling and ReAct loops without polluting the main agent's memory with intermediate steps. This isolation prevents context bloating and allows for parallel task execution, significantly improving efficiency and result quality. The video concludes by showing a technical example of "Claude Code" spawning an exploration agent to search for authentication patterns, demonstrating how this architecture is applied in coding environments.
Summary
This video tutorial explores the architecture of context flow when utilizing subagents in AI systems, specifically referencing Claude Code. The speaker illustrates that a main agent thread accumulates tokens with every interaction, which eventually leads to performance degradation, increased latency, higher costs, and "context pollution."
To mitigate these issues, the tutorial proposes the use of subagents. When a main agent delegates a task, it generates a specific prompt that serves as the sole context for the subagent. The subagent operates in isolation with a fresh context window, unaware of the main thread's full history. It performs complex tasks independently and returns only a single, condensed response (or artifact) to the main agent. This process effectively compresses context, keeping the main thread lean and avoiding the need for manual context management commands like /compact or /clear. This architecture allows for the use of tailored system prompts for specific tasks without clogging the primary context window.
This course contains the use of artificial intelligence :)
Please note that this is not a course for beginners. This course assumes that you have a background in software
engineering and are proficient in Python. I will be using Pycharm IDE but you can use any editor you'd like
since we only use basic feature of the IDE like debugging and running scripts .
Who this is for: Software developers, data scientists, and AI/ML engineers proficient in Python. This is not a beginner course.
Welcome to AI Agents with LangChain. This course teaches you how AI agents actually work — then you build them from scratch.
You'll go deep into agent internals: how LLMs make decisions, how function calling works, how prompts drive
agent behavior, and how to build agents with LangChain.
What you'll learn:
LLM and GenAI foundations
Prompt engineering, Context engineering
Tool calling and function calling
Agent tracing with LangSmith
Deep agents with LangGraph
Open source models
Output parsers and structured output
Everything is hands-on — real code, real projects. Uses PyCharm but any Python IDE works.
DISCLAIMERS
Please note that this is not a course for beginners. This course assumes that you have a background in software engineering and are proficient in Python.
I will be using Pycharm IDE but you can use any editor you'd like since we only use basic feature of the IDE like debugging and running scripts.