
This course includes our updated coding exercises so you can practice your skills as you learn.
See a demo
Course Objectives: Develop LLM Powered Applications with LangChain
Agents
Retrieval Augmentation Generation (RAG)
We also Cover:
LangChain Ecosystem: LangSmith, LangGraph
Prompt Engineering
Production
Target Audience:
Software Engineers
Data Scientists
Technical Product Managers
Anyone who is comftirable with Code
No AI/ML experience is needed
Prerequisites: This is NOT a beginner's course
Python knowledge
Git usage
Virtual Environments, environment variables
No AI/ML Knowledge is needed, we cover all here
Phase 1: Preparing the Codebase
Cloning the Repository: Our first step was to get the source code. We navigated to the langchain-course GitHub repository, which serves as the central hub for all the course material. We copied the repository's HTTPS URL to our clipboard. Then, using our terminal, we executed the git clone command. This downloaded a complete copy of the project, including all its files and version history, onto our local machine.
Creating an Isolated Workspace: To ensure our work wouldn't interfere with the main course content and to start with a fresh slate, we created a new, independent "orphan" branch. We did this with the command git checkout --orphan project/hello-world. The --orphan flag was a key detail, as it created a branch with no parent and no previous commit history, effectively giving us a brand new, empty project within the same repository structure.
Cleaning the Branch: To finalize our clean start, we ran git rm -rf . which removed all the cloned files from our new branch's staging area. This left us with a completely empty directory, ready for our own project files.
Phase 2: Building Our Python Environment with uv
Introducing uv: We chose to use uv, a modern, high-performance Python package manager built in Rust. The instructor explained it as a blazingly fast alternative to traditional tools like pip and venv, capable of handling dependency resolution and virtual environments seamlessly.
Initializing the Project: We kickstarted our project by running uv init. This command automatically generated a foundational project structure for us, including:
A pyproject.toml file, which is the modern standard for configuring Python projects and defining dependencies.
A main.py file with some boilerplate "Hello, World!" code.
Installing Dependencies and Creating a Virtual Environment: We then began adding our necessary libraries. The first command, uv add LangChain, was pivotal. It not only fetched and installed the core LangChain library but also automatically detected that we didn't have a virtual environment and created a .venv directory for us. This crucial step ensures that all our project's packages are kept isolated from our global Python installation, preventing conflicts.
Adding Specific and Utility Packages: We continued by adding more packages, each for a specific purpose:
langchain-openai: We installed this as a separate package. The instructor explained that LangChain has modularized its integrations, so we only need to install the packages for the specific LLM providers we plan to use.
python-dotenv: This utility is essential for managing secrets. It allows us to load environment variables from a .env file, which is a best practice for handling sensitive data like API keys.
black and isort: To maintain clean and professional code, we installed these industry-standard code formatters. black handles code styling, and isort automatically organizes our import statements.
Phase 3: Securely Configuring API Keys and Ignoring Files
Managing Secrets with a .env File: We created a .env file at the root of our project. This file is specifically designed to hold our API keys and other sensitive credentials.
Generating the OpenAI API Key: We navigated to the OpenAI Platform website, went to the API Keys section, and generated a new secret key. The instructor issued a strong warning about the importance of keeping API keys private, treating them like passwords, and never committing them to public repositories.
Populating the .env File: We pasted our newly generated key into the .env file, assigning it to the variable OPENAI_API_KEY. The specific variable name is important, as the LangChain library looks for this exact name by default.
Using .gitignore: To prevent our secrets and unnecessary files from being tracked by Git, we created a .gitignore file. We copied a standard, comprehensive Python .gitignore template from a public GitHub repository and pasted it into our file. This ensures that files like our .env and the entire .venv directory are ignored by Git.
Phase 4: Verification and Finalization
Testing the Setup: To make sure everything was working together, we wrote a small test script in main.py. We imported the load_dotenv function and the os module. By calling load_dotenv(), we loaded the variables from our .env file into our session's environment. We then printed the value of os.environ.get("OPENAI_API_KEY").
Confirmation: When we ran the script, it successfully printed our API key to the terminal. This confirmed that our virtual environment was active, our packages were installed correctly, and our .env file was being loaded properly.
Committing Our Work: With the setup complete and verified, we cleaned up the test code, formatted our files with black and isort, and then committed all our new project files to our local branch with the descriptive commit message "environment setup". Finally, we pushed this new branch to the remote GitHub repository, making our work available there.
1. Setting the Stage: Preparing Our Code
We started in our main.py file, where we already had the load_dotenv() function set up. This ensures that our API keys from the .env file are loaded and available as environment variables for our application to use.
2. Understanding and Using Prompt Templates
What is a Prompt? We first established that a "prompt" is simply the text input we provide to a Large Language Model (LLM). The LLM processes this text and generates an output.
Introducing PromptTemplate: We imported PromptTemplate from langchain_core.prompts. We learned that this is a key LangChain abstraction that allows us to create reusable and dynamic prompts. Instead of hardcoding the entire input, a PromptTemplate lets us define a template with placeholders (parameters).
Dynamic and Reusable Prompts: We saw a practical example of a prompt template like: "I want you to write a cool, funny jingle for a {product} product."
This template allows us to programmatically insert different values for the {product} parameter.
We could run this once with product = "cat food", then again with product = "sports shoes", and a third time with product = "piano", getting a unique, context-specific output from the LLM each time without rewriting the entire prompt.
Essentially, PromptTemplate helps us format our inputs into the final string that gets sent to the LLM.
3. Interacting with LLMs via Chat Models
Importing ChatOpenAI: We then imported ChatOpenAI from langchain_openai. This class is a specific implementation of a "Chat Model."
The Role of Chat Models: We learned that Chat Models are the primary, standardized interface in LangChain for interacting with modern, conversational LLMs (like GPT-4, Claude 3, and Gemini).
We contrasted this with older LLMs that simply took a single string of text in and returned a single string out.
Modern chat models are more sophisticated; they are designed to handle conversational context and work best when given a list of structured messages (e.g., a System Message for instructions, a Human Message for user input, and an AI Message for the model's previous responses). The model then returns a new AI Message.
Exploring the Source Code: As a best practice for developers, we learned to look directly at the framework's source code to better understand its functionality. By holding Cmd (or Ctrl) and clicking on PromptTemplate and ChatOpenAI in our IDE, we were able to navigate to their class definitions and see their documentation and implementation details directly.
4. The Core Concept: LangChain Chains
Defining a Chain: We concluded by introducing the concept of a LangChain Chain. A chain is a sequence of components linked together to form a complete workflow. The defining characteristic of a chain is that the output of one step becomes the input of the next.
Building Complex Workflows: This "chaining" concept allows us to build powerful applications that go far beyond a single prompt-and-response interaction. We mapped out a hypothetical advanced workflow:
User Query: The process starts with input from a user.
Prompt Template: The query is formatted into a structured prompt.
Language Model: The formatted prompt is sent to an LLM to generate a response.
Output Parser: The LLM's raw text output is parsed into a structured format (like JSON).
External API / Tool Call: The structured data is used to call an external tool or API (e.g., a search engine).
Final LLM Call: The result from the API call is fed into another LLM to process it and generate the final, polished output for the user.
Summary:
Objective: We built our first LangChain chain to summarize a piece of text about Elon Musk.
Gathering Information: We started by copying a block of text about Elon Musk from Wikipedia and storing it in a Python variable called information.
Creating a Prompt Template: We defined a summary_template string that instructs the LLM what to do with the provided information. This template included a placeholder {information} to make it dynamic.
Initializing the PromptTemplate: We created an instance of the PromptTemplate class, passing it our summary_template and specifying that information is the input variable it should expect.
Initializing the ChatOpenAI Model: We created an instance of the ChatOpenAI class, which serves as the interface to the OpenAI API. We configured it to use the gpt-5 model and set the temperature to 0 for more deterministic and factual responses.
Building the Chain with LCEL: We used the LangChain Expression Language (LCEL) to create our chain. We "piped" (|) the summary_prompt_template into the llm (our chat model). This creates a runnable object where the formatted prompt from the first component becomes the input for the second component.
Invoking the Chain: We executed the chain using the .invoke() method, passing a dictionary with the information variable. This triggered the entire workflow:
The information text was inserted into our template.
The resulting prompt was sent to the OpenAI API.
The LLM generated a summary and two interesting facts based on our instructions.
Displaying the Result: Finally, we printed the response.content to the terminal, which displayed the structured output generated by our LangChain chain.
Summary:
Objective: We debugged our newly created LangChain chain to understand how data flows between its components and to inspect the objects involved in the process.
Setting Up the Debugger:
We set a breakpoint in our main.py file right after the chain.invoke() call.
We ran the Python script in debug mode, which paused the execution at our breakpoint.
Inspecting the Response Object:
We examined the response variable in the debugger. We discovered that its type is not a simple string, but rather an AIMessage object.
This AIMessage object is a special LangChain class that acts as a wrapper for the LLM's output.
Analyzing the AIMessage Object:
Content: We found that the main text generated by the LLM is stored within the .content attribute of the AIMessage object.
Response Metadata: We explored the response_metadata attribute, which contains valuable diagnostic information, such as:
The exact model name used (gpt-5-2025-08-07).
The token usage (e.g., input tokens, completion tokens, total tokens).
The finish_reason (e.g., stop), which tells us why the model stopped generating text.
This metadata is crucial for debugging, monitoring costs, and analyzing the performance of our LLM calls.
Conclusion: By stepping through the code with a debugger, we gained a much deeper understanding of how the PromptTemplate and ChatOpenAI model work together within the chain and what kind of structured data is passed between them.
Summary:
Objective: We learned how to switch from a proprietary, cloud-based LLM (like OpenAI's GPT-5) to a locally hosted, open-source model using Ollama and LangChain.
Installing and Setting Up Ollama:
We navigated to ollama.com and downloaded the application for our operating system.
After installing, we used the terminal to download a specific open-source model, gemma3:270m, using the command ollama pull gemma3:270m. This is a lightweight model suitable for running on a local machine.
We confirmed the model was downloaded successfully by running ollama list.
Running a Local Model:
We ran the command ollama run gemma3:270m in the terminal, which started an interactive chat session with the model, proving it was working correctly.
Integrating Ollama with LangChain:
Install the Package: We installed the necessary integration package with uv add langchain-ollama.
Modify the Code: In our main.py script, we made two simple changes:
We imported ChatOllama from the langchain_ollama library.
We replaced the ChatOpenAI instance with a ChatOllama instance, specifying the model="gemma3:270m". The rest of our chain's logic remained exactly the same.
Debugging and Observing the Difference:
We ran the script in debug mode to compare the output of the local model with the previous output from GPT-5.
We observed that while the local model was much faster, the quality of its response was lower—it provided a summary but failed to generate the two separate interesting facts as instructed.
Key Takeaway:
The tutorial highlighted a core strength of LangChain: its model-agnostic design. We can easily switch between different LLM providers (proprietary or open-source, local or cloud-based) by changing just a single line of code, without altering the rest of our application's workflow.
This demonstrates the trade-off: local models offer speed and cost savings, while larger, proprietary models often provide higher quality and better instruction-following capabilities.
Introduction to LangSmith: The tutorial begins by introducing LangSmith as a platform for tracing and observing LangChain applications to debug and monitor performance.
Account Setup: The speaker demonstrates how to sign up for a LangSmith account, highlighting the availability of a free "Developer" tier suitable for getting started.
Configuration: Key steps to enable tracing are covered:
Generating a new API key from the LangSmith dashboard.
Setting up the necessary environment variables in a .env file:
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=<your-generated-key>
LANGSMITH_PROJECT=<your-project-name> (e.g., "hello-world")
Automatic Tracing: Once the environment is configured, running the LangChain application automatically sends detailed execution traces to the specified project in LangSmith.
Analyzing Traces: The LangSmith dashboard provides a comprehensive view of each run:
A "waterfall" view breaks down the execution flow of a RunnableSequence into individual steps like PromptTemplate and the LLM call (ChatOllama or ChatOpenAI).
Detailed metrics for each step are available, including latency, time-to-first-token, token count, and status (success/failure).
Comparing LLM Performance: The tutorial shows how to switch between different LLMs (a local ChatOllama model and OpenAI's gpt-5) and use the traces to compare their performance, such as the significant difference in latency (1.4s vs. 16s).
Code and Resources: The final code is committed to a GitHub repository, and the speaker notes that public links to the demonstrated traces will be shared as resources.
The course compatible with V0.3.0
Summary
1. What is an AI Agent?
An AI Agent is a software system that uses a Large Language Model (LLM) as its "reasoning engine."
Its primary function is to decide which actions to take to accomplish a goal and then execute those actions.
2. The Key Difference: Agent vs. Chain
Chain: The sequence of actions (the control flow) is hard-coded and defined by the developer. An LLM might be used as a step in the chain, but it does not decide what happens next.
Agent: The control flow is dynamic and determined by the LLM itself. The agent uses reasoning to decide which tool or step is needed next to solve the problem.
3. The Power of Tools
The core concept of modern agents is equipping an LLM with a "toolkit."
Tools are pre-defined functions that give the agent abilities, such as:
Searching the web
Making an API call
Querying a database
Writing and executing code
This allows LLMs to go beyond text generation and interact with external systems, effectively giving them "superpowers" to automate complex tasks.
4. ReAct Agent Architecture
ReAct is a foundational agent architecture that stands for Reasoning and Acting.
It combines the LLM's ability to reason about a problem (often using Chain-of-Thought prompting) with its ability to act by using tools.
The process works in an iterative loop:
The agent thinks about the task.
It decides which action to take (which tool to use).
It uses the tool and gets an observation (the result).
It feeds the observation back into its thought process and repeats the cycle until the task is complete.
Summary
Project Goal: This section will focus on building a search agent using LangChain. This is an AI agent equipped with web searching capabilities.
ChatGPT Demo: The video demonstrates ChatGPT's built-in web search feature as a practical example.
The Query: A query is made to "search for 3 job postings for an AI engineer using LangChain in the Bay Area on LinkedIn and list their details."
Agent in Action: ChatGPT's agent first understands that it needs to search the web to answer the query. It performs the search, specifically targeting LinkedIn.
Grounding and Trust: The agent returns a list of job postings and, importantly, includes source links for each finding. This process is called "grounding" and is crucial for building user trust by allowing them to verify the information.
The Problem LLMs Solve: Standard Large Language Models (LLMs) are "text-in, text-out" and have static knowledge based on their training data. They lack real-time information.
The Solution (Tools): By giving an LLM "tools" (like web search), it can access up-to-date, external information, overcoming its inherent limitations.
Course Context: The video highlights that built-in search in chat applications is now standard but was a new concept when the course was first created, showing how fast the field is evolving. The upcoming section will teach how to build this functionality from scratch.
Summary
This video outlines the evolution of ReAct agents within the LangChain framework, from their inception to the latest version, and explains the structure of the upcoming course.
Initial Stage: LangChain ReAct Agent
Introduced in November 2022.
Relied on "ReAct Prompting," where the language model reasoned through actions and observations in a text-based format.
Second Stage: Tool Calling Agent
Evolved with the advent of native function-calling capabilities in large language models (LLMs).
Switched from prompt-based tool selection to structured function calling, making tool execution more reliable and efficient.
Third Stage: LangGraph ReAct Agent
Continued using function calling but was rebuilt on top of LangGraph.
LangGraph's low-level orchestration provided durable execution, persistence, and the fine-grained control necessary for production-grade applications.
Current Stage (v1.0): LangChain create_agent()
Introduced in LangChain v1.0.
Offers a clean, high-level interface for creating agents.
It is powered by the battle-tested LangGraph ReAct agent under the hood, combining simplicity with robustness.
Course Structure:
The course will start with the latest create_agent() function to quickly get users started.
It will then progressively work backward to the original ReAct implementation, explaining each evolutionary step.
The goal is to provide a deep understanding of not just how to use the agents, but how they work internally, what each iteration improved, and the core concepts like tool calling and LangGraph.
Summary
This video walks through setting up the environment for a Python project that uses LangChain and Tavily to create a search-enabled AI agent.
Project Setup:
The code for this section is available in the project/search-agent Git branch.
The project is initialized using the uv init command.
Installing Dependencies:
The following packages are installed using uv add:
langchain and langchain-openai (covered in previous videos).
langchain-tavily: A new LangChain integration package for the Tavily service.
tavily-python: The official Python SDK for Tavily.
python-dotenv: To load environment variables.
black and isort: For code formatting.
Introducing Tavily:
Tavily is a third-party service that provides a search engine for AI agents, allowing them to connect to the web.
It offers APIs for search, crawling, and extracting content.
It's a popular choice for web search integration and is featured in the official LangChain documentation.
Tavily provides a generous free tier of 1,000 API requests per month.
Environment Configuration (.env file):
The .env file should contain the following variables:
OPENAI_API_KEY
LANGSMITH_TRACING=true
LANGSMITH_API_KEY
LANGSMITH_PROJECT: Set to "Search Agent" for this section.
TAVILY_API_KEY: A new key obtained from the Tavily website. LangChain automatically looks for this specific variable name to authenticate with Tavily.
Initial Code (main.py):
The first step in the main.py file is to import and call load_dotenv to make the environment variables available to the application
Summary
This video provides an overview of setting up an AI agent using LangChain, focusing on the fundamental components: tools and a large language model (LLM).
Core Components of a LangChain Agent:
LLM (Large Language Model): The reasoning engine or the "brain" of the agent. In this example, ChatOpenAI is used.
Toolkit: A collection of tools that the agent can use to perform actions.
Creating a Tool:
A tool in LangChain is essentially a Python function that the agent can execute.
To create a custom tool, you define a regular Python function with type hints and a docstring.
The docstring is crucial as it provides a description of the tool's purpose, its arguments, and what it returns. This information is used by the LLM to decide when and how to use the tool.
The function is then decorated with the @tool decorator from langchain.tools to convert it into a LangChain tool.
Example: A Search Tool:
A simple search function is created that takes a query string and returns a static string: "Tokyo weather is sunny".
A docstring is added to describe the function as a tool that "searches over the internet."
Building the Agent:
Import necessary modules: create_agent, tool, HumanMessage, and ChatOpenAI.
Instantiate the LLM: An instance of ChatOpenAI is created.
Define the tools: The search function (now a tool) is placed into a list.
Create the agent: The create_agent function is called, passing the LLM and the list of tools as arguments.
Invoking the Agent:
The created agent is a "runnable," so it can be executed using the .invoke() method.
The input to the agent is a dictionary with a "messages" key. The value is a list of message objects, typically starting with a HumanMessage containing the user's query.
LangChain automatically handles casting a single HumanMessage into a list if one is not provided.
Summary
This video explains how to build and execute a basic LangChain agent by defining its core components: the LLM and its tools. It also demonstrates how to debug the agent's execution flow.
Required Imports:
create_agent from langchain.agents to build the agent.
tool decorator from langchain.tools to define a custom tool.
HumanMessage from langchain_core.messages to represent the user's input.
ChatOpenAI from langchain_openai to serve as the agent's LLM (reasoning engine).
Understanding LangChain Tools:
A tool is a function that an agent can execute to perform an action (e.g., call an API, search a database, run code).
To create a tool, define a Python function with clear type hints and a descriptive docstring that explains its purpose, arguments, and return value. The LLM uses this metadata to decide when to call the tool.
Use the @tool decorator to transform the Python function into a LangChain tool.
Building the Agent:
Define a Tool: A search function is created with a @tool decorator. It's a placeholder that prints the query and returns a static string "Tokyo weather is sunny".
Instantiate the LLM: An instance of ChatOpenAI is created.
Assemble Tools: The search tool is put into a list.
Create the Agent: The create_agent function is called, passing the LLM and the list of tools.
Executing and Debugging the Agent:
Run the Code: The agent is invoked with the query "What is the weather in Tokyo?". The search function is called, and its print statement confirms the LLM correctly extracted the query "weather in Tokyo".
Analyze the Trace in LangSmith:
The execution is orchestrated by LangGraph internally.
Step 1 (First LLM Call): The agent sends the user's query and the available tool's description to the LLM. The LLM's output is not a direct answer but a "function call" instruction, telling the agent to execute the search tool with the query "weather in Tokyo."
Step 2 (Tool Execution): The LangChain runtime executes the search tool. The output "Tokyo weather is sunny" is captured as a ToolMessage.
Step 3 (Second LLM Call): The agent sends the entire history (original query, LLM's decision to call the tool, and the tool's result) back to the LLM.
Step 4 (Final Answer): With all the necessary information, the LLM generates the final, user-facing answer: "The weather in Tokyo is currently sunny."
ReAct Agent Architecture Recap: This process demonstrates the core ReAct (Reason and Act) architecture. The LLM thinks and decides on an action (tool call). The tool is executed, and the result is returned as an observation. The LLM then uses this observation to decide the next step, which in this case is to finish by providing the final answer
Summary
Adding Real-World Search: The speaker integrates the Tavily API to give the agent real-world internet search capabilities.
Initial Implementation (Custom Tool):
He imports TavilyClient and initializes it. The client automatically uses the TAVILY_API_KEY from the environment variables.
The custom search function is modified to call tavily.search(query=query) instead of returning a static string.
Analyzing with LangSmith:
A simple weather query demonstrates that the tool now returns live data from the web.
A more complex query for job postings shows the agent making multiple, parallel tool calls with refined search terms (e.g., "LangChain Bay Area," "LangChain San Francisco").
Best Practice (Using Built-in Integrations):
The speaker explains that creating custom wrappers for popular services is not ideal, as the official integrations provided by vendors are better maintained and more feature-rich.
He installs the langchain-tavily package.
He replaces the custom search function and TavilyClient with the pre-built TavilySearch tool imported directly from langchain_tavily.
Benefits of Integration: The LangSmith trace shows that by using the official tool, the LLM can leverage advanced parameters like include_domains and search_depth, leading to more accurate and specific tool calls
Summary
Problem: By default, Large Language Models (LLMs) return unstructured text, which is hard to use programmatically in downstream applications (e.g., serializing, parsing, or rendering in a UI).
Solution: Enforce a structured output format, such as a Pydantic object, to ensure the LLM's response is predictable and easy to work with.
LangChain Implementation: The create_agent function in LangChain has a response_format argument. By passing a Pydantic class to this argument, you can define the exact schema the agent's output should follow.
Example in Video:
A Source Pydantic class is created to hold a source URL.
A nested AgentResponse Pydantic class is created, which includes an answer (string) and a sources (a list of Source objects).
This AgentResponse class is then passed to the response_format parameter when creating the agent.
Result: The agent's output dictionary now includes a structured_response key. The value is a populated Pydantic object matching the defined AgentResponse schema, containing both the text answer and a list of source URLs.
Underlying Mechanism: The video mentions that this feature works "like magic" by leveraging the LLM's function-calling capabilities under the hood.
Summary:
Structured output in LangChain allows agents to return data in a predictable, predefined format, such as Pydantic objects, JSON, or data classes, making it easier to use downstream in applications. There are two main implementation strategies for this:
ProviderStrategy: Used by default if the model natively supports structured output through its own API. This approach offloads the task of formatting the data to the model provider (e.g., OpenAI, Anthropic), which is generally more reliable.
ToolStrategy: Used as a workaround for models that do not natively support structured output but do support tool calling. LangChain forces the model to use a specific "tool" that matches the desired schema, effectively enforcing the structured format.
Both strategies support various schema types, including Pydantic models, data classes, typed dicts, and JSON schemas.
Summary
In this tutorial introduction, the instructor outlines a plan to demystify AI agents by stripping away the abstraction layers provided by frameworks like LangChain.
The lesson plan moves from "Layer 0," where high-level abstractions hide the internal logic, to manually implementing the agent loop using LangChain primitives. Following this, the tutorial will demonstrate a "raw" implementation without frameworks—writing JSON schemas manually—to highlight the value of LangChain's unified interface for model flexibility.
Finally, the course will cover the foundational "ReAct" method, building an agent from scratch using regular expressions instead of function calling. The instructor emphasizes hands-on coding using provided GitHub resources and notes that the tutorial will utilize tools such as Ollama and open-weight models like Qwen.
Summary
In this video, we introduce the practical project for this section: building an AI agent for an e-commerce hardware store that sells items like laptops and headphones.
Our goal is to calculate final prices based on specific discount tiers (Bronze, Silver, and Gold). We outline the creation of an agent that utilizes two specific tools—one to retrieve the base product price and another to determine the discount percentage based on the customer's tier.
While we acknowledge that a high-level LangChain abstraction like create_agent could handle this automatically, we are choosing to manually implement a lean version of the agent loop to better understand the underlying logic, as illustrated by our workflow diagram.
Summary
The tutorial demonstrates the initial setup for building a custom agent loop using LangChain. It begins by creating a new Python script and importing essential tools, including environment variable loaders, message types, and the init_chat_model utility, which allows for easy switching between different language model providers (like OpenAI, Anthropic, or Ollama) using just a string identifier. A specific local model, "qwen3:1.7b," is initialized for the demonstration.
Next, two custom functions are created and decorated with LangChain's @tool to make them usable by the language model. The first tool, get_product_price, simulates retrieving a price from an e-commerce catalog based on a product name. The second tool, apply_discount, calculates a final price after applying a specific discount tier (bronze, silver, or gold). Emphasis is placed on writing clear docstrings and type hints for these functions, as the @tool decorator parses this metadata and formats it correctly for the language model to understand the available tools. Finally, a run_agent function is outlined to contain the raw logic of the agent loop, and LangSmith's @traceable decorator is added to this function to monitor execution, allowing developers to track token usage, runtime, and other metrics in the LangSmith dashboard
Summary:
This video tutorial focuses on implementing the "run agent" function for a LangChain ReAct (Reasoning and Acting) loop. The process begins by creating a dictionary mapping tool names to their corresponding Python functions, which enables the agent to dynamically execute the correct tool based on the LLM's output. The LLM is then initialized using LangChain's init_chat_model, a flexible abstraction that allows developers to easily switch between providers like Ollama or OpenAI by simply changing a string.
The tutorial then demonstrates how to bind the list of tools to the initialized LLM using the bind_tools function. This crucial step ensures that the model is aware of the available tools and their descriptions, allowing models that support function calling to return tool execution requests instead of standard text responses.
Next, the video covers the creation of the agent's "brain" through prompt engineering. The prompt is structured as a list containing a system message and a human message. The system message defines the agent's persona ("helpful shopping assistant") and incorporates "defensive prompting"—a set of strict rules designed to guide the model's behavior and prevent hallucinations. These rules explicitly instruct the agent to never guess prices, to execute specific tools in a particular order (e.g., getting the price before applying a discount), and to never perform math itself, relying instead on the provided tools. Finally, the user's input question is appended as a human message, completing the prompt setup for the ReAct loop.
Summary:
This video demonstrates how to implement a custom reasoning and acting (ReAct) loop for a LangChain agent.
The process involves an iterative loop where the message history is sent to a large language model (LLM).
During each iteration, the LLM processes the input and returns either a final answer or a "tool call"—a request to execute a specific function.
If a tool call is detected, the loop extracts the target tool's name and arguments, executes the corresponding Python function, and captures the result as an "observation."
Both the LLM's decision (the thought/action) and the tool's result (the observation) are then appended to the message history. This updated history is fed back into the LLM in the next iteration, allowing the model to contextually build upon previous actions. This cycle continues until the LLM returns an answer without requesting any further tool calls, terminating the loop.
The tutorial also showcases how to debug this process and use LangSmith to trace the agent's execution flow, detailing the inputs, outputs, and token usage for each step.
The video concludes by noting that this implementation relies heavily on convenient LangChain abstractions and previews a future lesson on building the same logic entirely from scratch to highlight the problems LangChain solves.
Summary:
The video begins by reviewing the implementation of an agent loop using LangChain objects. It highlights the convenience and flexibility of LangChain, demonstrating how easily a model can be switched from Ollama to OpenAI simply by changing a string in the code. However, a practical demonstration shows that this ease of switching is not sufficient for production applications. When a different model (like GPT-5.2) is used, it may fail to provide a satisfactory answer, illustrating that a model must be capable and well-suited for the specific use case. The tutorial emphasizes the crucial importance of evaluating and benchmarking models before deploying them in an agent run. Finally, the video previews the next lesson, which will involve building the same agent loop from scratch without LangChain abstractions to highlight the extensive "heavy lifting" the framework provides for developers
Summary
The video demonstrates the process of implementing raw Agent Loop with function calling using the Ollama Python SDK by removing LangChain dependencies. Without LangChain's @tool decorator, which automatically converts Python functions into large language model (LLM) readable formats, the developer must manually define tools using a specific JSON schema. This schema requires explicit definitions of the function's name, description, parameters, and data types.
While Ollama's Python SDK can natively convert Python functions into tools, doing so requires strict adherence to Google-style docstrings—a requirement that is found in the source code but is poorly documented in Ollama's official documentation.
The broader challenge highlighted in the tutorial is that different LLM providers require different JSON schema structures. For example, Anthropic’s tool-calling schema differs significantly from Ollama’s. Manually writing and switching between these distinct, vendor-specific JSON schemas is time-consuming and increases development costs. Ultimately, this exercise illustrates the primary benefit of LangChain's tool abstraction, as it automatically parses type hints and descriptions to generate the correct, vendor-specific JSON schema out of the box.
Summary
The tutorial demonstrates how to replace LangChain with the raw Ollama Python SDK within a ReAct agent loop. The process involves manually handling tasks that LangChain previously abstracted away. First, an auxiliary function is created to interact with the raw ollama.chat client, and it is wrapped with a LangSmith @traceable decorator to ensure execution visibility.
Next, the tutorial updates how tools and messages are formatted. A manual dictionary is created to map tool names (strings) to their corresponding Python functions. Message formatting is updated from LangChain’s specific objects (like HumanMessage) to raw dictionaries specifying the role (e.g., user, system) and content, adhering to Ollama's specific naming conventions.
Finally, the agent loop logic is modified to parse tool calls directly from the Ollama response object. Because the raw Ollama response has a different structure than LangChain's standardized AI message objects—such as missing a tool call ID and nesting function names and arguments differently—the parsing logic must be updated. After executing the selected Python function, the result (observation) is appended to the message history as a raw dictionary with the role tool to continue the ReAct loop. The script is then executed, confirming that the agent successfully interacts with the tools and tracks properly in LangSmith.
Summary
The tutorial demonstrates how to replace LangChain with the raw Ollama Python SDK within a ReAct agent loop. The process involves manually handling tasks that LangChain previously abstracted away. First, an auxiliary function is created to interact with the raw ollama.chat client, and it is wrapped with a LangSmith @traceable decorator to ensure execution visibility.
Next, the tutorial updates how tools and messages are formatted. A manual dictionary is created to map tool names (strings) to their corresponding Python functions. Message formatting is updated from LangChain’s specific objects (like HumanMessage) to raw dictionaries specifying the role (e.g., user, system) and content, adhering to Ollama's specific naming conventions.
Finally, the agent loop logic is modified to parse tool calls directly from the Ollama response object. Because the raw Ollama response has a different structure than LangChain's standardized AI message objects—such as missing a tool call ID and nesting function names and arguments differently—the parsing logic must be updated. After executing the selected Python function, the result (observation) is appended to the message history as a raw dictionary with the role tool to continue the ReAct loop. The script is then executed, confirming that the agent successfully interacts with the tools and tracks properly in LangSmith.
Summary:
The tutorial introduces the ReAct prompt, highlighting it as a crucial foundational element in AI engineering that allows Large Language Models (LLMs) to function as reasoning engines.
This specific prompt, originally uploaded by LangChain co-founder Harrison Chase (hwchase17/react), powered the first LangChain agents. The video provides a walkthrough of finding the prompt on the LangChain Hub and breaks down its structure.
The prompt works by guiding the LLM through a specific format:
"Question," "Thought," "Action," "Action Input," and "Observation."
It includes placeholders to inject tool descriptions and names, enabling the model to decide which tools to use and how to apply them.
A key feature discussed is the "agent scratchpad," which acts as the agent's memory, storing the history of chosen tools, the reasoning behind those choices, and the resulting observations.
This continuous updating of context helps the agent iterate and focus on subsequent steps. Utilizing techniques like few-shot prompting and chain of thought, this prompt serves as the basis for the agent's execution loop, which will be implemented in a subsequent lesson without the use of function calling.
Summary:
The video demonstrates how to modify an existing Python script to transition an LLM agent from using a structured function-calling API to relying on a "raw ReAct prompt." The goal is to force the LLM to output its reasoning and tool selections as plain text, which will later need to be parsed.
To achieve this, the code imports the re module for future text parsing and the inspect module to programmatically extract information from Python functions. The previous JSON schema defining the tools is replaced with a simple dictionary mapping tool names to the actual functions.
A crucial new function, get_tool_descriptions, is introduced. This function dynamically generates a formatted string detailing each available tool by extracting its signature (arguments and return types) and docstring using the inspect module. This comprehensive string, along with a list of tool names, is then injected into an f-string representing the ReAct prompt. This prompt closely mirrors the original LangChain ReAct prompt, instructing the LLM on how to reason through steps and format its output.
Finally, the function responsible for calling the LLM API is updated; the parameter previously used to pass the tool schema is removed, ensuring the LLM relies entirely on the provided text prompt for instructions.
Summary
The tutorial demonstrates how to use a ReAct (Reasoning and Acting) prompt as the core reasoning engine for an AI agent loop, specifically showcasing how agents functioned before modern function-calling capabilities existed. Using the LangSmith playground, a prompt is set up with specific tools, tool descriptions, and a user question (asking for the price of a laptop after a discount).
When initially run through an older model like GPT-3.5 Turbo, the language model correctly identifies the tool to use and the input to provide. However, it hallucinates an "Observation" (inventing a price) because the actual code tool has not been executed yet. To fix this, a "stop sequence" (\nObservation) is introduced into the model's configuration. This instructs the language model to pause token generation immediately after outputting the tool and input, preventing hallucinations.
We show that this resulting string must then be parsed using regular expressions to extract the tool name and input so the application can execute the actual code. Once the loop finishes, the model generates a "Final Answer," which must also be parsed. The tutorial notes that early frameworks like LangChain handled this complex regular expression parsing and error handling internally.
Finally, the we prove that this foundational prompting technique works consistently across various models, successfully testing it on GPT-3.5 Turbo, GPT-4o, and newer GPT-4.5 models.
Summary
The video tutorial explains how to manually construct a reasoning process for an LLM agent using a Python script and Ollama, without relying on built-in tool-calling features. The instructor highlights removing the tools argument from the chat API call, emphasizing that the LLM's behavior is entirely driven by the provided prompt rather than an innate understanding of being an agent.
The core of this approach involves crafting a specific prompt structure—often referred to as a ReAct pattern—that instructs the model to format its output with explicit "Thought," "Action," and "Action Input" steps.
To manage the ongoing loop, the script combines these rules with the user's question and an iterative "scratchpad" into one large string, sending it as a single user message. A critical component of this manual setup is implementing a stop sequence (\nObservation:).
This specific sequence halts the LLM's text generation exactly before it attempts to provide an observation, preventing it from hallucinating tool results. This pause allows the local Python script to execute the actual tool (like calculating a price), retrieve real data, and inject that factual result back into the prompt for the LLM's next iteration.
Summary:
This tutorial demonstrates how to build an AI agent loop from scratch when native function calling is unavailable, relying instead on raw text generation and regular expressions.
The process involves parsing the language model's output to extract a "Final Answer" or, alternatively, an "Action" (tool name) and "Action Input" (tool arguments).
The tutorial highlights the fragility of this method, as it depends on the model strictly adhering to formatting instructions, and shows how to implement error handling for missing actions or hallucinations of non-existent tools.
Furthermore, it explains the necessity of parsing raw string inputs into callable arguments, including type casting (e.g., converting a string to a float). Finally, the video covers updating the agent's scratchpad with its own reasoning and the resulting observations from tool executions, allowing it to maintain context across iterations.
The manual implementation underscores the value of frameworks like LangChain, which handle these complex and error-prone parsing tasks under the hood.
Summary
This video provides a theoretical introduction to "tool calling," a capability in modern Large Language Models (LLMs) that is used interchangeably with the term "function calling."
Core Concept: What is Tool Calling?
Definition: It is the model's ability to produce a structured, machine-readable output (typically JSON) that specifies a function to call and the arguments to use.
Purpose: Instead of just generating plain text, the model can interact directly with external systems like APIs, databases, or other functions.
Standard Feature: While not all LLMs support it, tool calling is now a standard feature for most state-of-the-art models from major vendors like OpenAI, Google, and Anthropic.
The Process: How It Works
Define Tools: The developer provides the model with a list of available tools (functions), including their names, descriptions, and required parameters (schema).
User Request: The user sends a prompt to the model (e.g., "What's the weather in Paris?").
Model Decides: The model analyzes the request and determines that it needs to use an external tool to get the answer.
Function Call Output: The model outputs a structured JSON object containing the name of the function to call (get_weather) and the necessary arguments ("location": "Paris").
Execute & Respond: The application code parses this JSON, executes the actual get_weather function, gets the result (e.g., 14°C), and sends it back to the model.
Final Answer: The model uses this new information to generate a final, natural language response to the user (e.g., "It's currently 14°C in Paris.").
Key Advantages of Tool Calling
Structured & Reliable Integration:
Provides a reliable, schema-strict JSON output that is easy to parse, reducing errors compared to parsing plain text.
Enables robust integration with external tools and APIs.
Efficiency and Cost Savings:
Reduces token usage and latency because the model can directly output the function call without a lengthy chain-of-thought explanation.
Structured Output Generation:
Beyond calling tools, this feature can be used to force the LLM to return information in a structured, organized JSON format, which is useful for data extraction.
Improved on ReAct: Tool calling was developed as a more reliable and deterministic alternative to older methods like the ReAct prompt, which was often unreliable and difficult to parse correctly.
Main Drawback
Opaque Reasoning (Black Box): The model’s internal thought process for choosing a specific function is hidden from the developer. This can make it difficult to debug or understand why a model made a particular decision, as the intermediate "chain of thought" is not exposed.
Summary
The Problem: Large Language Models (LLMs) cannot effectively answer questions about information they weren't trained on, such as very large documents (e.g., a full book like Harry Potter) or private, proprietary data (e.g., a financial report).
Naive Solution (Solution 1): A simple but flawed approach is to "stuff" the entire document into the LLM's prompt along with the user's question.
Problems with the Naive Solution: This method is not scalable due to several issues:
Hard Token Limit: Most LLMs have a maximum context window size that a large document will exceed.
"Needle in the Haystack" Problem: LLM performance degrades when trying to find specific information within a very long prompt.
Cost: Processing larger prompts is more expensive.
Latency: It takes longer for the LLM to generate a response from a very long prompt.
A Better Approach (Solution 2 - RAG): The core idea of Retrieval Augmentation Generation is a multi-step process:
Chunking: Pre-process the large document by splitting it into smaller, manageable chunks.
Retrieval: When a user asks a question, a mechanism searches for and retrieves only the most relevant chunk(s) of text related to the query.
Augmentation & Generation: The retrieved, relevant chunk(s) are then added to the prompt (augmenting it) and sent to the LLM to generate a focused and accurate answer.
Benefits of RAG: This technique solves the issues of the naive approach by reducing cost and latency, staying within token limits, and improving the accuracy of the LLM's responses.
Intro
In this video, we introduce some new topics in the field of developing and empowering GEN AI applications. We will discuss the use of vector databases and embeddings, and how they can be used to process large amounts of data efficiently.
LangChain DocumentLoader
We start by discussing document loaders, which are used to help us work with data in the form of documents. Document loaders make it easy to connect to third-party applications and retrieve data from them. By loading data into documents, we can work with it more easily and send it to an NLP model for processing.
LangChain TextSplitters
We also discuss the use of text splitters, which are used to split long pieces of text into smaller, more manageable chunks. This is useful when dealing with large amounts of data and helps to avoid token limits. We will show examples of how to implement text splitters in the coming videos.
Intro to Embeddings:
We introduce embeddings, which are a classic technique in NLP used to create a higher dimensional space in which words, sentences, and other objects can be represented by vectors. Embeddings are useful for calculating the distance between vectors in the space, which can help to identify semantic relationships between objects.
Intro to Vector Databases/ VectorStores:
Finally, we discuss vector databases, which are used to persist embeddings and make it easy to search for and retrieve them. By storing embeddings in a vector database like Pinecone, we can efficiently search for relevant vectors and retrieve them for use in our LLMs.
Introduction
In this video, we will review the classes and objects that we will use for this session. We will explain each class related to the theoretical particle reviewed earlier, including Document Loaders, Text Splitters, OpenAI Embeddings, Pinecone, and VectorDBQA Chain.
Document Loaders
Document Loaders are classes that implement how to load and process data to make it digestible by the language model. We will be using the class of TextLoader, which helps us send data to LLMS.
This abstraction allows us to attach other things to the text, such as processing WhatsApp messages, downloading PDF files, or working with a Notion notebook.
TextSplitter
TextSplitter us with long pieces of text.
Large text files often contain more tokens than the modeling forces allow, causing the request to fail.
TextSplitter solves this problem by splitting the text into smaller chunks, which can be sent to the language model. TextSplitter has a lot of logic, including splitting strategies and calculating appropriate chunk size. The chunk size may change according to what we want to accomplish, depending on the different weighting systems. We can specify the chunk size and overlap between the chunks to ensure that the text isn't sped up in a way that disturbs the context or meaning.
OpenAIEmbeddings
OpenAIEmbeddings is a black box that takes inputs as text and outputs vectors in an embedded Vector space.
We can use OpenAI Embeddings' API to embed a text, and we will get back a vector. There are many embedding providers available, but OpenAI Embeddings is one of the best. The Embedding abstraction It creates a uniform interface for us to access different embeddings from different providers. It is straightforward to switch between different providers by just changing a parameter.
Pinecone
Pinecone is an excellent Vector database that has recently gone viral. It provides a persistent storage solution for the vectors we receive from embedding like OpenAI Embeddings.
Pinecone allows us to search in the vector space for the closest vectors of the current one. We can add new vectors to the vector space and retrieve relevant documents from the vector database.
In this video we will write a python script script that demonstrates how to use various libraries and tools to create an LLM powered assistant that can answer questions based on the content of a given text document.
This is the ingestion part.
Here are the main steps that the code follows:
Import necessary libraries including LangChain modules:
TextLoader, CharacterTextSplitter, OpenAIEmbeddings, PineconeVectorStore, and OpenAI.
Initialize Pinecone by providing an API key and environment.
Load a text document from a file using TextLoader.
Split the document into smaller chunks using CharacterTextSplitter.
Use OpenAIEmbeddings to create embeddings for each text chunk.
Index the embeddings in Pinecone using the PineconeVectorStore.from_documents method.
Overall, this code demonstrates how to leverage several powerful libraries and tools to build a simple but effective search engine that can provide answers to natural language queries.
Summary
The tutorial demonstrates how to implement a Retrieval-Augmented Generation (RAG) pipeline in Python without using LangChain Chains.
First, the necessary components are initialized in a main.py file: environment variables are loaded, the OpenAI embedding model and LLM are set up, and a PineconeVector store is initialized using an existing index.
A retriever object is created from the vector store with specific search arguments to fetch the top three relevant documents.
The video highlights the importance of RAG by comparing a raw LLM query, which produces a hallucinated answer about Pinecone being a generic algorithm, against a RAG-enabled query, which correctly identifies Pinecone as a vector database. To achieve this, a format_docs helper function is created to combine retrieved document contents into a single string.
The core of the tutorial focuses on a manual implementation function named retrieval_chain_without_lcel. This function performs the RAG process step-by-step:
Retrieval: It invokes the retriever with the user's query to get relevant LangChain documents.
Formatting: It formats these documents into a context string.
Prompting: It injects the context and the user's query into a prompt template that restricts the AI to answer based only on the provided context.
Generation: It sends the formatted message to the LLM and returns the response content.
Finally, the execution is analyzed using the debugger and LangSmith traces. The trace analysis reveals that while the individual components (retriever and LLM) work correctly, they appear as separate, disconnected traces rather than a cohesive chain. This demonstrates the limitations of the manual approach, such as lack of unified tracing, streaming support, and composability, setting the stage for a future implementation using LangChain Chains.
Summary
This tutorial transitions the Retrieval-Augmented Generation (RAG) implementation from a basic function-based approach to using the LangChain Expression Language (LCEL). This method offers better observability, composability, and built-in support for streaming and batching.
The RAG chain is constructed by piping the prompt template, the LLM, and the string output parser. A key component introduced is RunnablePassthrough, specifically its .assign() method. This allows the chain to maintain the original input (the user's question) while simultaneously calculating a new "context" key. The context is generated by a sub-chain that utilizes itemgetter to extract the question, passes it to the retriever, and formats the documents using a standard Python function, which LangChain automatically converts into a RunnableLambda.
The video demonstrates that while the output remains identical to the naive implementation, the observability is significantly improved. By using LangSmith, the execution trace shows a clear, step-by-step breakdown of the pipeline, including inputs, outputs, and latency for every component in the runnable sequence.
Summary
This video provides a critical review of the LangChain documentation regarding Retrieval Augmented Generation (RAG). While acknowledging the strength of the LangChain ecosystem, the tutorial points out issues with the documentation updates following version 1.0, specifically the removal of valuable content and the recommended approaches for building RAG applications.
The critique focuses on two specific implementations found in the documentation:
The Agentic RAG Approach: The documentation suggests using a "React agent" equipped with a similarity search tool. The tutorial argues this method is flawed for production because it grants the LLM too much autonomy, leading to potential security risks (jailbreaking) and answering irrelevant questions. Furthermore, it introduces latency and higher costs due to redundant tool calling and multiple inference steps.
The Two-Step Chain Approach: While the documentation offers a faster, single-pass alternative, the implementation relies on injecting retrieval functionality via middleware into an agent loop without tools. This method is criticized for being overly abstract and lacking transparency, making it difficult to maintain and potentially unstable across package updates.
The video concludes by recommending a more deterministic approach where developers maintain explicit control over the retrieval and generation steps. It specifically praises the "Custom RAG Agent with LangGraph" tutorial found in the documentation, noting that this robust, research-based architecture will be covered in depth later in the course.
Cloning the Repository
First, we'll clone the repository by running the command
"git clone https://github.com/emarco177/documentation-helper.git -b 1-start-here"
in the terminal. This will clone the repository and start us off on the beginning branch.
Creating the Pinecone Index
Once we have all the files downloaded, we'll move on to creating a Pinecone index.
We'll log in to our user and create an index that will store the embeddings of the documentation of LangChain. We'll search for embeddings of OpenAI to determine the dimension of the vectors that are returned, which is 1536.
Summary:
Introduction: The video outlines the necessary imports and main class initializations for the ingestion phase of a RAG (Retrieval-Augmented Generation) pipeline.
Environment Setup: It covers setting up environment variables for API keys (OpenAI, Pinecone, LangChain, Tavily) and configuring LangSmith for tracing. It also explains the setup of SSL context using certifi to handle HTTP requests securely, with a note on disabling VPNs on corporate networks to avoid certificate errors.
Logging: A pre-prepared logger.py file is introduced to provide colored and readable logs for tracking the ingestion process, with functions like log_info, log_success, and log_error.
Core Imports: The video details the essential libraries and classes to be imported:
LangChain Text Splitters: RecursiveCharacterTextSplitter is imported for breaking down large documents into smaller chunks.
Vector Stores: It shows how to import and use either a local vector store like Chroma or a cloud-based one like PineconeVectorStore. The presenter opts for Pinecone for the demonstration.
Embeddings: OpenAIEmbeddings is used to convert text documents into vector representations. It also mentions that open-source alternatives can be used.
Data Loading: TavilyCrawl, TavilyExtract, and TavilyMap are imported for web scraping and data extraction.
LangChain Core: The Document class is imported, which is the fundamental data structure in LangChain for handling text with associated metadata.
Class Initialization: The video walks through initializing the main classes:
OpenAIEmbeddings: Initialized with the text-embedding-3-small model. It explains how to manage rate limiting using chunk_size and retry_min_seconds to avoid API errors when processing large batches of documents.
PineconeVectorStore: The presenter demonstrates creating a new index in the Pinecone UI and then initializes the PineconeVectorStore class in the code, linking it to the newly created index and the embedding function.
Tavily Tools: Initializes TavilyCrawl, TavilyExtract, and TavilyMap for use in the pipeline.
Sanity Check: The video concludes by running the script to ensure all imports and initializations are working correctly without any errors.
Summary:
The video shows how to use the TavilyCrawl tool to automatically crawl and extract documentation from the LangChain website.
Crawling is an automated process of navigating a website by following hyperlinks to gather related content, which is a vital capability for AI agents.
The demonstration begins with setting up boilerplate code using asyncio to run the main crawling function.
The video then details how to use the TavilyCrawl tool by invoking it with the URL of the LangChain documentation (https://python.langchain.com/).
It explains and demonstrates how to use the max_depth parameter to control how deep the crawler explores from the base URL and the extract_depth parameter to retrieve more detailed content like tables and embedded media.
A key feature highlighted is the ability to provide natural language instructions to the crawler, which allows for more precise and relevant content extraction by filtering pages based on specific topics.
The results of the crawling process are shown in debug mode, detailing the structure of the output which includes the URL and raw content of each crawled page.
Finally, the video explains how to convert the crawled results into LangChain document objects, including metadata such as the source URL, in preparation for the next steps of the RAG pipeline like splitting and embedding.
Summary: Extracting LangChain Documentation at Scale
This video demonstrates a robust, scalable pipeline for extracting the entire LangChain documentation. The process uses the Tavily library for sitemap discovery and content extraction, combined with Python's asyncio for high-speed, concurrent processing.
Key Concepts & Steps:
Objective: To map and scrape all pages of the LangChain documentation (python.langchain.com) and convert them into structured LangChain Document objects.
Built for Scale: The solution is designed to be highly efficient by processing hundreds of URLs concurrently, reducing a task that could take hours to just a few minutes.
Step 1: Website Discovery (Mapping URLs)
The pipeline begins with a discovery phase using a tool called TavilyMap. This tool is pointed at the main documentation URL.
It then automatically explores the website's sitemap to generate a complete list of all individual documentation pages.
In the video's example, this step successfully identifies over 500 unique URLs to be processed.
Step 2: URL Batching for Efficient API Calls
To process the URLs efficiently and avoid overwhelming the server with hundreds of individual requests, the pipeline employs a batching strategy.
The full list of over 500 URLs is broken down into smaller, manageable chunks (e.g., batches of 20 URLs each).
This is a critical optimization because the content extraction API (TavilyExtract) is designed to handle multiple URLs in a single call, which is much faster than processing them one by one.
Step 3: Concurrent Content Extraction
This is the core of the high-speed pipeline, where two levels of parallelism are used.
First, each API call already processes a batch of URLs. Second, all of these batch-processing tasks are executed concurrently using Python's asyncio library.
Instead of waiting for one batch of URLs to finish before starting the next, the program sends out requests for multiple batches at the same time. This asynchronous approach ensures that the program is always busy fetching data, dramatically accelerating the overall extraction process.
Step 4: Structuring Data and Handling Results
As the concurrent tasks complete, the raw HTML content from each page is processed and structured. The system is designed to be resilient, capable of handling any failed requests or exceptions without halting the entire pipeline.
For each successfully scraped page, the raw text is formatted into a standard LangChain Document object.
This object contains the main page_content and, critically, metadata that includes the original source URL. Storing the source URL is a best practice for traceability and citation in RAG systems.
The final output is a clean, flattened list of these Document objects, ready for the next stage of the RAG pipeline.
Summary:
RAG Pipeline Overview: The video focuses on the next steps in the RAG (Retrieval-Augmented Generation) pipeline: chunking documents, embedding the chunks into vectors, and storing them in a vector store.
Chunking (Splitting) Phase:
Tool: The RecursiveCharacterTextSplitter from LangChain is used to break down the large, crawled documents into smaller, semantically coherent chunks.
Configuration: The splitter is configured with a chunk_size of 4000 characters and a chunk_overlap of 200 characters to maintain context between chunks.
Why RAG is Still Relevant: The presenter discusses that despite large context windows in modern LLMs (like 1M or 2M tokens), RAG is not obsolete. It remains crucial for:
Cost-Efficiency: Retrieving small, relevant chunks is cheaper and faster than feeding massive documents to an LLM.
Precision and Noise Reduction: RAG filters out irrelevant information, reducing hallucinations and overcoming the "lost in the middle" problem of long contexts.
Traceability: It allows users to see the source of the information, which builds trust, especially in regulated fields.
Vector Storage Phase:
Asynchronous Indexing: To optimize performance, a new asynchronous function index_documents_async is created to handle embedding and indexing concurrently.
Batching: The split documents are divided into batches (e.g., 500 documents per batch) to manage API rate limits from both the embedding model (e.g., OpenAI) and the vector store (e.g., Pinecone).
Concurrent Processing: The asyncio.gather function is used to run the indexing for all batches in parallel, significantly speeding up the overall ingestion process.
Error Handling: The add_batch coroutine includes a try-except block to catch and log any errors that occur during the indexing of a specific batch, ensuring the pipeline can report failures without crashing.
Final Logging and Execution:
After the indexing is complete, the script logs a summary of the entire ingestion pipeline, including the number of URLs mapped, documents extracted, and chunks created and stored.
The video concludes by running the complete ingestion script and showing the process in the terminal, including the final successful completion message.
Summary:
This tutorial demonstrates the implementation of a backend Retrieval Augmented Generation (RAG) system using Python, LangChain, and Pinecone. The process begins by setting up the necessary package structure and importing essential libraries, including os, dotenv for environment variables, and specific LangChain components like create_agent and init_chat_model for flexible model initialization (e.g., using GPT-5.2 via OpenAI).
Pinecone is utilized as the vector store, initialized with a specific index name and an embeddings model (specifically text-embedding-3-small) to convert text into vectors. A custom tool function, retrieve_context, is then created using the @tool decorator. A key feature of this tool is the response_format="content_and_artifact" setting. This allows the tool to return two values: a serialized string of content and sources for the LLM, and a raw list of document objects (artifacts) for the application to display to the user.
The tutorial creates a wrapper function, run_llm, which constructs a LangChain agent equipped with the retrieval tool and a specific system prompt instructing the agent to cite sources. The function manages the message history, invokes the agent, and extracts the final text answer. Crucially, the logic iterates through the message history to locate the ToolMessage containing the artifacts (the source documents) and separates them from the content sent to the model. This setup ensures that the user receives both the AI-generated answer and the specific documents used to ground that answer, fostering trust in the system.
Summary:
This tutorial demonstrates how to debug and trace a Retrieval Augmented Generation (RAG) agent using LangChain and LangSmith. The process begins by inspecting the execution results, specifically the answer and the retrieved context documents, followed by a debugging session to visualize the internal message flow.
Key concepts covered include:
Response Structure: The agent's response object contains a list of messages: the initial HumanMessage, the ToolCall (where the model decides to query the vector store), and the resulting ToolMessage.
Tool Messages (Content vs. Artifact): A crucial distinction is made within the ToolMessage. It returns two values:
Content: A serialized string containing source information and page content, which is sent back to the LLM to generate the answer.
Artifact: The raw list of document objects. These are not sent to the LLM (keeping the context clean) but are retained for use by the application or frontend, such as rendering sources.
Tracing with LangSmith: The tutorial utilizes LangSmith to visualize the trace, showing how the model rephrases the user query (e.g., changing "what are deep agents" to "LangChain deep agents definition") and executes the tool.
Retrieval Implementation: The video explains the preference for using vectorstore.as_retriever() over vectorstore.similarity_search(). The former is shown to render more effectively in LangSmith traces.
The session concludes by committing the updated code to the 2-retrieval-qa-finish branch on GitHub.
Introduction:
In this video, we will go over the steps to create a new file called main.py and use Streamlit to build an elegant and simple user interface.
Creating a Streamlit Runner
We will create a Streamlit Runner to run the Streamlit CLI with our main.py file. We will select Python as our language, name the runner "Streamlit Runner", and include the path to the Streamlit CLI.
Creating a User Prompt
We will then create a text input for the user to input their prompt. The prompt will be stored in a variable called "prompt".
Generating a Response
We will then generate a response to the user's prompt by running the run_llm function. The response will be stored in a variable called "generated_response".
Updating the Session State
We will then update the session state to include the user's prompt and the response from the AI. This will allow us to maintain chat history and provide context to the AI.
Displaying Chat History
If the chat history is not empty, we will display the chat history to the user. This will allow the user to see the questions they've asked and the AI's responses.
This course contains the use of artificial intelligence :)
2026- COURSE WAS RE-RECORDED and supports- LangChain Version 1.2+
**Ideal students are software developers / data scientists / AI/ML Engineers**
Welcome to the Agentic AI Engineering with LangChain and LangGraph course.
In this course you will learn how to design and build AI agents and agentic AI systems using LangChain and LangGraph, the most powerful frameworks for developing modern LLM applications.
Agentic AI Engineering focuses on building AI systems that can reason, plan, use tools, and autonomously complete tasks. With LangChain and LangGraph, you will build production-ready AI agents, RAG systems, and advanced LLM applications.
What is LangChain?
LangChain is an open-source development framework designed to simplify creating applications powered by large language models (LLMs).
Using LangChain, LangGraph, MCP, and modern LLM frameworks, you will build production-ready AI agents, multi-agent systems, and advanced RAG applications.
Please note that this is not a course for beginners. This course assumes that you have a background in software engineering and are proficient in Python. I will be using Pycharm IDE but you can use any editor you'd like since we only use basic feature of the IDE like debugging and running scripts .
You will build real-world Agentic AI systems using LangChain and LangGraph:
Search Agent
Documentation Helper – A chatbot over Python package docs (and any data you choose), using advanced retrieval and RAG.
Prompt Engineering Theory
Context Engineering Theory
Introduction to LangGraph
Model Context Protocol (MCP)
Deep Agents
Agentic AI Engineering Topics Covered:
Agentic AI Fundamentals
AI Agents
Agentic AI architectures
Multi-agent systems
AI engineering principles
LLM and Prompt Engineering
Prompt Engineering
Few-Shot Prompting
Chain of Thought
ReAct prompting
Context Engineering
Agent Frameworks
LangChain
LangGraph
Model Context Protocol (MCP)
Tool Calling
AI Agent Infrastructure
Vector databases (Pinecone, FAISS, Chroma)
Retrieval Augmented Generation (RAG)
Memory systems
LangSmith tracing
Throughout the course, you will work on hands-on exercises and real-world projects to reinforce your understanding of the concepts and techniques covered. By the end of the course, you will be proficient in using LangChain to create powerful, efficient, and versatile LLM applications for a wide array of usages.
Why This Course?
Up-to-date: Covers LangChain V.1+ and the latest LangGraph ecosystem.
Practical: Real projects, real APIs, real-world skills.
Career-boosting: Stay ahead in the LLM and GenAI job market.
Step-by-step guidance: Clear, concise, no wasted time.
Flexible: Use any Python IDE (Pycharm shown, but not required).
This course is ideal for developers who want to learn Agentic AI Engineering, AI agents with Python, and LLM application development.
You will learn how to design agent architectures, implement tool-using agents, and build scalable agentic AI systems using LangChain and LangGraph.
DISCLAIMERS
Please note that this is not a course for beginners. This course assumes that you have a background in software engineering and are proficient in Python.
I will be using Pycharm IDE but you can use any editor you'd like since we only use basic feature of the IDE like debugging and running scripts.