
Course Introduction!
Follow the prescribed learning path from basic Python to LangChain, LangGraph, and private agentic RAG for the best results. Skipping prerequisites leads to confusion and incomplete learning.
Download the code file from the GitHub repository, unzip it, open it in VS Code, and install requirements.txt with pip for the LangSmith setup and LangGraph references.
Install Ollama from ollama.com by downloading the appropriate executable for Windows, Linux, or macOS, then explore the LLM, LLM serving, and API workflow with available models and the GUI.
Explore the Ollama UI to access cloud and local models. Learn how to download models, run them on GPU, and observe the thinking and response process.
Explore how to configure Ollama settings, manage accounts, upgrade to cloud, set model location and context length, and enable internet access for real-time search.
Learn to use the Ollama UI as a RAG system to attach images and PDFs, read them with vision models like Gemma, and code workflows with LangChain or LangGraph.
Inspect the Qwen3 8-billion-parameter model in the Ollama library, examining its mixture of experts, four-bit quantization, 32 attention heads, and 36 transformer layers.
Evaluate benchmarks across tasks, from graduate-level question answering to Olympiad mathematics and LiveCodeBench, comparing 8b, 30b, 235b, and 330b parameter models, tools, and tool calling.
Benchmark and test a Qwen3 30-billion-parameter model locally with live coding, answer physics and algebra questions, compare it to GPT-5, and evaluate thinking versus non-thinking models.
Select models by task: run large models in the cloud and use embedding models like nomic-embed-text or Gemma; leverage vision models such as Gemma 3 and Llama 3.2 Vision.
Learn to use the ollama pull and ollama run commands from the terminal to download and serve models locally, switch between the UI and CLI, and manage model history.
Start ollama serve and inspect model files, then manage models with pull, list, and remove commands. Explore the UI to run Qwen and Llama models with memory optimization.
Master the Ollama cp and ps commands to copy models, name copies, and monitor running instances, including context windows and GPU usage.
Learn to create and run an Ollama model with predefined settings by loading a Modelfile, configuring temperature and context, adding a system prompt, and executing the run.
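As a rough sketch of what such a Modelfile might contain, the Python snippet below renders one as plain text. The base model, temperature, context length, and system prompt are illustrative assumptions rather than values from the course, and `render_modelfile` is a hypothetical helper.

```python
def render_modelfile(base_model: str, temperature: float, num_ctx: int, system_prompt: str) -> str:
    """Render the text of an Ollama Modelfile (hypothetical helper, not part of Ollama)."""
    lines = [
        f"FROM {base_model}",                     # base model to build on
        f"PARAMETER temperature {temperature}",   # sampling temperature
        f"PARAMETER num_ctx {num_ctx}",           # context window in tokens
        f'SYSTEM """{system_prompt}"""',          # system prompt baked into the model
    ]
    return "\n".join(lines) + "\n"


# Example values are assumptions chosen only for illustration.
modelfile_text = render_modelfile(
    "llama3.2", 0.3, 4096, "You are a concise financial analyst."
)
print(modelfile_text)
```

Saving this text to a file named Modelfile and running `ollama create my-analyst -f Modelfile` followed by `ollama run my-analyst` should then start the customized model; the name `my-analyst` is made up for this example.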
Explore Ollama message commands in message mode, view help, inspect model info and files, and compare loading a Llama 3.2 model or a saved seldon model while setting parameters.
Learn to create an Ollama model from Llama 3.2 using message commands, set temperature and context, configure a system prompt, save the model, and adjust history and verbose settings.
Explore the raw Ollama APIs to generate completions and chat completions, test streaming and non-streaming modes, and format outputs as JSON using curl and Git Bash without frameworks.
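The same kind of raw request can be sketched in Python with only the standard library. The snippet below builds a non-streaming call to Ollama's `/api/generate` endpoint on the default local port; the model name and prompt are assumptions, and `build_generate_request` and `send` are hypothetical helper names. Nothing is sent until `send` is called, so the block runs without a server.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def build_generate_request(model: str, prompt: str, stream: bool = False,
                           as_json: bool = False) -> urllib.request.Request:
    """Build a POST request for /api/generate without sending it."""
    payload = {"model": model, "prompt": prompt, "stream": stream}
    if as_json:
        payload["format"] = "json"  # ask the model to reply with JSON
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a non-streaming request and decode the reply (needs a running Ollama server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Model name and prompt are assumptions for illustration.
request = build_generate_request("llama3.2", "Why is the sky blue?", as_json=True)
print(request.full_url)
print(request.data.decode("utf-8"))
```

With a server running and the model pulled, `send(request)["response"]` should hold the generated text. Streaming mode returns one JSON object per line instead and would need a line-by-line reader.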
Learn to download a GGUF model from Hugging Face and load an uncensored WizardLM model into Ollama locally. Configure a system message for educational content exploration.
Learn to run local LLMs on Ollama, focusing on Nemotron 3 Nano (4b) and a 32b model, compare Mamba and mixture-of-experts architectures, and review the context window and Qwen3.5.
Calculate model memory by multiplying parameter counts by 0.5 bytes for 4-bit quantization, illustrated with Qwen3.5, Nemotron, and Mixtral on Ollama, along with tokenizer, metadata, and vision memory.
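The arithmetic can be sketched as below. This estimates weight memory only; tokenizer, metadata, vision components, and the KV cache add to it. The parameter counts are round illustrative numbers, and `estimate_weight_memory_gb` is a hypothetical helper.

```python
def estimate_weight_memory_gb(param_count: float, bits_per_weight: int = 4) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    bytes_per_weight = bits_per_weight / 8  # 4-bit quantization -> 0.5 bytes per parameter
    return param_count * bytes_per_weight / 1e9


# Round parameter counts chosen for illustration only.
for label, params in [("4B", 4e9), ("8B", 8e9), ("30B", 30e9)]:
    print(f"{label}: ~{estimate_weight_memory_gb(params):.1f} GB of weights at 4-bit")
```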
Compare dense, sparse mixture-of-experts, and Mamba architectures, detailing attention, KV cache, routing, and SSM state. Explain how active parameters differ and why context handling affects text generation speed.
Compare dense transformer, sparse MoE, and hybrid Mamba-transformer models (Nemotron 3 Nano, Qwen3.5 9B, Qwen3.5 35B) locally with Ollama to tackle LeetCode hard problems and assess performance and acceptance.
Compare Nemotron 3 Nano 4B and Qwen3.5 4B on LeetCode hard and medium problems, showing that 4B models struggle with hard tasks and perform better on medium tasks.
Understand that agent states are stored as a dictionary of messages (human, AI, tool, system) and managed by memory with volatile RAM and short-term and long-term memory options.
Compare the generation speed of Nemotron 3 Nano and the Qwen3.5 4-billion-parameter model using Mamba versus dense architectures, and examine per-token throughput with no KV cache.
Solve LeetCode hard problems with the Nemotron 30-billion-parameter model, benchmark generation speed and test case validation, and compare with the Qwen3.5 35-billion-parameter model.
Explore flow engineering and finite state machines through LangGraph, learning state, node, and edge concepts while managing messages and history.
Create a custom LangGraph state with a TypedDict, defining input and output text fields, and build a state graph canvas to manage nodes and edges.
Create custom nodes in LangGraph by writing a Python function and typing the input state, then transform input text to uppercase and return the updated state.
Create and connect custom nodes like add prefix and add suffix, manage state for input and output text, and execute nested nodes in LangGraph to produce the output.
Create and visualize your first LangGraph graph by building a state graph canvas, adding nodes like process_input, add_prefix, and add_suffix, and connecting edges to form a runnable chain.
Compile and invoke a LangGraph workflow with graph.invoke, passing input_text state and generating output_text automatically. See how keys flow across nodes and how simple versus nested states shape execution.
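To make the state flow concrete without requiring LangGraph to be installed, here is a plain-Python stand-in. It is not the LangGraph API: `run_linear_graph` is a hypothetical helper that imitates how each node receives the current state and returns a partial update that is merged back in. The node and key names follow the lecture descriptions; the prefix and suffix strings are assumptions.

```python
from typing import Callable, List, TypedDict


class TextState(TypedDict, total=False):
    input_text: str
    output_text: str


def process_input(state: TextState) -> TextState:
    # Transform the input text to uppercase and write it to output_text.
    return {"output_text": state["input_text"].upper()}


def add_prefix(state: TextState) -> TextState:
    return {"output_text": "PREFIX: " + state["output_text"]}


def add_suffix(state: TextState) -> TextState:
    return {"output_text": state["output_text"] + " :SUFFIX"}


def run_linear_graph(nodes: List[Callable[[TextState], TextState]],
                     state: TextState) -> TextState:
    """Run nodes in order, merging each partial update into the shared state."""
    current: TextState = dict(state)  # copy so the caller's dict is untouched
    for node in nodes:
        current.update(node(current))
    return current


result = run_linear_graph([process_input, add_prefix, add_suffix],
                          {"input_text": "hello"})
print(result["output_text"])  # PREFIX: HELLO :SUFFIX
```

In LangGraph itself the same nodes would be registered on a state graph and connected with edges before compiling and calling `invoke`; the merge step above only approximates that behavior for a simple linear chain.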
Advance your LangGraph workflows by building a SQL agent with LangGraph, exploring RAG applications, and designing private agents like a MySQL agent, after reinforcing fundamentals.
Introduce the MySQL agent with LangGraph fundamentals, outlining routing and workflows for get database schema, generate SQL query, validate SQL query, execute SQL query, and fix SQL query.
Set up a MySQL agent notebook by importing libraries, loading the employees DB, creating the DB, and connecting with a SQL database connector using LangChain tools and reasoning.
Establish a MySQL database connection, enumerate six tables, and extract the database schema to empower an agent with schema-driven data retrieval and few-shot prompts.
Implement the get_database_schema LangChain tool for MySQL to return the full schema or a specific table's schema, validating table names and providing a helpful error with available tables.
Design and implement a generate_sql_query LangChain tool for MySQL that uses a defined schema and prompts to generate only SELECT queries for read operations.
Validate a MySQL query using a LangChain tool for safety and syntax before execution. Clean and normalize the query, removing SQL code blocks with regex.
Design and test a validate_sql_query LangChain tool for MySQL that enforces only SELECT statements and blocks dangerous keywords, ensuring safe, validated queries.
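A minimal sketch of the cleaning and validation idea follows, written as plain functions rather than as a LangChain tool. The keyword list and error messages are assumptions, and the check is deliberately coarse: it would also reject a harmless query that mentions a blocked word inside a string literal.

```python
import re
from typing import Tuple

# Assumed list of write/DDL keywords to block; adjust to your own policy.
DANGEROUS_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "DROP", "ALTER",
                      "TRUNCATE", "CREATE", "GRANT", "REVOKE")


def clean_sql(raw: str) -> str:
    """Strip markdown code fences, collapse whitespace, drop a trailing semicolon."""
    text = re.sub(r"`{3}(?:sql)?", "", raw, flags=re.IGNORECASE)
    return " ".join(text.split()).rstrip(";").strip()


def validate_sql_query(raw: str) -> Tuple[bool, str]:
    """Return (True, cleaned_query) if the query looks like a single safe SELECT."""
    query = clean_sql(raw)
    if not query:
        return False, "Query is empty."
    if not query.upper().startswith("SELECT"):
        return False, "Only SELECT statements are allowed."
    if ";" in query:
        return False, "Multiple statements are not allowed."
    for keyword in DANGEROUS_KEYWORDS:
        # Word boundaries avoid false hits on names such as created_at.
        if re.search(rf"\b{keyword}\b", query, flags=re.IGNORECASE):
            return False, f"Blocked keyword: {keyword}"
    return True, query


fence = "`" * 3  # built at runtime so this example contains no literal fence
print(validate_sql_query(f"{fence}sql\nSELECT * FROM employees LIMIT 5;\n{fence}"))
print(validate_sql_query("DROP TABLE employees"))
```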
Learn to implement an execute_sql_query tool in LangChain for MySQL, validate queries before execution, handle errors, run queries, and interpret results with practical testing.
Design and test a fix_sql_error tool for SQL using LangChain, passing the original query, error message, and question, then returning a corrected SQL query that follows SQL syntax.
Create a MySQL agent by defining agent state and annotating messages, wiring tools like get_database_schema and generate_sql_query within an LLM with tools. Build the LLM with the tools.
Create an agent node with a name and variable, and craft a system prompt detailing a SQL analyst workflow: get schema, generate SQL, validate SQL, execute SQL, and retry fixes.
Create a conditional router that controls agent execution using should_continue, reads the latest agent state, and handles tool calls until a final answer is reached.
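The routing decision can be sketched as follows. Messages are modeled here as plain dictionaries so the example runs on its own; in LangGraph the last message would typically be a message object with a `tool_calls` attribute, and the returned labels would map to a tool node and the graph's end. The labels `"tools"` and `"end"` and the sample tool call are assumptions.

```python
from typing import Any, Dict


def should_continue(state: Dict[str, Any]) -> str:
    """Decide the next step from the latest message in the agent state."""
    last_message = state["messages"][-1]
    # If the model requested tool calls, run the tools; otherwise we have a final answer.
    if last_message.get("tool_calls"):
        return "tools"
    return "end"


pending = {"messages": [
    {"role": "human", "content": "How many employees are there?"},
    {"role": "ai", "content": "", "tool_calls": [{"name": "get_database_schema", "args": {}}]},
]}
finished = {"messages": [
    {"role": "ai", "content": "The query has been answered."},
]}

print(should_continue(pending))   # tools
print(should_continue(finished))  # end
```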
Create a MySQL agent with LangGraph by building a graph of agent and tool nodes, wiring edges and conditional flows, so the agent orchestrates tool calls until a final answer is reached.
Test and troubleshoot a MySQL agent by running sample queries, handling input parameters and agent state, and compare Qwen3 with GPT-OSS for SQL generation, noting that model quality affects results.
Evaluate a MySQL agent by testing complex queries, including group by and joins, to compute the average salary by department and identify top paid employees.
Explore the PageRAG sneak peek: ingest JSON data into a vector DB, extract metadata, apply reranking, and prepare documents for a private agentic RAG workflow.
Master retrieval-augmented generation (RAG) concepts from data ingestion to retrieval and reranking, then design agentic RAG using advanced techniques and vector databases with an embedding model.
Explore the PageRAG architecture that chunks financial documents page-wise, adds metadata, and ingests them into a vector DB for precise, reranked retrieval.
Design a PageRAG architecture where the agent automatically fetches relevant chunks using embeddings and metadata, filters by metadata, reranks by cosine similarity, and delivers a final answer.
Set up the PageRAG notebook with metadata ingestion, metadata-based filtering, cosine-based ranking and reranking of chunks, and vector DB integration using embeddings, Docling, and dedup hashing.
Learn to set up a Chroma vector DB, create a financial docs collection, configure nomic-embed-text, specify the base URL, and persist data for PDF ingestion and retrieval.
Extract metadata from file names by parsing the company name, document type, optional fiscal quarter, and fiscal year into a dictionary. Handle removal of the .pdf extension and both 4-part and 3-part name formats for precise filtering.
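A sketch of this parsing step is shown below. The separator (a space), the field order, the dictionary keys, and the sample file names are assumptions made for illustration; the course's actual naming scheme may differ.

```python
from pathlib import Path


def extract_metadata_from_filename(filename: str) -> dict:
    """Parse 'company doc_type [quarter] year.pdf' into a metadata dictionary."""
    stem = Path(filename).stem  # drops the .pdf extension
    parts = stem.split()        # assumed separator: whitespace
    if len(parts) == 4:
        company, doc_type, quarter, year = parts
    elif len(parts) == 3:
        company, doc_type, year = parts
        quarter = None          # annual filings carry no quarter
    else:
        raise ValueError(f"Unexpected file name format: {filename!r}")

    metadata = {
        "company_name": company.lower(),
        "doc_type": doc_type.lower(),
        "fiscal_year": int(year),
    }
    if quarter is not None:
        metadata["fiscal_quarter"] = quarter.lower()
    return metadata


# Hypothetical file names in the assumed format.
print(extract_metadata_from_filename("amazon 10-q q1 2024.pdf"))
print(extract_metadata_from_filename("apple 10-k 2023.pdf"))
```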
Extract markdown text from PDF pages with the Docling document converter, converting PDF pages to markdown and preparing for page-wise access.
Compute a sha256 hash of a PDF by reading 4096-byte chunks to prevent duplicate ingestion in the vector DB, storing the hash as metadata.
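The chunked hashing described above might look like the sketch below. `compute_file_hash` is a hypothetical name, and the demo writes a throwaway file into a temporary directory instead of using a real filing.

```python
import hashlib
import tempfile
from pathlib import Path


def compute_file_hash(path, chunk_size: int = 4096) -> str:
    """Return the SHA-256 hex digest of a file, reading it in fixed-size chunks."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha256.update(chunk)
    return sha256.hexdigest()


# Demo on a temporary file; the bytes are placeholder content, not a real PDF.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "sample.pdf"
    sample.write_bytes(b"%PDF-1.4 demo bytes" * 1000)
    first = compute_file_hash(sample)
    second = compute_file_hash(sample)

print(first)
print(first == second)  # the same content always yields the same hash
```

Reading in chunks keeps memory use flat for large PDFs, and storing the digest as metadata lets a later run skip any file whose hash is already present in the vector DB.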
Track processed files to prevent duplicate ingestion using a Chroma vector store: fetch metadata and file hashes, and prepare for document ingestion.
Ingest documents into a Chroma vector DB by converting the data directory to a pathlib Path and recursively listing PDF files, then computing metadata, embeddings, and hashes.
Ingest PDF pages by converting them to markdown, extract metadata, assemble page content with metadata, and ingest prepared documents into the vector DB.
Ingest documents into a vector DB using Docling to auto-detect file types, select RapidOCR with CUDA, and convert to markdown.
Learn how to ingest documents into a vector DB, apply metadata filtering and LLM-based extraction to retrieve and rerank relevant chunks using cosine similarity for accurate RAG results.
Master data retrieval and reranking in RAG systems by ingesting data correctly, extracting metadata from queries with structured LLM outputs, and using MMR and BM25 ranking for vector DB search.
Execute data retrieval and reranking by configuring vector embeddings, metadata extraction, and LLM-driven structured outputs, then apply BM25 reranking with filtered search to improve document relevance.
Define fiscal quarter and document type using a Pydantic model and enums, creating a Pydantic schema for structured LLM outputs and guiding classification for 10-K, 10-Q, 8-K and Q1–Q4.
Learn to extract chunk metadata from user queries by using a metadata schema and a language model to produce a structured dictionary with optional fields for vector DB filtering.
Define a base Pydantic model for chunk metadata extraction, with optional fields for company name, doc type, fiscal year, and fiscal quarter, enforcing predefined values via enum-based model config.
Define a ranking keywords pydantic model and a ranking keywords class that produces exactly five financial keywords from the user query. Integrate the model with the LLM to rank documents.
Extract metadata filters and ranking keywords from user queries with a structured LLM output, guided by a defined schema, detailed prompts, and few-shot mappings for company names and document types.
Generate exactly five financial keywords from 10-K and 10-Q filings, then apply BM25 ranking on extracted chunks to rank related content.
Implement the search docs method to retrieve the top five documents from the vector DB using MMR search with ranking keywords and metadata filters.
Implement metadata filtering in Chroma DB by composing a search keyword with filters and multiple conditions, using and/or logic to refine results. Set a default k and ranking keywords.
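One way to compose such a filter is sketched below, using Chroma's `$eq` and `$and` operators. The metadata keys are assumptions, and `build_where_filter` is a hypothetical helper. A single condition is returned unwrapped, because `$and` expects at least two sub-expressions.

```python
from typing import Optional


def build_where_filter(filters: dict) -> Optional[dict]:
    """Turn {'field': value} pairs into a Chroma-style `where` filter, skipping None values."""
    conditions = [
        {key: {"$eq": value}}
        for key, value in filters.items()
        if value is not None
    ]
    if not conditions:
        return None               # no filtering at all
    if len(conditions) == 1:
        return conditions[0]      # a lone condition must not be wrapped in $and
    return {"$and": conditions}


# Hypothetical filters extracted from a user query.
print(build_where_filter({"company_name": "amazon", "fiscal_year": 2023, "fiscal_quarter": None}))
print(build_where_filter({"company_name": "apple"}))
print(build_where_filter({}))
```

The resulting dictionary would be passed as the metadata filter of the vector search; swapping `$and` for `$or` gives the alternative logic mentioned above.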
Implement enhanced document retrieval by applying full-text search with ranking keywords, using where_document filters with contains and not-contains conditions, plus metadata and embedding model filtering.
Explore enhanced retrieval by combining metadata filters with full-text keyword ranking, increasing k, and reranking to push the most relevant chunks to the top.
Extract headings, subheadings, and the following paragraph to support reranking, focusing on table headings and concise content for ranking keywords.
Process chunks for re-ranking: pair sections and headings with their content, validate that following content exists, and build formatted heading-content chunks for later re-ranking.
Rank documents using BM25Plus on the heading and content chunks: tokenize the query and corpus, compute scores, and return the top-k most relevant documents.
Rank documents with BM25Plus by extracting, joining, and lowercasing document chunks, using ranking keywords, then tokenizing and scoring against query tokens to retrieve top-k results.
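The scoring step can be illustrated with a small from-scratch version of BM25+ in its common formulation. This is a sketch for intuition, not the course's code, which presumably relies on a library implementation; the parameter defaults, the tokenizer, and the sample chunks are all assumptions.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_plus_scores(query_tokens: List[str], corpus_tokens: List[List[str]],
                     k1: float = 1.5, b: float = 0.75, delta: float = 1.0) -> List[float]:
    """Score every document against the query with a BM25+ style formula."""
    if not corpus_tokens:
        return []
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for doc in corpus_tokens for term in set(doc))

    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        length_norm = k1 * (1 - b + b * len(doc) / avg_len)
        score = 0.0
        for term in query_tokens:
            if term not in doc_freq:
                continue  # term never appears in the corpus
            idf = math.log((n_docs + 1) / doc_freq[term])
            tf = term_freq[term]
            # delta is the lower bound that distinguishes BM25+ from plain BM25.
            score += idf * (delta + tf * (k1 + 1) / (length_norm + tf))
        scores.append(score)
    return scores


# Made-up heading-content chunks for illustration.
chunks = [
    "Consolidated Statements of Cash Flows: net cash provided by operating activities",
    "Risk Factors: competition and regulation",
    "Free cash flow reconciliation: operating cash flow less capital expenditures",
]
ranking_keywords = ["cash", "flow", "operating"]

scores = bm25_plus_scores(ranking_keywords, [tokenize(chunk) for chunk in chunks])
for chunk, score in zip(chunks, scores):
    print(f"{score:.3f}  {chunk}")
```

In this toy corpus the free-cash-flow chunk scores highest because it matches all three keywords, and the risk-factors chunk scores lowest because it matches none.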
Rank documents by keyword with a Python sort using a lambda key to sort in descending order of doc scores, and print top k indices to verify proper ranking.
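The sort described here can be sketched in a few lines. The scores are made-up numbers, and `top_k_indices` is a hypothetical helper name.

```python
from typing import List


def top_k_indices(doc_scores: List[float], k: int = 3) -> List[int]:
    """Return the indices of the k highest-scoring documents, best first."""
    ranked = sorted(enumerate(doc_scores), key=lambda pair: pair[1], reverse=True)
    return [index for index, _ in ranked[:k]]


doc_scores = [0.4, 2.7, 1.1, 0.0, 3.2]  # illustrative scores only
print(top_k_indices(doc_scores, k=3))   # [4, 1, 2]
```

Sorting `(index, score)` pairs rather than the scores alone keeps track of which document each score belongs to, so the indices can be used to pull the matching chunks.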
Explore production-level data retrieval and re-ranking with keyword-based ranking to surface contextually relevant documents, using cash flow examples like consolidated statements of cash flow and free cash flow.
Explore the data ingestion and retrieval workflow of agentic RAG, including vector stores, hash-based ingestion, cosine similarity with filtering and reranking, and the generation stage.
Centralize data retrieval and reranking code into a reusable utils.py, enabling universal data retrieval across RAG systems and applications, and export notebook methods to a clean Python module.
Refactor notebook code into a centralized utils.py for retrieval and reranking, import and test utils.extract_filters with a sample query, and wire the utilities into an application.
Design a RAG agent workflow and create the agent state by building a retrieve docs node, applying filters, ranking keywords, searching docs, and preparing context.
Implement a Python retrieval tool for a LangChain workflow, using filters, ranking keywords, and document search and reranking, then expose it as a LangChain tool with logging and environment setup.
Learn to build a retrieve docs workflow with LangChain, format retrieved docs (metadata, content), handle empty results, and save the final context as a .md file.
Store retrieved text in a local debug_logs directory as a utf-8 markdown file, then return the retrieved text to serve as the agent's context for debugging and understanding.
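A minimal sketch of this logging step follows. The file name is an assumption, `save_retrieved_text` is a hypothetical helper, and the demo writes into a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path


def save_retrieved_text(retrieved_text: str, log_dir: str = "debug_logs",
                        file_name: str = "retrieved_context.md") -> str:
    """Write the retrieved text to a UTF-8 markdown file and return it unchanged."""
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)  # create debug_logs if missing
    (directory / file_name).write_text(retrieved_text, encoding="utf-8")
    return retrieved_text  # the agent still receives the text as context


# Demo with placeholder text in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    log_dir = str(Path(tmp_dir) / "debug_logs")
    context = save_retrieved_text("## Source 1\n\nNet sales summary…", log_dir=log_dir)
    saved = (Path(log_dir) / "retrieved_context.md").read_text(encoding="utf-8")

print(context == saved)
```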
Design and implement an agent node by defining agent state, reading messages, attaching a tool, binding tools, and integrating a detailed system prompt to drive tool calls and document retrieval.
Create an agent graph workflow that routes between an agent node and a tool node, handling tool calls and delivering the answer via a retrieval system with ranking.
Demonstrate testing an agentic RAG workflow, where a query is broken into multiple retrievals, documents are retrieved and ranked, and a final answer with revenue data for 2023 is presented.
Explore the corrective RAG approach, where retrieved documents pass through an evaluator to discard irrelevant results and refetch from internal or external knowledge bases, ensuring robust production-ready RAG.
Design and implement a corrective RAG system that retrieves from internal vector, grades relevance, and routes to answer generation or web search via DuckDuckGo, with query rewriting when needed.
Set up the CRAG notebook, configure state graphs and messages, import tools and embeddings, and build a retrieval tool with a structured output format for grading.
Implement two centralized tools, the retrieve docs tool and the web search tool, by importing utilities, loading environment variables, and modularizing code for production-ready LangChain workflows.
Apply wide retrieval and narrow selection using BM25 reranking, guiding document retrieval with filters and ranking keywords in the search docs pipeline to retrieve and rank documents.
Build and test an agent state for retrieved documents and rewritten queries, then implement and use my tools for doc retrieval and web search to power multi-node workflows.
Create a retrieve node that fetches user question from state messages, calls the document retrieval tool with default k, logs retrieved documents to debug logs, and returns results for grading.
Create a document grading node that uses a router-based decision to route to answer or rewrite query, via a structured Pydantic data model and a boolean relevance field.
Learn to create a rewrite query node that transforms the user question into a concise, retrieval-targeted prompt for document search, integrating it with a web search retriever.
Build a web search node using DuckDuckGo to retrieve external knowledge by rewriting the original query, following the research paper and contrasting with internal vector DB results.
Create an answer generator node that uses retrieved docs and web search to generate the final answer, then route to the answer node or through rewrite and web search.
Learn to build a graph execution router that routes to the answer or rewrite node based on relevancy, including debug messages and proper node labeling.
Create a CRAG agent in LangGraph by wiring retriever, grade, rewrite, web search, and answer nodes, linking edges, compiling, and testing performance.
Conduct a practical CRAG agent performance evaluation by invoking the agent with a user query, retrieving documents through the retriever, calling tools, and validating the final answer with the grader.
In this CRAG agent performance evaluation, the speaker demonstrates how a retriever and vector store retriever use rewritten queries and web search, noting single‑company success and multi‑company challenges.
Explore a reflection-based RAG agent architecture: draft and revise nodes, self-reflection, evaluator, and retrieval loop, inspired by the Reflexion paper on language agents with verbal reinforcement learning.
Set up the reflection notebook for the agentic RAG workflow, initialize agent state, tools, and structured output, and configure the retrieve-revise loop with a max iteration limit.
Create a draft node for an agentic reflection system using a structured LLM and Pydantic schema to generate answers and surface missing information for search queries.
Create a draft node that formats text into a structured JSON response with answer and reflection, capturing missing information and search queries for the AI message guiding the next node.
Create a retrieval node that fetches documents from the vector store for each generated search query and assemble retrieved_text into retrieved_docs for the critic agent to use as context.
Explain how the retrieve-documents-for-reflection node gathers queries, fetches up to three documents per query using MMR and the vector DB, and formats a combined, searchable result.
Create a revise node that critiques and self-reflects on its generated answers, produces search queries, and follows an answer schema with a detailed system prompt to refine results.
Explore creating a self-reflection based revise node that critiques its prior answer, outputs JSON data, and tests completion and search queries in a retrieval-based prompt workflow.
Implement router logic for the revise node by evaluating evaluator feedback, the complete flag, and max iterations, and routing via search queries to the retriever, revise node, and reflection RAG agent.
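The routing rules can be sketched as a plain function. The state keys (`is_complete`, `iteration`, `search_queries`), the default limit of three iterations, and the labels `"retrieve"` and `"end"` are assumptions for illustration rather than the course's exact names.

```python
from typing import Any, Dict


def route_after_revise(state: Dict[str, Any], max_iterations: int = 3) -> str:
    """Choose the next step after the revise node."""
    if state.get("is_complete"):
        return "end"        # the evaluator judged the answer complete
    if state.get("iteration", 0) >= max_iterations:
        return "end"        # stop to avoid an endless retrieve-revise loop
    if state.get("search_queries"):
        return "retrieve"   # fetch more context for the missing information
    return "end"            # nothing left to search for


print(route_after_revise({"is_complete": False, "iteration": 1,
                          "search_queries": ["Apple Q1 2024 net sales"]}))  # retrieve
print(route_after_revise({"is_complete": True, "iteration": 1,
                          "search_queries": ["unused"]}))                   # end
print(route_after_revise({"is_complete": False, "iteration": 3,
                          "search_queries": ["still missing"]}))            # end
```

Checking the completion flag and the iteration cap before the search queries means the loop always terminates, even if the model keeps proposing new queries.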
Build a reflection agent graph with a graph canvas and shared state. Add draft, retrieve, and revise nodes; connect edges and conditional routing; compile the graph.
Conduct performance testing of the reflection agentic RAG system, showcasing prompt engineering, state management, and iterative querying to retrieve documents and generate a final answer.
Conduct performance testing of the reflection agentic RAG, detailing draft, retrieve, and revise cycles, including Amazon and Apple 2024 Q1 comparisons and 2023 iPhone and MacBook segment earnings.
**This course is not for absolute beginners in AI - you should first learn LangChain fundamentals, then LangGraph, and only after that take this course for the best learning experience.**
Private Agentic RAG with LangGraph and Ollama is an advanced, project-based course that teaches you how to build private, production-ready Retrieval-Augmented Generation (RAG) systems using LangGraph, LangChain, Ollama, ChromaDB, Docling, and Python.
This course is designed for developers who want strong control over their data, full privacy, and complete end-to-end workflows using local LLMs.
You will learn how to build modern RAG systems, implement advanced retrieval pipelines, add agent workflows, use LangGraph state machines, integrate SQL agents, and run everything on your own machine using Ollama. All projects run 100 percent locally, with no external API cost and no data leaving your system.
The entire course is practical. Every concept is explained with step-by-step notebooks, complete Python code, and real examples using SEC financial filings from Amazon, Google, Apple, and Microsoft.
What You Will Learn
Ollama and Local LLM Setup
Install and configure Ollama for private LLM deployment
Use models like Qwen3, GPT-OSS, Llama 3.2, and nomic-embed
Create custom LLMs with Modelfiles
Use Ollama CLI and REST API for text, chat, and embeddings
LangGraph Fundamentals
Build state machines using TypedDict
Create nodes, reducers, and conditional edges
Build multi-step workflows with START/END logic
Visualize execution with diagrams
Understand message accumulation and state merging
Complete RAG Systems (from scratch)
Ingest PDFs using Docling with OCR and table extraction
Build page-level chunks for accurate retrieval
Extract metadata from filenames and LLMs
Remove duplicates using SHA-256 hashing
Store documents in ChromaDB with metadata filters
Two-Stage Retrieval Pipeline
Build metadata filters from natural language
Generate financial keywords using structured LLM outputs
Use ChromaDB with MMR search
Implement BM25Plus re-ranking for better accuracy
Extract headings and sections for improved ranking
Agentic RAG using LangGraph
Build tool-calling agents using the ReAct pattern
Implement document retrieval tools using LangChain
Build agents that call tools multiple times
Add table-based answers with citations
Support multi-turn conversations with memory
Corrective RAG (CRAG)
Grade retrieved documents using a Pydantic schema
Detect irrelevant results and rewrite queries
Add web search fallback using DuckDuckGo
Prevent infinite loops with controlled retries
Generate final answers with correct citations
MySQL SQL Agent
Build a natural-language SQL agent with LangGraph
Retrieve schema, generate SQL, validate, run, and fix errors
Handle multi-table joins and complex metrics
Automatically correct broken SQL queries
Support explanations and safe database access
Financial Document Analysis Project
Work with real SEC filings: 10-K, 10-Q, 8-K
Build a complete RAG system that answers questions like:
“What was Amazon’s revenue in 2023?”
“Compare Google and Apple’s cash flow for 2024”
“Show segment revenue with citations and tables”
Use ChromaDB + BM25 for accurate retrieval
Produce clean, formatted answers with tables and reasoning
Who This Course Is For
Developers and engineers who want to build advanced RAG systems
ML practitioners who want full privacy using local LLMs
AI engineers working on LangGraph, LangChain, or agent systems
Backend developers who want to build real GenAI applications
Anyone interested in private, production-grade LLM workflows
This is an advanced-level course. Good LangGraph or LangChain knowledge is required.
Why This Course Is Different
The entire course runs locally using Ollama
Zero API cost and complete data privacy
Covers modern RAG techniques: PageRAG, CRAG, Reflexion ideas
Real datasets from top tech companies
Covers LangGraph deeply with real production workflows
Includes SQL agents, financial RAG systems, and multi-step agents
Step-by-step, practical, and code-heavy
By the End of This Course You Will Be Able To
Build private, production-ready RAG systems
Deploy and customize local LLMs with Ollama
Build graph-based agents using LangGraph v1
Create advanced retrieval pipelines using MMR and BM25Plus
Analyze financial documents with precise citations
Build SQL agents for natural language database queries
Handle query rewriting, grading, and web fallback
Build complete agentic RAG applications end-to-end