
Course Introduction!
Build a Chainlit front-end connected to a back-end agentic system and a vector database, enabling data uploads, OpenAI-compatible endpoints, and production-grade RAG app deployment.
Discover RAGWire, a production-grade RAG framework that connects vector databases to document converters and supports multiple LLM providers such as Ollama, OpenAI, Anthropic, and Groq. Compare RAGWire with basic RAG and explore its architecture for easier deployment.
RAGWire presents a three-part production RAG pipeline: ingest documents into a vector store, query relevant chunks, and generate answers with an LLM. It also covers smart Markdown chunking and combined dense and sparse embeddings.
Load environment variables and enable LangSmith tracing, then set up RAGWire, check its version, and configure INFO-level logging. Load config.yaml with an Ollama embedding model, an Ollama LLM, a vector store, and a hybrid retriever.
Connect RAGWire to a Qdrant vector database by loading config.yaml and initializing the document loader, text splitter, and embedding model, then create and verify a hybrid-search-enabled collection on localhost:6333.
Design a Pydantic YAML metadata schema to extract company name, document type, and fiscal year. Configure RAGWire prompts for structured output and verify metadata ingestion from documents such as a 10-K.
Explore RAGWire APIs and metadata for hybrid search by discovering metadata fields, filtering options, and structured extraction to build production-grade GenAI apps with agents.
Set up OpenAI for RAGWire by creating an API key, storing it in an environment variable, and configuring the YAML file with the text-embedding-3-small embedding model and an LLM (GPT 5.4 or nano).
Copy the Groq config and rename it for Gemini, set the Google API key, and update the embedding model and LLM to the Gemini 001 embedding model and a Gemini LLM for RAGWire.
Run and test the production RAG app by launching app.py in the conversational RAG chatbot directory with Chainlit, and observe memory, vector DB access, and the UI at localhost:8000.
Wire OpenAI-compatible routes into FastAPI and initialize the RAGWire server. Test the health, models, and chat completions endpoints with Postman and observe the API responses.
LangGraph self-correcting RAG: explore end-to-end testing by implementing rewrite-query, retrieve, and generate nodes with conditional edges to route and refine results.
Learn to design an end-to-end RAG workflow with a synthesizer agent, multiple specialist agents, and an aggregator, then test streaming outputs and deploy the Microsoft multi-agent framework.
Deploy your FastAPI RAG backend to production by creating a separate repository, forking the RAGWire FastAPI RAG backend, and preparing minimal requirements and environment variables for Render, Railway, and AWS.
Deploy a live RAG app on Render and connect it to a local Chainlit chat UI by configuring the API URL and running the frontend and backend together.
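The self-correcting RAG lecture listed above describes a loop of retrieving, grading, rewriting the query, and generating. The control flow can be sketched in plain Python without LangGraph. This is only an illustration under stated assumptions: the function names, the toy corpus, and the keyword-overlap retriever are hypothetical stand-ins, not RAGWire or LangGraph APIs. In the course, a graph with conditional edges plays the role of the loop below.

```python
from typing import Callable, List

def self_correcting_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, List[str]], bool],
    rewrite: Callable[[str], str],
    generate: Callable[[str, List[str]], str],
    max_rewrites: int = 2,
) -> str:
    """Retrieve, grade the result, and rewrite the query while quality is low."""
    query = question
    docs = retrieve(query)
    for _ in range(max_rewrites):
        if grade(question, docs):   # "conditional edge": good enough, go generate
            break
        query = rewrite(query)      # otherwise rewrite the query and retry
        docs = retrieve(query)
    # After max_rewrites attempts we answer with whatever was retrieved last.
    return generate(question, docs)

# --- Hypothetical toy stand-ins so the sketch runs without any services ---
CORPUS = [
    "Fiscal year 2023 revenue grew 12 percent.",
    "The company opened two new offices.",
]

def toy_retrieve(query: str) -> List[str]:
    # Keyword overlap instead of a real vector search.
    terms = set(query.lower().split())
    return [d for d in CORPUS if terms & set(d.lower().replace(".", "").split())]

def toy_grade(question: str, docs: List[str]) -> bool:
    # A real grader would ask an LLM; here any hit counts as relevant.
    return len(docs) > 0

def toy_rewrite(query: str) -> str:
    # A real rewriter would ask an LLM; here we expand one abbreviation.
    return query.replace("FY23", "fiscal year 2023")

def toy_generate(question: str, docs: List[str]) -> str:
    return f"{len(docs)} chunk(s) used: " + " | ".join(docs)

print(self_correcting_rag("FY23 sales?", toy_retrieve, toy_grade,
                          toy_rewrite, toy_generate))
```

The first retrieval for "FY23 sales?" finds nothing, so the query is rewritten once and the second retrieval succeeds. Capping the number of rewrites keeps the loop from running indefinitely when no rewrite helps.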
Retrieval-Augmented Generation (RAG) is at the core of every serious AI application today. But basic RAG pipelines quickly hit their limits when documents are large, queries are complex, or your application needs to run reliably in production.
In this course, you will build RAGWire — a production-grade RAG toolkit built on LangChain, Qdrant, and LangGraph — from the ground up. You will start with a simple hybrid search pipeline and progressively add advanced retrieval, metadata filtering, agentic RAG, multi-agent frameworks, a full chat UI, and multi-cloud deployment.
By the end of this course you will know how to:
Build a hybrid RAG pipeline with BM25 sparse + dense retrieval and Reciprocal Rank Fusion (RRF)
Configure RAGWire with OpenAI GPT, Groq, Google Gemini, Ollama, and HuggingFace embeddings
Implement LLM-driven auto metadata filtering over complex, nested document structures
Build agentic RAG pipelines with LangChain agent tools, memory, and reasoning
Build a self-correcting RAG agent that grades its own retrieval and rewrites queries when quality is low
Build supervisor multi-agent systems that route queries to specialist agents using LangGraph
Build multi-agent document analysts with CrewAI, Microsoft AutoGen, and Microsoft Agent Framework
Build a production Chainlit chat UI with authentication, chat history, and document upload
Build a FastAPI backend with OpenAI-compatible /v1/chat/completions endpoints and SSE streaming
Deploy RAG agents to Render, Railway, AWS ECS Fargate, GCP Cloud Run, and Azure
Secure production APIs with API keys and protect credentials with Docker .dockerignore
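Reciprocal Rank Fusion, named in the first outcome above, merges several ranked result lists by giving each document a score of 1 / (k + rank) per list and summing those scores. The sketch below is a minimal standalone version. The document IDs are made up, and k = 60 is a commonly used default, not necessarily the value RAGWire or Qdrant uses.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document IDs; a higher fused score ranks first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit in each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from a dense retriever and a BM25 sparse retriever.
dense_hits = ["a", "b", "c"]
sparse_hits = ["b", "d", "a"]

print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
```

Documents that appear in both lists ("a" and "b") rise above those found by only one retriever. RRF uses only ranks, so the dense and sparse scores never need to be put on the same scale.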
This is a hands-on, code-first course. Every section produces working, runnable code that you can adapt to your own documents and use cases.