
Kick off your AI journey by understanding the roadmap for building real, hands-on AI agents step by step.
Welcome to Day 1 of the Hands-On AI Agents Course—your gateway to building intelligent, autonomous applications from scratch. In this opening lecture, we’ll outline exactly what you’ll accomplish today as you begin crafting your very first AI agent using modern tools like Ollama, Python, and lightweight UIs.
This session sets the stage for your learning. You’ll understand what makes an AI agent different from a chatbot, what tools you’ll install, and how the course balances theory with hands-on development. We’ll highlight the core goals for the day:
Understand the concept and purpose of AI agents
Set up the Ollama framework for running LLMs locally
Use Python to build your first agent that can respond and reason
Add memory capabilities so your agent can recall context
Wrap everything in a web-based UI for real-world interactivity
You’ll also preview the broader skills you’ll gain, including deploying AI agents with voice input, building AI-powered web scrapers, and constructing document-reading bots.
By the end of this lecture, you’ll have a crystal-clear view of what you’re learning, why it matters, and how each tool fits into the bigger picture. Whether you're aiming to build your own AI startup tools, automate tasks, or just deepen your skills, this course is designed to empower you.
This is the foundation for everything that follows. Let’s get ready to build intelligent systems that do more than chat—they act.
Go beyond chatbots—discover what makes AI agents truly autonomous, goal-driven, and context-aware.
In this lecture, we break down the core concept of AI agents—a powerful leap beyond traditional chatbots and large language models. You’ll learn what defines an agent, how it differs from a simple model, and why agents are the foundation of the next wave of intelligent automation.
An AI agent is more than just a question-answer machine. It can make decisions, execute tools, maintain memory, and work toward goals over multiple steps. You'll explore the components of an agent, including:
A reasoning engine powered by an LLM
A memory module to retain context across sessions
Access to external tools, like web search or databases
The ability to act autonomously toward a defined outcome
We’ll compare agents to typical prompt-based interactions and demonstrate how they operate in loops—thinking, planning, and acting repeatedly to solve problems or complete tasks.
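To make the think, plan, act loop concrete, here is a minimal sketch. It assumes a scripted stand-in for the LLM and a toy calculator tool; the names `scripted_llm`, `calculator`, and the `TOOL`/`FINAL` text protocol are hypothetical and are not taken from the course material.

```python
from typing import Callable, Dict, List


def calculator(expression: str) -> str:
    """Toy tool: add whitespace-separated integers, e.g. '2 3' -> '5'."""
    return str(sum(int(part) for part in expression.split()))


# Registry of tools the agent is allowed to call (hypothetical example).
TOOLS: Dict[str, Callable[[str], str]] = {"calculator": calculator}


def scripted_llm(goal: str, observations: List[str]) -> str:
    """Stand-in for a real LLM: pick the next step from what is known so far."""
    if not observations:
        return "TOOL calculator 2 3"
    return f"FINAL The answer is {observations[-1]}"


def run_agent(goal: str, llm=scripted_llm, max_steps: int = 5) -> str:
    """Loop: ask the model for a decision, run a tool if requested, repeat."""
    observations: List[str] = []
    for _ in range(max_steps):
        decision = llm(goal, observations)            # think / plan
        kind, _, rest = decision.partition(" ")
        if kind == "FINAL":
            return rest                               # goal reached
        if kind == "TOOL":
            tool_name, _, tool_input = rest.partition(" ")
            observations.append(TOOLS[tool_name](tool_input))  # act
    return "Stopped: step limit reached"


print(run_agent("What is 2 + 3?"))
```

A single prompt-and-reply exchange would stop after the first model call; the loop is what lets the agent use a tool result in its next decision.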
This session introduces real-world use cases for agents, including:
AI personal assistants that manage schedules and automate messages
AI research agents that browse, read, and summarize content
Enterprise bots that streamline operations across departments
By the end of this lecture, you’ll understand the architecture of AI agents, their growing role in agentic AI systems, and why they are becoming essential tools in fields like customer support, automation, and productivity enhancement.
This conceptual foundation is key for the hands-on work ahead. You're not just building another chatbot—you’re creating software that thinks, remembers, and acts on your behalf.
Learn why Ollama is the best local platform for running fast, private, and customizable AI agents.
In this lecture, you’ll discover why Ollama is the ideal environment for building and running your own AI agents locally. While many AI tools rely on cloud APIs and external servers, Ollama enables you to run large language models on your own machine—securely, quickly, and offline.
We’ll start by exploring what Ollama is: a lightweight local runtime that simplifies model installation, deployment, and management, and that packages models in a way that resembles container images. Unlike cloud services, Ollama gives you full control over your AI development environment, without needing an API key or subscription.
You'll learn the key benefits of using Ollama:
Privacy-first: Run your models locally with no external data sharing
Offline-ready: Build and use AI agents even without an internet connection
Fast performance: Optimized for speed with local model execution
Customizable: Download open-source LLMs and adapt them to your use case, for example with a custom system prompt and parameters in a Modelfile
We’ll also compare Ollama to other platforms and tools like OpenAI, Hugging Face, and LangChain, showing you when to use each and why Ollama is well suited to rapid prototyping and building hands-on AI tools.
Through examples, you’ll see how Ollama integrates seamlessly with Python, web apps, and your local file system. Whether you're creating a voice assistant, a Q&A bot, or a smart web scraper, Ollama gives you flexibility and control without vendor lock-in.
By the end of this lecture, you’ll understand how Ollama powers autonomous, privacy-friendly AI agents, and why it’s quickly becoming the go-to choice for developers building with open-source LLMs.
When you own your AI runtime, you unlock the freedom to innovate. Ollama puts that power in your hands.
Install and configure Ollama to run local AI models—no cloud, no API keys, just fast and secure development.
In this lecture, you’ll get hands-on with Ollama setup, the essential first step in building your own AI agents locally. Ollama enables you to run large language models directly on your machine, offering a developer-friendly alternative to cloud-based APIs like OpenAI or Hugging Face.
We’ll walk you through the installation process step by step, covering how to:
Download and install Ollama for macOS, Windows, or Linux
Set up the Ollama CLI to launch and manage models
Verify your environment and troubleshoot common setup issues
You'll also explore how Ollama works under the hood. Unlike typical installations that require manual configuration, Ollama handles model packaging, caching, and memory management for you, which makes it ideal for anyone building LLM-based agents, even with minimal DevOps experience.
We’ll also cover system requirements and tips for smooth performance, including:
Recommended RAM and GPU specs
Choosing models optimized for your hardware
How to use Ollama logs for debugging
By the end of this lecture, you’ll have Ollama fully installed, tested, and ready to power your first locally run AI agent. You’ll be able to pull models with a single command and run inference directly from your terminal or Python scripts.
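As a rough illustration of verifying the setup from Python, the sketch below checks that an `ollama` executable is on the PATH and reads the list of downloaded models from the local server. It assumes Ollama's default address (`http://localhost:11434`) and its `/api/tags` endpoint; the helper names are hypothetical.

```python
import json
import shutil
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default local address


def cli_installed() -> bool:
    """True if an `ollama` executable can be found on the PATH."""
    return shutil.which("ollama") is not None


def model_names(tags_payload: dict) -> list:
    """Extract model names from the JSON returned by GET /api/tags."""
    return [model["name"] for model in tags_payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list:
    """Ask the running Ollama server which models are already downloaded."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))


print("Ollama CLI found:", cli_installed())
```

Calling `list_local_models()` only works while the Ollama server is running; a connection error at that point usually means the service has not been started.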
This setup unlocks a new development workflow—where your AI is fast, offline, and entirely under your control. If you're serious about building smart agents, Ollama is your launchpad.
Learn how to choose, download, and run local large language models tailored to your AI agent’s needs.
In this lecture, you’ll download your first language model using Ollama, enabling you to build intelligent, local-first AI agents. The model is the brain of your AI system—so selecting the right one is key to performance, speed, and capability.
You’ll begin by exploring the Ollama model library, a curated set of optimized LLMs (large language models) that run efficiently on personal machines. You’ll compare available models like LLaMA 3, Mistral, and Code Llama, based on their:
Size (7B, 8B, 13B parameters, etc.)
Language capabilities
Inference speed
Resource usage
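Next, you’ll learn how to download a model with a simple terminal command such as ollama pull llama3 (or ollama run llama3, which downloads the model on first use and then starts an interactive session), and how Ollama caches it locally so repeated runs do not redownload it. You’ll see how Ollama packages and distributes models in a Docker-like way: named, versioned images that are pulled once and are easy to run.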
We’ll walk through:
Checking system compatibility before download
Monitoring model download progress
Running a quick test prompt to ensure it responds correctly
Switching between models depending on your agent's tasks
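The quick test prompt from the steps above could also be sent from Python. This is a minimal sketch that assumes Ollama's local `/api/generate` endpoint with streaming turned off; the function names and the default model name are illustrative.

```python
import json
import urllib.request


def build_generate_request(prompt, model="llama3", base_url="http://localhost:11434"):
    """Build (but do not send) a non-streaming request to /api/generate."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3", timeout=120.0):
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

With the server running and the model pulled, `generate("Reply with the single word: ready")` should return a short reply. Switching models is then just a matter of passing a different `model` argument.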
You’ll also get tips on choosing models for specific use cases—whether you’re building a voice assistant, web scraper, or document reader agent. We’ll even discuss how you can eventually fine-tune models to customize behavior.
By the end of this lecture, you’ll have a powerful, fully functioning LLM running locally, ready to process prompts, maintain memory, and drive your AI agent’s actions—all without relying on cloud services.
This is the moment your AI becomes intelligent. With the right model in place, you’re ready to build agents that can think, respond, and solve real-world problems.
Get up to speed with the essential Python skills you need to build, extend, and customize AI agents.
This optional lecture is designed for those who are new to Python or need a refresher before diving deeper into hands-on AI agent development. While Ollama handles the model runtime, Python is your glue—the language that connects tools, data, and logic into a functioning AI system.
You’ll learn the core Python programming concepts required to build and customize AI agents, including:
Variables, data types, and basic operators
Lists, dictionaries, and loops
Defining and calling functions
Reading input and printing output
Working with files and system commands
We’ll also introduce Python packages you’ll be using throughout the course such as:
subprocess for executing commands
requests for making API calls
json for handling structured data
os and pathlib for navigating the file system
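A small taste of how these standard-library modules fit together is sketched below. It runs a command with subprocess, then stores and reloads the result as JSON using pathlib. The helper names and the notes file are hypothetical, and requests is left out because it is a third-party package.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def run_command(args):
    """Run a command and return its standard output as stripped text."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def save_notes(path: Path, notes: dict) -> None:
    """Write a dictionary to disk as formatted JSON."""
    path.write_text(json.dumps(notes, indent=2), encoding="utf-8")


def load_notes(path: Path) -> dict:
    """Read the JSON file back into a dictionary."""
    return json.loads(path.read_text(encoding="utf-8"))


# Demo: capture a command's output, then round-trip it through a JSON file.
greeting = run_command([sys.executable, "-c", "print('hello from a subprocess')"])
with tempfile.TemporaryDirectory() as folder:
    notes_file = Path(folder) / "notes.json"
    save_notes(notes_file, {"greeting": greeting})
    print(load_notes(notes_file))
```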
Through short, real-world examples, you’ll write code that could power core AI agent functions—like handling user input, managing agent memory, triggering tool use, and updating the user interface.
This lecture is fast-paced but beginner-friendly, helping you feel confident even if you’ve never coded before. For more experienced learners, it serves as a quick reference before diving into the project builds ahead.
By the end of this session, you’ll be able to read, write, and understand the Python code behind your AI agents, giving you full control over their logic and behavior.
Python is the engine room of every modern AI system—and this is your chance to get fluent in the language that powers the future of intelligent automation.
Build your first working AI agent from scratch—capable of understanding prompts and generating smart, dynamic responses.
In this lecture, you’ll put theory into action by creating your very first AI agent using a locally run large language model (LLM) via Ollama. This will be a fully functional, lightweight agent capable of taking user input, thinking with an LLM, and returning intelligent output—all coded in Python.
You’ll walk through the core architecture of a basic AI agent:
Accepting user prompts from a command-line interface
Passing input to a local LLM (like LLaMA or Mistral) using Ollama
Receiving, formatting, and displaying the model’s response
Running the agent in a conversational loop
You’ll build a simple script that lets your agent “think” by generating text-based responses in real time. This foundation introduces the key elements of agentic design—a loop of perception (input), reasoning (LLM), and action (response).
In the process, you’ll learn how to:
Structure your agent logic with reusable Python functions
Customize system prompts to guide agent behavior
Add basic logging and output formatting for clarity
Test the agent with different use cases: summarization, Q&A, explanation
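The architecture described above can be sketched in a few lines. This version assumes the model call is passed in as a function (`ask_llm`), so a stub can stand in for Ollama here; the prompt layout and the names `SYSTEM_PROMPT`, `build_prompt`, and `chat_loop` are illustrative rather than the course's actual code.

```python
SYSTEM_PROMPT = "You are a concise, helpful assistant."


def build_prompt(user_text, system_prompt=SYSTEM_PROMPT):
    """Combine the system prompt and the user's message into one prompt."""
    return f"{system_prompt}\n\nUser: {user_text}\nAssistant:"


def chat_loop(ask_llm, read_input=input, write_output=print):
    """Read, reason, respond until the user types 'quit' or 'exit'.

    Returns the number of turns that were answered.
    """
    turns = 0
    while True:
        user_text = read_input("You: ").strip()
        if user_text.lower() in {"quit", "exit"}:
            return turns
        if not user_text:
            continue                                  # ignore empty input
        reply = ask_llm(build_prompt(user_text))      # reasoning step
        write_output(f"Agent: {reply.strip()}")       # action step
        turns += 1


# Demo with a stub model and scripted input instead of a live terminal.
scripted = iter(["Summarize Ollama in one line", "quit"])
chat_loop(lambda prompt: "stub reply", read_input=lambda _: next(scripted))
```

In a real run you would pass a function that calls your local model as `ask_llm` and leave `read_input` and `write_output` at their defaults.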
By the end of this lecture, you’ll have a working prototype of an AI-powered agent running locally on your machine: no internet connection, no cloud APIs, just smart automation that’s fully under your control.
This is your first true AI agent build—and it’s only the beginning. You’ve now crossed the line from user to builder, and your journey toward powerful, autonomous software systems has officially begun.
Upgrade your AI agent with short-term memory so it can remember, reference, and build on past conversations.
In this lecture, you’ll take your agent from reactive to context-aware by integrating a simple yet powerful memory system. One of the defining features of advanced AI agents is their ability to remember prior interactions, maintain context, and behave more like intelligent assistants.
You’ll learn how to build and implement conversation memory in Python—allowing your agent to store past user inputs and model responses during a session. This makes your AI more coherent, capable of referring back to previous questions, and better at carrying on multi-turn conversations.
You’ll walk through:
Designing a memory buffer to store chat history
Concatenating memory into the system prompt for context preservation
Handling memory limits and trimming for performance
Improving user experience with context-aware responses
Using local LLMs via Ollama, you’ll craft prompts that grow dynamically with the session—giving your agent the ability to “remember” as it converses. You’ll also add logic to summarize or compress long interactions when needed, preparing your agent for real-world usage.
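One simple way to picture the memory buffer is a fixed-size queue of recent messages that is folded into each new prompt. The sketch below assumes trimming by message count (a real build might trim by tokens or summarize older turns instead); the class name and prompt layout are hypothetical.

```python
from collections import deque


class ConversationMemory:
    """Keeps only the most recent messages so prompts stay a manageable size."""

    def __init__(self, max_messages=6):
        # A deque with maxlen silently drops the oldest entry when full.
        self.messages = deque(maxlen=max_messages)

    def add(self, role, text):
        self.messages.append((role, text))

    def as_prompt(self, system_prompt, user_text):
        """Build a prompt that includes the stored history before the new message."""
        history = "\n".join(f"{role}: {text}" for role, text in self.messages)
        parts = [system_prompt]
        if history:
            parts.append("Conversation so far:\n" + history)
        parts.append(f"User: {user_text}\nAssistant:")
        return "\n\n".join(parts)


memory = ConversationMemory(max_messages=4)
memory.add("User", "My name is Sam.")
memory.add("Assistant", "Nice to meet you, Sam.")
print(memory.as_prompt("You are a helpful assistant.", "What is my name?"))
```

Because the earlier exchange is part of the prompt, the model has what it needs to answer the follow-up question, even though the model itself keeps no state between calls.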
By the end of this session, your AI agent will:
Maintain short-term memory across multiple user inputs
Reference past messages for more relevant replies
Simulate natural conversation with context chaining
Memory is what transforms a chatbot into a true AI assistant—one that doesn’t reset with every prompt but evolves with the interaction.
This lecture marks a turning point. You’re no longer just building a tool that replies; you’re building an agent that listens, remembers, and adapts.
Create a clean, interactive web interface for your AI agent—so anyone can chat with it like a real app.
In this lecture, you’ll bring your AI agent to life in the browser by building a simple yet elegant web-based user interface (UI). Moving beyond the terminal, this web UI allows users to interact with your agent in a familiar chat-like experience—just like ChatGPT or other AI tools.
You’ll use Python alongside lightweight frameworks like Flask or Streamlit to:
Set up a basic web server
Create an input field for user messages
Display AI-generated responses in real time
Style the interface for usability and clarity
You’ll learn how to connect the frontend UI to your backend agent logic, passing inputs from the browser to your locally running Ollama LLM and returning responses in a chat format. The web interface will also reflect memory—showing the ongoing conversation between user and agent.
This hands-on build includes:
HTML and CSS styling (optional) for layout enhancement
Live form submission and message updates
Hosting the app locally and testing in your browser
Error handling and input validation for a smooth user experience
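To show the shape of such an app without extra dependencies, here is a sketch that uses Python's built-in wsgiref server as a stand-in for Flask or Streamlit. The `ask_agent` function is a placeholder for the real agent call, and all names here are hypothetical.

```python
import html
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

HISTORY = []  # (speaker, text) pairs shown on the page


def ask_agent(message):
    """Placeholder for the real agent call (for example, a request to Ollama)."""
    return f"You said: {message}"


def handle_message(message):
    """Validate the input, then record the user message and the agent reply."""
    message = message.strip()
    if not message:            # basic input validation: ignore empty submissions
        return False
    HISTORY.append(("You", message))
    HISTORY.append(("Agent", ask_agent(message)))
    return True


def render_page():
    """Render the conversation and the input form as escaped HTML."""
    rows = "".join(
        f"<p><b>{html.escape(who)}:</b> {html.escape(text)}</p>" for who, text in HISTORY
    )
    form = '<form method="post"><input name="message" autofocus><button>Send</button></form>'
    return f"<html><body><h1>Local agent</h1>{rows}{form}</body></html>"


def app(environ, start_response):
    """WSGI entry point: handle a form POST, then show the updated chat."""
    if environ["REQUEST_METHOD"] == "POST":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        fields = parse_qs(environ["wsgi.input"].read(size).decode("utf-8"))
        handle_message(fields.get("message", [""])[0])
    body = render_page().encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve(port=8000):
    """Serve the app on localhost until interrupted."""
    with make_server("127.0.0.1", port, app) as server:
        server.serve_forever()
```

Calling `serve()` and opening http://127.0.0.1:8000 in a browser shows the chat page. Escaping every message with `html.escape` keeps user input from being interpreted as markup.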
By the end of this lecture, you’ll have a fully functional AI agent web app running on your local machine—one that you or others can use with ease. This web UI becomes your foundation for future deployments, whether you're building a personal assistant, AI concierge, or even a customer-facing product.
This is where your AI agent becomes more than just code—it becomes usable, accessible, and real. Welcome to the front-end of the agentic revolution.
Get ready to build a voice-powered AI assistant that can listen, speak, and interact like a digital companion.
Welcome to Day 2 of the Hands-On AI Agents Course! In this lecture, you’ll get a clear roadmap for today’s project: building your own personal AI assistant. This agent will go beyond text—it will respond with voice, process speech input, and interact through a friendly web interface.
You’ll start by understanding the goals for Day 2, which include:
Installing voice processing dependencies (like speech recognition and text-to-speech tools)
Integrating your local LLM via Ollama into a real-time voice assistant
Building a continuous listening loop to capture and process spoken input
Rendering AI-generated replies as human-like audio output
Deploying the assistant via a web UI for easy interaction
This session frames how voice interfaces are revolutionizing AI user experience, from smart homes to productivity apps. You’ll explore the difference between basic command-recognition bots and intelligent, language-aware agents capable of handling natural conversation.
By the end of this lecture, you’ll understand what’s ahead for the day, how the pieces come together, and what skills you’ll walk away with—including working knowledge of:
Voice interface engineering
Real-time AI interaction design
Connecting front-end and back-end in agent architectures
Whether you’re building a smart desktop companion, a home automation assistant, or a prototype for a future product, this day’s project equips you with the skills to bring speech-capable AI agents to life.
Today, your AI learns to listen and speak—and your agent becomes a true assistant.
Set up all the tools your AI assistant needs to hear, think, and speak—seamlessly and in real time.
In this lecture, you’ll install all the Python packages and supporting libraries required to build a fully functional voice-powered AI assistant. With your local LLM from Ollama already running, it’s time to add audio input and output capabilities—turning your agent into an interactive, speech-enabled assistant.
We’ll guide you step by step through installing and testing the core dependencies:
SpeechRecognition for capturing and converting voice to text
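PyAudio for microphone access, plus optionally a Whisper implementation such as whisper-jax for improved transcription
gTTS (online; it needs an internet connection), pyttsx3 (offline), or other text-to-speech (TTS) engines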
Flask or Streamlit for building a browser-based assistant interface
Tools for managing virtual environments and versioning (e.g. pip, venv, or poetry)
You’ll also get help troubleshooting common issues like microphone access errors, audio device compatibility, and dependency conflicts on macOS, Windows, and Linux.
By the end of this lecture, you’ll have a development environment fully equipped to:
Accept spoken commands from your microphone
Send transcribed input to your AI agent
Generate voice responses in real time
Serve the entire assistant through a local web app
These dependencies form the infrastructure behind your AI assistant’s speech pipeline. Whether you're automating tasks, answering questions, or building voice-first applications, this setup gives your agent ears and a voice.
Now that your agent has language understanding thanks to the LLM, it’s time to give it human-like interaction skills. This lecture ensures everything runs smoothly—so you can build with confidence.
Turn your AI agent into a voice-powered assistant that listens, thinks, and responds with natural-sounding speech.
In this lecture, you’ll build a complete AI voice assistant—a powerful agent that combines speech recognition, local LLM inference, and text-to-speech synthesis into one seamless experience. This is where your project starts to feel alive.
You’ll develop a Python-based application that:
Listens for audio input from the microphone
Converts spoken language into text using SpeechRecognition or Whisper
Sends that text to a locally running language model via Ollama
Takes the model’s response and converts it into audio using TTS engines like gTTS or pyttsx3
Plays the AI-generated voice response back to the user
The assistant operates in a loop, creating a fluid back-and-forth interaction—similar to virtual assistants like Siri, Alexa, or Google Assistant, but powered entirely by your own custom-built agent.
You’ll learn how to:
Manage timing and response latency
Filter noise and handle misrecognized input
Customize voices, speaking rates, and response styles
Keep the conversation context-aware with memory if enabled
You’ll also explore options to enhance the assistant with wake-word (hotword) activation in future upgrades.
By the end of this lecture, you’ll have a fully functional AI voice assistant, running locally, that speaks and listens just like a commercial product—but entirely built and owned by you.
This project gives your AI a true personality, bridging the gap between machine and human interaction. It’s not just an agent anymore—it’s a companion you built from scratch.
Understand the architecture and logic behind your AI voice assistant—from audio capture to LLM-powered responses.
In this lecture, we pull back the curtain to examine how your AI voice assistant actually works. You’ve built the agent—now it’s time to break down the system architecture, logic flow, and how each component interacts to deliver seamless human-computer communication.
You’ll follow the entire processing pipeline step by step:
Microphone input captures the user’s voice in real time
Speech-to-text engine (like SpeechRecognition or Whisper) transcribes the audio
The transcribed input is sent to a local large language model via Ollama
The model generates a response based on the current context
The response is passed to a text-to-speech engine (like pyttsx3 or gTTS)
The assistant speaks the output using your device’s speakers
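The pipeline above can be expressed as a single function with one pluggable stage per step. This is only a sketch: the four stage functions are passed in as parameters, and the stubs in the demo stand in for a microphone, a speech-to-text engine, the local model, and a TTS engine. None of the names come from the course code.

```python
def run_voice_turn(capture_audio, transcribe, ask_llm, speak):
    """One listen -> transcribe -> think -> speak cycle.

    Returns the model's reply, or None when nothing usable was heard.
    """
    audio = capture_audio()                 # microphone input
    text = transcribe(audio).strip()        # speech-to-text
    if not text:
        speak("Sorry, I did not catch that.")
        return None
    reply = ask_llm(text)                   # local LLM generates a response
    speak(reply)                            # text-to-speech output
    return reply


# Demo with stubs in place of real audio hardware and models.
spoken = []
run_voice_turn(
    capture_audio=lambda: b"fake-audio-bytes",
    transcribe=lambda audio: "what is an AI agent",
    ask_llm=lambda text: f"You asked: {text}",
    speak=spoken.append,
)
print(spoken)
```

Keeping each stage behind its own function is what makes the components swappable: replacing the transcriber or the voice does not touch the rest of the loop.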
This modular breakdown reveals how your AI agent’s architecture mirrors that of commercial digital assistants—except yours is running locally, privately, and with fully customizable components.
You’ll also learn:
How conversation loops are managed with Python
How memory and context can be stored between turns
Where latency occurs and how to optimize it
How errors are handled and recovery strategies implemented
Understanding the “how” behind the assistant empowers you to troubleshoot, extend, and enhance your build. Want to add logging, visual feedback, or Whisper-based transcription for improved accuracy? Now you’ll know exactly where to do it.
By the end of this lecture, you’ll have a mental model of how voice-based AI agents operate—from signal to response—giving you the foundation to iterate, scale, or productize your assistant.
This is where engineering meets intelligence—your agent isn’t just working; it’s working by design.
Launch and test your AI assistant in real-time—and experience human-like interaction powered by local intelligence.
In this lecture, you’ll take your fully built AI voice assistant live. After writing and connecting all the components—from speech recognition to local LLM processing to text-to-speech—it's time to see your creation in action.
You’ll learn how to properly run the assistant loop, monitor its performance, and interact with it naturally. This involves triggering the assistant with your voice, receiving intelligent responses, and listening to it speak those answers back.
We’ll walk through:
Running the Python script from your terminal or IDE
Speaking into your microphone and watching real-time transcription
Observing how the local model responds to your input using Ollama
Hearing your assistant speak back using a TTS engine
You’ll also learn how to debug issues that may arise, such as:
Microphone not picking up input
Text-to-speech stalling or misfiring
Incorrect transcription or poor response quality
To help you refine the experience, we’ll demonstrate how to:
Log all user input and model responses
Display the conversation in the console or browser
Optimize the response speed and reduce latency
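Logging and latency measurement can share one small wrapper around the model call. The sketch below assumes the model is reachable through a plain function (`ask_llm`); the logger name and log format are arbitrary choices.

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("assistant")


def timed_reply(ask_llm, user_text, clock=time.perf_counter):
    """Call the model, log the exchange, and return (reply, seconds taken)."""
    start = clock()
    reply = ask_llm(user_text)
    elapsed = clock() - start
    log.info("user=%r reply=%r latency=%.2fs", user_text, reply, elapsed)
    return reply, elapsed


# Demo with a stub model; a real call would show the true generation latency.
reply, seconds = timed_reply(lambda text: text.upper(), "hello assistant")
print(reply)
```

Measuring each stage separately (transcription, generation, speech) in the same way is a straightforward route to finding where most of the delay comes from.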
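By the end of this session, you’ll have a live, interactive AI assistant running on your machine with no API keys required. If you pair Ollama with offline speech components such as pyttsx3, it also makes no cloud calls and depends on no external servers. It listens, thinks, speaks, and adapts, and it is built 100% by you.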
This is the real test of your AI engineering journey: seeing your agent in action, performing like the tools used by companies worldwide.
When you hit “run,” your AI comes alive. And with that, you’ve stepped into the future of intelligent, autonomous systems.
Build a web app for your AI assistant—making it accessible, polished, and ready for real-world use.
In this final lecture of Day 2, you’ll take your locally running AI voice assistant and deploy it into a browser-based interface. This transforms your terminal-based project into a sleek, interactive web application that’s intuitive for users and easy to extend.
You’ll use a lightweight framework like Flask or Streamlit to:
Create a user-friendly web UI with voice input and output
Send microphone input from the browser to your Python backend
Display AI-generated responses in text and play back audio replies
Maintain conversation history visually within the interface
We’ll guide you through connecting the frontend UI with your existing AI assistant logic, creating a clean, end-to-end pipeline that captures voice, processes it through your local LLM via Ollama, and speaks the results in-browser.
You’ll also explore:
Setting up simple buttons and message displays
Embedding real-time logging for debugging and feedback
Hosting the app locally or preparing for production deployment
By the end of this lecture, you’ll have a full-featured, voice-powered AI assistant web app—perfect for personal productivity, demos, or turning into a commercial product. It combines natural conversation, local privacy, and real-world usability in one powerful interface.
You’re no longer just working with code—you’re building user experiences. With this final step, your agent becomes accessible to anyone with a browser.
This is where your AI starts to feel like software, not just scripts. It’s usable, elegant, and designed with people in mind.
Build an intelligent agent that reads websites, extracts useful data, and stores it for analysis—all automatically.
Welcome to Day 3 of the Hands-On AI Agents Course! Today’s project takes a new direction: creating an AI-powered web scraper that can autonomously browse, extract, and structure information from live websites.
In this lecture, you’ll get a clear overview of the objectives for the day and understand how this agent differs from your previous builds. Instead of chatting or responding to speech, this AI interacts with the internet—reading and retrieving content based on dynamic prompts.
Goals for the day include:
Installing the tools required for web scraping and document parsing
Building a scraper that uses AI reasoning (via your local LLM) to decide what data to extract
Automating the process of extracting, cleaning, and summarizing information
Optionally storing data in a vector database for semantic search later
You’ll learn how to combine traditional tools like BeautifulSoup, requests, and Selenium with your local LLM via Ollama to create a hybrid agent—one that can browse, comprehend, and store insights from online content.
This type of agent is ideal for:
Market research and competitor analysis
Tracking news and product updates
Gathering training data for other AI models
Powering knowledge bases and intelligent dashboards
By the end of this lecture, you’ll be clear on the structure, flow, and outcome of today’s project—and excited to build an agent that thinks like a researcher and works like a data miner.
Today, your AI learns to explore the web—and bring back answers automatically.
Set up the tools your AI scraper needs to browse, read, and extract valuable data from any website.
In this lecture, you’ll prepare your system for building an AI-powered web scraper agent by installing all required Python dependencies. Scraping the web with intelligence requires a mix of traditional scraping libraries and modern LLM tools that let your agent understand what it sees.
We’ll walk you through the installation and configuration of key packages:
requests – for sending HTTP requests to retrieve HTML content
BeautifulSoup – for parsing and navigating HTML structures
lxml or html.parser – for fast and robust HTML processing
Selenium (optional) – for scraping dynamic websites rendered with JavaScript
Ollama CLI – to run your local LLM for smart content interpretation
FAISS or ChromaDB (optional) – for storing scraped content in a vector database
You’ll also:
Set up a new virtual environment (recommended)
Verify network access and install browser drivers if using Selenium
Test basic scraping scripts to confirm your setup
By the end of this lecture, your system will be ready to:
Access and parse live webpages
Pass content to your local LLM for understanding and summarization
Prepare extracted data for storage or later analysis
Installing the right tools now ensures smooth performance later when your AI agent is crawling pages, identifying relevant content, and making autonomous decisions.
This setup bridges traditional automation with modern intelligence. You’re not just scraping HTML—you’re empowering your AI to read the internet like a human researcher.
Build an autonomous AI agent that reads websites, extracts key data, and summarizes insights in real time.
In this lecture, you’ll build your very first AI-powered web scraper—an intelligent agent that doesn’t just crawl web pages, but actually understands and processes the content it collects. Unlike traditional scrapers that extract raw data, your agent will combine HTML parsing with LLM reasoning to decide what’s important and summarize it like a human.
You’ll construct a script that:
Takes a URL as input
Uses requests and BeautifulSoup to extract and clean the webpage’s text
Passes the cleaned content to your local language model via Ollama
Generates a summary, answer, or formatted dataset based on user goals
You’ll learn to identify and handle:
Common HTML tags and page structures
Stripping away ads, footers, and irrelevant noise
Rate-limiting and polite scraping techniques
Large content blocks and chunking strategies for LLMs
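Stripping noise from a page might look like the sketch below. It uses Python's built-in html.parser as a dependency-free stand-in for BeautifulSoup, and the set of skipped tags is an assumption about where ads and navigation usually live, not a rule that fits every site.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as noise (an illustrative choice).
SKIPPED_TAGS = {"script", "style", "nav", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collects visible text while ignoring anything inside skipped tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(html_source):
    """Return the page's main text as a single space-separated string."""
    parser = TextExtractor()
    parser.feed(html_source)
    parser.close()
    return " ".join(parser.parts)


sample = (
    "<html><body><nav>Home</nav><h1>Title</h1><p>Useful text.</p>"
    "<script>x=1</script><footer>Ads</footer></body></html>"
)
print(extract_text(sample))
```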
We’ll also explore how to design smart prompts that guide your AI agent—asking it to summarize, highlight named entities, or extract product specs depending on the use case.
By the end of this lecture, you’ll have a fully functional AI web scraper that combines structured extraction with natural language intelligence. This hybrid approach is ideal for:
Market analysis
Academic research
Automated content monitoring
Feed generation for dashboards or RAG pipelines
This project demonstrates what makes modern agents so powerful: they don’t just fetch data—they understand it.
You’re now building tools that reason through the web. This is what AI-powered research truly looks like.
Understand the step-by-step architecture that powers your intelligent web-scraping agent.
In this lecture, you’ll dissect the inner workings of your AI web scraper to understand how its components interact—from URL input to meaningful AI-driven output. This breakdown helps you debug, customize, and extend your agent with confidence.
You’ll walk through the full processing pipeline:
URL input from the user or script
Fetching page content using requests or Selenium
Parsing and cleaning the HTML using BeautifulSoup
Segmenting or chunking the text for large documents
Passing each chunk to a local LLM via Ollama
Receiving and assembling summaries, answers, or insights
Displaying or storing results for later use
We’ll explore how the agent uses prompt engineering to guide the LLM's understanding of different pages—e.g., summarizing blog posts, extracting headlines, or identifying pricing tables.
You’ll also understand:
The role of content filters and how to exclude ads and junk
Why LLMs need structured chunking to avoid token overflow
How contextual memory improves results across multiple pages
Strategies for looping through multiple URLs or categories
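Chunking and reassembly can be sketched as follows. The example splits on words as a rough proxy for tokens (real token counts depend on the model's tokenizer) and takes the summarizer as a function argument, so a stub can replace the LLM. The names and default sizes are hypothetical.

```python
def chunk_text(text, max_words=200, overlap=20):
    """Split text into word-based chunks that overlap slightly at the edges."""
    if max_words <= overlap:
        raise ValueError("max_words must be larger than overlap")
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break                                   # last chunk reached
        start += max_words - overlap                # step forward, keep overlap
    return chunks


def summarize_document(text, summarize, max_words=200, overlap=20):
    """Summarize each chunk, then summarize the combined partial summaries."""
    partial = [summarize(chunk) for chunk in chunk_text(text, max_words, overlap)]
    if len(partial) <= 1:
        return partial[0] if partial else ""
    return summarize("\n".join(partial))


# Demo with a stub summarizer that just keeps the first three words.
document = " ".join(f"word{i}" for i in range(10))
print(chunk_text(document, max_words=4, overlap=1))
print(summarize_document(document, lambda t: " ".join(t.split()[:3]), max_words=4, overlap=1))
```

The overlap means a sentence that straddles a chunk boundary appears in full in at least one chunk, at the cost of some repeated text.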
By the end of this session, you’ll have a deep understanding of how your scraper reads the web, reasons about what it finds, and transforms it into clean, usable output. Whether you’re scraping job listings or compiling competitor intel, you now control every layer of the process.
An AI agent isn’t magic—it’s modular, logical, and fully programmable. And now, you know exactly how yours works.
Test your web scraper in action and extract insights from live websites using your own AI-powered agent.
In this lecture, you’ll take your AI web scraper live—running the complete pipeline from a user-defined URL to summarized, structured output. This is where your code transforms into a real, usable tool that can read the internet and generate actionable insights.
You’ll walk through how to:
Execute your scraper script from the terminal or inside a Python IDE
Input a URL or list of URLs for scraping
See your agent retrieve, clean, and process the webpage content
Watch the local LLM via Ollama generate a human-like summary, answer a question, or extract key data
We’ll demonstrate how to fine-tune the scraping session:
Add filters to include/exclude specific content areas
Handle multi-page scraping loops (e.g., paginated results)
Customize the AI prompt to match the domain of the website (e.g., news vs. e-commerce)
Print results in structured formats (plain text, JSON, Markdown)
You’ll also troubleshoot common issues such as:
Websites blocking requests (and how to handle headers/user agents)
Encoding issues or noisy content
LLM prompt failures or incomplete responses
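Two of the issues above, blocked requests and encoding problems, can be illustrated with a short sketch. It uses the standard library's urllib rather than requests so it runs without extra installs (requests accepts a headers dictionary in a similar way). The header values and helper names are hypothetical.

```python
import urllib.request

# Hypothetical browser-like headers; some sites reject the default Python user agent.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; CourseScraper/0.1)",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_page_request(url, extra_headers=None):
    """Build a GET request that carries explicit headers."""
    headers = dict(DEFAULT_HEADERS)
    if extra_headers:
        headers.update(extra_headers)
    return urllib.request.Request(url, headers=headers)


def decode_body(raw_bytes, declared_charset=None):
    """Decode page bytes, trying the declared charset, then UTF-8.

    As a last resort, undecodable bytes are replaced instead of raising,
    so one bad character does not abort the whole scraping run.
    """
    for charset in (declared_charset, "utf-8"):
        if not charset:
            continue
        try:
            return raw_bytes.decode(charset)
        except (LookupError, UnicodeDecodeError):
            continue
    return raw_bytes.decode("utf-8", errors="replace")
```

Always check a site's terms and robots.txt before scraping. Custom headers are for compatibility, not for bypassing access rules.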
By the end of this lecture, you’ll have a robust AI web scraping agent that can navigate the internet, extract real-time information, and summarize it with intelligent precision—all from your local machine.
This is no longer just a Python script—it’s an autonomous AI research tool, capable of working on your behalf.
You’ve now unlocked a powerful use case for agentic AI: letting your code think, read, and analyze the web at scale.
Save, search, and retrieve AI-processed web content using powerful vector databases like FAISS or ChromaDB.
In this lecture, you’ll learn how to store the output of your AI web scraper agent in a vector database—enabling smart, semantic search over your scraped and summarized content. This is the key to building scalable, memory-enhanced agents that can retrieve relevant information instantly.
You’ll begin by understanding what a vector database is and how it differs from traditional storage:
It stores embeddings, not raw text
It supports semantic similarity search, not keyword matching
It allows your agent to “recall” content with natural language queries
We’ll walk you through:
Installing and configuring ChromaDB or FAISS
Converting scraped text into vector embeddings using Ollama’s embedding models or sentence-transformers
Indexing documents by page, section, or topic
Performing fast similarity queries using user input or LLM prompts
Storing metadata alongside embeddings for traceability
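To make the idea of similarity search concrete, here is a toy in-memory store. It is not ChromaDB or FAISS and does not use their APIs. The class name is hypothetical, and the three-dimensional vectors are made-up values standing in for real embeddings from a model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Keeps (embedding, text, metadata) triples and ranks them by similarity."""

    def __init__(self):
        self._items = []

    def add(self, embedding, text, metadata=None):
        self._items.append((list(embedding), text, metadata or {}))

    def search(self, query_embedding, top_k=3):
        scored = [
            (cosine_similarity(query_embedding, emb), text, meta)
            for emb, text, meta in self._items
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


store = TinyVectorStore()
# Toy embeddings: in practice these would come from an embedding model.
store.add([1.0, 0.0, 0.0], "Pricing page summary", {"url": "https://example.com/pricing"})
store.add([0.0, 1.0, 0.0], "Careers page summary", {"url": "https://example.com/careers"})

best_score, best_text, best_meta = store.search([0.9, 0.1, 0.0], top_k=1)[0]
print(best_text, best_meta["url"])
```

Real vector databases add persistence and approximate indexes so that searches stay fast across many thousands of entries, but the ranking idea is the same.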
You’ll also see how this integrates with future agents that can:
Answer questions across hundreds of scraped pages
Revisit knowledge from past sessions
Power RAG pipelines (retrieval-augmented generation)
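In a RAG pipeline, the retrieved chunks are placed into the prompt ahead of the user's question. The sketch below shows one plausible way to assemble such a prompt. The wording of the instructions and the function name are assumptions, and the retrieval step itself is taken as given.

```python
def build_rag_prompt(question, retrieved_chunks, max_chunks=3):
    """Combine the top retrieved chunks and the question into one prompt."""
    selected = retrieved_chunks[:max_chunks]
    context = "\n\n".join(
        f"[{index}] {chunk}" for index, chunk in enumerate(selected, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is not enough, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_rag_prompt(
    "What does the basic plan cost?",
    ["Basic plan: $5 per month.", "Pro plan: $20 per month."],
)
print(prompt)
```

Numbering the chunks makes it possible to ask the model to cite which passage supported its answer.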
By the end of this lecture, your agent will not just extract information—it will retain and organize it intelligently, forming the foundation of a searchable, personalized knowledge base.
Storing embeddings makes your agent remember, reason, and respond more like a human researcher—and that’s a game-changer.
You’ve just unlocked the ability to build agents that not only browse and read the web but understand and remember it.
Build an AI agent that reads entire documents, understands the content, and answers questions with precision.
Welcome to Day 4 of the Hands-On AI Agents Course! Today’s project focuses on building a specialized AI document reader—an agent that can ingest PDFs, Word files, or text documents, understand their content using a local LLM, and answer user questions interactively.
In this lecture, you’ll explore the key goals and outcomes for the day:
Install the tools required to process documents (PDF, DOCX, plain text, and Markdown)
Extract and clean large blocks of text for LLM-friendly input
Build a retrieval-based question-answering system
Add summarization and interactive file analysis capabilities
Enable users to download AI-generated summaries or reports
You’ll combine file parsing, text chunking, LLM reasoning, and, optionally, vector database indexing to create an AI that mimics how a human would read, comprehend, and extract meaning from dense documents.
This agent is ideal for:
Automating policy or legal document analysis
Summarizing academic research or technical manuals
Answering user questions about uploaded files in real time
Generating summaries, highlights, and structured reports
By the end of today, you’ll have built an agent that transforms documents into dynamic, queryable knowledge—bridging the gap between static files and interactive AI-powered insight.
From this point on, your agents don’t just read web pages—they can parse and respond to offline documents, transforming how users work with unstructured data.
Today, your AI becomes a reader, a summarizer, and a personal researcher—all in one.
Install the libraries needed to build an AI agent that reads, understands, and interacts with documents.
In this lecture, you’ll set up all the necessary tools to build your AI-powered document reader and Q&A bot. This agent will parse, summarize, and answer questions about files such as PDFs, DOCX, and plain text—powered by a local LLM via Ollama.
You’ll walk through the installation of key Python libraries for:
PDF processing: PyMuPDF, pdfplumber, or PyPDF2
DOCX handling: python-docx for reading Word documents
Text extraction and chunking: for breaking down large documents into manageable pieces
Embedding generation: using sentence-transformers or Ollama
Vector database integration (optional): with ChromaDB, FAISS, or Weaviate
Question-answering pipelines: with basic logic for sending queries to your LLM
We’ll also:
Set up a clean virtual environment for your document reader project
Discuss optional enhancements like OCR for scanned documents
Show how to test your environment with a sample PDF or DOCX file
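A quick way to check the environment before writing any agent code is to ask Python which of the libraries can be imported. The sketch below maps each package to the module name it is usually imported under (for example, PyMuPDF is imported as fitz and python-docx as docx). The helper name is hypothetical, and you should adjust the list to the libraries you actually chose.

```python
import importlib.util

# Package name (as installed) -> module name (as imported).
OPTIONAL_MODULES = {
    "PyMuPDF": "fitz",
    "pdfplumber": "pdfplumber",
    "PyPDF2": "PyPDF2",
    "python-docx": "docx",
    "sentence-transformers": "sentence_transformers",
    "ChromaDB": "chromadb",
    "FAISS": "faiss",
}


def check_modules(modules):
    """Return {package: True/False} depending on whether the module is importable."""
    results = {}
    for package, module in modules.items():
        try:
            results[package] = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            results[package] = False
    return results


for package, available in check_modules(OPTIONAL_MODULES).items():
    print(f"{package:<24}{'ok' if available else 'missing'}")
```

Only the libraries for the formats you plan to support need to be present; a "missing" line is a prompt to install that package in the virtual environment.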
By the end of this lecture, you’ll have a fully configured Python environment ready to:
Load and extract content from various file formats
Prepare that content for semantic search or summarization
Use an AI agent to answer natural-language questions about any document
This setup is the backbone of any intelligent document agent—from personal productivity tools to enterprise compliance bots.
You’re not just preparing to build an AI reader—you’re creating the foundation for agents that unlock hidden knowledge in files.
Create an AI agent that reads entire documents and extracts meaning—turning static files into dynamic knowledge.
In this lecture, you’ll build a fully functional AI document reader—an intelligent agent that loads files like PDFs or Word documents, processes their content, and understands them using a local large language model (LLM) via Ollama.
You’ll begin by writing a script that:
Accepts document uploads (PDF, DOCX, or TXT)
Uses libraries like PyMuPDF or python-docx to extract text
Cleans and chunks the content for LLM processing
Feeds the cleaned content to the LLM to generate a summary, outline, or key highlights
You’ll design the agent to function like a human reader: scan through paragraphs, reason about context, and extract relevant details. You’ll also learn how to:
Create chunking logic to respect token limits
Maintain document structure in your summaries
Handle large documents in smaller, iterative passes
Combine AI outputs into cohesive summaries or visual outputs
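Handling a large document in smaller passes can be sketched as a "summarize, then summarize the summaries" loop. In the sketch below, summarize is any function that takes text and returns shorter text. In the real agent it would call the local LLM, but here a trivial stand-in is used so the control flow can be run on its own. All names are hypothetical.

```python
def summarize_in_passes(chunks, summarize, batch_size=4):
    """Summarize each chunk, then repeatedly merge summaries in batches.

    The loop ends when a single summary remains, so no single call ever
    receives more than batch_size summaries at once.
    """
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    if not chunks:
        return ""

    # First pass: one summary per chunk.
    summaries = [summarize(chunk) for chunk in chunks]

    # Later passes: merge neighbouring summaries until one is left.
    while len(summaries) > 1:
        batches = [
            summaries[i:i + batch_size]
            for i in range(0, len(summaries), batch_size)
        ]
        summaries = [summarize("\n".join(batch)) for batch in batches]
    return summaries[0]


def first_sentence(text):
    """Stand-in summarizer: keeps only the first sentence of the input."""
    return text.split(".")[0].strip() + "."


pages = [
    "Revenue grew in Q3. Details follow.",
    "Costs were flat. See appendix.",
    "Hiring slowed. More next quarter.",
]
print(summarize_in_passes(pages, first_sentence, batch_size=2))
```

Merging neighbouring summaries in order helps keep the document's structure visible in the final result.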
This lecture gives your AI the ability to process offline knowledge sources—opening up use cases like:
Corporate document review
Policy or legal file summarization
Academic literature digestion
Proposal and grant understanding
By the end of this session, your agent will read documents, think critically using a local LLM, and deliver structured, human-friendly output—all without cloud dependencies.
This isn’t just about reading—it’s about interpreting, summarizing, and translating complex documents into insight.
You’ve now built an AI agent that turns information into understanding—one file at a time.
Understand the internal architecture of your AI agent—from document parsing to intelligent LLM-based reasoning.
In this lecture, you’ll explore the full inner workings of your AI document reader, gaining a detailed understanding of how each component contributes to intelligent document analysis. This architectural overview will help you extend, debug, and optimize your agent for real-world use.
Here’s the typical pipeline your agent follows:
File ingestion – The user uploads a document (PDF, DOCX, or TXT).
Text extraction – The agent uses tools like PyMuPDF or python-docx to read the raw content.
Preprocessing – The text is cleaned, chunked, and tokenized into manageable units.
LLM interaction – Chunks are passed to a local LLM via Ollama, using smart prompts to request summaries, highlights, or answers.
Output aggregation – Results are combined and formatted into structured text or reports.
You’ll learn how these stages work together to create a smooth and intelligent experience for the user. We’ll also cover:
Why chunking is crucial for long documents
How to prompt the LLM differently depending on the document type (e.g., legal, academic, business)
How to maintain contextual memory if users ask follow-up questions
Optional features like auto-tagging, section-wise summaries, and metadata generation
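Prompting differently per document type can be as simple as a lookup table of templates. The sketch below is illustrative only: the template wording, the document-type keys, and the function name are assumptions rather than the course's actual prompts.

```python
# Hypothetical prompt templates keyed by document type.
PROMPT_TEMPLATES = {
    "legal": (
        "List the parties, obligations, and deadlines in the following "
        "legal text. Quote clause numbers where present.\n\n{text}"
    ),
    "academic": (
        "Summarize the research question, method, and main findings of "
        "the following passage.\n\n{text}"
    ),
    "business": (
        "Summarize the key decisions, figures, and action items in the "
        "following business text.\n\n{text}"
    ),
}

DEFAULT_TEMPLATE = "Summarize the following text in five bullet points.\n\n{text}"


def build_prompt(chunk, doc_type=None):
    """Pick a template for the document type and fill in the chunk."""
    template = PROMPT_TEMPLATES.get((doc_type or "").lower(), DEFAULT_TEMPLATE)
    return template.format(text=chunk)


print(build_prompt("The tenant shall pay rent monthly.", doc_type="Legal"))
```

Unknown or missing document types fall back to a general summary prompt, so the agent still produces output for files it cannot classify.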
By the end of this lecture, you won’t just have a working agent—you’ll fully understand the design principles behind it, from start to finish.
This lecture bridges the gap between building and engineering. You’re now able to explain, improve, and scale your agent like a true AI developer.
This is where tools become systems—and where your AI agent becomes a reusable document analysis engine.
Execute your document reader in real time: process files, generate summaries, and interact with your AI.
In this hands-on lecture, you’ll learn how to run your AI-powered document reader from end to end. With your local LLM via Ollama integrated, the agent is now ready to read, summarize, and respond to questions about uploaded documents—completely offline, with full control over privacy and performance.
We’ll walk you through:
Running the document reader script from the command line or Python IDE
Uploading and selecting files to be processed
Observing real-time output from your local AI model
Reviewing document summaries, bullet points, or extracted insights in structured formats
You’ll also learn how to:
Dynamically change prompts for different types of documents
Select specific sections (e.g., executive summary, appendix) to process separately
Troubleshoot common issues like unreadable files or LLM token errors
Improve LLM responses by adjusting chunk size, overlap, or prompt instructions
This is where you bring your AI agent to life—watching it read multi-page PDFs, intelligently condense content, and output digestible insights in seconds. You’ll test it on real-world files such as reports, whitepapers, or manuals, and see how it performs across formats.
By the end of this lecture, you’ll be confident using your agent to assist with:
Business document analysis
Legal or policy review
Research paper summarization
Personal knowledge management
You’ve now reached the point where your AI can autonomously read and reason over full documents—making it a true assistant, not just a script.
This is where productivity meets intelligence—and your agent starts saving hours of manual reading.
Let users download clean, structured AI-generated summaries and insights as professional reports.
In this final lecture of Day 4, you’ll add one of the most useful features to your AI-powered document reader: the ability to generate downloadable reports from the AI’s output. This turns your agent from an on-screen assistant into a tool that delivers tangible, shareable value.
You’ll learn how to:
Capture AI-generated summaries, key points, and answers into structured text
Format the content using Markdown, HTML, or PDF-ready templates
Save the report locally as a .txt, .md, or .pdf file
Automatically name and organize reports based on document titles or timestamps
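The report steps above can be sketched with the standard library: render the AI output as Markdown, derive a file name from the document title and a timestamp, and write the file. The function names, the report layout, and the naming scheme are assumptions chosen for illustration. PDF export would need an additional library and is not shown.

```python
import re
from datetime import datetime
from pathlib import Path


def slugify(title):
    """Turn a document title into a safe, lowercase file-name stem."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "report"


def report_filename(title, when, extension="md"):
    """Build a name such as my-doc_20240102-030405.md."""
    return f"{slugify(title)}_{when:%Y%m%d-%H%M%S}.{extension}"


def render_markdown_report(title, summary, key_points, when):
    """Render the summary and key points as a small Markdown document."""
    lines = [
        f"# {title}",
        "",
        f"Generated: {when:%Y-%m-%d %H:%M}",
        "",
        "## Summary",
        "",
        summary,
        "",
        "## Key points",
        "",
    ]
    lines += [f"- {point}" for point in key_points]
    return "\n".join(lines) + "\n"


def save_report(directory, title, summary, key_points, when=None):
    """Write the report into the directory and return the path that was written."""
    when = when or datetime.now()
    path = Path(directory) / report_filename(title, when)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        render_markdown_report(title, summary, key_points, when),
        encoding="utf-8",
    )
    return path


# Example: save_report("reports", "Q3 Report", "Revenue grew.", ["Costs flat"])
```

Including the timestamp in the file name keeps earlier reports from being overwritten when the same document is processed again.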
This feature unlocks use cases where users need to:
Archive meeting summaries or policy documents
Share insights from long technical manuals or reports
Save responses to regulatory or legal documents
Build personalized reading logs or study notes
We’ll also explore enhancements like:
Including metadata (document source, date processed)
Allowing users to choose sections for download (e.g., full summary, bullet points only)
Generating visual outputs like tables or styled headings
By the end of this lecture, your AI document agent will not only read and respond—it will package insights into downloadable formats that users can keep, review, or share.
This is a crucial step toward productizing your AI agent. It bridges the gap between intelligent interaction and persistent, exportable results.
You’ve now built an agent that reads, reasons, and delivers professional-grade summaries—all in one seamless pipeline.
Artificial intelligence is transforming the way we work, automate tasks, and interact with technology. This course is designed to help learners build AI-powered agents, automation bots, chat assistants, and task management systems using open-source tools without relying on external APIs or cloud-based services. Whether you are a beginner exploring artificial intelligence or a developer looking to integrate AI into real-world applications, this course provides a hands-on approach to building AI-driven automation solutions.
Throughout this course, learners will gain practical experience in developing intelligent assistants that can process text, respond to user queries, automate repetitive tasks, and manage workflows efficiently. The focus will be on implementing AI-powered chatbots, smart task managers, document readers, web scrapers, and personal productivity assistants. By leveraging local AI models, vector databases, and natural language processing techniques, students will learn how to create AI solutions that function entirely on their machines without any reliance on cloud APIs.
The course starts with an introduction to AI agents, covering the fundamental concepts of natural language processing, automation workflows, and task execution. Learners will build chatbots capable of carrying on meaningful conversations while maintaining memory of past interactions. By integrating AI models with local vector databases such as FAISS, students will store and retrieve information efficiently, allowing their AI agents to answer complex queries based on stored knowledge. As the course progresses, students will develop AI-powered task automation bots capable of scheduling, organizing, and prioritizing tasks using machine intelligence.
One of the key aspects of this course is building AI-driven document readers that extract, summarize, and provide answers from PDF files. Learners will implement an AI system that processes and retrieves relevant information, enabling intelligent document search and Q&A functionalities. Additionally, students will create an AI-powered web scraper that extracts text from websites, summarizes content, and stores valuable insights in a searchable vector database for later use. These AI automation techniques can be applied in various domains, including research, business intelligence, and content generation.
As learners progress through the course, they will work on projects that integrate AI into daily productivity tools. They will develop personal AI assistants that help with scheduling, reminders, and workflow management. The course also covers AI-powered task prioritization, where students will train models to analyze deadlines and assign importance to different activities. By the end of the course, students will have a strong understanding of how to build AI agents capable of automating complex tasks, enhancing productivity, and managing data-driven workflows.
This course is designed for software developers, data analysts, AI enthusiasts, and anyone interested in building AI automation solutions. No prior experience in artificial intelligence is required, as all concepts are introduced progressively with step-by-step implementations. Learners will gain hands-on experience with AI tools, machine learning models, and automation frameworks, making this course ideal for those who want to integrate AI into real-world applications. All projects are built using open-source software and executed locally, ensuring privacy, security, and full control over AI-driven automation systems.
By the end of this course, students will have the knowledge and practical skills to create AI-powered chatbots, automation bots, document readers, web scrapers, and intelligent personal assistants. They will be equipped to develop AI solutions that streamline workflows, enhance productivity, and automate repetitive tasks efficiently. This course provides a solid foundation in AI-driven automation and equips learners with the ability to design, build, and deploy AI agents for various use cases.