
explore python based ai frameworks and tools to build great ai applications, leveraging APIs like OpenAI, Mistral, anthropic XAI, and multi-agent ai frameworks with UI design using Streamlit and Gradio.
Learn how to add single line and multi-line comments in Python using the hash symbol and triple quotes, explaining code and preventing execution when testing.
Learn to build real-time voice and video apps in Python using the RTC library, fast RTC, WebRTC or WebSocket, and Gradio-powered UIs for low latency multi-modal conversational experiences.
Discover how to build an MVP server in a few lines of Python using Gradio, turning any function into an MCP server for MCP client apps like kesa or windsurf.
Get started with the Gemini file search API in Python by uploading documents, converting them to embeddings, and querying with Gemini 2.5 Pro or 2.5 flash to retrieve precise answers.
Create a real-time voice and video AI agent in Python with the Vision Agent framework, configuring uv and env credentials, and using Jetstream with Gemini Live or OpenAI backends.
Build a voice agent that can see your surroundings by streaming real-time audio and video through vision agents, OpenAI and Jetstream, with Python project setup and an interactive web UI.
Explore how Cartesia Sonic 3 enables low-latency text-to-speech in Python voice AI projects, integrated via vision agents, with customizable voice models and multilingual support.
Create a realistic voice ai app in Python by assembling a custom voice pipeline with speech-to-text, tone detection, and text-to-speech using Vision Agents, Deepgram, smart ten, and Fish Audio.
Build a voice-controlled MCP AI agent in python using vision agents and OpenAI real-time API to perform real-time function calling and interact with GitHub repositories, including issues and pull requests.
Build and deploy a real-time yoga AI instructor in Python by integrating vision agents, Gemini Live API, and Ultralytics YOLO for pose detection.
Learn to integrate ElevenLabs scribe v2 real-time speech-to-text with Vision Agents for real-time transcription and note taking in AI meetings.
Design and implement a custom modular AI pipeline for vision and voice agents in Python, integrating Moon Dream for real-time object detection with Open Router, 11 Labs, and Deepgram.
Build a python-based voice and vision agent that uses Kimiko thinking via Open Router to detect objects in real time and draw green bounding boxes.
Build a voice and vision app with Gemini 3 as the LLM in Vision Agents, enabling camera-feed descriptions and answers to questions. Configure thinking level and media resolution for tasks.
Create a maths and physics voice AI tutor in python using deep seek v3.2 models on open router, via vision agents with speech-to-text and text-to-speech, swapping llms with a plugin.
Build an electronic setup and repair voice assistant in python using vision agents and base ten, and follow a camera demo that guides battery, memory card, lens, and power-on testing.
Build a drive-thru AI ordering system in Python by integrating voice and vision AI with Gemini live speech to speech and Vision Agents for real-time, low-latency orders.
Learn to build a vision agent with KimiK 2.5 in Python using Vision Agents, integrating speech-to-text, text-to-speech, and OpenAI chat completions for real-time video and voice AI.
Build a real-time speech-to-text app using Voxtra Transcribe 2 with Vision Agents, Mixtra models, and DeepGram text-to-speech to enable low-latency voice and video transcription.
Learn to build a real-time voice agent in Python using Amazon NovaSonic speech-to-speech, set up Vision Agents, GetStream, and AWS credentials to run a story-telling demo.
Build production-ready AI agents in Python with the OpenAI Agents SDK, using tools and orchestration. Set up the environment and create your first agent with the agent and runner APIs.
Learn to build and run a local Devcheck R1 agent with the OpenAI agents SDK, using ollama and a streamlit interface on localhost, with performance traces via the OpenAI dashboard.
Build UIs for OpenAI agents in Python using the OpenAI agents SDK to orchestrate multi-agent workflows with LLMs, tools, and guardrails. Explore Generate, Gradio, and Streamlit for interactive interfaces.
Learn to build a voice AI agent in python using the OpenAI agents SDK, converting a defined workflow into a voice app with speech-to-text and text-to-speech.
Build a file system MCP agent in Python using GPT 4.1 and the OpenAI agents SDK to chat with files in a directory through a Streamlit interface.
Launch your first agent with Google's agent development kit for Python by setting up a virtual environment, installing the sdk, and building an agent with weather and time tools.
Explore building an AI chat UI using the Cairo AI IDE, a VSCode-like environment, to code with the Cmyk two model from Moonshot and a Streamlit interface for Python apps.
Build a local AI chat UI in Python with Streamlit and llama to run the Dopesick R1 model offline via Ollama, featuring a simple input and typewriting streaming output.
Learn to use the Gemini file search API in Python, which provides built in retrieval augmented generation to upload documents, create embeddings, and query with a vector store.
Learn to build voice AI apps with OpenAI's text-to-speech model using the response API, test multiple voices, and create a Streamlit UI to switch models and run Python examples.
Build a real-time weather-aware voice assistant in Python using the OpenAI agents SDK, wiring a voice pipeline—speech-to-text, LLM, text-to-speech—with a weather tool and Finnish language support.
Explore building an AI agent using GPT-4.5 via the OpenAI API, with the Arduino Python framework, DuckDuckGo tooling, and a setup that fetches latest web information.
Learn to generate images from scratch and create variations using OpenAI's DALL-E 3 and DALL-E 2 in a Python app, covering prompts, image sizes, quality, and styles.
Learn how to turn text into lifelike spoken audio using OpenAI's text to speech API in Python, including multilingual prompts and real-time streaming options.
Build AI agents with OpenAI swarm, an experimental framework that uses LLMs powered assistants and tools to perform tasks. Install, configure two agents A and B, and run.
Explore structuring llm outputs with the OpenAI API using JSON schemas, including function calls and response formats, and define a Pydantic calendar event to constrain fields.
Explore using structured outputs and a JSON schema to moderate text with OpenAI's API, defining a Pydantic-based category taxonomy (violence, sexual, self-harm) and testing phrases.
Explore how the Anthropic API web search tool delivers real-time web content via cloud models such as cloud 3.7 and 3.5 sonnet, and how to call it with Python.
Get started with the Xai API by creating an account, generating an API key, and making your first curl call with grok beta model. Export the key and monitor usage.
Access image generation on X with grok's multimodal interface, paste and modify prompts, and compare grok outputs with Flux Run dev on Hugging Face, ideogram 2, and Dall E 3.
Learn everything about live search using Grok 4
Learn to run the Devcic AR1 reasoning model offline with Olama, selecting from distilled versions, and test a physics prompt that derives six meters per second.
Explore Kimi K2 via its website, playground, and application programming interface to test a mixture-of-experts model with 32 billion activated parameters and 1 trillion parameters, excelling in math and coding.
Build a high-speed python ai agent with grok and agno, selecting the r1 distill llama 70b model, installing dependencies, and running a local playground interface.
Gemini CLI brings a free, open-source coding agent to your terminal, letting you install it globally, log in with Google, and query repo updates or generate SwiftUI code with PencilKit.
Learn to set up Gemini 2.5 Pro via the Google API, load your API key with Python, and build apps using a SwiftUI animation and a Streamlit UI.
Get started with the Google Gemini API and make your first API call using Gemini 2.0 flash by installing the SDK, acquiring an API key, and running a Python script.
Learn to generate and edit photorealistic images with Gemini 2.5 image preview via the API and Python, including blending photos and Photoshop-like edits.
Learn to run the Devcic R1 thinking model offline with LM Studio, download models from HuggingFace, and watch it solve a physics problem with detailed intermediate reasoning steps.
Monitor a vision agent's operation with Prometheus and OpenTelemetry, capturing LLM, speech-to-text, and text-to-speech metrics. Set up a Prometheus server and visualize metrics locally on port 9464.
Add agent scales in Antigravity using the scales folder with scale.md and optional resources. Create and access workspace and global scales, and run commands to manage their locations.
Learn to build your first AI agent in Python with the Pydata framework in three lines of code, using a system prompt and print response to run and test.
Learn how to capture an AI agent's response in a variable using run response and streaming, so you can pass it to the front end or another agent.
Build AI agents in Python using Fei data. Create web and finance agents and assemble multi-agent systems with open and closed LLMs and custom tools.
Create a retriever AI agent and agent rack in Python with Filedata, wiring Lance DB, Tantivy, PDF library, and SQLAlchemy to pull and analyze data from PDFs and other sources.
Build an ai agent for vector search and chat with your pdf documents using Pydata, OpenAI API, and langs db to create a searchable knowledge base.
Build a Python computer-use agent with the browser-use library to automate browser actions. Guide the agent to search flights and find the cheapest option using GPT-4 and an OpenAI key.
Install and start your first multi-agent ai project with Gruyere, select providers, set up api keys, and run the research and data analyst agents.
Demonstrates building an AI agent with Browser Use to automate web tasks in Python, including setting up a virtual environment and finding the cheapest flight from Helsinki to San Francisco.
Learn to edit and run Python code using ChatGPT canvas in the web version of ChatGPT. Discover how to access canvas via view tools, collaborate on writing, and add comments.
Generate 86 iOS buttons from an image using Casa AI code editor for SwiftUI projects, then apply button styles and shapes in Swift and share the boilerplate on GitHub.
learn ai assisted coding in swift with alex for an integrated xcode workflow. generate swift code from images, chat with your codebase, and add files seamlessly.
Learn to create and animate progress rings in SwiftUI using the AI code editor and Claude 3.5 sonnet, crafting prompts to generate and refine SwiftUI views.
build a fully functional ios 18 calculator using casa cloud 3.5, sonnet, and swiftui, with division, multiplication, subtraction, addition, and equals, generated from a screenshot and refined in code.
moshi is the first real-time multi-stream speech language model that you can interrupt during conversation, install locally with pip, download from huggingface, run server, and test via web ui.
Learn to run gguf models locally with a Gradio interface by downloading a model from Lama Comm or Hugging Face and using the llama cpp app to stream responses.
Explore image generation and video segmentation with Sam two, demonstrating object tracking, background removal, and demos from Hugging Face Spaces and API workflows.
Run large language models locally with LLM Studio. Configure parameters like temperature and max tokens, and use a local inference server with Chainlink to test prompts offline.
Create your first fast HTML app in Python with VS Code, building a simple form and a hello world page using Python objects that render HTML.
Learn to add a standard HTML video player to a first HTML app with pure Python, and style the page with inline CSS and a CSS style component.
Learn to load svg graphics into a first html app from the svg repo using SF symbols, material symbols, and converted svg code with fast html.
AI is rapidly transforming how we approach everything, from learning to code to building apps to solving complex problems. Over the past three years, I have written AI-related articles, such as "The 6 Best LLM Tools To Run Models Locally," on Medium, and created several AI content on YouTube. Many people, including students and developers, have asked me how to start building AI apps, specifically in Python.
In this course, I will guide you in learning fundamental AI concepts (Retrieval Augmented Generation, Fine-tuning, Embeddings, AI-native Vector Databases) to build AI apps, agents, and chat interfaces.
Join this course, and let's start building AI for video, vision, voice/speech, and more. You will discover and start creating AI agents using the best and easy-to-use Python frameworks, such as OpenAI Realtime, OpenAI Agents SDK, and Vision Agents. You will also learn to utilize APIs from OpenAI, Anthropic, Mistral, Meta AI, Kimi AI, Qwen, DeepSeek, and xAI for agentic app creation, as well as for image, video, audio/voice, and text generation. By following all the tutorials in this course, you will understand the various concepts in AI and how to implement them in actual AI-related projects. In addition, you will be familiar with many Python-based libraries and web frameworks for creating AI apps.