
The video explains how to navigate Azure AI Foundry and create an Azure OpenAI resource
In this lecture, you will learn how to create an AI Foundry workspace and deploy an OpenAI model within it. We will walk through the deployment process, including selecting the appropriate model, such as the GPT-4o or the more cost-effective 4o mini model. You will gain an understanding of the different deployment types available in Azure OpenAI and their implications on routing and cost. By the end of the session, you will be able to deploy a model, manage its rate limits, and interact with it using chat. This knowledge will enable you to effectively utilize large language models in your projects.
In this lecture, you will learn how to navigate the AI Foundry user interface with the deployed GPT-4o model. You will review the system prompt and it's significance in interacting with AI Models. Additionally, you will explore various parameters such as temperature and max response, which help in fine-tuning the model. Finally, you will understand how to clear the chat and test the deployed model, focusing on the practical aspects of using AI Foundry for chat-based interactions.
In this lecture, you will learn about the concept of chat history in AI models and how prompts and completions function within these models. We will explore how AI understands context from previous interactions and the importance of maintaining chat history for accurate responses. You will also understand the stateless nature of AI models and the necessity of sending past prompts and completions with each new query. Finally, we will discuss the significance of managing the length of chat history to avoid exceeding character limits and controlling costs.
In this lecture, you will learn about model instructions, also known as system prompts, and their role in shaping the behavior and persona of AI models. You'll explore how these instructions override individual prompts, enabling the AI to adopt specific roles or answer in customized ways, such as speaking like a pirate or becoming an Xbox salesperson. Through practical examples, you'll understand how system prompts guide the AI's responses and enforce restrictions, such as limiting answers to a specific domain like Xbox products. Additionally, you'll grasp the importance of both chat history and model instructions being sent with every user prompt, ensuring consistent and tailored interactions.
Common Questions and Answers related to this Section. Post your questions and it might appear in this video
This module introduces Semantic Kernel, a lightweight open-source software development kit designed to build AI agents and integrate AI models into .NET, Python, and Java application. You will learn how the kernel abstracts the complexities of managing and communicating with various AI models such as OpenAI's GPT, Meta's Lama, and Microsoft's PHI models, among others. Additionally, you will explore how semantic kernel orchestrates interactions between code, applications, and AI models, enabling the creation of skills or plugins that AI models can call upon. The course will cover integrating external services like vector databases and multimodal modules, ultimately guiding you through creating sophisticated AI agents and applications.
What to do if you don't want to use Azure AI Models
In this lecture, you will learn how to create your first AI application using Semantic Kernel. You will start by setting up your development environment, including adding necessary NuGet packages for the Microsoft Semantic Kernel and Azure OpenAI connectors. You will be guided through creating the kernel object and integrating it with an Azure OpenAI model by providing deployment details such as the endpoint and API key. Finally, you will implement a control loop to prompt the user for input, send it to the AI model, and display the response, demonstrating the ease of building AI applications with Semantic Kernel.
In this lecture, you will learn how to enhance your application by adding prompt execution settings for OpenAI models. We will explore the OpenAI prompt execution settings class, which allows you to set various parameters for your chats, including the system prompt that defines the AI's persona and the temperature setting that affects the creativity of the responses. Additionally, we will discuss the importance of controlling costs by setting the maximum number of tokens. You will understand how these settings are specific to OpenAI and how changing models will require adjustments to the prompt execution settings. Finally, we will demonstrate how to display token usage on the screen and highlight the differences in responses based on token limits.
In this lecture, you will learn about the fundamentals of Chat History, a crucial component in enhancing user interactions with conversational AI systems. We will discuss the benefits of maintaining chat history, including how it enables more dynamic and personalized conversations. You will also understand the practical steps to implement chat history effectively, such as creating and managing chat history objects and incorporating user and assistant messages. Additionally, we will explore how to leverage chat history to build more intuitive and engaging AI interactions. By the end of this lecture, you will be equipped to implement chat history in your AI applications, ensuring a seamless and contextual conversation experience.
In this lecture, you will learn about the significance of managing AI chat history to maintain context-aware and efficient conversations. We'll discuss how an ever-growing chat history can exceed the model's context window, affecting response quality and processing speed. You'll understand the need for reducing chat history for performance optimization, context window management, memory efficiency, and privacy and security. We will explore various reduction strategies including truncation, summarization, token-based reduction, and custom strategies using the IChatHistoryReducer Interface. Finally, we'll demonstrate these strategies in code to help you effectively manage chat history in your AI applications.
In this lecture, you will learn how to implement response streaming to enhance your application’s interaction with AI systems. Instead of waiting for the entire AI-generated response, you will modify the application to receive and display part of the response in real time, creating a more dynamic and responsive user experience. The lecture will guide them through the code changes required, including updating the method to get streaming chat messages and handling the text chunks with a for-each loop. This approach significantly improves the overall user experience by providing immediate feedback as the AI generates the response.
Common Questions and Answers related to this Section. Post your questions and it might appear in this video
In the lecture, you will learn how to expand your AI application by leveraging the Semantic Kernel SDK to connect to various models and hosting environments. You will modify your application to use OpenAI models directly and explore inferencing APIs, such as Azure inferencing models, free GitHub developer models, and open-source options like Hugging Face
In this lecture, you will learn how to transition an application from using Azure OpenAI to directly connecting to OpenAI by setting up an API key and configuring the application. You will create a new project in OpenAI's API platform, generate a secret API key, and store it securely in the app settings file for streamlined access. Additionally, you'll explore how to select and integrate OpenAI models, such as the cost-effective "4.0 mini" model, while understanding rate limits for testing purposes. Finally, you will modify the application code to dynamically handle model changes using the Semantic Kernel framework, ensuring minimal impact on the existing codebase.
In this lecture, you will learn how to utilize Azure AI for hosting various AI inferencing models, including those from Hugging Face, Microsoft, Meta, and more. The instructor demonstrates how to set up a project in Azure AI Foundry, deploy models such as DeepSeek, and configure necessary parameters like model ID, API key, and endpoint URL. You will also explore updating your code to integrate Azure AI inferencing by installing the required Nougat package and modifying execution settings. Finally, you will see how leveraging Azure's GPU infrastructure enhances performance, making model inferencing significantly faster than running locally.
In this lecture, you will explore the free AI models available on GitHub Marketplace for development and testing purposes, including models such as OpenAI's GPT-4x, Grok and many others. You will learn how to access these models, understand rate limits based on your GitHub account type, and integrate them into your code by setting API keys and endpoints. The lecture demonstrates how to deploy and use these models effectively, highlighting the differences in request limits and token capacities between tiers like Free and Pro. Finally, you'll gain hands-on experience in switching between models and applying them to various tasks, showcasing their capabilities in a practical development environment.
In this lecture, you will gain a clear understanding of how to leverage Hugging Face’s powerful AI tools and model hub for your own projects. You’ll learn to search for, select, and evaluate models, as well as understand how to integrate and utilize them within your workflows. By the end, you will have the practical skills to confidently use Hugging Face’s resources to accelerate your AI development and experimentation.
In this lecture, you will learn how to run AI models locally using the ONNX (Open Neural Network Exchange) framework rather than relying exclusively on cloud-based solutions. You will be guided through installing the necessary NuGet packages, specifically focusing on integrating the ONNX runtime into your development environment. The lecture covers the steps required to configure and utilize the correct ONNX runtime agent for your model, ensuring compatibility and efficiency in local inference. By the end, you will understand how to set up, install, and execute ONNX models on your own machine, laying the groundwork for efficient offline AI deployments.
In this lecture, you will learn how to simplify the process of building a chat application by replacing complex kernel setups with Semantic Kernel standalone instances. You will explore how to eliminate the need for kernel builders, dependency injection, and plumbing work, streamlining the creation of a basic chat application. Additionally, you will see how to directly instantiate the chat completion service using parameters like model ID and model path, enabling faster and cleaner execution. By wrapping the ONNX Provider in a "using" statement, you will ensure automatic disposal of resources, resulting in efficient and error-free application termination.
In this lecture, you will learn how to set up and host AI models locally using Ollama, a powerful tool for managing multiple models. You will explore the process of installing the necessary NuGet packages, configuring the API client, and updating settings to specify endpoints and default models. Additionally, the lecture demonstrates how to utilize the Ollama command line interface to download, manage, and run models locally, with specific examples like Google's Gemma 3 model. Finally, you will understand how Ollama leverages GPUs for enhanced performance and provides a streamlined way to switch between locally hosted models in your applications.
In this lecture, you will learn how to use LM Studio as a platform for hosting local AI models and integrating them into applications. The tutorial covers downloading and installing LM Studio, selecting and managing models based on parameters like size and accuracy, and configuring the software to serve models locally. Finally, you will learn how to integrate LM Studio hosted models into your applications using Semantic Kernel.
Common Questions and Answers related to this Section. Post your questions and it might appear in this video
In this lecture, you will learn about multimodal models, which are capable of processing various input types beyond text, such as images, audio, documents, and video. You'll explore how these models can perform tasks like generating web pages from sketches, summarizing content from PDFs or audio clips, and even analyzing videos for detailed insights. The lecture will introduce key multimodal models like GPT-4o, Microsoft's PHI, and Google's Gemini, highlighting their diverse capabilities and potential applications. In subsequent lessons, you'll apply this knowledge by building an application that uses multimodal models to analyze images and extract actionable insights for real-world use cases.
In this lecture, you will learn how to work with a fictitious application designed to analyze traffic camera images for congestion levels—light, moderate, or heavy—and detect potential camera malfunctions. You'll explore the process of configuring an Azure OpenAI model, specifically ChatGPT 4o, to interpret images and provide responses based on prompts coded into the system. The lecture also demonstrates how to load images into memory, process them using AI, and handle issues such as throttling for large image files. Finally, you will be introduced to the structure of the application and prepare to update the system message for enhanced image analysis in the next module.
In this lecture, you will learn the importance of crafting a system prompt to establish the AI model's persona and expectations, specifically for traffic analysis and congestion monitoring. The lecture explains how to define traffic congestion levels—heavy, medium, and low—based on vehicle behavior and density, and introduces methods for detecting potential camera malfunctions through image analysis. Finally, this foundational setup will prepare you for the next lecture, where traffic camera images will be uploaded for processing.
In this lecture, you will learn how to enhance AI prompts by incorporating multimodal inputs, such as images, alongside traditional text-based user messages. You'll explore the process of creating a chat message content item collection to handle diverse input types, including passing image data in byte arrays and specifying metadata like MIME types. Additionally, the lecture demonstrates how to structure prompts to include contextual queries for the AI, such as analyzing images to determine traffic congestion levels. Finally, you will understand the importance of transitioning from non-deterministic outputs to structured outputs in order to improve application reliability and facilitate actionable responses, such as generating alerts or tickets for camera malfunctions or traffic issues.
In this lecture, you will learn how to define structured outputs for AI models to achieve deterministic responses, enhancing the functionality of applications. You will explore methods such as using JSON schemas or creating classes to structure outputs, ensuring compatibility with different programming environments like .NET. The lecture also introduces the concept of replacing non-deterministic string outputs with enumerations for precise traffic congestion levels, simplifying interpretation and improving reliability. Finally, you will understand how to register the defined output structure with Semantic Kernel for seamless integration with language models, setting the stage for testing response formats in the next session.
In this lecture, you will learn how to define a structured output format for AI model responses using either JSON or a class structure. You will explore the process of adding a field to the prompt execution settings in Semantic Kernel to specify the desired response format, focusing on creating a class that represents the expected output. The lecture demonstrats how to pass these execution settings into the AI model to generate structured responses. In the next session, you will tackle deserialization of the JSON response into a class object, completing the workflow and preparing the application for full functionality testing.
In this module, you will explore how to enhance application functionality by using deterministic outputs to set alerts, automate ticket creation for broken cameras, and implement color-coded visual indicators for system clarity. This structured approach empowers applications to consistently handle AI-analyzed data and make informed decisions based on traffic and camera conditions.
Common Questions and Answers related to this Section. Post your questions and it might appear in this video
In this lecture, you will learn about the Aspire dashboard and its functionality as a dashboard for the web application used to exploring plugins. You will understand how the dashboard operates, including the process of launching applications and using traces to view prompts and AI-generated responses. Additionally, you will explore how chat history is managed, transmitted to the backend, and logged using console and structured logs. Finally, you will examine metrics and traces for monitoring application performance and gain insight into how plugins are implemented behind the scenes.
In this lecture, you will learn how to integrate the Semantic Kernel into an ASP.NET application and understand the differences between console applications and web applications in this context. You will explore how to configure the Semantic Kernel as a service within the web application host, along with AI model configurations and generic prompt execution settings, enabling flexibility in switching models without altering downstream code. Additionally, you'll see how the web application services collection intertwines objects and services, allowing plugins and AI models to access shared resources like HTTP context seamlessly. Finally, you will examine the flow of user interactions, including how prompts and chat history are managed by the web page and passed to the server for processing, ensuring an ongoing chat experience.
In this lecture, you will learn how to write your first plugin for an application. We will cover the steps necessary to ensure your application is running correctly and the AI model is properly configured. You will explore best practices for naming methods and how to register the plugin with the Semantic Kernel to enable AI function calling. Additionally, you will understand the orchestration process handled by Semantic Kernel, which facilitates tool calls and response formulation, allowing the AI to provide real-time information based on user prompts.
In this lecture, you will learn to create a native plugin that enables an AI model to access current weather data by making external API calls. You will explore how to utilize weather.gov for real-time weather data, which AI models typically lack due to outdated training data. We will guide you through the process of setting up the plugin, which involves obtaining latitude and longitude coordinates, making initial API calls to get forecast URLs, and then retrieving detailed weather information in the Digital Weather Markup Language (DWML) format. Additionally, you'll understand the importance of configuring user agents and handling API requests effectively, and how to register the plugin to integrate seamlessly with the AI model for generating accurate weather responses.
In this lecture, you will learn how to enhance a weather plugin by integrating a Geocode API to convert addresses into latitude and longitude coordinates. Additionally, you will understand the process of minimizing data sent back to the AI model by extracting only the latitude and longitude from the API response. Finally, you will explore how Semantic Kernel can orchestrate calls to multiple plugins, enabling the AI model to fulfill user requests efficiently.
In this lecture, you will learn how to enhance an AI model's ability to provide weather information by integrating a personal information plugin. This plugin allows the AI to access personal data, such as your address, which helps it determine your current location without needing to ask. The lecture demonstrates how this personal information plugin works in conjunction with geo-coordinates and weather plugins.
Common Questions and Answers related to this Section. Post your questions and it might appear in this video
In this lecture, you will learn how to create a Semantic Kernel plugin using an Open API specification. We will explore the addition of a new customer API to our existing project, focusing on standard CRUD operations. You will observe how the Swagger UI is generated to facilitate these operations, including creating, retrieving, updating, deleting, and searching customer data. Finally, you will understand how the Semantic Kernel uses the JSON file of the API to dynamically convert it into a plugin, enhancing interaction capabilities within a chat program.
In this lecture, you will learn how to integrate an OpenAPI plugin into a chat application using the Semantic Kernel. Furthermore, the lecture will highlight challenges in using OpenAPI specs for AI models, such as versioning difficulties and the unsuitability of descriptions and endpoint naming conventions for AI consumption. In the next lecture, you will validate the plugin's functionality and explore advanced semantic kernel orchestration techniques.
In this lecture, you will learn how to test and validate the successful integration of a model into a chat application. We will explore how to send requests to an AI model, examine function calls, and ensure proper wiring of components. You will also learn how to perform complex queries involving multiple plugin calls, such as retrieving customer data and making API calls in parallel for efficiency. Finally, we will demonstrate how to update customer information using the API, showcasing the power and flexibility of the plugin within the application.
Common Questions and Answers related to creating Semantic Kernel Plugins from API endpoints. Post a questions and it might appear in this video
Introduction to Local and Remote MCP Servers
Where to find MCP Servers and configuration information
Prepare Code for MCP Client Access
Adding and configuring a Local MCP server using Semantic Kernel
Testing the local MCP server added in the previous lecture
Adding and configuring a Remote MCP server using Semantic Kernel
Testing the remote MCP server added in the previous lecture
Common Questions and Answers related to this Section. Post your questions and it might appear in this video
In this first lecture, students will explore the foundational concepts of RAG. The session begins by comparing standard generative AI with RAG-based systems, highlighting RAG’s ability to retrieve external knowledge in real time. Students will learn the two core components of RAG: retrieval (fetching relevant data from an external source like a vector store or search index) and generation (using that data to produce accurate, grounded responses).
The lecture will cover:
The RAG architecture: retrieval + generation flow
Role of embedding models and vector similarity search
How grounding improves factual accuracy
Limitations and challenges in RAG (e.g., hallucination, retrieval relevance)
By the end of the session, students will clearly understand how RAG enables AI systems to respond with current, domain-specific knowledge and sets the stage for building their own RAG pipelines in later modules.
In this lecture, students will learn how to prepare and ingest content for RAG systems using Azure AI Foundry and Azure AI Search. The session covers best practices for document preprocessing, including how to chunk documents into semantically meaningful units and how to generate embeddings for each chunk using Azure AI's embedding models.
Key topics include:
Ingesting unstructured and semi-structured content (PDFs, Office docs, etc.)
Chunking strategies for optimal retrieval performance
Embedding generation using Azure AI Foundry
Storing content in Azure AI Search with vector capabilities
By the end of this lecture, students will know how to transform raw content into an AI-searchable knowledge base, enabling effective retrieval in RAG workflows.
In this hands-on lecture, students will learn how to build a fully functional RAG plugin for Semantic Kernel using Azure AI Search as the vector store. The session focuses on connecting the Semantic Kernel planner or orchestrator to the vectorized knowledge base created in the previous lecture.
Key topics include:
Creating a Semantic Kernel plugin that performs semantic retrieval
Querying Azure AI Search using vector search APIs
Integrating plugin outputs into the kernel's context for grounding
Handling retrieval relevance
Testing the plugin within a chat application
By the end of this lecture, students will be able to write and deploy a RAG plugin that enriches AI responses with accurate, real-time information retrieved from enterprise content.
Common Questions and Answers related to this Section. Post your questions and it might appear in this video
Master AI Application Development with Semantic Kernel: The Comprehensive Developer's Journey
Transform your development skills and applications with cutting-edge AI capabilities! This immersive course takes you on a comprehensive journey from fundamental AI concepts to building sophisticated, intelligent applications using Microsoft's powerful Semantic Kernel framework.
In today's rapidly evolving technological landscape, AI integration has moved from a luxury to a necessity for modern applications. This course equips you with the practical skills and theoretical knowledge to stay ahead of the curve, making you an invaluable asset to any development team or organization looking to leverage AI's transformative potential.
Begin your AI mastery journey with a solid foundation in Large Language Models (LLMs) before diving into the robust capabilities of Azure AI Foundry, where you'll learn to deploy, customize, and optimize OpenAI models for your specific needs. You'll explore model deployment strategies, create effective user prompts, manage chat history, and craft system prompts that yield optimal results.
The core of the course focuses on Semantic Kernel's single Agent orchestration framework, teaching you to seamlessly connect with multiple model providers including Azure OpenAI, standard OpenAI endpoints, and various Inferencing APIs. You'll also master working with local models through ONNX, Ollama, and Hugging Face integrations, giving you ultimate flexibility in your AI implementation approach.
Take your applications to new heights by learning to create intelligent plugins through multiple approaches:
Building native plugins that integrate directly with your application's functionality
Creating OpenAPI plugins that leverage existing API endpoints to extend AI capabilities
Integrating cutting-edge Model Context Protocol (MCP) servers for enhanced model interactions
Implementing Retrieval-Augmented Generation (RAG) plugins for private data
Practical implementation is emphasized throughout, with dedicated sections on integrating Semantic Kernel into ASP.NET applications and working with multimodal models to handle various data types including text, images, and more.
Whether you're a seasoned developer looking to add AI capabilities to your toolkit, or a business analyst seeking to leverage AI for data-driven insights, this course provides the perfect balance of theoretical understanding and hands-on application development.
No extensive AI background required—just bring your basic .NET knowledge and prepare to build the next generation of intelligent applications that will transform user experiences, streamline operations, and unlock new business opportunities!