Certified Infra AI Expert: End-to-End GPU-Accelerated AI

Rating: 3.8 (191 reviews)

Master GPUs, Omniverse, Digital Twins, AI Containers, Triton Inference, DeepStream, and ModelOps

Created by Vivian Aranha, School of AI

Last updated 2/2026

English

What you'll learn

Architect and deploy GPU-accelerated AI pipelines using NVIDIA hardware (A100, H100, L4, Jetson) and the full NVIDIA AI Enterprise software stack.
Optimize AI models for performance and efficiency using TensorRT, TAO Toolkit, and advanced quantization techniques for both cloud and edge deployments.
Implement real-time AI applications with DeepStream, RAPIDS, and Triton Inference Server for video analytics, sensor fusion, and data processing.
Integrate AI solutions with cloud, edge, and digital twin environments, leveraging Kubernetes, Helm, and Omniverse for scalable deployment and simulation.
Apply security, licensing, and containerization best practices to ensure enterprise-grade reliability and compliance in AI systems.

Course content

11 sections • 50 lectures • 2h 34m total length

Certificate of Completion (0:29)
Introduction to Certified Infra AI Expert: End-to-End GPU-Accelerated AI (3:54)
The Certified Infra AI Expert: End-to-End GPU-Accelerated AI program is a comprehensive, hands-on training experience designed for professionals who want to master the full spectrum of GPU-powered AI systems. This course takes learners deep into the NVIDIA ecosystem, covering everything from cutting-edge GPU hardware to AI frameworks, SDKs, and deployment pipelines, ensuring that graduates are equipped to design, build, and optimize real-world AI solutions across cloud, edge, and on-premises environments.
At its core, the program is built to bridge the gap between AI theory and production-grade deployment. Students begin by developing a foundational understanding of NVIDIA GPU architectures—including the A100, H100, L4, and the Jetson family—and how each is optimized for different AI workloads such as deep learning training, inference at scale, and low-power edge computing. They will also learn to leverage cloud GPU instances on AWS, Azure, and DGX Cloud, giving them the flexibility to work in diverse deployment environments.
From there, the course transitions into the NVIDIA AI software stack, focusing on NVIDIA AI Enterprise, containerized AI workflows via the NGC Registry, and core toolkits such as DeepStream for real-time video analytics, RAPIDS for GPU-accelerated data science, and Triton Inference Server for scalable inference. Students will gain experience pulling, modifying, and deploying AI containers, as well as integrating pretrained models and SDKs into their own projects.
A major emphasis of the course is model optimization and operationalization. Through TensorRT and the TAO Toolkit, learners will master techniques such as quantization, pruning, and transfer learning to improve inference speed while keeping accuracy loss to a minimum. They will also explore ModelOps practices, from experiment tracking with Weights & Biases or MLflow, to building automated retraining pipelines using Kubernetes and Helm for orchestrating large-scale AI workloads.
The program dives into industry-specific NVIDIA SDKs, including Metropolis for smart cities, Riva for speech AI, NeMo for natural language processing, Clara for healthcare AI, and Merlin for recommender systems. This enables students to apply their skills to domain-focused AI solutions with immediate business impact.
Security, licensing, and compliance are also key pillars. Students will understand how to secure AI models and containers, manage enterprise licensing through the NVIDIA License System (NLS), and implement best practices for data privacy and regulatory adherence, all of which are critical for deploying AI in sensitive industries.
The highlight of the program is the capstone project, where learners choose between three tracks: Video Surveillance with DeepStream, Digital Twin Development with Omniverse, or Smart Edge AI with Jetson and IoT Sensors. Each track requires full integration of hardware, optimized AI models, containerized deployment, and either cloud or edge deployment pipelines. This ensures students graduate with production-ready, portfolio-worthy projects that showcase their end-to-end AI expertise.
By the end of the course, students will not only understand the NVIDIA AI stack in depth, but will also have the practical skills to architect, deploy, and optimize GPU-accelerated AI solutions at scale. Whether for AI engineering, product deployment, or enterprise innovation, this certification marks graduates as experts capable of delivering high-performance, future-ready AI systems.

Introduction to GPU Architecture (A100, H100, L4, Jetson) (4:52)
In this submodule, students will gain a comprehensive understanding of NVIDIA’s GPU architecture, exploring the core technologies that power modern AI, machine learning, deep learning, and high-performance computing (HPC) workloads. We’ll examine the evolution of GPU design and how NVIDIA’s architecture has evolved from earlier generations to the cutting-edge Ampere (A100) and Hopper (H100) architectures, as well as specialized GPUs like the L4 for inference acceleration and Jetson modules for edge AI.
We begin with the fundamental building blocks of GPU architecture — CUDA cores, Tensor Cores, RT Cores, memory hierarchy (HBM2e, LPDDR5), and interconnect technologies like NVLink and PCIe Gen5. Students will learn how CUDA cores handle massive parallel computations, how Tensor Cores accelerate deep learning matrix operations, and how RT Cores are leveraged for real-time ray tracing in simulation and rendering.
Next, we explore GPU families and their primary use cases:
NVIDIA A100 (Ampere Architecture) – Optimized for training large AI models, HPC workloads, and multi-instance GPU (MIG) configurations. Students will understand SM (Streaming Multiprocessor) structure, mixed-precision computing, and memory bandwidth considerations.
NVIDIA H100 (Hopper Architecture) – The next generation for extreme-scale AI training and inference, with Transformer Engine acceleration for LLMs, improved sparsity support, and FP8 precision for massive speedups.
NVIDIA L4 GPUs – Energy-efficient inference accelerators for cloud deployments and video analytics, integrating Tensor Cores optimized for streaming workloads.
NVIDIA Jetson Modules (Orin, Xavier, Nano) – Edge AI platforms combining ARM CPUs and NVIDIA GPUs in a compact form factor, enabling real-time processing for robotics, IoT, and autonomous systems.
Students will also explore scaling strategies for multi-GPU and multi-node environments, including NVLink/NVSwitch interconnects and DGX system topologies. We’ll compare performance characteristics across FP64, FP32, TF32, FP16, and INT8 precision levels, explaining why mixed precision is essential for balancing speed and accuracy.
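To make the precision trade-off concrete, the following minimal sketch estimates how much memory a model's weights occupy in each format. The 7-billion-parameter model size is a hypothetical figure chosen for illustration, and the calculation covers weights only (no activations, gradients, or optimizer state).

```python
# Bytes needed to store one value in each numeric format.
# TF32 is a Tensor Core compute mode; values still occupy 32 bits in memory.
BYTES_PER_VALUE = {
    "FP64": 8,
    "FP32": 4,
    "TF32": 4,
    "FP16": 2,
    "INT8": 1,
}


def weight_memory_gib(num_parameters: int, precision: str) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_parameters * BYTES_PER_VALUE[precision] / 2**30


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical model size, for illustration only
    for precision in BYTES_PER_VALUE:
        print(f"{precision}: {weight_memory_gib(params, precision):.1f} GiB")
```

Halving the bytes per value halves the weight footprint, which is one reason lower-precision formats matter for fitting models onto a single GPU.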
The submodule includes an in-depth look at power efficiency and thermal design, critical for both data center deployments and embedded edge devices. Students will analyze TDP (Thermal Design Power) differences, cooling options (air, liquid), and how to optimize GPU utilization for sustained workloads.
In addition, we’ll discuss software integration — how CUDA, cuDNN, and NCCL libraries leverage NVIDIA hardware, and how TensorRT ties into the architecture for maximum inference performance. We’ll also briefly introduce NVIDIA GPU instances on AWS and Azure, setting the stage for cloud-based labs in later submodules.
By the end of this submodule, students will be able to:
Identify key components of NVIDIA GPU architecture and their roles in AI acceleration.
Differentiate between A100, H100, L4, and Jetson GPUs and choose the right hardware for specific workloads.
Understand the performance trade-offs of different precision formats and architectures.
Recognize how hardware design impacts AI model training, inference, and deployment efficiency.
This knowledge forms the foundation for all subsequent modules, ensuring students can make hardware-aware decisions when building GPU-accelerated AI systems.
GPU Instances on AWS and Azure (4:06)
In this submodule, students will learn how to leverage cloud-based NVIDIA GPU instances on AWS and Azure to run AI training, inference, and high-performance computing (HPC) workloads without the need for on-premises infrastructure. This is a critical skill for AI engineers and solution architects, as cloud deployment offers on-demand scalability, cost flexibility, and global accessibility for GPU-powered workloads.
We begin by introducing the benefits of using NVIDIA GPUs in the cloud — rapid provisioning, pay-as-you-go pricing, and the ability to match instance types to workload requirements. Students will compare training vs. inference scenarios in the cloud and understand how different GPU types impact performance, latency, and cost-efficiency.
Next, we dive into AWS GPU instance offerings:
P4d and P5 Instances – Featuring A100 GPUs (P4d) and H100 GPUs (P5) for large-scale AI model training, HPC simulations, and multi-GPU workloads.
G5 Instances – Using A10G GPUs optimized for graphics rendering, ML inference, and video processing.
G4dn Instances – Equipped with T4 GPUs for low-latency inference and smaller-scale training workloads.
Elastic Fabric Adapter (EFA) integration for high-bandwidth, low-latency distributed training.
We then explore Azure GPU VM families:
NC Series – Optimized for compute-intensive AI workloads, often featuring V100 or A100 GPUs.
ND Series – Designed for deep learning training, supporting multiple GPUs per VM for parallel workloads.
NV Series – Targeted at graphics-heavy applications, remote rendering, and AI inference.
Students will learn how to select the right cloud GPU instance by evaluating:
GPU architecture (Ampere, Volta, etc.)
Number of GPUs per instance
Memory capacity and bandwidth
Networking capabilities (InfiniBand, EFA, NVLink)
Pricing models (on-demand, reserved, spot instances)
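The selection criteria above can be sketched as a small filter over an instance catalog. This is an illustrative sketch only: the `GpuInstance` type, the `cheapest_fit` helper, and the three catalog rows are hypothetical, and the hourly prices are placeholders rather than real quotes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GpuInstance:
    name: str
    gpu_model: str
    gpu_count: int
    gpu_memory_gb: int        # memory per GPU
    hourly_price_usd: float   # placeholder value, NOT a real price


# Hypothetical catalog for illustration; check the provider's current
# documentation and pricing pages before relying on any of these numbers.
CATALOG: List[GpuInstance] = [
    GpuInstance("g4dn.xlarge", "T4", 1, 16, 0.50),
    GpuInstance("g5.xlarge", "A10G", 1, 24, 1.00),
    GpuInstance("p4d.24xlarge", "A100", 8, 40, 30.00),
]


def cheapest_fit(
    catalog: List[GpuInstance],
    min_total_gpu_memory_gb: int,
    min_gpus: int = 1,
) -> Optional[GpuInstance]:
    """Return the cheapest instance meeting the GPU count and memory needs."""
    candidates = [
        inst
        for inst in catalog
        if inst.gpu_count >= min_gpus
        and inst.gpu_count * inst.gpu_memory_gb >= min_total_gpu_memory_gb
    ]
    return min(candidates, key=lambda inst: inst.hourly_price_usd, default=None)


if __name__ == "__main__":
    print(cheapest_fit(CATALOG, min_total_gpu_memory_gb=20))
```

A real selection would also weigh networking (EFA, InfiniBand) and the pricing model (on-demand, reserved, spot), which this sketch leaves out.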
The submodule then covers provisioning and setup workflows:
Launching a GPU instance in AWS or Azure using their respective consoles or CLI tools.
Configuring NVIDIA drivers and verifying GPU availability via nvidia-smi.
Installing CUDA and cuDNN for deep learning frameworks like PyTorch and TensorFlow.
Attaching cloud storage for datasets and model checkpoints.
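As a rough illustration of the verification step, the sketch below queries `nvidia-smi` in CSV mode and turns the output into dictionaries. The helper names `parse_gpu_csv` and `query_gpus` are hypothetical, and the sample line in the usage section is an assumed example of what the tool might print, not captured output.

```python
import csv
import io
import shutil
import subprocess
from typing import Dict, List

# Fields requested from nvidia-smi via --query-gpu.
QUERY_FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_csv(text: str) -> List[Dict[str, str]]:
    """Parse 'csv,noheader' output from nvidia-smi into one dict per GPU."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [
        dict(zip(QUERY_FIELDS, (cell.strip() for cell in row)))
        for row in rows
        if row  # skip blank lines
    ]


def query_gpus() -> List[Dict[str, str]]:
    """Run nvidia-smi if it is installed; return [] when it is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(QUERY_FIELDS)}", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if the driver is not working
    )
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    # Assumed sample line, used only to demonstrate the parser.
    sample = "NVIDIA A100-SXM4-40GB, 535.104.05, 40960 MiB\n"
    print(parse_gpu_csv(sample))
    print(query_gpus())
```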
We also explore cloud-native NVIDIA integrations:
AWS Deep Learning AMIs preconfigured with CUDA, cuDNN, and frameworks.
Azure Machine Learning with NVIDIA GPU compute clusters for managed training and deployment.
NVIDIA AI Enterprise licensing options in the cloud.
Security and cost management are also addressed. Students will learn IAM roles and permissions for GPU workloads, network security groups, and monitoring GPU utilization to avoid unnecessary costs. We’ll discuss using auto-scaling policies for inference endpoints to handle fluctuating traffic efficiently.
By the end of this submodule, students will be able to:
Identify and select the right NVIDIA GPU instance type for a given workload.
Provision and configure GPU-enabled cloud environments for AI.
Optimize cost, performance, and scalability when running workloads in AWS and Azure.
Integrate cloud GPU instances into hybrid or multi-cloud AI pipelines.
This knowledge ensures that students can make data-driven, cost-effective decisions about cloud GPU usage, setting the stage for scalable AI deployments in later modules.
DGX Systems and DGX Cloud Overview (3:47)
In this submodule, students will explore NVIDIA DGX Systems and DGX Cloud, the flagship platforms for AI supercomputing that enable large-scale training, inference, and simulation workloads with unmatched performance and scalability. These systems are the backbone of enterprise AI infrastructure, used by industry leaders for LLMs (Large Language Models), digital twins, drug discovery, and autonomous systems.
We begin with the DGX Systems lineup:
DGX A100 – Featuring 8× NVIDIA A100 GPUs connected via NVLink and NVSwitch, enabling massive parallelism and multi-instance GPU (MIG) capabilities. Ideal for both AI training and inference at scale.
DGX H100 – Powered by the Hopper architecture with Transformer Engine for unprecedented speed in training foundation models, LLMs, and generative AI workloads.
DGX Station – A smaller, desk-side AI workstation designed for data science teams who need DGX power without data center installation.
Students will learn the hardware architecture of DGX systems, including:
High-bandwidth memory (HBM2e, HBM3) for large model training.
NVLink/NVSwitch topology enabling up to 600 GB/s GPU-to-GPU bandwidth on DGX A100 and up to 900 GB/s on DGX H100.
Storage configurations for high I/O throughput (NVMe SSD arrays).
Networking capabilities with InfiniBand for distributed training across multiple DGX systems.
We then move to DGX Cloud, NVIDIA’s AI-supercomputing-as-a-service platform. Students will understand how DGX Cloud delivers the same DGX hardware capabilities in a fully managed cloud environment accessible from anywhere.
Key DGX Cloud concepts covered include:
Preconfigured AI environments with NVIDIA AI Enterprise stack, CUDA, cuDNN, and NCCL.
On-demand scaling for massive parallel training jobs.
Integration with AWS, Azure, and Oracle Cloud Infrastructure (OCI) for hybrid deployments.
Use cases like fine-tuning LLMs, synthetic data generation, and complex physics simulations in Omniverse.
The submodule will also highlight NVIDIA Base Command, the orchestration platform for managing DGX clusters and DGX Cloud instances. Students will see how Base Command provides:
Centralized job scheduling and monitoring.
Multi-user access control.
Data management for AI workflows.
Integration with SLURM for HPC environments.
To connect theory with practice, students will:
Compare on-premises DGX vs DGX Cloud for various AI workloads.
Evaluate performance benchmarks for training and inference.
Understand cost structures and when to choose CAPEX vs OPEX investment in AI infrastructure.
See real-world case studies from industries like automotive (autonomous driving), healthcare (drug discovery), and finance (risk modeling).
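The CAPEX vs OPEX question above reduces to a break-even calculation. The sketch below is a simplified model under stated assumptions: a single up-front purchase cost, a flat hourly running cost on-premises (power, cooling, staff), and a flat hourly cloud rate. The function name and all figures are hypothetical.

```python
from typing import Optional


def break_even_hours(
    purchase_cost: float,
    on_prem_hourly_cost: float,
    cloud_hourly_rate: float,
) -> Optional[float]:
    """Hours of use after which buying hardware becomes cheaper than renting.

    Solves purchase_cost + on_prem_hourly_cost * h = cloud_hourly_rate * h.
    Returns None when the cloud is never more expensive per hour.
    """
    hourly_saving = cloud_hourly_rate - on_prem_hourly_cost
    if hourly_saving <= 0:
        return None
    return purchase_cost / hourly_saving


if __name__ == "__main__":
    # Placeholder figures for illustration only, not real prices.
    hours = break_even_hours(
        purchase_cost=200_000, on_prem_hourly_cost=5, cloud_hourly_rate=30
    )
    print(f"Break-even after {hours:.0f} hours of use")
```

Workloads that run well below the break-even point favor the OPEX (cloud) model, while sustained, near-continuous utilization favors CAPEX.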
By the end of this submodule, students will be able to:
Describe the hardware and architecture of NVIDIA DGX systems.
Understand DGX Cloud’s capabilities and its integration into enterprise AI pipelines.
Use Base Command to manage multi-node, multi-GPU AI workloads.
Decide between on-prem DGX and DGX Cloud based on project requirements, scalability needs, and budget constraints.
This knowledge ensures that students can design AI infrastructure strategies that balance performance, scalability, and cost, preparing them for advanced deployment and optimization scenarios in later modules.
AI Enterprise: Drivers, Operators, Setup (4:22)
In this submodule, students will gain a deep understanding of NVIDIA AI Enterprise, the end-to-end software suite designed to run AI workloads efficiently on NVIDIA GPUs across data centers, clouds, and edge environments. This is a critical skill for AI engineers because hardware is only as good as the software stack that optimizes it.
We start with what NVIDIA AI Enterprise is — a curated, enterprise-grade stack of AI frameworks, pretrained models, SDKs, and deployment tools that are certified, supported, and optimized for NVIDIA GPUs. Students will see how AI Enterprise ensures compatibility, stability, and performance for production-grade AI systems.
Key components of NVIDIA AI Enterprise include:
GPU-Optimized Frameworks – TensorFlow, PyTorch, RAPIDS, Triton, TAO Toolkit, and more.
NVIDIA Drivers – The backbone for enabling GPU acceleration, including CUDA drivers for computational workloads.
NVIDIA Operators – Kubernetes-ready deployment packages that simplify GPU setup and monitoring.
Enterprise Support & Security – Regular updates, bug fixes, and security patches for mission-critical environments.
Installing and Configuring NVIDIA Drivers
Students will learn how to install NVIDIA GPU drivers across Linux and Windows environments. We’ll cover:
Identifying GPU hardware with nvidia-smi.
Choosing between data center drivers (Tesla) and desktop drivers (RTX/Quadro).
Installing drivers manually and via package managers like apt, yum, or using containerized driver operators.
Ensuring CUDA Toolkit compatibility with the chosen driver version.
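The driver and CUDA compatibility check in the last item can be sketched as a version comparison. The minimum driver versions in `MIN_LINUX_DRIVER` are assumptions based on NVIDIA's published CUDA compatibility guidance for Linux and should be confirmed against the current release notes; the helper names are hypothetical.

```python
from typing import Tuple

# ASSUMPTION: minimum Linux driver version per CUDA major release, taken from
# NVIDIA's CUDA compatibility documentation. Verify before relying on these.
MIN_LINUX_DRIVER = {
    11: (450, 80, 2),
    12: (525, 60, 13),
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '535.104.05' into (535, 104, 5)."""
    return tuple(int(part) for part in version.split("."))


def driver_supports_cuda(driver_version: str, cuda_major: int) -> bool:
    """Return True if the driver meets the assumed minimum for this CUDA major."""
    minimum = MIN_LINUX_DRIVER.get(cuda_major)
    if minimum is None:
        raise ValueError(f"No minimum driver recorded for CUDA {cuda_major}")
    return parse_version(driver_version) >= minimum


if __name__ == "__main__":
    for driver in ("470.57.02", "535.104.05"):
        print(driver, "supports CUDA 12:", driver_supports_cuda(driver, 12))
```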
Using NVIDIA Operators for Kubernetes Environments
For AI workloads running in containerized clusters, NVIDIA Operators automate the installation and configuration of GPU resources. We will explore:
NVIDIA GPU Operator – Automates deployment of drivers, container runtime, and monitoring agents on Kubernetes nodes.
NVIDIA Device Plugin for Kubernetes – Enables GPU scheduling and allocation in Kubernetes pods.
Configuring Helm charts to deploy GPU workloads seamlessly.
Students will perform a hands-on exercise deploying GPU Operator on a test Kubernetes cluster, ensuring that GPU resources are detected and available to workloads.
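One way to confirm that GPUs are schedulable after the GPU Operator is installed is to run a pod that requests the `nvidia.com/gpu` resource. The sketch below builds such a manifest as JSON, which `kubectl apply -f -` accepts. The pod name and image reference are hypothetical placeholders; substitute a CUDA-enabled image you actually have access to.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod manifest that requests GPUs via the device plugin resource."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Prints the GPUs visible inside the container, then exits.
                    "command": ["nvidia-smi"],
                    # GPUs are requested under limits with this resource name.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image reference, for illustration only.
    manifest = gpu_pod_manifest("gpu-smoke-test", "registry.example.com/cuda-base:placeholder")
    print(json.dumps(manifest, indent=2))
```

If the pod stays in Pending with an "Insufficient nvidia.com/gpu" event, the node is not advertising GPUs and the operator or device plugin needs attention.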
Setting up NVIDIA AI Enterprise on Cloud and On-Prem
We will walk through deployment scenarios:
VM-based AI Enterprise setup on AWS, Azure, and VMware vSphere.
Bare-metal installation for high-performance clusters.
Preconfigured NGC-ready AI Enterprise images for cloud instances.
Special emphasis will be placed on licensing, including NVIDIA License System (NLS) setup, license token retrieval, and tracking license usage for compliance.
By the end of this submodule, students will be able to:
Install and configure NVIDIA GPU drivers correctly for AI workloads.
Deploy NVIDIA Operators for Kubernetes GPU resource management.
Set up NVIDIA AI Enterprise environments for both cloud and on-premises infrastructure.
Troubleshoot driver, CUDA, and container integration issues effectively.
This knowledge ensures that students can bridge the gap between NVIDIA hardware and AI software environments, enabling reliable, high-performance, and production-ready AI deployments — a foundation for later modules like containerization, inference optimization, and model serving.
Hands-on Lab: Set up a GPU-powered VM on AWS with drivers (0:03)
In this submodule, students will gain practical experience by creating a GPU-powered virtual machine in AWS and preparing it for AI workloads through the installation of NVIDIA drivers, CUDA Toolkit, and essential AI libraries. The lab begins with an exploration of AWS’s EC2 GPU instance offerings, focusing on the differences between P4, P5, and G5 families, and how each is suited for specific AI tasks such as large-scale model training, inference acceleration, or video analytics. Students will learn how to select the appropriate instance type based on GPU architecture, memory capacity, and network performance, ensuring the environment aligns with the requirements of the AI workloads they plan to run.
Once the correct GPU instance type is chosen, the lab will guide students through provisioning the VM via the AWS Management Console, including selecting the right Amazon Machine Image (AMI) and configuring networking and storage settings. This step will include setting up security groups to allow secure access via SSH and ensuring that the instance has the correct IAM role permissions for data storage and retrieval. The focus here is on building a secure and scalable foundation for running GPU-accelerated AI workloads in the cloud.
The next phase of the lab involves installing the NVIDIA GPU drivers, a crucial step in enabling CUDA acceleration, and then using the NVIDIA System Management Interface (nvidia-smi) to confirm that the GPU is properly detected and functional. Students will explore the relationship between driver versions and CUDA Toolkit compatibility, gaining an understanding of why mismatches can cause performance issues or runtime errors. This section also covers best practices for updating drivers in a cloud environment, ensuring minimal downtime and maximum compatibility with AI frameworks.
Following the driver installation, students will set up the CUDA Toolkit and cuDNN libraries, which are essential for running deep learning frameworks such as TensorFlow and PyTorch with GPU acceleration. They will also learn how to configure environment variables and perform a quick benchmark test to validate GPU performance. This ensures that the environment is fully optimized for high-performance AI workloads, whether for model training or real-time inference.
To complete the lab, students will install a sample AI application from the NVIDIA NGC registry, run it within the newly provisioned GPU environment, and monitor GPU utilization to verify that the workload is indeed leveraging the GPU resources effectively. This final step ties together all the concepts covered in the submodule, from provisioning the instance to configuring it for production-ready AI tasks.
By the end of this submodule, students will have gained the confidence and technical ability to deploy GPU-powered VMs in AWS, install and configure NVIDIA drivers, and prepare cloud-based environments for GPU-accelerated AI development and deployment. This hands-on experience provides the essential skills needed for working with cloud-based AI infrastructure, setting the stage for deeper containerization and orchestration topics covered in later modules.

Introduction to NGC Ecosystem (3:39)
The NVIDIA NGC ecosystem is the central hub for GPU-optimized AI software, providing developers, data scientists, and researchers with immediate access to a vast library of prebuilt containers, pretrained models, and industry-specific SDKs. In this submodule, students will explore how NGC serves as a one-stop platform for accelerating AI development, reducing setup complexity, and ensuring optimal performance on NVIDIA GPUs across cloud, on-premises, and edge environments.
The learning journey begins with an understanding of why containerization is critical for AI workflows. Students will see how NGC provides Docker-based containers that are already configured with compatible versions of CUDA, cuDNN, and relevant deep learning frameworks such as TensorFlow, PyTorch, and RAPIDS. This eliminates the time-consuming process of manually setting up environments, avoiding version mismatches and dependency conflicts.
From there, students will dive into the structure of the NGC catalog, learning how it organizes resources into categories such as AI frameworks, HPC applications, industry SDKs, and Helm charts for Kubernetes deployment. The submodule will highlight key offerings such as DeepStream for video analytics, Riva for conversational AI, Clara for medical imaging and genomics, and Merlin for recommender systems. Each example will demonstrate how NGC enables professionals to start with a high-performance baseline and focus on building application-specific features rather than setting up infrastructure from scratch.
The course will then explore the workflow for accessing and pulling containers from NGC, beginning with account creation, API key generation, and authentication using the NGC CLI. Students will understand the role of container tags for selecting specific versions, and why version control is vital for reproducible AI experiments. They will also learn about the performance tuning inherent in NGC containers, which are optimized to fully leverage NVIDIA hardware capabilities such as Tensor Cores, multi-GPU scaling, and mixed precision training.
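To illustrate why tags matter for reproducibility, the sketch below splits a container reference into registry, repository, and tag, and flags references that are not pinned to an explicit version. It assumes the reference always starts with a registry host (as NGC references on `nvcr.io` do) and does not handle digests or registry ports; the helper names and the example tag are illustrative.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: Optional[str]


def parse_image_ref(ref: str) -> ImageRef:
    """Split 'registry/repo/path:tag' into its parts (tag may be absent)."""
    registry, _, remainder = ref.partition("/")
    # The tag separator is a colon in the final path segment.
    last_segment_start = remainder.rfind("/") + 1
    colon = remainder.find(":", last_segment_start)
    if colon == -1:
        return ImageRef(registry, remainder, None)
    return ImageRef(registry, remainder[:colon], remainder[colon + 1:])


def is_pinned(ref: str) -> bool:
    """A reference is pinned when it names a tag other than 'latest'."""
    tag = parse_image_ref(ref).tag
    return tag is not None and tag != "latest"


if __name__ == "__main__":
    # Example tag shown for illustration; pick the version your project needs.
    for ref in ("nvcr.io/nvidia/pytorch:24.01-py3", "nvcr.io/nvidia/pytorch"):
        print(ref, "->", parse_image_ref(ref), "pinned:", is_pinned(ref))
```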
Security and compliance are integral to the NGC ecosystem, so this submodule will address how NGC containers are digitally signed to verify integrity and prevent tampering. Students will also gain insight into NVIDIA’s continuous testing and validation process, ensuring that the containers meet enterprise-grade stability and reliability standards.
Finally, students will explore NGC’s integration with cloud platforms, including AWS, Azure, and Google Cloud, enabling seamless deployment of containers in both managed and self-hosted GPU environments. By connecting these concepts, students will see how NGC can serve as the foundation of their AI pipeline, whether they are building proof-of-concept models, scaling production inference, or deploying edge AI solutions.
By the end of this submodule, students will understand the strategic importance of NVIDIA NGC in modern AI workflows. They will be able to navigate the catalog, identify the right resources for their projects, and initiate a containerized AI environment that is secure, optimized, and ready for deployment. This foundational knowledge sets the stage for the next submodule, where they will learn how to deploy AI containers using NGC CLI and Helm in practical scenarios.
Deploying AI Containers via NGC CLI and Helm (4:02)
Deploying AI workloads efficiently requires a reliable containerization strategy, and in the NVIDIA ecosystem, this is accomplished through the NGC Command Line Interface (CLI) and Helm for Kubernetes orchestration. In this submodule, students will master the process of pulling, customizing, and deploying GPU-optimized containers from the NVIDIA NGC registry, ensuring that AI applications are both scalable and production-ready.
The learning experience begins with a deep dive into the NGC CLI, a lightweight yet powerful tool that allows users to authenticate with NGC, browse available resources, and pull containers directly into their development or production environments. Students will learn how to authenticate with an NGC API key, explore the available AI frameworks, SDKs, and pretrained models, and select the appropriate container version using image tags. This process ensures that every deployment is reproducible and consistent with the project’s performance requirements.
Once the container is pulled, students will explore the process of running it locally with GPU acceleration, ensuring that all dependencies and drivers are correctly mapped into the container runtime. This stage emphasizes the importance of CUDA compatibility and correct driver configurations to prevent performance bottlenecks or deployment errors. Students will also experiment with customizing containers by adding project-specific scripts, data preprocessing pipelines, and inference endpoints so that the container is fully tailored to their AI application.
The submodule then transitions to deploying these AI containers at scale using Helm, a Kubernetes package manager that simplifies the process of defining, installing, and upgrading containerized applications. Students will understand the structure of a Helm chart, including how values files control configuration parameters, such as resource allocation, GPU scheduling, and environment variables. They will learn to use GPU-enabled Kubernetes nodes in combination with Helm to ensure that each pod receives the correct hardware resources.
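The way a values file overrides chart defaults can be approximated with a recursive dictionary merge: nested maps are merged key by key, and any other value in the override replaces the default. This is a simplified sketch of that behavior, not Helm's implementation (for example, it does not model deleting a key by setting it to null), and the chart keys shown are hypothetical.

```python
import copy
from typing import Any, Dict


def merge_values(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge override into base without mutating either input."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    # Hypothetical chart defaults and a production override.
    defaults = {
        "replicaCount": 1,
        "resources": {"limits": {"nvidia.com/gpu": 1, "memory": "16Gi"}},
        "env": {"LOG_LEVEL": "info"},
    }
    production = {
        "replicaCount": 3,
        "resources": {"limits": {"nvidia.com/gpu": 2}},
    }
    print(merge_values(defaults, production))
```

Because only the overridden keys change, a small environment-specific values file can adjust GPU counts and replicas while inheriting everything else from the chart.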
A key section of this submodule focuses on scaling AI workloads across multiple GPUs and nodes, allowing models to serve high-throughput inference or train large datasets efficiently. Students will study deployment patterns for stateless inference services, batch processing jobs, and real-time streaming applications, all orchestrated through Helm. This reinforces the concept that containerization is not only about portability but also about orchestration and resource optimization in production environments.
Security best practices will also be discussed, including pulling only signed containers from trusted sources, using private container registries for proprietary models, and managing authentication credentials securely in Kubernetes using secrets. This ensures that deployments are not only efficient but also compliant with enterprise security standards.
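Managing registry credentials as a Kubernetes secret might look like the sketch below, which builds an image pull secret of type `kubernetes.io/dockerconfigjson`. It relies on the NGC convention of using the literal username `$oauthtoken` with the API key as the password; the function name and secret name are hypothetical, and the key should come from a secure source such as an environment variable rather than source code.

```python
import base64
import json
import os


def ngc_pull_secret(name: str, api_key: str, registry: str = "nvcr.io") -> dict:
    """Build a docker-registry Secret manifest for pulling from NGC."""
    username = "$oauthtoken"  # literal username used for NGC API key login
    auth = base64.b64encode(f"{username}:{api_key}".encode()).decode()
    docker_config = {
        "auths": {
            registry: {"username": username, "password": api_key, "auth": auth}
        }
    }
    encoded = base64.b64encode(json.dumps(docker_config).encode()).decode()
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {".dockerconfigjson": encoded},
    }


if __name__ == "__main__":
    # NGC_API_KEY is an assumed environment variable name for this sketch.
    key = os.environ.get("NGC_API_KEY", "replace-me")
    secret = ngc_pull_secret("ngc-pull-secret", key)
    print(json.dumps({**secret, "data": {".dockerconfigjson": "<redacted>"}}, indent=2))
```

Pods then reference the secret by name under `imagePullSecrets`, so the API key never appears in the workload manifest itself.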
By the end of this submodule, students will be able to use the NGC CLI to locate and retrieve optimized AI containers, customize them for specific workloads, and deploy them at scale using Helm charts on GPU-powered Kubernetes clusters. They will have a clear understanding of how to move seamlessly from development to production deployment, leveraging NVIDIA’s optimized containers to maximize performance while minimizing operational complexity.
Pretrained Models and SDKs Available on NGC (4:15)
The NVIDIA NGC registry is not only a source for GPU-optimized containers but also a powerful repository of pretrained models and domain-specific SDKs that can drastically accelerate AI development. In this submodule, students will explore how to leverage these resources to bypass the most time-consuming stages of AI projects, enabling them to move from concept to deployment with unprecedented speed and efficiency.
The learning begins with an understanding of pretrained models and why they have become essential in modern AI workflows. Training a high-performing model from scratch often requires massive datasets, specialized hardware, and weeks or months of compute time. Pretrained models from NGC eliminate this barrier by offering ready-to-use neural networks that have already been trained on large, high-quality datasets using NVIDIA’s optimized hardware. Students will see how these models can be applied immediately for inference or fine-tuned for specific tasks through transfer learning, significantly reducing development cycles.
Students will explore the NGC model catalog, which contains pretrained models for a wide variety of applications. For computer vision, they will find models for image classification, object detection, segmentation, and pose estimation. For natural language processing (NLP), the catalog offers transformer-based models such as BERT, Megatron-LM, and GPT variants optimized for NVIDIA GPUs. For speech AI, they will discover models from Riva for automatic speech recognition (ASR), text-to-speech (TTS), and translation. Industry-specific models are also available, including Clara medical imaging models for healthcare, Metropolis video analytics models for smart cities, and Merlin recommender system models for retail and e-commerce personalization.
Alongside pretrained models, NGC provides domain-optimized SDKs that bundle together frameworks, tools, and reference applications. These SDKs offer a complete environment for rapid development in targeted industries. Students will explore SDKs such as DeepStream for real-time video analytics, Riva for conversational AI, NeMo for large-scale NLP model training, and Clara for healthcare AI applications. Each SDK comes with pre-integrated workflows, making it easier to adapt them to specific business requirements without starting from scratch.
The submodule will guide students through the process of downloading models and SDKs from NGC, verifying their integrity, and integrating them into their AI pipelines. They will learn how to choose between using a model directly for inference or adapting it through fine-tuning to suit unique data distributions and domain-specific challenges. Special attention will be given to model optimization techniques such as quantization, pruning, and TensorRT acceleration to ensure that even pretrained models deliver the best possible performance in production.
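As a rough illustration of what quantization does, the sketch below applies symmetric per-tensor INT8 quantization to a list of weights: values are scaled so the largest magnitude maps to 127, rounded to integers, and later rescaled. This is a textbook simplification chosen for clarity, not TensorRT's calibration procedure, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to the range [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map INT8 values back to approximate floating-point values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [-1.0, -0.25, 0.0, 0.1, 1.0]  # made-up values for illustration
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print("quantized:", q)
    print(f"scale: {scale:.6f}, worst-case error: {worst:.6f}")
```

Each restored value differs from the original by at most half the scale, which is the accuracy cost traded for a 4x smaller footprint relative to FP32.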
Students will also examine real-world case studies of organizations that have successfully leveraged NGC’s pretrained models and SDKs to accelerate their AI initiatives, from autonomous vehicle perception systems to multilingual customer support chatbots. This will reinforce the practical, business-oriented benefits of using NGC resources.
By the end of this submodule, students will have the ability to navigate the NGC model and SDK catalog, identify the most suitable resources for their projects, and integrate them seamlessly into their AI workflows. They will understand how pretrained models and industry SDKs can serve as both a launchpad for rapid prototyping and a foundation for scalable, production-grade AI solutions.
Container Trust, Licensing, and Security Best Practices (3:32)
In this submodule, students will explore the critical aspects of trust, licensing, and security when working with NVIDIA AI containers from the NGC registry. While performance and functionality are essential, ensuring that containers are secure, verified, and compliant with licensing terms is equally important for production-grade deployments, especially in enterprise environments where governance and regulatory requirements are strict.
The learning journey begins with an understanding of container trust. Students will discover how NVIDIA digitally signs containers available on the NGC registry, allowing developers to verify that the container they are using has not been tampered with and originates from a trusted source. This verification process is crucial for maintaining integrity in production systems, preventing malicious code injection, and safeguarding intellectual property. The submodule will emphasize the importance of verifying container signatures before deployment and how to incorporate signature checks into automated CI/CD pipelines to ensure ongoing trust.
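To make the integrity idea concrete, here is a minimal sketch of the digest comparison that any tamper check ultimately rests on. It is not NVIDIA's signing workflow: the function name and the sample bytes are hypothetical, and a real pipeline would verify a cryptographic signature over the image digest rather than a locally pinned hash.

```python
import hashlib
import hmac


def digest_matches(blob: bytes, expected_sha256: str) -> bool:
    """Return True when the SHA-256 of blob equals the expected hex digest."""
    actual = hashlib.sha256(blob).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(actual, expected_sha256.lower())


if __name__ == "__main__":
    layer = b"example image layer bytes"        # stand-in for real image content
    pinned = hashlib.sha256(layer).hexdigest()  # digest recorded at release time
    print(digest_matches(layer, pinned))         # unmodified content
    print(digest_matches(layer + b"x", pinned))  # tampered content
```

In a CI/CD job, a check of this kind would run before deployment and fail the job on a mismatch.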
Next, students will explore licensing considerations for NVIDIA software. Many NVIDIA enterprise-grade solutions, such as NVIDIA AI Enterprise, DeepStream, or certain SDKs, require proper license activation for full functionality and compliance. The submodule will cover the use of the NVIDIA License System (NLS), detailing how to obtain licenses, configure license servers, and manage license entitlements across multiple deployments. Students will also learn the difference between evaluation, developer, and enterprise licenses, and how these affect redistribution, scaling, and commercial usage rights. This knowledge is essential for organizations that must remain compliant with legal and contractual obligations.
Security best practices form the final and most extensive section of this submodule. Students will learn about the security risks of containerized AI workloads, including vulnerabilities in base images, misconfigured permissions, and potential data leaks. They will explore methods to mitigate these risks, such as using minimal base images, applying timely security patches, and enforcing the principle of least privilege in container environments. Additionally, they will learn how to use private container registries for sensitive or proprietary workloads, ensuring that only authorized users can access and deploy them.
Special attention will be given to runtime security. Students will see how GPU-enabled containers require additional consideration, as GPU drivers and runtime components must also be secured and kept up to date. They will understand the implications of allowing privileged access to GPU devices inside containers, and how to balance performance requirements with security controls.
By the end of this submodule, students will be proficient in verifying container trust through signature validation, managing NVIDIA licensing for compliance, and implementing robust security best practices for AI container deployments. They will appreciate that deploying an AI solution is not only about speed and performance but also about building a trustworthy, compliant, and secure foundation. This ensures that their GPU-accelerated AI systems can operate confidently in production without exposing organizations to unnecessary risks.
Hands-on Lab: Pull, modify, and run a Deep Learning container from NGC0:03
In this hands-on submodule, students will apply the knowledge gained in the earlier sections of the module to practically work with an NVIDIA deep learning container from the NGC registry. The goal is to gain confidence in pulling, customizing, and executing a GPU-optimized container for a real AI workload. This lab will simulate the process of going from container discovery to functional AI deployment, covering every step in between.
The exercise begins with discovering a suitable deep learning container on the NGC registry. Students will learn how to navigate the catalog, filter results to identify containers that are GPU-optimized, and examine the detailed documentation provided for each container. They will understand the significance of version tags, framework compatibility (e.g., TensorFlow, PyTorch, RAPIDS), and CUDA/cuDNN versions to ensure the container is fully aligned with their environment and hardware capabilities.
Once the container is selected, students will proceed to authenticate with NGC using the CLI. They will generate an API key from their NGC account, configure the CLI for access, and pull the container image to their local machine or cloud-based GPU instance. The importance of verifying image integrity through NVIDIA’s digital signature checks will be reinforced here, ensuring the container comes from a trusted and unaltered source.
The next phase involves running the container interactively with GPU acceleration enabled. Students will mount local directories into the container for accessing datasets, scripts, and output results. They will practice adjusting runtime options for GPU allocation, environment variables, and batch size settings to optimize performance. Inside the container, students will explore the pre-installed deep learning framework, run a sample training or inference script, and verify that the GPU is being fully utilized by monitoring system metrics.
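The run step described above can be sketched as a small helper that assembles a `docker run` command line. This is an illustration under assumptions: the image tag `YY.MM-py3` is a placeholder, and the mount paths and the `BATCH_SIZE` variable are hypothetical. Only standard Docker flags (`--rm`, `-it`, `--gpus`, `-v`, `-e`) are used.

```python
import shlex


def docker_run_args(image, host_dir, container_dir="/workspace/data",
                    gpus="all", env=None):
    """Build an argv list for an interactive, GPU-enabled docker run."""
    args = ["docker", "run", "--rm", "-it", "--gpus", gpus,
            "-v", f"{host_dir}:{container_dir}"]
    # Environment variables are passed one per -e flag, in a stable order.
    for key, value in sorted((env or {}).items()):
        args += ["-e", f"{key}={value}"]
    args.append(image)
    return args


if __name__ == "__main__":
    argv = docker_run_args("nvcr.io/nvidia/pytorch:YY.MM-py3",  # placeholder tag
                           "/home/user/datasets",               # hypothetical path
                           env={"BATCH_SIZE": "32"})            # hypothetical variable
    print(shlex.join(argv))
```

Building the command as a list rather than a string keeps quoting safe if it is later passed to `subprocess.run`.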
Customization is the next critical step. Students will add their own code, modify configuration files, and install additional Python packages to extend the container’s functionality. This demonstrates the flexibility of NGC containers as a starting point that can be adapted to meet the unique requirements of different AI projects. They will also learn how to commit their modified container into a new image, tag it appropriately, and store it in a private registry for secure future use.
The lab concludes with exporting results and cleaning up resources. Students will save their trained model checkpoints, logs, and other artifacts to persistent storage outside the container. They will then stop the container, remove unnecessary images, and document their workflow for reproducibility.
By the end of this hands-on lab, students will have completed the full lifecycle of working with an NVIDIA NGC deep learning container—from discovery and secure retrieval to execution, customization, and preservation for production use. This experience will give them the practical skills needed to integrate NGC containers into their AI development pipelines, ensuring they can deploy GPU-accelerated AI workloads rapidly and reliably in real-world scenarios.

Triton Inference Server: Architecture & Features6:16
The NVIDIA Triton Inference Server is a powerful open-source platform designed to simplify and accelerate AI inference across a wide range of deployment environments. In this submodule, students will explore Triton’s architecture, supported model formats, and advanced features that make it a critical component in scaling AI workloads from prototype to production.
The learning journey starts with an introduction to the core purpose of Triton—to serve machine learning models efficiently in production, regardless of the framework in which they were trained. Students will understand how Triton enables deployment of multiple models simultaneously, each potentially from different frameworks such as TensorFlow, PyTorch, ONNX Runtime, or TensorRT, within a single inference server instance. This flexibility eliminates the need to build and maintain separate serving stacks for different model types, streamlining operations.
From there, students will dive into Triton’s modular architecture, learning about the HTTP/REST and gRPC endpoints that allow applications to send inference requests. They will explore the backend architecture, where each backend is responsible for executing a specific model format or runtime, and the scheduler that optimizes request handling to minimize latency and maximize throughput. Triton’s dynamic batching capabilities will be covered in depth, showing how requests from multiple clients can be combined into larger batches to improve GPU utilization without increasing response time beyond acceptable thresholds.
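The trade-off behind dynamic batching can be illustrated with a toy simulation. This is a simplified model and not Triton's scheduler: the function name is hypothetical, and the only knobs are a maximum batch size and a maximum queue delay.

```python
def form_batches(arrivals_us, max_batch_size, max_queue_delay_us):
    """Group sorted request arrival times (microseconds) into batches.

    The queued batch is dispatched when it is full, or when the next
    request arrives after the oldest queued request has already waited
    longer than max_queue_delay_us.
    """
    batches, current = [], []
    for t in arrivals_us:
        if current and (len(current) == max_batch_size
                        or t - current[0] > max_queue_delay_us):
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    arrivals = [0, 10, 20, 500, 510]
    print(form_batches(arrivals, max_batch_size=4, max_queue_delay_us=100))
    print(form_batches(arrivals, max_batch_size=2, max_queue_delay_us=100))
```

A larger delay budget yields fuller batches and better GPU utilization, while a smaller one keeps per-request latency closer to the unbatched case.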
Students will also explore model versioning within Triton, which allows multiple versions of a model to be deployed and served concurrently. This feature enables A/B testing, rollback strategies, and smooth transitions between updated models without disrupting production services.
The submodule will highlight Triton’s integration with TensorRT, NVIDIA’s high-performance inference optimizer and runtime. Students will learn how TensorRT can drastically reduce latency, increase throughput, and optimize models for specific GPU architectures through techniques like layer fusion, precision calibration, and quantization.
Triton’s monitoring and metrics capabilities will be another key focus. Students will see how the server integrates with Prometheus to expose performance metrics such as request counts, latency histograms, and GPU utilization, enabling data-driven optimization of inference services. The importance of these metrics in ensuring service-level objectives (SLOs) are met in production environments will be emphasized.
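As a rough illustration of consuming these metrics, the sketch below parses a few lines of Prometheus text exposition format. The sample values are made up for the example, and the parser is deliberately minimal (it ignores optional timestamps and escaped label values); a production setup would normally let Prometheus scrape the endpoint instead.

```python
import re

_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')
_LABEL = re.compile(r'(\w+)="([^"]*)"')

# Illustrative sample only; the numbers are invented for this sketch.
SAMPLE = """\
# TYPE nv_inference_request_success counter
nv_inference_request_success{model="resnet50",version="1"} 128
nv_inference_request_failure{model="resnet50",version="1"} 2
"""


def parse_metrics(text):
    """Parse exposition-format lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = _LINE.match(line)
        if match:
            labels = dict(_LABEL.findall(match.group("labels") or ""))
            samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```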
Security considerations will also be covered, including deploying Triton behind API gateways, securing endpoints with authentication and encryption, and managing access control in multi-tenant environments.
By the end of this submodule, students will have a comprehensive understanding of Triton’s architecture and features. They will know how Triton can serve as a unified, optimized inference platform capable of handling diverse workloads at scale, and they will be ready to configure it for real-world deployments. This foundation will prepare them for subsequent submodules, where they will dive deeper into model formats, TAO Toolkit integration, and performance optimization.
Model Formats: TensorRT, ONNX, TorchScript, etc.4:10
In this submodule, students will gain a deep understanding of the model formats supported by NVIDIA Triton Inference Server and why choosing the right format is critical for performance, compatibility, and deployment scalability. The focus will be on TensorRT, ONNX, TorchScript, and other popular inference-ready formats, with an emphasis on how each fits into the AI production pipeline.
The learning begins with TensorRT, NVIDIA’s specialized, high-performance inference engine. Students will discover how TensorRT takes trained models from frameworks like TensorFlow or PyTorch and applies advanced graph optimizations such as layer fusion, kernel auto-tuning, precision calibration for FP16 and INT8, and memory optimizations. These transformations can significantly reduce inference latency and improve throughput, especially on NVIDIA GPU architectures like A100, H100, and Jetson Orin. Students will learn how to convert models into a serialized TensorRT engine (commonly saved as a .plan file, which Triton’s TensorRT backend loads), and understand when this conversion is most beneficial.
Next, the submodule explores the ONNX (Open Neural Network Exchange) format, which is designed for framework interoperability. Students will learn how ONNX allows models trained in PyTorch, TensorFlow, or other frameworks to be exported into a standardized format that can be deployed on a wide variety of runtimes, including Triton’s ONNX Runtime backend. This section will cover how to export a model to ONNX, validate its graph, and troubleshoot compatibility issues that can arise during conversion. The advantages of ONNX for cross-platform deployment and long-term maintainability will also be emphasized.
The discussion then moves to TorchScript, a PyTorch-specific model representation that enables optimized execution in production environments without the need for the original Python training environment. Students will explore the two main approaches to generating TorchScript models—tracing and scripting—and understand when each is most appropriate. They will see how TorchScript can be loaded directly into Triton’s PyTorch backend, enabling a fast path from research to production for PyTorch models.
Beyond these major formats, students will also examine support for other specialized model types, such as TensorFlow SavedModel, Python custom backends for unique inference logic, and ensemble models that combine multiple models into a single pipeline within Triton. The concept of model repositories in Triton will be introduced, showing how different formats can coexist, enabling hybrid workloads that leverage the strengths of multiple frameworks.
Performance considerations will be woven throughout the submodule. Students will learn how certain formats yield faster inference depending on the target hardware, how precision modes like FP32, FP16, and INT8 impact speed and accuracy, and how to benchmark models to determine the optimal deployment format for a given use case.
By the end of this submodule, students will understand the strengths, weaknesses, and best use cases for TensorRT, ONNX, TorchScript, and other supported formats in Triton. They will be able to select the ideal format for their application, perform necessary conversions, and ensure models are deployment-ready for high-performance GPU inference at scale. This knowledge will form the foundation for applying TAO Toolkit and optimization techniques in the next stages of the course.
TAO Toolkit for Transfer Learning & Quantization4:24
In this submodule, students will dive into the NVIDIA TAO Toolkit—a powerful, enterprise-grade framework designed to simplify transfer learning, model fine-tuning, and quantization for GPU-accelerated AI workflows. The focus will be on how TAO enables developers to start with high-quality pretrained models from the NVIDIA NGC registry and adapt them quickly to their own custom datasets without having to train from scratch, drastically reducing development time and compute costs.
The learning begins with a deep understanding of transfer learning and why it has become a dominant approach in production AI development. Students will explore how pretrained models, already trained on massive datasets like ImageNet or large-scale speech corpora, contain valuable feature representations that can be fine-tuned with smaller, domain-specific datasets. This makes it possible to achieve state-of-the-art performance with significantly fewer training iterations, reduced labeling requirements, and lower GPU resource consumption.
From here, the submodule introduces the TAO Toolkit architecture and its command-line-driven workflow. Students will learn about the key stages of the TAO pipeline—data preparation, model training or fine-tuning, evaluation, pruning, quantization, and export—and how each stage integrates seamlessly with NVIDIA GPU acceleration. Real-world examples will illustrate how TAO can fine-tune object detection models for retail store analytics, ASR models for domain-specific speech commands, or medical imaging models for healthcare diagnostics.
A major focus will be placed on quantization, the process of reducing a model’s numerical precision to improve performance while maintaining accuracy. Students will explore FP16 and INT8 quantization in detail, understanding how reduced-precision models can dramatically lower inference latency and memory footprint—particularly important for edge AI deployments on devices like Jetson Xavier and Jetson Orin. They will also learn how TAO integrates quantization with TensorRT optimization, ensuring that exported models are ready for maximum efficiency in Triton Inference Server or DeepStream pipelines.
The submodule will then cover model pruning, a process that removes unnecessary parameters from the network to further improve inference speed without significant accuracy degradation. Students will understand how pruning can work in tandem with quantization to produce lightweight, high-speed models that still meet production accuracy requirements.
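The idea of removing low-importance parameters can be sketched with a generic magnitude-based channel pruning rule. This is an assumption-laden illustration and not TAO's actual implementation: filters are plain Python lists, and the function name and threshold are hypothetical.

```python
import math


def prune_channels(filters, threshold):
    """Keep filters whose L2 norm, relative to the largest norm, is >= threshold.

    filters: list of flat weight lists, one per output channel.
    Returns (kept_filters, kept_indices).
    """
    norms = [math.sqrt(sum(w * w for w in f)) for f in filters]
    largest = max(norms) or 1.0  # avoid dividing by zero for all-zero layers
    kept = [i for i, n in enumerate(norms) if n / largest >= threshold]
    return [filters[i] for i in kept], kept


if __name__ == "__main__":
    layer = [[3.0, 4.0], [0.6, 0.8], [0.0, 0.1]]
    pruned, indices = prune_channels(layer, threshold=0.1)
    print(indices, pruned)
```

After a step like this, the smaller network is usually retrained briefly to recover any lost accuracy.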
Students will also gain hands-on exposure to experiment tracking and hyperparameter tuning within TAO. They will see how altering batch sizes, learning rates, and augmentation strategies can directly impact model accuracy and performance. Best practices for dataset preparation will be covered, including labeling standards, augmentation strategies, and class balance considerations.
By the end of this submodule, students will have a complete understanding of how to use the NVIDIA TAO Toolkit for transfer learning, pruning, and quantization. They will be capable of taking a pretrained NVIDIA model, fine-tuning it for a domain-specific use case, optimizing it for latency and throughput, and exporting it in an inference-ready format for deployment with Triton or DeepStream. This skillset will allow them to build highly optimized, production-ready AI models in a fraction of the time and cost compared to traditional training approaches.
Performance Optimization Techniques using TensorRT4:38
In this submodule, students will gain deep expertise in using NVIDIA TensorRT, a high-performance deep learning inference optimizer and runtime, to maximize the speed, efficiency, and scalability of AI models in production. They will learn how TensorRT transforms trained models into highly optimized deployment artifacts that take full advantage of GPU hardware acceleration, achieving significantly lower inference latency and higher throughput compared to unoptimized models.
The journey begins with an understanding of TensorRT’s optimization pipeline. Students will explore how TensorRT parses models from various frameworks—such as TensorFlow, PyTorch, or ONNX—and applies a series of graph optimizations to streamline computation. These optimizations include layer fusion (combining multiple operations into a single GPU kernel), kernel auto-tuning (selecting the most efficient kernel implementation for the target GPU), and memory optimizations to reduce data movement overhead.
A core focus will be on precision calibration. Students will see how TensorRT supports multiple numerical precisions—FP32, FP16, and INT8—and how reducing precision can drastically improve performance while keeping accuracy within acceptable thresholds. They will learn how FP16 precision offers a balanced trade-off between speed and accuracy for most workloads, while INT8 quantization can deliver up to 4x performance improvement on supported hardware. Practical guidance will be provided on how to run calibration processes to maintain accuracy when moving to lower precision.
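The arithmetic behind INT8 quantization can be shown with a minimal symmetric, per-tensor scheme that uses max-abs calibration. This is a simplification: TensorRT's calibrators choose dynamic ranges with more sophisticated methods, and the function names here are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization using max-abs calibration."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs else 1.0
    # Round to the nearest integer step and clamp to the symmetric INT8 range.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map INT8 codes back to approximate floating-point values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [-1.0, -0.25, 0.0, 0.4, 1.0]
    codes, scale = quantize_int8(weights)
    restored = dequantize(codes, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(codes, scale, worst)
```

The worst-case rounding error is about half a quantization step (`scale / 2`), which is why a representative calibration set matters: outliers inflate the scale and coarsen every other value.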
Another key area will be dynamic shapes and batch optimization. Students will understand how TensorRT can optimize models for varying input sizes and batch sizes, enabling a single engine to serve multiple use cases efficiently. They will also learn how to configure max batch sizes and use dynamic batching in deployment scenarios—particularly when serving models with Triton Inference Server—to balance latency and GPU utilization.
Students will also explore plugin layers for TensorRT, which allow developers to implement custom operations that TensorRT does not support natively. This flexibility ensures that even specialized AI architectures can be optimized and deployed with TensorRT without being constrained by its built-in operator library.
The submodule will highlight profiling and benchmarking techniques using tools like trtexec to measure performance improvements from each optimization step. Students will learn how to interpret metrics such as latency, throughput (FPS), and GPU utilization to make data-driven decisions about further tuning. Real-world examples will demonstrate how to use profiling to identify bottlenecks, adjust parameters, and achieve optimal performance for specific deployment hardware—whether that’s a high-end H100 GPU in a data center or a Jetson Orin at the edge.
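Interpreting benchmark output comes down to a few summary statistics. The sketch below assumes latencies were measured for back-to-back, non-overlapping requests, so throughput is simply items processed divided by total time; the function name is hypothetical, and tools such as trtexec report comparable figures directly.

```python
import math
import statistics


def summarize(latencies_ms, batch_size=1):
    """Summarize latencies from back-to-back (non-overlapping) requests."""
    ordered = sorted(latencies_ms)

    def percentile(p):
        # Nearest-rank percentile: smallest value covering p% of samples.
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    total_seconds = sum(ordered) / 1000.0
    return {
        "mean_ms": statistics.fmean(ordered),
        "p50_ms": percentile(50),
        "p99_ms": percentile(99),
        "throughput_per_s": batch_size * len(ordered) / total_seconds,
    }


if __name__ == "__main__":
    measured = list(range(1, 101))  # synthetic latencies: 1 ms .. 100 ms
    print(summarize(measured))
```

Tail percentiles such as p99 are usually what service-level objectives are written against, since the mean hides occasional slow requests.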
By the end of this submodule, students will be proficient in applying TensorRT optimization techniques to prepare AI models for high-performance inference. They will understand how to select the right precision, tune for batch sizes, use custom plugins, and measure results to ensure that models meet strict production performance requirements. This knowledge will enable them to confidently deploy GPU-optimized AI workloads that operate at maximum efficiency, whether in cloud, on-premises, or edge environments.
Hands-on Lab: Optimize and serve a model using Triton Inference Server0:02
In this hands-on submodule, students will bring together all the concepts from Module 3 to perform a complete optimization and deployment workflow using TensorRT and the NVIDIA Triton Inference Server. The lab will guide them step-by-step through the process of taking a trained deep learning model, optimizing it for GPU-accelerated inference, and serving it at scale with Triton.
The exercise begins with selecting and preparing a trained model. Students will either use a provided sample model or export their own from frameworks such as PyTorch or TensorFlow into an ONNX format, which serves as the bridge to TensorRT. They will validate the model to ensure compatibility, checking for unsupported layers and performing any necessary graph simplifications before optimization.
The next stage focuses on TensorRT optimization. Students will import the ONNX model into TensorRT, configuring parameters such as maximum batch size, workspace memory allocation, and precision mode (FP32, FP16, or INT8). They will run INT8 calibration if needed, using a representative dataset to preserve accuracy while maximizing performance gains. The result will be a TensorRT engine file (.plan)—an optimized, deployment-ready model tailored to the target GPU.
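One common way to perform this conversion is the `trtexec` command-line tool. The helper below only assembles the command line; it is a sketch in which the file names and the tensor name `input` are hypothetical, and flag availability can vary between TensorRT versions.

```python
import shlex


def trtexec_args(onnx_path, engine_path, precision="fp16", shapes=None):
    """Build a trtexec argv list for converting an ONNX model to an engine.

    shapes maps minShapes / optShapes / maxShapes to specs such as
    "input:1x3x224x224", where "input" is the model's input tensor name.
    """
    args = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if precision in ("fp16", "int8"):
        args.append(f"--{precision}")
    elif precision != "fp32":
        raise ValueError(f"unsupported precision: {precision}")
    for flag in ("minShapes", "optShapes", "maxShapes"):
        if shapes and flag in shapes:
            args.append(f"--{flag}={shapes[flag]}")
    return args


if __name__ == "__main__":
    argv = trtexec_args(
        "model.onnx", "model.plan", precision="fp16",
        shapes={"minShapes": "input:1x3x224x224",   # hypothetical tensor name
                "optShapes": "input:8x3x224x224",
                "maxShapes": "input:16x3x224x224"})
    print(shlex.join(argv))
```

The min/opt/max shapes define the range of batch sizes the resulting engine will accept, with kernels tuned for the `opt` shape.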
Once the optimized engine is ready, students will set up the Triton Inference Server model repository. They will create the correct folder structure and add a config.pbtxt file that defines model properties, input/output dimensions, batch size configurations, and optimization settings. Special attention will be given to enabling dynamic batching, which allows Triton to combine requests from multiple clients into larger batches, improving GPU utilization at the cost of a small, configurable queuing delay.
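A repository of this shape can be sketched with a short script. The layout (`<model>/config.pbtxt` plus `<model>/<version>/model.plan`) follows Triton's documented convention, but the model name, tensor names, dimensions, and batching values below are assumptions chosen for illustration, and the engine bytes are a stand-in for a real file.

```python
import tempfile
from pathlib import Path

# Tensor names and dims are hypothetical; match them to the real model.
CONFIG_TEMPLATE = """\
name: "{name}"
platform: "tensorrt_plan"
max_batch_size: {max_batch}
input [
  {{
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }}
]
output [
  {{
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }}
]
dynamic_batching {{
  preferred_batch_size: [ {preferred} ]
  max_queue_delay_microseconds: {delay_us}
}}
instance_group [
  {{
    count: 1
    kind: KIND_GPU
  }}
]
"""


def create_model_repo(root, name, engine_bytes, version=1, max_batch=8,
                      preferred=(4, 8), delay_us=100):
    """Write <root>/<name>/config.pbtxt and <root>/<name>/<version>/model.plan."""
    model_dir = Path(root) / name
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    (version_dir / "model.plan").write_bytes(engine_bytes)
    config = CONFIG_TEMPLATE.format(
        name=name, max_batch=max_batch, delay_us=delay_us,
        preferred=", ".join(str(b) for b in preferred))
    (model_dir / "config.pbtxt").write_text(config)
    return model_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        repo = create_model_repo(root, "resnet50_trt", b"placeholder engine")
        print(sorted(str(p.relative_to(root)) for p in repo.rglob("*")))
```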
The deployment phase will involve running the Triton server with the optimized model loaded. Students will expose both HTTP/REST and gRPC endpoints so that client applications can send inference requests. Using provided sample clients, they will send data to the server, measure inference latency and throughput, and verify that the outputs match expectations. They will also monitor Prometheus metrics exposed by Triton, tracking GPU utilization, request counts, and response times in real time.
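For the HTTP path, a request body can be assembled by hand to see what a client sends. The sketch below builds (but does not send) a JSON request in the KServe v2 inference protocol that Triton implements; the host, model name, and tensor name are hypothetical, and port 8000 is Triton's default HTTP port.

```python
import json
import math


def build_infer_request(host, model, input_name, shape, data,
                        datatype="FP32", http_port=8000):
    """Return (url, json_body) for a v2 inference request with one input."""
    if math.prod(shape) != len(data):
        raise ValueError("data length does not match the product of shape")
    url = f"http://{host}:{http_port}/v2/models/{model}/infer"
    body = {
        "inputs": [{
            "name": input_name,
            "shape": list(shape),
            "datatype": datatype,
            "data": list(data),  # flattened, row-major tensor contents
        }]
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    url, body = build_infer_request(
        "localhost", "resnet50_trt", "input", [1, 4], [0.1, 0.2, 0.3, 0.4])
    print(url)
    print(body)
```

The returned pair can be posted with any HTTP client; NVIDIA's client libraries wrap the same protocol and add binary tensor transfer.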
A critical component of this lab is performance benchmarking and tuning. Students will experiment with adjusting parameters such as batch size, number of concurrent model instances, and execution threads to find the optimal configuration for their specific hardware and workload. They will see firsthand how even small adjustments can have significant effects on performance.
The final step will be documenting and containerizing the deployment. Students will create a Docker image containing the optimized model and Triton configuration, ensuring the deployment is portable across environments. This will prepare them for scenarios where the same inference service needs to be replicated on multiple GPU nodes in the cloud or edge devices in the field.
By the end of this lab, students will have gone through the full optimization and serving lifecycle—from raw model to fully optimized TensorRT engine, deployed at scale with Triton, benchmarked, and containerized. This practical experience will cement their ability to deliver production-ready AI inference services that are fast, scalable, and GPU-optimized.

DeepStream SDK Overview: Real-Time Video & Sensor Processing5:19
In this submodule, students will explore the NVIDIA DeepStream SDK, a specialized framework for creating real-time AI-powered video and sensor analytics applications. The focus will be on understanding DeepStream’s architecture, supported use cases, and how it enables high-throughput inference across multiple streams and devices.
The session begins with an introduction to what DeepStream is and why it is a key part of NVIDIA’s end-to-end AI ecosystem. Students will learn that DeepStream is designed for low-latency, high-throughput AI inference, capable of handling multiple concurrent video or sensor streams on NVIDIA GPUs, including Jetson edge devices, NVIDIA A100/H100 data center GPUs, and cloud GPU instances.
Students will then explore the DeepStream architecture, which is built on top of GStreamer, an open-source multimedia framework. They will understand how GStreamer pipelines allow developers to connect various components—such as video capture, decoding, preprocessing, inference, post-processing, and rendering—into a highly optimized workflow. This modular design makes DeepStream flexible and scalable across industries and hardware platforms.
The submodule will detail how DeepStream integrates with TensorRT for optimized inference, enabling AI models to run at maximum efficiency on NVIDIA GPUs. Students will learn that DeepStream supports models from TensorFlow, PyTorch, ONNX, and other frameworks, allowing seamless deployment of existing AI solutions.
Real-world applications will be emphasized throughout the lesson. Students will see how DeepStream powers video surveillance systems capable of detecting and tracking multiple objects in real time, retail analytics solutions that count customers and analyze shopping behavior, smart city traffic monitoring systems that classify vehicles and detect incidents, and industrial inspection systems that identify defects on assembly lines.
Another key area will be sensor fusion capabilities. Students will explore how DeepStream can ingest not just video but also additional sensor data—such as LiDAR, radar, or IoT feeds—and merge them into a unified AI pipeline for advanced situational awareness.
The submodule will also highlight deployment options for DeepStream applications. Students will learn how to run DeepStream on Jetson devices for edge AI use cases, on DGX systems for high-scale enterprise analytics, or in containerized cloud environments for distributed deployments.
By the end of this submodule, students will have a solid understanding of what the DeepStream SDK is, how it works, and why it’s essential for building high-performance AI video and sensor analytics solutions. They will be ready to move forward into building customized real-time AI pipelines for specific industries, which will be covered in the next submodule.
Building Real-Time Pipelines for Surveillance, Retail, or IoT4:50
In this submodule, students will move from understanding DeepStream’s architecture to actually designing and building real-time AI pipelines tailored to specific domains such as video surveillance, retail analytics, and IoT sensor processing. The focus will be on how to translate a real-world problem into an optimized end-to-end NVIDIA DeepStream workflow that can run in production at scale.
The learning begins with pipeline design principles. Students will understand that building a DeepStream pipeline involves connecting modular components—video or sensor inputs, decoding, preprocessing, AI inference, post-processing, and output streaming—into a GStreamer-based graph. They will explore how to plan these pipelines for minimal latency, high throughput, and efficient GPU utilization.
For video surveillance use cases, students will learn how to ingest multiple high-resolution camera feeds simultaneously, decode them efficiently using NVDEC hardware acceleration, and process them with object detection or multi-object tracking models. Real-time examples will demonstrate how DeepStream enables license plate recognition, facial recognition, and intrusion detection in smart security systems.
In the retail analytics domain, the submodule will focus on pipelines that process customer movement patterns, perform people counting, detect shelf stocking levels, and generate heatmaps of in-store traffic. Students will see how to integrate these analytics with external data systems—such as POS data or inventory management APIs—to deliver actionable business intelligence.
For IoT-focused applications, students will explore how DeepStream can handle non-video sensor inputs such as LiDAR point clouds, radar signals, and MQTT-based IoT messages. They will learn how to integrate these streams alongside video feeds for sensor fusion, enabling use cases like autonomous vehicle perception or industrial safety monitoring.
An important part of this submodule will be model integration. Students will learn how to deploy pretrained or custom models into the pipeline, configure them in DeepStream’s config_infer_primary.txt files, and link them to inference plugins. They will understand how to manage multiple inference stages, such as running object detection first, followed by attribute classification on detected objects.
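A configuration file of this kind can be generated programmatically. The sketch below writes only a handful of `[property]` keys used by DeepStream's nvinfer plugin; the file names and class counts are hypothetical, and a working configuration typically needs further keys (labels, output parsing, thresholds) that are omitted here.

```python
import configparser
import io

# nvinfer network-mode values: 0 = FP32, 1 = INT8, 2 = FP16
_NETWORK_MODE = {"fp32": 0, "int8": 1, "fp16": 2}


def nvinfer_config(onnx_file, num_classes, batch_size=1, precision="fp16",
                   unique_id=1, operate_on=None):
    """Render a minimal nvinfer [property] section as text.

    operate_on=None configures a primary stage (full frames); otherwise the
    stage runs as secondary on objects produced by that upstream stage id.
    """
    prop = {
        "gpu-id": 0,
        "onnx-file": onnx_file,
        "batch-size": batch_size,
        "network-mode": _NETWORK_MODE[precision],
        "num-detected-classes": num_classes,
        "gie-unique-id": unique_id,
        "process-mode": 1 if operate_on is None else 2,
    }
    if operate_on is not None:
        prop["operate-on-gie-id"] = operate_on
    parser = configparser.ConfigParser()
    parser["property"] = {key: str(value) for key, value in prop.items()}
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    print(nvinfer_config("detector.onnx", num_classes=4))             # primary
    print(nvinfer_config("attributes.onnx", num_classes=6,
                         unique_id=2, operate_on=1))                  # secondary
```

Chaining a detector with an attribute classifier then amounts to two such files whose `gie-unique-id` and `operate-on-gie-id` values line up.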
The submodule will also address real-time performance tuning. Students will see how to control pipeline buffer sizes, enable asynchronous processing, and adjust batch sizes to achieve the ideal balance between latency and throughput. They will learn techniques for load balancing across multiple GPUs and for scaling pipelines in containerized environments such as Kubernetes.
Finally, students will explore output and integration options. They will learn how to stream processed results over RTSP, send structured event data to Kafka or MQTT brokers, and integrate with cloud-based dashboards for remote monitoring and analytics.
By the end of this submodule, students will be able to design, configure, and deploy a fully functional real-time AI pipeline using DeepStream for a variety of domains. They will understand how to map a problem statement into a production-ready pipeline and be prepared to integrate these pipelines with RAPIDS for advanced GPU-accelerated analytics in the following lessons.
Using RAPIDS for GPU-Accelerated Data Analytics4:29
In this submodule, students will explore RAPIDS, NVIDIA’s open-source suite of libraries for GPU-accelerated data science and analytics, and learn how it can be integrated into AI pipelines to achieve dramatic improvements in data processing speed and scalability. The emphasis will be on how RAPIDS enables data scientists and engineers to perform tasks traditionally handled by CPU-based frameworks—such as pandas, scikit-learn, and Spark—at GPU speeds without fundamentally changing their workflows.
The lesson begins with an introduction to what RAPIDS is and why it’s a game-changer in the AI ecosystem. Students will discover that RAPIDS allows for end-to-end data science pipelines—from ingestion and preprocessing to training and inference—to be run entirely on NVIDIA GPUs. By keeping all data transformations in GPU memory, RAPIDS avoids costly CPU-to-GPU data transfers, drastically reducing latency and increasing throughput.
A key focus will be cuDF, RAPIDS’ GPU DataFrame library, which mirrors the familiar pandas API but executes operations in parallel on the GPU. Students will see how operations like filtering, grouping, and joining massive datasets can be performed 10–100x faster compared to CPU-based approaches. They will also understand how cuDF integrates seamlessly with DeepStream output data, allowing real-time video or sensor event streams to be transformed, enriched, and analyzed on the fly.
The submodule will then introduce cuML, the machine learning library within RAPIDS. Students will explore how algorithms such as k-means clustering, logistic regression, random forests, and PCA can be executed at GPU scale. Real-world examples will show how cuML can be used for anomaly detection in industrial IoT data, customer segmentation in retail, or predictive modeling in smart city applications.
For large-scale distributed workloads, students will learn about Dask with RAPIDS, which enables multi-GPU and multi-node execution. They will see how Dask can distribute RAPIDS computations across a cluster, making it possible to handle datasets far larger than a single GPU’s memory while retaining GPU acceleration benefits.
Integration with streaming frameworks will also be covered. Students will understand how RAPIDS can process data from Apache Kafka, MQTT, or Edge AI pipelines in near real time. This is especially powerful in scenarios where DeepStream-generated metadata—such as detected object counts, classifications, and sensor readings—needs to be aggregated, analyzed, and acted upon within milliseconds.
Performance considerations will play a central role in the discussion. Students will learn best practices for optimizing RAPIDS workloads, including memory management, columnar in-memory formats like Apache Arrow, and efficient GPU resource allocation when sharing GPUs between analytics and inference tasks.
By the end of this submodule, students will understand how RAPIDS transforms data analytics from a bottleneck into a high-speed component of the AI lifecycle. They will be capable of ingesting, processing, and analyzing large-scale, real-time datasets entirely on GPUs, enabling faster insights and more responsive AI-driven decision-making. This foundation will prepare them to integrate RAPIDS into real-time streaming architectures in the next submodule.
Integrating Kafka, MQTT, and Stream Processing (5:01)
In this submodule, students will learn how to integrate real-time messaging systems like Apache Kafka and MQTT into AI-driven analytics pipelines, enabling the seamless flow of video, sensor, and inference data between edge devices, cloud systems, and analytical dashboards. The focus will be on understanding how these technologies work, why they are essential for real-time AI, and how they connect with NVIDIA DeepStream and RAPIDS to create scalable, event-driven architectures.
The lesson begins with Apache Kafka, a distributed streaming platform designed for high-throughput, fault-tolerant data pipelines. Students will explore Kafka’s publish-subscribe model, where producers (such as DeepStream AI pipelines) publish structured event messages to topics, and consumers (such as RAPIDS analytics engines or cloud dashboards) subscribe to process those messages in real time. They will learn about Kafka’s brokers, partitions, and offsets, understanding how these features enable horizontal scalability and, together with replication and acknowledgment settings, durable message delivery even at massive data rates.
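The relationship between keys, partitions, and offsets can be illustrated with a small in-memory sketch. It is not a Kafka client: the `TopicSketch` class is hypothetical, and `zlib.crc32` stands in for Kafka's real default partitioner hash. It only shows that messages with the same key land in the same partition and receive increasing offsets there.

```python
import zlib


class TopicSketch:
    """In-memory stand-in for a topic with a fixed number of partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Hash the key to choose a partition (crc32 is an assumption here).
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append(value)
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def read(self, partition, offset):
        # A consumer resumes from a stored offset within one partition.
        return self.partitions[partition][offset:]


topic = TopicSketch(num_partitions=3)
first = topic.publish("camera-7", "vehicle_count=4")
second = topic.publish("camera-7", "vehicle_count=5")
print(first, second)
```

Because ordering is only preserved within a partition, choosing the key (for example, a camera ID) decides which events a downstream consumer sees in order.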
Next, students will turn to MQTT, a lightweight messaging protocol optimized for IoT and edge devices. They will see how MQTT’s broker-client architecture allows resource-constrained devices—such as Jetson Nano or Jetson Orin—to efficiently send sensor data or AI inference results to cloud services over minimal bandwidth. Special attention will be given to how MQTT’s topic hierarchy and quality of service (QoS) levels can be configured to ensure reliable message delivery in environments with intermittent connectivity.
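To make the topic hierarchy concrete, the following sketch implements MQTT-style topic filter matching with the single-level wildcard `+` and the multi-level wildcard `#`. It is a simplified illustration: the function name is hypothetical, and the special handling of topics beginning with `$` is deliberately left out.

```python
def topic_matches(topic_filter, topic):
    """Return True if an MQTT-style filter matches a concrete topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # '#' must be the last level; it matches the parent and all children.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


print(topic_matches("plant/+/temperature", "plant/line1/temperature"))
print(topic_matches("plant/#", "plant/line1/vibration"))
```

A dashboard could subscribe to `plant/#` to see everything, while a maintenance service subscribes only to `plant/+/vibration`. QoS (0, 1, or 2) is chosen separately per publish and per subscription.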
The submodule will then explore integrating these protocols into DeepStream pipelines. Students will learn how DeepStream can output AI inference metadata directly into Kafka topics or MQTT brokers via its message broker plugin, enabling distributed applications to act on AI events in milliseconds. Examples will include streaming object detection metadata from surveillance cameras to a city’s traffic control system or sending product shelf status from retail stores to centralized inventory systems.
Integration with RAPIDS will also be demonstrated. Students will see how Kafka or MQTT streams can be consumed directly into cuDF DataFrames, enabling GPU-accelerated filtering, aggregation, and machine learning inference on live event streams. This allows for scenarios like fraud detection in financial transactions, predictive maintenance in manufacturing, or real-time anomaly detection in IoT networks.
Finally, the lesson will address stream processing best practices. Students will understand the importance of serialization formats such as JSON, Avro, or Apache Arrow for maintaining schema consistency across pipelines. They will also explore strategies for load balancing consumers across multiple GPU nodes, ensuring that processing capacity scales with data volume.
By the end of this submodule, students will be able to design and implement real-time AI event streaming architectures that integrate DeepStream inference, RAPIDS analytics, and Kafka/MQTT messaging into a cohesive system. This skill will empower them to build AI solutions that are not only fast and scalable, but also capable of responding to the physical world in near real time—an essential requirement for smart cities, autonomous systems, and high-speed industrial automation.
Hands-on Lab: Create a DeepStream pipeline with Jetson Nano (0:02)
In this hands-on lab, students will take the theoretical concepts from the earlier submodules and put them into practice by building a fully functional DeepStream pipeline on a Jetson Nano, NVIDIA’s compact yet powerful edge AI development platform. This exercise will provide practical experience in setting up real-time AI video analytics directly on an edge device, preparing students for deploying production-ready systems in smart city, retail, and IoT environments.
The lab begins with preparing the Jetson Nano development environment. Students will learn how to flash the device with the latest JetPack SDK release that supports the Nano (the original Nano is limited to the JetPack 4.x line), which provides CUDA and TensorRT along with a compatible DeepStream SDK build for ARM-based edge computing. They will configure the Nano by choosing between its 5W and 10W power modes (the 10W mode gives maximum performance), setting up cooling solutions, and ensuring proper network connectivity for video stream ingestion and output transmission.
Next, students will work with video input sources. They will connect either a USB camera, an IP camera stream, or load pre-recorded sample footage. This stage introduces them to GStreamer video source elements, teaching how DeepStream leverages these to capture live feeds efficiently while minimizing latency.
The core of the lab involves pipeline construction. Students will configure a DeepStream application that decodes incoming video streams using hardware-accelerated NVDEC, preprocesses frames for inference, and runs object detection models—such as YOLOv5 or SSD-Mobilenet—optimized with TensorRT. They will explore how to adjust model parameters, batch sizes, and inference intervals to balance performance and accuracy on the Nano’s limited GPU resources.
Once inference results are available, students will configure metadata parsing and display. They will learn how to overlay bounding boxes, class labels, and confidence scores onto video streams in real time. More importantly, they will send these inference results as structured JSON metadata to external endpoints. This will be implemented via DeepStream’s message broker plugin, streaming AI event data over MQTT or Kafka to a remote analytics server or dashboard.
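As an illustration of what "structured JSON metadata" might look like, here is a minimal sketch that serializes detections into a JSON event. The field names form a hypothetical schema chosen for this example; they are not DeepStream's actual message schema, which should be taken from the DeepStream documentation.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Detection:
    label: str
    confidence: float
    bbox: tuple  # (left, top, width, height) in pixels


def build_event(camera_id, frame_number, timestamp, detections):
    """Serialize one frame's detections into a JSON string (hypothetical schema)."""
    event = {
        "camera_id": camera_id,
        "frame": frame_number,
        "timestamp": timestamp,
        "objects": [asdict(d) for d in detections],
    }
    return json.dumps(event, sort_keys=True)


payload = build_event(
    "cam-01", 120, "2024-01-01T12:00:00Z",
    [Detection("car", 0.91, (10, 20, 50, 80))],
)
print(payload)
```

Keeping the payload small and flat like this makes it cheap to publish over MQTT or Kafka and easy to validate on the consumer side.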
An advanced step in the lab will involve sensor fusion. Students will simulate integrating additional sensor data—such as temperature readings or motion sensor outputs—into the DeepStream pipeline. This demonstrates how a Jetson Nano can combine multiple inputs into a unified, AI-driven decision-making system, making it more relevant for IoT edge deployments.
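One simple way to combine a sensor stream with video frames is to attach the sensor reading whose timestamp is nearest to each frame. The sketch below assumes sensor timestamps are sorted in ascending order and share a clock with the frames; the function name and the `max_gap` parameter are hypothetical.

```python
from bisect import bisect_left


def nearest_reading(sensor_times, sensor_values, frame_time, max_gap):
    """Return the sensor value closest in time to frame_time, or None if too far."""
    if not sensor_times:
        return None
    i = bisect_left(sensor_times, frame_time)
    # Only the neighbors on either side of the insertion point can be nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(sensor_times)]
    best = min(candidates, key=lambda j: abs(sensor_times[j] - frame_time))
    if abs(sensor_times[best] - frame_time) > max_gap:
        return None
    return sensor_values[best]


times = [0.0, 1.0, 2.0]          # seconds
temperatures = [20.5, 21.0, 21.5]  # degrees Celsius
print(nearest_reading(times, temperatures, 1.2, max_gap=0.5))
```

Returning `None` when the gap is too large avoids fusing a frame with a stale reading, which matters when a sensor drops out.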
The lab will also cover performance profiling and optimization. Students will use DeepStream’s built-in performance counters to monitor FPS, latency, and GPU utilization, then adjust pipeline configurations to improve throughput without degrading detection accuracy.
Finally, students will package and deploy their DeepStream pipeline as a Docker container using the NVIDIA Container Runtime for Jetson. This ensures portability, version control, and easy replication of the pipeline across multiple Nano devices or other Jetson platforms such as Xavier or Orin.
By the end of this lab, students will have hands-on experience in deploying real-time AI analytics at the edge using the Jetson Nano and DeepStream SDK. They will understand how to configure inputs, run optimized inference, send AI events over streaming protocols, and containerize the entire workflow—skills directly applicable to production-grade edge AI deployments in surveillance, retail, and industrial automation.

What is a Digital Twin? Use Cases in Industry 4.0 (4:35)
In this submodule, students will gain a deep understanding of the concept of a Digital Twin, a virtual replica of a physical object, process, or environment that is continuously updated with real-time data from its real-world counterpart. The focus will be on how this technology enables predictive insights, operational efficiency, and accelerated innovation in the context of Industry 4.0.
The lesson begins with the definition and origin of the digital twin concept, tracing its evolution from early CAD-based virtual models to fully dynamic, data-driven simulations powered by AI and IoT. Students will learn that a true digital twin goes beyond static visualization—it incorporates live sensor data, historical analytics, and predictive modeling to mirror and forecast the state of its physical twin.
Next, the module will explore core components of a digital twin: the physical entity being modeled, the digital representation (geometry, physics, AI models), and the data connection layer that synchronizes them. Students will see how AI models integrated into a digital twin allow for real-time anomaly detection, predictive maintenance, and scenario simulation without disrupting physical operations.
A significant part of the submodule will focus on Industry 4.0 applications. In manufacturing, digital twins are used to monitor assembly lines, detect inefficiencies, and simulate new workflows before implementing them physically. In energy and utilities, they help operators manage wind farms, power grids, and pipelines by predicting failures before they occur. In smart cities, digital twins model entire districts, integrating traffic, weather, and infrastructure data to improve urban planning and emergency response.
Students will also explore aerospace and automotive use cases, where digital twins allow engineers to test designs under simulated conditions, monitor fleets in real time, and optimize maintenance schedules. Similarly, in healthcare, digital twins of medical devices, hospital systems, or even human organs are enabling more precise treatment planning and patient monitoring.
The submodule will highlight how AI, IoT, and high-performance computing (HPC) converge to make these digital twins more accurate and responsive. Students will learn that real-time synchronization between the physical and digital worlds is enabled by sensor networks, edge AI processing, and cloud-based simulation platforms such as NVIDIA Omniverse.
Finally, the lesson will address business value and ROI considerations. Digital twins reduce downtime, accelerate prototyping, optimize resource allocation, and provide a data-rich environment for decision-making. Students will understand how companies justify investments in digital twin technology through measurable improvements in efficiency, safety, and product quality.
By the end of this submodule, students will have a solid grasp of what a digital twin is, how it functions, and why it’s transforming industries across the globe. They will be prepared to explore NVIDIA Omniverse, the platform that makes building and integrating digital twins more accessible, which will be the focus of the next submodule.
Omniverse Overview and Digital Twin Applications (4:14)
In this submodule, students will be introduced to NVIDIA Omniverse, a powerful platform for real-time collaboration, simulation, and 3D content creation, specifically designed to build and operate digital twins at scale. They will learn how Omniverse serves as a hub where multiple stakeholders, applications, and AI models can work together in a unified, photorealistic environment.
The lesson begins with an overview of what Omniverse is and the problems it solves. Traditionally, creating realistic simulations required separate software tools for 3D modeling, physics simulation, and AI integration, which often led to workflow fragmentation and data translation issues. Omniverse, built on Pixar’s Universal Scene Description (USD) framework, provides a shared scene representation that allows different tools—such as Autodesk Maya, Blender, Unreal Engine, or CAD software—to connect and work on the same live simulation environment without data loss or conversion bottlenecks.
Students will then explore the core components of Omniverse. Omniverse Kit is the development toolkit that allows customization of simulation applications. Omniverse Nucleus is the collaboration and asset management service that synchronizes scene data in real time across different users and tools. Omniverse Connectors are plugins that bridge popular 3D and engineering applications into Omniverse. These components together enable distributed teams to design, simulate, and iterate faster than ever before.
The submodule will move into digital twin-specific applications. Students will see how Omniverse can visualize and simulate real-world assets with extreme accuracy, integrating AI models for predictive analytics. For example, a manufacturing plant can be recreated in Omniverse, linked to IoT sensor data from the physical facility, and used to run simulations that predict machinery failures or optimize workflow layouts. Similarly, an entire urban environment can be modeled to study traffic flow, energy consumption, or emergency response scenarios.
A major highlight will be Omniverse’s physics simulation capabilities through PhysX and Flow. These engines allow digital twins to not only look realistic but also behave realistically, enabling simulations that factor in collisions, fluid dynamics, material properties, and environmental effects. Students will learn how these simulations, when paired with AI, can be used for what-if scenario testing, safety drills, and training without real-world risks.
The submodule will also cover AI integration. Omniverse is designed to work hand-in-hand with NVIDIA AI frameworks like DeepStream for real-time vision analytics, Triton Inference Server for scalable inference, and TAO Toolkit for model customization. This allows digital twins to become intelligent, autonomous systems that can respond to live data feeds from the physical world.
Collaboration and interoperability will be emphasized as critical benefits. Multiple engineers, designers, and AI developers can work on the same project simultaneously—whether they are across the hall or across the globe. Any updates to geometry, textures, or AI behaviors are reflected instantly for all collaborators, dramatically accelerating decision-making.
By the end of this submodule, students will understand how NVIDIA Omniverse revolutionizes the creation and operation of digital twins by combining real-time collaboration, photorealistic rendering, physics-based simulation, and AI integration. They will be ready to explore how to connect AI models directly into Omniverse simulations, which will be the focus of the next lesson.
Connecting AI Models to Omniverse Simulations (4:42)
In this submodule, students will learn how to connect AI models directly to NVIDIA Omniverse simulations, creating intelligent, data-driven digital twins that can process real-world information, make predictions, and autonomously respond to environmental changes. The emphasis will be on transforming Omniverse from a purely visual and physics-based platform into a live, AI-enabled decision-making environment.
The lesson begins by establishing the importance of AI-driven digital twins. While traditional simulations are powerful for visualization and planning, their true potential is realized when they can ingest real-time sensor data, run it through trained AI models, and adapt the simulation’s behavior accordingly. For example, a factory’s Omniverse model can dynamically reroute workflow when predictive maintenance AI flags a potential machine failure, or a smart city simulation can adjust traffic signals based on real-time vehicle detection models.
Students will explore the technical pathways for integrating AI models with Omniverse. This includes using NVIDIA Triton Inference Server to host and serve AI models within the simulation ecosystem. By deploying models in Triton, Omniverse can query them for inference results, such as object detection, anomaly detection, or predictive forecasting, and apply these results directly to simulation objects and scenarios.
A key focus will be on DeepStream and sensor integration. Students will see how live camera or IoT sensor feeds can be processed by DeepStream pipelines at the edge or in the cloud, with the resulting metadata streamed into Omniverse. This enables simulations to reflect live world conditions—such as detecting the number of vehicles at an intersection, recognizing defective products on an assembly line, or monitoring occupancy in a building.
The submodule will also cover data exchange formats and synchronization. Using USD as the shared scene format, students will learn how AI inference outputs can be mapped to Omniverse objects. For example, a detected object’s coordinates from a vision AI model can be directly linked to an Omniverse asset’s position, allowing a virtual robot in the simulation to react accordingly.
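The mapping from a detection's image coordinates to a scene position can be sketched with simple linear interpolation. This assumes a fixed, top-down camera whose image covers a known rectangular floor area; real setups generally need proper camera calibration. The function names are hypothetical, and writing the result onto a USD prim is left out because it depends on the Omniverse and USD Python APIs.

```python
def bbox_center(left, top, width, height):
    """Center of a bounding box in pixel coordinates."""
    return left + width / 2.0, top + height / 2.0


def pixel_to_world(px, py, image_size, world_min, world_max):
    """Linearly map a pixel to world (x, y), assuming a top-down camera view."""
    img_w, img_h = image_size
    u = px / img_w
    v = py / img_h
    x = world_min[0] + u * (world_max[0] - world_min[0])
    y = world_min[1] + v * (world_max[1] - world_min[1])
    return x, y


cx, cy = bbox_center(900, 500, 120, 80)
position = pixel_to_world(cx, cy, (1920, 1080), (0.0, 0.0), (20.0, 10.0))
print(position)
```

The resulting world coordinates are what a scene update would apply to the corresponding asset's transform.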
Another important aspect will be training and fine-tuning AI models using simulation-generated data. Omniverse simulations can produce synthetic datasets—highly realistic images, sensor readings, or motion patterns—that can be used to train AI models via the TAO Toolkit. This loop allows for faster and safer AI development without relying solely on real-world data collection.
Students will also examine event-driven automation within Omniverse. They will learn how AI models can trigger scripted behaviors inside the simulation—for example, rerouting autonomous vehicles in a traffic simulation when an accident is detected, or adjusting robotic arm sequences in a virtual factory when part supply changes.
By the end of this submodule, students will understand the complete process of integrating AI inference pipelines with Omniverse simulations—from real-time data ingestion and AI model deployment to synchronized scene updates and event-driven simulation behaviors. This knowledge will set the stage for working with Omniverse Isaac Sim and robotics integration, which will be covered in the next submodule.
Omniverse Isaac Sim and Robotics Integration (4:33)
In this submodule, students will explore Omniverse Isaac Sim, NVIDIA’s advanced robotics simulation platform, and learn how it integrates with digital twins to enable the design, testing, and deployment of autonomous robotic systems in both industrial and commercial environments. The focus will be on how Isaac Sim, powered by Omniverse, allows robotics engineers and AI developers to work together in a photorealistic, physics-accurate virtual environment before deploying robots in the real world.
The lesson begins with an introduction to Isaac Sim’s role in the Omniverse ecosystem. Students will learn that Isaac Sim is designed for simulating robots with realistic physics, sensor models, and environmental interactions. It enables developers to test and refine robot perception, navigation, and manipulation algorithms without risking hardware damage or downtime in a real production environment.
A key highlight will be physics-based accuracy. Students will understand how Isaac Sim uses NVIDIA PhysX for rigid body dynamics as well as soft body and particle-based fluid simulation (capabilities previously associated with the separate Flex library), and GPU ray tracing for realistic rendering of LiDAR, depth cameras, and other perception sensors. This brings the virtual testing environment close to real-world conditions and narrows the sim-to-real gap, although models trained or validated in simulation should still be verified on physical hardware before deployment.
Next, the submodule will focus on robot perception pipelines. Students will see how Isaac Sim integrates with DeepStream to process synthetic camera or LiDAR data, running AI models for object detection, semantic segmentation, or pose estimation directly within the simulation. This synthetic sensor data can be streamed to Triton Inference Server, making the AI testing workflow identical to real-world inference pipelines.
The lesson will also cover navigation and path planning. Students will learn how Isaac Sim supports ROS 2 (Robot Operating System) integration, enabling developers to test autonomous navigation stacks in virtual environments. They will explore how robots can map virtual warehouses, navigate around obstacles, and optimize delivery routes using AI-driven decision-making.
Another key capability is robot arm manipulation. Using Isaac Sim’s built-in support for industrial robots and manipulators, students will simulate assembly tasks, pick-and-place operations, and quality inspections. They will understand how AI models for pose detection, grasp planning, and defect detection can be tested entirely in the digital twin before any physical trial.
The submodule will also emphasize synthetic data generation for AI training. Isaac Sim can produce vast amounts of labeled, high-quality training data for computer vision models—covering diverse lighting conditions, object variations, and rare edge cases that would be time-consuming or costly to capture in the real world. Students will learn how to integrate this data into the TAO Toolkit to improve AI accuracy and robustness.
Finally, the lesson will cover deployment workflows. Once validated in Isaac Sim, robotic control policies, perception models, and navigation algorithms can be exported to real robots—such as those powered by Jetson Orin—with minimal reconfiguration. This greatly shortens the development cycle and ensures safer, more predictable robot behavior.
By the end of this submodule, students will understand how Omniverse Isaac Sim bridges the gap between AI-driven simulation and real-world robotics, allowing teams to develop, test, and refine intelligent robotic systems entirely within a digital twin environment before deployment. This prepares them for the next step—building their own Omniverse digital twin connected to AI inference, which will be tackled in the hands-on lab.
Hands-on Lab: Build a basic Omniverse Digital Twin connected to AI inference (0:03)
In this hands-on lab, students will put together all the concepts from the Digital Twin and Omniverse module by creating a functional Omniverse simulation environment that is directly linked to a live AI inference pipeline. The goal is to demonstrate how a digital twin can respond to real-time data, run AI-powered predictions, and visualize changes instantly within the simulation.
The lab begins with environment setup. Students will install and configure NVIDIA Omniverse on a capable workstation or cloud GPU instance, ensuring that the required components—Omniverse Kit, Nucleus Server, and Connectors—are running. They will then create a basic virtual scene, representing a simplified real-world environment such as a warehouse floor, a small production line, or an intersection in a smart city.
Once the scene is prepared, students will integrate live sensor feeds. These could be simulated IoT devices or camera feeds, either coming from pre-recorded video or generated from a local USB/IP camera. The data from these sources will be processed by an AI inference pipeline—for example, a DeepStream application running an object detection or traffic monitoring model optimized with TensorRT.
The critical step involves data mapping between AI output and Omniverse objects. Students will connect the AI inference results to scene elements inside Omniverse using the Universal Scene Description (USD) framework. For instance, if the AI detects vehicles in a live camera feed, these detections will trigger corresponding vehicle movements in the Omniverse simulation, complete with physics-based animations. Similarly, detecting a fault in a virtual production line could trigger changes in machine status indicators within the scene.
Students will also learn to stream AI metadata into Omniverse using a message broker such as MQTT or Kafka. This real-time event data ensures the simulation updates instantly when new AI results are received, creating a truly synchronized physical-digital feedback loop.
The lab will include customizing simulation behavior based on AI events. For example, in a smart city scene, when AI detects high pedestrian density, the simulation could trigger a change in virtual traffic signals. In an industrial setting, detecting defective products could trigger robotic arms to remove them from the assembly line—all visualized in Omniverse.
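The event-to-behavior logic described here can be expressed as a small rule engine: each rule pairs a condition on an incoming AI event with an action. The sketch below is framework-independent; the class, event fields, threshold, and action names are all hypothetical, and in a real lab the actions would call into the simulation rather than return strings.

```python
class RuleEngine:
    """Dispatch AI events to actions whose predicate matches."""

    def __init__(self):
        self._rules = []

    def add_rule(self, predicate, action):
        self._rules.append((predicate, action))

    def handle(self, event):
        # Run every matching action and collect the results.
        return [action(event) for predicate, action in self._rules if predicate(event)]


engine = RuleEngine()
engine.add_rule(
    lambda e: e["type"] == "pedestrian_count" and e["count"] > 30,
    lambda e: "extend_walk_phase",
)
engine.add_rule(
    lambda e: e["type"] == "defect_detected",
    lambda e: "remove_item_" + str(e["item_id"]),
)

print(engine.handle({"type": "pedestrian_count", "count": 42}))
```

Keeping rules separate from the message-consuming loop makes it easy to add scenarios without touching the transport code.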
Students will further explore synthetic data generation by capturing simulation frames and exporting them with labels for retraining or fine-tuning the AI models via the TAO Toolkit. This step reinforces how digital twins are not just passive monitoring tools but active contributors to AI improvement.
Finally, the project will be containerized for deployment. The Omniverse scene and the AI inference pipeline will be packaged into separate containers, orchestrated via Docker Compose or Kubernetes, enabling portability and scalability. This approach mirrors real-world enterprise deployment workflows where simulation, AI, and data services are modular and easily upgradable.
By the end of this lab, students will have created a basic but fully functional Omniverse digital twin that processes real-time AI inference data and reacts visually within the simulation environment. This experience will give them a complete understanding of how Omniverse, AI models, and live sensor data can work together to create powerful, interactive, and intelligent digital twins for Industry 4.0 applications.

Jetson Xavier and Orin: Capabilities and Use Cases (4:27)
In this submodule, students will explore NVIDIA Jetson Xavier and Jetson Orin, two of the most powerful edge AI computing platforms designed for high-performance inference, robotics, and IoT applications. The focus will be on understanding their hardware architecture, GPU capabilities, and real-world deployment scenarios where these devices enable intelligent decision-making directly at the edge without depending on cloud connectivity.
The lesson begins with an introduction to the Jetson ecosystem. Students will learn how NVIDIA has developed a family of devices, from entry-level modules like the Jetson Nano to advanced industrial-grade systems such as Jetson AGX Xavier and Jetson Orin, to serve diverse computational needs. While cloud AI excels at large-scale training and analytics, Jetson devices are optimized for on-device inference, making them ideal for scenarios requiring low latency, real-time processing, and offline operation.
Students will dive into Jetson Xavier first. They will examine its Volta GPU architecture with Tensor Cores, its support for mixed-precision inference (FP32, FP16, INT8), and its multi-core ARM CPU that allows for running complex AI pipelines alongside control logic. They will also learn about Xavier’s hardware accelerators, including the Deep Learning Accelerator (DLA) for efficient AI workloads and the Video Image Compositor (VIC) for camera and video processing.
Next, they will study Jetson Orin, the generation of edge AI hardware that succeeds Xavier. Powered by the Ampere GPU architecture, Jetson Orin delivers significantly higher TOPS (Tera Operations Per Second) performance, enabling it to run multiple deep learning models in parallel while handling computer vision, natural language processing, and sensor fusion workloads. Students will understand why Orin is increasingly used in autonomous vehicles, robotics, and industrial automation, where real-time multi-modal AI is critical.
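The submodule will also explore software compatibility. Both Xavier and Orin run JetPack SDK, which includes CUDA, cuDNN, and TensorRT, allowing developers to accelerate AI applications directly on the device. They support NVIDIA’s DeepStream SDK for video analytics and can run models trained or fine-tuned with TAO Toolkit, with the training itself typically performed on a workstation or cloud GPU. RAPIDS for GPU-accelerated data science primarily targets data-center and workstation GPUs, so students should check the current RAPIDS platform support notes before planning to run it on Jetson. In all of these cases, inference runs on the module’s integrated GPU without needing an external GPU.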
From a deployment perspective, students will see how these devices excel in real-world use cases. Examples include autonomous delivery robots processing LiDAR and camera feeds locally for navigation, smart security cameras running object detection without sending raw video to the cloud, industrial quality inspection systems identifying defects on production lines in milliseconds, and agricultural drones analyzing crop health directly in the field.
Another key area of focus will be energy efficiency and form factor. Students will appreciate how Xavier and Orin deliver high AI performance in compact, power-efficient packages, making them ideal for edge deployments where space and energy availability are constrained.
Finally, the lesson will highlight integration possibilities. Jetson devices can work in standalone mode for fully autonomous systems or be part of hybrid edge-cloud architectures, where they pre-process data locally before sending insights to cloud AI platforms for aggregation and long-term analytics. This flexibility allows organizations to tailor solutions for latency, bandwidth, and privacy requirements.
By the end of this submodule, students will have a deep understanding of Jetson Xavier and Jetson Orin capabilities, how they fit into the broader NVIDIA edge AI ecosystem, and the critical role they play in enabling real-time, intelligent, and efficient AI solutions for industries ranging from smart cities to autonomous machines.
Deploying Models on Jetson with TensorRT (3:30)
In this submodule, students will learn how to deploy AI models on NVIDIA Jetson devices using TensorRT, NVIDIA’s high-performance deep learning inference optimizer and runtime library. The emphasis will be on achieving maximum inference speed, lower latency, and efficient memory usage—all while maintaining accuracy—so that AI applications can run smoothly on the constrained resources of edge devices.
The lesson begins with an overview of TensorRT’s role in the Jetson AI workflow. Students will understand that while models are often trained in data centers or the cloud using frameworks like TensorFlow or PyTorch, and are commonly exported to an interchange format such as ONNX, they are rarely deployed in that raw format on edge devices. Instead, TensorRT takes these trained models, optimizes them for the Jetson GPU, and converts them into highly efficient engines that run significantly faster and consume less power.
Students will learn about precision modes (FP32, FP16, and INT8) and how switching to lower precision can substantially increase performance, usually with little accuracy loss when FP16 is used or when INT8 is properly calibrated. They will see how Jetson Xavier and Jetson Orin leverage Tensor Cores to accelerate mixed-precision inference, and how INT8 quantization can be applied for extreme performance in real-time systems such as robotics navigation or video analytics.
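The arithmetic behind INT8 quantization can be shown with a minimal symmetric, max-absolute-value scheme. This is a teaching sketch only: TensorRT selects scales through its own calibration process, which this code does not reproduce, and the function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the quantized integers."""
    return [q * scale for q in quantized]


weights = [-1.27, 0.0, 0.5, 1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q, scale, restored)
```

The rounding error per value is at most half of `scale`, which is why tensors with a few very large outliers lose more precision: the outliers inflate the scale for every other value.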
The submodule will then walk through model conversion. Students will explore how to export a trained model from PyTorch or TensorFlow, convert it to the ONNX format, and then use the trtexec utility or the TensorRT Python API to compile the model into an optimized inference engine. They will also understand how layer fusion, kernel auto-tuning, and memory optimizations in TensorRT reduce computational overhead.
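The ONNX-to-engine step with `trtexec` can be sketched as a small helper that assembles the command line. The flags used (`--onnx`, `--saveEngine`, `--fp16`, `--int8`) are standard `trtexec` options; the helper function and the file names are hypothetical, and the command is only built here, not executed, since `trtexec` must be installed on the target device.

```python
def build_trtexec_command(onnx_path, engine_path, precision="fp16"):
    """Assemble a trtexec argument list for building an engine from ONNX."""
    command = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if precision == "fp16":
        command.append("--fp16")
    elif precision == "int8":
        command.append("--int8")
    elif precision != "fp32":
        raise ValueError(f"unsupported precision: {precision}")
    return command


command = build_trtexec_command("model.onnx", "model.engine", precision="fp16")
print(" ".join(command))
```

On a device with TensorRT installed, the list could be passed to `subprocess.run(command, check=True)`. Engines are generally specific to the GPU and TensorRT version they were built with, so the build is usually run on the target Jetson itself.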
Practical deployment strategies will be discussed in depth. Students will learn how to package their optimized TensorRT engines into DeepStream pipelines for real-time video analytics or integrate them into ROS 2 robotic systems for sensor-based decision-making. They will also see how to combine multiple AI models into a single pipeline, enabling multi-task inference—for example, running object detection, segmentation, and tracking simultaneously.
The lesson will also cover batching strategies. While edge inference often operates on single inputs, students will explore micro-batching techniques to boost throughput in scenarios like processing video feeds from multiple cameras in parallel.
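Micro-batching can be sketched as a generator that groups incoming frames into fixed-size batches and flushes any remainder at the end. The function name is hypothetical, and a production pipeline would normally also flush on a timeout so that a slow camera does not stall the batch.

```python
def micro_batches(frames, batch_size):
    """Yield lists of up to batch_size frames from an iterable of frames."""
    batch = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield batch


# Frames tagged as (camera_id, frame_index), interleaved from two cameras.
stream = [("camA", 0), ("camB", 0), ("camA", 1), ("camB", 1), ("camA", 2)]
batches = list(micro_batches(stream, batch_size=2))
print(batches)
```

Larger batches raise throughput but add latency while the batch fills, so the batch size is a tuning parameter rather than a fixed rule.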
Performance measurement will be a key part of this submodule. Students will learn how to benchmark TensorRT engines on Jetson devices using profiling tools to measure frames per second (FPS), latency, and GPU utilization, ensuring deployments meet the application’s real-time requirements.
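Whatever profiling tool collects the raw timings, the summary metrics are simple to compute. The following sketch, which assumes per-inference latencies have already been recorded in milliseconds, derives the mean latency, a nearest-rank 95th percentile, and the equivalent frames per second for sequential, single-input inference.

```python
import statistics
from typing import Dict, List


def summarize_latencies(latencies_ms: List[float]) -> Dict[str, float]:
    """Summarize per-inference latencies (milliseconds) into mean, p95, and FPS."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    rank = (95 * n + 99) // 100          # nearest-rank p95, computed with integers
    mean_ms = statistics.fmean(ordered)
    return {
        "mean_ms": mean_ms,
        "p95_ms": ordered[rank - 1],
        "fps": 1000.0 / mean_ms,         # assumes one frame per sequential inference
    }


print(summarize_latencies([10.0] * 18 + [30.0] * 2))
```

Tail latency (p95) is often more relevant than the mean for real-time requirements, since occasional slow frames are what break deadlines.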
Another critical topic will be deployment automation. Students will see how Docker containers can be used to package TensorRT applications, allowing easy updates and scaling across multiple Jetson devices. They will also learn how to integrate deployment with NVIDIA Fleet Command for remote management of edge AI systems at scale.
By the end of this submodule, students will be able to take a trained AI model, optimize it using TensorRT for NVIDIA Jetson, and deploy it in a production-ready edge environment. They will understand not only the technical workflow of TensorRT optimization but also the best practices for achieving the highest performance and reliability in real-world AI applications running at the edge.
Integrating IoT Sensor Feeds into AI Pipelines 3:53
In this submodule, students will learn how to connect IoT sensor data streams directly into AI inference pipelines running on NVIDIA Jetson devices. The goal is to enable real-time decision-making at the edge by combining sensor inputs—such as temperature, vibration, motion, LiDAR, ultrasonic, and camera data—with AI models for analysis, prediction, and automation.
The lesson begins by establishing why sensor fusion is so critical for modern edge AI. While a single sensor provides limited insights, combining multiple data sources creates a richer context for decision-making. For example, an industrial machine equipped with both vibration sensors and thermal cameras can detect maintenance needs more accurately than relying on a single metric.
Students will start with an overview of IoT communication protocols used for data transfer. They will explore MQTT, a lightweight publish-subscribe protocol ideal for low-bandwidth, high-frequency sensor data; Kafka, for high-throughput event streaming in more complex deployments; and ROS 2 (Robot Operating System), widely used in robotics for real-time sensor communication. Understanding these protocols is essential for building pipelines where sensors continuously feed into AI processing loops.
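To make the publish-subscribe idea concrete, the sketch below shows how MQTT topic filters with the + (single-level) and # (multi-level) wildcards select messages. It is a simplified matcher written for illustration: it ignores the special handling of topics beginning with $, and the topic names are hypothetical.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Simplified MQTT topic-filter matching with '+' and '#' wildcards."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: valid only as the last level; it matches
            # the parent topic and everything beneath it.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


# Hypothetical topic hierarchy: factory/<line>/<sensor>
print(topic_matches("factory/+/temperature", "factory/line1/temperature"))
print(topic_matches("factory/#", "factory/line2/vibration"))
```

A subscriber on the Jetson could use a filter such as factory/+/temperature to receive temperature readings from every line without subscribing to each sensor individually.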
The submodule will then cover data ingestion into Jetson-based AI pipelines. Students will learn how to connect IoT sensors either directly to the Jetson via GPIO, USB, or serial interfaces, or indirectly through networked edge gateways. Once connected, sensor data will be preprocessed—filtering noise, normalizing values, and batching inputs—before being passed into the AI inference engine.
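The preprocessing steps mentioned above can be sketched with two small functions: a moving average to suppress noise and min-max scaling to bring readings into the range 0 to 1. This is a minimal illustration on plain Python lists; the window size and the sample values are arbitrary assumptions.

```python
from typing import List


def moving_average(values: List[float], window: int) -> List[float]:
    """Smooth a signal with a trailing moving average of the given window."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed: List[float] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]   # drop the sample leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values linearly into [0, 1]; a constant signal maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


print(moving_average([1.0, 2.0, 3.0, 4.0], window=2))
print(min_max_normalize([10.0, 15.0, 20.0]))
```

In a deployed pipeline the normalization bounds would usually be fixed from training data rather than recomputed per batch, so that the model sees consistently scaled inputs.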
A major focus will be synchronization between sensors and AI models. Students will understand how to timestamp and align data from multiple sensors so that AI models can make accurate inferences based on simultaneous readings. This is particularly important for robotics navigation, autonomous vehicles, and smart city systems where multiple streams—such as GPS, LiDAR, and cameras—must be processed in unison.
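One common alignment approach is nearest-timestamp matching within a tolerance. The sketch below assumes both streams are already sorted by time and share a clock; the function name, the tolerance, and the sample readings are illustrative only.

```python
import bisect
from typing import Any, List, Tuple


def align_nearest(reference_ts: List[float],
                  other: List[Tuple[float, Any]],
                  tolerance_s: float) -> List[Tuple[float, Any]]:
    """Pair each reference timestamp with the nearest reading from another
    sensor, dropping pairs whose time difference exceeds the tolerance.

    Both inputs are assumed to be sorted by timestamp.
    """
    other_ts = [t for t, _ in other]
    pairs: List[Tuple[float, Any]] = []
    for t in reference_ts:
        i = bisect.bisect_left(other_ts, t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other_ts)]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: abs(other_ts[j] - t))
        if abs(other_ts[best] - t) <= tolerance_s:
            pairs.append((t, other[best][1]))
    return pairs


camera_frames = [0.00, 0.10, 0.20]                      # frame timestamps (s)
lidar = [(0.01, "scan-a"), (0.12, "scan-b"), (0.50, "scan-c")]
print(align_nearest(camera_frames, lidar, tolerance_s=0.03))
```

Frames without a sufficiently close partner are dropped here; interpolating between neighboring readings is an alternative when gaps are common.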
The lesson will also explore integrating IoT feeds into NVIDIA DeepStream pipelines for real-time analytics. For example, students will see how to combine video analytics with temperature or motion sensor data to trigger specific actions—like sending alerts only when both motion and abnormal heat signatures are detected in a security application.
Another essential skill covered will be edge-cloud hybrid integration. Students will learn how to process sensor data locally for immediate responses while selectively sending aggregated insights to cloud services for historical analysis, model retraining, and dashboard visualization. This approach optimizes bandwidth usage and ensures low-latency responsiveness while still benefiting from cloud scalability.
Security and reliability will also be addressed. Students will understand the importance of encrypting sensor data in transit, using authentication for MQTT or Kafka brokers, and implementing failover mechanisms to handle temporary sensor outages.
By the end of this submodule, students will be able to design and deploy AI pipelines on Jetson devices that continuously consume and process IoT sensor feeds in real time. They will understand how to fuse multiple sensor modalities, ensure data synchronization, integrate with AI models, and create responsive, intelligent edge systems for industries such as manufacturing, agriculture, smart cities, and autonomous robotics.
Monitoring, Updating, and Managing Edge Deployments 3:28
In this submodule, students will learn how to monitor, update, and manage AI-powered edge systems built on NVIDIA Jetson devices to ensure continuous performance, security, and reliability in real-world deployments. The focus will be on applying DevOps principles to edge AI environments, where devices operate in distributed, resource-constrained, and often mission-critical settings.
The lesson begins by explaining why ongoing management of edge AI deployments is essential. Unlike static systems, AI applications require continuous oversight to maintain accuracy, adapt to new data patterns, and patch security vulnerabilities. In industrial, healthcare, or autonomous applications, even minor downtime or degraded model performance can have significant operational and safety impacts.
Students will first explore real-time performance monitoring. They will learn how to use NVIDIA tools like tegrastats for GPU, CPU, and memory usage tracking on Jetson devices, as well as DeepStream telemetry for monitoring AI inference performance. They will also see how these metrics can be streamed into centralized dashboards using tools like Prometheus and Grafana, enabling fleet-wide visibility across multiple edge devices.
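Prometheus scrapes metrics in a plain-text exposition format, so a device-side exporter ultimately has to produce lines like the ones generated below. This is a minimal sketch: the metric name jetson_gpu_utilization_percent and the device label are hypothetical, and label-value escaping is omitted for brevity.

```python
from typing import Dict, List, Tuple


def render_gauge(name: str, help_text: str,
                 samples: List[Tuple[Dict[str, str], float]]) -> str:
    """Render one gauge metric in the Prometheus text exposition format.

    Label values are not escaped here, so they must not contain quotes,
    backslashes, or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric and label names, for illustration only.
print(render_gauge("jetson_gpu_utilization_percent", "GPU utilization.",
                   [({"device": "orin-01"}, 42.5)]))
```

Serving this text over HTTP from each device lets a central Prometheus instance scrape the fleet, with Grafana reading from Prometheus for dashboards.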
Next, the submodule will cover remote device management. Students will understand how to leverage NVIDIA Fleet Command to provision, configure, and update Jetson-powered AI applications at scale without requiring physical access to the devices. This includes pushing new models, updating containerized services, and applying firmware or OS patches—all while minimizing downtime.
The lesson will then focus on model lifecycle management at the edge. Students will learn how to deploy updated AI models without interrupting live services by using blue-green deployment and A/B testing strategies. This ensures that new versions are validated on a subset of devices before being rolled out fleet-wide.
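Choosing which devices receive a new model first can be done deterministically by hashing the device identifier into a bucket. The sketch below is one possible scheme, not a feature of any particular fleet-management product; the device IDs are made up.

```python
import hashlib


def in_rollout(device_id: str, percent: int) -> bool:
    """Deterministically place a device in the first `percent` of the fleet.

    The same device always lands in the same bucket, so raising the
    percentage only ever adds devices to the rollout.
    """
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


fleet = [f"jetson-{n:03d}" for n in range(20)]   # hypothetical device IDs
early = [d for d in fleet if in_rollout(d, 10)]
print(f"{len(early)} of {len(fleet)} devices receive the new model first")
```

Because assignment depends only on the device ID, the validated subset stays stable between runs, which makes before-and-after comparisons meaningful.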
Security is a major consideration for edge deployments, so students will also examine device hardening techniques. These include securing SSH access, implementing container trust policies, encrypting data at rest and in transit, and using signed model packages to prevent tampering. They will also learn how to use the NVIDIA License System (NLS) to manage AI Enterprise licensing compliance across deployments.
Updating pipelines will also be discussed in the context of continuous integration and continuous deployment (CI/CD) for edge AI. Students will explore how to integrate Jetson deployments with CI/CD platforms so that code changes, model updates, or pipeline improvements are automatically built, tested, and deployed to devices.
The submodule will also cover alerting and incident response. Students will set up automated alerts for performance degradation, hardware faults, or unusual AI behavior, and define workflows for troubleshooting and rolling back updates if necessary.
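A simple way to avoid noisy alerts is to require several consecutive bad samples before firing. The sketch below applies that rule to a stream of FPS readings; the threshold and the number of consecutive samples are arbitrary example values.

```python
from typing import Sequence


def should_alert(samples: Sequence[float], threshold: float,
                 consecutive: int) -> bool:
    """Return True when the most recent `consecutive` samples are all below
    the threshold, which filters out isolated dips."""
    if consecutive < 1 or len(samples) < consecutive:
        return False
    return all(s < threshold for s in samples[-consecutive:])


fps_history = [30.0, 29.0, 12.0, 11.0, 10.0]
print(should_alert(fps_history, threshold=15.0, consecutive=3))
```

The same check could gate an automated rollback, with the consecutive count chosen to balance reaction time against false alarms.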
Finally, students will understand the concept of hybrid edge-cloud orchestration, where cloud platforms coordinate deployments and updates while Jetson devices handle local processing. This architecture enables centralized control with decentralized execution, making it possible to scale AI workloads efficiently across thousands of devices.
By the end of this submodule, students will have the knowledge and skills to maintain, secure, and evolve their AI-powered Jetson edge systems over time. They will understand best practices for monitoring, remote management, model updates, and fleet orchestration, ensuring that deployed edge AI solutions remain high-performing, reliable, and secure in production environments.
Hands-on Lab: Build and deploy a full pipeline on Jetson Orin 0:02
In this hands-on lab, students will apply all the concepts from Module 6 to design, build, and deploy a complete AI pipeline on an NVIDIA Jetson Orin device. The goal is to simulate a real-world edge AI deployment—from model preparation and optimization to integrating IoT sensor data and enabling real-time inference at the edge. By the end of this lab, students will have a fully functional, production-ready pipeline running locally on Jetson Orin.
The lab begins with environment setup. Students will boot and configure their Jetson Orin with the latest JetPack SDK, confirming that CUDA, cuDNN, and TensorRT are present and that the DeepStream SDK is installed alongside them. They will then use tegrastats to confirm that GPU, CPU, and memory activity is reported as expected for AI workloads. This step ensures the foundation for high-performance inference is ready.
Next, students will select and prepare an AI model. They will choose a computer vision model—such as an object detection network trained in PyTorch or TensorFlow—and export it to the ONNX format. Using TensorRT, they will optimize the model for the Orin architecture, experimenting with FP16 and INT8 quantization to achieve the best trade-off between accuracy and inference speed.
Once the model is optimized, students will integrate IoT sensor data into the pipeline. This may include connecting a USB camera for live video feeds, a temperature sensor over I²C, and a motion detector over GPIO. They will write a small preprocessing script to clean and synchronize the data streams so that both the video and sensor inputs are available to the inference engine in real time.
The next phase involves building the inference pipeline using NVIDIA DeepStream SDK. Students will configure a DeepStream pipeline that ingests live video, runs inference using the TensorRT-optimized model, overlays detection results on the video stream, and logs additional sensor data to a local database. They will also configure DeepStream to send events—such as detections exceeding certain thresholds—to an MQTT broker for further processing or alerting.
With the pipeline running, students will implement real-time monitoring and management features. They will track GPU utilization, model FPS, and system health using built-in Jetson telemetry tools and visualize these metrics using a Grafana dashboard. This step reinforces the importance of maintaining performance visibility in production deployments.
The lab will also cover containerization and deployment. Students will package their pipeline into a Docker container, ensuring that all dependencies, model files, and configuration settings are included. They will test the container locally and then simulate remote deployment by pushing it to a container registry and pulling it back onto the Jetson Orin.
Finally, students will test the pipeline under realistic workload conditions by introducing multiple video streams or simulating fluctuating sensor readings. They will analyze the system’s behavior, identify potential bottlenecks, and apply optimizations—such as adjusting DeepStream batch sizes or enabling GPU memory optimizations—to ensure smooth operation.
By completing this lab, students will have practical experience in end-to-end edge AI deployment using Jetson Orin, TensorRT, DeepStream, and IoT sensor integration. They will leave with not only a functioning pipeline but also a clear understanding of how to build scalable, maintainable, and high-performance AI solutions for the edge.

Model Lifecycle: Train → Tune → Deploy → Monitor 4:01
In this submodule, students will gain an end-to-end understanding of the AI model lifecycle, focusing on the processes, tools, and best practices needed to move a model from initial training to production deployment and long-term monitoring. This knowledge is essential for maintaining high-performance, reliable, and adaptable AI systems in real-world scenarios.
The lesson begins by breaking down the training phase. Students will explore how to prepare datasets, select the right model architecture, and use NVIDIA GPU acceleration—via CUDA, cuDNN, and mixed-precision training—to reduce training times. They will also understand the role of hyperparameters, optimizers, and regularization techniques in shaping model performance. This stage emphasizes the importance of data quality and diversity, as the foundation for any successful AI solution.
From there, the submodule transitions to the tuning phase, where models are optimized for accuracy and efficiency. Students will learn about advanced optimization techniques such as hyperparameter search, learning rate scheduling, and data augmentation. They will also explore transfer learning, leveraging pre-trained models from NVIDIA NGC to accelerate development while maintaining high performance in domain-specific tasks.
The next stage focuses on deployment, where trained models are moved from a research or staging environment into production. Students will study the considerations for deploying on different platforms—cloud, on-premises, and edge—with a particular emphasis on GPU-optimized inference using TensorRT and Triton Inference Server. They will also understand the role of containerization using Docker and Kubernetes to ensure consistent, reproducible deployments across environments.
Once deployed, the monitoring phase becomes critical. Students will learn how to set up continuous tracking of key performance metrics such as accuracy drift, latency, throughput, and resource utilization. They will explore how monitoring tools like Prometheus, Grafana, and NVIDIA’s telemetry APIs provide real-time insights into model health and system performance.
A major focus will be on the feedback loop between monitoring and retraining. As data distributions change over time (due to evolving user behavior, environmental factors, or market conditions), models can suffer from data drift or concept drift, and their accuracy degrades. Students will understand how to detect drift and implement retraining pipelines to keep models accurate and relevant. This includes automating retraining workflows with CI/CD pipelines, ensuring updates can be tested, validated, and rolled out with minimal downtime.
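One widely used statistic for spotting a shift in an input feature is the Population Stability Index (PSI), which compares binned distributions from training time and from production. The sketch below is a minimal pure-Python version; the bin edges, the epsilon floor for empty bins, and the sample data are assumptions, and the alerting threshold is left to the application.

```python
import bisect
import math
from typing import List, Sequence


def population_stability_index(expected: Sequence[float],
                               actual: Sequence[float],
                               edges: List[float],
                               eps: float = 1e-6) -> float:
    """Compute PSI between two samples using shared, sorted bin edges."""
    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # Floor each fraction at eps so empty bins do not cause log(0).
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


training = [0.1, 0.2, 0.6, 0.7]
production = [0.6, 0.7, 0.8, 0.9]      # shifted toward higher values
print(population_stability_index(training, training, edges=[0.5]))
print(population_stability_index(training, production, edges=[0.5]))
```

PSI measures input drift only; detecting a change in the relationship between inputs and labels still requires ground-truth feedback from production.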
The submodule will also address governance and compliance within the AI lifecycle. Students will learn how to maintain audit trails for datasets, models, and decision-making logic to meet industry standards and regulatory requirements. They will explore techniques for securing models against tampering, verifying integrity, and ensuring licensing compliance for NVIDIA AI Enterprise environments.
By the end of this submodule, students will be able to orchestrate the full AI lifecycle—from training and tuning through deployment and monitoring—while maintaining performance, security, and scalability. They will understand not only the technical workflows but also the operational discipline required to sustain AI systems in dynamic, production-grade environments.
Transfer Learning, Fine-tuning, and Quantization 3:16
In this submodule, students will develop a deep understanding of how to adapt and optimize AI models for specific applications using transfer learning, fine-tuning, and quantization. These techniques are central to building high-performance AI solutions when working with limited data, constrained hardware, or the need for rapid deployment.
The lesson begins with transfer learning, a method where students start with a pre-trained model—often trained on massive datasets like ImageNet or COCO—and repurpose it for a new but related task. Instead of training a model from scratch, which can be computationally expensive and time-consuming, transfer learning leverages the feature extraction power already learned by the base model. Students will see how models available in the NVIDIA NGC catalog can serve as starting points, significantly reducing training time while improving accuracy on niche tasks.
Next, the focus shifts to fine-tuning, where students learn to selectively retrain layers of the pre-trained model to adapt it more closely to the target dataset. This process involves freezing early layers that capture generic features and adjusting deeper layers that encode task-specific patterns. Students will learn strategies for balancing the amount of fine-tuning with available compute resources, avoiding overfitting, and optimizing learning rates for stable convergence.
A major part of the submodule will cover quantization, an optimization technique that reduces the precision of a model’s numerical representation (commonly from FP32 to FP16 or INT8) to speed up inference and lower memory usage without significantly impacting accuracy. Students will explore how TensorRT automates much of the quantization process for NVIDIA GPUs, including Jetson devices, noting that INT8 additionally requires calibration data or a quantization-aware model, and will analyze performance benchmarks to understand the trade-offs between precision and speed.
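The arithmetic behind INT8 quantization can be shown in a few lines. The sketch below uses symmetric per-tensor quantization with a scale taken from the maximum absolute value; this is a deliberately simple scheme for illustration and is not TensorRT's calibration algorithm.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of floats to the range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate float values."""
    return [q * scale for q in quantized]


weights = [-1.0, 0.0, 0.37, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
print(codes, scale)
print([abs(w - r) for w, r in zip(weights, restored)])  # per-value rounding error
```

Each restored value differs from the original by at most half a quantization step (scale / 2), which is the precision given up in exchange for smaller, faster integer arithmetic.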
The lesson will also address mixed precision training, where models use lower-precision arithmetic for certain operations while retaining higher precision for sensitive calculations. This approach, supported by NVIDIA’s Apex library and by the automatic mixed precision features built into frameworks such as PyTorch and TensorFlow, can substantially reduce training times while preserving model fidelity.
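A key reason mixed precision needs care is that very small gradients underflow to zero in FP16. The sketch below demonstrates this with Python's built-in half-precision packing and shows how multiplying by a loss scale before the cast preserves the value. The gradient magnitude and the scale factor of 1024 are arbitrary example values.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


tiny_gradient = 1e-8          # smaller than the smallest FP16 subnormal
loss_scale = 1024.0           # example scale factor

# Cast directly: the gradient is lost.
print(to_fp16(tiny_gradient))

# Scale up, cast to FP16, then unscale in higher precision: the value survives.
recovered = to_fp16(tiny_gradient * loss_scale) / loss_scale
print(recovered)
```

Automatic mixed precision implementations typically manage this scale factor dynamically, so the example only illustrates the underlying idea.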
Students will then see how these techniques integrate into a ModelOps workflow. For example, a pre-trained model from NVIDIA NGC could be fine-tuned on custom manufacturing defect images, quantized for Jetson deployment, and monitored in production for accuracy drift. Real-world case studies will show how companies accelerate time-to-market by avoiding full retraining while still delivering optimized, domain-specific AI solutions.
Security and reproducibility will also be discussed, ensuring that adapted models are version-controlled, tested across different hardware targets, and validated for compliance with enterprise standards.
By the end of this submodule, students will have the ability to take any baseline AI model and customize it for specific edge, cloud, or enterprise applications. They will understand how to apply transfer learning to leverage existing knowledge, fine-tune models for higher accuracy, and use quantization to make deployments faster, lighter, and more efficient. This combination of skills will allow them to rapidly deliver optimized AI solutions across diverse deployment environments without sacrificing quality or performance.
Using TensorBoard, Weights & Biases, or MLflow 3:28
In this submodule, students will learn how to effectively track, visualize, and manage AI experiments using three of the most widely used tools in the industry: TensorBoard, Weights & Biases (W&B), and MLflow. Mastering these tools is essential for reproducible machine learning, informed decision-making during model development, and maintaining visibility into performance metrics over the entire model lifecycle.
The lesson begins with TensorBoard, originally developed for TensorFlow but now compatible with PyTorch and other frameworks. Students will explore how TensorBoard enables real-time visualization of training curves, including loss, accuracy, and learning rate. They will learn to interpret these curves to detect overfitting, underfitting, or unstable training behavior. Beyond scalar metrics, TensorBoard also supports histograms to monitor weight distributions, an embedding projector for exploring high-dimensional embeddings in two or three dimensions, and image/audio previews for models that work with visual or audio data. By the end of this section, students will know how to integrate TensorBoard logging into their training scripts and run it both locally and on remote GPU servers.
Next, the focus shifts to Weights & Biases (W&B), a powerful cloud-based experiment tracking and collaboration tool. Students will see how W&B not only tracks metrics like TensorBoard but also adds features like experiment comparisons, hyperparameter sweeps, and shared dashboards for teams. They will understand how W&B integrates seamlessly with NVIDIA GPU workflows, allowing large-scale experiments to be monitored in real time from anywhere. The submodule will also highlight W&B’s dataset and model versioning capabilities, which are critical for collaborative AI development in enterprise settings.
The lesson will then move on to MLflow, an open-source platform designed for managing the complete machine learning lifecycle. Students will learn how MLflow can track experiments, package models in a standardized format, and deploy them to various platforms, including NVIDIA GPU environments. The emphasis will be on model registry and artifact tracking, ensuring that every trained model is stored with its associated metadata, parameters, and datasets for full reproducibility.
A key takeaway from this submodule will be how these tools fit into a ModelOps pipeline. Students will walk through a scenario where a model is trained and logged with TensorBoard, optimized and tracked in W&B, and finally stored in MLflow’s registry for deployment. This workflow ensures traceability, enabling teams to reproduce results and roll back to previous model versions when necessary.
The submodule will also address best practices for using these tools effectively in GPU-accelerated workflows. Students will learn to log performance metrics specific to NVIDIA environments—such as GPU utilization, memory consumption, and inference latency—so that optimization decisions are data-driven.
By the end of this submodule, students will be equipped to implement robust experiment tracking and model management workflows using TensorBoard, Weights & Biases, and MLflow. They will understand not just the technical commands, but the strategic importance of maintaining transparency, reproducibility, and collaboration throughout the AI model lifecycle.
Deployment Pipelines using Kubernetes + Helm 3:15
In this submodule, students will learn how to design and implement scalable, automated deployment pipelines for AI workloads using Kubernetes and Helm. These technologies form the backbone of cloud-native AI deployment, enabling teams to deliver optimized models into production environments quickly, reliably, and at scale.
The lesson begins by introducing the role of Kubernetes in AI deployment. Students will explore how Kubernetes orchestrates containerized applications across GPU-enabled clusters, automatically managing scaling, load balancing, and fault tolerance. They will understand why Kubernetes is widely used for deploying NVIDIA GPU workloads in both on-premises DGX clusters and cloud-based environments such as AWS, Azure, or Google Cloud. Special emphasis will be placed on configuring GPU nodes with the NVIDIA device plugin for Kubernetes, which advertises GPUs as schedulable resources so that AI inference pods can request them explicitly.
Next, the submodule focuses on Helm, the package manager for Kubernetes. Students will learn how Helm streamlines deployments by encapsulating complex Kubernetes configurations into reusable charts. These charts can define model-serving endpoints, environment variables, volume mounts for datasets, and GPU resource allocation—all in a structured, version-controlled format. Students will see how Helm makes it easy to upgrade, roll back, and customize deployments without manually editing dozens of YAML files.
The lesson then walks through designing a deployment pipeline for an AI model. Students will follow the process from packaging a TensorRT-optimized model into a container, pushing it to a private registry, and deploying it to a GPU-enabled Kubernetes cluster using Helm. They will also learn how to integrate Triton Inference Server into Helm charts to serve multiple model versions with dynamic configuration updates.
A key focus will be on automation and CI/CD integration. Students will see how to connect their model repository to CI/CD platforms like Jenkins, GitLab CI, or GitHub Actions so that whenever a new model version passes validation, it is automatically packaged, pushed to the registry, and deployed via Helm to the Kubernetes cluster. This approach minimizes human intervention and reduces the time from model development to production deployment.
The submodule will also cover observability and scaling. Students will configure monitoring tools like Prometheus and Grafana alongside the Helm deployment to track GPU utilization, inference latency, and throughput in real time. They will also learn how to configure the Horizontal Pod Autoscaler (HPA) to scale replicas with workload demand, keeping costs down while still meeting service-level agreements.
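The core scaling rule documented for the Horizontal Pod Autoscaler is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with no change when the ratio is within a small tolerance of 1. The sketch below reproduces that arithmetic; the replica bounds and the 10% tolerance are example settings, and the metric values are invented.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Apply the HPA scaling rule and clamp the result to the replica bounds."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas          # close enough to target: do nothing
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


# Example: 2 replicas at 90% average utilization against a 60% target.
print(desired_replicas(2, current_metric=90.0, target_metric=60.0))
```

Scaling on GPU utilization or inference latency rather than CPU generally requires exposing those values to the autoscaler as custom metrics.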
Security and compliance considerations will be integrated throughout the lesson. Students will see how to deploy workloads in isolated namespaces, enforce role-based access control (RBAC), and use signed Helm charts to guarantee deployment integrity. The NVIDIA License System (NLS) will also be discussed for managing AI Enterprise licensing in Kubernetes clusters.
By the end of this submodule, students will be able to build production-grade AI deployment pipelines that leverage Kubernetes for orchestration and Helm for repeatability, scalability, and maintainability. They will be prepared to deliver AI solutions that are highly available, easily upgradable, and optimized for GPU acceleration, meeting the demands of enterprise-scale deployments.
Hands-on Lab: Automate model retraining and redeployment with Transfer Learning 0:02
In this hands-on lab, students will build a fully automated AI pipeline that continuously retrains and redeploys a model using transfer learning, ensuring the deployed solution stays accurate and relevant as new data arrives. This lab simulates a real-world ModelOps workflow, where models must adapt to changing environments and evolving datasets without manual intervention.
The lab begins with environment setup. Students will configure a GPU-enabled Kubernetes cluster—either in the cloud or on-premises—ensuring the NVIDIA Kubernetes Device Plugin is installed for GPU access. They will prepare a dedicated namespace for the project, with role-based access control (RBAC) configured for security. The pipeline will be managed through a CI/CD platform such as GitLab CI, Jenkins, or GitHub Actions.
The first step in the pipeline is data ingestion. Students will connect the workflow to a simulated data source, such as a continually updated S3 bucket or cloud storage directory containing labeled images. The CI/CD system will be triggered whenever new data is detected, automatically pulling the latest dataset and preparing it for training.
Next, the pipeline will perform transfer learning using a pre-trained model from the NVIDIA NGC catalog. Students will configure the training job to run on a GPU node, freezing the early layers of the network while fine-tuning deeper layers to adapt to the new dataset. They will implement hyperparameter tuning to optimize the learning rate, batch size, and number of epochs for the updated model.
Once training is complete, the pipeline will execute model optimization using TensorRT. Students will apply FP16 or INT8 quantization to maximize inference speed while maintaining acceptable accuracy. The optimized model will be saved in a designated model registry, such as MLflow or a container registry, ensuring full version control and rollback capability.
The next stage focuses on deployment automation. Students will integrate Helm charts to package and deploy the model to the Kubernetes cluster. If the new model passes predefined accuracy and performance thresholds—validated by automated testing—it will replace the previous production version in the Triton Inference Server deployment. If it fails, the pipeline will automatically roll back to the last stable model.
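The promotion decision described above can be reduced to a small, testable gate that a CI/CD job calls after evaluation. The sketch below is one possible formulation; the EvalResult fields, the threshold values, and the rule that a candidate must not be less accurate than production are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    accuracy: float          # fraction correct on the validation set
    p95_latency_ms: float    # 95th-percentile inference latency


def promotion_decision(candidate: EvalResult, production: EvalResult,
                       min_accuracy: float, max_latency_ms: float) -> str:
    """Return 'promote' if the candidate clears every gate, else 'keep_current'."""
    if candidate.accuracy < min_accuracy:
        return "keep_current"
    if candidate.p95_latency_ms > max_latency_ms:
        return "keep_current"
    if candidate.accuracy < production.accuracy:
        return "keep_current"            # do not replace a better model
    return "promote"


prod = EvalResult(accuracy=0.91, p95_latency_ms=18.0)
new = EvalResult(accuracy=0.93, p95_latency_ms=17.5)
print(promotion_decision(new, prod, min_accuracy=0.90, max_latency_ms=20.0))
```

Keeping the gate as a pure function makes it easy to unit-test and to audit later why a given model version was or was not rolled out.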
Monitoring will be integrated directly into the pipeline. Students will configure Prometheus and Grafana dashboards to track inference latency, GPU utilization, and model accuracy in real time. Alerts will be sent if performance drops below acceptable thresholds, triggering a retraining cycle.
By the end of this lab, students will have a self-sustaining AI system that detects new data, retrains a model with transfer learning, optimizes it with TensorRT, and redeploys it to production—entirely without manual steps. They will understand how to connect data pipelines, training pipelines, and deployment pipelines into a cohesive ModelOps framework that supports continuous learning and rapid iteration in production environments.

Kubernetes with GPU Nodes: Setup and Management 3:53
In this submodule, students will gain the skills to set up and manage GPU-enabled Kubernetes clusters, forming the foundation for scalable cloud-native AI deployments. Kubernetes, when paired with NVIDIA GPU acceleration, becomes the backbone for deploying, orchestrating, and maintaining high-performance AI workloads across cloud, edge, and on-premises environments.
The lesson begins by exploring the role of GPU nodes within a Kubernetes cluster. Students will learn why GPU acceleration is critical for AI training and inference at scale, enabling workloads such as deep learning, video analytics, and large language model inference to run efficiently. They will also understand how Kubernetes manages these GPU resources, ensuring that each workload receives the necessary compute power without conflicts or idle capacity.
Students will then be guided through the setup process for GPU nodes. This includes provisioning GPU-powered virtual machines in the cloud (on platforms such as AWS EC2 P4/P5 instances, Azure NC/NV series, or GCP A2 instances) or configuring physical GPU servers like NVIDIA DGX systems. Once the hardware is in place, they will install the NVIDIA GPU drivers, the CUDA toolkit, the NVIDIA Container Toolkit (so that containers can access the GPU), and the NVIDIA device plugin for Kubernetes, which allows Kubernetes to detect GPUs and schedule workloads onto them.
Cluster management will be a core focus, covering node scaling, workload scheduling, and resource isolation. Students will learn how to label and taint GPU nodes so only specific AI workloads can access them, preventing resource contention with non-AI services. They will also explore namespace separation and role-based access control (RBAC) to maintain security in multi-tenant environments.
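On the workload side, labels and taints are consumed through a pod's nodeSelector and tolerations, while the GPU itself is requested through the nvidia.com/gpu resource exposed by the device plugin. The sketch below builds such a pod manifest as a Python dictionary and prints it as JSON, which kubectl also accepts. The label accelerator=nvidia-gpu, the taint key, the pod name, and the image are hypothetical choices for this example.

```python
import json
from typing import Any, Dict


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> Dict[str, Any]:
    """Build a Pod manifest that targets labeled, tainted GPU nodes."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Assumes GPU nodes were labeled accelerator=nvidia-gpu.
            "nodeSelector": {"accelerator": "nvidia-gpu"},
            # Assumes GPU nodes carry a NoSchedule taint with this key.
            "tolerations": [{
                "key": "nvidia.com/gpu",
                "operator": "Exists",
                "effect": "NoSchedule",
            }],
            "containers": [{
                "name": name,
                "image": image,
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }],
        },
    }


# Pod name and image are placeholders.
print(json.dumps(gpu_pod_manifest("inference", "example.com/inference:1.0"), indent=2))
```

The taint keeps ordinary pods off GPU nodes, the toleration lets this pod through, and the node selector ensures it does not land on a CPU-only node.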
Monitoring and maintenance best practices will also be covered. Students will integrate tools such as Prometheus, Grafana, and NVIDIA Data Center GPU Manager (DCGM) to visualize GPU utilization, temperature, memory usage, and workload performance in real time. They will learn how to use this telemetry to troubleshoot performance bottlenecks and optimize scheduling policies for maximum throughput.
Finally, the submodule will discuss scalability strategies, including auto-scaling GPU nodes based on workload demand, and hybrid deployments that span cloud and on-premises resources. Students will see how to build flexible infrastructure that can dynamically adjust to handle burst workloads while optimizing costs.
By the end of this submodule, students will be able to provision, configure, and manage GPU nodes within a Kubernetes cluster, ensuring AI workloads run at peak efficiency in secure, scalable, and cost-effective environments. They will have the foundation needed for integrating higher-level tools like Helm and Triton Inference Server for full production AI pipelines.
Helm Charts for AI Workload Deployment 3:26
In this submodule, students will learn how to use Helm charts to deploy GPU-accelerated AI workloads onto Kubernetes clusters in a way that is repeatable, version-controlled, and easy to maintain. Helm, often referred to as the package manager for Kubernetes, abstracts away the complexity of managing large sets of Kubernetes manifests, making AI model deployment significantly faster and more reliable.
The lesson begins by explaining the role of Helm in AI operations. Students will understand how Helm packages Kubernetes YAML definitions into reusable charts that can be installed, upgraded, or rolled back with a single command. This approach is particularly valuable in AI environments, where models, dependencies, and serving infrastructure must be updated frequently without introducing downtime or errors.
Students will then learn the anatomy of a Helm chart, including the Chart.yaml file, the templates directory, and the values.yaml file. They will explore how these components define deployments, services, config maps, and resource requests for GPU workloads. The emphasis will be on setting GPU resource limits and node selectors so that AI workloads are scheduled onto NVIDIA GPU nodes rather than CPU-only nodes.
A practical example will demonstrate deploying an NVIDIA Triton Inference Server instance using Helm. Students will configure the chart to mount a model repository, set environment variables for TensorRT optimization, and expose the service for external inference requests. They will also see how Helm makes it simple to deploy multiple model-serving instances, each with different configurations, for A/B testing or version rollouts.
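Once a Triton service is exposed, clients typically reach it through the KServe v2 inference protocol over HTTP. The sketch below only builds the request URL and JSON body, without sending anything; the host name, model name, and tensor name are placeholders, and the input is assumed to be a single tensor flattened in row-major order.

```python
import json


def build_infer_request(base_url, model, input_name, shape, data,
                        datatype="FP32", version=None):
    """Return (url, json_body) for a v2 inference request."""
    expected = 1
    for dim in shape:
        expected *= dim
    if expected != len(data):
        raise ValueError(f"shape {shape} needs {expected} values, got {len(data)}")

    path = f"/v2/models/{model}"
    if version is not None:
        # Pin a specific model version, useful for A/B tests and rollouts.
        path += f"/versions/{version}"
    url = base_url.rstrip("/") + path + "/infer"

    body = {"inputs": [{
        "name": input_name,
        "shape": list(shape),
        "datatype": datatype,
        "data": list(data),
    }]}
    return url, json.dumps(body)


# Placeholder host and names, for illustration only.
url, body = build_infer_request("http://triton.example:8000", "resnet",
                                "input__0", [1, 3], [0.1, 0.2, 0.3])
print(url)
```

The returned pair can be handed to any HTTP client as a POST request with a JSON content type.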
The submodule will then cover Helm values files in depth. Students will learn how to maintain separate values files for development, staging, and production, enabling environment-specific configurations without altering the core chart. They will also learn how to leverage Helm hooks to run initialization jobs, such as downloading pre-trained models from the NVIDIA NGC registry before the deployment starts.
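The layering of environment-specific values files can be approximated in a few lines. The function below is a simplified model of how later values override earlier ones: nested maps are merged key by key, while lists and scalars are replaced wholesale. It does not reproduce every Helm rule (for example, deleting keys with null), and the keys shown are hypothetical.

```python
import copy


def merge_values(base, override):
    """Deep-merge two values dictionaries; override wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars and lists are replaced, not combined.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical chart defaults and a production overlay.
defaults = {"replicaCount": 1,
            "resources": {"limits": {"nvidia.com/gpu": 1, "memory": "8Gi"}}}
production = {"replicaCount": 4,
              "resources": {"limits": {"nvidia.com/gpu": 2}}}

print(merge_values(defaults, production))
```

Keeping the overlay small, with only the keys that differ per environment, is what lets the core chart stay unchanged.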
Version control will be a key focus, showing how to store and manage Helm charts in a Git repository for collaborative development. Students will integrate Helm with CI/CD pipelines so that any approved chart update is automatically deployed to the Kubernetes cluster. They will learn how to run helm diff (provided by the helm-diff plugin rather than core Helm) before upgrades to preview changes and reduce the risk of deployment errors.
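A CI/CD job for this workflow often boils down to two commands: a preview and an upgrade. The sketch below assembles them as argument lists that a pipeline step could pass to `subprocess.run`; it assumes the helm-diff plugin is installed, and the release, chart path, namespace, and file names are placeholders.

```python
def helm_commands(release, chart, namespace, values_files):
    """Return (diff_cmd, upgrade_cmd) as argument lists."""
    value_args = []
    for path in values_files:
        # Later files override earlier ones, so order matters.
        value_args += ["-f", path]

    # Preview what would change (requires the helm-diff plugin).
    diff_cmd = ["helm", "diff", "upgrade", release, chart,
                "--namespace", namespace, *value_args]
    # Install the release if it is missing, otherwise upgrade it,
    # and wait until the resources report ready.
    upgrade_cmd = ["helm", "upgrade", "--install", release, chart,
                   "--namespace", namespace, "--wait", *value_args]
    return diff_cmd, upgrade_cmd


diff_cmd, upgrade_cmd = helm_commands(
    "triton", "./charts/triton", "ai-serving",
    ["values.yaml", "values-prod.yaml"])
print(" ".join(diff_cmd))
print(" ".join(upgrade_cmd))
```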
Security best practices will also be addressed, including signing Helm charts, using private chart repositories, and controlling access through Kubernetes RBAC policies. Students will understand how these measures prevent unauthorized or unverified workloads from being deployed.
By the end of this submodule, students will be able to package and deploy AI workloads with Helm, making it possible to manage complex NVIDIA GPU-powered deployments with minimal manual configuration. They will have the skills to ensure that AI services are scalable, reproducible, and easy to upgrade, a necessity for enterprise-grade AI operations.
Licensing: License Server and Enterprise Considerations (3:14)
In this submodule, students will gain a deep understanding of how NVIDIA licensing works in enterprise AI environments, focusing on the NVIDIA License Server and the licensing models used for NVIDIA AI Enterprise, Omniverse, and other GPU-accelerated platforms. Proper license management is essential for ensuring compliance, avoiding service disruptions, and maximizing the ROI of GPU infrastructure investments.
The lesson begins by explaining why licensing matters in the NVIDIA ecosystem. Students will learn that many NVIDIA enterprise products—including NVIDIA AI Enterprise, Omniverse Enterprise, and certain SDKs—require active license validation to operate beyond trial limits. Without proper configuration, workloads may be restricted in performance, limited in features, or even stop functioning entirely.
Next, the focus will shift to the NVIDIA License Server, the centralized service that manages and distributes licenses across an organization’s GPU resources. Students will learn about perpetual versus subscription-based licenses, as well as floating license models that allow a pool of licenses to be shared dynamically among multiple nodes or users. This flexibility is particularly valuable for AI workloads that run on GPU-enabled Kubernetes clusters or hybrid cloud environments, where workloads shift between nodes frequently.
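The floating model is easy to picture as a shared pool with check-out and check-in. The class below is a toy model for intuition only; it is not the NVIDIA License Server API, and the node names are made up.

```python
class FloatingLicensePool:
    """Toy model of a floating license pool shared by several nodes."""

    def __init__(self, total):
        self.total = total
        self.holders = set()

    @property
    def available(self):
        return self.total - len(self.holders)

    def checkout(self, node):
        """Grant a license if one is free; repeat requests are idempotent."""
        if node in self.holders:
            return True
        if self.available == 0:
            return False
        self.holders.add(node)
        return True

    def release(self, node):
        """Return the node's license to the pool."""
        self.holders.discard(node)


pool = FloatingLicensePool(total=2)
print(pool.checkout("gpu-node-1"), pool.checkout("gpu-node-2"),
      pool.checkout("gpu-node-3"))
pool.release("gpu-node-1")
print(pool.checkout("gpu-node-3"))
```

The third node is refused until another node releases its license, which is the behavior that makes a small pool workable when workloads move between nodes.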
The submodule will guide students through the installation and configuration of the NVIDIA License Server. This includes setting up the license server on a secure VM, uploading license files provided by NVIDIA, configuring network settings to allow GPU nodes to communicate with the server, and ensuring license redundancy to prevent outages. They will also learn how to integrate license checks into deployment pipelines so that a model deployment does not proceed if required licenses are unavailable.
Students will also explore license monitoring and auditing. Using NVIDIA’s built-in reporting tools, along with integrations to systems like Prometheus and Grafana, they will track license usage over time, identify unused licenses, and forecast future licensing needs. This proactive approach helps organizations optimize costs while ensuring there are no unexpected bottlenecks during peak usage periods.
Special attention will be given to cloud-specific licensing considerations. Students will learn how licensing differs when deploying NVIDIA AI Enterprise workloads on AWS, Azure, or Google Cloud, including cases where licensing is bundled with GPU instances and cases where it must be purchased separately. They will also examine how NVIDIA’s DGX Cloud service handles licensing internally, removing the need for on-premises license management.
The submodule will conclude by covering security best practices for licensing infrastructure. This includes isolating the license server in a restricted network zone, enforcing role-based access control for license administration, encrypting license files, and ensuring that license keys are never exposed in public repositories or container images.
By the end of this submodule, students will be able to deploy, configure, and manage NVIDIA licensing systems for enterprise AI environments. They will understand how to ensure uninterrupted operation of NVIDIA GPU-powered workloads, maintain compliance with licensing agreements, and make informed decisions about license purchasing and allocation in scalable AI infrastructures.
Security: Securing Models and Containers at Scale (3:26)
In this submodule, students will learn how to implement end-to-end security for AI models and containerized workloads deployed on NVIDIA GPU-powered environments. As AI systems scale into production across cloud, edge, and hybrid infrastructures, securing every layer—from the model to the runtime environment—becomes essential for protecting intellectual property, ensuring compliance, and preventing malicious attacks.
The lesson begins with an overview of the AI security landscape, highlighting the unique threats that arise in GPU-accelerated AI deployments. Students will understand risks such as model theft, adversarial attacks, container tampering, and data leakage during inference. These threats are amplified in environments where AI workloads are deployed at scale, such as in Kubernetes clusters serving multiple tenants or exposed to the public internet.
The first layer of defense covered is model security. Students will learn strategies to protect trained models, including model encryption, key-based access control, and obfuscation techniques to make reverse-engineering more difficult. The lesson will explore how to store models in private repositories, control access through RBAC (Role-Based Access Control), and integrate security checks before a model is loaded into Triton Inference Server or similar serving systems.
Next, the focus shifts to container security. Students will learn how to ensure containers are built from trusted base images, ideally sourced from the NVIDIA NGC Registry or other verified repositories. They will be introduced to container image signing and verification processes that prevent the execution of tampered images. The submodule will also cover the importance of scanning container images for vulnerabilities using tools such as Trivy, Clair, or cloud-native security scanners before deployment.
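One lightweight guard that complements signing and scanning is a policy check on image references before deployment. The sketch below assumes a policy of "trusted registry prefix plus digest pinning"; the allow-list and the example references are illustrative, with `nvcr.io` being the NGC registry host.

```python
import re

# Assumed allow-list; extend with your organization's private registries.
TRUSTED_REGISTRIES = ("nvcr.io/",)
DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def image_violations(image):
    """Return a list of policy problems for a container image reference."""
    problems = []
    if not image.startswith(TRUSTED_REGISTRIES):
        problems.append("untrusted registry")
    if not DIGEST.search(image):
        # Tags are mutable; a digest pins the exact image content.
        problems.append("not pinned by digest")
    return problems


pinned = "nvcr.io/nvidia/tritonserver@sha256:" + "a" * 64  # fake digest
print(image_violations(pinned))
print(image_violations("docker.io/library/python:latest"))
```

A check like this does not replace signature verification or a vulnerability scan with a tool such as Trivy; it only rejects obviously unpinned or off-policy references early.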
Runtime security will be a major focus. Students will understand how to enforce network policies within Kubernetes to restrict container communication, preventing data exfiltration or lateral movement in the event of a breach. They will also explore pod security standards to limit container privileges, apply AppArmor or SELinux profiles, and use read-only file systems for inference workloads.
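The container-level hardening settings mentioned above map onto fields of the Kubernetes `securityContext`. As a minimal sketch, the function below checks a container spec (as a Python dictionary) for four common settings; which settings to require is an assumption of this example, not a complete statement of the Pod Security Standards.

```python
def hardening_gaps(container):
    """Return the hardening settings missing from a container spec."""
    ctx = container.get("securityContext", {})
    gaps = []
    if ctx.get("readOnlyRootFilesystem") is not True:
        gaps.append("readOnlyRootFilesystem")
    if ctx.get("allowPrivilegeEscalation") is not False:
        gaps.append("allowPrivilegeEscalation")
    if ctx.get("runAsNonRoot") is not True:
        gaps.append("runAsNonRoot")
    if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
        gaps.append("capabilities.drop")
    return gaps


hardened = {
    "name": "inference",
    "securityContext": {
        "readOnlyRootFilesystem": True,
        "allowPrivilegeEscalation": False,
        "runAsNonRoot": True,
        "capabilities": {"drop": ["ALL"]},
    },
}
print(hardening_gaps(hardened))
print(hardening_gaps({"name": "default"}))
```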
The lesson will also cover data-in-transit and data-at-rest encryption. Students will see how to secure inference requests and responses using TLS/SSL certificates, protect sensitive configuration files, and ensure that all model weights and datasets stored on disk are encrypted with strong algorithms such as AES-256.
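For the data-in-transit side, a client calling an inference endpoint can enforce certificate verification and a minimum protocol version with Python's standard `ssl` module. This sketch covers only the client context; the optional private CA file is an assumption, and encryption at rest is out of its scope.

```python
import ssl


def strict_client_context(ca_file=None):
    """Create a TLS client context that verifies the server certificate."""
    # The default context already requires a valid certificate and
    # checks that the host name matches.
    ctx = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = strict_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

The context can then be passed to an HTTPS client, for example `urllib.request.urlopen(url, context=ctx)`.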
In addition, students will learn how to integrate security monitoring into AI deployments. Using tools like Falco, Sysdig Secure, or Aqua Security, they will detect suspicious activity, unauthorized model loading, or abnormal GPU utilization patterns that might indicate an attack. Alerts will be configured to trigger automated incident response workflows, such as pausing affected workloads or rolling back to a safe model version.
By the end of this submodule, students will have a robust understanding of how to secure AI models, containers, and runtime environments in NVIDIA GPU-powered systems. They will be equipped to design deployments that are resilient to cyber threats, compliant with enterprise security standards, and capable of safeguarding AI assets in mission-critical applications.
Hands-on Lab: Secure and Deploy AI Workloads on a GPU-Enabled Kubernetes Cluster (0:03)
In this hands-on lab, students will apply the concepts learned in the previous submodules to securely deploy a GPU-accelerated AI workload on a Kubernetes cluster hosted in Microsoft Azure. This lab provides an end-to-end experience, from provisioning cloud infrastructure to implementing security best practices for models, containers, and runtime environments in a production-ready AI pipeline.
The lab begins with Azure infrastructure provisioning. Students will create a GPU-enabled Azure Kubernetes Service (AKS) cluster, selecting nodes from the NC, ND, or NV-series virtual machines optimized for NVIDIA GPUs. They will ensure that the NVIDIA GPU Operator is installed, enabling Kubernetes to manage GPU scheduling, monitoring, and driver updates automatically. The cluster will be configured with RBAC-enabled access and namespace separation to isolate AI workloads from other services.
Next, students will configure Helm for workload deployment. They will create a Helm chart for deploying an NVIDIA Triton Inference Server instance, packaged with a sample deep learning model from the NVIDIA NGC Registry. The Helm chart will specify GPU resource limits, node selectors to ensure GPU placement, and container image references from trusted, signed sources. Before deployment, students will scan the container images for vulnerabilities using Microsoft Defender for Containers or Trivy.
Once the workload is deployed, the focus shifts to model security. Students will store the model in a private Azure Container Registry (ACR), using access keys and Azure AD integration to control permissions. The model will be encrypted at rest, and inference requests will be served over TLS/SSL using Kubernetes Ingress controllers with HTTPS termination.
Runtime security hardening will then be applied. Students will define Kubernetes network policies to restrict communication between pods, ensuring that the inference service only accepts requests from approved sources. They will implement Pod Security Standards, enabling read-only file systems and dropping unnecessary Linux capabilities. This limits the attack surface in case of a compromised container.
The lab will also introduce security monitoring and logging. Students will integrate Azure Monitor and Falco to detect unusual activities, such as unauthorized model loading, excessive GPU usage, or abnormal network traffic patterns. Alerts will be configured to send notifications to Microsoft Teams or Slack, and automated remediation scripts will be deployed to quarantine suspicious pods.
As a final step, students will simulate a model update workflow. They will push a new version of the model to the private registry, update the Helm chart values, and perform a zero-downtime rollout using Helm’s built-in upgrade process. Security scans and policy checks will run automatically as part of this update, ensuring no unverified code enters production.
By the end of this lab, students will have built and deployed a fully secure AI workload on Azure’s GPU-enabled Kubernetes infrastructure, complete with model protection, container image security, runtime hardening, and proactive monitoring. They will leave with the practical skills needed to manage enterprise-grade AI deployments in cloud environments where security, compliance, and scalability are paramount.

Infra Metropolis for Smart Cities (3:09)
In this submodule, students will explore NVIDIA Metropolis, NVIDIA’s end-to-end platform for building smart city and intelligent video analytics applications. The platform leverages GPU acceleration to process vast amounts of real-time video and sensor data, enabling advanced capabilities in areas such as traffic management, public safety, retail analytics, and urban infrastructure monitoring.
The lesson begins by introducing the vision behind NVIDIA Metropolis—to transform cities into safer, more efficient, and more sustainable environments through AI-powered perception systems. Students will see how the platform is built on top of NVIDIA’s DeepStream SDK, Triton Inference Server, and NVIDIA AI Enterprise stack, creating a robust environment for deploying computer vision models at scale.
Students will learn about the key components of Metropolis, starting with edge AI processing on devices like Jetson Xavier and Jetson Orin, which enable real-time analytics close to the data source. They will also explore GPU-powered servers and cloud integration, where heavy workloads such as multi-camera object tracking or high-resolution analytics can run in centralized data centers or cloud platforms.
A deep dive into Metropolis AI pipelines will show how raw video streams from IP cameras, drones, or traffic sensors can be ingested, decoded, and passed through deep learning models for tasks like object detection, facial recognition, license plate reading, and anomaly detection. Students will understand how to integrate metadata outputs from these pipelines into business intelligence dashboards or IoT systems for automated decision-making.
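The hand-off from pipeline metadata to a dashboard can be as simple as aggregating detections per frame. The sketch below uses a hypothetical record layout (`label`, `confidence`); real DeepStream metadata is richer and structured differently.

```python
from collections import Counter


def count_by_class(detections, min_confidence=0.5):
    """Count detections per label, ignoring low-confidence ones."""
    return Counter(d["label"] for d in detections
                   if d["confidence"] >= min_confidence)


# Hypothetical detections for a single video frame.
frame = [
    {"label": "car", "confidence": 0.91},
    {"label": "car", "confidence": 0.42},
    {"label": "person", "confidence": 0.77},
]
print(dict(count_by_class(frame)))
```

Counts like these are what a traffic dashboard or an IoT rule would consume, rather than the raw video.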
The submodule will also cover industry-specific use cases. In transportation, Metropolis can enable adaptive traffic light control and accident detection. In retail, it can track customer movement patterns to optimize store layouts. In law enforcement, it supports real-time threat detection and evidence collection. Each of these applications will be discussed with examples of latency requirements, scalability considerations, and data privacy compliance.
Students will gain an understanding of deployment strategies, from on-premises GPU clusters for high-security environments to hybrid cloud-edge setups for scalability. They will also learn how containerized AI services in Metropolis integrate with Kubernetes for orchestration, ensuring that workloads can be deployed, scaled, and updated efficiently.
Security will also be addressed, emphasizing video encryption, secure data transmission, and model access control to comply with regulations such as GDPR and CCPA. Students will explore methods for anonymizing or masking sensitive data while retaining analytical value.
By the end of this submodule, students will understand how NVIDIA Metropolis serves as a comprehensive toolkit for building real-time, AI-driven urban intelligence systems. They will be able to architect solutions that combine GPU-accelerated analytics, edge computing, cloud orchestration, and secure data handling to meet the demanding needs of smart city deployments.
Infra Riva for Speech AI (3:06)
In this submodule, students will explore NVIDIA Riva, an enterprise-grade SDK for developing Speech AI applications that deliver real-time automatic speech recognition (ASR), text-to-speech (TTS), and natural language understanding (NLU) capabilities. Built on GPU acceleration and optimized for low-latency processing, Riva enables developers to deploy conversational AI systems at scale for industries such as customer service, healthcare, financial services, and assistive technologies.
The lesson begins with an overview of why speech AI matters in today’s digital landscape. Students will understand how voice interfaces are becoming essential for frictionless human-computer interaction, allowing users to communicate with devices naturally and efficiently. NVIDIA Riva is designed to meet enterprise demands for accuracy, speed, and customizability, making it a preferred choice for mission-critical applications.
Students will learn about Riva’s architecture, which is built on Triton Inference Server for scalable model deployment and TensorRT for optimized GPU inference. The SDK supports streaming inference, enabling real-time speech-to-text transcription and text-to-speech conversion with latency low enough for interactive conversation; actual figures depend on the model, the GPU, and the audio chunk size.
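Streaming recognition works on small audio chunks rather than whole files. The generator below shows the chunking arithmetic for raw PCM audio; the 16 kHz sample rate, 16-bit samples, and 100 ms chunk length are assumed example settings, and the code does not call the Riva client library.

```python
def pcm_chunks(audio, sample_rate=16000, chunk_ms=100, bytes_per_sample=2):
    """Yield fixed-duration chunks from raw mono PCM audio bytes."""
    samples_per_chunk = sample_rate * chunk_ms // 1000
    chunk_bytes = samples_per_chunk * bytes_per_sample
    for start in range(0, len(audio), chunk_bytes):
        # The final chunk may be shorter than the others.
        yield audio[start:start + chunk_bytes]


one_second = bytes(16000 * 2)  # one second of silence, 16-bit mono
chunks = list(pcm_chunks(one_second))
print(len(chunks), len(chunks[0]))
```

Smaller chunks lower the delay before the first partial transcript, at the cost of more requests per second.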
The submodule will cover Riva’s core capabilities in detail. For ASR, students will see how Riva can transcribe speech in multiple languages, handle domain-specific vocabulary, and adapt to noisy environments through robust acoustic modeling. For TTS, they will explore Riva’s ability to synthesize speech with natural intonation, emotion control, and multi-voice support. The NLU component allows applications to interpret user intent, extract key information, and trigger relevant actions, enabling end-to-end conversational experiences.
Customization will be a major focus. Students will learn how to fine-tune Riva’s pre-trained models with domain-specific datasets using NVIDIA’s TAO Toolkit, ensuring higher accuracy in specialized industries such as medical transcription, legal dictation, or call center analytics. They will also explore methods for model quantization and pruning to optimize performance for edge deployments on devices like Jetson Orin.
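To give a feel for what quantization does to model weights, here is a minimal symmetric INT8 sketch in plain Python. It illustrates the general technique with one scale per tensor; it is not the procedure used by TAO Toolkit or TensorRT, which rely on calibration and finer-grained scales.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to the range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integers back to approximate floating-point values."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

The rounding error per weight is bounded by half the scale, which is why tensors with a few large outliers quantize poorly under a single shared scale.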
The submodule will also address deployment strategies. Students will understand how to deploy Riva in containerized environments using Helm charts, integrate it with Kubernetes for scaling across multiple GPU nodes, and connect it to downstream applications via gRPC APIs. They will see how to enable secure communication with TLS encryption and how to manage authentication for API access.
Real-world use cases will be discussed, including virtual assistants for banking, AI-powered transcription services for media companies, voice-enabled medical record systems, and automated customer service bots for e-commerce. Each example will highlight specific Riva features that make these solutions performant and reliable.
By the end of this submodule, students will have a deep understanding of NVIDIA Riva’s speech AI capabilities and how to integrate them into scalable, real-time applications. They will be equipped to build high-performance voice interfaces that deliver natural, responsive, and secure conversational experiences in diverse enterprise settings.
Infra NeMo for NLP (3:39)
In this submodule, students will explore NVIDIA NeMo, a powerful, open-source toolkit designed for building, training, and deploying state-of-the-art natural language processing (NLP) models at scale. NeMo provides a modular framework and pre-trained models for tasks such as language modeling with large language models (LLMs), text classification, named entity recognition (NER), summarization, machine translation, and speech-to-text-to-NLP pipelines. Optimized for GPU acceleration and integrated with the NVIDIA AI Enterprise ecosystem, NeMo empowers organizations to develop advanced language AI solutions for industries ranging from finance and healthcare to customer service and research.
The lesson begins by introducing NeMo’s architecture, which is built on PyTorch Lightning for training efficiency and scalability, and supports distributed training across multiple GPUs or even multi-node clusters. Students will understand how NeMo leverages mixed-precision training (through PyTorch automatic mixed precision and, in older releases, NVIDIA’s Apex library) to optimize speed while maintaining model accuracy, significantly reducing the cost and time to train large NLP models.
A key focus will be on NeMo Collections, which are domain-specific modules tailored for different AI domains, including nlp, asr (automatic speech recognition), and tts (text-to-speech). Within the nlp collection, students will see how to work with transformer-based architectures like BERT, GPT, Megatron-LM, and T5, all of which can be fine-tuned for custom datasets and specialized business tasks.
Students will then explore model customization and fine-tuning. They will learn how to adapt pre-trained LLMs to specific domains using transfer learning and parameter-efficient fine-tuning techniques such as LoRA (Low-Rank Adaptation). The submodule will explain how NeMo supports quantization-aware training to prepare models for deployment in resource-constrained environments like NVIDIA Jetson or cloud instances with limited GPU memory.
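The idea behind LoRA is to keep a weight matrix W frozen and learn a low-rank update, so the layer computes y = W x + (alpha / r) B (A x), where A has r rows and B has r columns. The plain-Python sketch below shows that computation and the resulting parameter savings; it is a didactic illustration, not NeMo's implementation, and the layer sizes are example values.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Frozen base projection plus a scaled low-rank update."""
    rank = len(A)                      # A is (rank x d_in)
    base = matvec(W, x)                # W is (d_out x d_in)
    update = matvec(B, matvec(A, x))   # B is (d_out x rank)
    return [b + (alpha / rank) * u for b, u in zip(base, update)]


def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters: full fine-tuning versus LoRA."""
    return d_out * d_in, rank * (d_out + d_in)


W = [[1, 0], [0, 1]]
A = [[1, 1]]
B = [[1], [0]]
print(lora_forward(W, A, B, [2, 3], alpha=1))
print(lora_param_counts(4096, 4096, 8))
```

For a 4096 by 4096 layer at rank 8, the trainable parameter count drops from about 16.8 million to 65,536.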
Deployment workflows will also be covered in detail. Students will see how to export NeMo-trained models to ONNX or TensorRT formats for high-performance inference and deploy them with Triton Inference Server for scalable serving. They will also learn how to containerize NeMo models using NGC-ready Docker images and orchestrate deployments using Kubernetes and Helm charts.
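Triton loads models from a repository with a fixed directory layout: one folder per model, a numbered subfolder per version, and a `config.pbtxt` next to the versions. The sketch below creates that skeleton for an ONNX model; the model name and batch size are placeholders, the config is deliberately minimal, and the exported `model.onnx` file still has to be copied to the returned path.

```python
import tempfile
from pathlib import Path


def create_model_repo(root, model_name, version=1,
                      platform="onnxruntime_onnx", max_batch_size=8):
    """Create <root>/<model>/<version>/ and a minimal config.pbtxt."""
    model_dir = Path(root) / model_name
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)

    config = (
        f'name: "{model_name}"\n'
        f'platform: "{platform}"\n'
        f"max_batch_size: {max_batch_size}\n"
    )
    (model_dir / "config.pbtxt").write_text(config)
    # Path where the exported ONNX file is expected to be placed.
    return version_dir / "model.onnx"


with tempfile.TemporaryDirectory() as root:
    target = create_model_repo(root, "sentiment")
    layout = sorted(p.relative_to(root).as_posix()
                    for p in Path(root).rglob("*"))
print(layout)
```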
Another important component is multi-modal AI, where students will discover how NeMo can combine text, speech, and vision data into unified AI applications. For example, a pipeline could integrate speech input via NVIDIA Riva, process it with a NeMo NLP model, and generate a text or voice response, enabling rich conversational AI agents.
Real-world applications will be examined, including financial document analysis, medical record summarization, multilingual customer service bots, and research assistants powered by custom LLMs. Students will also discuss ethical considerations in NLP, such as bias mitigation, transparency in AI decisions, and compliance with data privacy laws like GDPR.
By the end of this submodule, students will be proficient in using NVIDIA NeMo to design, train, and deploy cutting-edge NLP models optimized for GPU acceleration and enterprise-grade scalability. They will possess the skills to create tailored language AI solutions that are accurate, efficient, and production-ready for complex business environments.
Infra Clara for Healthcare AI (3:41)
In this submodule, students will explore NVIDIA Clara, a specialized AI and computing platform designed to accelerate healthcare and life sciences applications. Clara provides end-to-end capabilities for medical imaging, genomics, drug discovery, and smart hospital solutions, leveraging the power of GPU acceleration, AI model optimization, and secure data workflows. The platform is built to address the strict regulatory, privacy, and performance requirements of the healthcare sector while enabling innovations that improve patient care and operational efficiency.
The lesson begins with an overview of the challenges in healthcare AI—including massive data volumes, complex imaging modalities, stringent compliance regulations like HIPAA and GDPR, and the need for explainable AI models. Students will understand how Clara is designed to meet these demands with a scalable, secure, and highly performant AI stack that integrates seamlessly with hospital IT systems, medical devices, and cloud infrastructure.
Students will dive into the core components of NVIDIA Clara, starting with Clara Imaging, which provides AI-accelerated pipelines for processing CT scans, MRI images, X-rays, and ultrasound data. They will learn how Clara supports segmentation, classification, and detection models tailored for radiology workflows. Clara integrates with DICOM standards and popular imaging systems, making it interoperable with existing hospital infrastructure.
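Segmentation models in radiology are commonly scored with the Dice coefficient, which measures the overlap between a predicted mask and a reference mask. The sketch below computes it on flattened binary masks; it is a generic metric shown for context, not a Clara API.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|) on binary masks."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2 * intersection / total


predicted = [1, 1, 0, 0]
reference = [1, 0, 1, 0]
print(dice_coefficient(predicted, reference))
```

A score of 1.0 means the masks match exactly, and 0.0 means they do not overlap at all.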
Next, the focus will shift to Clara Genomics, which accelerates DNA and RNA sequencing pipelines. Students will see how GPU-accelerated algorithms drastically reduce the time required for genome alignment, variant calling, and transcriptome analysis—transforming genomic research and enabling personalized medicine.
Another critical component, Clara Discovery, will be introduced for drug discovery and molecular simulations. Using GPU-powered molecular dynamics engines, Clara can model protein-ligand interactions, screen drug candidates, and simulate biological systems with high accuracy, reducing the time and cost of pharmaceutical R&D.
Deployment will be a major part of this submodule. Students will explore how Clara supports containerized AI workflows via the NVIDIA NGC Registry, allowing healthcare AI applications to run on-premises in secure data centers, at the edge in medical devices, or in hybrid cloud environments. Clara workflows are orchestrated using Kubernetes and support federated learning to train AI models across multiple institutions without moving sensitive patient data, preserving privacy and compliance.
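The core aggregation step in many federated learning setups is federated averaging: each site trains locally, then a server averages the model weights, weighted by how many samples each site used. The sketch below shows only that step with tiny made-up weight vectors; it is not Clara's federated learning implementation.

```python
def federated_average(updates):
    """Average weight vectors, weighting each site by its sample count.

    `updates` is a list of (weights, num_samples) pairs, one per site.
    """
    total_samples = sum(n for _, n in updates)
    size = len(updates[0][0])
    return [
        sum(weights[i] * n for weights, n in updates) / total_samples
        for i in range(size)
    ]


# Made-up local results from two hospitals.
site_updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]
print(federated_average(site_updates))
```

Only the weight vectors and sample counts leave each site; the patient records themselves never do.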
Real-world use cases will highlight Clara’s versatility: AI-assisted diagnostics in radiology, automated pathology slide analysis, real-time patient monitoring in ICUs, genomics-based treatment recommendations, and simulation-driven drug development. For each scenario, students will examine accuracy requirements, latency constraints, and regulatory approval processes.
Security and compliance will also be covered in depth. Students will understand how Clara implements data encryption, audit trails, and access control mechanisms to safeguard sensitive medical information. They will also explore techniques for anonymizing patient data while retaining clinical value for AI training.
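One common building block for preparing training data is pseudonymization: direct identifiers are dropped and the patient ID is replaced by a keyed hash, so records from the same patient can still be linked. The sketch below uses HMAC-SHA256 from the standard library; the field names are hypothetical, and this step alone should not be assumed to satisfy HIPAA or GDPR de-identification requirements.

```python
import hashlib
import hmac


def pseudonymize(patient_id, secret_key):
    """Replace an identifier with a keyed, non-reversible token."""
    return hmac.new(secret_key, patient_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def deidentify(record, secret_key, drop=("name", "address")):
    """Drop direct identifiers and tokenize the patient ID."""
    clean = {k: v for k, v in record.items() if k not in drop}
    clean["patient_id"] = pseudonymize(record["patient_id"], secret_key)
    return clean


key = b"example-key-keep-real-keys-in-a-secret-store"
record = {"patient_id": "P-001", "name": "Jane Doe",
          "address": "1 Main St", "finding": "nodule, 4 mm"}
print(deidentify(record, key))
```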
By the end of this submodule, students will have a clear understanding of how NVIDIA Clara transforms healthcare through GPU-accelerated AI, enabling faster diagnostics, deeper insights from medical data, and innovative treatment pathways. They will be prepared to design and deploy healthcare AI solutions that meet industry performance, scalability, and compliance standards.
Infra Merlin for Recommender Systems (3:27)
In this submodule, students will explore NVIDIA Merlin, a comprehensive GPU-accelerated framework for building high-performance recommender systems at scale. Recommender systems power some of the most influential AI applications in the world—from personalized product recommendations on e-commerce platforms to content suggestions on streaming services, targeted advertising, and customized news feeds. NVIDIA Merlin is designed to handle massive datasets, complex user-item interactions, and demanding real-time inference requirements with exceptional speed and efficiency.
The lesson begins with an overview of the importance of recommender systems in driving user engagement, boosting revenue, and improving customer satisfaction. Students will understand how Merlin addresses the three major challenges in recommendation pipelines: data preprocessing, model training, and inference optimization.
The first core component, NVTabular, will be introduced as Merlin’s GPU-accelerated data preprocessing library. Students will see how NVTabular dramatically reduces the time required for tasks like feature engineering, categorical encoding, normalization, and data augmentation on terabyte-scale datasets. By leveraging GPUs, NVTabular enables interactive-speed transformations that would take hours or even days on CPUs.
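Two of the preprocessing steps named above, categorical encoding and normalization, are shown below in plain Python to make the transformations concrete. This is a CPU stand-in for explanation only and does not use the NVTabular API; reserving index 0 for unseen categories is an assumption of the sketch.

```python
def categorify(values):
    """Map each distinct category to a contiguous integer ID."""
    mapping = {}
    encoded = []
    for value in values:
        if value not in mapping:
            # IDs start at 1; 0 is left for categories unseen at fit time.
            mapping[value] = len(mapping) + 1
        encoded.append(mapping[value])
    return encoded, mapping


def normalize(values):
    """Standardize a numeric column to zero mean and unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5 or 1.0  # avoid dividing by zero
    return [(v - mean) / std for v in values]


print(categorify(["shoes", "hats", "shoes"]))
print(normalize([1.0, 2.0, 3.0]))
```

The integer IDs produced by the first step are what embedding tables in a recommendation model are indexed by.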
Next, students will explore Merlin Models, a library that provides pre-built and customizable deep learning architectures optimized for recommendation tasks. They will learn about models like DLRM (Deep Learning Recommendation Model) and Wide & Deep networks, as well as transformer-based architectures for session-based and sequential recommendations. Training efficiency is enhanced with mixed-precision training, multi-GPU scaling, and distributed data parallelism, ensuring that even the largest datasets can be processed in practical timeframes.
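A distinctive step in DLRM-style models is the feature interaction layer, which takes dot products between every pair of embedding vectors before the final MLP. The sketch below shows that step on tiny hand-written vectors; it illustrates the published idea and is not code from Merlin Models.

```python
from itertools import combinations


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def pairwise_interactions(embeddings):
    """Dot product for every unordered pair of embedding vectors."""
    return [dot(a, b) for a, b in combinations(embeddings, 2)]


# Tiny made-up embeddings for user, item, and context features.
embeddings = [[1, 0], [0, 1], [1, 1]]
print(pairwise_interactions(embeddings))
```

With n embedding vectors this yields n(n-1)/2 interaction terms, which are concatenated with the dense features and passed to the top MLP.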
For inference, students will study Triton Inference Server integration, which allows Merlin models to be deployed for real-time predictions at scale. Techniques such as model ensemble serving, batching, and TensorRT optimization will be discussed to achieve the lowest possible latency without sacrificing accuracy.
Students will also learn about Merlin Systems, a tool for orchestrating end-to-end recommendation workflows, from raw data ingestion to online serving. This includes integrating feature stores, managing model versioning, and enabling A/B testing to validate model performance in production environments.
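For A/B testing, each user needs a stable assignment to the control or treatment model. A common approach, sketched below, hashes the user ID together with an experiment name and compares the result to the treatment share. The experiment name and the 10% share are example values, and this is a generic technique rather than a Merlin Systems feature.

```python
import hashlib


def assign_variant(user_id, experiment, treatment_share=0.1):
    """Deterministically assign a user to 'treatment' or 'control'."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2 ** 64
    return "treatment" if bucket < treatment_share else "control"


assignments = [assign_variant(f"user-{i}", "ranker-v2") for i in range(10000)]
print(assignments.count("treatment"))
```

Because the assignment depends only on the IDs, a user sees the same model variant on every request without any stored state.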
Deployment strategies will cover cloud-based GPU instances, on-premises DGX systems, and hybrid architectures, with an emphasis on scalability, fault tolerance, and cost optimization. Security measures such as API authentication, data encryption, and compliance with privacy regulations like GDPR and CCPA will also be addressed.
Real-world use cases will highlight Merlin’s versatility: personalized movie recommendations on streaming platforms, product suggestions for online retailers, targeted ads in digital marketing, and user-specific news feeds. Each example will be tied to specific Merlin features that make such systems highly performant and adaptable to changing user behavior.
By the end of this submodule, students will have a complete understanding of how NVIDIA Merlin streamlines the creation, training, and deployment of cutting-edge recommender systems. They will be able to design AI-driven recommendation pipelines that are scalable, low-latency, and tailored to enterprise needs, giving them the tools to build the kind of personalization engines that drive the modern digital economy.

Requirements

Basic understanding of AI/ML concepts such as training, inference, and model deployment.
Familiarity with Linux command-line operations (Ubuntu recommended).
Basic knowledge of Docker and containerization (helpful but not mandatory — key concepts are covered in the course).
Access to a GPU-enabled system (NVIDIA A100, H100, L4, or Jetson Orin/Xavier) or cloud GPU instance (AWS, Azure, DGX Cloud).
Stable internet connection for downloading NVIDIA NGC containers, pretrained models, and SDKs.
Curiosity and a willingness to learn hands-on through labs and real-world projects.

Description

The Certified Infra AI Expert: End-to-End GPU-Accelerated AI Systems Training is a comprehensive, hands-on program designed for AI engineers, developers, and system architects who want to master the NVIDIA GPU ecosystem and build production-ready AI solutions from the ground up. Whether you’re working with data center GPUs like the A100 and H100, deploying edge AI on Jetson Orin, or developing digital twins with Omniverse, this course takes you through every stage of the AI lifecycle — from model training to optimization, deployment, and cloud/edge integration.

You’ll gain deep expertise in the NVIDIA AI Enterprise stack, learning how to set up GPU-powered infrastructure on AWS, Azure, and DGX Cloud. Through step-by-step labs, you’ll configure NVIDIA drivers, Kubernetes GPU nodes, and Helm charts for scalable AI workloads. The course covers NGC Registry workflows, showing you how to deploy AI containers, use pretrained models, and integrate NVIDIA DeepStream SDK for real-time video analytics and RAPIDS for GPU-accelerated data processing.

We’ll dive into NVIDIA Triton Inference Server for high-throughput inference, TAO Toolkit for transfer learning and quantization, and TensorRT for model optimization. You’ll learn best practices for container security, licensing via NVIDIA License Server, and cloud-native AI DevOps using Kubernetes, Helm, and CI/CD pipelines.

Specialized modules explore NVIDIA vertical SDKs such as:

Metropolis for smart cities
Riva for speech AI
NeMo for NLP
Clara for healthcare AI
Merlin for recommender systems

A highlight of the training is the Capstone Project, where you’ll design and deploy a complete AI solution using NVIDIA hardware and software. Choose between:

Video surveillance with DeepStream
Digital twin simulation with Omniverse
Smart edge AI with Jetson and IoT sensor fusion

You’ll integrate TensorRT optimization, Triton inference, and cloud-edge synchronization, delivering a project report, deployment pipeline, and demo video — essential portfolio pieces for demonstrating your skills.

By the end of this course, you will be able to:

Architect GPU-accelerated AI pipelines from data ingestion to deployment
Implement real-time AI systems with DeepStream, RAPIDS, and Triton
Optimize AI models for performance and efficiency using TensorRT
Deploy scalable AI solutions on cloud platforms and edge devices
Integrate AI with digital twins, IoT sensors, and streaming pipelines
Apply security and licensing best practices for enterprise AI environments

Upon successful completion, you’ll earn the Certified Infra AI Expert credential, validating your ability to design, optimize, and deploy AI solutions using the full NVIDIA technology stack. This certification sets you apart as a professional who can bridge AI research and real-world implementation, making you highly valuable in industries from autonomous systems to healthcare, finance, manufacturing, and beyond.

If your goal is to become an end-to-end AI solutions architect with cutting-edge GPU acceleration skills, this is the definitive NVIDIA AI training program to get you there.

Who this course is for:

AI/ML Developers looking to move beyond model training into real-world deployment and optimization on NVIDIA hardware.
Edge AI Engineers working with Jetson devices and IoT sensor integration for real-time applications.
System Architects and DevOps Engineers responsible for cloud-native AI infrastructure, Kubernetes orchestration, and containerized AI workloads.
Technical Product Managers and Solution Engineers who need a deep, hands-on understanding of NVIDIA AI Enterprise, DeepStream, RAPIDS, Triton, and Omniverse.
Researchers aiming to deploy optimized AI pipelines in high-performance computing or industry-specific environments like healthcare, smart cities, robotics, or manufacturing.

Course content

Introduction to Certified Infra AI Expert: End-to-End GPU-Accelerated AI • 2 lectures • 4min

Module 1: Hardware Ecosystem and GPU Compute Foundations • 5 lectures • 17min

Module 2: AI Containers and NGC Registry • 5 lectures • 16min

Module 3: Inference at Scale with Triton and TAO Toolkit • 5 lectures • 20min

Module 4: Real-Time AI with DeepStream and RAPIDS • 5 lectures • 20min

Module 5: Digital Twins & Omniverse Integration • 5 lectures • 18min

Module 6: Edge AI with Jetson and IoT Sensor Fusion • 5 lectures • 15min

Module 7: ModelOps and Lifecycle Management • 5 lectures • 14min

Module 8: Cloud-Native AI and DevOps for Infra Stack • 5 lectures • 14min

Module 9: Infra Vertical SDKs Overview • 5 lectures • 17min
