Fine-Tune & Deploy LLMs with QLoRA on Sagemaker + Streamlit
Bestseller
Highest Rated
Rating: 4.6 out of 5 (62 ratings)
535 students


Master QLoRA Math, Mixed Precision Training, Double Quantization, Lambda functions, API Gateway & Streamlit deployment
Created by Patrik Szepesi
Last updated 8/2025
English

What you'll learn

  • Train and fine-tune LLMs in AWS SageMaker using QLoRA and advanced 4-bit quantization on your own dataset
  • Create an interactive Streamlit app to deploy your fine-tuned LLM with SageMaker, Lambda functions, and API Gateway
  • Master QLoRA fine-tuning — including adapter injection, memory optimization, parameter freezing, and the mathematics behind it
  • Leverage bfloat16 compute types for faster and more efficient training on modern GPUs
  • Understand mixed precision training with QLoRA in SageMaker
  • Use Parameter-Efficient Fine-Tuning (PEFT) to dynamically find and inject LoRA layers
  • Understand the entire low-level fine-tuning pipeline — from raw dataset to trained model
  • Use double quantization and the NF4 data type to compress models with minimal loss in performance
  • Discover how gradient checkpointing drastically reduces VRAM usage during training
  • Fine-tune large models like Mixtral on Amazon SageMaker using state-of-the-art GPU acceleration
  • Understand custom chunking code for LLMs
  • Merge LoRA weights and unload adapters for final model export — ready for deployment
  • Deploy your trained model to SageMaker Endpoints using Amazon's production infrastructure
  • Build real-time LLM APIs using Lambda functions and API Gateway
  • Securely set up training jobs with IAM roles
  • AWS Budgeting, Server Management, and Pricing
  • Learn how to request AWS service quota increases to access powerful GPUs
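
The LoRA mathematics and the final merge step mentioned in the list above can be illustrated in a few lines. This is a minimal sketch in plain Python, not the course's code: the helper names (`matmul`, `merge_lora`, `lora_param_count`) are hypothetical, and it assumes the standard LoRA formulation in which a frozen weight W is updated as W + (alpha / r) * B @ A, with A of shape (r, d_in) and B of shape (d_out, r).

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A, i.e. the weight after merging the adapter.

    w: frozen base weight, shape (d_out, d_in)
    a: LoRA "down" matrix, shape (r, d_in)
    b: LoRA "up" matrix, shape (d_out, r)
    """
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update, shape (d_out, d_in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_param_count(d_out, d_in, r):
    """Trainable parameters added by one rank-r adapter."""
    return r * (d_in + d_out)


# Tiny worked example: d_out=2, d_in=3, r=1, alpha=2.
W = [[1, 0, 0], [0, 1, 0]]
A = [[0.5, 0, 1]]
B = [[1], [2]]
merged = merge_lora(W, A, B, alpha=2, r=1)

# For a 4096 x 4096 layer with r=16, the adapter trains under 1% of the weights.
adapter = lora_param_count(4096, 4096, 16)
full = 4096 * 4096
fraction = adapter / full
```

In practice the PEFT library performs the equivalent merge on real model tensors; the sketch only shows why the merged model needs no extra adapter layers at inference time.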

Course content

11 sections • 59 lectures • 7h 34m total length
  • What We Are Building (5:23)

    Explore building and deploying a fine-tuned large language model on AWS SageMaker via a Streamlit app, using API Gateway and Lambda, with quantization, LoRA, mixed precision training, and monitoring.

Requirements

  • Familiarity with Python
  • Basic linear algebra (matrix multiplication)

Description

Large Language Models (LLMs) are redefining what's possible with AI — from chatbots to code generation — but the barrier to training and deploying them is still high. Expensive hardware, massive memory requirements, and complex toolchains often block individual practitioners and small teams. This course is built to change that.

In this hands-on, code-first training, you’ll learn how to fine-tune models like Mixtral-8x7B using QLoRA, a state-of-the-art method that enables memory-efficient training by combining 4-bit quantization, LoRA adapters, and double quantization. You’ll also gain a deep understanding of quantized arithmetic, numeric formats (like the floating-point bfloat16 and the integer INT8), and how they impact model size, memory bandwidth, and matrix multiplication operations.
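
To make the effect of these formats on model size concrete, here is a rough back-of-the-envelope sketch. It is an illustration, not the course's material: the function names are hypothetical, the defaults (blocks of 64 weights with one 32-bit scaling constant each, and double quantization that stores those constants in 8 bits with one 32-bit constant per 256 blocks) follow the figures reported in the QLoRA paper, and the 7-billion-parameter model is an arbitrary example.

```python
def bits_per_param(weight_bits=4, block_size=64, const_bits=32,
                   double_quant=False, dq_const_bits=8, dq_block_size=256):
    """Average storage cost per weight, including quantization constants."""
    if double_quant:
        # Constants are themselves quantized; a second-level constant is
        # shared by dq_block_size first-level constants.
        overhead = dq_const_bits / block_size + const_bits / (block_size * dq_block_size)
    else:
        # One full-precision scaling constant per block of weights.
        overhead = const_bits / block_size
    return weight_bits + overhead


def weight_memory_gb(n_params, bits):
    """Memory for the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits / 8 / 1e9


n_params = 7e9  # hypothetical model size, for illustration only

bf16_gb = weight_memory_gb(n_params, 16)
plain_4bit_gb = weight_memory_gb(n_params, bits_per_param())
dq_4bit_gb = weight_memory_gb(n_params, bits_per_param(double_quant=True))

print(f"bfloat16:               {bf16_gb:.2f} GB")
print(f"4-bit:                  {plain_4bit_gb:.2f} GB")
print(f"4-bit + double quant:   {dq_4bit_gb:.2f} GB")
```

These numbers cover the stored weights only; activations, LoRA adapter parameters, gradients, and optimizer state add to the total during training.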

You’ll write advanced Python code to preprocess datasets with custom token-aware chunking strategies, dynamically identify quantizable layers, and inject adapter modules using the PEFT (Parameter-Efficient Fine-Tuning) library. You’ll configure and launch distributed fine-tuning jobs on AWS SageMaker, leveraging powerful multi-GPU instances and optimizing them using gradient checkpointing, mixed-precision training, and bitsandbytes quantization.
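
One common way to do token-aware chunking is a sliding window over token IDs, sketched below. This is an assumption about the general technique, not the course's actual chunking code: `chunk_tokens` and its `max_len` and `overlap` parameters are hypothetical names, and in real use the IDs would come from the model's tokenizer rather than from `range`.

```python
def chunk_tokens(token_ids, max_len, overlap=0):
    """Split a token sequence into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens so that context is not
    lost at chunk boundaries.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")

    step = max_len - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        # Stop once a window has reached the end of the sequence,
        # otherwise the tail would be emitted again as a shorter chunk.
        if start + max_len >= len(token_ids):
            break
    return chunks


# Stand-in for tokenizer output: ten token IDs.
ids = list(range(10))
chunks = chunk_tokens(ids, max_len=4, overlap=1)
print(chunks)
```

Chunking on token counts rather than characters keeps every training example within the model's context length, which is the point of making the strategy token-aware.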

After training, you’ll go all the way to deployment: merging adapter weights, saving your model for inference, and deploying it via SageMaker Endpoints. You’ll then expose your model through an AWS Lambda function and an API Gateway, and finally, build a Streamlit application to create a clean, responsive frontend interface.
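
The Lambda step in this pipeline might look roughly like the sketch below. It is not the course's handler: the endpoint name, the `ENDPOINT_NAME` environment variable, and the `{"inputs": ...}` payload shape (a convention used by Hugging Face inference containers) are assumptions, and the optional `client` argument is a hypothetical addition so the function can be exercised without AWS credentials. The `invoke_endpoint` call on the `sagemaker-runtime` client is the standard boto3 API.

```python
import json
import os

# Hypothetical endpoint name; in a real deployment this would be configured
# on the Lambda function as an environment variable.
ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME", "my-llm-endpoint")


def lambda_handler(event, context, client=None):
    """Forward a prompt from API Gateway to a SageMaker endpoint."""
    if client is None:
        import boto3  # provided by the AWS Lambda Python runtime
        client = boto3.client("sagemaker-runtime")

    # With API Gateway proxy integration, the request body arrives as a string.
    body = json.loads(event.get("body") or "{}")
    prompt = body.get("prompt", "")
    if not prompt:
        return {"statusCode": 400,
                "body": json.dumps({"error": "missing 'prompt'"})}

    response = client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt}),  # assumed payload shape
    )
    result = json.loads(response["Body"].read())

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }
```

A Streamlit frontend would then POST a JSON body such as `{"prompt": "..."}` to the API Gateway URL and render the returned text.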

Whether you’re a machine learning engineer, backend developer, or AI practitioner aiming to level up — this course will teach you how to move from academic toy models to real-world, scalable, production-ready LLMs using tools that today’s top companies rely on.

Who this course is for:

  • Machine Learning Engineers
  • Backend and MLOps Engineers
  • AI Researchers and Students
  • Anyone who wants to go beyond "prompt engineering" and start building, training, and deploying their own production-ready LLMs.