Teach on Udemy

Turn what you know into an opportunity and reach millions around the world.

Learn More

Your cart is empty.

Keep shopping

Local LLMs via Ollama & LM Studio - The Practical Guide

Name: Local LLMs via Ollama & LM Studio - The Practical Guide
Rating: 4.6 (1682 reviews)

Run open large language models like Gemma, Llama or DeepSeek locally to perform AI inference on consumer hardware.

Bestseller

Highest Rated

Created byMaximilian Schwarzmüller

Last updated 11/2025

English

What you'll learn

Explore & understand Open-LLM use-cases
Achieve 100% privacy & agency by running highly capable open LLMs locally
Select & run open LLMs like Gemma 3 or Llama 4
Utilize Ollama & LM Studio to run open LLMs locally
Analyze text, documents and images with open LLMs
Integrate locally running open LLMs into custom AI-powered programs & applications

Course content

5 sections • 59 lectures • 3h 54m total length

Welcome To The Course!2:06
Learn to run OpenAI models locally using LM Studio and Ollama, explore hardware requirements, and compare open models like Gemma models by Google with paid options while ensuring 100% privacy.
What Exactly Are "Open LLMs"?6:27
Open large language models reveal their weights and licensing, not the training code, while tokens, token IDs, and neural network weights shape local outputs.
Why Would You Want To Run Open LLMs Locally?6:51
Run open large language models locally on consumer hardware, compare with proprietary models, and enjoy free use, privacy, offline latency, and control for tasks like text summarization and data analysis.
Popular Open LLMs - Some Examples3:43
Explore popular open large language models with publicly available weights, including Llama, Gemma, and DeepSeek, and learn how to run these models locally on your laptop.
Where To Find Open LLMs?4:47
Explore open large language models on Hugging Face's model catalog, comparing Google's Gemma models, Meta's Llama, Qwen, and more, with model cards detailing capabilities, parameters, and running locally.
Running LLMs Locally - Available Options7:17
Explore running open models locally with llama.cpp, focusing on LM Studio and Ollama as accessible wrappers that provide GUI or CLI access and server modes.
Check The Model Licenses!4:03
Check licenses before running local models; review model cards to see if a license is MIT, Apache 2.0, or more restrictive like Llama or Gemma, especially for commercial use.
Course Slides0:03
Join Our Community0:25

Module Introduction1:20
Learn how to run openly available locally hosted large language models on your laptop or server by understanding model parameters, hardware requirements, and how quantization lowers demands without sacrificing performance.
LLM Hardware Requirements - First Steps4:21
Identify the hardware needed to run a large language model via inference on local machines or servers. Learn how GPUs and VRAM, plus RAM, affect loading the model and context.
Deriving Hardware Requirements From Model Parameters5:34
Explain how model parameters become weights in a neural network and how token IDs flow to outputs, highlighting VRAM or system memory requirements for 2B to 27B models.
Quantization To The Rescue!6:50
Quantization compresses large language models by converting parameters from float32/float16 to int4/int8, reducing memory and enabling local deployment with minimal quality loss.
Does It Run On Your Machine?5:50
Discover how local LLMs run on your machine using GPU, VRAM, or CPU, with memory and quantization trade-offs. Use Hugging Face profiles and LM Studio to choose feasible models.

Module Introduction2:03
Learn to install and use LM Studio to download and run open models locally, explore features, configurations, and programmatic usage via its web server for ai agents and automations.
Running Locally vs Remotely1:08
Installing & Using LM Studio3:09
Install LM Studio from lmstudio.ai for your operating system, then set up and use the main chat window, model loader, modes, and language and theme settings.
Finding, Downloading & Activating Open LLMs9:04
Discover open LLMs with LM Studio by browsing Hugging Face model cards or using LM Studio's model search to find quantized options, download, and load them locally.
Using the LM Studio Chat Interface4:53
Load a model and chat with it using the LM Studio chat interface. Manage sessions, regenerate or edit outputs, copy or delete messages, and customize appearance and per-chat prompts.
Working with System Prompts & Presets3:26
Switch to power user or developer to set and save system prompts as presets in LM Studio, applying them to future chats for consistent model behavior.
Managing Chats2:32
Delete, organize into folders, and review chat histories in Finder or Windows Explorer. Use power user mode to view per-message tokens and timing.
Power User Features For Managing Models & Chats6:27
Power user mode enables forked, parallel chats and continuous generation to branch conversations and extend responses, with a developer view and model management in LM Studio.
Leveraging Multimodal Models & Extracting Content From Images (OCR)2:48
Explore how multimodal models enable image inputs for local llms and demonstrate extracting content from images (OCR) using a wash receipt, comparing 12 billion-parameter and 27 billion-parameter Gemma 3 models.
Analyzing & Summarizing PDF Documents3:27
Onwards To More Advanced Settings1:52
Explore advanced settings in LM Studio, including model tuning, presets with system prompts, and temperature and sampling controls to shape chat output and inference.
Understanding Temperature, top_k & top_p6:32
Controlling Temperature, top_k & top_p in LM Studio4:45
Explore how temperature, top k, and top p in LM Studio control output diversity and determinism. See how low vs high settings alter token choice, repetition, and overall sampling behavior.
Managing the Underlying Runtime & Hardware Configuration4:17
Configure the underlying runtime and hardware for local llms by selecting llama.cpp or MLX, optimizing GPU use, VRAM limits, and strict guardrails.
Managing Context Length5:21
Power users can adjust the model context length and GPU offload in LM Studio and Ollama, balancing token capacity, memory use, and task complexity to fit system limits.
Using Flash Attention5:08
Enable flash attention and KV cache quantization to speed up generation and reduce memory usage, and recognize the feature is experimental and may cause issues with some models.
Working With Structured Outputs5:28
Learn to force LLM outputs into structured json using the structured output setting and a json schema, enabling extraction of financial data like year, revenue, and operating income.
Using Local LLMs For Code Generation2:35
Learn how locally running LLMs via Ollama and LM Studio enable offline code generation, code questions, and document interactions—privacy-focused tasks like image text extraction and PDF summarization.
Content Generation & Few Shot Prompting (Prompt Engineering)5:21
Explore few-shot prompting and prompt engineering to turn locally running LLMs into content generation machines, using example blocks and XML delimiters to craft LinkedIn posts.
Onwards To Programmatic Use2:25
Discover how LM Studio exposes a locally running model API server you can access from your code to build programmatic applications that parse user input, analyze images, and generate content.
LM Studio & Its OpenAI Compatibility6:00
More Code Examples!5:04
Explore code examples for communicating with locally running LLMs via LM Studio, including configuring the base URL and API key, model identifiers, and just-in-time loading in chat and image parsing.
Diving Deeper Into The LM Studio APIs2:10
Dive into the LM Studio APIs beyond the OpenAI-compatible endpoint, and learn to configure structured JSON outputs and temperature for locally running models.
Using the Python / JavaScript SDKs0:17

Module Introduction1:40
Installing & Starting Ollama2:08
Download and install Ollama across macOS, Linux, and Windows, noting it has no graphical interface; start Ollama and verify installation via the Ollama command in a terminal, using status indicators.
Finding Usable Open Models2:55
Learn to run open models locally with Ollama by using the run command and a model identifier from the Ollama catalog of popular LLMs such as mistral, llama, and gemma3.
Running Open LLMs Locally via Ollama7:43
learn to run open llms locally with Ollama using the Gemma 3 model, including tags and quantized versions, then download, run, and manage sessions via the command line.
Adding a GUI with Open WebUI2:12
Learn how to add a graphical user interface to locally running models with Open WebUI, compare it to LM Studio for easy setup, and connect to Ollama-hosted Gemma 3.
Dealing with Multiline Messages & Image Input (Multimodality)2:38
Learn to send multiline messages in Ollama using triple double quotes and to handle multimodal prompts by including an image path for Gemma3, enabling image-aware chat.
Inspecting Models & Extracting Model Information3:31
Learn to use Ollama's command-line chat and build a chat history, inspect models with /show commands, and view architecture, parameters, context length, licenses, and system prompts for tailored sessions.
Editing System Messages & Model Parameters6:01
Learn to tailor llm behavior by using /set to adjust system messages, model parameters like temperature, top_k, top_p, and context size, plus enable json mode and verbose outputs.
Saving & Loading Sessions and Models3:35
Learn how Ollama saves a chat session as a model copy that preserves chat history and settings, then load the named model (like s1) to restore the exact setup.
Managing Models5:42
Learn to save and load sessions by creating a copied model that bakes in your settings, system message, and chat history, and manage models with Ollama list, ps, and rm.
Creating Model Blueprints via Modelfiles6:22
Create and customize llms with modelfiles in Ollama by specifying a base model, overriding parameters and system messages, and baking chat history to load reusable, ready-to-run models.
Creating Models From Modelfiles3:26
Create a model from a modelfile with Ollama's create command. See how modelfiles adjust metadata, enable sharing via GitHub, and push to a registry.
Making Sense of Model Templates6:39
Discover how template instructions align model input with training, using start_of_turn and stop tokens to steer token generation, with LM Studio and Ollama templates baked in.
Building a Model From Scratch From a GGUF File6:37
Learn to build Ollama models from gguf files, using downloaded weights and metadata to customize Qwen3 variants, with templates shaped by the Go templating language.
Getting Started with the Ollama Server (API)2:12
Learn to use the Ollama server API from code or CLI, and connect via a graphical interface like Open WebUI. The server starts automatically; restart with Ollama serve.
Exploring the Ollama API & Programmatic Model Access5:18
Getting Structured Output2:55
Learn how to request structured outputs from Ollama and LM Studio by defining a json schema to generate JSON data that adheres to a specified structure.
More Code Examples!4:52
Explore practical code examples that run locally via the Ollama OpenAI compatible API with the OpenAI SDK, including chat and base64 image parsing with Gemma 3.
Using the Python / JavaScript SDKs0:15

Requirements

Basic understanding of LLM functionality & usage
NO programming or advanced technical expertise is required
If you want to run models locally: At least 8 GB of (V)RAM will be required

Description

Unlock the Power of Private, Powerful AI on Your Own PC!

ChatGPT, Google Gemini and all those other AI chatbots are standard tools for everyday use. But like all tools, they're not the best choices for all tasks.

When privacy, cost, offline access, or deep customization matter, running powerful open models locally on your own computer beats all those proprietary models and third-party AI chatbots.

This course will teach you how to leverage open LLMs like Meta's Llama models, Google's Gemma models or DeepSeek models to run AI workloads and AI chatbots right on your machine - no matter if it's a high-end PC or a normal laptop.

Why Local & Open LLMs?

In an era dominated by cloud-based AI and chatbots like ChatGPT, running state-of-the-art models locally offers game-changing advantages. Imagine leveraging cutting-edge AI with:

Zero or Low Cost: Forget expensive subscriptions; tap into powerful models freely.
100% Privacy: Your prompts and data stay securely on your machine – always.
Offline First: Operate powerful AI tools anytime, anywhere, no internet required.
Freedom from Vendor Lock-in: Access a diverse and rapidly growing ecosystem of open models.
Astonishing Capability: Discover how open models like Gemma, Llama, and DeepSeek are not just alternatives, but top performers, ranking high on benchmarks and the Chatbot Arena leaderboard!

Course Overview

This course is your comprehensive, hands-on journey into the practical world of local LLMs. We'll cut through the complexity, guiding you step-by-step from setup to advanced usage.

Here's what you'll master:

The Open LLM Landscape: Understand what open models are and why they matter (and where to find them).
Hardware Demystified: Learn the realistic hardware requirements for running LLMs locally.
Quantization Explained: Uncover the technique that makes running huge models feasible on consumer hardware.
LM Studio In-Depth: Get hands-on with installing, configuring, selecting, downloading, and running models using LM Studio.
Ollama Mastery: Learn to install, configure, and interact with models seamlessly via Ollama.
Real-World Use Cases: Apply your knowledge to practical tasks like image OCR (reading text from images), summarizing PDF documents, mastering few-shot prompting, and generating creative content.
Programmatic Power: Discover how to integrate these locally running models into your own scripts and applications using their built-in APIs (LM Studio & Ollama).
And much more! Build a solid foundation and gain the confidence to explore the vast potential of local AI.

Who Should Enroll?

This course is tailor-made for:

Developers looking to integrate powerful, private AI into their workflows or applications.
Tech enthusiasts eager to experiment with cutting-edge AI without the cloud constraints.
Privacy-conscious individuals wanting full control over their data when using AI.
Anyone seeking powerful AI solutions without ongoing subscription costs.
Students and professionals aiming to add practical, in-demand AI skills to their toolkit.

Ready to Take Control of Your AI Future?

Step into the world of powerful, private, and cost-effective artificial intelligence. Enroll now in "Unlock Local AI Power" and start running incredible Large Language Models directly on your own computer today!

Who this course is for:

Beginner and advanced users of AI chatbots & LLMs
Professionals that require a high degree of data privacy
Tech enthusiasts and AI users that want to go beyond the basics

Local LLMs via Ollama & LM Studio - The Practical Guide

What you'll learn

Explore related topics

Course content

Introduction9 lectures • 36min

Understanding Hardware Requirements & Quantization5 lectures • 24min

LM Studio Deep Dive24 lectures • 1hr 36min

Ollama Deep Dive19 lectures • 1hr 17min

Course Roundup2 lectures • 2min

Requirements

Description

Who this course is for: