Learn Large Language Models (LLMs) with Python and LangChain

Name: Learn Large Language Models (LLMs) with Python and LangChain
Rating: 4.5 (32 reviews)

Understand the Fundamentals of Large Language Models (LLMs) like BERT, RoBERTa, GPT, LLAMA with Python, Google Colab

Created by Holczer Balazs

Last updated 7/2025

English

What you'll learn

large language models (LLMs) fundamentals
encoder-only transformer architectures (BERT, RoBERTa etc.)
decoder-only transformer architectures (GPT, LLaMA etc.)
transfer learning and fine-tuning
retrieval-augmented generation (RAG)

Course content

17 sections • 92 lectures • 10h 55m total length

Introduction • 2:06
Explore the transformer backbone of large language models, from encoder-only to decoder-only architectures. Learn fine-tuning, retrieval-augmented generation, and prompt engineering using Python and LangChain.

Transformers overview • 2:59
Get an overview of how transformers build language models: how they handle long-range word relationships, and how self-attention, word embeddings, and parallelization power models like ChatGPT.
Understanding word embeddings I • 11:11
Understanding word embeddings II • 9:27
Explore the word2vec skip-gram model, window size, one-hot encoding, and the embedding matrix that learns word vectors to reveal semantic similarities.
Tokenization and word embeddings • 9:32
Understanding positional encoding • 18:33
Learn how positional encoding combines word embeddings with sine and cosine vectors to encode token positions in transformer inputs, enabling parallel processing, deterministic position values, and extrapolation to longer sequences.
The self-attention mechanism I • 15:40
Explore the self-attention mechanism at the heart of transformers, showing how queries, keys, and values compute attention scores to reveal word relationships in an input sequence.
The self-attention mechanism II • 9:47
What is masking? • 4:39
Multi-head architecture • 8:11
The neural network layer • 7:07
Understanding the training of transformers • 5:17
Train transformers by predicting the next word from a sequence and updating the query, key, value, and embedding matrices using backpropagation, ground-truth labels, and cross-entropy loss.
Decoder only architectures (GPT or LLaMA) • 7:36
Encoder only architectures (BERT) • 4:40
Explore encoder-only architectures like BERT, which encode input into fixed-length embeddings for sentiment analysis, text classification, and language understanding, without text generation.
Encoder-decoder architectures (Google Translator) • 3:48
Mathematical formulation of transformers • 0:06
Transformers Quiz
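
The self-attention and masking lectures above describe how queries, keys, and values produce attention scores. The following is a minimal sketch of single-head scaled dot-product attention in plain Python, not the course's own code. The function names and the toy vectors are illustrative assumptions, and a real model would learn separate projection matrices for Q, K, and V.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(Q, K, V, causal=False):
    """Scaled dot-product attention for one head.

    Q, K, V are lists of row vectors, one row per token.
    With causal=True, token i may only attend to tokens 0..i (masking).
    """
    d_k = len(K[0])
    outputs, weights = [], []
    for i, q in enumerate(Q):
        scores = []
        for j, k in enumerate(K):
            if causal and j > i:
                scores.append(float("-inf"))  # masked: a future token
            else:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(d_k))
        w = softmax(scores)
        weights.append(w)
        # Each output row is the attention-weighted average of the value rows.
        outputs.append([sum(w[j] * V[j][c] for j in range(len(V)))
                        for c in range(len(V[0]))])
    return outputs, weights

# Toy input: three tokens with 2-dimensional vectors (assumed values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(X, X, X, causal=True)
```

With `causal=True` the first token can only attend to itself, which is the behaviour decoder-only models rely on during next-word prediction.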

What is BERT? • 8:29
Fundamentals of BERT architecture • 8:25
Explain bidirectional self-attention in BERT and how encoder-only transformers attend to left and right contexts, enabling parallel processing and a discriminative setting that extracts answers from context.
Understanding the [CLS] token • 6:47
Learn how the [CLS] token summarizes an input sequence in BERT-based architectures, its role across transformer layers, and how the classification vector enables sentiment analysis and text classification.
Pre-training the model - masking • 11:17
Pre-training the model - sentence prediction • 10:13
Fine tuning for text classification and sentiment analysis • 8:01
Fine tuning for question answering • 11:51
Build on BERT pre-training and fine-tuning to predict the start and end of an answer within a context, using classification, separator, and segment embeddings.
BERT and RoBERTa • 3:54
BERT and RoBERTa share an encoder-only transformer with bidirectional attention. RoBERTa uses dynamic masking and more data and omits next sentence prediction, whereas BERT uses static masking and next sentence prediction.
Original academic research article • 0:10
BERT Quiz
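
The masking lecture and the BERT versus RoBERTa comparison above can be illustrated with a small sketch of masked-language-model corruption. This is an assumption-laden toy rather than the course's implementation: it works on whole words instead of subword ids, `mask_tokens` is a hypothetical helper, and it uses the commonly cited recipe of selecting about 15% of tokens and then applying an 80/10/10 split.

```python
import random

SPECIAL = {"[CLS]", "[SEP]", "[PAD]"}

def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """Return (masked_tokens, labels) for masked language modelling.

    labels[i] holds the original token where a prediction is required
    and None everywhere else.
    """
    masked, labels = [], []
    for tok in tokens:
        if tok in SPECIAL or rng.random() >= mask_prob:
            masked.append(tok)
            labels.append(None)
            continue
        labels.append(tok)
        roll = rng.random()
        if roll < 0.8:
            masked.append("[MASK]")           # 80%: replace with [MASK]
        elif roll < 0.9:
            masked.append(rng.choice(vocab))  # 10%: replace with a random token
        else:
            masked.append(tok)                # 10%: keep the token unchanged
    return masked, labels

tokens = "[CLS] the bird sings at dawn [SEP]".split()
vocab = ["the", "bird", "sings", "at", "dawn"]

# Static masking: the pattern is sampled once and reused every epoch.
static_view = mask_tokens(tokens, vocab, random.Random(0))

# Dynamic masking: a fresh pattern is sampled each time the sequence is seen.
dynamic_views = [mask_tokens(tokens, vocab, random.Random(epoch)) for epoch in range(3)]
```

The only difference between the two settings in this sketch is when the random pattern is drawn, which is the distinction the BERT and RoBERTa lecture describes.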

Sentiment analysis implementation I • 7:41
Explore the sentiment analysis implementation and leverage Google Colab to bypass local setup, using NumPy, TensorFlow, PyTorch, and Keras, with GPU or CPU runtimes to accelerate deep learning.
Sentiment analysis implementation II • 8:40
Sentiment analysis implementation III • 9:02
Demonstrate tokenization for sentiment analysis by converting text to token IDs with [CLS] and separator tokens, applying 512-token padding and attention masks, and creating IMDb train/test splits in PyTorch.
Sentiment analysis implementation IV • 11:45
Sentiment analysis implementation V • 14:04
Set up a data loader and train a BERT-based sentiment classifier with batching, padding, and shuffling, using cross-entropy loss and the Adam optimizer at a small learning rate.
Tuning the hyperparameters of the model • 1:03
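
The tokenization step described above (token IDs wrapped in [CLS] and separator tokens, padded to a fixed length, with an attention mask) can be sketched without any library. The `encode` helper and the tiny vocabulary are hypothetical. The special-token ids mirror those of `bert-base-uncased` but should be treated as illustrative here, and a real tokenizer splits text into subwords rather than on whitespace.

```python
# Special-token ids mirror bert-base-uncased; treat them as illustrative.
CLS_ID, SEP_ID, PAD_ID, UNK_ID = 101, 102, 0, 100

def encode(text, vocab, max_len=512):
    """Toy whitespace tokenizer returning (input_ids, attention_mask)."""
    ids = [vocab.get(word, UNK_ID) for word in text.lower().split()]
    ids = ids[: max_len - 2]              # truncate, leaving room for [CLS] and [SEP]
    ids = [CLS_ID] + ids + [SEP_ID]
    attention_mask = [1] * len(ids)       # 1 = real token
    pad = max_len - len(ids)
    return ids + [PAD_ID] * pad, attention_mask + [0] * pad  # 0 = padding

# Hypothetical vocabulary with made-up ids.
toy_vocab = {"a": 1001, "great": 1002, "movie": 1003}
input_ids, attention_mask = encode("A great movie", toy_vocab, max_len=8)
```

The attention mask lets the model ignore padded positions, so every review in a batch can share the same length.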

Text classification implementation I • 8:18
Text classification implementation II • 2:11
Explore text classification with a transformer-encoder and a dense network, using a data loader, cross-entropy loss, and Adam optimizer across 20 epochs for about 88% test accuracy.
Using the built-in head of the architecture • 13:24
Use the built-in head of a pre-trained BERT-like architecture via AutoTokenizer and AutoModelForSequenceClassification, train with the Transformers Trainer, and compare accuracy and precision.
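
Since the lecture above compares accuracy and precision, a short worked example may help show why the two can differ. The labels below are invented for illustration and are not results from the course.

```python
def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive=1):
    """Fraction of predicted positives that are truly positive."""
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted_pos:
        return 0.0
    return sum(t == positive for t in predicted_pos) / len(predicted_pos)

# Made-up labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 1, 0, 1, 0]

print(accuracy(y_true, y_pred))   # 5 of 8 predictions are correct
print(precision(y_true, y_pred))  # 3 of 5 predicted positives are correct
```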

Question answering implementation I • 4:57
Fine-tune a BERT model for question answering on a custom dataset by aligning start and end character positions, using context and questions with segment embeddings and Python tools.
Question answering implementation II • 3:05
Install transformers and datasets in Google Colab, train a BERT-based question answering model on a small dataset on CPU to avoid overfitting, and upload a custom bird question answering JSON.
Question answering implementation III • 8:08
Parse a custom bird QA JSON, convert it to a Hugging Face dataset, and assemble training data with context, questions, and answers for BERT-based question answering.
Question answering implementation IV • 10:19
Parse a Hugging Face JSON dataset to build a BERT-based question answering pipeline, tokenize with bert-large-uncased, and map inputs with input ids, attention masks, and start/end positions.
Question answering implementation V • 11:14
Train and fine-tune the BERT large uncased whole-word-masking model for question answering on SQuAD, covering the full architecture, head training, small learning rates, and trainer setup.
Question answering implementation VI • 10:47
Fine-tune an encoder-only BERT model for question answering, test it with a helper function, manage device and tokenization, and extract answers from context, while previewing the later shift to GPT models.
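
The question answering lectures above mention aligning start and end character positions with token positions. The sketch below shows one way that alignment can work, assuming a plain whitespace tokenizer. Both helper functions and the example sentence are hypothetical. Real subword tokenizers expose comparable character offsets, but the details differ.

```python
def tokenize_with_offsets(text):
    """Whitespace tokenizer that also returns (start, end) character offsets."""
    tokens, offsets, pos = [], [], 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        tokens.append(word)
        offsets.append((start, end))
        pos = end
    return tokens, offsets

def char_span_to_token_span(offsets, answer_start, answer_end):
    """Map the character span [answer_start, answer_end) to inclusive token indices."""
    start_tok = end_tok = None
    for i, (s, e) in enumerate(offsets):
        if start_tok is None and e > answer_start:
            start_tok = i          # first token that reaches into the answer
        if s < answer_end:
            end_tok = i            # last token that begins before the answer ends
    return start_tok, end_tok

context = "The kiwi is a flightless bird native to New Zealand."
answer = "New Zealand"
answer_start = context.index(answer)
answer_end = answer_start + len(answer)

tokens, offsets = tokenize_with_offsets(context)
start_tok, end_tok = char_span_to_token_span(offsets, answer_start, answer_end)
print(tokens[start_tok:end_tok + 1])
```

The resulting pair of token indices is what a question answering head is trained to predict as the start and end of the answer.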

Revisiting decoder-only transformer architectures • 7:34
Explore decoder-only architectures like GPT and LLaMA, their autoregressive generation, and why fine-tuning these large language models is costly; learn efficient approaches to fine-tune attention.
Transfer learning and fine-tuning models • 8:26
Compare transfer learning and fine-tuning for decoder-only models, focusing on freezing layers, the head, and low-rank adaptation (LoRA) for small-data tasks.
Reinforcement Learning from Human Feedback (RLHF) • 6:30
Learn how decoder-only large language models like GPT use next-token prediction and cross-entropy loss, then fine-tune with reinforcement learning from human feedback using PPO and a reward model.
Understanding GPT, LLaMA and Alpaca, DeepSeek • 5:28
Decoder-Only Architecture Quiz
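
The next-token prediction objective with cross-entropy loss mentioned above can be written out in a few lines. This is a sketch under simple assumptions: logits are plain Python lists, one list per position, and the function names are illustrative.

```python
import math

def cross_entropy(logits, target_index):
    """Negative log-probability of the target token under softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]

def next_token_loss(logits_per_position, token_ids):
    """Average loss where position t is scored on predicting token t + 1."""
    targets = token_ids[1:]
    losses = [cross_entropy(l, t) for l, t in zip(logits_per_position, targets)]
    return sum(losses) / len(losses)

# Toy vocabulary of 4 tokens and a 3-token sequence (assumed values).
token_ids = [2, 0, 3]
logits_per_position = [
    [3.0, 0.0, 0.0, 0.0],  # after token 2, the model favours token 0 (correct)
    [0.0, 0.0, 0.0, 0.0],  # after token 0, the model is undecided
]
loss = next_token_loss(logits_per_position, token_ids)
```

An undecided prediction over four tokens costs log(4), and the loss shrinks as the model places more probability on the token that actually comes next.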

Understanding low-rank adaptation (LoRA) I • 10:55
Learn low-rank adaptation (LoRA) to reduce fine-tuning parameters of decoder-only language models by exploiting matrix rank and linearly independent columns to boost efficiency.
Understanding low-rank adaptation (LoRA) II • 7:41
Explore how low-rank adaptation freezes transformer matrices and uses small A and B adapters to form W' = W + AB, enabling efficient fine-tuning with fewer parameters.
QLoRA (Quantized Low-Rank Adaptation) • 11:43
Learn quantized low-rank adaptation (QLoRA) to reduce memory footprint and accelerate fine-tuning of decoder-only LLMs, using post-training quantization or quantization-aware training.
Original academic research article • 0:07
Fine-Tuning Large Language Models Quiz
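
The update W' = W + AB from the LoRA lectures above can be checked on a tiny example. In this sketch W is d x k, A is d x r, and B is r x k, following the listing's notation. The `scale` argument, the helper names, and the example dimensions are assumptions for illustration.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_merge(W, A, B, scale=1.0):
    """Return W' = W + scale * (A @ B). W stays frozen; only A and B are trained."""
    delta = matmul(A, B)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

def param_counts(d, k, r):
    """(full fine-tuning parameters, LoRA parameters) for a d x k matrix at rank r."""
    return d * k, r * (d + k)

W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # frozen pretrained weights (2 x 3)
A = [[1.0], [2.0]]                        # trainable adapter (2 x 1)
B = [[0.0, 0.0, 0.0]]                     # trainable adapter (1 x 3), zero at the start

assert lora_merge(W, A, B) == W           # a zero factor leaves the model unchanged

full, lora = param_counts(4096, 4096, 8)
print(full, lora)  # 16777216 65536
```

Starting one factor at zero means fine-tuning begins from the pretrained behaviour, and the parameter count shows why a small rank is so much cheaper than updating the full matrix.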

Using GPT-2 model I • 9:57
Using GPT-2 model II • 8:35
Learn to use GPT-2 with pipelines and AutoModelForCausalLM for fine-tuning and control over hyperparameters, tokenizer, precision, device usage, and model sizes from small to extra large.
Using GPT-2 model III • 6:46
Explore configuring a GPT-2 based model with AutoTokenizer or an explicit model name, enable half-precision loading, and distribute across hardware to support next-token prediction and token embeddings.
Using GPT-2 model IV • 8:26
Learn how to generate text with the GPT-2 model using the tokenizer, the prompt, and parameters like max length, temperature, and repetition penalty, then decode the output.
Using LLaMA model • 5:42
Learn to use the LLaMA model with Transformers in Python, including 1.1 billion parameters, 16-bit precision, AutoTokenizer and device map, to generate answers from prompts and preview fine-tuning.
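
The generation parameters named above, temperature and repetition penalty, act on the model's logits before a token is sampled. The following sketch shows one common formulation of each in plain Python. It is an illustration with made-up logits, not the code path of any particular library, and both function names are hypothetical.

```python
import math

def apply_repetition_penalty(logits, generated_ids, penalty):
    """Make already generated tokens less likely (penalty > 1)."""
    out = list(logits)
    for i in set(generated_ids):
        # Positive logits are divided, negative ones multiplied, so both move down.
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out

def token_probabilities(logits, temperature=1.0):
    """Softmax over logits / temperature; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, -1.0]  # made-up scores for a 3-token vocabulary
penalized = apply_repetition_penalty(logits, generated_ids=[0, 2], penalty=2.0)
warm = token_probabilities(logits, temperature=1.0)
cold = token_probabilities(logits, temperature=0.5)
```

Comparing `warm` and `cold` shows the top token gaining probability as the temperature drops, while `penalized` shows tokens 0 and 2 losing score because they were already generated.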

Requirements

machine learning fundamentals
Python programming fundamentals

Description

Unlock the power of Large Language Models (LLMs) and bring cutting-edge AI to your projects! This beginner-friendly yet comprehensive course takes you deep into the world of transformer-based models — from foundational architectures like BERT and RoBERTa, to generative giants like GPT and Meta’s LLaMA.

But we don’t stop there.

You’ll also explore Retrieval-Augmented Generation (RAG) — one of the most powerful methods to enhance LLMs with real-time, context-aware information retrieval. Learn how RAG bridges the gap between static models and dynamic, knowledge-grounded generation — perfect for applications like chatbots, enterprise search, and AI assistants.

Whether you're a beginner Python developer or someone curious about how LLMs really work, this course will give you the theory, hands-on skills, and real-world insights to work confidently with modern AI tools.

What You’ll Learn

Section 1 - Transformers

word embeddings
positional embeddings and encoding
self-attention mechanism
masking
multi-head architecture
how to train a transformer architecture
transformer architectures: GPT, BERT and LLaMA

Section 2 - Encoder-Only Architectures

BERT fundamentals
pre-training and fine-tuning the model
the [CLS] token
BERT and RoBERTa
sentiment analysis, text classification and question answering with BERT

Section 3 - Decoder-Only Architectures

GPT and LLaMA fundamentals
reinforcement learning from human feedback (RLHF)
fine-tuning decoder-only architectures
LoRA and QLoRA
fine-tuning models on custom dataset

Section 4 - Retrieval-Augmented Generation (RAG)

what is RAG?
semantic search and vector databases
LSH and HNSW algorithms
using RAG with PDF files
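
The semantic search step behind RAG can be sketched as a brute-force nearest-neighbour lookup over embedding vectors. Everything below is a toy assumption: the three-dimensional vectors are hand-made rather than produced by an embedding model, and `top_k` is a hypothetical helper. Approximate indexes such as LSH and HNSW exist to avoid scoring every stored vector as this sketch does.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=2):
    """Brute-force search: score every stored vector and keep the best k."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index.items()]
    return sorted(scored, reverse=True)[:k]

# Hand-made "embeddings" for three text chunks (illustrative values only).
index = {
    "Kiwis are flightless birds.": [0.9, 0.1, 0.0],
    "LoRA adds low-rank adapters.": [0.0, 0.2, 0.9],
    "Penguins cannot fly either.": [0.8, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]  # stands in for the embedding of a question about birds

for score, text in top_k(query, index):
    print(round(score, 3), text)
```

In a RAG pipeline the retrieved chunks would then be placed into the prompt so the model can ground its answer in them.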

Section 5 - Prompt Engineering

prompt engineering fundamentals
zero-shot prompting
few-shot prompting
chain-of-thought (CoT) prompting
prompt chaining methods

Join the course today and start your journey into the world of Large Language Models and Retrieval-Augmented Generation. Whether you're building smarter apps, enhancing your AI knowledge, or simply exploring the future of language technology — this course will give you the tools and confidence to level up.

Enroll now and start building with the AI models shaping the future. Let's get learning!

Who this course is for:

Beginner Python developers who are curious about generative AI and large language models (LLMs)

Course content

Introduction • 1 lecture • 2min

Evolution of Natural Language Processing • 1 lecture • 5min

Transformers • 15 lectures • 1hr 59min

Encoder-Only Architectures - BERT Architecture Overview • 9 lectures • 1hr 9min

Sentiment Analysis Implementation • 6 lectures • 52min

Text Classification Implementation • 3 lectures • 24min

Implementation of Question Answering • 6 lectures • 49min

Decoder-Only Transformer Architectures - GPT, LLaMA • 4 lectures • 28min

Fine-Tuning Large Language Models (LLMs) • 4 lectures • 30min

Using GPT and LLaMA • 5 lectures • 39min
