
In this video, we introduce the goals and structure of the course.
In this video, we give a brief introduction to the architecture, key components, and training process of an LLM.
In this video, we show how to write code that converts text to tokens and tokens to IDs.
In this video, we introduce the concept of a sliding window, which we will use to implement the data preprocessing strategy.
In this video, we show how to convert a word token, together with its position in the sentence, into a vector.
In this video, we use code to walk through the steps for computing the attention score between two given words.
In this video, we show how to use matrices to turn fixed values into trainable ones.
In this video, we show how to wrap the whole attention computation in a class.
In this video, we show how to use nn.Linear to improve the computation.
In this video, we discuss masked attention, which prevents future words from interfering with the prediction of the current word.
In this video, we show how to improve the masked attention process.
In this video, we show how to use dropout to improve training outcomes.
In this video, we show how to design multi-head attention to extract more meaning from the same sentence.
In this video, we show how to convert a for loop into matrix operations.
In this video, we continue with the detailed implementation of replacing loops with matrix operations.
In this video, we give an overview of the GPT model and identify all the components used to build it. Later videos dive deep into each layer.
In this video, we use code to build the skeleton of the GPT model, so that we can fill in the individual layers in later sections.
In this video, we give the detailed algorithmic steps for layer normalization.
In this video, we implement the layer using the torch framework.
In this video, we see how to implement a sandwich-like structure with the GELU activation function.
In this video, we discuss the shortcut connection trick, which is useful for speeding up the training of deep learning networks.
In this video, we show how to construct a transformer block from all the components we have built so far.
In this video, we see how to create the whole GPT model from the transformer blocks and layers we have seen before.
In this video, we show how to use the model to predict words.
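To give a feel for the early preprocessing videos, here is a minimal sketch of turning text into tokens, mapping tokens to IDs, and pairing inputs with next-token targets through a sliding window. It is not the course's own code: the function names (tokenize, build_vocab, sliding_windows), the splitting rule, and the sample sentence are assumptions made for illustration, and it uses only the Python standard library rather than torch.

```python
import re

def tokenize(text):
    # Hypothetical splitting rule: break on whitespace and keep
    # common punctuation marks as separate tokens.
    pieces = re.split(r'([,.:;?_!"()]|--|\s)', text)
    return [p.strip() for p in pieces if p.strip()]

def build_vocab(tokens):
    # Assign each unique token an integer ID, in sorted order.
    return {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}

def sliding_windows(ids, context_size, stride=1):
    # Pair each input window with the same window shifted right by one,
    # so the target at every position is the next token.
    pairs = []
    for start in range(0, len(ids) - context_size, stride):
        inputs = ids[start:start + context_size]
        targets = ids[start + 1:start + context_size + 1]
        pairs.append((inputs, targets))
    return pairs

text = "the cat sat on the mat."  # illustrative sample text
tokens = tokenize(text)
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
pairs = sliding_windows(ids, context_size=4)

for inputs, targets in pairs:
    print(inputs, "->", targets)
```

A larger stride produces fewer, less overlapping windows. Whether to overlap windows is a design choice that trades the number of training samples against redundancy between them.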
In an age where large language models (LLMs) are at the forefront of AI, knowing how to call an API isn’t enough to set you apart. Mastery comes from understanding the core architecture, mechanics, and fine-tuning techniques behind these powerful models. This course is designed for those who want to go beyond the basics and learn to build an LLM from scratch, gaining insight into the internal components that make them work.
Starting with an introduction to the transformer architecture, you’ll learn how models process language by dissecting the elements of word embeddings, token encoding, and position encoding. This course covers the complete process of creating token embeddings, understanding and coding attention mechanisms, and building a simple yet effective model based on the principles of GPT.
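As a rough illustration of the attention mechanism mentioned above, the following sketch computes causal (masked) scaled dot-product attention in plain Python. It is a simplification under stated assumptions: the same input vectors serve as queries, keys, and values, with none of the trainable projection matrices a real model would add, and the function names and sample vectors are hypothetical. A torch version would express the same steps as matrix operations.

```python
import math

def softmax(row):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(queries, keys, values):
    d_k = len(keys[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        # Causal mask: position i only scores positions 0..i.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        # Future positions receive a weight of exactly zero.
        w = softmax(scores) + [0.0] * (len(keys) - i - 1)
        weights.append(w)
        # Context vector: weighted sum of the value vectors.
        outputs.append([
            sum(w[j] * values[j][c] for j in range(len(values)))
            for c in range(len(values[0]))
        ])
    return outputs, weights

# Illustrative 3-token input, each token a 3-dimensional vector.
x = [[0.4, 0.1, 0.8], [0.5, 0.9, 0.7], [0.6, 0.8, 0.6]]
context, attn = causal_attention(x, x, x)

for row in attn:
    print([round(v, 3) for v in row])
```

Each row of attn sums to 1 and is zero above the diagonal, so the first token can only attend to itself.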
You’ll gain hands-on experience implementing code to preprocess unlabeled data, developing the model to generate coherent text, and even fine-tuning for specific tasks like classification and instruction-following. By constructing and refining an LLM from scratch, you’ll learn the skills needed to debug, optimize, and innovate in a way that goes far beyond what’s possible by simply calling an API.
This deep dive into LLMs will empower you to:
Truly understand the inner workings of language models, including the transformer architecture and attention mechanisms.
Build and customize models suited to your specific needs, with a hands-on approach to code implementation and optimization.
Fine-tune models for targeted tasks, giving you full control over performance and functionality.
With these skills, you’ll not only become proficient in using LLMs but also develop the expertise needed to be a leader in this cutting-edge field. This course is perfect for developers, data scientists, and AI enthusiasts ready to dive deeper into the transformative world of large language models.