Deep Learning: Build a Text Generator Model from Scratch

Rating: 4.0 (7 reviews)

Build the Tech Behind GPT and Google Translate – Step-by-Step Transformer Tutorial From Scratch

Created by Ravinthiran Partheepan

Last updated 5/2025

English

What you'll learn

Learn how to build a deep learning Transformer model from scratch
Understand how Transformers work and why they are important in Text Generative AI
Learn to build the attention mechanism, which helps Transformers focus on important information
Know how to create a simple language model from scratch
Learn how Transformers process and understand language

Course content

9 sections • 33 lectures • 8h 29m total length

Codebase Requirements and Artifacts • 0:36
Getting Started • 3:28
Introduction to Transformers • 10:10
Learn how transformers process text in parallel, attending to all parts of the input. Compare with RNNs/LSTMs and see how this enables faster training for translation, summarization, and information extraction.
Transformer Explained - Analogy Point of View - Part 1 • 12:02
Explore transformer architecture through a simple analogy of encoder and decoder, self-attention, and tokens; learn how parallel processing and embeddings enable translation and text generation.
Transformer Explained - Analogy Point of View - Part 2 • 20:51
Transformer Explained - Core Point of View • 18:01
Transformer - Repeated Layers Explanation • 8:06
Self attention Mechanism Explained • 14:50
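
As a rough illustration of the self-attention idea these lectures introduce, the sketch below computes scaled dot-product attention over a toy sequence in plain Python. It is not taken from the course materials: the function names (self_attention, matmul, transpose), the identity projection matrices, and the three-token input are all assumptions made for the example.

```python
import math

def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Nested-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def self_attention(x, w_q, w_k, w_v):
    # Project the same input into queries, keys, and values.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    # Score every token against every other token, scaled by sqrt(d_k).
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Each output row is a weighted average of the value vectors.
    return matmul(weights, v), weights

# Hypothetical toy input: 3 tokens with 2-dimensional embeddings.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # assumed projections, chosen for readability
output, weights = self_attention(x, identity, identity, identity)
```

Each row of weights sums to 1, and every row is computed independently of the others. That independence is what lets a Transformer attend to all positions in parallel instead of stepping through the sequence as an RNN or LSTM does.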

Project Environment Setup • 7:45
Constructor and Hyperparameter Initialization • 19:28
Feed Forward Pass Layer • 23:58
Positional Encoding • 8:05
Attention Layer Visualization • 12:16
Visualize attention weights for a text generator by plotting self-attention and cross-attention maps with matplotlib and seaborn, mapping source tokens to target tokens and showing encoder and decoder dynamics.
Text Generator Function • 18:26
Softmax Activation • 3:22
Dropout Layer • 6:00
Model Training Step • 11:04
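
The positional encoding lecture listed above is not described in detail on this page, so the following is only a sketch of one common choice: the sinusoidal encoding from the original Transformer architecture. Whether the course uses this exact scheme is an assumption, as are the function name positional_encoding and the toy sizes.

```python
import math

def positional_encoding(seq_len, d_model):
    # One row per position, one column per embedding dimension.
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            # Each even/odd pair of dimensions shares one wavelength.
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)          # even index: sine
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)  # odd index: cosine
    return pe

# Hypothetical sizes: 4 positions, 6-dimensional embeddings.
pe = positional_encoding(seq_len=4, d_model=6)
```

These vectors are typically added to the token embeddings so that the model, which otherwise sees all positions at once, can tell where each token sits in the sequence.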

Requirements

Basic knowledge of high school mathematics (linear algebra, probability, and statistics)
Willingness to learn and try new things.

Description

In this course, you will learn how to build a Transformer from scratch, the architecture behind ChatGPT, Claude, Google Translate, and more. Transformers are the core of many powerful AI applications, and understanding how they work can help you build your own language models or text-generative AI applications. I will guide you through each step, making it easy to understand how these models function.

You will start with the basics, including the math behind Transformer stacks, and learn how to create the building blocks of a Transformer. I will cover key concepts like attention mechanisms, tokenization, and model training. No prior deep learning experience is needed, as I will explain everything in simple terms, step by step. By the end of the course, you will have the skills to create your own Transformer model from the ground up, without relying on pre-built libraries.

This course is perfect for anyone interested in deep learning and curious about the technology behind tools like GPT and Google Translate. Whether you're a beginner or looking to deepen your understanding, this course will give you a hands-on approach to building one of the most important models in modern deep learning. Let’s get started and learn how to build Transformers from scratch!

Who this course is for:

Anyone who wants to learn how to build text-generative AI models like ChatGPT, Llama, and Google Translate
Beginners with no prior experience in deep learning or machine learning
Anyone interested in deep learning who wants to understand Transformers


Course content

Introduction • 8 lectures • 1hr 28min
Math Behind Transformer Stacks and Sub-Layers • 6 lectures • 2hr 24min
Transformer Stacks and Sub-Layers from Scratch • 9 lectures • 1hr 50min
Encoder Stack and Sub-Layers from Scratch • 1 lecture • 18min
Decoder Layer - Forward Pass and Dropout Layer • 1 lecture • 24min
Multihead Self Attention - Forward Pass and Softmax • 1 lecture • 38min
Feed Forward Neural Network Layer • 1 lecture • 13min
Layer Normalization and Residuals • 1 lecture • 15min
Stacking Transformer Layers and Running the Model • 5 lectures • 59min
