Deep Learning for NLP - Part 2

Name: Deep Learning for NLP - Part 2
Rating: 5.0 (7 reviews)

Part 2: Encoder-decoder models, attention and Transformers

Created by Manish Gupta

Last updated 7/2021

English

What you'll learn

Deep Learning for Natural Language Processing
Encoder-decoder models, Attention models, ELMo
GLUE, Transformers, GPT, BERT
DL for NLP

Course content

2 sections • 13 lectures • 2h 52m total length

Introduction (1:51)
Encoder-decoder models (4:52)
Global, local, hierarchical attention; attention for sentence pairs (16:23)
Attention based models (30:32)
ELMo (14:02)
ELMo trains forward and backward language models to produce context-sensitive word embeddings, so that a word such as "bank" gets different vectors in different contexts. These embeddings are combined into a task-specific representation for downstream models.
Summary (2:01)

Introduction (2:38)
GLUE benchmark (10:58)
Transformers-1 (20:04)
Transformers-2 (11:48)
Explore Transformer architecture details, including positional encodings, self-attention, and residual connections; compare encoder and decoder flows and the use of masked self-attention in the decoder.
GPT (37:12)
BERT (18:19)
Summary (2:16)
The lecture surveys Transformer-based models, detailing decoder and encoder-decoder attention, multi-head self-attention, and positional encoding. It covers BERT variants, training regimes, fine-tuning, in-context learning, and model compression.

Requirements

Basics of machine learning
Recurrent Models: RNNs, LSTMs, GRUs and variants
Multi-Layered Perceptrons (MLPs)

Description

This course is part of the "Deep Learning for NLP" series. In this course, I will introduce concepts like encoder-decoder attention models, ELMo, GLUE, Transformers, GPT and BERT. These concepts form the basis for a good understanding of advanced deep learning models for modern Natural Language Processing.

The course consists of two main sections as follows.

In the first section, I will talk about encoder-decoder models in the context of machine translation and how a beam search decoder works. Next, I will talk about the concept of encoder-decoder attention. Further, I will elaborate on different types of attention, such as global attention, local attention, hierarchical attention, and attention for sentence pairs using CNNs as well as LSTMs. We will also talk about attention visualization. Finally, we will discuss ELMo, which is a way of using recurrent models to compute context-sensitive word embeddings.
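To make the beam search idea concrete, here is a minimal sketch in plain Python. It is not taken from the course material: the function names, the `toy_model` probability table, and the `</s>` end marker are all assumptions chosen for illustration. A real decoder would get its next-token scores from a trained encoder-decoder network instead.

```python
import math

def toy_model(prefix):
    """Hypothetical next-token model: maps a prefix to {token: log-prob}."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.5, "y": 0.5},
        ("b",): {"z": 0.9, "x": 0.1},
    }
    # Any prefix not listed above simply ends the sequence.
    probs = table.get(prefix, {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}

def beam_search(step_log_probs, beam_width, max_len, eos):
    """Return the highest-scoring (tokens, summed log-prob) hypothesis."""
    beams = [((), 0.0)]   # live hypotheses
    finished = []         # hypotheses that have emitted eos
    for _ in range(max_len):
        # Extend every live hypothesis by every possible next token.
        candidates = []
        for tokens, score in beams:
            for tok, log_p in step_log_probs(tokens).items():
                candidates.append((tokens + (tok,), score + log_p))
        # Keep only the beam_width best extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # unfinished hypotheses still count at max_len
    return max(finished, key=lambda c: c[1])

for width in (1, 2):
    tokens, score = beam_search(toy_model, width, max_len=3, eos="</s>")
    print(width, tokens, round(math.exp(score), 2))
```

With a beam width of 1 the search is greedy: it commits to "a" (probability 0.6) and ends with a sequence of probability 0.30. With a beam width of 2 it also keeps "b" alive and finds the sequence "b z", whose probability is 0.36. This is the usual motivation for beam search over greedy decoding.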

In the second section, I will cover the various tasks that are part of the GLUE benchmark, as well as other benchmark NLP datasets across tasks. Then we will start our modern NLP journey by understanding the different parts of an encoder-decoder Transformer model. We will delve into the details of Transformers in terms of concepts like self-attention, multi-head attention, positional embeddings, residual connections, and masked attention. After that, I will talk about the two most popular Transformer models: GPT and BERT. In the GPT part, we will discuss how GPT is trained and how variants like GPT-2 and GPT-3 differ. In the BERT part, we will discuss how BERT differs from GPT and how it is pretrained using the masked language modeling and next sentence prediction tasks. We will also quickly talk about fine-tuning BERT and multilingual BERT.
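As a rough illustration of self-attention and masked attention, the sketch below computes scaled dot-product attention over a short sequence in plain Python. It is a simplification under stated assumptions, not the course's code: the token vectors themselves serve as queries, keys and values, so the learned projection matrices and the multiple heads of a real Transformer are left out. The function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, causal=False):
    """Scaled dot-product self-attention without learned projections.

    x is a list of n token vectors, each a list of d floats.
    Returns (outputs, weights), where weights[i][j] is how much
    position i attends to position j.
    """
    n, d = len(x), len(x[0])
    outputs, weights = [], []
    for i in range(n):
        scores = []
        for j in range(n):
            if causal and j > i:
                # Masked attention: position i may not look at later tokens.
                scores.append(float("-inf"))
            else:
                dot = sum(a * b for a, b in zip(x[i], x[j]))
                scores.append(dot / math.sqrt(d))
        w = softmax(scores)
        weights.append(w)
        # Output i is the attention-weighted average of all value vectors.
        outputs.append([sum(w[j] * x[j][k] for j in range(n)) for k in range(d)])
    return outputs, weights

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
_, full = self_attention(x)
_, masked = self_attention(x, causal=True)
print(full[0])
print(masked[0])
```

With `causal=True`, the first position can attend only to itself, so its weight row puts all mass on position 0. This is the kind of masking a decoder such as GPT uses so that a token cannot see the tokens that follow it. Without the mask, every position attends to the whole sequence, as in an encoder such as BERT.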

Who this course is for:

Beginners in deep learning
Python developers interested in data science concepts

Course content

Encoder-decoder attention models, ELMo • 6 lectures • 1hr 10min

GLUE, Transformers, GPT, BERT • 7 lectures • 1hr 43min