This course is part of the "Deep Learning for NLP" series. In this course, I will introduce concepts such as encoder-decoder attention models, ELMo, GLUE, Transformers, GPT, and BERT. These concepts form the basis for a good understanding of the advanced deep learning models used in modern natural language processing.
The course consists of two main sections.
In the first section, I will talk about encoder-decoder models in the context of machine translation and explain how a beam search decoder works. Next, I will introduce the concept of encoder-decoder attention. I will then elaborate on different types of attention, such as global attention, local attention, hierarchical attention, and attention for sentence pairs using CNNs as well as LSTMs. We will also discuss attention visualization. Finally, we will discuss ELMo, which uses recurrent models to compute context-sensitive word embeddings.
In the second section, I will describe the various tasks that make up the GLUE benchmark, along with other benchmark NLP datasets spanning a range of tasks. Then we will start our modern NLP journey by understanding the different parts of an encoder-decoder Transformer model. We will delve into the details of Transformers, covering concepts such as self-attention, multi-head attention, positional embeddings, residual connections, and masked attention. After that, I will talk about two of the most popular Transformer models: GPT and BERT. In the GPT part, we will discuss how GPT is trained and how variants such as GPT-2 and GPT-3 differ from it. In the BERT part, we will discuss how BERT differs from GPT and how it is pretrained using the masked language modeling and next sentence prediction tasks. We will also briefly cover fine-tuning BERT and multilingual BERT.