Natural Language Processing with Deep Learning in Python

Rating: 4.7 (8589 reviews)

Complete guide on deriving and implementing word2vec, GloVe, word embeddings, and sentiment analysis with recursive nets

Created by Lazy Programmer Inc., Lazy Programmer Team

Last updated 2/2026

English

German [Auto], English [Auto]

What you'll learn

Understand and implement word2vec
Understand the CBOW method in word2vec
Understand the skip-gram method in word2vec
Understand the negative sampling optimization in word2vec
Understand and implement GloVe using gradient descent and alternating least squares
Use recurrent neural networks for parts-of-speech tagging
Use recurrent neural networks for named entity recognition
Understand and implement recursive neural networks for sentiment analysis
Understand and implement recursive neural tensor networks for sentiment analysis
Use Gensim to obtain pretrained word vectors and compute similarities and analogies
Understand important foundations for OpenAI ChatGPT, GPT-4, DALL-E, Midjourney, and Stable Diffusion

Course content

14 sections • 96 lectures • 12h 2m total length

Introduction, Outline, and Review • 5:35
Explore how deep learning boosted natural language processing, covering word embeddings like word2vec and GloVe, and neural architectures from RNNs to recursive models for text tasks.
How to Succeed in this Course • 3:04
Follow these guidelines to excel in this course: ask questions via the Q&A, meet the prerequisites, and actively engage with notes and coding exercises.
Where to get the code / data for this course • 9:17
Discover where to get the course code on GitHub and how to clone repositories. Understand data placement in the large files folder beside the class folder and access Wikipedia data.
Preprocessed Wikipedia Data • 3:03
Access preprocessed Wikipedia data for NLP projects from the course catalog, avoiding download hassles. Learn to use text corpora with Python, Ruby, and other tools.
How to Open Files for Windows Users • 2:18
Learn to open text files on Windows with UTF-8 encoding using the open function. Avoid common errors like failing to decompress archives and downloading GitHub files without cloning.
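
As a minimal sketch of the UTF-8 advice above: passing the encoding explicitly to open avoids depending on the platform default, which on Windows is often not UTF-8. The helper name and the sample file name are hypothetical and not taken from the course code.

```python
import os
import tempfile

def read_lines_utf8(path):
    """Read a text file as UTF-8 regardless of the platform default encoding."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

# Write a small sample file (hypothetical name), then read it back.
sample_path = os.path.join(tempfile.gettempdir(), "utf8_sample.txt")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("café\nnaïve\n")

lines = read_lines_utf8(sample_path)
print(lines)
```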

What are vectors? • 7:56
Explore how feature vectors represent data in natural language processing, using word counts and embeddings to separate categories with geometric boundaries, and examine stop words and TF-IDF weighting.
What is a word analogy? • 7:58
Explore how word analogies arise from word embeddings and vectors, and learn to compute analogies using vector arithmetic and cosine distance.
Trying to find and assess word vectors using TF-IDF and t-SNE • 7:42
Explore how TF-IDF derived word vectors are reduced with t-SNE and plotted as a scatterplot to test semantic analogies, highlighting the limitations of TF-IDF for word relationships.
Pretrained word vectors from GloVe • 11:05
Explore loading pre-trained GloVe word vectors from Stanford, compare Euclidean and cosine distances, and compute analogies and nearest neighbors to validate word embeddings in Python.
Pretrained word vectors from word2vec • 6:31
Explore using pre-trained word2vec vectors with a 3-million-word vocabulary, including phrases like new_york, and learn to load them with Gensim to compute analogies and nearest neighbors.
Text Classification with word vectors • 4:24
Apply pre-trained word vectors to text classification by averaging word vectors to create document features in a bag-of-words framework, then train a supervised classifier on the Reuters dataset.
Text Classification in Code • 6:14
Learn to implement text classification using bag-of-words features built from word2vec and GloVe vectors, with vectorizers, data loading, and a classifier, handling unknown words and empty sentences.
Using pretrained vectors later in the course • 3:32
Explore how word embeddings initialize language models using pretrained vectors from word2vec and GloVe. Experiment with freezing embeddings or keeping them trainable to see effects on RNN performance.
Suggestion Box • 3:10
Explore how a suggestion box gathers learner feedback to improve natural language processing with deep learning in Python. Share background, course, difficulty, and topics missing from the syllabus.
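
The text classification lectures in this section describe averaging word vectors into one document feature vector while handling unknown words and empty sentences. A minimal sketch of that idea follows, assuming word vectors are stored in a plain dict. The function name, the toy vectors, and the choice of an all-zero vector for empty documents are illustrative assumptions, not the course's code.

```python
def document_vector(tokens, word_vectors, dim):
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        # Empty sentence, or every word is out of vocabulary.
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

# Hand-made 2-dimensional vectors, for illustration only.
toy_vectors = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}

doc_vec = document_vector(["good", "movie", "zzz"], toy_vectors, dim=2)
empty_vec = document_vector([], toy_vectors, dim=2)
print(doc_vec, empty_vec)
```

The resulting fixed-length vectors can then be passed to any supervised classifier.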

Review Section Intro • 3:13
Review language modeling with bigrams and Markov models, then compare to neural networks and logistic regression approaches in deep learning, plus practical efficiency tricks for implementation.
Bigrams and Language Models • 14:47
Learn to build a simple language model with bigrams and trigrams, using maximum likelihood and add-one smoothing on the Brown corpus.
Bigrams in Code • 14:19
Explore implementing bigrams in Python by mapping words to indices, applying start and end tokens, smoothing, and scoring real versus fake sentences with normalized log probabilities.
Neural Bigram Model • 7:56
Extend a neural bigram model with logistic regression using one-hot word encoding, train with gradient descent to minimize cross-entropy, and compare weight matrices with the probability matrix.
Neural Bigram Model in Code • 6:48
Explore how logistic regression estimates bigram probabilities, and compare them with count-based estimates in a neural bigram model using a 2000-word vocabulary.
Neural Network Bigram Model • 9:13
Explore building a neural network bigram model for natural language processing, using two weight layers to compute hidden representations and a softmax output, with training via gradients and cross-entropy.
Neural Network Bigram Model in Code • 3:31
Explore implementing a neural network bigram model in code, with two-stage training updating w2 and w1, and observe loss convergence and weight plotting.
Improving Efficiency • 14:35
Boost efficiency in NLP models by using indexing tricks instead of one-hot encoding. Learn how to apply double indexing to compute costs and update weights, enabling faster, memory-efficient training.
Improving Efficiency in Code • 4:52
Unlock faster natural language processing neural network training by applying an indexing trick to compute hidden values and loss without one-hot inputs, and by reordering training for efficient gradient calculations.
Review Section Summary • 3:26
Review language modeling techniques from counting word probabilities to neural networks, link logistic regression to probabilities, and highlight indexing tricks for faster training.
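
The count-based bigram model reviewed in this section can be sketched in a few lines. This is an illustrative version on a two-sentence toy corpus rather than the Brown corpus. The function names and the `<s>` / `</s>` start and end tokens are assumptions of the sketch.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their left-hand words, with start and end tokens."""
    unigram, bigram, vocab = Counter(), Counter(), set()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            unigram[prev] += 1
            bigram[(prev, cur)] += 1
    return unigram, bigram, len(vocab)

def bigram_prob(prev, cur, unigram, bigram, V):
    """Add-one smoothed estimate of p(cur | prev)."""
    return (bigram[(prev, cur)] + 1) / (unigram[prev] + V)

def normalized_log_prob(sent, unigram, bigram, V):
    """Log probability of a sentence divided by its number of bigrams."""
    tokens = ["<s>"] + sent + ["</s>"]
    pairs = list(zip(tokens, tokens[1:]))
    total = sum(math.log(bigram_prob(p, c, unigram, bigram, V)) for p, c in pairs)
    return total / len(pairs)

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
unigram, bigram, V = train_bigram(corpus)

real_score = normalized_log_prob(["the", "cat", "sat"], unigram, bigram, V)
fake_score = normalized_log_prob(["sat", "cat", "the"], unigram, bigram, V)
print(real_score, fake_score)
```

Dividing by the number of bigrams keeps scores comparable across sentences of different lengths, which is what makes the real-versus-fake comparison meaningful.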

Return of the Bigram • 3:07
Examine how a bigram neural network models the next word as a language model and builds word embeddings with two V-by-V matrices.
CBOW • 7:39
Explore the continuous bag-of-words (CBOW) model for word prediction, using context word embeddings, mean pooling, and a softmax classifier to predict the middle word.
Skip-Gram • 4:00
The skip-gram model predicts context words from an input word, the reverse of CBOW. Training can use either one input with four targets or four samples with one target each.
Hierarchical Softmax • 8:22
Explore hierarchical softmax as a scalable alternative to full softmax for large vocabularies, using a binary word tree, sigmoid decisions, and path probabilities, with Huffman coding for frequent words.
Negative Sampling • 14:11
Explore negative sampling as a scalable alternative to softmax in natural language processing, using binary cross-entropy, sigmoid outputs, and context word sampling within the skip-gram and CBOW frameworks.
Negative Sampling - Important Details • 5:09
Understand how to tune the number of negative samples as a hyperparameter, typically five to twenty-five, using a smoothed, modified unigram distribution, and apply a single negative sample per middle word.
Why do I have 2 word embedding matrices and what do I do with them? • 2:16
Explore why two word embedding matrices appear in the network. Compare three options: use only the first, concatenate, or average, and note normalization and cosine distance while emphasizing testing.
Word2Vec implementation tricks • 4:49
Explore practical Word2Vec implementation tricks, including selective word dropping, context window effects, learning rate scheduling, and performance tips using parallel processing and native code.
Word2Vec implementation outline • 4:09
Explore a Word2Vec implementation outline in Python, detailing data loading, building and training the model, converting sentences to word indices, applying negative sampling, context windows, and stochastic gradient updates.
Word2Vec in Code with Numpy • 10:47
Learn to implement word2vec in Python with NumPy, including tokenization, vocabulary building up to 20,000 words with unknown tokens, context windows, negative sampling, and training the model with gradient updates.
Tensorflow or Theano - Your Choice! • 4:09
Explore why every example in this course uses both Theano and TensorFlow, highlighting first-principles coding in Theano and a production-focused TensorFlow workflow.
Word2Vec Tensorflow Implementation Details • 3:58
Adapt a Word2Vec implementation to TensorFlow using embedding lookup, dot products, and binary cross-entropy with positive and negative samples; compare sampled softmax and hierarchical softmax approaches for efficiency.
Word2Vec Tensorflow in Code • 4:06
Explore the TensorFlow implementation of word2vec in Python, detailing input and output embeddings, negative sampling, and a training loop that collects samples and trains when reaching 128 samples.
Alternative to Wikipedia Data: Brown Corpus • 6:03
Learn how to replace Wikipedia data with the Brown corpus in natural language processing using deep learning in Python, leveraging NLTK utilities, tokenized sentences, and vocabulary limiting.
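
Two of the word2vec ingredients named in this section, the context window and the smoothed unigram distribution for negative sampling, can be sketched as follows. The exponent 0.75 is the value used in the original word2vec paper; whether the course uses the same value is an assumption here, as are the function names and the toy data.

```python
import random
from collections import Counter

def negative_sampling_distribution(token_ids, vocab_size, power=0.75):
    """Unigram counts raised to `power`, renormalized to sum to one."""
    counts = Counter(token_ids)
    weights = [counts.get(i, 0) ** power for i in range(vocab_size)]
    total = sum(weights)
    return [w / total for w in weights]

def skipgram_pairs(sentence, window):
    """All (middle word, context word) pairs within `window` positions."""
    pairs = []
    for pos, middle in enumerate(sentence):
        start = max(0, pos - window)
        end = min(len(sentence), pos + window + 1)
        for ctx_pos in range(start, end):
            if ctx_pos != pos:
                pairs.append((middle, sentence[ctx_pos]))
    return pairs

# Toy corpus of word indices: words 0 and 1 appear twice, word 2 once.
p_neg = negative_sampling_distribution([0, 1, 2, 1, 0], vocab_size=3)
pairs = skipgram_pairs([0, 1, 2], window=1)

# Draw five negative samples with a fixed seed for repeatability.
rng = random.Random(0)
negatives = rng.choices(range(3), weights=p_neg, k=5)
print(p_neg, pairs, negatives)
```

Raising counts to a power below one shifts probability mass toward rare words, so they are drawn as negatives more often than their raw frequency would suggest.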

GloVe Section Introduction • 2:19
Explore GloVe, global vectors for word representation, and learn how matrix factorization from recommender systems underpins word vectors for NLP, offering simpler training with fewer hyperparameters.
Matrix Factorization for Recommender Systems - Basic Concepts • 21:08
Explore matrix factorization for recommender systems, including sparse rating matrices, collaborative filtering, and latent features to predict user ratings and drive recommendations.
Matrix Factorization Training • 8:11
Train a matrix factorization model by learning two matrices, w and u, whose product approximates the ratings matrix with a least-squares loss via alternating least squares.
Expanding the Matrix Factorization Model • 9:23
Expand matrix factorization with global average, user bias, and movie bias to better predict ratings, and derive update equations for biases and latent features during training.
Regularization for Matrix Factorization • 6:18
Apply regularization to a matrix factorization model using the Frobenius norm, deriving update rules for W, U, and bias terms to prevent overfitting.
GloVe - Global Vectors for Word Representation • 4:12
Learn how GloVe builds word embeddings by constructing a term-term co-occurrence matrix with context distance, applying log scaling and X max weighting, and solving via alternating least squares with biases.
Recap of ways to train GloVe • 2:31
Explore multiple approaches to training GloVe, from alternating least squares and gradient descent to pure Python implementations and deep learning frameworks like PyTorch and TensorFlow, plus GPU acceleration.
GloVe in Code - Numpy Gradient Descent • 16:48
Code the GloVe model in Python, build the embedding matrix with a given vocabulary and context size, optimize with gradient descent, and test word analogies.
GloVe in Code - Alternating Least Squares • 4:42
Continue the GloVe coding example by implementing alternating least squares, updating W, b, U, and c with regularization, and comparing vectorized versus non-vectorized solutions for performance.
GloVe in Tensorflow with Gradient Descent • 7:03
Implement GloVe in TensorFlow with gradient descent, leveraging automatic differentiation and built-in optimizers to train a word embedding model from a co-occurrence matrix.
Visualizing country analogies with t-SNE • 4:24
Visualize GloVe word embeddings with t-SNE to reveal how countries and ethnicities cluster on a scatterplot, highlighting analogies and word proximity in natural language processing with deep learning in Python.
Hyperparameter Challenge • 2:19
Try new word analogies beyond countries and cities to test your natural language processing model, and tune hyperparameters like learning rate, momentum, embedding dimensionality, and vocabulary size.
Training GloVe with SVD (Singular Value Decomposition) • 10:38
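
The data side of GloVe described in this section, a term-term co-occurrence matrix weighted by context distance plus an X max weighting function, can be sketched as below. The 1/distance weighting and the defaults x_max = 100 and alpha = 0.75 follow the GloVe paper; whether the course code uses the same values is an assumption, and the function names are illustrative.

```python
from collections import defaultdict

def cooccurrence_counts(sentences, context_size):
    """X[(i, j)] accumulates 1/distance for word pairs within context_size."""
    X = defaultdict(float)
    for sent in sentences:
        for i, wi in enumerate(sent):
            # Look back only, and update both directions to keep X symmetric.
            for j in range(max(0, i - context_size), i):
                wj = sent[j]
                weight = 1.0 / (i - j)
                X[(wi, wj)] += weight
                X[(wj, wi)] += weight
    return X

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weight f(x) on each squared-error term; capped at 1 for x >= x_max."""
    return (x / x_max) ** alpha if x < x_max else 1.0

X = cooccurrence_counts([["a", "b", "c"]], context_size=2)
print(dict(X))
print(glove_weight(50.0), glove_weight(200.0))
```

Training then fits a dot product of two word vectors plus two bias terms to the log of each entry, with f(x) down-weighting rare pairs.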

Pointwise Mutual Information - Word2Vec as Matrix Factorization • 12:06
Explore unifying word embedding methods by framing Word2Vec and GloVe as matrix factorization, using pointwise mutual information and negative sampling to derive a PMI-based objective.
PMI in Code • 7:21
Implement the PMI-based word embeddings in Python by building a sparse counts matrix, computing PMI with smoothing, and training embeddings via alternating least squares.
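
Pointwise mutual information compares how often a word and a context co-occur with how often they would co-occur if they were independent: PMI(w, c) = log( p(w, c) / (p(w) p(c)) ). A minimal sketch on a hand-made count table follows. It omits the smoothing mentioned above and only scores pairs with nonzero counts, both simplifying assumptions. The function name and toy counts are hypothetical.

```python
import math
from collections import defaultdict

def pmi_scores(counts):
    """PMI for every (word, context) pair with a nonzero count."""
    total = sum(counts.values())
    row_sum = defaultdict(float)
    col_sum = defaultdict(float)
    for (w, c), n in counts.items():
        row_sum[w] += n
        col_sum[c] += n
    # p(w, c) / (p(w) p(c)) simplifies to n * total / (row_sum * col_sum).
    return {
        (w, c): math.log(n * total / (row_sum[w] * col_sum[c]))
        for (w, c), n in counts.items()
        if n > 0
    }

toy_counts = {
    ("cat", "purr"): 3, ("cat", "bark"): 1,
    ("dog", "purr"): 1, ("dog", "bark"): 3,
}
scores = pmi_scores(toy_counts)
print(scores)
```

Pairs that co-occur more often than chance get a positive score and pairs that co-occur less often get a negative one.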

Parts-of-Speech (POS) Tagging • 5:00
Learn parts-of-speech tagging with a logistic regression baseline, then improve with recurrent neural networks and hidden Markov models, using sequence data and maximum likelihood.
How can neural networks be used to solve POS tagging? • 4:08
Explore how logistic regression, hidden Markov models, and recurrent neural networks tackle parts-of-speech tagging, highlighting context, long-term dependencies, and the role of LSTM/GRU units.
Parts-of-Speech Tagging Baseline • 15:18
Train a baseline parts-of-speech tagger with logistic regression and stochastic gradient descent, compare to a decision tree, and evaluate with accuracy and F1 on chunking data.
Parts-of-Speech Tagging Recurrent Neural Network in Theano • 13:05
Explore how a recurrent neural network in Theano solves the parts-of-speech tagging problem, using word embeddings, softmax outputs, and F1 score evaluation.
Parts-of-Speech Tagging Recurrent Neural Network in Tensorflow • 12:17
Perform parts-of-speech tagging using a recurrent neural network in TensorFlow, handling fixed-length sequences with zero padding, leveraging word embeddings and a final classifier.
How does an HMM solve POS tagging? • 7:57
Explore hidden Markov models for parts-of-speech tagging, estimating transition and emission probabilities from data and using the Viterbi algorithm to map words to tags.
Parts-of-Speech Tagging Hidden Markov Model (HMM) • 5:58
Learn to implement parts-of-speech tagging with hidden Markov models in Python, smoothing state transition and observation matrices, and evaluate accuracy and F1 on train and test data.
Named Entity Recognition (NER) • 3:01
Explore named entity recognition in natural language processing with deep learning in Python, identifying person, company, and location tokens via logistic regression and recurrent networks on tweets.
Comparing NER and POS tagging • 2:01
Frame named entity recognition in the same data format as parts-of-speech tagging, enabling code reuse with only data loading changes and tagging each word by entity type.
Named Entity Recognition Baseline • 5:54
Create a named entity recognition baseline with logistic regression by preparing word-tag data, lowercasing words, building X and y, and evaluating with a 30 percent test split and F1 scores.
Named Entity Recognition RNN in Theano • 2:19
Develop a named entity recognition model with an RNN in Theano for natural language processing using Python and deep learning, and evaluate performance with train and test data.
Named Entity Recognition RNN in Tensorflow • 2:13
Learn how to perform named entity recognition with an RNN in TensorFlow, using the same code as the POS example and reviewing data prep, padding, placeholders, and training loop.
Hyperparameter Challenge II • 2:13
Explore hyperparameter tuning to balance model complexity and accuracy in NLP tasks, emphasizing overfitting risk, the limits of complexity, and tradeoffs between logistic regression baselines and RNN models.
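
The HMM lectures in this section use the Viterbi algorithm to map words to tags. A compact log-space sketch follows. The three-tag model and all of its probabilities are invented for illustration, and every probability is assumed to be nonzero (for example after smoothing) so that the logarithms are defined.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence for `obs`, computed in log space."""
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    backpointers = []
    for o in obs[1:]:
        prev = scores[-1]
        cur, ptrs = {}, {}
        for s in states:
            # Best previous state for arriving in state s.
            best = max(states, key=lambda p: prev[p] + math.log(trans_p[p][s]))
            cur[s] = prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][o])
            ptrs[s] = best
        scores.append(cur)
        backpointers.append(ptrs)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    return path[::-1]

states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.1, "NOUN": 0.8, "VERB": 0.1},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
emit_p = {
    "DET": {"the": 0.9, "dog": 0.05, "barks": 0.05},
    "NOUN": {"the": 0.05, "dog": 0.8, "barks": 0.15},
    "VERB": {"the": 0.05, "dog": 0.15, "barks": 0.8},
}

tags = viterbi(["the", "dog", "barks"], states, start_p, trans_p, emit_p)
print(tags)
```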

Recursive Neural Networks Section Introduction • 7:14
Explore how bag-of-words falls short and how trees capture sentence structure in language processing. Learn plain recursive neural networks and recursive neural tensor networks, plus memory-efficient tree-to-sequence implementations.
Sentences as Trees • 5:29
Represent sentences as trees using parts of speech to form noun phrases and verb phrases, then apply sentiment analysis and negation handling, paving the way toward recursive neural networks.
Data Description for Recursive Neural Networks • 6:52
Explore data representations for recursive neural networks by modeling sentences as binary parse trees with words at leaves and phrases at inner nodes, enabling robust sentiment analysis.
What are Recursive Neural Networks / Tree Neural Networks (TNNs)? • 5:41
Explore the architecture of recursive neural networks for binary and multi-child trees, detailing how shared weights connect to parse tree nodes, compute hidden states, and produce outputs with softmax.
Building a TNN with Recursion • 4:47
Build a recursive neural network for sentiment analysis using tree structures and word embeddings; implement a forward hidden function and train with stochastic gradient descent, noting challenges from per-tree graphs.
Trees to Sequences • 6:38
Transform parse trees into sequences to convert recursive neural networks into recurrent neural networks, using three arrays (parents, relations, and words) and post-order traversal to build a scalable tree-to-sequence pipeline.
Recursive Neural Tensor Networks • 6:22
The lecture extends recursive neural networks to a recursive neural tensor network by introducing quadratic interaction terms between left and right child representations for binary-tree processing.
RNTN in Tensorflow (Tips) • 12:19
Learn practical tips for implementing an RNTN in TensorFlow, including using a while loop for symbolic inputs and managing a tensor array of hidden states.
RNTN in Tensorflow (Code) • 11:19
Implement a recursive neural tensor network (RNTN) in TensorFlow, using embeddings, quadratic and linear weights, post-order traversal, and SGD training with regularization for root-node predictions.
Recursive Neural Network in TensorFlow with Recursion • 4:12
Explore a practical recursive neural network implemented in TensorFlow, using post-order traversal, tree-based graphs, and a custom training loop with Saver to save and load weights.
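
The tree-to-sequence idea from this section, flattening a binary parse tree into parents, relations, and words arrays by post-order traversal, can be sketched as below. The encoding is an assumption of this sketch rather than the course's exact convention: leaves are strings, inner nodes are (left, right) tuples, inner nodes store None as their word, relation 0 means left child and 1 means right child, and the root has parent and relation -1.

```python
def tree_to_arrays(tree):
    """Flatten a binary tree into (words, parents, relations) in post-order."""
    words, parents, relations = [], [], []

    def visit(node):
        # Post-order: both children are numbered before their parent.
        if isinstance(node, tuple):
            left_idx = visit(node[0])
            right_idx = visit(node[1])
            idx = len(words)
            words.append(None)  # inner nodes carry no word
            parents.append(-1)
            relations.append(-1)
            parents[left_idx], relations[left_idx] = idx, 0
            parents[right_idx], relations[right_idx] = idx, 1
        else:
            idx = len(words)
            words.append(node)
            parents.append(-1)
            relations.append(-1)
        return idx

    visit(tree)
    return words, parents, relations

words, parents, relations = tree_to_arrays((("not", "good"), "movie"))
print(words)
print(parents)
print(relations)
```

Because every child appears before its parent, a single left-to-right pass over the arrays can compute each hidden state from states that already exist, which is what lets the tree be processed like a sequence.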

(Review) Theano Basics • 7:47
Discover Theano basics, including symbolic variables and tensors of various shapes, and build functions. Learn automatic gradients with shared variables and a simple training loop to minimize a cost.
(Review) Theano Neural Network in Code • 9:17
Implement a Theano neural network in code using softmax, define a cost function with regularization, and train with placeholders and shared variables, then evaluate predictions on the test set.
(Review) Tensorflow Basics • 7:27
Explore TensorFlow basics, including placeholders, variables, sessions, and simple matrix multiplication, then minimize a cost function with gradient descent using a learning rate of 0.3.
(Review) Tensorflow Neural Network in Code • 9:43
Build a TensorFlow neural network with a second hidden layer of 100 units, configure placeholders and variables, and train using momentum-based optimizers while monitoring cost and potential overfitting.

Requirements

Install Numpy, Matplotlib, scikit-learn, and Theano or TensorFlow (should be extremely easy by now)
Understand backpropagation and gradient descent, and be able to derive and code the equations on your own
Code a recurrent neural network from basic primitives in Theano (or TensorFlow), especially the scan function
Code a feedforward neural network in Theano (or TensorFlow)
Helpful to have experience with tree algorithms

Description

Ever wondered how AI technologies like OpenAI ChatGPT, GPT-4, DALL-E, Midjourney, and Stable Diffusion really work? In this course, you will learn the foundations of these groundbreaking applications.

In this course we are going to look at NLP (natural language processing) with deep learning.

Previously, you learned about some of the basics, like how many NLP problems are just regular machine learning and data science problems in disguise, and simple, practical methods like bag-of-words and term-document matrices.

These allowed us to do some pretty cool things, like detect spam emails, write poetry, spin articles, and group together similar words.

In this course I’m going to show you how to do even more awesome things. We’ll learn not just 1, but 4 new architectures in this course.

First up is word2vec.

In this course, I’m going to show you exactly how word2vec works, from theory to implementation, and you’ll see that it’s merely the application of skills you already know.

Word2vec is interesting because it magically maps words to a vector space where you can find analogies, like:

king - man = queen - woman
France - Paris = England - London
December - November = July - June
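
To make the arithmetic concrete, here is a minimal sketch of solving an analogy by vector arithmetic and cosine distance. The 3-dimensional vectors are hand-made for illustration and are not real word2vec output. The function names are hypothetical.

```python
import math

def cosine_distance(a, b):
    """One minus the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def analogy(a, b, c, vectors):
    """Solve a - b = ? - c by finding the word closest to a - b + c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three query words so they cannot be returned.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return min(candidates, key=lambda w: cosine_distance(vectors[w], target))

# Toy axes: royalty, maleness, femaleness (an assumption for illustration).
toy_vectors = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "apple": [0.5, 0.5, 0.5],
}

answer = analogy("king", "man", "woman", toy_vectors)
print(answer)
```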

For those beginners who find algorithms tough and just want to use a library, we will demonstrate the use of the Gensim library to obtain pre-trained word vectors, compute similarities and analogies, and apply those word vectors to build text classifiers.

We are also going to look at the GloVe method, which also finds word vectors, but uses a technique called matrix factorization, which is a popular algorithm for recommender systems.

Amazingly, the word vectors produced by GloVe are just as good as the ones produced by word2vec, and it’s way easier to train.

We will also look at some classical NLP problems, like parts-of-speech tagging and named entity recognition, and use recurrent neural networks to solve them. You’ll see that just about any problem can be solved using neural networks, but you’ll also learn the dangers of having too much complexity.

Lastly, you’ll learn about recursive neural networks, which finally help us solve the problem of negation in sentiment analysis. Recursive neural networks exploit the fact that sentences have a tree structure, and we can finally get away from naively using bag-of-words.

All of the materials required for this course can be downloaded and installed for FREE. We will do most of our work in Numpy, Matplotlib, and Theano. I am always available to answer your questions and help you along your data science journey.

This course focuses on "how to build and understand", not just "how to use". Anyone can learn to use an API in 15 minutes after reading some documentation. It's not about "remembering facts", it's about "seeing for yourself" via experimentation. It will teach you how to visualize what's happening in the model internally. If you want more than just a superficial look at machine learning models, this course is for you.

See you in class!

"If you can't implement it, you don't understand it"

Or as the great physicist Richard Feynman said: "What I cannot create, I do not understand".
My courses are the ONLY courses where you will learn how to implement machine learning algorithms from scratch
Other courses will teach you how to plug in your data into a library, but do you really need help with 3 lines of code?
After doing the same thing with 10 datasets, you realize you didn't learn 10 things. You learned 1 thing, and just repeated the same 3 lines of code 10 times...

Suggested Prerequisites:

calculus (taking derivatives)
matrix addition, multiplication
probability (conditional and joint distributions)
Python coding: if/else, loops, lists, dicts, sets
Numpy coding: matrix and vector operations, loading a CSV file
neural networks and backpropagation, be able to derive and code gradient descent algorithms on your own
Can write a feedforward neural network in Theano or TensorFlow
Can write a recurrent neural network / LSTM / GRU in Theano or TensorFlow from basic primitives, especially the scan function
Helpful to have experience with tree algorithms

WHAT ORDER SHOULD I TAKE YOUR COURSES IN?:

Check out the lecture "Machine Learning and AI Prerequisite Roadmap" (available in the FAQ of any of my courses, including the free Numpy course)

UNIQUE FEATURES

Every line of code explained in detail - email me any time if you disagree
No wasted time "typing" on the keyboard like other courses - let's be honest, nobody can really write code worth learning about in just 20 minutes from scratch
Not afraid of university-level math - get important details about algorithms that other courses leave out

Who this course is for:

Students and professionals who want to create word vector representations for various NLP tasks
Students and professionals who are interested in state-of-the-art neural network architectures like recursive neural networks
SHOULD NOT: Anyone who is not comfortable with the prerequisites.

Course content

Outline, Review, and Logistical Things • 5 lectures • 23min

Beginner's Corner: Working with Word Vectors • 9 lectures • 59min

Review of Language Modeling and Neural Networks • 10 lectures • 1hr 23min

Word Embeddings and Word2Vec • 14 lectures • 1hr 23min

Word Embeddings using GloVe • 13 lectures • 1hr 40min

Unifying Word2Vec and GloVe • 2 lectures • 19min

Using Neural Networks to Solve NLP Problems • 13 lectures • 1hr 21min

Recursive Neural Networks (Tree Neural Networks) • 10 lectures • 1hr 11min

Theano and Tensorflow Basics Review • 4 lectures • 34min

Appendix / FAQ Finale • 1 lecture • 4min