Teach on Udemy

Turn what you know into an opportunity and reach millions around the world.

Learn More

Your cart is empty.

Keep shopping

[2026] Machine Learning: Natural Language Processing (V2)

Name: [2026] Machine Learning: Natural Language Processing (V2)
Rating: 4.7 (7244 reviews)

NLP: Use Markov Models, NLTK, Agentic AI, Artificial Intelligence, Machine Learning, and Data Science in Python

Created byLazy Programmer Inc., Lazy Programmer Team

Last updated 3/2026

English

English [Auto],Korean [Auto],

What you'll learn

How to convert text into vectors using CountVectorizer, TF-IDF, word2vec, and GloVe
How to implement a document retrieval system / search engine / similarity search / vector similarity
Probability models, language models and Markov models (prerequisite for Transformers, BERT, and GPT-3)
How to implement a cipher decryption algorithm using genetic algorithms and language modeling
How to implement spam detection
How to implement sentiment analysis
How to implement an article spinner
How to implement text summarization
How to implement latent semantic indexing
How to implement topic modeling with LDA, NMF, and SVD
Machine learning (Naive Bayes, Logistic Regression, PCA, SVD, Latent Dirichlet Allocation)
Deep learning (ANNs, CNNs, RNNs, LSTM, GRU) (more important prerequisites for BERT and GPT-3)
Hugging Face Transformers (VIP only)
How to use Python, Scikit-Learn, Tensorflow, +More for NLP
Text preprocessing, tokenization, stopwords, lemmatization, and stemming
Parts-of-speech (POS) tagging and named entity recognition (NER)
Understand important foundations for OpenAI ChatGPT, GPT-4, DALL-E, Midjourney, and Stable Diffusion

Course content

25 sections • 169 lectures • 23h 31m total length

Introduction and Outline10:40
Explore natural language processing in python, from text processing and vectorization to language models, transformers, and real-world applications like spam detection, sentiment analysis, and translation.
Are You Beginner, Intermediate, or Advanced? All are OK!5:06
Use the beginner, intermediate, and advanced designations in titles to guide your path through this course. Pace yourself, practice, and leverage the materials at your level.

Where To Get the Code4:38
Learn where to access course code for machine learning NLP in Python, including notebooks and plain text Python files, via the resources tab, code link, and GitHub repositories.
How to Succeed in This Course3:04
Ask questions via the Q&A to get answers typically within 24 hours. Meet prerequisites and engage with conceptual and coding lectures by taking handwritten notes and coding.
Temporary 403 Errors2:57
Handle temporary 403 errors when downloading files by noting the host may block CoLab due to public IPs, then download in a browser and upload via the notebook's file explorer.

Vector Models & Text Preprocessing Intro3:40
Convert language into numerical representations for NLP using vector models. Explore token, character, and vocabulary concepts, tokenization, stemming, stop words, and TF-IDF, plus basic word vectors like word2vec and GloVe.
Basic Definitions for NLP5:01
Learn basic natural language processing definitions for English text, including sentences, tokens, and characters, vocabulary, corpus, and engram, with unigrams, bigrams, and trigrams, and a look at Markov models.
What is a Vector?10:41
Understand what a vector is as an array of scalars representing text in high dimensions, enabling spam detection, document clustering, and NLP within machine learning.
Bag of Words2:32
Explore the bag of words representation, which ignores word order and can blur phrases like dog toy versus toy dog, yet it remains widely used in vector models.
Count Vectorizer (Theory)13:45
Explore the counting method, a simple bag-of-words approach to convert text into document vectors by counting word occurrences, with discussion of tokenization, vocabulary mapping, and normalization considerations.
Tokenization14:45
Explore tokenization in NLP using split versus advanced options, including punctuation handling, casing, accents, and word, character, or subword approaches, with practical trade-offs.
Stopwords4:51
Explore stop words in text vectorization, and how removing them reduces dimensionality and improves differentiation; use scikit-learn's stop words features and multilingual lists to tailor for your language.
Stemming and Lemmatization12:03
Learn how stemming and lemmatization reduce vocabulary size by converting related word forms to a root or lemma, improving search relevance and natural language processing efficiency with practical examples.
Stemming and Lemmatization Demo13:26
Observe a Colab demo comparing stemming and lemmatization using the Porter stemmer and WordNet, showing how POS tagging guides accurate lemmas and explains real words versus nonwords.
Count Vectorizer (Code)15:43
Preview of machine learning with the count vectorizer in Python, showing a pipeline of fit, transform, and evaluate on BBC News data with five labels, highlighting tokenization choices and sparsity.
Vector Similarity11:35
TF-IDF (Theory)14:16
Explore tf-idf theory, detailing term frequency and inverse document frequency, and demonstrate practical use with a tf-idf vectorizer and its common variations.
(Interactive) Recommender Exercise Prompt2:36
Build a movie recommendation system by combining keywords, genres, synopsis, tagline, production companies, and production countries into document strings, then apply tf-idf and rank the top five similar movies.
TF-IDF (Code)20:25
Explore how to build a tf-idf based movie recommender in python by extracting genres and keywords, vectorizing text, and ranking results with cosine similarity.
Word-to-Index Mapping10:54
Learn to build a word to index mapping from a text corpus by tokenizing documents, using a Python dictionary, and handling unknown words for effective count vectors and NLP workflows.
How to Build TF-IDF From Scratch15:08
Build a from-scratch tf-idf model on BBC news data, map words to indices, compute tf and idf, and identify top terms for random documents.
Neural Word Embeddings10:15
Explore neural word embeddings that map words to vectors for fine-grained text representations beyond bag-of-words. Learn how word2vec and GloVe enable sequence models and word analogies.
Neural Word Embeddings Demo11:25
Demonstrate neural word embeddings using pre-trained vectors, loading embeddings, implementing analogies and nearest neighbors to compare glove vectors with other embeddings.
Vector Models & Text Preprocessing Summary3:50
Convert text to vectors via tokenization, stop words, stemming, and lemmatization; use bag-of-words, tf-idf, and word-to-index mapping, then explore neural word embeddings like word2vec and glove for word analogies.
Text Summarization Preview1:21
Explore simple text summarization using vector-based methods and term frequencies, and see how these approaches fit into your existing machine learning course path.
How To Do NLP In Other Languages10:41
Apply NLP to any language by tokenizing text, mapping tokens to integers, and creating a matrix of document vectors while choosing or building language-specific tokenizers.
Suggestion Box3:10
Learn how a suggestion box collects learner feedback for the machine learning: natural language processing in Python course, including background, difficulty, missing explanations, and requests for future topics.

Markov Models Section Introduction2:42
Explore how Markov models underpin natural language processing, reinforcement learning, and speech recognition, train them with a state transaction matrix, and generate poetry from an author's existing poems.
The Markov Property7:34
Explore the Markov property as the restrictive assumption in sequence modeling, where each symbol depends only on the preceding one, enabling a chain-rule joint distribution for language and beyond.
The Markov Model12:30
The lecture presents Markov model for sequences of categorical states, detailing the state distribution pi, the state transition matrix A, and the initial distribution pi, and explains training from data.
Probability Smoothing and Log-Probabilities7:50
Explore smoothing techniques like add-one and epsilon smoothing to fix zero probability issues in Markov models, and learn to compute sequence probabilities using log probabilities to avoid underflow.
Building a Text Classifier (Theory)7:29
Apply Markov models to build a text classifier, using Bayes rule to compare authors and select the most probable class via log-likelihood and maximum a posteriori.
Building a Text Classifier (Exercise Prompt)6:33
Build a text classifier to distinguish Edgar Allan Poe from Robert Frost using two separate Markov models, a train-test split, and evaluation metrics like accuracy and F1 score.
Building a Text Classifier (Code pt 1)10:32
Build a text classifier with Markov models to distinguish Edgar Allan Poe and Robert Frost from poem lines, using data, preprocessing, tokenization, and A and Pi matrices with smoothing.
Building a Text Classifier (Code pt 2)12:06
Populate A and pi from training data, implement a compute counts function, and build a log-probability text classifier; evaluate with confusion matrices and F1 scores on train and test sets.
Language Model (Theory)10:15
Explore how Markov models generate text and why they are generative rather than discriminative. Learn sampling, second-order Markov extensions, and the computational trade-offs compared with neural networks.
Language Model (Exercise Prompt)6:52
Explore building a second-order Markov language model from a Robert Frost poetry corpus. Store word probabilities with dictionaries and sample text using cumulative probabilities without smoothing.
Language Model (Code pt 1)10:45
Build a second-order language model to generate Robert Frost poems by constructing initial, first-order, and second-order dictionaries, tokenizing text, removing punctuation, and converting word lists into probabilities.
Language Model (Code pt 2)9:25
Implement a word sampler from a probability dictionary using a uniform draw and cumulative sum, then generate four lines of poems with initial, first-order, and second-order models.
Markov Models Section Summary3:00
Explore the Markov model’s role in NLP, from the Markov property to first- and second-order forms, and relate simple forecasting to neural networks and transformers.

Article Spinning - Problem Description7:55
Explore the article spinning problem, its impact on blog visibility and search rankings, and how Markov models and Transformers relate to automated content generation and ethics.
Article Spinning - N-Gram Approach4:24
Apply n-gram and markov models to article spinning by predicting next words from previous context. Use first- and second-order dependencies and maximum likelihood estimates to replace text while preserving grammar.
Article Spinner Exercise Prompt5:45
Spin the article spinner exercise with a BBC business dataset using a three-dimensional V by V by V matrix, and explore tokenization options that preserve punctuation with a treebank tokenizer.
Article Spinner in Python (pt 1)17:32
Develop a Python-based article spinner by building a tri-gram model from the BBC news dataset, using tokenization, detokenization, and text wrapping to present spun business articles.
Article Spinner in Python (pt 2)10:00
Explore how to build an article spinner in Python, using a random seed, tri-grams, and context window techniques to analyze long-range dependencies and semantic coherence.
Case Study: Article Spinning Gone Wrong5:42
Examine why article spinning fails without context and grammar, compare simple synonym replacement to deeper models, and learn ethical risks like plagiarism.

Section Introduction4:50
Explore a unique cipher project that combines probabilistic language models and genetic algorithms to encode, decode, and evolve solutions, then implement the theory in code.
Ciphers3:59
Explore the substitution cipher in natural language processing, learning how a one-to-one letter mapping encrypts and decrypts messages, with an explicit example and a spy's perspective on cracking the code.
Language Models (Review)16:06
Learn how language models assign high probabilities to real text using engrams and bigrams within a markov framework, apply add-one smoothing, and decode with log-likelihood maximization.
Genetic Algorithms21:23
Explore how genetic algorithms optimize a substitution cipher by evolving dna-like letter mappings through swapping mutations, fitness evaluation via log likelihood, and population-based selection.
Code Preparation4:46
Describe the code needed for a message decryption script, including a random substitution cipher, a character-level language model from Moby Dick, log likelihood evaluation, encoding, decoding, and a genetic algorithm.
Code pt 13:06
Build a substitution cipher from a randomized letter mapping and compare the decrypted results to the true answer using a simulated genetic algorithm.
Code pt 27:20
Build a language model for cipher decryption using a 26x26 bigram matrix and a 26-entry initial distribution, with add-one smoothing, and functions to update transition and compute log probabilities.
Code pt 34:52
Download the Moby Dick text, clean it with a regex to remove non-alphabetic characters, and train the language model by tokenizing, lowercasing, and normalizing bigram and trigram counts.
Code pt 44:03
Encode and decode a message with a substitution cipher by normalizing to lowercase, replacing non letters with spaces, and mapping characters via encode and decode functions.
Code pt 57:12
This lecture implements an evolutionary cipher solver in python by creating a dna pool of 20 random strings, evolving offspring via swaps, and scoring decoded messages with log likelihood.
Code pt 65:25
Review decoded and true messages, compare log likelihoods, examine mapping errors and randomness-driven variability, and plot the objective per iteration to illustrate convergence in cipher decryption.
Cipher Decryption - Additional Discussion2:56
Clarify the misconception that this method can crack modern encryption like RSA, and emphasize its real use for learning NLP, ML, language models, and optimization methods.
Real-World Application: Acoustic Keylogger2:50
Explore a real-world application of cipher decryption through a laser microphone keystroke keylogger, using clustering to map keystroke sounds to encoded characters and decode them.
Section Conclusion6:00
Learn cipher decryption by combining substitution ciphers, probabilistic language models, and genetic algorithms; build a character-level markov model with add-one smoothing and log likelihood to optimize decryption.

Spam Detection - Problem Description6:32
Describe what spam detection is and why we want to filter unwanted messages, such as emails or texts, using a simple function that returns 1 for spam and 0 otherwise.
Naive Bayes Intuition11:37
Explore Bayes rule intuition for classification, defining priors, likelihoods, and posteriors, then explain Naive Bayes' independence assumption and common distributions for NLP, like Gaussian, multinomial, and Bernoulli.
Spam Detection - Exercise Prompt2:07
Develop a text classifier to detect spam in sms messages, split data into train and test, explore count vectorizer and tf-idf with stop words, and evaluate with accuracy.
Aside: Class Imbalance, ROC, AUC, and F1 Score (pt 1)12:25
Explore how class imbalance distorts accuracy and how a confusion matrix reveals true positives, false positives, true negatives, and false negatives, with sensitivity, specificity, precision, recall, and the F1 score.
Aside: Class Imbalance, ROC, AUC, and F1 Score (pt 2)11:02
Explain class imbalance and roc curve, showing the tradeoff between true positive rate and false positives. Note auc and f1 score as alternatives, and explain how predict_proba enables probability-based thresholding.
Spam Detection in Python16:23
This lecture guides building an sms spam classifier in python using count vectorizer features, evaluating with f1, roc auc score, and confusion matrices, plus word clouds and misclassification analysis.

Sentiment Analysis - Problem Description7:27
Explore sentiment analysis as a supervised task that predicts sentiment from text (positive, negative, or neutral) using classification or regression, with applications in reputation monitoring and stock market.
Logistic Regression Intuition (pt 1)17:36
Explore the intuition of logistic regression as a linear classifier using vector features for sentiment analysis, its sigmoid probability, and a comparison of discriminative logistic regression with generative Naive Bayes.
Multiclass Logistic Regression (pt 2)6:52
Explore multiclass logistic regression using softmax to convert class scores into a probability distribution, with outputs as an n by k matrix and predictions by the maximum probability per sample.
Logistic Regression Training and Interpretation (pt 3)8:15
Learn to train logistic regression in Python, fit models with iterative optimization, generate predictions and probabilities, and interpret weights for both binary and multiclass cases.
Sentiment Analysis - Exercise Prompt4:00
Implement sentiment analysis on a tweets dataset labeled positive, negative, or neutral, splitting into train-test sets. Build a logistic regression classifier with count or tf-idf vectorization, and evaluate accuracy, F1.
Sentiment Analysis in Python (pt 1)10:38
Implement sentiment analysis on airline tweets using Python, TF-IDF features, and logistic regression; explore class imbalance, AUC, and multiclass evaluation with one-vs-one and confusion matrices.
Sentiment Analysis in Python (pt 2)8:28
The lecture demonstrates filtering to positive and negative samples, tf-idf vectorization, and logistic regression with accuracy, plus interpreting weights via a word-to-index mapping and identifying top weighted words for sentiment.

Requirements

Install Python, it's free!
Decent Python programming skills
Optional: If you want to understand the math parts, linear algebra and probability are helpful

Description

Ever wondered how AI technologies like OpenAI ChatGPT, GPT-4, DALL-E, Midjourney, and Stable Diffusion really work? In this course, you will learn the foundations of these groundbreaking applications.

Hello friends!

Welcome to Machine Learning: Natural Language Processing in Python (Version 2).

This is a massive 4-in-1 course covering:

1) Vector models and text preprocessing methods

2) Probability models and Markov models

3) Machine learning methods

4) Deep learning and neural network methods

In part 1, which covers vector models and text preprocessing methods, you will learn about why vectors are so essential in data science and artificial intelligence. You will learn about various techniques for converting text into vectors, such as the CountVectorizer and TF-IDF, and you'll learn the basics of neural embedding methods like word2vec, and GloVe.

You'll then apply what you learned for various tasks, such as:

Text classification
Document retrieval / search engine
Text summarization

Along the way, you'll also learn important text preprocessing steps, such as tokenization, stemming, and lemmatization.

You'll be introduced briefly to classic NLP tasks such as parts-of-speech tagging.

In part 2, which covers probability models and Markov models, you'll learn about one of the most important models in all of data science and machine learning in the past 100 years. It has been applied in many areas in addition to NLP, such as finance, bioinformatics, and reinforcement learning.

In this course, you'll see how such probability models can be used in various ways, such as:

Building a text classifier
Article spinning
Text generation (generating poetry)

Importantly, these methods are an essential prerequisite for understanding how the latest Transformer (attention) models such as BERT and GPT-3 work. Specifically, we'll learn about 2 important tasks which correspond with the pre-training objectives for BERT and GPT.

In part 3, which covers machine learning methods, you'll learn about more of the classic NLP tasks, such as:

Spam detection
Sentiment analysis
Latent semantic analysis (also known as latent semantic indexing)
Topic modeling

This section will be application-focused rather than theory-focused, meaning that instead of spending most of our effort learning about the details of various ML algorithms, you'll be focusing on how they can be applied to the above tasks.

Of course, you'll still need to learn something about those algorithms in order to understand what's going on. The following algorithms will be used:

Naive Bayes
Logistic Regression
Principal Components Analysis (PCA) / Singular Value Decomposition (SVD)
Latent Dirichlet Allocation (LDA)

These are not just "any" machine learning / artificial intelligence algorithms but rather, ones that have been staples in NLP and are thus an essential part of any NLP course.

In part 4, which covers deep learning methods, you'll learn about modern neural network architectures that can be applied to solve NLP tasks. Thanks to their great power and flexibility, neural networks can be used to solve any of the aforementioned tasks in the course.

You'll learn about:

Feedforward Artificial Neural Networks (ANNs)
Embeddings
Convolutional Neural Networks (CNNs)
Recurrent Neural Networks (RNNs)

The study of RNNs will involve modern architectures such as the LSTM and GRU which have been widely used by Google, Amazon, Apple, Facebook, etc. for difficult tasks such as language translation, speech recognition, and text-to-speech.

Obviously, as the latest Transformers (such as BERT and GPT-3) are examples of deep neural networks, this part of the course is an essential prerequisite for understanding Transformers.

You will learn how Transformers accelerated progress in AI research, and how scaling laws, techniques like pre-training, supervised fine-tuning, RLHF (reinforcement learning from human feedback), DPO (direct preference optimization), GRPO (group relative policy optimization), alignment, and more, have led to the latest advancements in the field, including vision-language models, multimodal models, and Agentic AI / AI Agents.

UNIQUE FEATURES

Every line of code explained in detail - email me any time if you disagree
No wasted time "typing" on the keyboard like other courses - let's be honest, nobody can really write code worth learning about in just 20 minutes from scratch
Not afraid of university-level math - get important details about algorithms that other courses leave out

Thank you for reading and I hope to see you soon!

Who this course is for:

Anyone who wants to learn natural language processing (NLP)
Anyone interested in artificial intelligence, machine learning, deep learning, or data science
Anyone who wants to go beyond typical beginner-only courses on Udemy

[2026] Machine Learning: Natural Language Processing (V2)

What you'll learn

Explore related topics

Course content

Introduction2 lectures • 16min

Getting Set Up3 lectures • 11min

Vector Models and Text Preprocessing22 lectures • 3hr 32min

Probabilistic Models (Introduction)1 lecture • 5min

Markov Models (Intermediate)13 lectures • 1hr 48min

Article Spinner (Intermediate)6 lectures • 51min

Cipher Decryption (Advanced)14 lectures • 1hr 35min

Machine Learning Models (Introduction)1 lecture • 6min

Spam Detection6 lectures • 1hr

Sentiment Analysis7 lectures • 1hr 3min

Requirements

Description

Who this course is for: