All-in-One: Machine Learning, DL, NLP [Hindi][Python]
4.2 (231 ratings)
Course Ratings are calculated from individual students’ ratings and a variety of other signals, like age of rating and reliability, to ensure that they reflect course quality fairly and accurately.
20,505 students enrolled

Complete hands-on Machine Learning Course with Data Science, NLP, Deep Learning and Artificial Intelligence
Created by Rishi Bansal
Last updated 8/2020
This course includes
  • 16.5 hours on-demand video
  • 1 article
  • 58 downloadable resources
  • Full lifetime access
  • Access on mobile and TV
  • Assignments
  • Certificate of Completion
What you'll learn
  • Master creating machine learning models in Python.
  • Visualize ML models wherever possible to develop a better understanding of them.
  • Learn how to analyse, clean and prepare data (data preprocessing techniques) before feeding it into machine learning models.
  • Learn the basic mathematics behind Simple Linear Regression and its best-fit line.
  • Understand what Gradient Descent is and how it works internally, with a full mathematical explanation.
  • Make predictions using Simple Linear Regression, Multiple Linear Regression.
  • Make predictions using Logistic Regression, K-Nearest Neighbours and Naive Bayes.
  • Understand the fundamental concepts of Deep Learning and Natural Language Processing. Python code is included in places for explanation.
  • No prerequisite for the machine learning concepts. Anyone can take this course.
  • Prior understanding of Python is required, along with know-how of operating Spyder/Jupyter Notebook for coding.

This course is designed to cover as many Machine Learning concepts as possible. Anyone can opt for this course. No prior understanding of Machine Learning is required.

NOTE: The course is still under development. New topics will be added regularly.

Now the question is: why this course?

This course will not only teach you the basics of Machine Learning and Simple Linear Regression. It also covers an in-depth mathematical explanation of the cost function and the use of Gradient Descent for Simple Linear Regression. Understanding these is a must for a solid foundation before entering the Machine Learning world. This foundation will help you understand all the other algorithms and the mathematics behind them.

As a bonus, an introduction to Natural Language Processing is included.

The topics below are covered so far.

Chapter - Introduction to Machine Learning

- Machine Learning?

- Types of Machine Learning

Chapter - Data Preprocessing

- Null Values

- Correlated Feature check

- Data Molding

- Imputing

- Scaling

- Label Encoder

- One-Hot Encoder

Chapter - Supervised Learning: Regression

- Simple Linear Regression

- Minimizing Cost Function - Ordinary Least Square(OLS), Gradient Descent

- Assumptions of Linear Regression, Dummy Variable

- Multiple Linear Regression

- Regression Model Performance - R-Square

- Polynomial Linear Regression

Chapter - Supervised Learning: Classification

- Logistic Regression

- K-Nearest Neighbours

- Naive Bayes

- Saving and Loading ML Models

- Classification Model Performance - Confusion Matrix

Chapter: UnSupervised Learning: Clustering

- Partitioning Algorithm: K-Means Algorithm, Random Initialization Trap, Elbow Method

- Hierarchical Clustering: Agglomerative, Dendrogram

- Density Based Clustering: DBSCAN

- Measuring UnSupervised Clusters Performance - Silhouette Index

Chapter: UnSupervised Learning: Association Rule

- Apriori Algorithm

- Association Rule Mining

Chapter: Non-Linear Supervised Algorithm: Decision Tree and Support Vector Machines

- Decision Tree Regression

- Decision Tree Classification

- Support Vector Machines(SVM) - Classification

- Kernel SVM, Soft Margin, Kernel Trick

Chapter - Natural Language Processing

The text preprocessing techniques below are covered with Python code

- Tokenization, Stop Words Removal, N-Grams, Stemming, Word Sense Disambiguation

- Count Vectorizer, Tfidf Vectorizer, Hashing Vectorizer

- Case Study - Spam Filter

Chapter - Deep Learning

- Artificial Neural Networks, Hidden Layer, Activation function

- Forward and Backward Propagation

- Implementing a logic gate in Python using a perceptron

Chapter: Regularization, Lasso Regression, Ridge Regression

- Overfitting, Underfitting

- Bias, Variance

- Regularization

- L1 & L2 Loss Function

- Lasso and Ridge Regression

Chapter: Dimensionality Reduction

- Feature Selection - Forward and Backward

- Feature Extraction - PCA, LDA

Chapter: Ensemble Methods: Bagging and Boosting

- Bagging - Random Forest (Regression and Classification)

- Boosting - Gradient Boosting (Regression and Classification)

Who this course is for:
  • Anyone who wants to learn Machine Learning, Deep Learning and Natural Language Processing but doesn't know where to start can opt for this course.
  • It provides a good foundation for understanding the concepts of Machine Learning.
Course content
162 lectures 16:17:52
+ Introduction to Machine Learning
3 lectures 21:55

Full course material can be downloaded from GitHub:


Supervised: labeled data is used to help machines recognize characteristics and apply them to future data. E.g.: classifying pictures of cats and dogs.

Unsupervised: we simply feed in unlabeled data and let the machine work out the characteristics and group the data. E.g.: clustering news articles.

Reinforcement Learning: the model interacts with the environment by producing actions and then analyzes the resulting errors or rewards. E.g.: a chess game.


Regression: This is a type of problem where we need to predict a continuous response value (i.e. we predict a number that can vary from -infinity to +infinity).

E.g: House Price, Value of stock

Classification: This is a type of problem where we predict the categorical response value where the data can be separated into specific “classes” (ex: we predict one of the values in a set of values).

E.g: Mail spam or not, Diabetes or not, etc

Supervised Learning

Test your understanding about Regression and Classification Problems

Quiz 1
2 questions
+ Optional: Setup Environment
4 lectures 21:57

Anaconda is a distribution of Python that includes a selection of libraries and other useful tools. It is not an IDE itself, but it includes Jupyter Notebook and the Spyder IDE.

Installing Anaconda
How to Use Spyder Notebook
How to use Jupyter Notebook
Installing Library
+ Data Preprocessing
9 lectures 55:07

•Preprocessing refers to the transformations applied to data before feeding it to a machine learning model

•The quality of the data is important for training the model

•Sources – government databases, professional or company data sources (e.g. Twitter), your own company, etc.

•Data will never be in the format you need – use a Pandas DataFrame for reformatting

•Columns to remove – columns with no values, duplicate columns (correlated columns, e.g. house size in feet and metres)

•Learning algorithms understand only numbers, so converting text and images to numbers is required

•Unscaled or unstandardized data might lead to unacceptable predictions

What is Data Preprocessing?

•Check for Null values

•Remove or Impute


•Example: df = df.dropna(how='any', axis=0) drops every row that contains at least one null value

Checking for Null Values: Concept + Python Code

•Sometimes two features that are meant to measure different characteristics are influenced by a common mechanism and move together.

How to Handle Correlation:

•Remove one of the features

•Apply Principal Component Analysis (PCA)

Correlated Feature Check: Concept + Python Code

•Adjusting Data Types - Inspect data types to see if there are any issues. Data should be numeric.

•If required create new columns

Data Molding(Encoding): Concept + Python Code

Missing Data - Ways to Handle

•Drop rows

•Replace values (Impute)

Impute Missing Values: Concept + Python Code
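
The two options above (drop rows, or replace values) can be sketched in plain Python. This is only an illustration under stated assumptions: missing values are represented as `None`, the `ages` data is made up, and the helper names are hypothetical. With pandas, the equivalent operations are `DataFrame.dropna` and `DataFrame.fillna`.

```python
from statistics import mean

def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def drop_missing(rows):
    """Keep only the rows that contain no None values."""
    return [row for row in rows if None not in row]

# Hypothetical column with two missing entries
ages = [20, None, 30, 40, None]
imputed = impute_mean(ages)

# Hypothetical rows; the second one has a missing value
rows = [(1, 2), (None, 3), (4, 5)]
complete_rows = drop_missing(rows)
```

Dropping rows is simpler but discards data; imputing keeps every row at the cost of introducing estimated values.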

•Feature Scaling is a technique to standardize the independent features present in the data in a fixed range.

•It is performed during the data pre-processing to handle highly varying magnitudes or values or units.


•Without feature scaling, a machine learning algorithm tends to weigh larger values more heavily and treat smaller values as less important, regardless of the unit of the values.

Scaling: Python Code
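
As a rough illustration of what scaling does, here is a minimal sketch of two common approaches, min-max scaling and standardization, written in plain Python. The function names and the `sizes` data are assumptions made for this example; scikit-learn offers `MinMaxScaler` and `StandardScaler` for the same purpose.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical feature: house sizes in square feet
sizes = [1000, 1500, 2000, 2500, 3000]
scaled = min_max_scale(sizes)
standardized = standardize(sizes)
```

After either transformation the feature no longer dominates others simply because its raw numbers are large.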

Convert text values to numbers. These can be used in the following situations:

•There are only two values for a column in your data. The values will then become 0/1 - effectively a binary representation

•The values have a relationship with each other where comparisons are meaningful (e.g. low<medium<high)

Label Encoder: Concept + Code

•Use when there is no meaningful comparison between values in the column

•Creates a new column for each unique value for the specified feature in the data set

One-Hot Encoder: Concept + Python Code
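
The difference between the two encoders can be sketched as follows. This is an illustrative plain-Python version, not the scikit-learn API; the helper names, the explicit `order` argument and the sample data are assumptions made for the example.

```python
def label_encode(values, order):
    """Map each category to its position in an explicit, meaningful ordering."""
    index = {category: i for i, category in enumerate(order)}
    return [index[v] for v in values]

def one_hot_encode(values):
    """Create one 0/1 column per unique category (sorted for a stable order)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Ordered categories: comparisons such as low < medium < high are meaningful
sizes = ["low", "high", "medium", "low"]
encoded_sizes = label_encode(sizes, order=["low", "medium", "high"])

# Unordered categories: no city is "greater" than another
cities = ["Delhi", "Mumbai", "Delhi", "Pune"]
columns, encoded_cities = one_hot_encode(cities)
```

Label encoding keeps a single column but implies an order, while one-hot encoding avoids that implication at the cost of one extra column per category.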
Data Preprocessing
3 questions
+ Supervised Learning: Regression
20 lectures 02:35:52

Full Course content (Code) can be downloaded from Github:

Simple Linear Regression: Concept

•Error = (y_pred – y_act)^2

•Two Methods:

1.Least Squares Criterion (OLS)

2.Gradient Descent

Minimizing Cost Function

•non-iterative method that fits a model such that the sum-of-squares of differences of observed and predicted values is minimized

•Error = (y_pred – y_act)^2

•Line => y = b0 + b1x

Ordinary Least Square(OLS)
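
For a single feature, the least-squares line has a well-known closed-form solution: the slope b1 is the sum of products of the deviations of x and y from their means, divided by the sum of squared deviations of x, and the intercept is b0 = mean(y) - b1 * mean(x). A minimal sketch, with a hypothetical function name and made-up data:

```python
def ols_fit(xs, ys):
    """Return (b0, b1) of the least-squares line y = b0 + b1*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations of x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Made-up data lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
b0, b1 = ols_fit(xs, ys)
```

Because the solution is computed directly, no learning rate or iteration count is needed.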

•Cost Function, J(m,c) = Σ(y_pred – y_act)^2 / No. of data points

•Hypothesis: y_pred = c + mx

Gradient Descent
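
A minimal sketch of gradient descent for the hypothesis y_pred = c + mx, assuming the cost is the mean squared error, so that the partial derivatives are dJ/dm = (2/n) Σ (y_pred - y_act) * x and dJ/dc = (2/n) Σ (y_pred - y_act). The learning rate, epoch count and data are arbitrary choices for this example, not values taken from the course.

```python
def gradient_descent(xs, ys, lr=0.05, epochs=5000):
    """Fit y = c + m*x by repeatedly stepping against the gradient of the MSE."""
    m, c = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every data point with the current m and c
        errors = [(c + m * x) - y for x, y in zip(xs, ys)]
        grad_m = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_c = (2 / n) * sum(errors)
        # Move a small step in the direction that lowers the cost
        m -= lr * grad_m
        c -= lr * grad_c
    return m, c

# Made-up data lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
m, c = gradient_descent(xs, ys)
```

If the learning rate is too large the updates overshoot and diverge; if it is too small, many more epochs are needed.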

It tells how well the regression equation explains the data.

•A value of R^2 = 1 means the regression predictions perfectly fit/explain the data.

Question: Can R^2 be negative?

•Ans: Yes, when the Sum of Squared Errors (SSE) > Total Sum of Squares (SST), since R^2 = 1 – SSE/SST

•This means our model performs worse than the average line, which is a very rare case.

Measuring Regression Model Performance: R^2 (R - Square)
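
Using the definition R^2 = 1 - SSE/SST, the three situations described above (perfect fit, no better than the average line, worse than the average line) can be checked with a short sketch. The function name and the data are made up for illustration.

```python
def r_squared(y_actual, y_pred):
    """R^2 = 1 - SSE/SST."""
    mean_y = sum(y_actual) / len(y_actual)
    sse = sum((a - p) ** 2 for a, p in zip(y_actual, y_pred))
    sst = sum((a - mean_y) ** 2 for a in y_actual)
    return 1 - sse / sst

y_actual = [1, 2, 3, 4, 5]
perfect = r_squared(y_actual, [1, 2, 3, 4, 5])   # predictions match exactly
baseline = r_squared(y_actual, [3, 3, 3, 3, 3])  # always predict the mean
worse = r_squared(y_actual, [5, 4, 3, 2, 1])     # worse than predicting the mean
```

Here `perfect` is 1, `baseline` is 0 and `worse` is negative, because its SSE (40) exceeds the SST (10).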

Code file and datasets can be found in the

Simple Linear Regression: Python Code -1
Simple Linear Regression: Python Code -2


1. Linear Relationship

•linear regression is sensitive to outlier effects

•needs the relationship between the independent and dependent variables to be linear

•linearity assumption can best be tested with scatter plots

2. Homoscedasticity

•meaning the variance of the residuals is equal across the regression line

•Heteroscedasticity Test to check - The Goldfeld-Quandt Test

3. Multivariate Normality

•This assumption can best be checked with a histogram or a Q-Q-Plot

•Normality can be checked with a goodness of fit test(Kolmogorov-Smirnov)

4. No Autocorrelation in the Data

•Autocorrelation occurs when the residuals are not independent from each other.

•In simple terms, when the value of y(x+1) is not independent from the value of y(x)

•Can be checked with the Durbin-Watson test

5. Lack of Multicollinearity

•Multicollinearity: the model cannot differentiate between the effects of D1 and D2 as these are totally related.

•Can be fixed using the correlation check during data preprocessing

Assumptions of Linear Regression

There is a linear relationship between both the dependent and independent variables.
It also assumes no major correlation between the independent variables.

•Multiple regressions can be linear and nonlinear.

Multiple Linear Regression: Concept

y = b0 + b1x1 + b2x2 + b3x3 + b4x4 + b5x5

Here x4 and x5 are dummy variables

x5 = 1 – x4

Multicollinearity -> this is known as the dummy variable trap

For 2 categories -> a single column of 1 & 0

For more than 2 categories -> separate dummy columns

Dummy Variable
Multiple Linear Regression: Python - 1
Multiple Linear Regression: Python - 2
Multiple Linear Regression: Python - 3

If data is not linear, we need polynomial terms to fit it better

Polynomial Linear Regression: Concept
Polynomial Linear Regression: Python - 1
Polynomial Linear Regression: Python - 2
Polynomial Linear Regression: Python - 3
Polynomial Linear Regression: Python - 4
Linear Regressions Comparisons
Simple Linear Regression: Quiz
4 questions
Question in this section is related to Supervised Learning: Regression
Boston Housing Price Prediction
1 question
Assignment: Predicting Housing Prices (Boston Data Solution): Optional
+ Supervised Learning: Classification
14 lectures 01:31:30

Issue with Linear Regression

•But if we have an outlier, it will go horribly wrong

•Because of one outlier, whole linear regression prediction is going wrong

Logistic Regression

  • Logistic regression can be understood through the standard logistic function. The logistic function is a sigmoid function, which takes any real value and maps it to a value between zero and one.

  • If we plot the sigmoid function, the graph is an S-shaped curve. When there is an outlier, the sigmoid function takes care of it.

  • Linear regression assumes that the data follows a linear function.

  • Logistic regression models the data using the sigmoid function

Logistic Regression

Describe the performance of a classification model

•Accuracy: the fraction of correct predictions among all predictions made by the model

•Precision: the fraction of correct positive predictions among all positive predictions made by the model

•Recall: the fraction of correct positive predictions among all actual positive data points

Confusion Matrix: Measuring Performance of Classification Model
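
The three metrics above follow directly from the four cells of a binary confusion matrix. A minimal sketch, assuming class 1 is the positive class; the function name and the labels are made up for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary classification problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn

# Made-up labels: 4 actual positives and 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct / all predictions
precision = tp / (tp + fp)                  # correct positives / predicted positives
recall = tp / (tp + fn)                     # correct positives / actual positives
```

With these labels there are 3 true positives, 5 true negatives, 1 false positive and 1 false negative.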

Spam Filter (positive class: spam): Optimize for precision or specificity because false negatives (spam goes to inbox) are more acceptable than false positives (non-spam caught by spam filter).

Fraudulent transaction detector ( positive class: fraud): Optimize for sensitivity because false positives (normal transactions that are flagged as possible fraud) are more acceptable than false negatives (fraudulent transactions that are not detected)

Confusion Matrix: Case Study
Logistic Regression: Python 1
Logistic Regression: Python 2
Logistic Regression: Python 3
Logistic Regression: Python 4

It assumes that similar things exist in close proximity.

* Step 1: Choose the number K of neighbours
* Step 2: Take the K nearest neighbours of the new data point by Euclidean distance
* Step 3: Among the K neighbours, count the number of data points in each category
* Step 4: Assign the new data point to the category where you counted the most neighbours

K - Nearest Neighbours Algorithm
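
The four steps above can be sketched in a few lines of plain Python. The function name and the toy data are hypothetical, and ties between categories are not handled specially.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, new_point, k=3):
    """Classify new_point by majority vote among its k nearest neighbours."""
    # Step 2: take the k training points closest to new_point (Euclidean distance)
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], new_point),
    )[:k]
    # Step 3: count the neighbours in each category
    votes = Counter(label for _, label in neighbours)
    # Step 4: assign the category with the most neighbours
    return votes.most_common(1)[0][0]

# Made-up training data with two well-separated categories
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

near_a = knn_predict(points, labels, (2, 2), k=3)
near_b = knn_predict(points, labels, (6.5, 6), k=3)
```

Choosing an odd K for two categories is a simple way to avoid tied votes.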
K - Nearest Neighbours: Python 1
K - Nearest Neighbours: Python 2

It is called naive (innocent) because it assumes that all the features are independent of each other, which is almost never the case.

•Easy to understand.

•All features are independent.

•All impact results equally.

•Need small amount of data to train the model.

•Fast – up to 100X faster.

•It is highly scalable.

•It can make probabilistic predictions.

•It's simple & out-performs many sophisticated methods.

•Stable to data changes.

Naive Bayes
Naive Bayes: Python Code
Pickle File: Saving and Loading ML Models: Python
Question in this section is related to Supervised Learning: Classification
Wine Quality Prediction
1 question
Assignment: Predicting Wine Quality: Optional
+ UnSupervised Learning: Clustering
15 lectures 01:26:19


1.Initialize k centroids.

2.Select at random K points, the centroids(not necessary from the dataset)

3.Assign each data point to the nearest centroid; this step creates the clusters.

4.Compute and place the new centroid of each cluster.

5.Reassign each data point to the new closest centroid. If any reassignment took place, repeat from step 4; otherwise finish.

K-Means Algorithm
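
A minimal sketch of the assign-and-recompute loop described above. To keep the result reproducible, the starting centroids are passed in explicitly instead of being chosen at random; that choice, the function name and the toy points are assumptions made for this example.

```python
from math import dist

def k_means(points, centroids, max_iter=100):
    """Run K-Means from the given starting centroids; return (centroids, clusters)."""
    clusters = []
    for _ in range(max_iter):
        # Assign each data point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Place each new centroid at the mean of its cluster
        # (an empty cluster keeps its old centroid)
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        # Stop once no centroid moves, i.e. no point would be reassigned
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Made-up data with two obvious groups
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(points, centroids=[(1, 1), (8, 8)])
```

With different starting centroids the same loop can settle on a different, poorer clustering, which is what smarter initialization schemes try to avoid.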

•Solution(Fix) -> K-Means++

•K-Means++ -> smarter initialization of centroids, rest is same

Random Initialization Trap


•Euclidean distance between a given point and centroid to which it is assigned.

•Iterate this process for all the points in the cluster

•Sum all the values and divide by no. of points

Total WCSS decreases as no. of clusters increases

Total WCSS is minimum when No. of clusters is equal to no. of data points

Elbow Method to find the optimal number of clusters

Elbow Method: Choosing optimum no of clusters
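
To see why the total WCSS falls as the number of clusters grows, here is a small sketch. It assumes the common definition of WCSS as the sum of squared Euclidean distances from each point to the centroid of its cluster; the helper names and the points are made up.

```python
from math import dist

def centroid(cluster):
    """Mean position of the points in a cluster."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))

def wcss(clusters):
    """Sum of squared distances from each point to its own cluster centroid."""
    return sum(
        dist(p, centroid(cluster)) ** 2
        for cluster in clusters
        for p in cluster
    )

points = [(1, 1), (1, 3), (9, 1), (9, 3)]
one_cluster = wcss([points])                       # everything in one cluster
two_clusters = wcss([points[:2], points[2:]])      # left pair and right pair
n_clusters = wcss([[p] for p in points])           # every point is its own cluster
```

The drop from one to two clusters is large (68 to 4), while further splits gain little; the "elbow" is the point where the curve flattens like this.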
K-Means++ : Python 1
K-Means++ : Python 2
K-Means++ : Python 3

•These methods perform a hierarchical decomposition of the dataset.

Agglomerative method (Bottom-Up): treat each data point as a cluster and merge them to create bigger clusters

Divisive method (Top-Down): start with one cluster and continue splitting


•Start by assigning each data point to its own cluster - N clusters

•Combine the two closest points into one cluster - (N - 1) clusters

•Combine the two closest clusters into one cluster - (N - 2) clusters

•Repeat step 3 until there is only one cluster left

Hierarchical - Agglomerative Algorithm
Agglomerative - Dendrogram
Agglomerative - Python 1
Agglomerative - Python 2

All the above techniques are distance based; such methods can find only spherical clusters and are not suited to clusters of other shapes. They are also severely impacted by noise or outliers in the data.


•If data is of arbitrary shape

•Data contain noise

Algorithm has two parameters:
eps: The radius of the neighbourhood around a data point p. If the distance between two points is lower than or equal to eps, they are neighbours. A small value will mark a large number of data points as outliers, and a large value will put the majority of data points in the same cluster.

minPts: The minimum number of data points we want in a neighbourhood to define a cluster. minPts >= D + 1, where D is the number of dimensions of the data, and should be at least 3.

Density Based Clustering - DBSCAN
DBSCAN - Python 1
DBSCAN - Python 2

•Not as straightforward as for supervised algorithms

•What counts as good clustering is relative

Some popular indices:


•Evaluates intra-cluster similarity and inter-cluster differences

•Not Normalized, so difficult to compare between two different datasets

Silhouette Index

•Calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample

•The Silhouette Coefficient for a sample is (b - a) / max(a, b). To clarify, b is the distance between a sample and the nearest cluster that the sample is not a part of.

•Normalized, a value close to 1 is always good

•good for spherical data structures

Measuring UnSupervised Clusters Performance
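
The Silhouette Coefficient formula (b - a) / max(a, b) can be worked through for a single sample as follows. This sketch assumes all points are distinct and that the sample's own cluster has at least two members; the function name and the data are made up.

```python
from math import dist

def silhouette_sample(point, own_cluster, other_clusters):
    """Silhouette coefficient (b - a) / max(a, b) for one sample."""
    # a: mean distance to the other members of the sample's own cluster
    others = [p for p in own_cluster if p != point]
    a = sum(dist(point, p) for p in others) / len(others)
    # b: smallest mean distance to the members of any other cluster
    b = min(sum(dist(point, p) for p in c) / len(c) for c in other_clusters)
    return (b - a) / max(a, b)

cluster_1 = [(0, 0), (0, 2)]
cluster_2 = [(10, 0), (10, 2)]
s = silhouette_sample((0, 0), cluster_1, [cluster_2])
```

Here a = 2 and b is roughly 10.1, so the coefficient is about 0.8, indicating a sample that sits well inside its own cluster.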
Silhouette Index - Python 1
+ UnSupervised Learning: Association Rule
5 lectures 42:16

Apriori Algorithm:

•Used to identify frequent item sets.

•Uses a bottom-up approach: it first identifies the individual items that fulfil a minimum occurrence threshold. After this, it adds one item at a time and checks whether the resulting item set still meets the specified threshold.

•The algorithm stops when there are no more items left to add that meet the minimum occurrence threshold

Apriori Algorithm

•Once we have generated item sets using Apriori, we can apply association rules.

•As our item sets contain 2 items, our association rules will be of the form (A) -> (B)

Three stages:

1. Support

2. Confidence

3. Lift

Association Rule Mining
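
The three measures can be computed directly from a list of transactions, using the usual definitions: support is the fraction of transactions containing the items, confidence(A -> B) = support(A, B) / support(A), and lift(A -> B) = confidence(A -> B) / support(B). The function names and the shopping baskets below are made up for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)

def confidence(transactions, a, b):
    """How often B appears in the transactions that contain A."""
    return support(transactions, {a, b}) / support(transactions, {a})

def lift(transactions, a, b):
    """Confidence of A -> B relative to how common B is overall."""
    return confidence(transactions, a, b) / support(transactions, {b})

# Made-up shopping baskets
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

sup = support(transactions, {"bread", "milk"})
conf = confidence(transactions, "bread", "milk")
lft = lift(transactions, "bread", "milk")
```

A lift below 1, as in this toy data, suggests that buying bread does not make milk more likely than it already is overall.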
Apriori Association: Python 1
Apriori Association - Python 2
Apriori Association- Python 3
+ Supervised Learning: Decision Tree and Support Vector Machines
17 lectures 01:40:19

•It's a tree-like data structure used to build a model of the data

•uses if-else at every node of the tree

•can be used for both classification and regression analysis

Algorithm : Decision Trees

•ID3 (Entropy and Information Gain)

•Gini Index

•Chi Square

My Github:

For detailed Entropy explanation refer to file : "Decision Tree" in above Repo.

Decision Tree Regression - Concept 1
Decision Tree Regression - Concept 2
Decision Tree Regression - Python 1
Decision Tree Regression - Python 2
Decision Tree Classification - Concept 1
Decision Tree Classification - Concept 2
Decision Tree Classification - Python 1
Decision Tree Classification - Python 2

•Operates well in higher dimensions

•Avoids curse of dimensionality

•Fast to compute

•Max Margin: A slight error in measurement will not cause a misclassification

Support Vector Machines - Concept
Support Vector Machines - Python 1
Support Vector Machines - Python 2
Kernel SVM
Kernel SVM - Python 1
Kernel SVM - Python 2

•The best-fit line is the hyperplane that contains the maximum number of points

•Y = mx +c

•-e < Ypred – Yact < e

Support Vector Regression - Concept
Support Vector Regression - Python 1
Support Vector Regression - Python 2
+ Deep Learning: Introduction
12 lectures 01:05:30

•Deep learning is part of a broader family of machine learning methods based on artificial neural networks.

•Learning can be supervised, semi-supervised or unsupervised

•The “deep” in deep learning refers to the depth of the network.

What is Deep Learning?

•Implementation of Deep Learning

•Inspired by biological systems

•Dendrites are the structures on the neuron that receive electrical messages, process these signals and transfer the information to the soma of the neuron

•Axons: primary transmission lines of the nervous system

How Artificial Neural Network Works

•Hidden Layer is between Input and Output layer

•Allow for the function of a neural network to be broken down into specific transformations of the data

•Each hidden layer function is specialized to produce a defined output

Artificial Neural Network - Hidden Layer

•These functions are attached to each neuron in the network and determine whether it should be activated or not, based on whether the neuron’s input is relevant for the model’s prediction.

•They help to standardize the output of each neuron.

•E.g: Threshold, Sigmoid, ReLU (Rectifier), Softmax

Activation Function

Linear Function:

•Using only linear functions makes the output of the network a linear function as well, so we can't fit a non-linear dataset

Step Function:

•we define threshold values and have discrete output values

•if(z > threshold) — “activate” the node (value 1)

•if(z < threshold) — don’t “activate” the node (value 0)

•So, we have value either 0 or 1

•The issue here is that multiple output classes/nodes can be activated (have the value 1) at once, so we are not able to properly classify/decide.

Sigmoid Function:

•It is a non-linear function

•Value range is (0,1)

•Used to classify values as either 1 or 0

Different Activation Functions
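
The activation functions named in this section can be written in a few lines each. This is a plain-Python sketch for single values (and a list for softmax); the default threshold of 0 for the step function is an assumption made for the example.

```python
from math import exp

def step(z, threshold=0.0):
    """Threshold activation: 1 if z is above the threshold, otherwise 0."""
    return 1 if z > threshold else 0

def sigmoid(z):
    """Squash any real value into the open interval (0, 1)."""
    return 1 / (1 + exp(-z))

def relu(z):
    """Rectifier: pass positive values through, clip negatives to 0."""
    return max(0.0, z)

def softmax(zs):
    """Turn a list of scores into probabilities that sum to 1."""
    shifted = [z - max(zs) for z in zs]  # subtract the max for numerical stability
    exps = [exp(z) for z in shifted]
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax([1.0, 2.0, 3.0])
```

Unlike the step function, softmax gives one graded value per output node, so the largest score can be picked unambiguously.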

•Cost reduces with adjustment in weight(w)

•The error propagates from right to left, and the weights are updated according to how much they are responsible for the error.

•Determining how changing the weights impact the overall cost in the neural network.

•The Learning rate decides by how much we update the weights

weight = weight + Error*Lr*input

What is forward and backward propagation
Basics of Deep Learning
2 questions

1> Create virtual env

#conda create -n tensorflow pip python=3.5

2> activate env

#activate tensorflow

#conda info --envs

3> Install tensorflow

#conda install -c conda-forge tensorflow

this will install tensorflow 1.10.0

#python -m pip install --upgrade pip

#pip install setuptools==39.1.0

4> Install keras

#pip install keras==2.2.2

5> Install other package

#pip install matplotlib

#pip install scikit-learn

#pip install pydot

6> Install spyder separately so that you can launch it without activating your virtual env

#conda install spyder

Creating Env for Deep Learning 1
Creating Env for Deep Learning 2
Creating Env for Deep Learning : Document
Artificial Neural Network - Python 1
Artificial Neural Network - Python 2
Artificial Neural Network - Python 3
+ Deep Learning: Create a Simple Neural Network(Perceptron) from Scratch
6 lectures 36:11

•A neuron, together with a set of input nodes connected to it via weighted edges, is a perceptron, the simplest neural network.

Single Layer Perceptron
Perceptron: implement OR gate

1.Initialize Learning rate, bias and weights

2.Function perceptron: perceptron(x_1, x_2, output)

•Takes input variable and actual output

•Calculate Error = ½*(actual - predicted)^2

•Recalculate the weights

weights = weights + error * input * lr

3. Function predict: predict(x_1, x_2)

•Takes input variables

•Predict = Calculate Output

4. Call perceptron for each row of OR gate

5. Run in a loop multiple times to train the network

6. Take Input values from user to predict the value

Develop Algorithm to Implement Gates
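
The algorithm outlined above can be sketched as follows. This is not the course's own code: it assumes a step activation, a starting point of zero weights and bias, a bias that is updated alongside the weights, and it uses the signed error (actual - predicted) in the update so that the weights can move in both directions. The learning rate and epoch count are arbitrary.

```python
def predict(x_1, x_2, weights, bias):
    """Weighted sum of the inputs followed by a step activation."""
    total = bias + weights[0] * x_1 + weights[1] * x_2
    return 1 if total > 0 else 0

def train_or_gate(lr=0.1, epochs=20):
    """Train a single perceptron on the OR truth table."""
    weights = [0.0, 0.0]
    bias = 0.0
    or_gate = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]  # (x_1, x_2, output)
    for _ in range(epochs):
        for x_1, x_2, actual in or_gate:
            # Signed error: +1, 0 or -1 for a step activation
            error = actual - predict(x_1, x_2, weights, bias)
            # Nudge each weight in proportion to its input and the error
            weights[0] += error * x_1 * lr
            weights[1] += error * x_2 * lr
            bias += error * lr
    return weights, bias

weights, bias = train_or_gate()
outputs = [predict(a, b, weights, bias) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

Because OR is linearly separable, a single perceptron is enough; a gate such as XOR would need a hidden layer.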

Develop Code according to Algorithm already Discussed

Implement Or Gate: Python 1

Develop Code according to Algorithm already Discussed

Implement Or gate: Python 2

Develop Code according to Algorithm already Discussed

Implement Or Gate: Python 3