# All-in-One: Machine Learning, DL, NLP [Hindi][Python]


- 16.5 hours on-demand video
- 1 article
- 58 downloadable resources
- Full lifetime access
- Access on mobile and TV
- Assignments

- Certificate of Completion

- Master in creating Machine Learning Models on Python
- Visualize various ML models wherever possible to develop a better understanding of them.
- How to analyse the data, clean it and prepare it (data preprocessing techniques) to feed into Machine Learning models.
- Learn the basic mathematics behind Simple Linear Regression and its best-fit line.
- What Gradient Descent is and how it works internally, with a full mathematical explanation.
- Make predictions using Simple Linear Regression and Multiple Linear Regression.
- Make predictions using Logistic Regression, K-Nearest Neighbours and Naive Bayes.
- Fundamental concepts of Deep Learning and Natural Language Processing. Python code is included in some places for explanation.

- No prerequisites for the Machine Learning concepts. Anyone can take this course.
- Prior understanding of Python is required, along with the know-how to operate Spyder/Jupyter Notebook for coding.

This course is designed to cover as many Machine Learning concepts as possible. Anyone can opt for this course. No prior understanding of Machine Learning is required.

NOTE: The course is still under development. New topics will be added regularly.

Now the question is: why this course?

This course will not only teach you the basics of Machine Learning and Simple Linear Regression. It also covers an in-depth mathematical explanation of the cost function and the use of Gradient Descent for Simple Linear Regression. Understanding these is a must for a solid foundation before entering the Machine Learning world. This foundation will help you understand other algorithms and the mathematics behind them.

As a bonus, an introduction to Natural Language Processing is included.

The topics below are covered so far.

**Chapter - Introduction to Machine Learning**

- Machine Learning?

- Types of Machine Learning

**Chapter - Data Preprocessing**

- Null Values

- Correlated Feature check

- Data Molding

- Imputing

- Scaling

- Label Encoder

- One-Hot Encoder

**Chapter - Supervised Learning: Regression**

- Simple Linear Regression

- Minimizing Cost Function - Ordinary Least Squares (OLS), Gradient Descent

- Assumptions of Linear Regression, Dummy Variable

- Multiple Linear Regression

- Regression Model Performance - R-Square

- Polynomial Linear Regression

**Chapter - Supervised Learning: Classification**

- Logistic Regression

- K-Nearest Neighbours

- Naive Bayes

- Saving and Loading ML Models

- Classification Model Performance - Confusion Matrix

**Chapter: Unsupervised Learning: Clustering**

- Partitioning Algorithm: K-Means Algorithm, Random Initialization Trap, Elbow Method

- Hierarchical Clustering: Agglomerative, Dendrogram

- Density Based Clustering: DBSCAN

- Measuring Unsupervised Clusters Performance - Silhouette Index

**Chapter: Unsupervised Learning: Association Rule**

- Apriori Algorithm

- Association Rule Mining

**Chapter: Non-Linear Supervised Algorithm: Decision Tree and Support Vector Machines**

- Decision Tree Regression

- Decision Tree Classification

- Support Vector Machines(SVM) - Classification

- Kernel SVM, Soft Margin, Kernel Trick

**Chapter - Natural Language Processing**

The text preprocessing techniques below are covered with Python code

- Tokenization, Stop Words Removal, N-Grams, Stemming, Word Sense Disambiguation

- Count Vectorizer, TF-IDF Vectorizer, Hashing Vectorizer

- Case Study - Spam Filter

**Chapter - Deep Learning**

- Artificial Neural Networks, Hidden Layer, Activation function

- Forward and Backward Propagation

- Implementing a logic gate in Python using a perceptron

**Chapter: Regularization, Lasso Regression, Ridge Regression**

- Overfitting, Underfitting

- Bias, Variance

- Regularization

- L1 & L2 Loss Function

- Lasso and Ridge Regression

**Chapter: Dimensionality Reduction**

- Feature Selection - Forward and Backward

- Feature Extraction - PCA, LDA

**Chapter: Ensemble Methods: Bagging and Boosting**

- Bagging - Random Forest (Regression and Classification)

- Boosting - Gradient Boosting (Regression and Classification)

- Anyone who is looking for, or does not know, where to start with Machine Learning, Deep Learning and Natural Language Processing can opt for this course.
- It will provide a good foundation for understanding the concepts of Machine Learning.

The full course material can be downloaded from GitHub: https://github.com/bansalrishi/MachineLearningWithPython_UD

•**Supervised** - Labeled data is used to help machines recognize characteristics and apply them to future data. E.g.: classifying pictures of cats and dogs.

•**Unsupervised** - We simply feed in unlabeled data and let the machine find the characteristics and group the data. E.g.: clustering (news articles).

•**Reinforcement Learning** - The learner interacts with the environment by producing actions and then analyzes the resulting errors or rewards. E.g.: a chess game.

•**Regression:** This is a type of problem where we need to predict a continuous response value (e.g. a number that can vary from -infinity to +infinity).

E.g.: house price, value of a stock

•**Classification:** This is a type of problem where we predict a categorical response value, where the data can be separated into specific "classes" (e.g. we predict one of the values in a set of values).

E.g.: mail is spam or not, diabetes or not, etc.

•Preprocessing refers to the transformations applied to data before feeding it to a machine learning model

•The quality of the data is important for training the model

•Sources – government databases, professional or company data sources (e.g. Twitter), your own company, etc.

•Data will never be in the format you need – use a Pandas DataFrame for reformatting

•Columns to remove – columns with no values and duplicates (correlated columns, e.g. house size in feet and in metres)

•Learning algorithms understand only numbers, so converting text and images to numbers is required

•Unscaled or unstandardized data might lead to unacceptable predictions

•Feature Scaling is a technique to standardize the independent features present in the data to a fixed range.

•It is performed during data preprocessing to handle features with highly varying magnitudes, values or units.

•**Disadvantage of not scaling**:

• Without Feature Scaling, many machine learning algorithms tend to give more weight to features with larger values and less weight to features with smaller values, regardless of the units of the values.
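
As a minimal sketch of the idea, the two most common scaling schemes can be written in plain Python. The function names and the sample `ages` and `salaries` lists are illustrative assumptions, not part of the course material.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / span for v in values]

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical features with very different magnitudes.
ages = [25, 35, 45, 55]
salaries = [30000, 50000, 70000, 90000]

scaled_ages = min_max_scale(ages)
scaled_salaries = min_max_scale(salaries)
standardized_salaries = standardize(salaries)
```

After scaling, both columns span the same range, so neither dominates purely because of its unit.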

Label encoding converts text values to numbers. It can be used in the following situations:

•There are only two values for a column in your data. The values then become 0/1, effectively a binary representation

•The values have a relationship with each other where comparisons are meaningful (e.g. low < medium < high)
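
The two situations above can be illustrated with a small hand-written encoder. This is only a sketch: `label_encode` and its `order` parameter are hypothetical names chosen here, and a library encoder may assign codes differently.

```python
def label_encode(values, order=None):
    """Map each distinct text value to an integer code.

    If `order` is given, codes follow that order (useful for ordinal data);
    otherwise categories are sorted alphabetically.
    """
    categories = order if order is not None else sorted(set(values))
    mapping = {category: code for code, category in enumerate(categories)}
    return [mapping[v] for v in values], mapping

# Ordinal column: the explicit order preserves low < medium < high.
sizes = ["low", "high", "medium", "low"]
encoded_sizes, size_mapping = label_encode(sizes, order=["low", "medium", "high"])

# Binary column: two values become 0/1.
answers = ["yes", "no", "yes"]
encoded_answers, answer_mapping = label_encode(answers)
```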

R-squared (R^2) tells how well the regression equation explains the data.

•A value of R^2 = 1 means the regression predictions fit the data perfectly.

Question: Can R^2 be negative?

•Ans: Yes, when the Sum of Squared Errors (SSE) > Total Sum of Squares (SST), since R^2 = 1 - SSE/SST.

•This means our model performs worse than the average line (always predicting the mean), which is a very rare case.
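
A short sketch, assuming the usual definition R^2 = 1 - SSE/SST, shows the three cases: a perfect fit, the average line, and a model worse than the average line. The data values are made up for illustration.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SSE / SST, where SST is measured around the mean of `actual`."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - sse / sst

y = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(y, y)                      # predictions equal the data
baseline = r_squared(y, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
worse = r_squared(y, [4.0, 3.0, 2.0, 1.0])     # trend predicted backwards
```

Here `perfect` is 1.0, `baseline` is 0.0 and `worse` is -3.0, because its SSE (20) is four times the SST (5).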

**1. Linearity**

•Linear regression is sensitive to outlier effects

•It needs the relationship between the independent and dependent variables to be linear

•The linearity assumption can best be tested with scatter plots

**2. Homoscedasticity**

•Meaning the variance of the residuals is the same across the regression line

•Test to check for heteroscedasticity: the Goldfeld-Quandt test

**3. Multivariate Normality**

•This assumption can best be checked with a histogram or a Q-Q-Plot

•Normality can be checked with a goodness of fit test(Kolmogorov-Smirnov)

**4. No Autocorrelation in the Data**

•Autocorrelation occurs when the residuals are not independent of each other.

•In simple terms, when the value of y(x+1) is not independent of the value of y(x)

•Checked with the Durbin-Watson test

**5. Lack of Multicollinearity**

•Multicollinearity: the model cannot differentiate between the effects of D1 and D2 because they are completely correlated.

•Fixed by checking feature correlation during data preprocessing

Issue with Linear Regression

•If we have an outlier, the fitted line can go badly wrong

•A single outlier can throw off the whole linear regression prediction

Logistic Regression

Logistic regression can be understood through the standard logistic function. The logistic function is a sigmoid function, which takes any real value and maps it to a value between zero and one.

If we plot the sigmoid function, the graph is an S-shaped curve. Because the output is squashed into this range, an outlier has much less influence than it has in linear regression.

Linear regression assumes that the data follows a linear function.

Logistic regression models the data using the sigmoid function.
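
A minimal sketch of the sigmoid function and how it turns a linear score into a class. The single-feature model, the `weight`, `bias` and `threshold` parameters are assumptions for illustration; the course code may be organized differently.

```python
import math

def sigmoid(z):
    """Standard logistic function: maps any real z into the open interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, this form avoids overflow in exp().
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_class(x, weight, bias, threshold=0.5):
    """Classify a single feature value using a linear score passed through sigmoid."""
    probability = sigmoid(weight * x + bias)
    return 1 if probability >= threshold else 0

midpoint = sigmoid(0)                    # exactly 0.5, the middle of the S curve
extreme = sigmoid(1000)                  # a huge input still stays within (0, 1]
label_high = predict_class(5, 1.0, -2.0)
label_low = predict_class(0, 1.0, -2.0)
```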

Metrics that describe the performance of a classification model:

•Accuracy: the fraction of correct predictions among all predictions made by the model

•Precision: the fraction of correct positive predictions among all positive predictions made by the model

•Recall: the fraction of correct positive predictions among all actual positive data

**Spam Filter (positive class: spam)**: Optimize for precision or specificity, because false negatives (spam goes to the inbox) are more acceptable than false positives (non-spam caught by the spam filter).

**Fraudulent transaction detector (positive class: fraud)**: Optimize for sensitivity (recall), because false positives (normal transactions that are flagged as possible fraud) are more acceptable than false negatives (fraudulent transactions that are not detected).
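
The metrics above can be computed directly from the four confusion-matrix counts. The following is a sketch with made-up labels; the function names are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def classification_metrics(actual, predicted, positive=1):
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,       # also called sensitivity
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }

# Illustrative labels: 1 = positive class, 0 = negative class.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 0, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(actual, predicted)
```

With these labels there are 2 true positives, 1 false positive, 4 true negatives and 1 false negative, so accuracy is 0.75 and specificity is 0.8.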

K-Nearest Neighbours assumes that similar things exist in close proximity.

**Algorithm**:

* Step 1: Choose the number K of neighbours

* Step 2: Take the K nearest neighbours of the new data point, measured by Euclidean distance

* Step 3: Among the K neighbours, count the number of data points in each category

* Step 4: Assign the new data point to the category with the most neighbours
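
The four steps can be sketched in a few lines of plain Python. The training points and labels are invented for illustration, and `knn_predict` is a hypothetical helper rather than the course's implementation.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from the new point to every training point.
    distances = sorted(
        ((math.dist(p, new_point), label) for p, label in zip(train_points, train_labels)),
        key=lambda pair: pair[0],
    )
    # Step 3: count the categories among the k nearest neighbours.
    nearest_labels = [label for _, label in distances[:k]]
    # Step 4: assign the most common category.
    return Counter(nearest_labels).most_common(1)[0][0]

train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]

near_a = knn_predict(train_points, train_labels, (2, 2), k=3)
near_b = knn_predict(train_points, train_labels, (6, 5), k=3)
```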

Naive Bayes is called naive (innocent) because it assumes that all the features are independent of each other, which is almost never true in practice.

•Easy to understand.

•All features are independent.

•All impact results equally.

•Need small amount of data to train the model.

•Fast – up to 100X faster.

•It is highly scalable.

•It can make probabilistic predictions.

•It's simple and can outperform more sophisticated methods on some problems.

•Stable to data changes.

**Algorithm:**

1.Choose the number of clusters K.

2.Select K points at random as the initial centroids (not necessarily from the dataset).

3.Assign each data point to the nearest centroid; this step creates the clusters.

4.Compute and place the new centroid of each cluster.

5.Reassign each data point to the new closest centroid. If any reassignment took place, repeat from step 4; otherwise finish.
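
A compact sketch of these steps, assuming the initial centroids are sampled from the dataset itself and that points are tuples of numbers. The `seed` and `max_iterations` parameters are illustrative additions.

```python
import math
import random

def k_means(points, k, max_iterations=100, seed=0):
    """Return (centroids, assignments) after running the K-Means loop."""
    rng = random.Random(seed)
    # Steps 1-2: pick K initial centroids at random (here: from the data).
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iterations):
        # Steps 3 and 5: assign every point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster: finished
        assignments = new_assignments
        # Step 4: move each centroid to the mean of its cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignments

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, assignments = k_means(points, k=2)
```

On this well-separated toy data the first three points end up in one cluster and the last three in the other. On harder data the result depends on the random initialization.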

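**WCSS** (Within-Cluster Sum of Squares):

•Take the squared Euclidean distance between a given point and the centroid to which it is assigned.

•Repeat this for all the points in the cluster

•Sum all the values; dividing by the number of points gives an average per point

Total WCSS decreases as the number of clusters increases.

Total WCSS is at its minimum (zero) when the number of clusters equals the number of data points.

The Elbow Method is used to find the optimal number of clusters: plot WCSS against the number of clusters and pick the point where the decrease levels off.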
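
A small sketch, assuming WCSS is taken as the plain sum of squared distances (without dividing by the number of points), shows how it shrinks as clusters are added. The toy points and the hand-picked cluster assignments are illustrative only.

```python
import math

def centroid(cluster):
    """Mean position of the points in a cluster."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))

def wcss(clusters):
    """Sum of squared distances from each point to the centroid of its cluster."""
    total = 0.0
    for cluster in clusters:
        center = centroid(cluster)
        total += sum(math.dist(p, center) ** 2 for p in cluster)
    return total

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]

wcss_one = wcss([points])                   # everything in a single cluster
wcss_two = wcss([points[:3], points[3:]])   # the two natural groups
wcss_n = wcss([[p] for p in points])        # one cluster per point
```

Going from one to two clusters gives a large drop and further splits add little, which is the "elbow" that the Elbow Method looks for.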

•Hierarchical methods perform a hierarchical decomposition of the dataset.

•**Agglomerative** method (Bottom-Up): treat each data point as a cluster and merge clusters to create bigger clusters

•**Divisive** method (Top-Down): start with one cluster and continue splitting

**Algorithm**:

•Start by assigning each data point to its own cluster - N clusters

•Combine the two closest points into one cluster - (N - 1) clusters

•Combine the two closest clusters into one cluster - (N - 2) clusters

•Repeat the previous step until there is only one cluster left
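
The agglomerative procedure can be sketched as follows. The notes do not say how the distance between two clusters is measured, so this sketch assumes single linkage (distance between the closest pair of points); `target_clusters` is a hypothetical parameter that stops the merging early.

```python
import math

def agglomerative(points, target_clusters=1):
    """Merge the two closest clusters repeatedly; return (clusters, merge_distances)."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    merge_distances = []
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage (assumption): closest pair across the two clusters.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        merge_distances.append(d)
    return clusters, merge_distances

points = [(0, 0), (0, 1), (5, 5), (5, 6)]
clusters, merge_distances = agglomerative(points, target_clusters=2)
```

The recorded merge distances are what a dendrogram plots on its vertical axis.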

All the above techniques are distance based. Such methods tend to find only spherical clusters and are not suited to clusters of other shapes. They are also severely impacted by noise or outliers in the data.

**DBSCAN is used when**:

•The data is of arbitrary shape

•The data contains noise

**The algorithm has two parameters:**

eps: The radius of the neighborhood around a data point p. If the distance between two points is less than or equal to eps, they are neighbours. Too small a value marks a large share of the data points as outliers, and too large a value puts the majority of data points in the same cluster.

minPts: The minimum number of data points we want in a neighborhood to define a cluster. As a rule of thumb, minPts >= D + 1, where D is the number of dimensions of the data, and it should be at least 3.
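
To make the role of the two parameters concrete, here is a simplified DBSCAN sketch in plain Python. It assumes a point counts as its own neighbour and labels noise as -1; both are conventions chosen for this example.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)

    def neighbours(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point of this cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbours)
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The two dense squares become clusters 0 and 1, and the isolated point at (50, 50) is labelled as noise.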

•Evaluating clusters is not as straightforward as evaluating supervised algorithms

•What counts as good clustering is relative

Some popular indices:

**Davies-Bouldin**

•Evaluates intra-cluster similarity and inter-cluster differences

•Not normalized, so difficult to compare between two different datasets

**Silhouette Index**

•Calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample

•The Silhouette Coefficient for a sample is (b - a) / max(a, b). To clarify, b is the mean distance between a sample and the points of the nearest cluster that the sample is not a part of.

•Normalized to the range [-1, 1]; a value close to 1 is good

•Good for spherical data structures
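
The Silhouette Coefficient formula can be checked by hand on a tiny dataset. This sketch assumes at least two clusters and returns 0.0 for a sample that is alone in its cluster, which is a convention picked for this example.

```python
import math

def silhouette_sample(index, points, labels):
    """Silhouette Coefficient (b - a) / max(a, b) for one sample."""
    own = labels[index]
    same = [math.dist(points[index], p)
            for i, p in enumerate(points) if labels[i] == own and i != index]
    if not same:
        return 0.0  # singleton cluster (assumed convention)
    a = sum(same) / len(same)  # mean intra-cluster distance
    other_means = []
    for cluster in set(labels) - {own}:
        dists = [math.dist(points[index], p)
                 for i, p in enumerate(points) if labels[i] == cluster]
        other_means.append(sum(dists) / len(dists))
    b = min(other_means)  # mean distance to the nearest other cluster
    return (b - a) / max(a, b)

def silhouette_score(points, labels):
    """Mean Silhouette Coefficient over all samples."""
    return sum(silhouette_sample(i, points, labels) for i in range(len(points))) / len(points)

points = [(0, 0), (0, 1), (5, 0), (5, 1)]
good_score = silhouette_score(points, [0, 0, 1, 1])  # natural grouping
bad_score = silhouette_score(points, [0, 1, 0, 1])   # grouping across the gap
```

The natural grouping scores close to 1, while the mismatched grouping scores below 0.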

**Apriori Algorithm:**

•Used to identify frequent item sets.

•Uses a bottom-up approach: it first identifies individual items that fulfil a minimum occurrence threshold. After this, it adds one item at a time and checks whether the resulting item set still meets the specified threshold.

•The algorithm stops when there are no more items left to add that meet the minimum occurrence threshold
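
The bottom-up search can be sketched as below. The sketch assumes the threshold is an absolute count of transactions (`min_support`) and omits the subset-pruning optimization of the full algorithm; the shopping baskets are invented.

```python
def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent item sets."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the item set.
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    # Start with the individual items that meet the threshold.
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        size = len(next(iter(current))) + 1
        # Grow item sets by one item by joining pairs of frequent sets.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(baskets, min_support=2)
```

With a threshold of 2, the three single items and the pairs {bread, milk} and {bread, butter} are frequent, while {milk, butter} appears only once and is dropped.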

•A decision tree is a tree-like data structure used to build a model of the data

•It uses if-else conditions at every node of the tree

•It can be used for both classification and regression analysis

**Algorithm : Decision Trees**

•ID3 (Entropy and Information Gain)

•Gini Index

•Chi Square

My Github: https://github.com/bansalrishi/MachineLearningWithPython_UD

For a detailed explanation of Entropy, refer to the file "Decision Tree" in the above repo.
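
As a quick illustration of the splitting criteria listed above, entropy, information gain and the Gini index can be computed for a list of class labels. This is a sketch with made-up labels, not the code from the repo.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]
useless_split = [["yes", "no"], ["yes", "no"]]

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)
```

A 50/50 node has entropy 1.0 and Gini 0.5. A split that separates the classes completely gains the full 1.0 bit, and a split that leaves both children mixed gains nothing.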

•Implementation of Deep Learning

•Inspired by biological systems

•Dendrites are the structures on the neuron that receive electrical messages, process these signals, and transfer the information to the soma of the neuron

•Axons: primary transmission lines of the nervous system

•Activation functions are attached to each neuron in the network and determine whether it should be activated or not, based on whether the neuron's input is relevant for the model's prediction.

•They also help to normalize the output of each neuron.

•E.g.: Threshold, Sigmoid, ReLU (Rectifier), Softmax

**Linear Function:**

•Using only linear functions makes the output a linear function of the input as well, so the network can't map a non-linear dataset

**Step Function:**

•we define threshold values and have discrete output values

•if(z > threshold) — “activate” the node (value 1)

•if(z < threshold) — don’t “activate” the node (value 0)

•So, we have value either 0 or 1

•The issue here is that multiple output classes/nodes may be activated (have the value 1), so we are not able to properly classify/decide.

**Sigmoid Function:**

•It is a non-linear function

•Value range is (0,1)

•Used to classify values as either 1 or 0
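
The activation functions named in these notes can be sketched in a few lines. The default threshold of 0 for the step function is an assumption made here.

```python
import math

def step(z, threshold=0.0):
    """Step (threshold) function: discrete output, 1 if z exceeds the threshold."""
    return 1 if z > threshold else 0

def sigmoid(z):
    """Non-linear squashing function with outputs in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow for large negative z
    return ez / (1.0 + ez)

def relu(z):
    """Rectifier: passes positive values through, clips negatives to 0."""
    return max(0.0, z)

def softmax(scores):
    """Turns a list of scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax([1.0, 2.0, 3.0])
```

Softmax addresses the step-function issue noted above: instead of several nodes all outputting 1, each output class gets a probability, and the largest one can be picked.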

•The cost is reduced by adjusting the weights (w)

•The error propagates from right to left and updates the weights according to how much they are responsible for the error.

•Backpropagation determines how changing the weights impacts the overall cost in the neural network.

•The **learning rate** decides by how much we update the weights

**weight = weight + (error × lr × input)**

1> Create virtual env

#conda create -n tensorflow pip python=3.5

2> activate env

#activate tensorflow

#conda info --envs

3> Install tensorflow

#conda install -c conda-forge tensorflow

this will install tensorflow 1.10.0

#python -m pip install --upgrade pip

#pip install setuptools==39.1.0

4> Install keras

#pip install keras==2.2.2

5> Install other package

#pip install matplotlib

#pip install scikit-learn

#pip install pydot

6> Install spyder separately so that you can launch it without activating your virtual env

#conda install spyder

**1. Initialize the learning rate, bias and weights**

**2.Function perceptron: perceptron(x_1, x_2, output)**

•Takes input variable and actual output

•Calculate Error = ½*(actual - predicted)^2

•Recalculate the weights

weights = weights + error * input * lr

**3. Function predict: predict(x_1, x_2)**

•Takes the input variables only

•Predict = Calculate Output

**4. Call perceptron for each row of OR gate**

**5. Run in Loop for multiple times to train the Network**

**6. Take Input values from user to predict the value**
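
The outline above can be sketched as a small runnable program. This is not the course's code: it assumes a step activation and uses the signed error (actual - predicted) in the weight update so that weights can also decrease. It replaces the interactive user input of step 6 with direct calls to `predict`.

```python
def predict(x_1, x_2, weights, bias):
    """Step 3: weighted sum of the inputs followed by a step activation."""
    total = weights[0] * x_1 + weights[1] * x_2 + bias
    return 1 if total > 0 else 0

def perceptron(x_1, x_2, output, weights, bias, lr):
    """Step 2: one training update for a single row; returns the new (weights, bias)."""
    error = output - predict(x_1, x_2, weights, bias)  # signed error (assumption)
    new_weights = [weights[0] + error * x_1 * lr,
                   weights[1] + error * x_2 * lr]
    return new_weights, bias + error * lr

def train_or_gate(lr=0.1, epochs=20):
    # Step 1: initialize the learning rate, bias and weights.
    weights, bias = [0.0, 0.0], 0.0
    or_gate = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
    # Steps 4-5: call perceptron for each row, repeated over several epochs.
    for _ in range(epochs):
        for x_1, x_2, output in or_gate:
            weights, bias = perceptron(x_1, x_2, output, weights, bias, lr)
    return weights, bias

weights, bias = train_or_gate()
# Step 6 (non-interactive): predict every input combination of the OR gate.
predictions = [predict(a, b, weights, bias) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

Because the OR gate is linearly separable, a single perceptron is enough to learn it. An XOR gate would need a hidden layer.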