# All-in-One: Machine Learning, DL, NLP, AWS Deploy [Hindi][Python]

- 17.5 hours on-demand video
- 1 article
- 59 downloadable resources
- Full lifetime access
- Access on mobile and TV
- Assignments

- Certificate of Completion

- Master creating Machine Learning models in Python
- Visualize various ML models wherever possible to develop a better understanding of them.
- How to analyse the data, clean it and prepare it (data preprocessing techniques) to feed into Machine Learning models.
- Learn the basic mathematics behind Simple Linear Regression and its best-fit line.
- What Gradient Descent is and how it works internally, with a full mathematical explanation.
- Make predictions using Simple Linear Regression and Multiple Linear Regression.
- Deploy your own model on AWS using Flask so that anyone can access it and get predictions.
- Make predictions using Logistic Regression, K-Nearest Neighbours and Naive Bayes.
- Fundamental concepts of Deep Learning and Natural Language Processing. Python code is included in some places for explanation.
- Regularisation and the idea behind it. See it in action using Lasso and Ridge Regression.

- No prerequisites for the Machine Learning concepts. Anyone can take this course.
- Prior Understanding of Python is required.

This course is designed to cover as many Machine Learning concepts as possible. Anyone can opt for this course. No prior understanding of Machine Learning is required.

As a bonus, an introduction to Natural Language Processing and Deep Learning is included.

The topics below are covered:

**Chapter - Introduction to Machine Learning**

- Machine Learning?

- Types of Machine Learning

**Chapter - Setup Environment**

- Installing Anaconda, how to use Spyder and Jupyter Notebook

- Installing Libraries

**Chapter - Creating Environment on cloud (AWS)**

- Creating EC2, connecting to EC2

- Installing libraries, transferring files to EC2 instance, executing python scripts

**Chapter - Data Preprocessing**

- Null Values

- Correlated Feature check

- Data Molding

- Imputing

- Scaling

- Label Encoder

- One-Hot Encoder

**Chapter - Supervised Learning: Regression**

- Simple Linear Regression

- Minimizing Cost Function - Ordinary Least Squares (OLS), Gradient Descent

- Assumptions of Linear Regression, Dummy Variable

- Multiple Linear Regression

- Regression Model Performance - R-Square

- Polynomial Linear Regression

**Chapter - Supervised Learning: Classification**

- Logistic Regression

- K-Nearest Neighbours

- Naive Bayes

- Saving and Loading ML Models

- Classification Model Performance - Confusion Matrix

**Chapter: Unsupervised Learning: Clustering**

- Partitioning Algorithm: K-Means Algorithm, Random Initialization Trap, Elbow Method

- Hierarchical Clustering: Agglomerative, Dendrogram

- Density Based Clustering: DBSCAN

- Measuring Unsupervised Clusters Performance - Silhouette Index

**Chapter: Unsupervised Learning: Association Rule**

- Apriori Algorithm

- Association Rule Mining

**Chapter: Deploy Machine Learning Model using Flask**

- Understanding the flow

- Serverside and Clientside coding, Setup Flask on AWS, sending request and getting response back from flask server

**Chapter: Non-Linear Supervised Algorithm: Decision Tree and Support Vector Machines**

- Decision Tree Regression

- Decision Tree Classification

- Support Vector Machines(SVM) - Classification

- Kernel SVM, Soft Margin, Kernel Trick

**Chapter - Natural Language Processing**

The text preprocessing techniques below are covered with Python code:

- Tokenization, Stop Words Removal, N-Grams, Stemming, Word Sense Disambiguation

- Count Vectorizer, Tfidf Vectorizer, Hashing Vectorizer

- Case Study - Spam Filter

**Chapter - Deep Learning**

- Artificial Neural Networks, Hidden Layer, Activation function

- Forward and Backward Propagation

- Implementing logic gates in Python using a perceptron

**Chapter: Regularization, Lasso Regression, Ridge Regression**

- Overfitting, Underfitting

- Bias, Variance

- Regularization

- L1 & L2 Loss Function

- Lasso and Ridge Regression

**Chapter: Dimensionality Reduction**

- Feature Selection - Forward and Backward

- Feature Extraction - PCA, LDA

**Chapter: Ensemble Methods: Bagging and Boosting**

- Bagging - Random Forest (Regression and Classification)

- Boosting - Gradient Boosting (Regression and Classification)

- Anyone who does not know where to start with Machine Learning, Deep Learning and Natural Language Processing can opt for this course.
- This will provide a good foundation for understanding the concepts of Machine Learning.

The full course material can be downloaded from GitHub: https://github.com/bansalrishi/MachineLearningWithPython_UD

•**Supervised** - labeled data is used to help machines recognize characteristics and apply them to future data. E.g.: classifying pictures of cats and dogs.

•**Unsupervised** - we simply provide unlabeled data and let the machine work out the characteristics and group the data. E.g.: clustering news articles.

•**Reinforcement Learning** - the model interacts with the environment by producing actions and then analyzes the resulting errors or rewards. E.g.: a chess game.

•**Regression:** This is a type of problem where we need to predict a continuous response value (e.g. a number that can vary from -infinity to +infinity).

E.g.: house price, value of a stock

•**Classification:** This is a type of problem where we predict a categorical response value, where the data can be separated into specific “classes” (e.g. we predict one value from a fixed set of values).

E.g.: mail is spam or not, diabetes or not, etc.

•Preprocessing refers to the transformations applied to data before feeding it to a machine learning model

•The quality of the data is important for training the model

•Sources – government databases, professional or company data sources (e.g. Twitter), your own company, etc.

•Data will never be in the format you need – use a Pandas DataFrame for reformatting

•Columns to remove – columns with no values, and duplicate or correlated columns (e.g. house size in both feet and metres)

•Learning algorithms understand only numbers, so text and images must be converted to numbers

•Unscaled or unstandardized data might lead to unacceptable predictions

•Feature Scaling is a technique to standardize the independent features present in the data in a fixed range.

•It is performed during the data pre-processing to handle highly varying magnitudes or values or units.

•**Disadvantage**:

• Without feature scaling, a machine learning algorithm tends to give greater values more weight and treat smaller values as less important, regardless of the unit of the values.
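As a rough illustration of the idea above, here is a minimal sketch of two common scaling approaches, min-max scaling and standardization, using only the Python standard library. The function names and the sample columns are hypothetical and chosen for illustration; they are not taken from the course material.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical feature columns on very different scales.
house_size_sqft = [500, 1000, 1500, 2000]
bedrooms = [1, 2, 3, 4]

scaled_size = min_max_scale(house_size_sqft)
scaled_bedrooms = min_max_scale(bedrooms)
standardized_size = standardize(house_size_sqft)

print(scaled_size)
print(scaled_bedrooms)
print(standardized_size)
```

After scaling, both columns cover the same range, so neither dominates a distance or gradient computation just because of its unit.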

Label encoding converts text values to numbers. It can be used in the following situations:

•There are only two values for a column in your data. The values will then become 0/1 - effectively a binary representation

•The values have an ordered relationship with each other, where comparisons are meaningful (e.g. low<medium<high)
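The two situations above can be sketched in plain Python as follows. This is only an illustrative sketch: the helper names `encode_binary` and `encode_ordinal` and the sample columns are hypothetical, and the explicit `order` argument is an assumption made so that the integer codes follow the meaningful order rather than alphabetical order.

```python
def encode_binary(values):
    """Map a two-valued column to 0/1, ordering the labels alphabetically."""
    labels = sorted(set(values))
    if len(labels) != 2:
        raise ValueError("expected exactly two distinct values")
    mapping = {labels[0]: 0, labels[1]: 1}
    return [mapping[v] for v in values], mapping


def encode_ordinal(values, order):
    """Map ordered categories to integers that preserve the given order."""
    mapping = {label: rank for rank, label in enumerate(order)}
    return [mapping[v] for v in values]


# Case 1: a column with only two values becomes 0/1.
smoker = ["yes", "no", "no", "yes"]
smoker_codes, smoker_mapping = encode_binary(smoker)

# Case 2: ordered categories keep their order (low < medium < high).
risk = ["medium", "low", "high", "low"]
risk_codes = encode_ordinal(risk, order=["low", "medium", "high"])

print(smoker_codes, smoker_mapping)
print(risk_codes)
```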

R-squared (R^2) tells how well the regression equation explains the data.

•A value of R^2 = 1 means the regression predictions perfectly fit/explain the data.

Question: Can R^2 be negative?

•Ans: Yes, when the Sum of Squared Errors (SSE) is greater than the Total Sum of Squares (SST), because R^2 = 1 - SSE/SST.

•This means our model performs worse than the average (mean) line, which is a very rare case.
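To make the negative case concrete, here is a small sketch that computes the score as 1 - SSE/SST for three sets of predictions. The function name `r_squared` and the toy numbers are illustrative assumptions, not part of the course material.

```python
def r_squared(y_true, y_pred):
    """Return 1 - SSE / SST for the given targets and predictions."""
    y_mean = sum(y_true) / len(y_true)
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # sum of squared errors
    sst = sum((y - y_mean) ** 2 for y in y_true)             # total sum of squares
    return 1 - sse / sst


y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(y_true, [1.0, 2.0, 3.0, 4.0])          # predictions match exactly
mean_line = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])        # always predict the mean
worse_than_mean = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])  # SSE > SST

print(perfect, mean_line, worse_than_mean)
```

Predicting the mean everywhere gives a score of 0, and predictions that are worse than the mean line push the score below 0.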

**1. Linearity**

•Linear regression is sensitive to outlier effects

•It needs the relationship between the independent and dependent variables to be linear

•The linearity assumption can best be tested with scatter plots

**2. Homoscedasticity**

•Meaning the variance of the residuals is constant across the regression line

•Can be checked with a heteroscedasticity test such as the Goldfeld-Quandt test

**3. Multivariate Normality**

•This assumption can best be checked with a histogram or a Q-Q-Plot

•Normality can be checked with a goodness of fit test(Kolmogorov-Smirnov)

**4. No Autocorrelation in the Data**

•Autocorrelation occurs when the residuals are not independent of each other.

•In simple terms, when the value of y(x+1) is not independent of the value of y(x)

•Can be checked with the Durbin-Watson test

**5. Lack of Multicollinearity**

•Multicollinearity: the model cannot differentiate between the effects of two variables D1 and D2 because they are totally related.

•Can be fixed by checking feature correlation during data preprocessing

Issue with Linear Regression

•If the data contains an outlier, the fitted line can go badly wrong

•A single outlier can throw off the whole linear regression prediction

Logistic Regression

Logistic regression can be understood through the standard logistic function. The logistic function is a sigmoid function, which takes any real value as input and returns a value between zero and one.

If we plot the sigmoid function, the graph is an S-shaped curve. Because the output flattens out at both ends, an outlier has much less effect on the prediction.

Linear regression assumes that the data follows a linear function.

Logistic regression models the data using the sigmoid function
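The sketch below shows the standard logistic (sigmoid) function and how it squashes a linear score into a value between 0 and 1. The helper `predict_probability` and the weight and bias values are hypothetical, picked only to illustrate the shape of the curve.

```python
import math


def sigmoid(z):
    """Standard logistic function: maps a real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, weight, bias):
    """Probability of the positive class for a single feature x."""
    return sigmoid(weight * x + bias)


# Hypothetical model parameters, for illustration only.
weight, bias = 1.0, -5.0

for x in [0.0, 4.0, 5.0, 6.0, 10.0, 100.0]:
    print(x, round(predict_probability(x, weight, bias), 4))
```

Even the extreme input 100.0 produces an output no larger than 1, which is why a far-away point does not drag the prediction in the way it would with a straight line.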

A confusion matrix describes the performance of a classification model

•Accuracy: the fraction of correct predictions among all predictions made by the model

•Precision: the fraction of correct positive predictions among all positive predictions made by the model

•Recall: the fraction of actual positive cases that the model correctly predicts as positive

**Spam Filter (positive class: spam)**: Optimize for precision or specificity, because false negatives (spam goes to the inbox) are more acceptable than false positives (non-spam caught by the spam filter).

**Fraudulent transaction detector (positive class: fraud)**: Optimize for sensitivity (recall), because false positives (normal transactions that are flagged as possible fraud) are more acceptable than false negatives (fraudulent transactions that are not detected).
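As a small worked example of the metrics above, the sketch below counts the four confusion-matrix cells for a binary problem and derives accuracy, precision and recall from them. The function names and the toy labels are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # correctly predicted positive
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # correctly predicted negative
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    return tp / (tp + fp)


def recall(tp, fn):
    return tp / (tp + fn)


# Toy labels: 1 = positive class (e.g. spam), 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)
print(accuracy(tp, fp, fn, tn), precision(tp, fp), recall(tp, fn))
```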

K-Nearest Neighbours assumes that similar things exist in close proximity.

**Algorithm**:

* Step 1: Choose the number K of neighbours

* Step 2: Take the K nearest neighbours of the new data point by Euclidean distance

* Step 3: Among the K neighbours, count the number of data points in each category

* Step 4: Assign the new data point to the category where you counted the most neighbours
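The four steps above can be sketched in a few lines of standard-library Python. The function name `knn_classify` and the sample points are hypothetical; a real project would more likely use a library implementation.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, new_point, k):
    """Classify new_point by majority vote among its k nearest neighbours."""
    # Step 2: sort the training points by Euclidean distance to the new point.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], new_point),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # Step 3: count how many of the k neighbours fall in each category.
    votes = Counter(nearest_labels)
    # Step 4: assign the category with the most neighbours.
    return votes.most_common(1)[0][0]


train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["A", "A", "A", "B", "B", "B"]

# Step 1: choose K.
k = 3
print(knn_classify(train_points, train_labels, (2.0, 2.0), k))
print(knn_classify(train_points, train_labels, (6.0, 6.5), k))
```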

Naive Bayes is called naive (innocent) because it assumes that all the features are independent of each other, which is almost never the case.

•Easy to understand.

•All features are independent.

•All impact results equally.

•Need small amount of data to train the model.

•Fast – up to 100X faster.

•It is highly scalable.

•It can make probabilistic predictions.

•It's simple & out-performs many sophisticated methods.

•Stable to data changes.

**Algorithm:**

1. Choose K, the number of centroids (clusters).

2. Select K points at random as the initial centroids (not necessarily from the dataset).

3. Assign each data point to the nearest centroid; this step creates the clusters.

4. Compute and place the new centroid of each cluster.

5. Reassign each data point to the new closest centroid. If any reassignment took place, repeat from step 4; otherwise finish.
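A minimal pure-Python sketch of the loop described above is shown here. It makes two simplifying assumptions that are not from the course: the initial centroids are sampled from the dataset itself using a fixed seed, and a cluster that ends up empty simply keeps its previous centroid. The function name `k_means` and the sample points are hypothetical.

```python
import math
import random


def k_means(points, k, seed=0, max_iter=100):
    """Return (centroids, assignments) after running the K-Means loop."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids (from the data)
    assignments = None
    for _ in range(max_iter):
        # Assign each point to the index of its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so we are finished
        assignments = new_assignments
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignments


points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, assignments = k_means(points, k=2)
print(centroids)
print(assignments)
```

Because the starting centroids are random, different seeds can give different results on harder datasets, which is the random initialization trap listed in the course outline.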

**WCSS** (Within-Cluster Sum of Squares):

•Take the squared Euclidean distance between a given point and the centroid to which it is assigned.

•Repeat this for all the points in the cluster

•Sum all the values to get the WCSS of the cluster; the total WCSS adds this up over all clusters

Total WCSS decreases as the number of clusters increases

Total WCSS is at its minimum (zero) when the number of clusters equals the number of data points

Elbow Method to find the optimal number of clusters
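The sketch below illustrates how the total WCSS shrinks as the number of clusters grows, which is the curve the Elbow Method inspects. It assumes the standard definition of WCSS as a plain sum of squared Euclidean distances to the cluster centroid, with no division by the number of points, and it uses hand-picked groupings rather than running a clustering algorithm. The function names and the data are hypothetical.

```python
import math


def centroid(cluster):
    """Mean point of a cluster given as a list of coordinate tuples."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


def total_wcss(clusters):
    """Sum of squared distances from every point to its own cluster centroid."""
    return sum(
        math.dist(p, centroid(cluster)) ** 2
        for cluster in clusters
        for p in cluster
    )


points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

wcss_k1 = total_wcss([points])                   # everything in one cluster
wcss_k2 = total_wcss([points[:3], points[3:]])   # the two natural groups
wcss_k6 = total_wcss([[p] for p in points])      # one cluster per point

print(wcss_k1, wcss_k2, wcss_k6)
```

The large drop from one cluster to two, followed by only a small further drop, is the "elbow" that suggests two clusters for this toy data.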

•Hierarchical clustering methods perform a hierarchical decomposition of the dataset.

•**Agglomerative** method (Bottom-Up): treat each data point as its own cluster & merge to create bigger clusters

•**Divisive** method (Top-Down): start with one cluster & continue splitting

**Algorithm**:

1. Start by assigning each data point to its own cluster - N clusters

2. Combine the two closest points into one cluster - (N - 1) clusters

3. Combine the two closest clusters into one cluster - (N - 2) clusters

4. Repeat step 3 until there is only one cluster left
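The bottom-up merging described above can be sketched as follows. The notes do not say how the distance between two clusters is measured, so this sketch assumes single linkage (the distance between the two closest members); other linkage choices would give a different merge order. The function names and the `target_clusters` parameter are hypothetical.

```python
import math
from itertools import combinations


def cluster_distance(a, b):
    """Single linkage (an assumption): distance between the closest members."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points, target_clusters=1):
    """Merge the two closest clusters until target_clusters remain."""
    clusters = [[p] for p in points]  # start with N single-point clusters
    merges = []                       # record of merges, useful for a dendrogram
    while len(clusters) > target_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges


points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

two_clusters, _ = agglomerative(points, target_clusters=2)
one_cluster, merges = agglomerative(points, target_clusters=1)
print(two_clusters)
print(len(merges))
```

Stopping at `target_clusters=2` corresponds to cutting the dendrogram at the level where two clusters remain.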

All the above techniques are distance based; such methods can find only spherical clusters and are not suited to clusters of other shapes. Also, they are severely impacted by noise or outliers in the data.

**Density-based clustering (DBSCAN) is used**:

•If the data is of arbitrary shape

•If the data contains noise

**The algorithm has two parameters:**

eps: The radius of the neighborhood around a data point p. If the distance between two points is lower than or equal to eps, they are neighbours. A small value will cause a large number of data points to be treated as outliers, and a large value will put the majority of data points in the same cluster.

minPts: The minimum number of data points we want in a neighborhood to define a cluster. As a rule of thumb, minPts >= D + 1, where D is the number of dimensions of the data, and it should be at least 3.
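To show how the two parameters interact, here is a compact pure-Python sketch of the DBSCAN idea. It assumes that a point counts as a member of its own neighborhood and labels noise as -1; the function names and the sample data are hypothetical, and this is an illustration rather than a production implementation.

```python
import math

NOISE = -1


def region_query(points, index, eps):
    """Indices of all points within eps of points[index] (including itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[index], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already processed
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # not dense enough to start a cluster
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point, keep expanding
    return labels


points = [
    (1.0, 1.0), (1.5, 2.0), (2.0, 1.5),   # dense group
    (8.0, 8.0), (8.5, 9.0), (9.0, 8.5),   # second dense group
    (5.0, 5.0),                           # isolated point
]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

With a much smaller `eps` every point would be labelled as noise, and with a very large `eps` all points would fall into a single cluster, matching the description of eps above.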

•Measuring the performance of unsupervised clustering is not as straightforward as for supervised algorithms

•What counts as a good clustering is relative

Some popular indices:

**Davies-Bouldin**

•Evaluates intra-cluster similarity and inter-cluster differences

•Not Normalized, so difficult to compare between two different datasets

**Silhouette Index**

•Calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample

•The Silhouette Coefficient for a sample is (b - a) / max(a, b). To clarify, b is the mean distance between a sample and the points of the nearest cluster that the sample is not a part of.

•Normalized to the range -1 to 1; values close to 1 indicate well-separated clusters

•Works best for spherical clusters
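The formula (b - a) / max(a, b) can be checked by hand on a tiny example. The sketch below computes the coefficient for a single sample; it assumes that a sample alone in its cluster gets a score of 0, and the function names and data are hypothetical.

```python
import math


def mean_distance(point, cluster):
    """Average Euclidean distance from point to every member of cluster."""
    return sum(math.dist(point, q) for q in cluster) / len(cluster)


def silhouette_coefficient(clusters, cluster_index, point_index):
    """Silhouette coefficient (b - a) / max(a, b) for one sample."""
    own = clusters[cluster_index]
    point = own[point_index]
    rest = own[:point_index] + own[point_index + 1:]
    if not rest:
        return 0.0  # assumption: a lone sample in its cluster scores 0
    a = mean_distance(point, rest)  # mean intra-cluster distance
    b = min(                        # mean distance to the nearest other cluster
        mean_distance(point, other)
        for idx, other in enumerate(clusters)
        if idx != cluster_index
    )
    return (b - a) / max(a, b)


good = [
    [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)],
    [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)],
]
bad = [
    [(1.0, 1.0), (8.0, 8.0)],
    [(1.5, 2.0), (9.0, 8.5)],
]

good_score = silhouette_coefficient(good, 0, 0)
bad_score = silhouette_coefficient(bad, 0, 0)
print(good_score, bad_score)
```

The same point (1.0, 1.0) scores close to 1 when it is grouped with its nearby points and below 0 when it is grouped with a distant one.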

**Apriori Algorithm:**

•Used to identify frequent item sets.

•Uses a bottom-up approach: it first identifies the individual items that fulfill a minimum occurrence threshold. After this, it adds one item at a time and checks whether the resulting item set still meets the specified threshold.

•The algorithm stops when there are no more items left to add that meet the minimum occurrence threshold
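The bottom-up search described above can be sketched in plain Python. This version measures the threshold as an absolute transaction count (`min_support`), which is an assumption, since the notes do not say whether the threshold is a count or a fraction. The function name and the shopping-basket data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Start with the individual items that meet the threshold.
    items = {item for t in transactions for item in t}
    current = count({frozenset([item]) for item in items})
    frequent = dict(current)
    size = 2
    while current:
        # Grow item sets by one item by joining frequent sets of the previous size.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Keep a candidate only if all of its smaller subsets were frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        }
        current = count(candidates)
        frequent.update(current)
        size += 1
    return frequent


transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(transactions, min_support=2)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```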

•A decision tree is a tree-like data structure used to build a model of the data

•It uses an if-else condition at every node of the tree

•It can be used for both classification and regression analysis

**Algorithm : Decision Trees**

•ID3 (Entropy and Information Gain)

•Gini Index

•Chi Square
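Two of the split criteria listed above, entropy with information gain (as used by ID3) and the Gini index, can be computed in a few lines. This is an illustrative sketch with hypothetical function names and toy labels; it is not taken from the repository referenced in this document.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes"] * 5 + ["no"] * 5

# A split that separates the classes perfectly.
perfect_split = [["yes"] * 5, ["no"] * 5]
# A split that leaves both children as mixed as the parent.
useless_split = [["yes", "no"] * 2, ["yes", "no"] * 3]

print(entropy(parent), gini(parent))
print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

A tree-building algorithm would evaluate candidate splits this way at each node and pick the one with the highest gain (or lowest impurity).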

My Github: https://github.com/bansalrishi/MachineLearningWithPython_UD

For a detailed explanation of entropy, refer to the file "Decision Tree" in the above repo.