# Pro data science in Python


- 11.5 hours on-demand video
- 86 downloadable resources
- Full lifetime access
- Access on mobile and TV

- Certificate of Completion


- Use complex scikit-learn tools for machine learning
- Do statistical analysis using Statsmodels
- Read, transform and manipulate data using Pandas
- Use Keras for neural networks
- Solve both supervised and unsupervised machine learning problems
- Do time series analysis and forecasting using Statsmodels
- Classify images using Deep Convolutional Networks

Building aggregate analyses is a fundamental technique for validating our data and results before jumping into the actual machine learning algorithms. For example, we can compare our own aggregates against aggregated reports that we might receive, thus validating that our data is in good shape.
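As a minimal sketch of this idea, we can aggregate row-level data with Pandas and compare the totals against a report. The `sales` table and the `report` figures below are invented for illustration and are not taken from the course material.

```python
import pandas as pd

# Hypothetical row-level data (invented for this example).
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "amount": [100.0, 150.0, 80.0, 120.0, 50.0],
})

# Hypothetical aggregated report received from another source.
report = pd.Series({"North": 250.0, "South": 250.0})

# Aggregate the raw data to the same level as the report.
totals = sales.groupby("region")["amount"].sum()

# Any region whose total differs from the report hints at
# missing, duplicated, or badly parsed rows.
diff = (totals - report).abs()
mismatches = diff[diff > 1e-9]

print(totals)
print("Regions that do not match the report:", list(mismatches.index))
```

If `mismatches` is empty, the raw data agrees with the report at that level of aggregation.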

Fundamental ideas behind linear regression

We use Bernoulli and Multinomial Naive Bayes classifiers to predict spam in a real SMS dataset from Kaggle. We finally achieve 96% accuracy (in sample), versus the 86% that we would have obtained by always predicting non-spam (the proportion of non-spam messages among all SMS). This probably gives spammers a good reason to hate machine learning!
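The following sketch shows how both classifiers can be fitted with scikit-learn. It uses six made-up messages instead of the Kaggle dataset, so the numbers it produces say nothing about the accuracy reported above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Toy messages invented for illustration (1 = spam, 0 = non-spam).
messages = [
    "win a free prize now",
    "free cash claim your prize",
    "urgent claim free ringtone now",
    "are we still meeting for lunch",
    "see you at home tonight",
    "can you call me after lunch",
]
labels = [1, 1, 1, 0, 0, 0]

# Turn each message into a vector of word counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

# MultinomialNB models word counts; BernoulliNB models word presence/absence
# (it binarizes the counts with its default binarize=0.0).
multinomial = MultinomialNB().fit(X, labels)
bernoulli = BernoulliNB().fit(X, labels)

new = vectorizer.transform(["claim your free prize", "lunch at home"])
print("Multinomial:", multinomial.predict(new))
print("Bernoulli:  ", bernoulli.predict(new))

# Baseline: accuracy of always predicting the most frequent class.
n_spam = sum(labels)
baseline = max(n_spam, len(labels) - n_spam) / len(labels)
print("Majority-class baseline:", baseline)
```

Comparing a model against the majority-class baseline, as done here, is what makes an accuracy figure meaningful on an unbalanced dataset.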

We introduce SVM within a very simple (linear) context. Even though it is an extremely powerful algorithm, it tends to generate too many support vectors, possibly over-fitting the data. Is there a solution to that? And even though SVM is famous as a classification tool, we will see how it can also be used as a very powerful regression tool.
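A minimal sketch of both uses in scikit-learn follows. The data is synthetic and the values of `C` and `epsilon` are arbitrary choices for this example, not the settings used in the course.

```python
import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)

# Classification: two well-separated synthetic clusters.
X = np.vstack([
    rng.normal(-2.0, 0.5, size=(20, 2)),
    rng.normal(2.0, 0.5, size=(20, 2)),
])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
# n_support_ holds the number of support vectors per class; a count
# close to the sample size can be a warning sign of over-fitting.
print("Support vectors per class:", clf.n_support_)
print("Training accuracy:", clf.score(X, y))

# Regression: the same idea applied to a noise-free straight line.
x = np.linspace(0.0, 10.0, 50).reshape(-1, 1)
target = 2.0 * x.ravel() + 1.0
reg = SVR(kernel="linear", C=100.0, epsilon=0.1).fit(x, target)
print("Prediction at x=4:", reg.predict([[4.0]])[0])
```

In `SVC`, the parameter `C` controls how strongly margin violations are penalized, so it is one natural knob to experiment with when the number of support vectors looks too high.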

We show how to run a linear regression model via ordinary least squares (OLS), LASSO, and Ridge. We see how LASSO can reduce the dimensionality of a feature set, and how Ridge can produce estimates from a correlated feature set. Both methods give us biased models, but ones that can generate more stable predictions. In the example analyzed here, all models end up with a very similar "score", so we can't conclude that any one of them is "better" than the others in terms of prediction. But we show that LASSO can generate a model that competes really well with Ridge and OLS, even with high correlation, while correctly reducing the dimensionality of the problem. We also show how to use `LassoCV` and `RidgeCV`, which automatically compute the regularization parameter that these methods need, even though in this case they bring no specific improvement.
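The comparison can be sketched as follows with scikit-learn. The data is synthetic (two highly correlated features plus two irrelevant ones), and the grid of `alphas` for Ridge is an arbitrary assumption made for this example.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV

rng = np.random.default_rng(42)
n = 200

x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # almost a copy of x1
x3 = rng.normal(size=n)                   # irrelevant feature
x4 = rng.normal(size=n)                   # irrelevant feature
X = np.column_stack([x1, x2, x3, x4])
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)

ols = LinearRegression().fit(X, y)
# LassoCV and RidgeCV pick the regularization strength by cross-validation.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)

for name, model in [("OLS", ols), ("LassoCV", lasso), ("RidgeCV", ridge)]:
    # score() returns the in-sample R^2 here.
    print(name, np.round(model.coef_, 3), round(model.score(X, y), 3))

print("alpha chosen by LassoCV:", lasso.alpha_)
print("alpha chosen by RidgeCV:", ridge.alpha_)
```

Looking at the printed coefficients next to the scores is a quick way to see whether LASSO dropped features without losing predictive power.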

The best performing methods nowadays rely on building smaller models and then averaging (or choosing one) between them. Many of the winning algorithms in Kaggle competitions do exactly this. We describe the two big families of ensemble methods: (A) averaging ensemble methods and (B) boosting ensemble methods.
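As a hedged illustration, the sketch below picks one representative of each family from scikit-learn: a random forest for averaging and gradient boosting for boosting. Both choices, and the synthetic dataset, are assumptions made for this example rather than the exact models covered in the lecture.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic classification problem (not course data).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)

models = {
    # (A) Averaging: many independent trees whose votes are averaged.
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # (B) Boosting: trees built sequentially, each correcting the previous ones.
    "Gradient boosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
}

mean_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[name] = scores.mean()
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```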

In ML, we typically deal with hundreds (if not thousands) of features, and for many reasons (plotting, modelling, or identifying rare observations) we will need to reduce that set. We show how to use scikit-learn to compute PCA, and then project that same data into a low-dimensional space. After that, we plot the projected data, understand which features move in similar directions and which features have high loadings on the principal components, and even identify unusual observations.
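A minimal sketch of these steps, leaving out the plotting, could look like this. The four synthetic features are invented: three are driven by one hidden factor and the fourth is pure noise.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 100
latent = rng.normal(size=n)

# Three features follow the same hidden factor; the last one is noise.
X = np.column_stack([
    latent + rng.normal(scale=0.1, size=n),
    2.0 * latent + rng.normal(scale=0.1, size=n),
    -latent + rng.normal(scale=0.1, size=n),
    rng.normal(size=n),
])

# Standardize first so that no feature dominates because of its scale.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
scores = pca.fit_transform(X_scaled)  # data projected onto 2 components

print("Explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))
# Each row of components_ holds the weight of every feature on one component;
# same-sign weights indicate features that move in the same direction.
print("Component weights:\n", np.round(pca.components_, 3))

# Observations far from the origin in the projected space deserve a closer look.
distance = np.sqrt((scores ** 2).sum(axis=1))
print("Most unusual observations:", np.argsort(distance)[-3:])
```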

We use a dataset containing multiple human development indexes to cluster countries into 3 groups. We show that both K-means and PCA+K-means (with one principal component extracted) achieve practically the same results. We finally report the results per cluster and present some insights.
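The comparison between the two approaches can be sketched as follows. Synthetic blobs stand in for the country indicators, and the adjusted Rand index is used here as one possible way of measuring how similar the two clusterings are; neither is necessarily what the lecture uses.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the indicators: three groups whose four
# features are jointly low, medium, or high.
centers = [[-3.0] * 4, [0.0] * 4, [3.0] * 4]
X, _ = make_blobs(n_samples=90, centers=centers, cluster_std=0.6, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# K-means on all features.
labels_full = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

# K-means on the first principal component only.
pc1 = PCA(n_components=1).fit_transform(X_scaled)
labels_pca = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(pc1)

# Cluster numbering is arbitrary, so compare the partitions, not the raw labels.
agreement = adjusted_rand_score(labels_full, labels_pca)
print("Agreement between both clusterings:", round(agreement, 3))
print("Cluster sizes (all features):", np.bincount(labels_full))
```

An agreement close to 1 means that a single principal component already carries the information K-means needs to separate the groups.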

We have multiple recordings per word: "Banana", "Chair", "IceCream", "Hello", "Goodbye". We want to extract some metrics from each file, so we can do machine learning later. The difficult part is that the metrics we need are related to the signal encoded in each audio file. Luckily, we can leverage an existing R package that reads .wav files and outputs many properties of the frequencies present in each file. At the end, we produce 2 CSV files (one for training and one for testing) containing 21 features that we can use later for machine learning. The approach presented here can be extended to situations requiring the classification of any sound.

We load the features that we extracted before, both for our training and testing datasets. We evaluate the performance of both AdaBoost and SVM. Both methods achieve an in-sample accuracy of practically 100%, a cross-validation accuracy of 80%, and an out-of-sample accuracy of 80%.
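The three accuracy figures mentioned above can be computed along these lines. Since the extracted CSV files are not available here, the sketch generates a synthetic 5-class dataset with 21 features as a stand-in, so its output will not reproduce the course results; scaling the features before the SVM is also an assumption of this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 5 classes (one per word) and 21 features.
X, y = make_classification(
    n_samples=250, n_features=21, n_informative=8,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

results = {}
for name, model in models.items():
    # Cross-validation uses the training data only.
    cv_accuracy = cross_val_score(model, X_train, y_train, cv=5).mean()
    model.fit(X_train, y_train)
    results[name] = {
        "in_sample": model.score(X_train, y_train),
        "cross_validation": cv_accuracy,
        "out_of_sample": model.score(X_test, y_test),
    }
    print(name, {k: round(v, 3) for k, v in results[name].items()})
```

A large gap between the in-sample figure and the other two, as in the results reported above, is the usual signature of over-fitting.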

- Some experience with data science, Python and statistics
- Being able to code functions, and understand a Python program
- Understand the basics behind regression, random variables, and classification

This course explores several data science and machine learning techniques that every data science practitioner should be familiar with. Fundamentally, the course revolves around four axes:

- Pandas and Matplotlib for working with data
- Keras for deep learning
- Scikit-learn for machine learning
- Statsmodels for statistics

This course explores the fundamental concepts in these big four topics, and provides the student with an overview of the problems that can be solved nowadays.

I only focus on the computational and practical implications of these techniques, and it is assumed that the student is partially familiar with statistics, machine learning and data science, or is willing to complement the techniques presented here with theoretical material. Python programming experience is absolutely necessary: the only Python topic we explain is how to define classes, as we will use them throughout the course.

The teaching strategy is to briefly explain the theory behind these techniques, show how they work on very simple problems, and finally present the student with some real examples. I believe that these real examples add enormous value for the student, as they help explain why these techniques are so widely used nowadays (because they solve real problems!).

Some of the examples that we will tackle here are: forecasting the GDP of the United States, forecasting the prices of new houses in London, identifying squares and triangles in pictures, predicting the value of vehicles using online data, detecting spam in SMS data, and many more!

In a nutshell, this course explains how to:

- Define classes for storing data in a better way
- Plot data
- Merge, pivot, subset, and group data via Pandas
- Use linear regression via Statsmodels
- Work with time series and forecasting in Statsmodels
- Apply several unsupervised machine learning techniques, such as clustering
- Apply several supervised techniques, such as random forests, classification trees, Naive Bayes classifiers, etc.
- Define deep learning architectures using Keras
- Design different neural networks, such as recurrent neural networks, multi-layer perceptrons, etc.
- Classify audio/sounds using machine learning, in a similar way to what Alexa, Siri and Cortana do
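To give a flavour of the first few items, here is a small sketch that stores a table in a class and then merges, pivots, subsets, and groups it with Pandas. The `Dataset` class and all the figures are hypothetical, written for this example only.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Dataset:
    """Hypothetical container keeping a table together with its name."""
    name: str
    frame: pd.DataFrame

    def subset(self, column, value):
        """Return a new Dataset holding only the rows where column == value."""
        rows = self.frame[self.frame[column] == value]
        return Dataset(f"{self.name}[{column}={value}]", rows)


# Invented numbers, for illustration only.
prices = Dataset("prices", pd.DataFrame({
    "city": ["London", "London", "Leeds", "Leeds"],
    "year": [2015, 2016, 2015, 2016],
    "price": [500, 520, 200, 210],
}))
population = pd.DataFrame({
    "city": ["London", "Leeds"],
    "population_m": [8.7, 0.8],
})

# Merge: attach the population of each city to every price row.
merged = prices.frame.merge(population, on="city", how="left")
# Pivot: one row per city, one column per year.
wide = merged.pivot(index="city", columns="year", values="price")
# Group: average price per city.
mean_price = merged.groupby("city")["price"].mean()
# Subset: keep only the London rows.
london = prices.subset("city", "London")

print(wide)
print(mean_price)
print(london.name, len(london.frame))
```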

The student needs to be familiar with statistics, Python and some machine learning concepts.

- Data science beginners, and intermediate users
- Statisticians, and CS students wanting to strengthen their data science skills