Real data science problems with Python

Name: Real data science problems with Python
Rating: 3.9 (56 reviews)

Practice machine learning and data science with real problems

Created by Francisco Juretig

Last updated 1/2018

English

What you'll learn

Work with many ML techniques in real problems such as classification, image processing, regression
Build neural networks for classification and regression
Apply machine learning and data science to audio processing, image detection, real-time video, sentiment analysis, and more

Course content

16 sections • 31 lectures • 7h 43m total length

Introduction • 12:21

Reading WAV files and extracting features • 16:44
We have multiple recordings per word: "Banana", "Chair", "IceCream", "Hello", "Goodbye". We want to extract some metrics from each file so that we can do machine learning later. The difficult part is that the metrics we need relate to the audio signal encoded in each file. Luckily, we can leverage an existing R package that reads .wav files and outputs many properties of the frequencies present in each file. At the end, we produce 2 CSV files (one for training and one for testing) containing 21 features that we can use later for machine learning. The approach presented here can be extended to any situation that requires classifying sounds.
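To illustrate the idea of turning a .wav file into numeric features, here is a minimal Python sketch that uses only the standard library. It is not the R package used in the lecture and it does not compute the 21 features mentioned above; the function names and the three features (duration, RMS amplitude, a zero-crossing frequency estimate) are assumptions chosen for illustration, and the input is a synthetic 440 Hz tone rather than a real recording.

```python
import io
import math
import struct
import wave


def write_sine_wav(freq_hz, seconds=1.0, rate=8000):
    """Return an in-memory mono 16-bit WAV holding a pure sine tone."""
    buffer = io.BytesIO()
    n_samples = int(seconds * rate)
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        frames = b"".join(
            struct.pack("<h", int(20000 * math.sin(2 * math.pi * freq_hz * i / rate)))
            for i in range(n_samples)
        )
        wav.writeframes(frames)
    buffer.seek(0)
    return buffer


def extract_features(wav_file):
    """Read a mono 16-bit WAV and return a few simple signal features."""
    with wave.open(wav_file, "rb") as wav:
        rate = wav.getframerate()
        n_frames = wav.getnframes()
        raw = wav.readframes(n_frames)
    samples = struct.unpack("<%dh" % n_frames, raw)
    duration = n_frames / rate
    rms = math.sqrt(sum(s * s for s in samples) / n_frames)
    # Count sign changes; a pure tone crosses zero twice per cycle.
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return {
        "duration_s": duration,
        "rms": rms,
        "zero_cross_freq_hz": crossings / (2 * duration),
    }


features = extract_features(write_sine_wav(440))
print(features)
```

In practice, one such dictionary per recording would become one row of the training or testing CSV file.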
Classifying words using Adaboost and SVM • 15:47
We load the features that we extracted before, for both our training and testing datasets. We evaluate the performance of both Adaboost and SVM. Both methods achieve an in-sample accuracy of practically 100%, a cross-validation accuracy of 80%, and an out-of-sample accuracy of 80%.
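The three accuracy figures (in-sample, cross-validation, out-of-sample) can be computed along the following lines with scikit-learn. This is a sketch under stated assumptions: the data is a synthetic stand-in generated with `make_classification` (5 classes, 21 columns), not the course's audio features, and the hyperparameters are arbitrary, so the scores it prints will not match the ones quoted above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the audio features: 5 classes, 21 columns.
X, y = make_classification(
    n_samples=300, n_features=21, n_informative=10,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

results = {}
for name, model in [
    ("adaboost", AdaBoostClassifier(n_estimators=100, random_state=0)),
    ("svm", SVC(kernel="rbf", C=1.0, gamma="scale")),
]:
    # Cross-validation uses only the training split.
    cv_scores = cross_val_score(model, X_train, y_train, cv=5)
    model.fit(X_train, y_train)
    results[name] = {
        "in_sample": model.score(X_train, y_train),
        "cross_validation": cv_scores.mean(),
        "out_of_sample": model.score(X_test, y_test),
    }

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

A large gap between the in-sample score and the other two is the usual sign of overfitting, which is why all three are reported.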
Classifying words using Multilayer Perceptron Deep Neural networks • 7:27
We design an MLP neural network for classifying the audio files we used in the previous lecture. In this case, however, we get essentially the same out-of-sample accuracy we were getting before, around 72%. So the extra effort of configuring and running a neural net was not justified.

Predicting nuclear output in the US via MLP and SVR • 15:00
We use official data from the US Nuclear Regulatory Commission to predict the percentage usage of the existing reactors in the US. We test both Multilayer Perceptrons and Support Vector Regression (SVR). In this case, however, neither method performs well, which is a good reminder that machine learning cannot always predict everything.
Multi-output neural networks • 13:43
We use a deep neural network to predict the output of US commercial reactors, but instead of predicting one value per observation, we predict several. Sounds hard? It's quite easy using Keras.

Incremental training in Keras • 19:56
We use a real Kaggle example containing 350K observations of used cars on eBay Germany. The problem is that constructing the full feature matrix is not viable, as we would end up with a NumPy matrix containing over 250 columns and 350K observations. Such a matrix will not fit into our RAM and we won't even be able to call Keras (if we somehow could, it would not work).

We thus train the model using train_on_batch(), feeding the model batches of around 17K observations. Using this incremental approach, we can easily construct the matrix at each batch creation step. We finally estimate the model using a deep neural network, achieving a mean absolute error of around 1,500 euros per car.
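The batch creation step might look like the sketch below, which builds the one-hot feature matrix only for the rows in the current batch. The generator name, the toy data, and the single categorical column are assumptions for illustration; only NumPy is used here, and the call to Keras's `train_on_batch()` appears as a comment marking where a compiled model would consume each batch.

```python
import numpy as np


def batch_generator(rows, prices, vocab, batch_size):
    """Yield (X, y) batches, one-hot encoding only the current batch."""
    index = {value: col for col, value in enumerate(vocab)}
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        X = np.zeros((len(chunk), len(vocab)), dtype=np.float32)
        for i, value in enumerate(chunk):
            X[i, index[value]] = 1.0
        y = np.asarray(prices[start:start + batch_size], dtype=np.float32)
        yield X, y


# Hypothetical toy data: one categorical column (car brand) and a price.
brands = ["audi", "bmw", "opel", "bmw", "audi", "opel", "bmw"]
prices = [9000, 12000, 3000, 11000, 9500, 2800, 12500]
vocab = sorted(set(brands))

shapes = []
for X_batch, y_batch in batch_generator(brands, prices, vocab, batch_size=3):
    # With a compiled Keras model, the batch would be consumed here:
    # model.train_on_batch(X_batch, y_batch)
    shapes.append(X_batch.shape)

print(shapes)
```

Only one batch-sized matrix exists in memory at any time, which is the point of the incremental approach.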

Poisonous mushrooms detection using Kaggle Data • 9:35
We work with a dataset containing multiple features per mushroom, and the objective is to predict whether each mushroom is edible or not. That is particularly challenging for humans, as there is no clear characteristic or rule that states when a mushroom is poisonous.
Classifying mushrooms using a super GPU on AWS • 9:15
We redo our previous exercise, but now using deep neural networks in Keras. We easily reach 100% accuracy after very few epochs.

A class that maps Black&White images to Python objects • 17:01
Images are used frequently in machine learning, both for deep neural networks and for traditional algorithms (SVM, random forests, etc.). We review the basics of image loading and present a class that can be used to read an entire directory and build the matrices needed for machine learning. This class converts images stored with three RGB channels into black-and-white (0/1) matrices. It should only be used for images that are already black and white.
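The core conversion can be sketched with NumPy alone. This is not the class from the lecture: the function names, the channel averaging, and the threshold of 128 are assumptions for illustration, and the directory-reading part is left out because it depends on an image library.

```python
import numpy as np


def rgb_to_binary(image, threshold=128):
    """Collapse an (H, W, 3) uint8 image into an (H, W) matrix of 0s and 1s."""
    gray = image.astype(np.float32).mean(axis=2)  # average the three channels
    return (gray >= threshold).astype(np.uint8)


def stack_as_features(images, threshold=128):
    """Flatten each binary image into one row of a 2-D feature matrix."""
    return np.stack([rgb_to_binary(img, threshold).ravel() for img in images])


# Two tiny 2x2 "images": one all white, one with a black top row.
white = np.full((2, 2, 3), 255, dtype=np.uint8)
half = white.copy()
half[0, :, :] = 0

X = stack_as_features([white, half])
print(X)
```

Each row of `X` is one image, which is the layout that traditional algorithms such as SVM or random forests expect.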
A class that maps RGB Images to Python objects • 5:36
We present a similar class, but now designed to accommodate 3-channel image data (RGB images), which we typically need to treat as a 4-dim tensor (images, height, width, channels). This class will be useful for building convolutional neural nets in the next section.
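As a rough illustration of the tensor layout, stacking equally sized RGB images with NumPy yields an array of shape (n_images, height, width, 3). The helper name and the random test images below are hypothetical, and scaling to the [0, 1] range is a common preprocessing choice rather than something the lecture description confirms.

```python
import numpy as np


def stack_rgb(images):
    """Stack equally sized (H, W, 3) uint8 images into one float tensor in [0, 1]."""
    batch = np.stack(images).astype(np.float32)
    return batch / 255.0


# Hypothetical data: four random 32x32 RGB images.
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8) for _ in range(4)]

tensor = stack_rgb(images)
print(tensor.shape, tensor.ndim)
```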

Detecting hands in pictures via Convolutional Neural Networks • 19:52
We train a deep convolutional network in Keras to identify hand gestures, and we achieve excellent accuracy. We explain how to prepare the data and preprocess the images before loading them into Python.
Identifying bolts and nuts in images • 15:50
Identifying bolts and nuts by calculating polygons • 19:00
We process images via OpenCV, and we detect and count the nuts in them. We combine the results of a blob detector with DBSCAN clustering to recover the exact number of nuts appearing in the images (and their positions).
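The clustering half of that pipeline can be sketched as follows. The detection coordinates are hypothetical values standing in for blob-detector output (the OpenCV step is omitted), and the `eps` and `min_samples` settings are assumptions that would need tuning on real images.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical blob-detector output: (x, y) centers, several per physical nut.
detections = np.array([
    [10, 10], [11, 9], [9, 11],     # nut 1
    [50, 52], [51, 50],             # nut 2
    [90, 15], [91, 16], [89, 14],   # nut 3
], dtype=float)

# eps is the merge radius in pixels; min_samples=1 keeps isolated detections.
labels = DBSCAN(eps=5.0, min_samples=1).fit_predict(detections)

cluster_ids = sorted(set(labels))
n_nuts = len(cluster_ids)
# One position per nut: the mean of the detections assigned to its cluster.
positions = np.array([detections[labels == k].mean(axis=0) for k in cluster_ids])

print(n_nuts)
print(positions)
```

Merging nearby detections this way avoids counting the same nut several times when the detector fires more than once on it.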

Requirements

Some experience with Python
General knowledge of machine learning and statistics

Description

This course explores a variety of machine learning and data science techniques using real-life datasets, images, and audio collected from several sources. These realistic situations are much better than dummy examples because they force the student to think harder about the problem, pre-process the data more carefully, and evaluate the performance of the predictions in different ways.

The datasets used here come from different sources such as Kaggle, US Data.gov, CrowdFlower, etc. Each lecture shows how to preprocess the data, model it using an appropriate technique, and measure how well each technique works on that specific problem. Certain lectures also cover multiple techniques, and we discuss which one outperforms the other. Naturally, all the code is shared here, and you can contact me if you have any questions. Every lecture can also be downloaded, so you can enjoy them while travelling.

The student should already be familiar with Python and some data science techniques. In each lecture, we discuss some technical details of each method, but we do not spend much time explaining the underlying mathematical principles.

Some of the techniques presented here are:

Pure image processing using OpenCV
Convolutional neural networks using Keras-Theano
Logistic and naive Bayes classifiers
Adaboost, Support Vector Machines for regression and classification, Random Forests
Real-time video processing, Multilayer Perceptrons, Deep Neural Networks, etc.
Linear regression
Penalized estimators
Clustering
Principal components

The modules/libraries used here are:

Scikit-learn
Keras-Theano
Pandas
OpenCV

Some of the real examples used here:

Predicting the GDP based on socio-economic variables
Detecting human parts and gestures in images
Tracking objects in real time video
Machine learning on speech recognition
Detecting spam in SMS messages
Sentiment analysis using Twitter data
Counting objects in pictures and retrieving their position
Forecasting London property prices
Predicting whether people earn more than a 50K threshold based on US Census data
Predicting the nuclear output of US based reactors
Predicting the house prices for some US counties
And much more...

The motivation for this course is that many students who want to learn data science and machine learning are usually stuck with dummy datasets that are not challenging enough. This course aims to ease the transition between knowing machine learning and doing real machine learning in real situations.

Who this course is for:

Intermediate Python users with some knowledge of data science
Students wanting to practice with real datasets
Students who know some machine learning, but want to apply scikit-learn and Keras (Theano/TensorFlow) to real problems they will encounter in the analytics industry

Course content

Introduction • 1 lecture • 12min

Wines • 1 lecture • 11min

Doing Machine learning with Audio - Classifying sounds • 3 lectures • 40min

Nuclear reactors in the US • 2 lectures • 29min

Clustering • 1 lecture • 20min

Used car prices for German Ebay • 1 lecture • 20min

Identifying poisonous mushrooms • 2 lectures • 19min

Plotting • 1 lecture • 17min

Useful image classes • 2 lectures • 23min

Image classification • 3 lectures • 55min
