This course explores a variety of machine learning and data science techniques using real-life datasets, images, and audio collected from several sources. These realistic situations are much better than dummy examples, because they force the student to think harder about the problem, preprocess the data more carefully, and evaluate the performance of the predictions in different ways.
The datasets used here come from different sources such as Kaggle, US Data.gov, CrowdFlower, etc. Each lecture shows how to preprocess the data, model it using an appropriate technique, and measure how well that technique works on the specific problem. Some lectures also cover multiple techniques, and we discuss which one outperforms the others. Naturally, all the code is shared here, and you can contact me if you have any questions. Every lecture can also be downloaded, so you can enjoy them while travelling.
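As a rough illustration of that preprocess, model, and evaluate loop, here is a minimal sketch. It is not taken from the lectures: it assumes scikit-learn is installed, uses a synthetic dataset from `make_classification` in place of a real one, and the choice of the two models and of accuracy as the metric is only an assumption made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a real dataset (assumption: the lectures use real data).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

# Hold out a test set so each technique is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Two candidate techniques; scaling is the preprocessing step for the linear model.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model and record its accuracy on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Print the techniques from best to worst test accuracy.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

On a real dataset the preprocessing step is usually the bulk of the work, and a single accuracy number is rarely enough; the same loop extends naturally to other metrics.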
The student should already be familiar with Python and some data science techniques. In each lecture, we discuss some technical details of each method, but we do not spend much time explaining the underlying mathematical principles.
Some of the techniques presented here are:
- Pure image processing using OpenCV
- Convolutional neural networks using Keras-Theano
- Logistic regression and naive Bayes classifiers
- AdaBoost, Support Vector Machines for regression and classification, Random Forests
- Real-time video processing, Multilayer Perceptrons, Deep Neural Networks, etc.
- Linear regression
- Penalized estimators
- Principal components
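To give a flavour of one item in the list above, here is a small sketch of a penalized estimator. The course description does not say which penalties are covered, so ridge regression (an L2 penalty) is an assumption chosen for illustration, as are the synthetic data, the helper name `ridge_fit`, and the value of `alpha`. It only requires NumPy.

```python
import numpy as np

# Synthetic regression data (illustrative only, not from the course).
rng = np.random.default_rng(0)
n_samples, n_features = 50, 5
X = rng.normal(size=(n_samples, n_features))
true_coef = np.array([2.0, -1.0, 0.0, 0.5, 0.0])
y = X @ true_coef + rng.normal(scale=0.5, size=n_samples)


def ridge_fit(X, y, alpha):
    """Closed-form ridge solution (no intercept): solve (X'X + alpha*I) b = X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


# alpha = 0 reduces to ordinary least squares; alpha > 0 shrinks the coefficients.
ols_coef = ridge_fit(X, y, alpha=0.0)
ridge_coef = ridge_fit(X, y, alpha=10.0)

print("OLS coefficient norm:  ", np.linalg.norm(ols_coef))
print("Ridge coefficient norm:", np.linalg.norm(ridge_coef))
```

The penalty trades a little bias for lower variance, which tends to matter most when features are correlated or the sample is small relative to the number of features.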
The modules/libraries used here are:
Some of the real examples used here:
- Predicting the GDP based on socio-economic variables
- Detecting human body parts and gestures in images
- Tracking objects in real-time video
- Machine learning for speech recognition
- Detecting spam in SMS messages
- Sentiment analysis using Twitter data
- Counting objects in pictures and retrieving their position
- Forecasting London property prices
- Predicting whether people earn more than a 50K threshold based on US Census data
- Predicting the nuclear output of US-based reactors
- Predicting house prices for some US counties
- And much more...
The motivation for this course is that many students who want to learn data science and machine learning are usually stuck with dummy datasets that are not challenging enough. This course aims to ease the transition from knowing machine learning to doing real machine learning in real situations.