Note: This course is a subset of our 20+ hour course 'From 0 to 1: Machine Learning & Natural Language Processing' so please don't sign up for both:-)
In an age of decision fatigue and information overload, this course is a crisp yet thorough primer on 2 great ML techniques that help cut through the noise: decision trees and random forests.
Prerequisites: No prerequisites, knowledge of some undergraduate level mathematics would help but is not mandatory. Working knowledge of Python would be helpful if you want to run the source code that is provided.
Taught by a Stanford-educated, ex-Googler and an IIT, IIM - educated ex-Flipkart lead analyst. This team has decades of practical experience in quant trading, analytics and e-commerce.
Mail us about anything - anything! - and we will always reply :-)
Recursive Partitioning is the most common strategy for growing Decision Trees from a training set.
Learn what makes one attribute be higher up in a Decision Tree compared to others.
ID3, C4.5, CART and CHAID are commonly used Decision Tree Learning algorithms. Learn what makes them different from each other. Pruning is a mechanism to avoid one of the risks inherent with Decision Trees ie overfitting.
Anaconda's iPython is a Python IDE. The best part about it is the ease with which one can install packages in iPython - 1 line is virtually always enough. Just say '!pip'
We'll use our Decision Tree Classifier to predict the results on Kaggle's test data set. Submit the results to Kaggle and see where you stand!
Overfitting is one of the biggest problems with Machine Learning - it's a trap that's easy to fall into and important to be aware of.
Overfitting is a difficult problem to solve - there is no way to avoid it completely, by correcting for it, we fall into the opposite error of underfitting.
Cross Validation is a popular way to choose between models. There are a few different variants - K-Fold Cross validation is the most well known.
Overfitting occurs when the model becomes too complex. Regularization helps maintain the balance between accuracy and complexity of the model.
The crowd is indeed wiser than the individual - at least with ensemble learning. The Netflix competition showed that ensemble learning helps achieve tremendous improvements in accuracy - many learners perform better than just 1.
Bagging, Boosting and Stacking are different techniques to help build an ensemble that rocks!
Decision trees are cool but painstaking to build - because they really tend to overfit. Random Forests to the rescue! Use an ensemble of decision trees - all the benefits of decision trees, few of the pains!
Machine learning is not a one-shot process. You'll need to iterate, test multiple models to see what works better. Let's use cross validation to compare the accuracy of different models - Decision trees vs Random Forests
Loonycorn is us, Janani Ravi, Vitthal Srinivasan, Swetha Kolalapudi and Navdeep Singh. Between the four of us, we have studied at Stanford, IIM Ahmedabad, the IITs and have spent years (decades, actually) working in tech, in the Bay Area, New York, Singapore and Bangalore.
Janani: 7 years at Google (New York, Singapore); Studied at Stanford; also worked at Flipkart and Microsoft
Vitthal: Also Google (Singapore) and studied at Stanford; Flipkart, Credit Suisse and INSEAD too
Swetha: Early Flipkart employee, IIM Ahmedabad and IIT Madras alum
Navdeep: longtime Flipkart employee too, and IIT Guwahati alum
We think we might have hit upon a neat way of teaching complicated tech courses in a funny, practical, engaging way, which is why we are so excited to be here on Udemy!
We hope you will try our offerings, and think you'll like them :-)