Byte-Sized-Chunks: Decision Trees and Random Forests

Cool machine learning techniques to predict survival probabilities aboard the Titanic - a Kaggle problem!
4.0 (31 ratings)
1,098 students enrolled
  • Lectures: 19
  • Length: 4.5 hours
  • Skill Level: All Levels
  • Languages: English
  • Includes: Lifetime access, 30 day money back guarantee, available on iOS and Android, Certificate of Completion


About This Course

Published 3/2016 · English

Course Description

Note: This course is a subset of our 20+ hour course 'From 0 to 1: Machine Learning & Natural Language Processing', so please don't sign up for both :-)

In an age of decision fatigue and information overload, this course is a crisp yet thorough primer on 2 great ML techniques that help cut through the noise: decision trees and random forests.

Prerequisites: None. Some undergraduate-level mathematics would help, but it is not mandatory. A working knowledge of Python is helpful if you want to run the provided source code.

Taught by a Stanford-educated ex-Googler and an IIT- and IIM-educated ex-Flipkart lead analyst. This team has decades of practical experience in quant trading, analytics and e-commerce.

What's Covered:

  • Decision Trees are a visual and intuitive way of predicting what the outcome will be given some inputs. They assign an order of importance to the input variables that helps you see clearly what really influences your outcome.
  • Random Forests avoid overfitting: Decision trees are cool but painstaking to build - because they really tend to overfit. Random Forests to the rescue! Use an ensemble of decision trees - all the benefits of decision trees, few of the pains!
  • Python Activity: Surviving aboard the Titanic! Build a decision tree to predict the survival of a passenger on the Titanic. This is a challenge posed by Kaggle (a competitive online data science community). We'll start off by exploring the data and transforming it into feature vectors that can be fed to a Decision Tree Classifier.

Mail us about anything - anything! - and we will always reply :-)

What are the requirements?

  • No prerequisites: some undergraduate-level mathematics would help but is not mandatory. A working knowledge of Python is helpful if you want to do the coding exercise and understand the provided source code

What am I going to get from this course?

  • Design and implement the solution to a famous problem in machine learning: predicting survival probabilities aboard the Titanic
  • Understand the perils of overfitting, and how random forests help overcome this risk
  • Identify the use-cases for Decision Trees as well as Random Forests

What is the target audience?

  • Nope! Please don't enroll in this class if you have already enrolled in our 21-hour course 'From 0 to 1: Machine Learning and NLP in Python'
  • Yep! Analytics professionals, modelers, big data professionals who haven't had exposure to machine learning
  • Yep! Engineers who want to understand or learn machine learning and apply it to problems they are solving
  • Yep! Product managers who want to have intelligent conversations with data scientists and engineers about machine learning
  • Yep! Tech executives and investors who are interested in big data, machine learning or natural language processing


Curriculum

Section 1: Decision Fatigue, and Decision Trees
01:14 · You, This Course, and Us! (Preview)

17:00 · What are Decision Trees and how are they useful? Decision Trees are a visual and intuitive way of predicting what the outcome will be given some inputs. They assign an order of importance to the input variables that helps you see clearly what really influences your outcome.
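To make that concrete, here's a minimal sketch (our illustration, assuming scikit-learn - the course ships its own source code): fit a tree and read off the order of importance it assigns to each input.

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    iris = load_iris()
    tree = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

    # feature_importances_ is the tree's "order of importance" for the inputs
    for name, importance in zip(iris.feature_names, tree.feature_importances_):
        print(f"{name}: {importance:.3f}")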
18:03 · Recursive Partitioning is the most common strategy for growing Decision Trees from a training set. Learn what makes one attribute sit higher up in a Decision Tree than others.

18:51 · We'll take a small detour into Information Theory to understand the concept of Information Gain. This concept forms the basis of how popular Decision Tree Learning algorithms work.
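If you'd like to see the idea in code before the lecture, here is a tiny illustrative sketch (our own example, plain Python):

    import math
    from collections import Counter

    def entropy(labels):
        # Shannon entropy of a list of class labels, in bits
        total = len(labels)
        return -sum((n / total) * math.log2(n / total)
                    for n in Counter(labels).values())

    def information_gain(labels, groups):
        # Entropy of the parent minus the weighted entropy of the child groups
        total = len(labels)
        return entropy(labels) - sum(len(g) / total * entropy(g) for g in groups)

    # Splitting 4 survivors / 4 non-survivors into two pure groups gains a full bit
    parent = ['yes'] * 4 + ['no'] * 4
    print(information_gain(parent, [['yes'] * 4, ['no'] * 4]))  # 1.0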
07:50 · ID3, C4.5, CART and CHAID are commonly used Decision Tree Learning algorithms. Learn what makes them different from each other. Pruning is a mechanism to avoid one of the risks inherent in Decision Trees, i.e., overfitting.
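For a taste of pruning in practice: scikit-learn's CART implementation exposes cost-complexity pruning through the ccp_alpha parameter (a sketch under that assumption; needs scikit-learn 0.22 or later).

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # ccp_alpha > 0 prunes away branches that don't pay for their complexity
    unpruned = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

    print("unpruned:", unpruned.get_n_leaves(), "leaves, test accuracy",
          unpruned.score(X_test, y_test))
    print("pruned:  ", pruned.get_n_leaves(), "leaves, test accuracy",
          pruned.score(X_test, y_test))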

09:00 · Anaconda ships with IPython, an interactive Python environment. The best part about it is the ease with which one can install packages - one line is virtually always enough. Just say '!pip'

18:05 · Numpy arrays are pretty cool for performing mathematical computations on your data.

14:19 · We continue with a basic tutorial on Numpy and Scipy.
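A quick illustrative sketch of what 'pretty cool' means - elementwise math on a whole array, no Python loop required:

    import numpy as np

    ages = np.array([22.0, 38.0, 26.0, np.nan, 35.0])  # one missing value

    print(np.nanmean(ages))    # average age, ignoring the missing entry
    print((ages > 30).sum())   # how many entries are over 30

    # Vectorized arithmetic applies to every element at once
    normalized = (ages - np.nanmean(ages)) / np.nanstd(ages)
    print(normalized)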
19:21 · Build a decision tree to predict the survival of a passenger on the Titanic. This is a challenge posed by Kaggle (a competitive online data science community). We'll start off by exploring the data and transforming it into feature vectors that can be fed to a Decision Tree Classifier.

14:16 · We continue with the Kaggle challenge. Let's feed the training set to a Decision Tree Classifier and then parse the results.

13:00 · We'll use our Decision Tree Classifier to predict the results on Kaggle's test data set. Submit the results to Kaggle and see where you stand!
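Putting those three lectures together, here is an end-to-end sketch. It's hedged: it assumes pandas and scikit-learn, plus Kaggle's standard train.csv / test.csv files with their usual columns, and it is not the course's own source code.

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    train = pd.read_csv('train.csv')   # Kaggle's Titanic training file
    test = pd.read_csv('test.csv')

    features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']

    def to_feature_vectors(df):
        # Turn raw columns into numeric feature vectors
        X = df[features].copy()
        X['Sex'] = (X['Sex'] == 'female').astype(int)  # encode text as 0/1
        return X.fillna(X.median())                    # fill missing ages/fares

    clf = DecisionTreeClassifier(max_depth=5, random_state=0)
    clf.fit(to_feature_vectors(train), train['Survived'])

    # Predict on the test set and write a submission file in Kaggle's format
    submission = pd.DataFrame({'PassengerId': test['PassengerId'],
                               'Survived': clf.predict(to_feature_vectors(test))})
    submission.to_csv('submission.csv', index=False)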

Section 2: A Few Useful Things to Know About Overfitting
19:03 · Overfitting is one of the biggest problems with Machine Learning - it's a trap that's easy to fall into and important to be aware of.

11:19 · Overfitting is a difficult problem to solve - there is no way to avoid it completely, and in correcting for it we risk falling into the opposite error of underfitting.
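One way to see the trap in a few lines (an illustrative sketch on noisy synthetic data, assuming scikit-learn): an unconstrained tree memorizes its training set, then stumbles on data it hasn't seen.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic data with 10% of labels flipped, so part of the "signal" is pure noise
    X, y = make_classification(n_samples=300, n_features=20, flip_y=0.1,
                               random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    print("train accuracy:", tree.score(X_train, y_train))  # typically 1.0 - memorized
    print("test accuracy: ", tree.score(X_test, y_test))    # noticeably lower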

18:55 · Cross Validation is a popular way to choose between models. There are a few different variants - K-Fold Cross Validation is the most well known.
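In scikit-learn terms (our sketch, not the course's code), K-Fold Cross Validation is a one-liner:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # 5-fold CV: train on 4/5 of the data, test on the held-out 1/5, five times over
    scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
    print(scores, scores.mean())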

07:18 · Overfitting occurs when the model becomes too complex. Regularization helps maintain the balance between accuracy and complexity of the model.
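For decision trees in particular, capping the tree's complexity plays the role of regularization. A sketch, assuming scikit-learn's max_depth and min_samples_leaf parameters:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # These limits trade a little flexibility for better generalization
    regularized = DecisionTreeClassifier(
        max_depth=4,          # ask at most 4 questions per prediction
        min_samples_leaf=10,  # every leaf must cover at least 10 training rows
        random_state=0,
    ).fit(X, y)
    print(regularized.get_depth(), regularized.get_n_leaves())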

16:39 · The crowd is indeed wiser than the individual - at least with ensemble learning. The Netflix competition showed that ensemble learning helps achieve tremendous improvements in accuracy - many learners perform better than just one.

18:02 · Bagging, Boosting and Stacking are different techniques to help build an ensemble that rocks!
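A sketch of the first two, assuming scikit-learn (Stacking works along the same lines via StackingClassifier):

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # Bagging: many trees, each trained on a random resample of the data
    bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                               random_state=0)
    # Boosting: each new learner focuses on the examples the last ones got wrong
    boosted = AdaBoostClassifier(n_estimators=50, random_state=0)

    for name, model in [("bagging", bagged), ("boosting", boosted)]:
        print(name, cross_val_score(model, X, y, cv=5).mean())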

Section 3: Random Forests
12:28 · Decision trees are cool but painstaking to build - because they really tend to overfit. Random Forests to the rescue! Use an ensemble of decision trees - all the benefits of decision trees, few of the pains!
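The 'to the rescue' bit really is short in code - a sketch, assuming scikit-learn:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # 100 trees, each grown on a random sample of rows and a random subset of
    # features at each split; their majority vote is the forest's prediction
    forest = RandomForestClassifier(n_estimators=100, oob_score=True,
                                    random_state=0).fit(X, y)
    print(forest.oob_score_)  # accuracy on out-of-bag rows, a built-in sanity check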

20:03 · Machine learning is not a one-shot process. You'll need to iterate and test multiple models to see what works best. Let's use cross validation to compare the accuracy of different models: Decision Trees vs Random Forests.
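A hedged sketch of that comparison on stand-in data (in the course activity, X and y would come from the Titanic feature vectors):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, flip_y=0.05,
                               random_state=0)

    for name, model in [
        ("decision tree", DecisionTreeClassifier(random_state=0)),
        ("random forest", RandomForestClassifier(n_estimators=100, random_state=0)),
    ]:
        print(name, cross_val_score(model, X, y, cv=10).mean())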


Instructor Biography

Loony Corn, a 4-person team; ex-Google; Stanford, IIM Ahmedabad, IIT

Loonycorn is us, Janani Ravi, Vitthal Srinivasan, Swetha Kolalapudi and Navdeep Singh. Between the four of us, we have studied at Stanford, IIM Ahmedabad, the IITs and have spent years (decades, actually) working in tech, in the Bay Area, New York, Singapore and Bangalore.

Janani: 7 years at Google (New York, Singapore); Studied at Stanford; also worked at Flipkart and Microsoft

Vitthal: Also Google (Singapore) and studied at Stanford; Flipkart, Credit Suisse and INSEAD too

Swetha: Early Flipkart employee, IIM Ahmedabad and IIT Madras alum

Navdeep: Longtime Flipkart employee too, and an IIT Guwahati alum

We think we might have hit upon a neat way of teaching complicated tech courses in a funny, practical, engaging way, which is why we are so excited to be here on Udemy!

We hope you will try our offerings, and think you'll like them :-)

Ready to start learning?
Take This Course