Introduction to R Programming on Sports Data
3.7 (15 ratings)
Instead of using a simple lifetime average, Udemy calculates a course's star rating by considering a number of different factors such as the number of ratings, the age of ratings, and the likelihood of fraudulent ratings.
2,157 students enrolled

I want to show you how easy it is to create predictive models using sports data
Created by Jerry Kim
Last updated 11/2016
English
Current price: $10 Original price: $25 Discount: 60% off
30-Day Money-Back Guarantee
Includes:
  • 1 hour on-demand video
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Complete a project that uses NFL data to determine the most important positions
  • Learn web scraping with R and Python
  • Read xls files into R
  • Know the basics of dataframes, along with manipulating, merging, and combining them
  • Split data into training, validation, and test sets, along with understanding cross-validation
  • Be aware of the problem of overfitting when generating predictive models
  • Learn the basics of Linear regression and Lasso regression
  • Learn the basics of Random Forests
  • Generate data visualization using ggplot2
Requirements
  • Basic understanding of programming
  • Basic knowledge of Python (and basic knowledge of R is recommended but not required)
  • Basic knowledge of statistics
Description

Are you interested in learning about data analysis and machine learning, but don't know where to start? Are you interested in sports and curious to know how analytics can be applied to them? In the game of football, are you curious as to which positions are the most important (other than the quarterback)?


If so, you've come to the right course! In this course, I will show you how easy it is to use R, through the RStudio environment, with data from the NFL to answer the question of which positions matter the most in the game of football! I will walk you through this project so you learn R by doing, as opposed to watching boring lectures that cover theory without any applications.


My hope is that going over this project will provide the interest and motivation necessary for you to answer your own statistics and data-related questions using the concepts I cover in this course. I want you to become proactive instead of just being a spectator and consumer.

Who is the target audience?
  • Anyone interested in learning R by doing and going through a project
  • Anyone looking for a fun project to work on
  • Anyone interested in sports but also math and statistics
  • Those who want/need to learn data analysis and machine learning but haven't seen why it is useful or fun
  • College students
  • Working professionals who need a refresher on predictive modeling and machine learning
Curriculum For This Course
10 Lectures
01:10:21
Introduction
1 Lecture 01:28

Introduction to the concepts I will cover in this course. I will cover web scraping using R and Python, dealing with dataframes, manipulating and merging dataframes, and then using Lasso regression and Random Forest to generate models that show which positions in football best correlate with the winning percentages of teams.

Preview 01:28
Web scraping, using Excel files in R, and merging dataframes
4 Lectures 35:53

Data for projects in which you want to generate predictions is not always easy to obtain. It will not always be available as a simple file you can download. I will show you the valuable skill known as web scraping, which data scientists and data analysts use to obtain data from websites by extracting the HTML content. I use the BeautifulSoup package in Python for this.

Preview 09:29
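
As a rough illustration of what pulling a table out of HTML content involves, here is a minimal sketch. It uses Python's standard-library html.parser as a dependency-free stand-in for the BeautifulSoup package used in the lecture. The HTML fragment, the team names, and the TableExtractor class are all made up for this example and are not taken from the course code.

```python
from html.parser import HTMLParser

# Hypothetical HTML fragment standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>Team</th><th>Win pct</th></tr>
  <tr><td>Team A</td><td>0.750</td></tr>
  <tr><td>Team B</td><td>0.438</td></tr>
</table>
"""


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_table(html):
    """Return the table in `html` as a list of rows of cell text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = extract_table(SAMPLE_HTML)
header, records = rows[0], rows[1:]
```

With a real page, the HTML string would come from an HTTP request instead of a constant, and the cell text would still need converting to numbers before any modeling.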

I cover where to obtain RStudio, the link to the code I use in this course (https://github.com/jk34/NFL_model), how to read the Excel files into R, how to store them as dataframes, and how to combine dataframes using cbind and merge.

Installing RStudio, reading Excel files into R, and storing them in a dataframe
16:10

I cover how to extract the HTML tables from ESPN containing the teams and their winning percentages for each season. I extract these HTML tables using web scraping with R.

Web scraping part II: Extracting HTML tables
05:32

I cover how to take a subset of the dataframe containing the players and their positions for each team, so that it only contains the top-rated player at each position for each team in each season. I then merge this new dataframe with the dataframe containing the winning percentage of each team.

Merging the players and ratings dataframe with winning percentage dataframe
04:42
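
The subset-then-merge step described in this lecture can be sketched roughly as follows. This is plain Python with dictionaries rather than the R dataframes used in the course, and every team, player, rating, and function name below is hypothetical.

```python
# Hypothetical player rows: (team, season, position, player, rating).
players = [
    ("Team A", 2015, "QB", "Player 1", 88),
    ("Team A", 2015, "QB", "Player 2", 71),
    ("Team A", 2015, "WR", "Player 3", 90),
    ("Team B", 2015, "QB", "Player 4", 79),
    ("Team B", 2015, "WR", "Player 5", 84),
    ("Team B", 2015, "WR", "Player 6", 86),
]

# Hypothetical winning percentages keyed by (team, season).
win_pct = {("Team A", 2015): 0.750, ("Team B", 2015): 0.438}


def top_rating_by_position(rows):
    """Keep only the highest rating at each position per team and season."""
    best = {}
    for team, season, position, _player, rating in rows:
        ratings = best.setdefault((team, season), {})
        if rating > ratings.get(position, float("-inf")):
            ratings[position] = rating
    return best


def merge_with_wins(best, wins):
    """Inner join on (team, season), similar in spirit to merge() in R."""
    merged = []
    for key in sorted(best):
        if key in wins:  # drop team-seasons with no winning percentage
            team, season = key
            merged.append({"team": team, "season": season,
                           **best[key], "win_pct": wins[key]})
    return merged


table = merge_with_wins(top_rating_by_position(players), win_pct)
```

Each row of the result holds one team-season with one rating column per position plus the winning percentage, which is the shape a regression model needs.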
Splitting data into training/validation/test sets, overfitting, K-fold CV
2 Lectures 12:08

I cover how to split our dataframe into a training, validation, and test set. Because I don't have an actual test set, I use 1/5 of our data as the "test" set, and use K-fold cross-validation on the remaining data to generate the training and validation sets.

Splitting data into training, validation, and test sets
08:40
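
A minimal sketch of this splitting scheme, written in Python rather than R: 1/5 of the rows are held out as the "test" set, and K-fold cross-validation over the rest produces the training and validation sets. The function names, the seed, the row count of 100, and the choice of K = 4 are assumptions made for the example.

```python
import random


def split_test(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out a fraction as the "test" set."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]  # (remaining, test)


def k_fold(indices, k=4):
    """Yield (training, validation) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [idx
                    for j, fold in enumerate(folds) if j != i
                    for idx in fold]
        yield training, validation


remaining, test = split_test(n_rows=100)
pairs = list(k_fold(remaining, k=4))
```

Every remaining row is used for validation exactly once across the folds, and the held-out test rows are never touched while fitting or tuning.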

I talk about the problem of overfitting and how to overcome it when generating predictive models.

Overfitting
03:28
Using Linear Regression, Lasso Regression, and Random Forest for predictive modeling
3 Lectures 20:52

Finally, I cover how to use our data to generate a model that can answer our question of which positions matter the most in the NFL. I go over Linear regression, and how Lasso regression modifies Linear regression so that it picks out only the relevant positions in the game of football. I show how to use Linear and Lasso regression in R.

Linear regression, Lasso regression, evaluating the errors of our model
11:05
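
To show how Lasso regression can pick out only the relevant predictors, here is a small sketch in pure Python. It is not the R code from the lecture. It assumes centered data with no intercept, minimizes (1/(2n)) * sum of squared errors + lam * sum of |w_j| by coordinate descent, and uses a made-up four-row dataset in which the two columns could stand for ratings at two hypothetical positions.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate descent for Lasso (no intercept, data assumed centered)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for row, target in zip(X, y):
                pred_without_j = sum(w[k] * row[k] for k in range(p) if k != j)
                rho += row[j] * (target - pred_without_j)
                z += row[j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n) if z else 0.0
    return w


# Made-up data: y = 2 * x1 + 0.1 * x2, so the second column barely matters.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2.1, -1.9, 1.9, -2.1]

w_ols = lasso_fit(X, y, lam=0.0)    # lam = 0 reduces to ordinary least squares
w_lasso = lasso_fit(X, y, lam=0.5)  # the penalty zeroes out the weak column
```

With lam = 0 both coefficients survive (about 2 and 0.1). With lam = 0.5 the first coefficient shrinks to about 1.5 and the second becomes exactly 0, which is the feature-selection behavior that distinguishes Lasso from plain Linear regression.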

I cover another predictive model we will use: Random Forests. A Random Forest is an ensemble that averages the predictions of many decision trees.

Random Forests
03:20

I talk about how I implement Random Forest on our data in R, and then how to generate a visualization that shows which positions are the most important to winning in the NFL.

Random Forests in R and using ggplot for visualization
06:27
About the Instructor
Jerry Kim
3.7 Average rating
15 Reviews
2,157 Students
1 Course
Data Science Freelancer

Jerry Kim has an MA in Physics from the University of Texas at Austin, along with a BA in Physics and a BS in Applied Mathematics from the University of California at Los Angeles. He taught Physics courses for 2 years as a teaching assistant at the University of Texas at Austin. He has experience with programming, computational physics, and data science.

He wants to share his experience and knowledge of programming and data analysis with others. He currently works as a freelancer on data science projects.