Baseball Data Wrangling with Vagrant, R, and Retrosheet
Rating: 4.6 out of 5 (173 ratings)
13,369 students

Analytics with the Chadwick tools, dplyr, and ggplot.
Created by Charles Redmond
Last updated 6/2015
English

What you'll learn

  • install VirtualBox and Vagrant
  • run a virtual Linux machine
  • install the Chadwick software tools
  • extract game and play-by-play baseball data from Retrosheet files
  • produce graphs with ggplot

Course content

4 sections, 28 lectures, 2h 9m total length
  • Introduction (1:22)

    This introduction outlines the structure of the course.

  • Installing VirtualBox (0:57)

    After viewing this lecture, you will be able to install VirtualBox on your machine.

  • Installing Vagrant (0:38)

    After viewing this lecture, you will be able to install Vagrant on your machine.

  • Creating a Project Folder (3:17)

    After viewing this lecture, you will be able to create a folder for your project, navigate to it from the command line, and check that Vagrant is installed.

  • Vagrant Up (3:11)

    After viewing this lecture, you will be able to download your Linux box and start it.

  • Directory Structure (4:34)

    After viewing this lecture, you will be able to ssh into your Linux box and navigate the directory structure.

Requirements

  • Students will need to have R and RStudio installed on their own computers.

Description

This course is for those interested in doing baseball analytics with the Retrosheet game-by-game and play-by-play data. The main tools for working with this data are the Chadwick software tools. We set up a virtual Linux machine and install the Chadwick software on it. We then learn how to extract baseball data with the Chadwick software, how to filter the data further with dplyr in R, and how to plot our results with ggplot.

The first part of the course, in which we install the virtual Linux machine and learn how to work with the Chadwick software, has no prerequisites. The second part requires a working knowledge of dplyr, which can be obtained through my course "Baseball Database Queries with SQL and dplyr".

At a relaxed pace, the course should take two to three weeks to complete.

Who this course is for:

  • This course is for those interested in doing baseball analytics with Retrosheet files.
  • No background is needed for the first part of the course. A background in the R package dplyr is necessary to follow the second part of the course.