
This is the course introduction, in which I describe the structure of the course.
After viewing this lecture, you will be able to install VirtualBox on your machine.
After viewing this lecture, you will be able to install Vagrant on your machine.
After viewing this lecture, you will be able to create a folder for your project and navigate to it from the command line. You will also verify that Vagrant is installed.
After viewing this lecture, you will download your Linux box and start it.
After viewing this lecture, you will be able to ssh into your Linux box and navigate the directory structure.
After viewing this lecture, you will be able to download the Chadwick software and copy files from one directory to another on your Linux machine.
After viewing this lecture, you will be able to install the Chadwick software on your Linux machine.
After viewing this lecture, you will be familiar with the contents of the Retrosheet event files.
After viewing this lecture, you will be able to work with the cwevent and cwgame programs from the Chadwick software.
After viewing this video, you will be able to extract the information you need for our first project using the Chadwick software. You will also see how to work with wildcards in Linux.
After viewing this lecture, you will be able to assign names to data frame columns. We will also review how to read CSV files into R.
After viewing this lecture, you will be able to work with some of the logical operators in R. We will also review the mutate verb in dplyr.
After viewing this lecture, you will be able to work with the substr function in R.
After viewing this lecture, you will be able to work with the paste function and the as.Date function in R.
In this video, we extract the information we need for our player data frames from our main bdat data frame.
After viewing this video, you will understand enough of the essentials of ggplot to create our cumulative home run plots.
After viewing this video, you will be able to put multiple plots on one graph.
After viewing this lecture, you will be able to add axis labels, a title, and a legend to your graph. You will also understand how to use color within the aesthetics.
In this video, I give the details of project #2.
In this video, we return to our Linux machine and extract the data we need for our project.
In this video, we read our data into R.
In this video, we generate a column of date objects in the default R date format.
In this video, we modify the AB column and generate an H (hits) column.
In this video, we generate the player data frames. We compute cumulative sums of the AB and H columns and then divide cumulative H by cumulative AB to obtain a batting average column.
In this video, we finally generate our plots.
In this video, we add a horizontal line to our graph to mark the .400 batting average.
In this video, I recommend a text for additional ideas and examples.
This course is for those interested in doing baseball analytics with the Retrosheet game-by-game and play-by-play data. The main tools for working with such data are provided by the Chadwick software. We install a virtual Linux machine, on which we will install the Chadwick software. We will then learn how to extract baseball data with the Chadwick software, how to further filter the data with dplyr in R, and how to plot our results with ggplot.
For the first part of the course, in which we install the virtual Linux machine and learn how to work with the Chadwick software, there are no prerequisites. To follow the second part of the course, knowledge of dplyr is necessary. This background can be obtained through my course "Baseball Database Queries with SQL and dplyr".
At a relaxed pace, the course should take two to three weeks to complete.