Wrangling Major League Baseball Pitchf/x Data with Python
What you'll learn
- How to find MLB game and pitch data in Gameday.
- How to create and program a Jupyter Notebook in Python.
- How to extract XML pitch data from the MLB website.
- How to coerce XML tree data into a Pandas DataFrame.
- How to extract DataFrame slices into multiple views.
- How to plot pitch data with Matplotlib and Pyplot graphs.
- How to add data columns to a Pandas DataFrame.
- How to plot pitch tendency as pie charts (by ball-strike count).
Requirements
- Basic programming experience is helpful.
In the 2006 playoffs, Major League Baseball debuted a pitch-tracking camera system called PitchF/x. Now installed in every MLB stadium, the system has been continually extended and rebranded. From cameras to TrackMan radar, from StatCast to GameDay, MLB now tracks every pitch and every player's movement on each pitch. The data are made public on the MLB website, and SaberMetricians worldwide pore over every detail. The teams themselves average five or more statisticians dedicated to analyzing the data to aid in selecting and improving players.
I'm Chaz Henry, a software engineer, 12-year Little League coach and founder of the PowerChalk dot com website. In this class, we're going to open a fresh Jupyter Notebook, grab the MLB game data from Clayton Kershaw's 2014 no-hitter and wrangle that data in Python. It's an introduction to SaberMetrics, the empirical study of baseball statistics.
We'll use built-in Python libraries and graph the pitches with Matplotlib and Pyplot. Along the way we'll talk about best practices for Jupyter Notebook, Python coding, XML parsing and maybe a little baseball.
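To give a taste of the XML parsing step, here is a minimal sketch using only Python's built-in libraries. It is not the course notebook: the XML snippet is a hand-written sample with invented values, and the element and attribute names (atbat, pitch, des, type, pitch_type, start_speed) are assumptions about the general shape of a Gameday inning file. The helper names pitch_rows and tendency_by_count are made up for this illustration. The sketch flattens each pitch into a row, tags it with the ball-strike count at the time it was thrown, and tallies pitch types per count.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hand-written sample in the assumed shape of a Gameday inning file.
# The values are invented for illustration; they are not real game data.
SAMPLE_XML = """
<inning num="1">
  <top>
    <atbat num="1" batter="1001" pitcher="2001">
      <pitch des="Called Strike" type="S" pitch_type="FF" start_speed="93.1"/>
      <pitch des="Ball" type="B" pitch_type="CU" start_speed="74.0"/>
      <pitch des="Foul" type="S" pitch_type="FF" start_speed="93.8"/>
      <pitch des="Foul" type="S" pitch_type="SL" start_speed="87.2"/>
      <pitch des="Swinging Strike" type="S" pitch_type="CU" start_speed="73.5"/>
    </atbat>
    <atbat num="2" batter="1002" pitcher="2001">
      <pitch des="Ball" type="B" pitch_type="FF" start_speed="92.7"/>
      <pitch des="In play, out(s)" type="X" pitch_type="SL" start_speed="86.9"/>
    </atbat>
  </top>
</inning>
"""


def pitch_rows(xml_text):
    """Flatten every <pitch> element into a dict, one row per pitch."""
    root = ET.fromstring(xml_text)
    rows = []
    for atbat in root.iter("atbat"):
        balls = strikes = 0  # the count resets for each at-bat
        for pitch in atbat.iter("pitch"):
            row = dict(pitch.attrib)  # XML attributes become columns
            row["start_speed"] = float(row["start_speed"])
            row["count"] = f"{balls}-{strikes}"  # count before this pitch
            rows.append(row)
            if row["type"] == "B":
                balls += 1
            elif row["type"] == "S" and not (row["des"] == "Foul" and strikes == 2):
                # A plain foul ball with two strikes does not add a strike.
                strikes += 1
    return rows


def tendency_by_count(rows):
    """Tally how often each pitch type was thrown in each ball-strike count."""
    tally = defaultdict(Counter)
    for row in rows:
        tally[row["count"]][row["pitch_type"]] += 1
    return tally


rows = pitch_rows(SAMPLE_XML)
tendency = tendency_by_count(rows)
for count in sorted(tendency):
    print(count, dict(tendency[count]))
```

A list of dicts like rows can be handed straight to pandas.DataFrame(rows), and each per-count tally is the kind of data a pie chart would show.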
So, if you're a coder, a SaberMetrician or just a baseball fan who wants to peek behind the curtain at what's driving MoneyBall and the next wave of player development, sign up for the course and let's start scrubbing the pitch data from one of the greatest pitching performances in MLB history.
Who this course is for:
- Beginner or intermediate Python programmers.
- SaberMetric baseball fans.
In 2000, Chaz sold the company that he built from his Computer Science Master's thesis at NC State to a public company in Silicon Valley. Since then he has built an online video game (StMulligan), a Facebook Chatbot for Ticketmaster, a cloud-based video analysis system used by the Los Angeles Dodgers (PowerChalk) and a Raspberry Pi-based sports camera system. He is a 12-year Little League baseball coach and an avid sports fan.