Baseball Database Queries with SQL and dplyr
- Students will need to have R and RStudio installed on their own computers. (I will cover how to do this.)
- Microsoft Access is recommended but not required.
In this course, I explain the relationship between SQL and the R package dplyr. I will show you how to query a baseball database with SQL in Microsoft Access and then how to write the same query with dplyr in R. We will begin with simple queries, progress to aggregation and grouping, and finish with queries involving joins. By the end of the course, you should be able to use dplyr to explore your own data sets.
At a relaxed pace, the course should take about three weeks to complete. It is aimed at beginners in SQL, R, and dplyr, and you do not need to know much about baseball. We will be using the Lahman Baseball Database, R, dplyr, and Microsoft Access. I will show you how to install everything.
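To give a feel for how SQL and dplyr line up, here is a minimal sketch rather than material taken from the lectures. The small `batting` data frame is made up for illustration: only its column names (`playerID`, `yearID`, `HR`) mirror the Batting table of the Lahman database, and the player IDs, numbers, and the `total_HR` and `career_hr` names are assumptions. The comment shows an Access-style SQL query, and the dplyr pipeline below it is meant to express the same steps.

```r
library(dplyr)

# Made-up rows for illustration; only the column names follow Lahman's Batting table.
batting <- data.frame(
  playerID = c("playerA", "playerA", "playerB", "playerB", "playerC"),
  yearID   = c(2001, 2002, 2001, 2002, 2002),
  HR       = c(30, 25, 12, 45, 5)
)

# Roughly equivalent Access-style SQL:
#   SELECT playerID, Sum(HR) AS total_HR
#   FROM batting
#   WHERE yearID >= 2001
#   GROUP BY playerID
#   HAVING Sum(HR) >= 50
#   ORDER BY Sum(HR) DESC;
career_hr <- batting %>%
  filter(yearID >= 2001) %>%          # WHERE: keep rows before grouping
  group_by(playerID) %>%              # GROUP BY
  summarize(total_HR = sum(HR)) %>%   # SELECT ... Sum(HR) AS total_HR
  filter(total_HR >= 50) %>%          # HAVING: filter on the aggregate
  arrange(desc(total_HR))             # ORDER BY ... DESC

print(career_hr)
```

Note that `filter` appears twice: before `group_by` it plays the role of WHERE, and after `summarize` it plays the role of HAVING.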
- This course is for beginners who would like to learn about SQL and/or dplyr.
- This course is for beginners interested in baseball analytics.
- This course is NOT for those with extensive knowledge of both SQL and dplyr.
- Access Set-up
- RStudio Set-up
- The Rcpp Package
- SELECT (SQL) and select (dplyr)
- ORDER BY (SQL) and arrange (dplyr)
- WHERE (SQL) and filter (dplyr)
- AND and OR
- Grouping and Sum in Access SQL
- Grouping, Summarize, and Sum in dplyr
- Averaging in Access SQL and dplyr
- max and min
- count (SQL) and n (dplyr)
- WHERE vs. HAVING
- Batting Average and Mutate (dplyr)
- Career Batting Average
- Inner Joins with Access SQL
- Inner Joins with dplyr
- A Query with an INNER JOIN
- Joining on more than one field with SQL in Access
- Joining on more than one field with dplyr in R
- Joining three tables with SQL in Access
- Joining three tables with dplyr in R
- Grouping and joining with SQL in Access
- Grouping and joining with dplyr in R
- Problem #1 with SQL
- Problem #1 with dplyr
- Problem #2 with Access
- Problem #2 with dplyr
- Reading Other Data into R
Dr. Charles Redmond is a professor in the Tom Ridge School of Intelligence Studies and Information Science at Mercyhurst University. He has been a member of the Department of Mathematics and Computer Systems at Mercyhurst for 21 years and has recently completed a term as chair of the department. Dr. Redmond received his PhD in mathematics from Lehigh University in 1993 and has published in the Annals of Applied Probability, the Journal of Stochastic Processes and Their Applications, Mathematics Magazine, the College Mathematics Journal, and Mathematics Teacher. In his spare time, he enjoys making music and computer-generated art, reading, and owning a Clumber Spaniel.