Baseball Database Queries with SQL and dplyr
- Students will need to have R and RStudio installed on their own computers. (I will cover how to do this.)
- It will be best if students also have Microsoft Access, but this is not absolutely necessary.
In this course, we explain the relationship between SQL and the R package dplyr. I will show you how to query a baseball database with SQL in Microsoft Access and then show you how to do exactly the same thing with dplyr in R. We will begin with simple queries, progress to aggregation and grouping, and finish with queries involving joins. By the end of the course, you should be able to use dplyr to explore your own data sets.
At a relaxed pace, it should take about three weeks to complete the course. The course is for beginners in SQL, R, and dplyr. You also do not have to understand very much about baseball. We will be using the Lahman Baseball Database, R, dplyr, and Microsoft Access. I will show you how to install everything.
- This course is for beginners who would like to learn about SQL and/or dplyr.
- This course is for beginners interested in baseball analytics.
- This course is NOT for those with extensive knowledge of both SQL and dplyr.
- Access Set-up05:55
- RStudio Set-up07:10
- The Rcpp Package01:00
- SELECT (SQL) and select (dplyr)09:04
- ORDER BY (SQL) and arrange (dplyr)04:58
- WHERE (SQL) and filter (dplyr)07:00
- AND and OR06:11
- Grouping and Sum in Access SQL03:49
- Grouping, Summarize, and Sum in dplyr05:33
- Averaging in Access SQL and dplyr06:31
- max and min05:33
- count (SQL) and n (dplyr)03:40
- WHERE vs. HAVING09:50
- Batting Average and Mutate (dplyr)07:10
- Career Batting Average10:54
- Inner Joins with Access SQL07:18
- Inner Joins with dplyr06:27
- A Query with an INNER JOIN08:36
- Joining on more than one field with SQL in Access06:22
- Joining on more than one field with dplyr in R07:00
- Joining three tables with SQL in Access05:07
- Joining three tables with dplyr in R04:31
- Grouping and joining with SQL in Access04:07
- Grouping and joining with dplyr in R06:09
- Problem #1 with SQL06:50
- Problem #1 with dplyr07:23
- Problem #2 with Access05:38
- Problem #2 with dplyr04:10
- Reading Other Data into R07:21
Dr. Charles Redmond is a professor in the Tom Ridge School of Intelligence Studies and Information Science at Mercyhurst University. He has been a member of the Department of Mathematics and Computer Systems at Mercyhurst for 21 years and has recently completed a term as chair of the department. Dr. Redmond received his PhD in mathematics from Lehigh University in 1993 and has published in the Annals of Applied Probability, the Journal of Stochastic Processes and Their Applications, Mathematics Magazine, the College Mathematics Journal, and Mathematics Teacher. In his spare time he enjoys making music and computer generated art, reading, and owning a Clumber Spaniel.