In this course, I show you how to evaluate the performance of a regression model using training sets and test sets. We will use R and ggplot as our tools. Along the way, we will learn how to row-slice data frames, use the predict function in R, and add titles and labels to our plots. We will also work on our programming skills by learning how to write for loops and functions of two variables.
Students should have the background in R, ggplot, and regression equivalent to what one would have after viewing my two Udemy courses on linear and polynomial regression. At a relaxed pace, it should take about two weeks to complete the course.
In this video, I introduce the course.
After this lecture, you will be able to extract specific rows from a data frame. You will also be able to sample from a given set of integers. Putting all of this together, you will be able to randomly divide a data frame into two distinct data frames, each of the same size.
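As a rough illustration of the ideas in this lecture, here is a minimal sketch in base R. The data frame `df` and its columns `x` and `y` are simulated stand-ins, not the course's data set, and the variable names are only assumptions made for the example.

```r
# Simulated stand-in data (hypothetical; not the course's data set)
set.seed(1)
df <- data.frame(x = 1:20, y = rnorm(20))

# Row-slice: keep rows 1 through 5 and all columns
first_five <- df[1:5, ]

# Sample half of the row numbers, without replacement
train_idx <- sample(1:nrow(df), nrow(df) / 2)

# Rows that were sampled form one data frame; the remaining rows form the other
train <- df[train_idx, ]
test  <- df[-train_idx, ]
```

Negative indices drop rows, so `df[-train_idx, ]` keeps exactly the rows that were not sampled, and the two data frames never share a row.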
In this lecture, I generate plots of both the training set and the test set. I also show you how to add a title to your plots in ggplot.
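A plot with a title might look something like the sketch below. It assumes the ggplot2 package is installed and uses simulated data with hypothetical column names `x` and `y`; the titles and axis labels are placeholders.

```r
library(ggplot2)

# Simulated stand-in data (hypothetical)
set.seed(1)
df <- data.frame(x = runif(40, 0, 10))
df$y <- 2 + 3 * df$x + rnorm(40)

# Random split into two halves
train_idx <- sample(1:nrow(df), nrow(df) / 2)
train <- df[train_idx, ]
test  <- df[-train_idx, ]

# Scatterplot of the training set with a title and axis labels
p_train <- ggplot(train, aes(x = x, y = y)) +
  geom_point() +
  ggtitle("Training Set") +
  xlab("x") +
  ylab("y")

# Scatterplot of the test set with its own title
p_test <- ggplot(test, aes(x = x, y = y)) +
  geom_point() +
  ggtitle("Test Set")
```

Printing `p_train` or `p_test` at the console draws the corresponding plot.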
After viewing this lecture, you will be able to use R's predict function and ggplot to obtain a plot of the least-squares line. You will also begin thinking about applying the least-squares line, generated from the training data, to the test data.
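One possible way to draw the least-squares line is sketched here, assuming ggplot2 is installed. The training data are simulated, and the column name `y_hat` is a hypothetical name chosen for the example; the lecture may build the plot differently.

```r
library(ggplot2)

# Simulated training data (hypothetical)
set.seed(2)
train <- data.frame(x = runif(30, 0, 10))
train$y <- 1 + 2 * train$x + rnorm(30)

# Fit the least-squares line to the training data
fit <- lm(y ~ x, data = train)

# predict() returns the fitted y value for each x in newdata
train$y_hat <- predict(fit, newdata = train)

# Points from the training set, with the fitted line drawn through them
p <- ggplot(train, aes(x = x, y = y)) +
  geom_point() +
  geom_line(aes(y = y_hat), colour = "blue") +
  ggtitle("Least-Squares Line on the Training Set")
```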
After viewing this video, you will be able to use R's predict function to calculate the test mean squared error.
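The test mean squared error is the average of the squared differences between the observed test responses and the model's predictions for them. A minimal sketch, using simulated data and hypothetical variable names:

```r
# Simulated stand-in data (hypothetical)
set.seed(3)
df <- data.frame(x = runif(100, 0, 10))
df$y <- 1 + 2 * df$x + rnorm(100)

# Random split into training and test halves
train_idx <- sample(1:nrow(df), nrow(df) / 2)
train <- df[train_idx, ]
test  <- df[-train_idx, ]

# Fit on the training set only
fit <- lm(y ~ x, data = train)

# Predict the responses for the test set, then average the squared errors
pred <- predict(fit, newdata = test)
test_mse <- mean((test$y - pred)^2)
```

Because the model never sees the test rows while it is being fit, `test_mse` estimates how well the line predicts new data.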
After viewing this video, you will be able to use R's predict function to plot the quadratic polynomial that best fits the training data set. We will do this by writing our own function and passing it to stat_function in ggplot.
In this video, we calculate the test mean squared error for the quadratic model.
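The sketch below shows one way to combine a hand-written function with `stat_function`, assuming ggplot2 is installed. The data are simulated, `quad_curve` is a hypothetical helper name, and fitting with `poly(x, 2, raw = TRUE)` is a choice made for this example rather than something the lecture is known to use.

```r
library(ggplot2)

# Simulated training data with a quadratic trend (hypothetical)
set.seed(4)
train <- data.frame(x = runif(40, -3, 3))
train$y <- 1 - 2 * train$x + 0.5 * train$x^2 + rnorm(40, sd = 0.5)

# Fit a degree-2 polynomial to the training data
quad_fit <- lm(y ~ poly(x, 2, raw = TRUE), data = train)

# Our own function: takes a vector of x values, returns the fitted y values
quad_curve <- function(x) {
  predict(quad_fit, newdata = data.frame(x = x))
}

# stat_function() evaluates quad_curve across the x range and draws the curve
p <- ggplot(train, aes(x = x, y = y)) +
  geom_point() +
  stat_function(fun = quad_curve, colour = "red") +
  ggtitle("Quadratic Fit on the Training Set")
```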
After viewing this lecture, you will be able to write for loops in R.
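For readers who have not seen one before, a for loop in R has roughly this shape. The vector name `squares` is just an illustration.

```r
# Create an empty numeric vector to hold the results
squares <- numeric(5)

# Each pass through the loop sets i to the next value in 1:5
for (i in 1:5) {
  squares[i] <- i^2
}

squares  # 1 4 9 16 25
```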
After viewing this lecture, it will be easier for you to generate higher-degree polynomial models.
After viewing this video, you will be able to quickly generate test MSEs for higher-degree polynomials, using a bit of programming.
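A loop over polynomial degrees could be sketched as follows. The data are simulated, the names `max_degree` and `test_mse` are hypothetical, and using `poly(x, d)` to build the degree-`d` model is an assumption made for this example.

```r
# Simulated stand-in data with a curved trend (hypothetical)
set.seed(5)
df <- data.frame(x = runif(200, -3, 3))
df$y <- sin(df$x) + rnorm(200, sd = 0.3)

# Random split into training and test halves
train_idx <- sample(1:nrow(df), nrow(df) / 2)
train <- df[train_idx, ]
test  <- df[-train_idx, ]

# One test MSE per polynomial degree
max_degree <- 8
test_mse <- numeric(max_degree)

for (d in 1:max_degree) {
  fit  <- lm(y ~ poly(x, d), data = train)   # degree-d fit on training data
  pred <- predict(fit, newdata = test)       # predictions for the test rows
  test_mse[d] <- mean((test$y - pred)^2)     # store the test MSE for degree d
}
```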
After viewing this video, you will be able to generate a plot, via ggplot, of polynomial degree vs. MSE.
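One way such a plot might be built is to collect the degrees and test MSEs in a small data frame and hand it to ggplot. This sketch assumes ggplot2 is installed, uses simulated data, and the name `results` and the labels are placeholders.

```r
library(ggplot2)

# Simulated stand-in data (hypothetical)
set.seed(6)
df <- data.frame(x = runif(200, -3, 3))
df$y <- sin(df$x) + rnorm(200, sd = 0.3)
train_idx <- sample(1:nrow(df), nrow(df) / 2)
train <- df[train_idx, ]
test  <- df[-train_idx, ]

# One row per polynomial degree, with the test MSE filled in by the loop
max_degree <- 8
results <- data.frame(degree = 1:max_degree, mse = NA_real_)
for (d in 1:max_degree) {
  fit <- lm(y ~ poly(x, d), data = train)
  results$mse[d] <- mean((test$y - predict(fit, newdata = test))^2)
}

# Polynomial degree on the horizontal axis, test MSE on the vertical axis
p <- ggplot(results, aes(x = degree, y = mse)) +
  geom_point() +
  geom_line() +
  ggtitle("Test MSE by Polynomial Degree") +
  xlab("Polynomial degree") +
  ylab("Test MSE")
```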
After viewing this lecture, you will be able to write functions of two variables in R.
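A function of two variables in R takes two arguments and returns a value computed from both. The helper `mse` below is a hypothetical example, not necessarily the function written in the lecture.

```r
# A function of two variables: observed values and predicted values
mse <- function(actual, predicted) {
  mean((actual - predicted)^2)
}

# Squared errors are 0, 0, and 4, so the mean is 4/3
mse(c(1, 2, 3), c(1, 2, 5))
```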
In this video, we will take our for loop and set it inside a function of two variables.
After viewing this lecture, you will be able to repeatedly divide the original data set into training and test sets, calculate the test MSEs for a range of polynomial models, and plot the results. You will do this with the help of a bit of programming.
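Putting the loop inside a function of two variables and then calling it several times might look like the sketch below. It assumes ggplot2 is installed and uses simulated data. The function name `mse_by_degree`, its arguments `data` and `max_degree`, and the number of splits are all hypothetical choices; the lectures may pick different arguments.

```r
library(ggplot2)

# Simulated stand-in data (hypothetical)
set.seed(7)
df <- data.frame(x = runif(200, -3, 3))
df$y <- sin(df$x) + rnorm(200, sd = 0.3)

# Hypothetical helper: split `data` at random, then return the test MSE
# for each polynomial degree from 1 to max_degree
mse_by_degree <- function(data, max_degree) {
  train_idx <- sample(1:nrow(data), nrow(data) / 2)
  train <- data[train_idx, ]
  test  <- data[-train_idx, ]
  mse <- numeric(max_degree)
  for (d in 1:max_degree) {
    fit  <- lm(y ~ poly(x, d), data = train)
    pred <- predict(fit, newdata = test)
    mse[d] <- mean((test$y - pred)^2)
  }
  mse
}

# Repeat the random split several times and stack the results
n_splits <- 5
max_degree <- 6
results <- data.frame()
for (s in 1:n_splits) {
  results <- rbind(
    results,
    data.frame(split = s, degree = 1:max_degree,
               mse = mse_by_degree(df, max_degree))
  )
}

# One line per random split
p <- ggplot(results, aes(x = degree, y = mse, group = split)) +
  geom_line() +
  ggtitle("Test MSE by Degree for Several Random Splits")
```

Each call to `mse_by_degree` draws a fresh random split, so the lines differ from one another; how much they differ gives a sense of how sensitive the test MSE is to the particular split.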
In this lecture, I mention some problems associated with the method we have been discussing throughout the course. I also give information about an excellent resource on machine learning.
Dr. Charles Redmond is a professor in the Tom Ridge School of Intelligence Studies and Information Science at Mercyhurst University. He has been a member of the Department of Mathematics and Computer Systems at Mercyhurst for 21 years and has recently completed a term as chair of the department. Dr. Redmond received his PhD in mathematics from Lehigh University in 1993 and has published in the Annals of Applied Probability, the Journal of Stochastic Processes and Their Applications, Mathematics Magazine, the College Mathematics Journal, and Mathematics Teacher. In his spare time he enjoys making music and computer-generated art, reading, and owning a Clumber Spaniel.