R Datasets and Dataframes

A free video tutorial from R-Tutorials Training
Data Science Education

R Level 1 - Data Analytics with R

Use R for Data Analytics and Data Mining

07:22:32 of on-demand video • Updated April 2019

  • this course will show you how the most common types of graphs can be produced with R base
  • you will get a good understanding of functions and loops in R which are very useful programming skills to have
  • you will get the necessary theoretical background for R
  • you will learn how to create and handle different types of objects
  • you will get fluent in the R programming language to master your specific quantitative tasks
One of the great things about R is the availability of exercise data sets. Those are various data sets that come with base R or with an add-on package. This allows you to easily try out new features of R. It also makes sure that you can communicate a problem to the global R user community using the same data set. On your path to R mastery, you will encounter many code examples using those exact data sets, like mtcars and iris. I would highly recommend getting familiar with at least those two data sets.

Now, in this video, I will show you where those data sets are, how you can access them, how you can make yourself familiar with them, and how to manipulate them. And since we are mainly working with data frames already, I will also show you the basics of working with data frames.

To get a list, you could check out the package datasets. This is where all the base R data sets are located. All the famous ones, and most of the ones I use in my tutorials, are housed in this package. If we check this one out, we get an alphabetical list of the data sets available. For example, if we click on airmiles, we learn what this data is about, which format it has, the source and even the references. If you use one of these data sets for the first time, it's quite wise to check out this section. The variables, including their units, are well explained here. By the way, this is the same as if you would code ?airmiles. You would again get to the help section of this data set. But of course, if you import your own data set, all of this is not available. You should know what your data set is about anyway.

To get a quick overview of your data set, and this is of course also applicable to your own imported data sets, you could call head(airmiles) or tail(airmiles).
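
The lookup steps described above can be sketched in a few lines of R. This is only a minimal illustration, assuming a standard R installation where the datasets package is available; the object names (available, first_six, last_six) are made up for this example and do not come from the video.

```r
# List the names of the data sets that ship with the 'datasets' package.
# data(package = ...) returns an object whose $results matrix has an "Item" column.
available <- data(package = "datasets")$results[, "Item"]
print(head(available))

# Open the help page of a data set. Both forms are equivalent.
# They are commented out so the script also runs non-interactively.
# ?airmiles
# help("airmiles")

# First and last six observations of the airmiles time series.
first_six <- head(airmiles)
last_six  <- tail(airmiles)
print(first_six)
print(last_six)
```

The same head and tail calls work on a data frame you imported yourself, where no help page exists.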
This gives you the first and last six rows of observations. This is important to at least know how many variables there are and what the dimensions are, and you also get an idea about the class of each variable. Again, this is a quick way to look at the data without printing out the whole data set, which can slow down your work drastically if you have several thousand observations to work with.

Another way to orient yourself with new data is the summary function. If you run this one on the very famous mtcars data set, you get basic statistics like quartiles, minimum, maximum, median and mean for each variable. In this case, we can see that there are 11 variables.

You could also plot a data set to get a first impression. In this case, since there are 11 variables, we get a scatterplot matrix, which is not that helpful. There are too many plots to be able to see any patterns clearly. If you have only one variable in your data set, as in simple time series data, you could also use a histogram. Let's try the hist command of base R to get an idea about the distribution of the airmiles time series data. In this case, we quickly learn that most observations are in the first bin, between 0 and 5,000 miles. I would say that visual impressions are a valuable source of insight into your data. Base R has the functions plot and hist, which can be tweaked to get a quick impression of nearly any data out there.

While we are already talking about those preinstalled R data sets, let's also take a look at working with them, especially with data frames like mtcars. Let's recall the mtcars data set by calling head(mtcars), so that we see the variables that are in the data frame. We now want to learn how to manipulate such a data frame, and I will also show you general rules of working with data frames.
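
As a rough sketch of the overview functions just discussed, the following lines run summary, plot and hist on the built-in data. Writing the plots to a temporary PDF file is an assumption made here so the code also runs without a graphics window; in an interactive session you can drop the pdf() and dev.off() lines. The object names are illustrative only.

```r
# Basic statistics (min, quartiles, median, mean, max) for each mtcars variable.
mtcars_summary <- summary(mtcars)
print(mtcars_summary)

# Number of rows and columns; mtcars has 11 variables.
print(dim(mtcars))

# Send plots to a temporary file (assumption: no screen device is needed).
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# Plotting a whole data frame produces a scatterplot matrix of all variables.
plot(mtcars)

# Histogram of the single-variable airmiles time series.
# hist() also returns the bin breaks and counts it used.
airmiles_hist <- hist(airmiles)

dev.off()

print(airmiles_hist$breaks)
print(airmiles_hist$counts)
```

Inspecting the returned counts is a quick way to confirm which bin holds the most observations without reading it off the picture.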
If you want to extract a single column, you need to use the dollar sign. That way, the computer knows that you are talking about column X instead of row Y. Let's say we want to get the sum of the column weight in the mtcars data set. You might already know the function sum, but now we need to specify the name of the data frame, which is mtcars, then a dollar sign, and then the name of the variable, which is wt in this case: sum(mtcars$wt). If we run the line, we learn that this column has a total of 102.9 times 1,000 pounds, which is the unit for this variable stated in the documentation.

Now, this method with the dollar sign is fine if you work with several different data frames at the same time and you want to avoid confusion between the variables. But if you plan on working intensively with a single data frame, you can save yourself some time by attaching the data set to your environment. That way, R knows which data frame a given variable belongs to. If we attach the data set mtcars, we can get the same sum without using the dollar sign. We can just state the variable name, and now R automatically connects this call with the mtcars data set. Again, this is useful if you have one main data frame. It could, however, lead to confusion if you have several data frames with similar or identical variable names. By the way, if you want your data set to be attached only temporarily, you can undo it with the function detach, like I do now. If I now run the same function, sum(wt), R does not know anymore which data set I mean, and I get an error message.

So now that we know how to work with specific variables, I also want to show you how you can extract individual values from a data frame. As with all manipulations based on index positions, we need the box brackets for that. Let's say we want to extract the value of the variable wt for the second observation from the head of the data. We know that wt is variable number six.
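
Before moving on to index positions, the dollar sign and attach/detach steps described above can be sketched as follows. This assumes a fresh session in which no other object named wt exists; the tryCatch wrapper and the object names are additions for this example, used only so the expected error does not stop the script.

```r
# Sum of the weight column, addressed with the dollar sign.
# The wt variable is recorded in units of 1000 lbs.
total_weight <- sum(mtcars$wt)
print(total_weight)

# After attaching the data frame, its columns can be used by name alone.
attach(mtcars)
attached_weight <- sum(wt)
print(attached_weight)

# Undo the attachment again.
detach(mtcars)

# Now wt is no longer found, so sum(wt) raises an error.
# tryCatch turns the error into a message instead of halting the script.
result <- tryCatch(
  sum(wt),
  error = function(e) paste("Error caught:", conditionMessage(e))
)
print(result)
```

Keeping the attach and detach calls close together limits the window in which identically named variables from other data frames could be confused.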
To extract that value, we code the name of the data set, mtcars, then the box brackets. Inside the brackets, we first state the position of the row and then the position of the variable: mtcars[2, 6]. That way, we can extract very specific info from our data set. If you want to widen the selection, you can simply use the concatenate function c to insert a vector, as in the next line. So this line gives us the weight values for rows two, five and eight.

All right, guys, let's recap what we learned in this video. I showed you that there is a package called datasets in base R, which houses an array of preinstalled data sets. You can easily use these data sets to learn R and to try out new tools. I showed you how to get help for these preinstalled data sets. For a first impression of newly imported data, we discussed the functions head, tail and summary, and we also took a look at visual impressions with plot and hist. Very importantly, I showed you how to work with data frames. You can use the dollar sign to get access to a specific column, or you can even attach and detach the data frame to make coding easier. And lastly, I showed you how to extract data not only at the variable level, but also at the value level, by using the box brackets.
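
The index-based extraction covered in this video can be sketched as below. The row selection c(2, 5, 8) is an assumption about the exact rows used on screen, and the object names are illustrative.

```r
# Confirm the column position of wt in mtcars (it is variable number six).
wt_position <- which(names(mtcars) == "wt")
print(wt_position)

# Row first, then column: weight of the second observation.
second_weight <- mtcars[2, 6]
print(second_weight)

# Use c() to pass a vector of row positions and extract several weights at once.
selected_weights <- mtcars[c(2, 5, 8), 6]
print(selected_weights)

# The same selection written with the column name instead of its position.
selected_by_name <- mtcars[c(2, 5, 8), "wt"]
print(selected_by_name)
```

Selecting by column name gives the same values and keeps working if the column order of a data frame changes.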