R programming: What is the Apply family?

Kirill Eremenko
A free video tutorial from Kirill Eremenko
Data Scientist

Lecture description

Here you will learn about the three main functions of the apply family and how to use them.

Learn more from the full course

R Programming: Advanced Analytics In R For Data Science

Take Your R & R Studio Skills To The Next Level. Data Analytics, Data Science, Statistical Analysis in Business, GGPlot2

05:53:11 of on-demand video • Updated October 2020

  • Perform Data Preparation in R
  • Identify missing records in dataframes
  • Locate missing data in your dataframes
  • Apply the Median Imputation method to replace missing records
  • Apply the Factual Analysis method to replace missing records
  • Understand how to use the which() function
  • Know how to reset the dataframe index
  • Work with the gsub() and sub() functions for replacing strings
  • Explain why NA is a third type of logical constant
  • Deal with date-times in R
  • Convert date-times into POSIXct time format
  • Create, use, append, modify, rename, access and subset Lists in R
  • Understand when to use [] and when to use [[]] or the $ sign when working with Lists
  • Create a time series plot in R
  • Understand how the Apply family of functions works
  • Recreate an apply statement with a for() loop
  • Use apply() when working with matrices
  • Use lapply() and sapply() when working with lists and vectors
  • Add your own functions into apply statements
  • Nest apply(), lapply() and sapply() functions within each other
  • Use the which.max() and which.min() functions
Hello, and welcome back to the advanced course in R programming. In this tutorial we're going to learn about the apply family of functions. We're going to aim to understand the fundamental principle behind them, how they operate and why, and through that we will also see why they were created in the first place. So let's jump straight into it.

Here we've got a matrix of three rows and five columns. Let's say this matrix has a name, and it's called M. Now let's see what happens when we apply the simplest member of the apply family of functions, the apply() function, to this matrix. To do that, we would type in a line that looks something like this: apply(M, 1, mean). The first argument is M, the matrix that we want to use this function on. Then comes the number 1, and we'll see what the 1 means in just a second. Finally comes the function that we want to use on the matrix; here we're using the mean function.

So what does this mean? It means that we want to literally take the mean function and apply it to matrix M. This is where the 1 comes into play, because the 1 means that we want to apply the mean function to the rows of our matrix. That is exactly what happens when you run this line: apply() takes mean and applies it to every single row of the matrix, treating each row as a vector. That is where the name comes from: you want to apply something to your matrix, so you use the apply() function. What you get as a result is a vector of three values: 51 is the average of the first row, 41.6 is the average of the second row, and 43 is the average of the third row. So that's how the apply() function works.

Let's look at another example. Let's say we want to apply the max function, which returns the maximum, to this matrix, again to the rows. How would we do that?
Well, I would write apply(M, 1, max), meaning we want to take the max function, apply it to matrix M, and use the rows of this matrix. Once again, it applies max to the first row, to the second row and to the third row, and as a result we get a vector of three values: 111 is the maximum of the first row, 98 is the maximum of the second row, and 101 is the maximum of the third row. All right, so hopefully this makes sense; it's a pretty simple concept. You just want to apply something to all of the rows of a matrix.

But now let's look at an example like this. Let's say we again apply the mean function to matrix M, but this time, instead of the 1, we put a 2: apply(M, 2, mean). What happens then? Well, a 2 means columns, so we're applying the mean function to each one of the columns. What do we get as a result? We also get a vector, but not of three values. This time the vector has five values, because we have five columns. Here you can see those values: 24 is the mean of the first column, 43.7 is the mean of the second column, 43.3 is the mean of the third column, and so on. And that is how the apply() function works. As you can see, it's a very simple and straightforward concept.

The interesting part here is that when you want to apply the mean function to this matrix, your first temptation might be to create a for() loop, loop through the rows of the matrix, and apply the mean function to every single row. That is also an option, and further down in this section of the course we're actually going to experiment with that, test it out, and compare the two approaches. But a lot of the power of R lies in these apply functions, which let you achieve the same results with much less code and often more efficiently.
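A minimal sketch of the calls described above. The transcript does not give the values of the lecture's matrix, so M here is a hypothetical 3 x 5 matrix built from 1:15; the shape matches the slide, but the numbers (and therefore the results) do not. The variable names are illustrative only. The last part recreates the row means with a for() loop to show that apply() computes the same thing.

```r
# Hypothetical 3 x 5 matrix (not the lecture's data); matrix() fills
# column by column, so row 1 is 1, 4, 7, 10, 13
M <- matrix(1:15, nrow = 3, ncol = 5)

# MARGIN = 1: apply mean() to each row, treating each row as a vector
row_means <- apply(M, 1, mean)   # 7 8 9

# MARGIN = 1 with max(): the largest value in each row
row_max <- apply(M, 1, max)      # 13 14 15

# MARGIN = 2: apply mean() to each column, giving one value per column
col_means <- apply(M, 2, mean)   # 2 5 8 11 14

# The same row means recreated with an explicit for() loop
row_means_loop <- numeric(nrow(M))     # preallocate one slot per row
for (i in seq_len(nrow(M))) {
  row_means_loop[i] <- mean(M[i, ])    # M[i, ] extracts row i as a vector
}

print(row_means)
print(col_means)
print(identical(row_means, row_means_loop))  # TRUE
```

For these two specific summaries base R also provides rowMeans() and colMeans(); apply() is the general tool that accepts any function, including your own.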
These apply functions are a big part of why R is considered such a powerful tool for data mining, data analytics and everything to do with data science: you can achieve results much more efficiently, so you don't have to worry as much about coding. You just type in an apply function and it does the hard work for you. So there we go, that's the basic apply() function.

Now let's quickly have a look at the whole list of functions in the apply family. First there's the apply() function, which we just talked about: it's used on a matrix, and you can use it on either the rows or the columns. In fact, there's a bit more to it: you can specify the vector c(1, 2) and use it on the rows and columns at the same time. That's a bit more sophisticated, so we won't go into it; for now it's sufficient that we can use it on the rows or the columns. Then we've got the tapply() function, which is used on a vector to extract subgroups and apply a function to them. We've got the by() function, which is used on data frames and is the same concept as GROUP BY in SQL. Then there's the eapply() function, which is used on an environment, hence the letter e. Then you've got lapply(), a function that is applied to all elements of a list, hence the letter l. Then you've got sapply(), a version of lapply() that can simplify the results so they're not presented as a list but as a matrix or a vector, hence the letter s. You've got vapply(), which has a pre-specified type of return value, hence the letter v. And you've got replicate(), which can run a function several times and is usually used for generating random variables. This is one of the few members that doesn't actually have the word "apply" in its name. Also, don't confuse it with the rep() function, which we've used before; these are different functions.
Then we've got mapply(), which is a multivariate version of sapply() in which arguments can be recycled, hence the letter m. As you can see, this list just keeps going. And we've got rapply(), which is a recursive version of lapply(), hence the letter r. So we've got 10 functions in this family, and some people attribute even more functions to it that are not on the slide. Just from the size of this list, you can tell that this is a very popular concept and that R is very evolved and well thought through. There are lots of different applications for this family, and that's why it has so many members.

Of course, it would take many, many hours to get through all of these functions and master all of them. That's why in this section of the course we're going to focus on the three that are highlighted in bold on the slide: apply(), lapply() and sapply(). They are by far the most popular and most used ones, and they're the best ones to get started with. I wouldn't say that they're ultra simple; they do get more complex, and you will see that we build some very sophisticated examples using just these three functions. But by knowing these three functions, you will already have all the basics you need to explore the others that you might require along the way.

So hopefully you're excited about this section. We're definitely going to learn a lot, including some very powerful techniques in R. I can't wait to get started on the first tutorial, and I'll see you there. Until next time, happy coding.
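A compact sketch of the family members listed in this lecture, run on small made-up inputs. The matrix M, the vectors sales and region, the data frame df, the environment e and the lists lst and nested are all invented for illustration; the sketch only shows the general shape of each call, not how the course itself uses these functions.

```r
# apply() with MARGIN = c(1, 2): FUN is applied to every cell of the matrix
M <- matrix(1:15, nrow = 3)
cells <- apply(M, c(1, 2), function(x) x * 10)  # 3 x 5 matrix: 10, 20, ..., 150

# tapply(): split a vector into subgroups by a grouping vector, then summarise
sales  <- c(10, 20, 30, 40)                   # hypothetical data
region <- c("east", "west", "east", "west")
region_totals <- tapply(sales, region, sum)   # east = 40, west = 60

# by(): the data-frame counterpart, similar in spirit to SQL's GROUP BY
df <- data.frame(sales = sales, region = region)
region_means <- by(df, df$region, function(d) mean(d$sales))  # east = 20, west = 30

# eapply(): apply a function to every object in an environment
e <- new.env()
assign("a", 1:3, envir = e)
assign("b", 4:6, envir = e)
env_sums <- eapply(e, sum)                    # list with a = 6, b = 15 (order may vary)

# lapply() always returns a list; sapply() simplifies the result when it can
lst <- list(x = 1:3, y = 4:6)
as_list   <- lapply(lst, mean)                # list(x = 2, y = 5)
as_vector <- sapply(lst, mean)                # named numeric vector: x = 2, y = 5

# vapply(): like sapply(), but the type and length of each result are declared
checked <- vapply(lst, mean, FUN.VALUE = numeric(1))

# replicate(): re-evaluate an expression several times, e.g. random draws
set.seed(42)
sim_means <- replicate(3, mean(rnorm(10)))    # three means of 10 random normals

# rep() is a different function: it simply repeats values
repeated <- rep(c(1, 2), times = 3)           # 1 2 1 2 1 2

# mapply(): multivariate sapply(); FUN gets one element from each argument
pair_sums <- mapply(function(a, b) a + b, 1:3, 4:6)  # 5 7 9
recycled  <- mapply(function(a, b) a * b, 1:3, 10)   # 10 is recycled: 10 20 30

# rapply(): recursive lapply(); descends into nested lists and applies FUN
# only to leaves whose class is listed in `classes`
nested  <- list(a = c(1, 2, 3), b = list(c = c(4, 5), d = "text"))
doubled <- rapply(nested, function(x) x * 2, classes = "numeric", how = "replace")
# doubled$a is 2 4 6, doubled$b$c is 8 10, and doubled$b$d stays "text"
```

One design point worth noting: vapply() stops with an error if FUN returns something that does not match FUN.VALUE, whereas sapply() may quietly fall back to returning a list, which is why vapply() is often preferred in code that must always return the same type.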