The .agg() Method

Boris Paskhaver
A free video tutorial from Boris Paskhaver
Software Engineer | Consultant | Author

Lecture description

Certain situations may require different aggregation methods on different columns within our groupings. In this lesson, we'll invoke the .agg() method on our GroupBy object to apply a different aggregation operation to each inner column.

Data Analysis with Pandas and Python

Analyze data quickly and easily with Python's powerful pandas library! All datasets included --- beginners welcome!

20:34:30 of on-demand video • Updated September 2020

  • Perform a multitude of data operations in Python's popular "pandas" library including grouping, pivoting, joining and more!
  • Learn hundreds of methods and attributes across numerous pandas objects
  • Possess a strong understanding of manipulating 1D, 2D, and 3D data sets
  • Resolve common issues in broken or incomplete data sets
All right, in this lesson I'll introduce the .agg() method, which allows us to call different aggregation methods on the columns within our DataFrame. So let's begin by executing our code. There we can see a preview of the first three rows of fortune. The .agg() method is going to be called directly on our GroupBy object, which is stored in the sectors variable. So let's begin with sectors.agg, and then I'm going to open my parentheses to indicate a method.

So usually we've been calling the aggregation method directly. For example, I've done something like sectors.mean(), and that took the average for every numeric column that was available: the average for revenue, profits, and employees as a whole. I was also able to extract an independent column, let's say employees, and then get the mean just for that. What the .agg() method allows us to do is get the best of both worlds. We can specify the column and which operation we want to perform on it, and that allows us to perform completely different operations on each.

So for example, if I want to sum up the values in my revenue and profits columns for each grouping while performing an average, or mean, calculation for the employees column in each grouping, I can use the .agg() method. What it expects is a Python dictionary, which of course is indicated by a pair of curly braces. In this dictionary we use a column name as a key, and after the colon we provide the method or operation that we want to aggregate with as the value. So if I want to perform a sum aggregation on my revenue column, I write revenue as my key, then a colon, which is the standard dictionary syntax to connect a key to a value, and then the name of the aggregation method in quotes, which is sum. Then I put a comma to move on to the next pair. At this point I can also create a line break; there will be no errors when we execute.
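A minimal sketch of the calls mentioned so far, run on a small made-up stand-in for the fortune data. The column names ("Sector", "Revenue", "Profits", "Employees") and every value below are assumptions for illustration, not the course dataset.

```python
import pandas as pd

# Hypothetical stand-in for the fortune DataFrame (names and values are invented).
fortune = pd.DataFrame({
    "Sector": ["Retailing", "Retailing", "Energy", "Energy", "Energy"],
    "Revenue": [100, 200, 300, 400, 500],
    "Profits": [10, 20, 30, 40, 50],
    "Employees": [1000, 3000, 2000, 4000, 6000],
})

# The GroupBy object that the lesson stores in the sectors variable.
sectors = fortune.groupby("Sector")

# One aggregation applied to every numeric column at once.
all_means = sectors.mean()

# One aggregation applied to a single extracted column.
employee_means = sectors["Employees"].mean()

# .agg() with a dictionary: column name as key, operation name as value.
# The line breaks inside the braces are fine; Python ignores them.
revenue_totals = sectors.agg({
    "Revenue": "sum",
})

print(all_means)
print(employee_means)
print(revenue_totals)
```

The dictionary form returns only the columns named as keys, so revenue_totals has a single Revenue column here.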
So let's say for my profits column I also want to do a sum. I'm going to write profits, which matches my column name right here, then a colon, and again in quotes the name of the method that I want to apply, which is sum. Now of course, if I want to include the employees column, I just add another comma and write employees. Let's say for employees I want to do a completely different aggregation: I want to take the mean, or the average, so I write mean. What it's going to do is perform a separate aggregation on each column, depending on what instruction it gets from the dictionary. So now you'll see the results in the revenue and profits columns are the sums, or totals, for those groupings, while the values in the employees column are the averages, or means, of the employees for the companies in that grouping, that is, in that sector.

Now let's do a slightly different example. Let's provide a list of values to the .agg() method; that's the other thing we can give it. In the previous example we specified one operation for each column. Now let's say we want to apply multiple operations to multiple columns. So if I do sectors.agg and open the parentheses, let's say I don't want to apply just one operation, like sum or mean, to each of the columns revenue, profits, and employees. I want to take the size, I want to take the sum, and I want to take the mean. What I can do here is pass a list and write in size, sum, and mean. There are three columns times three operations, so when I aggregate this I'm going to get nine total columns. You can see that those three columns sit up top as the top level of my columns index, and under each one I have a calculation for every operation.
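The two forms described above can be sketched as follows. This is an illustration on invented data, assuming columns named "Sector", "Revenue", "Profits", and "Employees"; the real fortune dataset will give different numbers.

```python
import pandas as pd

# Hypothetical stand-in for the fortune DataFrame (names and values are invented).
fortune = pd.DataFrame({
    "Sector": ["Retailing", "Retailing", "Energy", "Energy", "Energy"],
    "Revenue": [100, 200, 300, 400, 500],
    "Profits": [10, 20, 30, 40, 50],
    "Employees": [1000, 3000, 2000, 4000, 6000],
})
sectors = fortune.groupby("Sector")

# Dictionary form: a different operation per column.
by_column = sectors.agg({
    "Revenue": "sum",
    "Profits": "sum",
    "Employees": "mean",
})

# List form: every operation applied to every column.
# 3 columns x 3 operations = 9 result columns under a two-level column index.
every_op = sectors.agg(["size", "sum", "mean"])

print(by_column)
print(every_op)
print(every_op.columns.tolist())
```

Selecting a top-level name such as every_op["Revenue"] returns just the three calculations for that column.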
So for example, I have 20 companies that fall into the aerospace and defense sector. The sum of their revenues is 357,940 and the average of their revenues is 17,897. Then the process repeats for profits and employees. And of course you'll see in this case the size is going to be exactly the same for each column, because size counts every row in the group. The tallies could differ between columns if we had something like null values and used count, which skips them. But the more important calculations, things like sum and mean, are now broken out for each of those three columns and calculated in a single DataFrame. So we've performed a massive calculation with just a single line of code. We've taken the sum and mean for every sector within our original fortune DataFrame just by grouping and calling this .agg() method.

So the .agg() method is basically just an aggregator. It accepts either a dictionary, where we specify what we want to aggregate each column by, or something like a list, in which case it loops through the columns and applies every operation in the list to each one. And of course we can combine the two. For example, if I take revenue here and give it a list, because I want the sum and the mean for revenue, and only the sum for profits and only the mean for employees, I can execute this and you'll see the results separated out for each one. For revenue, because I gave it a list, it does both sum and mean. For profits, since I gave it a single string value, it only does the sum. And for employees it calculates the average, or mean, of the number of employees in each group, because I only gave it a single string argument. So you can mix and match these two designs to specify which columns you want to apply operations to, which combination of operations you want to apply, and so on. So that's just a quick introduction to the .agg() method.
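A short sketch of the mixed form, where one dictionary value is a list and the others are single strings. As before, the data and the column names ("Sector", "Revenue", "Profits", "Employees") are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the fortune DataFrame (names and values are invented).
fortune = pd.DataFrame({
    "Sector": ["Retailing", "Retailing", "Energy", "Energy", "Energy"],
    "Revenue": [100, 200, 300, 400, 500],
    "Profits": [10, 20, 30, 40, 50],
    "Employees": [1000, 3000, 2000, 4000, 6000],
})
sectors = fortune.groupby("Sector")

# A list value requests several operations for that column;
# a single string requests just one.
summary = sectors.agg({
    "Revenue": ["sum", "mean"],
    "Profits": "sum",
    "Employees": "mean",
})

print(summary)
print(summary.columns.tolist())
```

Because at least one value is a list, the result uses a two-level column index, with one (column, operation) pair per requested calculation.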
There's a lot more that you can do with it, but hopefully this gets the brain juices flowing and gives you a general sense of how quickly you can aggregate and summarize the information contained within a DataFrame through the GroupBy object.