The .stack() Method

Boris Paskhaver
A free video tutorial from Boris Paskhaver
Software Engineer | Consultant | Author

Lecture description

The .stack() method moves an index from the column axis to the row axis. It essentially transfers the column labels into the row index. In this lesson, we'll see a live example using the worldstats dataset.

Learn more from the full course

Data Analysis with Pandas and Python

Analyze data quickly and easily with Python's powerful pandas library! All datasets included --- beginners welcome!

20:34:30 of on-demand video • Updated September 2020

All right, in this lesson let's return to MultiIndexes. I'll be introducing the stack method, which takes the columns (what we can call the column axis or the column index) and moves that index onto the main index, the row index on the left. Of course, this concept is much easier to see in action than to explain, so let's get right into it.

I'm going to be introducing a new dataset for this lesson and the next couple of ones, and it's called worldstats.csv. So let's call our read_csv method and import this worldstats file. This is basically economic data, I think from the World Bank or some other global organization. It has a column for the country, the year, and that country's population and GDP. The economic statistics themselves are largely irrelevant here. What we want is to create a MultiIndex, grouped by the country and the year.

To do that, I'm going to use the index_col parameter on my read_csv method. As a reminder, you can give it either a single string with a column name, which creates an index from that column's values, or you can give it a list, which is what we want to do here; that creates a MultiIndex, a multi-level index, in our DataFrame. So I want a country level and a year level within my MultiIndex. Let's take a look at what this looks like. And there you have it: country is the outer layer, then you have the year, and then you have the statistics.

I'm going to store this in a variable called world, and let's preview the first three rows with the head method and an argument of 3. As I mentioned, we're typically used to referring to the index as the labels on the left here. But the columns technically constitute an index as well. They're basically the other index, the index at axis position 1.
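
Here is a minimal sketch of the import step above, assuming pandas is installed. The real worldstats.csv isn't reproduced here, so the CSV text is a hypothetical stand-in with made-up numbers, and the column names (country, year, population, GDP) are assumptions based on the narration. Reading from io.StringIO simply lets the example run without the file.

```python
import io

import pandas as pd

# Hypothetical stand-in for worldstats.csv: the column names are assumed
# from the narration and every number is made up for illustration.
CSV_TEXT = """country,year,population,GDP
Arab World,2014,1000,2.5
Arab World,2015,1100,2.7
Canada,2014,500,1.5
Canada,2015,520,1.6
"""

# A list passed to index_col builds a two-level MultiIndex:
# country is the outer level and year is the inner level.
world = pd.read_csv(io.StringIO(CSV_TEXT), index_col=["country", "year"])

print(list(world.index.names))  # ['country', 'year']
print(world.columns.tolist())   # ['population', 'GDP']
print(world.head(3))            # preview the first three rows
```

With the real file you would pass its path instead, for example pd.read_csv("worldstats.csv", index_col=[...]) with whatever column names that file actually uses.
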
We use the term "columns" because it's much more common and a much simpler way to describe it. But really, an index is just an identifier for a value in a table, and the columns are an identifier for the specific column in which a value belongs. So the columns still constitute an index, and here's all the stack method does. I'm going to call it on my world DataFrame right below, and then explain it again: it takes the column-based index and moves it onto the row-based index.

When I execute this, you'll see that population and GDP move. These two labels have moved from the column header to here, in the row index. And of course, pandas has created those two labels for every single combination of values in my existing MultiIndex. Because I now need to store two rows, one for the population and one for the GDP, I've effectively doubled the size of my data from a row perspective. However, I've halved the number of columns compared to my original DataFrame, going from two columns to one. So the data has doubled in size one way and halved in size the other way. Again, we're just shifting or pivoting how the data looks. We're not changing its content so much as looking at it from a different angle.

The other thing that has happened, in addition to migrating those labels, is that the values move with them. The population value for Arab World in 2015 is right here, under Arab World, 2015, population, and there is that number again. Similarly, the first GDP value, which starts with 2.53, is listed right here on the GDP row, 2.53, for that same combination of Arab World and 2015.

But the big difference is that because pandas is left with one column of data, it has essentially converted our original DataFrame into a pandas Series. It's kind of confusing to have a single-column pandas Series with a three-level MultiIndex, but that's really what it is.
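
A minimal sketch of the stack step, using the same hypothetical stand-in for worldstats.csv (assumed column names, made-up numbers). It checks the claims above: the row count doubles, one column of data remains, the result is a Series with a three-level MultiIndex, and each value stays attached to its labels.

```python
import io

import pandas as pd

# Hypothetical stand-in for worldstats.csv (assumed columns, made-up numbers).
CSV_TEXT = """country,year,population,GDP
Arab World,2014,1000,2.5
Arab World,2015,1100,2.7
Canada,2014,500,1.5
Canada,2015,520,1.6
"""
world = pd.read_csv(io.StringIO(CSV_TEXT), index_col=["country", "year"])

# Move the column labels (population, GDP) onto the row index.
stacked = world.stack()

print(world.shape)             # (4, 2): four rows, two columns
print(stacked.shape)           # (8,): twice the rows, one column of data
print(stacked.index.nlevels)   # 3: country, year, former column label
print(type(stacked).__name__)  # Series

# The same value is reachable from both shapes. Because one column now holds
# both integer populations and float GDPs, the stacked values are floats.
print(world.loc[("Arab World", 2015), "population"])
print(stacked.loc[("Arab World", 2015, "population")])
```

Depending on your pandas version, stack() may also print a FutureWarning about a newer stack implementation; for a small frame with single-level columns and no missing values like this one, the shape and values should come out the same either way.
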
It's just one column of information. To identify any value in that column, you just need to provide three values to the index. It's technically not a multi-column, multi-row DataFrame; it's not two-dimensional, because in technical terms you only have to provide one piece of information to access a value here. It's just that that one piece of information has to consist of three parts: the country, the year, and the actual statistic (population or GDP) that you want to pull. In pandas terms, it is technically a one-dimensional object because there's only one column of data. And I can prove that this is a Series, by the way, by passing the resulting object into Python's type function, which confirms that it is in fact a Series.

Now, if you don't like that design, if you don't like the fact that it's being rendered as a Series, there is a convenient Series method called to_frame (that's to, underscore, frame), and predictably, what this method does is convert a Series to a DataFrame. If it's a MultiIndex Series, it keeps the MultiIndex. You can see here that country, year, and the new third level that we moved from the columns are all still present. That third level doesn't have a name, because the column index didn't have a name either. You can also see that this is now a DataFrame; we can tell by the bolder, table-style rendering. And because there is no name to describe the column that's now storing all these combined values, pandas has defaulted to its numeric naming scheme: there's one column left over, so it predictably gets 0, the first position in that index. Of course, if you want to change that, it's as simple as calling the rename method on the DataFrame and passing it a dictionary through the columns parameter.
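
A minimal sketch of the type check, to_frame, and rename steps, again on the hypothetical stand-in data (assumed column names, made-up numbers). The replacement column name "value" is an arbitrary choice for illustration, not something from the lesson.

```python
import io

import pandas as pd

# Hypothetical stand-in for worldstats.csv (assumed columns, made-up numbers).
CSV_TEXT = """country,year,population,GDP
Arab World,2014,1000,2.5
Arab World,2015,1100,2.7
Canada,2014,500,1.5
Canada,2015,520,1.6
"""
world = pd.read_csv(io.StringIO(CSV_TEXT), index_col=["country", "year"])
stacked = world.stack()

print(type(stacked))  # <class 'pandas.core.series.Series'>

# to_frame() wraps the Series in a one-column DataFrame and keeps the
# three-level MultiIndex. The stacked Series has no name, so the column is 0.
frame = stacked.to_frame()
print(frame.shape)             # (8, 1)
print(frame.columns.tolist())  # [0]

# rename() with a columns mapping replaces the default 0 label.
# "value" is just an illustrative name.
renamed = frame.rename(columns={0: "value"})
print(renamed.columns.tolist())  # ['value']
```

If you already know the name you want, to_frame also accepts a name argument (for example stacked.to_frame(name="value")), which skips the separate rename step.
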
There's actually a lesson on the rename method in this course, so you can use it to change that 0 to whatever you like. To summarize, the stack method essentially pivots a level: it takes the columns and stacks them onto the row-based index. So that's the stack method. In the next couple of lessons, we'll explore the unstack method, which, as you might predict, does the process in reverse: it takes a level of the row MultiIndex (by default, the innermost one) and moves it to serve as the columns. So let's take a look at that in the very next lesson.
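
As a preview of the next lessons, here is a minimal sketch showing unstack() reversing stack() on the same hypothetical stand-in data (assumed column names, made-up numbers). It relies on unstack()'s default of moving the innermost row level (level=-1) back to the columns; the upcoming lessons may cover other options.

```python
import io

import pandas as pd

# Hypothetical stand-in for worldstats.csv (assumed columns, made-up numbers).
CSV_TEXT = """country,year,population,GDP
Arab World,2014,1000,2.5
Arab World,2015,1100,2.7
Canada,2014,500,1.5
Canada,2015,520,1.6
"""
world = pd.read_csv(io.StringIO(CSV_TEXT), index_col=["country", "year"])
stacked = world.stack()

# unstack() defaults to level=-1, the innermost row level, which is the
# level that stack() created from the original column labels.
restored = stacked.unstack()

print(restored.shape)              # (4, 2)
print(list(restored.index.names))  # ['country', 'year']
print(restored)

# The values line up with the original again, though they may now be floats,
# since the stacked Series held integers and floats in a single column.
print(restored.loc[("Canada", 2015), "GDP"])
```
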