Use the pd.melt method to create a narrow dataset from a wide one

Boris Paskhaver
A free video tutorial from Boris Paskhaver
Software Engineer | Consultant | Author
4.7 instructor rating • 6 courses • 283,156 students

Lecture description

The pd.melt() can effectively perform anti-pivot operations. In this lesson, we'll call the method on a DataFrame to convert its current data structure into a more tabular format. We'll also explore the optional parameters available to modify the resulting column names in the new DataFrame.

Learn more from the full course

Data Analysis with Pandas and Python

Analyze data quickly and easily with Python's powerful pandas library! All datasets included --- beginners welcome!

20:34:30 of on-demand video • Updated September 2020

  • Perform a multitude of data operations in Python's popular "pandas" library including grouping, pivoting, joining and more!
  • Learn hundreds of methods and attributes across numerous pandas objects
  • Possess a strong understanding of manipulating 1D, 2D, and 3D data sets
  • Resolve common issues in broken or incomplete data sets
English [Auto] Are right in this lesson I'll introduce the PD melds method which basically works essentially as the reverse operation to a pivot table and essentially takes an aggregated data set and then converts it to a tabular format. Let me show you what I mean. I'm going to use my ritziest the method here to import a newsiest V called courters that CXXVI And for now I'm just going to save it in a variable called sales and I'm just going to output the entire variable right here. It is a fairly small data frame but here what you can see is we have the data frame with 10 salesmen that are listed right here in the very first column and to the right we have essentially their revenue number let's say for each quarter throughout the year. So here is how much Boras made in quarter one quarter to quarter three quarter four. Here's much Bob made and so on and so forth. Let's say that we want to convert this to a more modern data structure. We basically want it to say Boris followed by the quarter fourth followed by the revenue. So we basically want to take this information and essentially create one line for each of these data points right here. One line for each of these data points for Bob right here and basically move this information from the columns to an other single column that's going to represent that information. So again super hard to describe in words but let's take a look at how it works in practice so I'm going to call the method directly on panderers open my parentheses and the very first thing here as you can see is expecting a frame and our frame is being stored in sales. So I'm going to put sales in here. And then the parameter that I want to look at here is ID vars basically this is going to accept an argument of essentially in this case a single column that's basically going to indicate what's going to remain as the column that's going to be preserved essentially. Now the values here are going to be duplicated. But this is going to be the column that's that's going to be kept in place. So basically what we want to do is we want to keep the sales because for these columns Q1 Q2 Q3 and Q4 those are the coms that we basically want to destroy because we're going to take their values and store them in a brand new color. So what I'm going to do is for that ideavirus parameter on going to do is give it the argument of salesman which is the name of this column right here. So what it's going to do is it's going to keep these values of course it's going to create four copies of each one for each of these numeric values. Let's take a look at what we get. So right below here you can see that we've created a new three column data set. And you can see that Borris is now going to be remade because that salesman column is what we told it to keep. But now we have this new variable column with this variable column represents is the value that was stored in each one of these columns. So you can see as I start here we begin with Q1 and then we move on to Q2 and Q3 and Q4 and so on. So predictively if I take a look for example at me Boris in Q1 I had 600 in two thousand nine hundred and eight. And that's the value that's listed right here for Q1 which has been moved right here into this new variable column. And for example for Q2 I have 2 3 3 8 7 9 and that was on a single Borris from the original data frame. But now if I go to Borjas Q2 you can see that there is that value but now it's going to have its own separate row in this brand new data frame. So basically what we've done is essentially take these columns these values right here and basically condense them into a single variable column. So these values right here are going to represent my old column headers. Whoops. And then the values are just going to be the reformatted aligned values from those columns. Now the default panel is going to do is to call this column variable that's going to represent our old column headers right here and then call this column values which is basically going to represent the values here. And we can basically change that very easily through two other parameters. We have the var name parameter which is going to be basically this variable column that's being generated. So if we take that parameter and we provided a new argument this is really the quarter. This is basically specifying the name that's going to appear in the column that's going to serve as the collection of all the column headers right here. So we want that to be quarter. So I can see a salesman as appearing and this represents a quarter which is good. And finally we want to give another name to this value column which is basically going to be the default column header for all of the actual values that have been realigned from the original data frame and that's available in a value named parameter. You can see it right here. So instead of having value I'm just going to change that to be revenue. So now you can see we've created a new three column data frame it looks different from the original one but it's basically storing the exact same data. Just one once again which is the central theme of this whole module just realigning the data. Pivoting it looking at it from a different angle from a different perspective. Nothing has changed technically we haven't even you know aggregated or form any kind of calculations here. We've just taken these columns and condense them into a single column called quarter and then move the corresponding values for each person into their proper row where we're not going to have four rows for every single salesperson one for one row for each quarter for every single one of our nine sales people here. So you can see we have nine total people we started zero and we go to eight or nine total people times four rows one for each quarter is going to be 36 total rows. So here in this data if predictably we're going to have 36 total rows that's going to be 4 8 9 9 salesman times 4 quarters for 36 Toro's. So again the mountain that that is basically how we reorient and reorganize our original summarized data frame into a more tabular structure.