Sort a DataFrame with the sort_values Method, Part I

Boris Paskhaver
A free video tutorial from Boris Paskhaver
Software Engineer | Consultant | Author
4.7 instructor rating • 6 courses • 283,219 students

Lecture description

Call the .sort_values() method to sort the rows of a DataFrame based on the values in a single column. Calling it on a DataFrame is a bit more complex than calling it on a one-dimensional pandas Series.

Learn more from the full course

Data Analysis with Pandas and Python

Analyze data quickly and easily with Python's powerful pandas library! All datasets included --- beginners welcome!

20:34:30 of on-demand video • Updated September 2020

  • Perform a multitude of data operations in Python's popular "pandas" library including grouping, pivoting, joining and more!
  • Learn hundreds of methods and attributes across numerous pandas objects
  • Possess a strong understanding of manipulating 1D, 2D, and 3D data sets
  • Resolve common issues in broken or incomplete data sets
All right, in this lesson we'll talk about the sort_values method when called on a DataFrame. Let's begin by executing our code and previewing the first three rows of our NBA data set with the head method.

As you may recall, whenever we had a Series, all we had to do was call sort_values on it and it would automatically be sorted. That's because a Series is just a one-dimensional column of data, and pandas knows which values you expect it to sort. Sorting a DataFrame is a little more complex, because pandas has no clue which column you're referring to when you call the sort_values method on a DataFrame.

So if I write nba.sort_values and add my parentheses, as for every method, let's take a look at the parameters. You can see that the very first parameter listed is by, and that's where pandas is expecting, in this case, a string with the name of the column that we want to sort by. So if we pass something like Name, it's going to sort the rows and use the values in the Name column to do so. You can see that the names that come first alphabetically, the ones that start with A, now show up first, and all of the other values in each row come along with them. So it sorts the entire DataFrame, but only by the values in the Name column.

And predictably, just like when we call the sort_values method on a Series, the sort_values method on a DataFrame also has an ascending parameter, set to True by default, which means alphabetical order on a column of strings. But we can also set it equal to False, and that's going to sort the Name column in reverse alphabetical order and sort the rest of the DataFrame along with it.

Let's try a couple more examples. Let's say we want to sort by a column of, in this case, floating points, say the Age column. I could write nba.sort_values, open my parentheses, and the first parameter is by, which is the column that pandas wants to sort by.
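
The single-column sort described above can be sketched as follows. This is a minimal illustration, not the lesson's actual notebook: the small nba DataFrame is a hypothetical stand-in with invented names and values, and the capitalized column names are assumed.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's NBA data set (all values invented).
nba = pd.DataFrame({
    "Name": ["Carl", "Abe", "Dmitri", "Bo"],
    "Age": [25.0, 31.0, 19.0, 22.0],
})

# `by` is the first parameter: the column whose values drive the sort.
# Whole rows move together, so each Age stays attached to its Name.
by_name = nba.sort_values(by="Name")

# `ascending` defaults to True (alphabetical for strings);
# False reverses the order.
by_name_desc = nba.sort_values("Name", ascending=False)

print(by_name)
print(by_name_desc)
```

Because by is the first positional parameter, passing the column name with or without the by= keyword gives the same result.
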
So this time I want to sort by Age, and you can see that the smallest ages show up at the top. We have these players who are 19 years old, and it proceeds upwards in ascending order to the greatest age. Now if we change this to ascending equals False when we're dealing with a number like an integer or a floating point, that means descending: show the greatest number, or in this case the oldest age, first and then proceed downwards to the smallest age.

Let's do one more example with Salary. If we call nba.sort_values and just sort by Salary, we get the lowest salaries at the beginning and the greatest salaries at the end. If we want to look at, say, the top-earning NBA players, we can change the ascending parameter to False instead of its default argument of True. And there we have our top-earning players in the NBA: Kobe Bryant, LeBron James, Carmelo Anthony. Cool.

And as with the regular sort_values method, this operation does not modify the original DataFrame until we add the inplace equals True argument. Only at that point, if I preview nba immediately afterwards, can you see that we've sorted it by salary and overwritten the original nba DataFrame.

Now, to conclude this lesson, I want to emphasize one point that I haven't mentioned. You may recall that a lot of these columns have null values, NaN values, and in the last lessons we talked about how to fill those in. But in case we don't fill them in, as in this lesson, you may be wondering where those NaN values will be placed in a sort operation. And the answer is that by default they're placed at the very end.
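
The numeric sorts and the in-place update described above might look like the sketch below. Again, the DataFrame is a hypothetical stand-in with invented values and assumed column names; it is not the lesson's data.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's NBA data set (all values invented).
nba = pd.DataFrame({
    "Name": ["Carl", "Abe", "Dmitri", "Bo"],
    "Age": [25.0, 31.0, 19.0, 22.0],
    "Salary": [5_000_000.0, 1_200_000.0, 900_000.0, 9_500_000.0],
})

# Ascending (the default) puts the smallest numbers first.
youngest_first = nba.sort_values("Age")

# ascending=False puts the greatest numbers first.
oldest_first = nba.sort_values("Age", ascending=False)
top_earners = nba.sort_values("Salary", ascending=False)

# None of the calls above changed nba itself; each returned a new DataFrame.
order_before = nba["Name"].tolist()

# With inplace=True the method returns None and overwrites nba.
result = nba.sort_values("Salary", ascending=False, inplace=True)

print(order_before)
print(nba)
```

An alternative to inplace=True is to reassign the result, as in nba = nba.sort_values("Salary", ascending=False), which leaves the same sorted DataFrame bound to the name.
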
So if I take nba, call sort_values, and pass something like the Salary column, then chain the tail method here, you can see that it's showing us all of these null values. That's because it sorts in ascending order, which means the smallest salaries show up first, then it proceeds all the way to the highest salary, and only then does it place all of the NaNs, or rather the rows with null values in the Salary column. Those are placed at the very end here. That's due to a parameter called na_position, and you can see it's set by default to last. So the operation that we just performed is equivalent to writing it out like this.

Now, if we want to do the opposite, where we want the null values to populate the top of the DataFrame, we can change this argument to first. That still keeps the ascending order for the regular values, but it places the nulls first, and when it's done with the nulls, it begins sorting the salary values in ascending order.

And we can mix and match, by the way. If we want our null values at the very top and then the salaries sorted in descending order, starting at the greatest salary and proceeding downwards to the smallest, something like this will work. You can see at the very top we have all of our null values in the Salary column. And if I take a look at my last five rows with the tail method, there we have our smallest salaries. So it starts at the greatest salary, after all those null values, and proceeds downwards to the smallest salary.

So that is the sort_values method when called on a DataFrame, and in the next lesson we'll take a look at how to sort a DataFrame by multiple columns.
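
The na_position behavior walked through above can be sketched as follows. The DataFrame is a hypothetical stand-in with invented values, two of them missing, and the column names are assumed.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's NBA data set (all values invented).
nan = float("nan")
nba = pd.DataFrame({
    "Name": ["Carl", "Abe", "Dmitri", "Bo", "Eli"],
    "Salary": [5_000_000.0, nan, 1_200_000.0, 9_500_000.0, nan],
})

# Default behavior: rows with a missing Salary go to the very end.
default_sort = nba.sort_values("Salary")

# The same call with the defaults written out explicitly.
explicit_sort = nba.sort_values("Salary", ascending=True, na_position="last")

# na_position="first" moves the missing rows to the top;
# the remaining salaries are still sorted in ascending order.
nulls_first = nba.sort_values("Salary", na_position="first")

# Mix and match: missing rows on top, then salaries from greatest to smallest.
nulls_first_desc = nba.sort_values("Salary", ascending=False, na_position="first")

print(default_sort.tail())
print(nulls_first_desc)
```

Note that na_position is independent of ascending: the missing rows go wherever na_position says, regardless of the direction used for the remaining values.
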