Select One Column from a Pandas DataFrame

Boris Paskhaver
A free video tutorial from Boris Paskhaver
Software Engineer | Consultant | Author

Lecture description

Use two syntactical options to extract a single column from a pandas DataFrame. I prefer the square bracket approach because it works 100% of the time. The alternative option is using dot syntax, which treats the columns as attributes of the larger DataFrame object.

Learn more from the full course

Data Analysis with Pandas and Python

Analyze data quickly and easily with Python's powerful pandas library! All datasets included --- beginners welcome!

20:34:30 of on-demand video • Updated September 2020

  • Perform a multitude of data operations in Python's popular "pandas" library including grouping, pivoting, joining and more!
  • Learn hundreds of methods and attributes across numerous pandas objects
  • Possess a strong understanding of manipulating 1D, 2D, and 3D data sets
  • Resolve common issues in broken or incomplete data sets
English [Auto]
All right, in this lesson I'll introduce two different syntaxes that we can use to extract a single column from a DataFrame. Let's begin by executing our code to import our NBA CSV and assign it to the nba variable. I'm also going to preview the first three rows of the DataFrame with the head method, just so that we have a reference point.
All right. So I'm going to introduce the simpler syntax first and then explain why I don't prefer it. The easiest way to extract a single column is by writing the name of the DataFrame, and then we have a dot as always. In DataFrames like this, where our columns have single-word names, with no spaces and no funny characters, we can actually just write out the column name directly. Now, you have to watch out for case sensitivity, so you have to enter it exactly as it's written, keeping all the capital letters capital, all the lowercase letters lowercase, and so on. So if I wanted to extract the Name column on the left, all I have to do is write Name here. It does not require any quotes or any fancy syntax symbols, just the name of the column right after that period. And there we have the column extracted.
Now, if you take a second, could you guess what object pandas is returning to us here? If you guessed a Series, you would be correct. Whenever we extract a single column from a DataFrame, it's going to extract it as a Series. It's going to keep the index labels from our original DataFrame and take the values directly from that column.
So let's try another example. Let's say we want the Number column. It's once again going to be the name of the DataFrame, followed by a dot as always, and then the name of the column that we want to extract, written as it's written in the DataFrame but with no double quotes surrounding it. So if I want Number, there it is.
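The dot syntax described above can be sketched as follows. This is a minimal illustration, not the lesson's actual notebook: the rows are made up and stand in for the NBA CSV, which is not reproduced here. Only the column names Name, Number, and Salary come from the transcript.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's NBA data (made-up rows, not the real CSV).
nba = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Number": [0, 99, 30],
    "Salary": [1000000, 2500000, 500000],
})

# Dot syntax: DataFrame name, a dot, then the column name with exact casing and no quotes.
names = nba.Name
numbers = nba.Number

# The extracted column is a Series that keeps the DataFrame's index labels.
print(isinstance(names, pd.Series))
print(names.index.equals(nba.index))
```

Both print calls should report True for this small example.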
And there we have our Series of the values from the Number column in our DataFrame. Let's do one more example. Let's extract the values from the Salary column. I'm going to write the name of my DataFrame, a dot, and then the name of the column, with a capital S of course. And there we have it.
If you enter it with the wrong casing, or if you enter a column name that does not exist, you're going to get an error. Just as proof of this, I'm going to try this out with a lowercase s, and you'll see that you get an AttributeError below. Let's change this back. And that's the first option that we can use.
What I can do here to collapse this: my first option, as always, is just to double-click this. But if I want to hide all the output, a little trick is that you can type output = None, and what that will do is prevent the cell from showing any of those lines of output. So if I execute that, that's just a way to save some space so that we can see what we're working with.
So that's the very first way that we can extract one column. And there's a reason why I don't like this approach: it doesn't work 100 percent of the time. It only works when our column names do not have spaces, and we can't guarantee that. I'd rather work with a syntax that is guaranteed to work 100 percent of the time, even if it requires a little more typing, than a syntax like this, which is only going to work some of the time.
So the second way to extract a column from a DataFrame is with bracket syntax. That's going to be nba and then a pair of square brackets. Now, you may recall from the previous module that when we use brackets on a Series, we can pass either an index label or an index position within those brackets, and that's going to return the corresponding value. When we use brackets on a DataFrame, in comparison, we are going to get columns back in return.
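The two ways dot syntax can fail, as described above, can be sketched like this. The data is made up, and the "Player Name" column is a hypothetical example of a name containing a space. The compile call is only a way to show, without crashing the example, that Python rejects the expression before pandas ever sees it.

```python
import pandas as pd

# Hypothetical data; "Player Name" is an invented column name containing a space.
nba = pd.DataFrame({
    "Player Name": ["Player A", "Player B"],
    "Salary": [1000000, 2500000],
})

salaries = nba.Salary  # correct casing: works

# Wrong casing (or a column that does not exist) raises AttributeError.
try:
    nba.salary
    wrong_case_failed = False
except AttributeError:
    wrong_case_failed = True

# A column name with a space cannot be written after the dot at all:
# Python itself refuses to parse the expression.
try:
    compile("nba.Player Name", "<example>", "eval")
    space_failed = False
except SyntaxError:
    space_failed = True

print(wrong_case_failed, space_failed)
```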
So whenever we want to select columns from a DataFrame, that's when we want to use the bracket syntax. Let's extract the exact same three columns that we did in the cell above. If I want the Name column, I'm going to have to write it out as it's spelled, but this time around we do need to surround it in double quotes. So I can extract the Name column this way: square brackets after my DataFrame, and then the name of my column in quotes. This line right here does exactly the same thing as the one up here. It's a little bit longer, with a few more symbols required, but this is going to work 100 percent of the time. So there we have it.
And just to reiterate that point: if you can imagine that, instead of this being Name, this column was Player Name, then something like Player Name would not work with the dot syntax. As long as there are spaces, the code above in this cell is going to run into an error. Whenever you run into spaces, pandas and Python get really confused. In comparison, the code below with the square brackets will always work. If the Name column was actually Player Name, this code would work. If there were any symbols, this code would work. This method is a lot safer and a lot less prone to error, which is why I prefer it in my work. But as you explore other people's Jupyter notebooks, you'll see it done many different ways.
Let's do two more examples. Let's say we want to extract the Number column. Once again, we're going to write the name of our DataFrame, square brackets, and then the name of a column in double quotes. If I want the Salary column, there it is: here we have all of the values from the Salary column. If I want the Number column, once again just write the name of the DataFrame, write the square brackets and the double quotes, and write the name of the column that you'd like to extract.
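A short sketch of the bracket syntax described above, again on made-up data. The "Player Name" column is hypothetical and is included only to show that a name with a space is not a problem for this syntax.

```python
import pandas as pd

# Hypothetical stand-in data; "Player Name" deliberately contains a space.
nba = pd.DataFrame({
    "Player Name": ["Player A", "Player B", "Player C"],
    "Number": [0, 99, 30],
    "Salary": [1000000, 2500000, 500000],
})

# Bracket syntax: the column name goes inside the brackets as a quoted string.
names = nba["Player Name"]  # works even though the name has a space
numbers = nba["Number"]
salaries = nba["Salary"]

# For a simple one-word name, both syntaxes return the same values.
print(salaries.equals(nba.Salary))
```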
So there we have the Number column. As a reminder, the first way that we introduced in the previous module to hide this output is just to double-click this gray area to the left, and there it is. That buys us some more space.
There are two things that I want to conclude this lesson with. The first is just proving to you that whenever we extract a single column, like the Name column, if I pass it to Python's built-in type function, we can see that it is in fact a pandas Series. So a single column, when extracted from a DataFrame, is going to be extracted as a Series object.
The other thing is that we have to remember that it's giving us a brand new object, and that object is going to contain all of its regular methods and attributes. So for example, when we extract something like Name, this is giving us back a brand new Series. Series have their own methods and attributes, so we can continue operating on it like any other Series, and we can continue method chaining, so to speak. If we wanted the first five rows of the Name Series that results when we extract the Name column from our nba DataFrame, we can call the head method right on it. pandas is going to get that Name column, give us back a Series, and then move to the next operation in line, which is the head method, which we can call on a Series. That's going to give us the first five rows of the Name Series extracted from the DataFrame. So don't be afraid to chain multiple methods and attributes in sequence, and to play around within the Jupyter notebook to see how all of these operations work.
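The type check and the method chaining described above might look like this. It is a sketch on invented rows, with seven of them so that head has something to trim.

```python
import pandas as pd

# Hypothetical stand-in data with seven made-up rows.
nba = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F", "G"],
    "Salary": [1, 2, 3, 4, 5, 6, 7],
})

# Passing the extracted column to the built-in type function shows it is a Series.
column_type = type(nba["Name"])
print(column_type is pd.Series)

# Method chaining: extract the column, then call head on the resulting Series.
first_five = nba["Name"].head()
print(first_five)
```

Here head is called with no argument, which returns the first five rows by default.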
Many times when you're interacting with one pandas object, it's going to give us back another pandas object. It's not just in cases like this, but in cases we've seen like the sum method in the previous lesson, where we get a Series back, and things like the value_counts method, which give us back another Series with a different format. So don't be afraid to try out these additional methods, and always keep in mind what kind of object we're getting back and what functionality is available on that specific object.
So in this lesson we explored how to select a single column from a DataFrame, and in the next lesson we will talk about selecting two or more columns from a pandas DataFrame.
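As a rough illustration of the point above about methods returning other pandas objects, the sketch below calls sum on a DataFrame and value_counts on a Series. The data and the "Team" column are hypothetical, and numeric_only=True is used here only so that the text column is skipped.

```python
import pandas as pd

# Hypothetical stand-in data; the "Team" column is invented for this example.
nba = pd.DataFrame({
    "Team": ["Hawks", "Hawks", "Bulls"],
    "Number": [0, 99, 30],
    "Salary": [1000000, 2500000, 500000],
})

# DataFrame.sum returns a Series with one total per numeric column.
totals = nba.sum(numeric_only=True)

# Series.value_counts returns another Series: unique values as the index, counts as the values.
team_counts = nba["Team"].value_counts()

print(type(totals).__name__, type(team_counts).__name__)
```

Checking the type of each intermediate result like this is a quick way to find out which methods and attributes are available for the next step in a chain.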