Common String Methods - lower, upper, title, and len

A free video tutorial from Boris Paskhaver
Software Engineer | Consultant | Author

Lecture description

String methods in pandas require a .str prefix to operate properly. In this lesson, we'll explore four popular string methods we can invoke on all values in a Series:

  • str.lower() to convert a string's characters to lowercase

  • str.upper() to convert a string's characters to uppercase

  • str.title() to capitalize the first letter of every word in a string

  • str.len() to return a count of the number of characters in a string

Learn more from the full course

Data Analysis with Pandas and Python

Analyze data quickly and easily with Python's powerful pandas library! All datasets included --- beginners welcome!

20:33:02 of on-demand video • Updated November 2021

  • Perform a multitude of data operations in Python's popular "pandas" library including grouping, pivoting, joining and more!
  • Learn hundreds of methods and attributes across numerous pandas objects
  • Possess a strong understanding of manipulating 1D, 2D, and 3D data sets
  • Resolve common issues in broken or incomplete data sets
All right, let's explore four common string methods and show how we can actually call them on our Series within our DataFrame. Let's begin by executing our code. And there we have our Chicago data set. I'll actually begin with a review of how these methods work on regular Python strings, just as a little bit of a warm-up act. So we have the lower, the upper, the title, and the len methods.

The lower method converts all of the characters in a string to lowercase. So, for example, if I have something like "Hello World" and I call lower on it, and the great thing about Jupyter Notebook is that we can write regular old Python here, you can see it's converted all of the letters to lowercase. And it doesn't matter if they're currently in lowercase. For example, if I just have "hello world", it's going to convert all of the characters to lowercase regardless of what they currently are.

The second, complementary method is, of course, upper, and that does the reverse. So if I have a string that's all lowercase and I call upper on it, that converts all of the characters to uppercase. There's my "HELLO WORLD" that's coming from this example right here.

I also have a convenient method called title. Now, what title does, if I write out "hello world", is it capitalizes the first letter of every word, and the way it figures that out is with spaces. So it's going to capitalize the h because it's the first character in this word, and it's going to capitalize the w because it's the first character in this word. And as you can see, in "Hello World" there is an uppercase letter at the beginning of each word.

And finally we have the len built-in Python function. Now, len isn't something that we actually call on a string. Rather, we pass a string into the len built-in function and that just tells us the number of characters within that string. So once again, if I have "hello world", it's actually going to give us 11 because spaces count. So "hello" is five characters.
The space is the sixth character, and here are five more characters for a total of 11. So there we have our familiar Python string methods.

Now, when it comes to using these on entire columns, the syntax is going to be a little bit different than what you might expect. So I'm actually going to create a few cells below here and get that Python stuff out of the way. You would think it would simply be a matter of extracting a specific column, let's say I want to take the Name Series, and then calling a string method on it. Unfortunately, a method like title here is not going to work. It's actually going to trigger an error. And I think this has something to do with the back end, with conflicts with the existing Python method names.

So the solution here, and the way that the pandas library is designed, is that whenever you call a string method, it has to be prefixed with another combination of letters, which is .str. So here I have my Series and, just like with a regular method, I have to begin with a dot, and then I'm going to have this little precursor, str, that's basically short for string. After that str, I place another dot, and that's where I actually write the string method that I want to use. So if I want to apply the lower method to every string within my Name Series, I do .str.lower(), and you'll see that what that will do is convert everything to lowercase.

Now unfortunately, all of these columns are already in uppercase, so we can't necessarily see the impact of calling upper on any of them. But I can actually go ahead and chain these methods along right here, so I can call upper on my lowercased Series so you can see the impact. And whenever we chain string methods, we still have to include that extra prefix. So I'm going to do .str.upper(), give it another set of parentheses, and there we have it in uppercase. Let's take a look at a few more examples.
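
As a rough sketch of the steps described so far: the plain Python warm-up, the error from calling a string method directly on a Series, and the .str prefix with chaining. The Series below is a made-up stand-in for the Name column; its values are hypothetical and not taken from the course file.

```python
import pandas as pd

# Plain Python strings: methods are called directly on the string.
greeting = "Hello World"
lowered = greeting.lower()        # every character lowercase
uppered = greeting.upper()        # every character uppercase
titled = "hello world".title()    # first letter of each word capitalized
length = len("hello world")       # built-in function; the space counts too

# Hypothetical stand-in for the Name column (made-up values).
names = pd.Series(["DOE, JANE A", "SMITH, JOHN B"], name="Name")

# Calling a string method directly on the Series raises an error,
# because a Series has no title attribute of its own.
try:
    names.title()
    direct_call_failed = False
except AttributeError:
    direct_call_failed = True

# With the .str prefix, the method runs on every value in the Series.
lower_names = names.str.lower()

# When chaining, each string method needs its own .str prefix.
round_trip = names.str.lower().str.upper()
```

One caveat for plain Python: title() treats any non-letter character as a word boundary, not only spaces, so "they're".title() gives "They'Re".
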
We also have the convenient title method, and I think that's the one that we really want here. So, for example, if I wanted to make the names look nice and pretty, I can extract that Name column and then call the title method. I can't just go ahead and do .title(). I have to attach that .str prefix before using another dot and the string method that I want. Title is going to capitalize the first character of every separated word for every value in my Series. So there we have a brand new Series.

Similarly, if I wanted to do the same thing for Position Title, I can index chicago with Position Title to extract it. There is my regular Series where I have all of my values in uppercase. If I want to just capitalize the first letter of each word, I have to begin with the .str prefix, another dot, and call my string method, and then we have them in a much more presentable format. And within seconds, pandas has performed that operation on thirty-two thousand different values. Pretty impressive when you think about it.

And again, what we're returning here is a brand new Series. So if we want to overwrite our original Series, we can assign the new one back to that column. And then we're going to have a normal-looking Series. I'm just going to preview the first couple of rows of chicago, and you can see we've replaced all of the values in the Position Title column with much more presentable, prettier-looking data.

And finally, we discussed the len method, or rather, in Python it's a built-in function, and in pandas it's actually built as a method. So let's say I want to take the number of characters in each of the values in my Department column. I'm going to begin by extracting it. Now, unfortunately, if we pass this whole Series into len, it's just going to give us the number of rows. That's the default design. So in order to get the number of characters in every single value here, we just use the exact same syntax as we saw above.
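
The title-casing and overwrite steps for the Position Title column could look roughly like the sketch below. It assumes a tiny hypothetical DataFrame named chicago with made-up rows, and it assumes the column labels are spelled "Name" and "Position Title"; the course loads the real data from a file that isn't shown in the transcript.

```python
import pandas as pd

# Hypothetical stand-in for the Chicago data set (made-up rows,
# assumed column labels).
chicago = pd.DataFrame({
    "Name": ["DOE, JANE A", "SMITH, JOHN B"],
    "Position Title": ["WATER RATE TAKER", "POLICE OFFICER"],
})

# .str.title() returns a brand new Series; chicago is unchanged so far.
pretty_titles = chicago["Position Title"].str.title()
still_uppercase = chicago["Position Title"].tolist()

# Assigning the result back overwrites the original column.
chicago["Position Title"] = chicago["Position Title"].str.title()

# Preview the first couple of rows to confirm the change.
preview = chicago.head(2)
print(preview)
```
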
We do .str, which is the common prefix for a string method, and then another dot and then the method we want to call, which in this case is going to be len. So when we're working with a Series in pandas, len is not a built-in function here, it's actually an available method. It just has to be prefixed again with that str three-letter combination. And if we take a look at this, it's going to calculate the number of characters within every single one of those values. So water management has eleven, and then police on rows one and two have six.

So those are just four common string methods: lower, upper, title, and len. And in this lesson, we also introduced that little extra .str prefix. This is going to be very common throughout these lessons for our string methods. We do have to place it before any of our string methods when we're calling them on our Series. Otherwise we will get a pandas error. And in the next lesson, we'll continue diving into more string methods, starting with the replace method.
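
To make the difference between the len built-in and the .str.len() method concrete, here is a small sketch. The Department values are hypothetical stand-ins: the abbreviated spelling of the first value is an assumption, chosen so that the counts match the eleven and six mentioned in the lesson.

```python
import pandas as pd

# Hypothetical stand-in for the Department column (assumed spellings).
departments = pd.Series(["WATER MGMNT", "POLICE", "POLICE"], name="Department")

# The built-in len counts the rows in the Series, not characters.
row_count = len(departments)

# The .str.len() method counts characters in each value
# and returns a new Series of integers.
char_counts = departments.str.len()
print(char_counts)
```
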