Common String Methods - lower, upper, title, and len

Boris Paskhaver
A free video tutorial from Boris Paskhaver
Software Engineer | Consultant | Author

Lecture description

String methods in pandas require a .str prefix to operate properly. In this lesson, we'll explore four popular string methods we can invoke on all values in a Series:

  • str.lower() to convert a string's characters to lowercase

  • str.upper() to convert a string's characters to uppercase

  • str.title() to capitalize the first letter of every word in a string

  • str.len() to return a count of the number of characters in a string
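
As a rough illustration of the four methods listed above, here is a minimal sketch on a small Series. The sample values and variable names are made up for this example and are not taken from the course's data set.

```python
import pandas as pd

# Hypothetical sample data (not the course's Chicago data set).
names = pd.Series(["ALICE SMITH", "bob jones", "Carol Lee"])

lowered = names.str.lower()   # every character converted to lowercase
uppered = names.str.upper()   # every character converted to uppercase
titled = names.str.title()    # first letter of each word capitalized
lengths = names.str.len()     # number of characters in each value

print(pd.DataFrame({
    "original": names,
    "lower": lowered,
    "upper": uppered,
    "title": titled,
    "len": lengths,
}))
```

Each call returns a new Series of the same length; the original names Series is left unchanged.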

Learn more from the full course

Data Analysis with Pandas and Python

Analyze data quickly and easily with Python's powerful pandas library! All datasets included --- beginners welcome!

20:34:30 of on-demand video • Updated September 2020

  • Perform a multitude of data operations in Python's popular "pandas" library including grouping, pivoting, joining and more!
  • Learn hundreds of methods and attributes across numerous pandas objects
  • Possess a strong understanding of manipulating 1D, 2D, and 3D data sets
  • Resolve common issues in broken or incomplete data sets
English [Auto]

All right, let's explore four common string methods and show how we can call them on a Series within our DataFrame. Let's begin by executing our code, and there we have our Chicago data set.

I'll begin with a review of how these methods work on regular Python strings, just as a little bit of a warm-up. We have the lower, upper, and title methods and the len function. The lower method converts all of the characters in a string to lowercase. For example, if I have something like "Hello World" and I call .lower() on it (the great thing about Jupyter Notebook is that we can write regular old Python here), you can see it's converted all of the letters to lowercase. It doesn't matter if they're currently in lowercase. For example, if I just have "hello world", it's going to convert all of the characters to lowercase, regardless of what they currently are.

The second, complementary method is of course upper, and that does the reverse. If I have a string that's all lowercase and I call upper on it, that converts all of the characters to uppercase. There's my "HELLO WORLD", coming from this example right here.

I also have a convenient method called title. What title does, if I write out "hello world", is capitalize the first letter of every word, and the way it figures that out is with spaces. So title is going to capitalize the H, because it's the first character in this word, and it's going to capitalize the W, because it's the first character in this word. You can see "Hello World" there is uppercase at the beginning of each word.

And finally we have the len built-in Python function. len isn't something that we call on a string; rather, we pass a string into the built-in len function, and it tells us the number of characters within that string. So once again, if I do "hello world", it's going to give us 11, because spaces count. So "hello" is five characters.
The space is the sixth character, and here are five more characters, for a total of 11. So there we have our familiar Python string methods.

Now, when it comes to using these on entire columns, the syntax is going to be a little bit different than what you might expect. I'm going to create a few cells below here to get that Python stuff out of the way. You would think it would simply be a matter of extracting a specific column, let's say the Name Series, and then calling a string method on it. Unfortunately, a method like title is not going to work here. It's going to trigger an error, and I think this has something to do with conflicts on the back end with existing Python method names.

The solution, and the way the pandas library is designed, is that whenever you call a string method, it has to be prefixed with another combination of letters, which is .str. So here I have my Series, and just like a regular method, I begin with a dot, and then I have this little precursor, .str, which is basically short for string. After .str I place another dot, and that's where I write the string method that I want to use. If I want to apply the lower method to every string within my Name Series, I do .str.lower(), and you'll see that it converts everything to lowercase.

Now, unfortunately, all of these columns are already in uppercase, so we can't necessarily see the impact of calling upper on any of them. But I can chain these methods right here, calling upper on my lowercased Series so you can see the impact. Whenever we chain string methods, we still have to include that .str prefix. I'm going to do .str.upper, give it another set of parentheses, and there we have it in uppercase.

Let's take a look at a few more examples. We also have the convenient title method, and I think that's the one we really want here.
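
The behavior described above can be sketched roughly as follows. The Series values here are hypothetical stand-ins for the Name column, chosen only to be all uppercase like the data in the lesson.

```python
import pandas as pd

# Hypothetical uppercase names standing in for the Name column.
names = pd.Series(["DOE, JANE A", "ROE, RICHARD B"])

# Calling a string method directly on the Series fails,
# because a Series has no .title attribute of its own.
try:
    names.title()
    error_raised = False
except AttributeError:
    error_raised = True

# With the .str prefix, the method is applied to every value.
lowered = names.str.lower()

# When chaining, each string method needs its own .str prefix.
round_trip = names.str.lower().str.upper()

print(error_raised)
print(lowered)
print(round_trip)
```

The chained call lowercases every value and then uppercases it again, so round_trip ends up equal to the original Series.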
For example, if I want to make the names look nice and pretty, I can extract that Name column and then call the title method. I can't just go ahead and write .title; I have to attach that .str prefix before using another dot and the string method that I want. title is going to capitalize the first character of every separated word for every value in my Series. So there we have a brand new Series.

Similarly, if I want to do the same thing for Position Title, I can extract it from chicago. There is my regular Series, where all of my values are in uppercase. If I just want to capitalize the first letter of each word, I have to begin with the .str prefix, another dot, and the title string method. And there we have them in a much more presentable format. Within seconds, pandas has performed the operation on 32,000 different values. Pretty impressive when you think about it.

Again, what we're returning here is a brand new Series, so if we want to overwrite our original Series, we can assign the result back to the column. Then we're going to have a normal-looking Series. I'm going to preview the first couple of rows of chicago, and you can see we've replaced all of the values in the Position Title column with much more presentable, prettier-looking data.

And finally, we discussed len. In Python it's a built-in function, but in pandas it's built as a method. Let's say I want the number of characters in each of the values in my Department column. I'm going to begin by extracting it. Now, unfortunately, if we pass this whole Series into len, it's just going to give us the number of rows. That's the default design. So in order to get the number of characters in every single value here, we use the exact same syntax as we saw above.
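
A minimal sketch of the title-and-overwrite step, assuming a DataFrame named chicago with a column labeled "Position Title". The column label and the two rows below are assumptions made for illustration; the real data set is not reproduced here.

```python
import pandas as pd

# Hypothetical two-row stand-in for the chicago DataFrame.
chicago = pd.DataFrame({
    "Name": ["DOE, JANE A", "ROE, RICHARD B"],
    "Position Title": ["WATER RATE TAKER", "POLICE OFFICER"],
})

# .str.title() returns a brand new Series; the DataFrame is not modified yet.
titled = chicago["Position Title"].str.title()
unchanged_before_assign = chicago["Position Title"].tolist()

# Assigning the result back overwrites the original column.
chicago["Position Title"] = titled

print(chicago.head())
```

One caveat worth knowing: title treats any non-letter character as a word boundary, not only spaces, so a value such as "they're" becomes "They'Re".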
We write .str, which is the common prefix for a string method, then another dot, and then the method we want to call, which in this case is len. So when we're working with a Series in pandas, len is not a built-in function; it's an available method. It just has to be prefixed, again, with that three-letter .str combination. If we take a look at this, it calculates the number of characters within every single one of those values, so water management has 11, and police on rows 1 and 2 has 6.

So those are just four common string methods: lower, upper, title, and len. In this lesson we also introduced that little .str prefix. It's going to be very common throughout these lessons for our string methods, and we have to place it before any of our string methods when we're calling them on our Series. Otherwise we will get a pandas error. In the next lesson, we'll continue diving into more string methods, starting with the replace method.
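
The difference between the built-in len function and the .str.len() method can be sketched like this. The three department values are illustrative, picked so that the character counts match the 11 and 6 mentioned in the lesson; they are not copied from the data set.

```python
import pandas as pd

# Built-in len on a plain Python string counts characters, spaces included.
plain_length = len("hello world")

# Illustrative department values (assumed for this sketch).
departments = pd.Series(["WATER MGMNT", "POLICE", "POLICE"])

# Built-in len on a Series counts rows, not characters.
row_count = len(departments)

# .str.len() counts the characters in each individual value.
char_counts = departments.str.len()

print(plain_length)
print(row_count)
print(char_counts)
```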