Introduction to Seaborn

Sundog Education by Frank Kane
A free video tutorial from Sundog Education by Frank Kane
Founder, Sundog Education. Machine Learning Pro

Lecture description

Seaborn both sits on top of Matplotlib to make it better, and introduces new kinds of visualization tools that can help you extract meaning from data. We'll walk through a bunch of examples using real fuel efficiency data for 2019 cars.

Learn more from the full course

Autonomous Cars: Deep Learning and Computer Vision in Python

Learn OpenCV, Keras, object and lane detection, and traffic sign classification for self-driving cars

12:44:34 of on-demand video • Updated May 2020

  • Automatically detect lane markings in images
  • Detect cars and pedestrians using a trained classifier and with SVM
  • Classify traffic signs using Convolutional Neural Networks
  • Identify other vehicles in images using template matching
  • Build deep neural networks with TensorFlow and Keras
  • Analyze and visualize data with NumPy, Pandas, Matplotlib, and Seaborn
  • Process image data using OpenCV
  • Calibrate cameras in Python, correcting for distortion
  • Sharpen and blur images with convolution
  • Detect edges in images with Sobel, Laplace, and Canny
  • Transform images through translation, rotation, resizing, and perspective transform
  • Extract image features with HOG
  • Detect object corners with Harris
  • Classify data with machine learning techniques including regression, decision trees, Naive Bayes, and SVM
  • Classify data with artificial neural networks and deep learning
All right, let's talk about Seaborn now, which is basically Matplotlib++, if you will. Go back to Section 3 here, and again, if you haven't already started your Jupyter Notebook, make sure you take care of that first: go to your Anaconda Prompt, cd into the directory you installed the course materials into, and type jupyter notebook to kick it off. Now go into Section 3 and open up Seaborn.ipynb. Seaborn is a visualization library that sits on top of Matplotlib, and part of what it does is make Matplotlib a little prettier to look at. But it also has a bunch of kinds of charts and graphs that we didn't have in Matplotlib, and some of them are very useful for computer vision, so we will be using Seaborn quite a bit in this course. As an example, we'll start off again with %matplotlib inline, meaning that we want to view all of our results as part of this notebook, within the browser. We'll import pandas as pd and load up a fuel-efficiency CSV file that I've uploaded to my website. This is real data, by the way: it comes from the US government and covers the fuel efficiency of every car they have a record of for the 2019 model year specifically. So let's extract some information from it to play with. Let's start by extracting the number of gears from the resulting DataFrame and calling value_counts(). If you remember from our pandas tutorial, that gives us back the data we need for a histogram: how many times each unique value occurs in our DataFrame. So this should give us back a Series that maps each gear count to the number of times it appeared. We can then just plot that, saying that we want a bar chart. So right now we're just using Matplotlib as is, just to visualize this data.
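A minimal sketch of the gear-count step, assuming a pandas DataFrame named `df` with a hypothetical `Gears` column. In the lecture, `df` comes from `pd.read_csv` on the course's fuel-efficiency CSV; the gear values here are made up so the snippet runs on its own.

```python
import pandas as pd

# Made-up stand-in for the course data. In the lecture, df comes from
# pd.read_csv(...) on the 2019 fuel-efficiency CSV; "Gears" is a hypothetical name.
df = pd.DataFrame({"Gears": [8, 8, 8, 6, 6, 10, 8, 7, 6, 8, 9, 1]})

# value_counts() returns a Series mapping each unique gear count to how often
# it appears, sorted from most to least common.
gear_counts = df["Gears"].value_counts()
print(gear_counts)

# Plain pandas/Matplotlib bar chart of those counts (no Seaborn styling yet).
# In a notebook with %matplotlib inline, the figure renders below the cell.
ax = gear_counts.plot(kind="bar")
ax.set_xlabel("Number of gears")
ax.set_ylabel("Number of models")
```

Because `value_counts()` sorts by frequency, the bars come out from most to least common rather than in gear order; chaining `.sort_index()` before `.plot()` would put them in numeric order instead.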
And there you have it: you can see that the eight-speed transmission seems to be the most common one, followed by six-speed, and then there's a sort of exponential drop-off to other, more obscure values. Now let's use Seaborn. In its most basic form, Seaborn can just make Matplotlib look better. All we need to do is import seaborn as sns and then call sns.set(), which replaces Matplotlib's default settings with the more modern-looking settings Seaborn gives us. Matplotlib is pretty old; it's modeled on MATLAB, and it's kind of showing its age, quite frankly, so this gives it a more modern look and feel. So now we can draw the same exact bar chart with the Seaborn defaults applied, and you can see it's a little bit prettier. We have more muted tones, and it's drawn against a nice gray background with a grid that makes the values a little easier to read. Otherwise it's pretty much the same, just easier on the eyes. Let's dive into some more depth and take a closer look at the data we're dealing with. Here's the raw DataFrame we loaded from the government data, and we're calling head() to look at the first five rows. The information I've extracted is: the car manufacturer, like Aston Martin or Volkswagen; the carline, which is basically the model; the engine displacement, which is how many liters the engine is; how many cylinders are in the engine; the transmission type; its city MPG; its highway MPG; the combined city-plus-highway MPG value; and the number of gears the car has. So that's the information we have to play with here. Now, Seaborn has some plots that Matplotlib doesn't offer at all.
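A rough sketch of switching on Seaborn's styling and peeking at the data. The column names (`Mfr`, `Gears`, `CombMPG`) and the rows are hypothetical stand-ins, not the actual contents of the course CSV.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Replace Matplotlib's defaults with Seaborn's theme (gray "darkgrid" background,
# muted palette). The lecture calls sns.set(); newer Seaborn versions name this
# sns.set_theme(), and sns.set() remains an alias for it.
sns.set_theme()

# Made-up rows standing in for the course CSV; column names are hypothetical.
df = pd.DataFrame({
    "Mfr": ["aston martin", "volkswagen", "volkswagen", "general motors", "bmw", "ferrari"],
    "Gears": [8, 6, 7, 8, 8, 7],
    "CombMPG": [18, 30, 27, 22, 25, 15],
})

# head() shows the first five rows, as in the lecture's look at the raw data.
print(df.head())

# The same pandas bar chart as before, now drawn with Seaborn's styling applied.
ax = df["Gears"].value_counts().plot(kind="bar")
```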
For example, there's distplot, which plots a histogram together with a smooth distribution curve overlaid on top of it. Let's try that on the combined MPG column. Here we have a histogram of how many times each combined MPG value appears. You can see a spike around the low-to-mid 20s; that seems to be the most common MPG rating for a vehicle. Seaborn also overlays a trend curve automatically as part of this plot, without us even trying, and that makes the bigger trends easier to see. That's helpful here because we have some odd values in between the others, which suggests there's some sort of quantization in our data, and the trend line smooths over that a little. So that's sometimes a useful way to visualize things. Another thing unique to Seaborn is the pair plot. This is cool stuff because it lets you visualize plots of every possible combination of a set of attributes, so you can look at every way of viewing a set of values and find the ones that look interesting and might be worth investigating more deeply. As an example, let's classify cars by how many cylinders they have and look for relationships between the number of cylinders and the city, highway, and combined MPG ratings. Let's start by extracting those columns from our DataFrame into df2, using the same syntax we introduced in our pandas tutorial. So we now have a new DataFrame that contains only the cylinders and MPG columns from our original data.
Now watch this: if we call pairplot on that new DataFrame df2, we can tell it to focus on cylinders as the primary attribute, coloring the points by cylinder count, and give it a height so the plot is nice and big and easy to see. Let that run. What we have here is a grid of plots, which is kind of neat. Scrolling down a little so we can see what's going on, you'll notice that every column appears along one side and every column appears along the other side as well. So if you want to see combined MPG versus cylinders, you look at one cell, and if you want highway MPG versus city MPG, you look at another; that way you can find interesting linear relationships between different columns. For example, just looking at the cylinders column, there's a pretty clear relationship between the number of cylinders and the MPG, whether city, highway, or combined: as the number of cylinders increases, MPG drops. But there's a really wide spread for four-cylinder vehicles, so there's more to the story there: some four-cylinder vehicles are really bad and some are really good. So already we've got some useful insight into our data. We can also use a scatter plot, which is new in Seaborn 0.9. It's sort of a prettier version of the Matplotlib one: you plot individual data points across any two axes you want and see how your data is distributed across those dimensions. So we call sns.scatterplot, saying the x-axis is engine displacement and y is combined MPG, and for the data itself we refer to df, the DataFrame holding our raw data. This plucks out those two columns and plots them against each other on a scatter plot. And there you have it.
Each point in our DataFrame is scattered onto this plot according to its engine displacement and combined MPG value. Again, you can see a relationship here, so already we're getting some insights from visualizing the data. The lower engine displacements tend to have a very wide spread of MPG, but in general, the bigger the engine displacement, the worse the fuel efficiency, which shouldn't be a big surprise. Another cool thing in Seaborn is the joint plot, which lets you visualize a scatter plot and a histogram for each axis at the same time. Let's look at the same engine displacement versus combined MPG data, but this time with a joint plot instead of a scatter plot. Here's what it looks like: we have the same scatter plot as before, but with histograms along each axis. Over on this side is the histogram of MPG ratings, so you can easily see how the data rolls up, and up here is a histogram of the engine displacement values. That makes it a lot easier to tell that the most common engine displacement is a little under two liters. It's an easier way of figuring out how many dots are in a given section, because the dots often overlap and the raw scatter isn't that intuitive to read; the histogram makes the distribution of the data easier to see. Another thing Seaborn offers is lmplot, which is just a scatter plot with a linear regression applied to it automatically. So if I call lmplot instead of scatterplot on the same data, I get back the same exact scatter plot, but with a linear regression fitted to it.
If you look really closely, you can see a shaded area around the line, which gives you the bounds on that regression; we'll talk about linear regression in more depth later in this course. Basically, we're fitting a line to the data we have. Very simple concept. Back in Matplotlib we talked about box plots, and Seaborn has its own version of them as well: box-and-whisker plots. In this example, let's take each vehicle manufacturer and visualize the miles-per-gallon ratings across the vehicles they produce. So we're going to draw an individual box plot for each manufacturer, showing the distribution of MPG ratings across its entire product line. There are a lot of manufacturers, so we need to do a couple of things to take advantage of what Seaborn offers. First, we set the figure size to 15 by 5, which just makes it bigger so we can fit more information on the screen. Then we define the box plot itself: we put the manufacturer on the x-axis and the combined MPG values on the y-axis, using our original DataFrame df as the data, and we save that box plot into a variable. We then set the tick labels on that plot to have a 45-degree rotation, which makes them easier to read because there are a lot of them. The syntax is to call set_xticklabels() with the x tick labels we get back from get_xticklabels() and a rotation of 45 degrees. In other words, set the labels on the x-axis to the existing labels, leaving them unchanged, but with a rotation of 45 degrees. Let's go ahead and kick that off; set_xticklabels prints some output of its own as part of the process.
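A sketch of the per-manufacturer box plot with a wide figure and rotated labels. Manufacturer names and MPG values are made up, and `tick_params(labelrotation=45)` is swapped in for the lecture's `set_xticklabels` call as one way to rotate the existing labels without re-setting their text.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows standing in for the course data (hypothetical column names).
df = pd.DataFrame({
    "Mfr": ["aston martin"] * 3 + ["volkswagen"] * 4
           + ["general motors"] * 4 + ["ferrari"] * 3,
    "CombMPG": [17, 18, 18, 14, 22, 30, 33, 20, 22, 23, 35, 14, 15, 16],
})

# A wide figure (15 x 5 inches) so every manufacturer fits side by side.
fig, ax = plt.subplots(figsize=(15, 5))

# One box-and-whisker plot per manufacturer: the box spans the middle 50% of
# MPG values, and points beyond the whiskers are drawn as outliers.
sns.boxplot(x="Mfr", y="CombMPG", data=df, ax=ax)

# Rotate the existing x tick labels by 45 degrees so long names stay readable.
ax.tick_params(axis="x", labelrotation=45)
```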
But here's the chart itself, and it's pretty interesting. You can see the 45-degree angle we specified being used on the labels, which makes them a lot easier to read, and you can look at the spread of MPG values for each individual manufacturer. Volkswagen has a really wide range, for example, whereas Aston Martin is pretty tightly clustered, and Volvo is also pretty tight. General Motors tends to cluster around the mid 20s, but it has a lot of outliers on the higher end as well, so there seem to be a few very efficient General Motors cars out there. Then we have Ferrari, with obviously not very good MPG, because people who drive Ferraris care more about performance than fuel efficiency, I think. So there are interesting insights to be gained from this box-and-whisker plot of fuel efficiency across the models of each vehicle manufacturer we know about. It's fun stuff, and it's pretty to look at, again with the modern, pleasing colors you get out of the box. There's also the swarm plot, which, instead of boxes and whiskers, plots each individual data point, but groups the points together in a way that makes them easier to read. We'll do a swarm plot on the same exact thing: manufacturer name and combined MPG from our DataFrame. Again, we set the rotation to 45 degrees on the x-axis and kick it off. The only difference is that we're doing a swarm plot instead of a box plot. Instead of boxes and whiskers, we get a format where the points are clumped together to represent the distribution of the data better: each individual vehicle is plotted as a point on the graph, and we spread those points out horizontally to reflect their distribution a little better.
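The swarm plot version differs from a box plot by a single function call. As before, the rows are made-up stand-ins with hypothetical column names.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows standing in for the course data (hypothetical column names).
df = pd.DataFrame({
    "Mfr": ["aston martin"] * 3 + ["volkswagen"] * 4
           + ["general motors"] * 4 + ["ferrari"] * 3,
    "CombMPG": [17, 18, 18, 14, 22, 30, 33, 20, 22, 23, 35, 14, 15, 16],
})

fig, ax = plt.subplots(figsize=(15, 5))

# swarmplot draws every vehicle as its own point and nudges points sideways
# within each manufacturer's column so they don't overlap.
sns.swarmplot(x="Mfr", y="CombMPG", data=df, ax=ax)
ax.tick_params(axis="x", labelrotation=45)
```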
So it's a way of looking at the raw data more directly than in a box plot, but it's still grouped in a way that gives you the same information as a box plot, just in more detail. That's what we call a swarm plot. Looking more deeply into Volkswagen again, you can see they have a pretty wide spread: there's a bunch around 30, a bunch around 10, and not much in between. That's kind of a curious case. I think it's because Volkswagen actually owns a bunch of different brands targeted at very different markets, so we're probably seeing the consumer vehicles up here and the performance vehicles way down here; that would be my guess. General Motors is very tightly clustered in this range; they're more about mass-market vehicles, so they want to be in that sweet spot of cars that are reasonably efficient but also perform well enough to appeal to an American audience. Anyway, that's just another way of looking at it. One more is the count plot, which is basically the same thing as a histogram, but for categorical data. A histogram is really only a histogram if you're dealing with numerical values; if you're dealing with categories, it's called a count plot. As an example, let's extract the manufacturer names and make a count plot counting up how many vehicles each manufacturer has. Again, we rotate the labels by 45 degrees so we can actually read them, and there we have it. It's just like a histogram except that it's broken down by category, so there's no real inherent meaning to the order the bars appear in; they're just counts broken down by category. That's all a count plot is. You can see pretty clearly here that General Motors has the most car models available.
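A sketch of the count plot on made-up manufacturer names. The `order` argument, which sorts the bars by count, is an addition not mentioned in the lecture; without it, bars appear in the order each category is first encountered in the data.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up manufacturer column (hypothetical name and counts).
df = pd.DataFrame({
    "Mfr": ["general motors"] * 5 + ["bmw"] * 4 + ["volkswagen"] * 3
           + ["ferrari"] * 2 + ["aston martin", "rolls-royce"],
})

fig, ax = plt.subplots(figsize=(15, 5))

# countplot tallies rows per category, like a histogram for categorical data.
# Sorting categories by frequency makes the ranking easier to read.
order = df["Mfr"].value_counts().index
sns.countplot(x="Mfr", data=df, order=order, ax=ax)
ax.tick_params(axis="x", labelrotation=45)
```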
BMW follows closely, and again, these are big companies that own other manufacturers, so we're not necessarily saying there are over 100 different BMW models on the market in 2019; those counts include other brands they own as well. On the other end, there are very few Aston Martin models and a very low number of Rolls-Royce models, for example, so you can see the distribution of how many models each manufacturer produces very easily. Finally, let's take a look at a heat map. Heat maps are fun: they're a way of plotting 2D data where the color represents the value within each cell of a table. It makes more sense if you just look at it. We'll make a pivot table from our original DataFrame: a 2D table mapping the average combined MPG rating for each combination of the number of cylinders and engine displacement. We run pivot_table on our original DataFrame to extract this 2D information, basically a 2D array, kind of like a DataFrame, that maps cylinders against engine displacement, where each cell contains the combined MPG for that combination. We aggregate the values using the mean, which means we look across all the matching cars and take the mean for each combination of cylinders and engine displacement. So if more than one car has, say, a four-cylinder 2.0-liter engine, we take the average of all those cars to arrive at the value in that cell. So this is what that looks like as a heat map. A lot of the data is missing, because apparently there's no such thing as a 12-cylinder 1.4-liter engine. That would be crazy.
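A sketch of the pivot-table-plus-heat-map step on made-up rows (hypothetical columns). Two toy cars share a four-cylinder 2.0-liter engine, so that cell holds their mean; combinations with no cars come out as NaN, which `sns.heatmap` leaves blank.

```python
import pandas as pd
import seaborn as sns

# Made-up rows standing in for the course data (hypothetical column names).
df = pd.DataFrame({
    "Cylinders": [4, 4, 4, 6, 6, 8, 8, 12],
    "Displ":     [2.0, 2.0, 1.4, 3.0, 3.6, 5.0, 6.2, 6.5],
    "CombMPG":   [28, 26, 33, 22, 21, 17, 16, 13],
})

# Rows = cylinder counts, columns = displacements, cells = mean combined MPG
# over all cars sharing that (cylinders, displacement) pair; NaN if none.
pivot = df.pivot_table(index="Cylinders", columns="Displ",
                       values="CombMPG", aggfunc="mean")
print(pivot)

# Each cell's color encodes its value; NaN cells are masked out.
ax = sns.heatmap(pivot)
```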
But these cells represent all the combinations we actually have data for in our DataFrame, and the color of each cell corresponds to its value. Here's the legend showing what those colors mean: black is somewhere around 12 MPG. So if you have a 16-cylinder, 8-liter engine, that's going to have really horrible fuel efficiency, just 12 MPG on average. That's how you read this thing, and just by looking at it you can see that toward this corner, with a low number of cylinders and low engine displacement, the colors are very light, because those cars are more fuel efficient. As you get down toward this corner, with lots of cylinders and lots of engine displacement, you get worse and worse fuel efficiency. So this heat map makes it very easy to visualize how MPG ratings change as a function of where they are in the plot. So that's the heat map. If you want to try this out on your own, here's a little challenge for you. Try to explore the relationship between the number of gears a car has and its combined MPG rating, and visualize these two dimensions of data in a bunch of different ways: do a scatter plot, an lmplot, a joint plot, a box plot, and a swarm plot. What conclusions can you draw from that? Before you scroll down, give it a try yourself; I left you some empty cells to play with. No peeking ahead of time. I do have my solution down below if you want to take a look when you're done and compare your results to mine. But if you do get stuck, feel free to scroll down; my answers are down there. So have some fun with that, and I hope that makes Seaborn a little more real to you. Again, we're going to be using it quite a bit throughout this course. It's a very useful visualization library that is also nice to look at.
And there you have it.