Dot Plots, Histograms and Skewness

Kashif Altaf
A free video tutorial from Kashif Altaf

Lecture description

In this lesson of the Statistics for Business Analytics course, we will learn how to interpret and draw dot plots and histograms, and what skewness means for a histogram.

Learn more from the full course

Statistics for Data Science, Data and Business Analysis

Master Statistics for Data Science, Probability and Statistics, and excel in careers of Data Science & Business Analysis

04:52:45 of on-demand video • Updated July 2020

  • Learn the fundamentals of Statistics
  • Learn how to organize and display different types of Data
  • Interpret and Draw Histograms, Pie Charts, Stem-and-Leaf Displays, Box Plots and much more!
  • Understand Average, Mode, Median, Standard Deviation and Variance
  • Understand Z-Scores and their usage
  • Take informed business decisions based on insights from Data.
  • Build a strong foundation for a career in R, Python and Data Science
Hi, and welcome to another lesson. Picking up where we left off in the last lesson, let's now discuss dot plots, histograms and skewness in histograms.

So the dot plot is one of the simplest graphical summaries of data, very easy to draw and very insightful. So how do we draw it? First of all, we draw a horizontal axis which shows the range of the data values, starting from the smallest to the largest value. Then each data value is represented by a dot placed above that axis at the appropriate place.

So if we consider the example of that same data of the superstore, our favorite example, we have this dot plot. As you can see, on the x-axis we have the spending of those customers, starting from the lowest value, 52, to the largest value, 109. So we have calibrated our x-axis based on that range; the x-axis represents the range of spending. And you can see these dots above certain locations on the slide. Here the first dot is above 52, right? You remember the lowest value was 52. So this is the first dot, and the last dot represents the largest value, which is 109. All the other values are in between.

So there are 50 dots representing 50 values, and in some places you will see more than one dot, which means those values are repeated more than once. Like here, you have four dots at 62, which means 62 is repeated four times. If you go back to that same table, you will see that 62 is repeated four times. Similarly, the number 97 is repeated four times. But some numbers have no representation. Like there is no 94, 95 or 96; there is a gap here, which means the data set does not include a value of 94, 95 or 96. But there is a 93, and there are four repetitions of 97. So dot plots are also very easy to draw. And you can see that the largest number of dots in any given interval is here.
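The dot plot procedure described here can be sketched in a few lines of Python. This is only an illustration: the `spending` list is a small made-up sample (it is not the course's 50-value superstore data set), and `dot_plot_rows` is a hypothetical helper name. The sketch counts how often each value occurs and stacks one "o" per occurrence above that value's position on the axis.

```python
from collections import Counter

# Hypothetical spending values for illustration only;
# this is NOT the superstore data set used in the lecture.
spending = [52, 62, 62, 62, 62, 71, 74, 74, 78, 93, 97, 97, 97, 97, 109]


def dot_plot_rows(values):
    """Return the rows of a text dot plot, top row first.

    Each column is one integer position on the axis, from the smallest
    to the largest value. A value repeated k times gets k stacked dots.
    """
    counts = Counter(values)
    tallest = max(counts.values())
    axis = range(min(values), max(values) + 1)
    rows = []
    for level in range(tallest, 0, -1):
        # Counter returns 0 for values that never occur, leaving a gap.
        rows.append("".join("o" if counts[v] >= level else " " for v in axis))
    return rows


for row in dot_plot_rows(spending):
    print(row)
# Simple axis line: smallest value on the left, largest on the right.
width = max(spending) - min(spending) + 1
print(str(min(spending)).ljust(width - len(str(max(spending)))) + str(max(spending)))
```

Columns with no dots correspond to values that do not occur in the data, which is the kind of gap the lecture points out on the slide.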
And if you remember, almost one third of the customers were spending between 70 and 80 dollars. This is reflected by the fact that within this interval you see a lot of dots. So this is a very simple graph, but very insightful. It gives you a holistic picture of where the different data values lie.

Then comes the histogram. A histogram is another very common graphical presentation of data which is quantitative, meaning in the form of numbers. The way to draw a histogram is this: first of all, the variable of interest is placed on the horizontal axis, the x-axis. In our case of the superstore, the variable of interest was the spending, so we will place that on the horizontal axis. Then a rectangle is drawn above each class interval, with its height corresponding to the interval's frequency, relative frequency or percentage frequency. What this means is that once we fix the variable of interest and attach it to the horizontal axis, we have to draw a rectangle above each group, or each class, and the height of each rectangle reflects the frequency, relative frequency or percentage frequency. We will see a pictorial representation of this histogram on the next slide, so that will make complete sense.

Here you may see some similarities between the histogram and the bar chart, right? Each of them has the values of interest on the x-axis, and then there are rectangles of varying heights. But there is an important distinction between histograms and bar charts. Unlike a bar graph or bar chart, a histogram has no natural separation between the rectangles of adjacent classes. We are talking about numbers, and on a number line there are no gaps. So the rectangles above the classes are attached to each other. They stand back to back, with no gaps in between.
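As a rough sketch of how the rectangle heights could be computed, the snippet below counts how many values fall into each class interval and then derives relative and percentage frequencies. The data is again a small made-up sample rather than the lecture's data set, and the choice of six classes of width 10 starting at 50, as well as the helper name `class_frequencies`, are assumptions made for illustration.

```python
# Hypothetical spending values for illustration only;
# this is NOT the superstore data set used in the lecture.
spending = [52, 62, 62, 62, 62, 71, 74, 74, 78, 93, 97, 97, 97, 97, 109]


def class_frequencies(values, start, width, n_classes):
    """Count the values falling in each class [start + i*width, start + (i+1)*width)."""
    freqs = [0] * n_classes
    for v in values:
        idx = (v - start) // width
        if 0 <= idx < n_classes:  # values outside the classes are ignored
            freqs[idx] += 1
    return freqs


# Assumed classes: 50-59, 60-69, 70-79, 80-89, 90-99, 100-109.
freqs = class_frequencies(spending, start=50, width=10, n_classes=6)
total = sum(freqs)
relative = [f / total for f in freqs]      # relative frequency
percent = [100 * r for r in relative]      # percentage frequency

for i, f in enumerate(freqs):
    low = 50 + 10 * i
    # Adjacent classes share a boundary, so the bars would touch when drawn.
    print(f"{low}-{low + 9}: {'#' * f} ({f}, {percent[i]:.1f}%)")
```

Whichever of the three frequencies is used for the height, the shape of the histogram is the same; only the scale of the vertical axis changes.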
And here is a pictorial representation of a histogram for that superstore example. The variable of interest, which was the spending, is placed on the x-axis, and it is further divided into classes: the 50s, 60s, 70s, 80s, 90s, and 100 to 110. So we had six groups, and accordingly you see six rectangles, all joined with no gaps in between. Whenever the data is quantitative, there are no gaps and it is called a histogram. When the data is not quantitative, meaning it is in the form of labels or names, there are gaps between the rectangles and it is called a bar chart. Right.

So, going back to the histogram. We have six groups, and each group is represented by one of these rectangles. All rectangles have equal width and there is no gap in between, but the height of each rectangle is different, depending on its frequency. It could be the relative frequency or the percent frequency, but whichever kind of frequency we have, it is represented by a different height for each class. Just by having a quick look at this histogram, we can very easily tell that most of the customers are spending within this range, 70 to 79. This group has the highest representation. This group has the lowest representation. So many insights can be derived from this histogram very quickly. That is the plus point of such pictorial representations.

Now let's talk about skewness. So what is skewness? Basically, skewness is a measure of lack of symmetry. So before looking into the skewness of histograms, it would be a good idea to look at an example of a symmetric histogram and what it looks like. In a symmetric histogram, the left tail is the mirror image of the right tail. That means that if you draw a line through the center of the histogram, the left side and the right side should be mirror images of each other.
So that kind of histogram would be called symmetric. And this is a picture of what a symmetric graph looks like. If you draw a vertical line here in the center, the left side and the right side look the same. They look like mirror images of each other.

An example of this kind of histogram would be the heights or weights of people. So why height? Well, because most people fall within a narrow range of height. I'm talking about adults, right? There would be an average height, and most people would be very close to that. Some would be a little bit taller, some a little bit shorter, but that represents the majority. Then there would be some people who are much taller than the average, and some people who are very, very tall. The same goes for the other side: very few people would be extremely short, much shorter than the average. But the center is where most of the population lies, and both sides look symmetric. So this would be an example of a symmetric histogram. The same goes for the weights of people, because this is a natural phenomenon.

So now that we know what a symmetric histogram should look like, let's talk about some skewed histograms. First of all, let's look at histograms moderately skewed towards the left. The skewness of a histogram could be in either direction. It could be towards the right, or it could be towards the left. Here it is towards the left. And the way to tell whether it is skewed towards the left or the right is by looking at the longer tail. So this is the center, this is the shorter tail, and this is the longer tail. Whichever side the longer tail lies on, that is where the skewness is. Here the longer tail lies towards the left, so we would say that this is skewed towards the left. And this is not a very extreme case of skewness; it is moderately skewed. So this is a histogram moderately skewed towards the left.
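The lecture judges skewness by eye, from the longer tail. As a complementary numerical check, one common measure (not covered in this lesson, and used here as an assumption) is the moment coefficient of skewness, the third central moment divided by the 1.5 power of the second central moment. Its sign follows the longer tail: negative for a longer left tail, positive for a longer right tail, and zero for perfectly symmetric data. The three small data sets below are invented for illustration.

```python
def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


# Made-up examples, not data from the lecture.
symmetric = [1, 2, 3, 4, 5]
left_skewed = [1, 7, 8, 8, 9, 9, 9, 10, 10]    # long tail towards low values
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 10]    # long tail towards high values

for name, data in [("symmetric", symmetric),
                   ("left-skewed", left_skewed),
                   ("right-skewed", right_skewed)]:
    print(f"{name}: skewness = {sample_skewness(data):.3f}")
```

The magnitude gives a rough sense of how strong the skew is, which corresponds loosely to the lecture's distinction between moderately and highly skewed histograms.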
And what could be an example of such a histogram? Scores for an easy exam. When there is an easy exam, a lot of students get good scores, right? If this were a normal exam, maybe it would look like a normal curve. But when it is an easy exam, most of the students score on the higher side, so this shifts the graph, or the average, to one side, towards the higher side in this case. So this is a case of moderate skewness towards the left.

And here is a histogram highly skewed towards the left. You can see that on the right side there is a very short tail, but there is a very, very long tail on the left side. So this is highly skewed towards the left, because the very long tail is towards the left. And this is actually for [unclear] exam scores. Since a lot of students take this exam very seriously, they prepare for it for long periods of time, and most of the students are able to get good scores on it. That is why you see that this is where the majority of the students are scoring. There are a few students who are able to get exceptional scores, but the majority of the students score around here, let's say. But then there would be some unfortunate students who won't be able to get a decent score, so they would be towards the very low end of the scores. So this is how it turns out to be a histogram highly skewed towards the left.

Now let's talk about skewness towards the right. Here is a case of moderate skewness, and since the longer tail is towards the right, this is skewness towards the right. An example could be housing values. Why? Because in a certain state or a certain location, here is the average value of housing in that area. Then there would be a lower limit to the housing values, so it would be very hard to find housing below that value. You cannot find housing cheaper than a certain limit.
But on the upper side, the limit could go very, very high, because in posh areas there may be very expensive housing, while on the lower side the cutoff might not be very low. So this would be an example of moderate skewness towards the right, because the longer tail is towards the right side.

And if you look at an actual graph, or actual data, of housing prices in the U.S., it turns out to be very highly skewed, because there is a certain lower limit to the prices of housing. This portion is where the center of the graph lies. If you go towards the cheaper side of housing, you can see that the tail is not very long, because there is only a certain lower limit to housing prices. You probably can't find a house for less than $20,000 or $15,000; that would probably be as cheap as it goes. But on the higher side, the limit is very, very high. There will be housing that costs millions of dollars, multimillions of dollars. So on a linear graph, the positive side would extend into a very, very long tail, but on the negative side, or rather the lesser side, the tail would be very short. That is how housing prices are everywhere, and this is a case of a histogram which is highly skewed towards the right.

Similarly for executive salaries. Here is actual data for the annual compensation of executives. On the lower side, this is where the average salaries of executives lie. As you can see, the percentage of executives earning below $50,000 per annum would be below 1 percent or so. Most of the executives are earning at least $50,000 or more. And here is the average. An average executive would probably be earning around two hundred thousand dollars, and the lowest it could get would be around $40,000 or $50,000. So there is not a lot of gap between the average and minimum values on a linear scale.
But when you think about the maximum values executives could be making, well, the upper limit could be very, very high. You can see that there are executives who are earning 2 million dollars per annum, 2.2, 2.4, 2.8, and here is a big spike for all of the executives combined who are earning above three million dollars per annum. The percentage of executives earning three million dollars per annum or more is around 1.5 percent. So you can see that the tail on the right side is very, very long. It is a case of a graph highly skewed towards the right.

So I hope that with the help of these examples and these graphs, I was able to clarify the concept of histograms and skewness, whether it is towards the right or towards the left, and whether it is moderately or highly skewed. Thanks for being with me, and let's move on to the next lesson, where we will look at cumulative distributions and a special kind of graph based on cumulative distributions, called ogives. So bye, and I will see you in the next lesson.