What is Data Visualization?

Minerva Singh
A free video tutorial from Minerva Singh
Bestselling Udemy Instructor & Data Scientist (Cambridge Uni)
4.2 instructor rating • 39 courses • 69,880 students

Complete Data Science Training with Python for Data Analysis

12:49:50 of on-demand video • Updated July 2019

  • Python data analytics - Install Anaconda & Work Within The IPython/Jupyter Environment, A Powerful Framework For Data Science Analysis
  • Python Data Science - Become Proficient In Using The Most Common Python Data Science Packages Including Numpy, Pandas, Scikit & Matplotlib
  • Data analysis techniques - Be Able To Read In Data From Different Sources (Including Webpage Data) & Clean The Data
  • Data analytics - Carry Out Data Exploratory & Pre-processing Tasks Such As Tabulation, Pivoting & Data Summarizing In Python
  • Become Proficient In Working With Real Life Data Collected From Different Sources
  • Carry Out Data Visualization & Understand Which Techniques To Apply When
  • Carry Out The Most Common Statistical Data Analysis Techniques In Python Including T-Tests & Linear Regression
  • Understand The Difference Between Machine Learning & Statistical Data Analysis
  • Implement Different Unsupervised Learning Techniques On Real Life Data
  • Implement Supervised Learning (Both In The Form Of Classification & Regression) Techniques On Real Data
  • Evaluate The Accuracy & Generality Of Machine Learning Models
  • Build Basic Neural Networks & Deep Learning Algorithms
  • Use The Powerful H2O Framework For Implementing Deep Neural Networks
In this section we are going to cover data visualization, which is a very important component of data science. Data visualization generally refers to any effort that helps people understand the significance of data by placing it in a visual context. Before we analyze the data formally, we try to envisage it in a visual context, so that patterns, trends and correlations that might go undetected in raw data files or tables become clearer. Data visualization can be used prior to formal analyses as part of exploratory data analysis (EDA), and visualizing data is just one component of EDA. All the things we have done previously, such as grouping our data and cross-tabulating it, are also part of exploratory data analysis; data visualization is simply its visual component. We can also use data visualization to present the outputs of our analyses.
Over the next couple of slides we will look at a few examples so that you get a feel for the different data visualization techniques out there. What you're looking at here are bar plots. As soon as you look at these bar plots, you can see the sales of footballs, rackets, shoes and tents in Arizona, California, Oregon and Washington. In three of the four states, tent sales are lower than sales of the other three products: in Arizona, Oregon and Washington, fewer tents are sold than shoes, rackets or footballs. In California, the lowest sales are for shoes, and more tents are sold. So as soon as you see this bar plot, you can derive some estimates and inferences about your data.
Now this is also a kind of bar plot, popularly known as a stacked bar plot.
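A bar plot maps each category's value to the length of a bar, which is why the lowest seller in each state jumps out immediately. As a minimal, library-free sketch of that idea, the hypothetical helper `text_bar_chart` below draws one text bar per product, and `lowest_seller` picks the shortest bar. The sales figures are hypothetical placeholders chosen to mirror the pattern described above, not the numbers on the slide; in practice you would more likely draw the chart with Matplotlib's `pyplot.bar` or `pyplot.barh`.

```python
from __future__ import annotations

# Hypothetical placeholder sales counts, NOT the figures from the slide.
SALES = {
    "Arizona": {"footballs": 40, "rackets": 35, "shoes": 30, "tents": 10},
    "California": {"footballs": 45, "rackets": 38, "shoes": 12, "tents": 25},
    "Oregon": {"footballs": 30, "rackets": 28, "shoes": 26, "tents": 8},
    "Washington": {"footballs": 33, "rackets": 30, "shoes": 27, "tents": 9},
}


def text_bar_chart(data: dict[str, int], scale: int = 5) -> list[str]:
    """Return one text row per category; each '#' stands for `scale` units."""
    width = max(len(label) for label in data)
    rows = []
    for label, value in data.items():
        bar = "#" * (value // scale)  # bar length is proportional to the value
        rows.append(f"{label:<{width}} | {bar} {value}")
    return rows


def lowest_seller(products: dict[str, int]) -> str:
    """Return the product with the smallest sales, i.e. the shortest bar."""
    return min(products, key=products.get)


if __name__ == "__main__":
    for state, products in SALES.items():
        print(state)
        for row in text_bar_chart(products):
            print("  " + row)
        print("  lowest:", lowest_seller(products))
```

Printing one group of bars per state reproduces the "grouped" layout of the slide in plain text.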
In the stacked bar plot, you can see that in 1950 there was considerable population growth in Asia but very little in Africa, and by 2050 population growth in Africa is going to increase a fair bit. The same cannot be said about Europe, shown in fluorescent green: its population growth seems small and steady throughout. All the way from 1950 to 2050, the largest share of population growth comes from Asia, and we can't see any tangible population growth coming from Oceania. Population growth in Northern America seems to be a bit higher than in the previous decade. So even before we start examining and analyzing the data formally, we can draw many inferences about our data from these simple plots.
Now this is a pie chart. You have probably come across pie charts in newspaper and magazine articles, and this one shows deaths from landslides in different parts of the world. Even if you ignore the numbers, you can see that South Asia and East Asia have the highest casualties from landslides, because their slices are the biggest. The Middle East, by contrast, is just a tiny pinkish sliver, so it seems to have very few landslide deaths, for obvious reasons, and another region has an even smaller slice. So even if there were no numbers, we could still draw some inferences about the data merely by looking at this pie chart.
Now this is a histogram, which shows the distribution, or frequency, of the different values. Here the greatest frequency is for a number of books between 2.5 and 3.5.
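A pie chart encodes each category's share of the total as the angle of its slice, which is why slice size alone lets you compare regions without reading the numbers. The hypothetical helper `pie_slices` below is a minimal sketch of that arithmetic: share = value / total, angle = 360 × share. The region names and counts are hypothetical placeholders, not the landslide figures from the slide; plotting libraries such as Matplotlib (`pyplot.pie`) handle this step for you.

```python
from __future__ import annotations


def pie_slices(counts: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Map each category to (percentage of total, slice angle in degrees)."""
    if any(v < 0 for v in counts.values()):
        raise ValueError("pie chart values must be non-negative")
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("pie chart needs a positive total")
    slices = {}
    for label, value in counts.items():
        share = value / total  # fraction of the whole pie
        slices[label] = (100 * share, 360 * share)
    return slices


# Hypothetical placeholder counts, NOT the landslide figures from the slide.
deaths = {"Region A": 600, "Region B": 300, "Region C": 90, "Region D": 10}

for region, (pct, deg) in pie_slices(deaths).items():
    print(f"{region}: {pct:.1f}% of deaths, {deg:.1f} degrees")
```

Because the angles always sum to 360 degrees, a pie chart only makes sense for parts of a single whole.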
In the histogram, the bar for 2.5 to 3.5 books is the highest, indicating a frequency of 16, and the lowest frequency, just two, is for a number of books between 5.5 and 6.5. So histograms are very good for visualizing the distribution and frequency of different values.
This is a box plot. Box plots typically carry a lot of information, but this one just tells us about cholesterol across different health ailments: we can see categories such as cancer, other, unknown, cerebral and so on. We are going to discuss box plots in a fair amount of detail throughout this course, because they are very important for visualizing some of the most common aspects of measurement data, so let's move on.
Finally, this is a line chart. We use line charts a lot for economic data, and they are very good for presenting trends, such as time-series trends. The chart runs from 1988 through 1993 and 2003 all the way to 2013, and it shows trends in average income inequality within countries. Income inequality in sub-Saharan Africa increased in 1993 but seems to have come down by 2013. In East Asia and the Pacific, income inequality increased by 1998 and declined in 2003; it had a marginal increase in 2008 and seems to be either plateauing or declining by 2013. In industrialized countries, income inequality seems to have increased in 2008, followed by a small decline. The dashed line is very interesting: income inequality started increasing from 1993, peaked in 2003, declined sharply from 2003 to 2008, and after that seems to be either leveling off or increasing slightly.
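To make the histogram and box plot less abstract, here is a minimal stdlib sketch of the numbers each one draws. The hypothetical helper `histogram_counts` counts values in equal-width, half-open bins (centred on whole numbers, like the 2.5 to 3.5 bar above), and `box_plot_summary` computes the quartiles, whisker ends and outliers using Tukey's 1.5 × IQR rule, a common convention that is also Matplotlib's default (`whis=1.5`). The book counts are hypothetical placeholders, not the slide's data.

```python
import statistics


def histogram_counts(values, start, width, n_bins):
    """Count values in half-open bins [start + i*width, start + (i+1)*width).

    Values outside the covered range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)  # which bin the value falls into
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def box_plot_summary(values, whis=1.5):
    """Quartiles, whisker ends and outliers using Tukey's whis * IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # most extreme points within the fences
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical number of books read per person, NOT the data from the slide.
books = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6]

# Bins centred on whole numbers: [-0.5, 0.5), [0.5, 1.5), ..., [5.5, 6.5)
print(histogram_counts(books, start=-0.5, width=1, n_bins=7))
print(box_plot_summary(books))
```

The histogram shows the whole shape of the distribution, while the box plot condenses it to five numbers plus outliers, which is what makes box plots convenient for comparing many groups side by side.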
These are the kinds of things we visualize using line charts. Even before we start working with the data, we have been able to draw a lot of inferences simply by drawing these lines to represent the temporal trends in average income inequality from 1988 to 2013.
If we want to visualize the relationship between two quantitative variables, we use something known as a scatter plot. Here we can see beach visits: we have the number of visitors and the average daily temperature. Most of the points seem to move in the same direction, so we might infer that as the average daily temperature increases, the number of people going to the beach also increases.
So in this lecture you've seen many different charts and graphs presenting different kinds of data. The next, and most challenging, thing is to decide which charting or graphing technique works for your kind of data. If you just have to visualize the relationship between two quantitative variables, then a scatter plot like this one makes sense. But what happens when you have economic metrics for different time periods, when you have frequency data, or when you have data pertaining to, say, casualties from landslides? There are different kinds of data out there, which we discussed in the previous section, and they lend themselves to different visualization techniques. I'm going to discuss some rules of thumb for selecting your graphing and charting technique in the next lecture, and then we are actually going to start visualizing some real-life data. Before that, it is important to remember which charting or graphing technique you should use when.
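The lecture reads the scatter plot by eye ("most of the points move in the same direction"). A common numerical companion to that visual impression, swapped in here and not part of the lecture, is Pearson's correlation coefficient r, which ranges from -1 to 1 and measures the strength of a linear association. The helper `pearson_r` and the temperature/visitor values below are hypothetical placeholders, not the slide's data; Python 3.10+ also ships `statistics.correlation`, and Matplotlib's `pyplot.scatter` would draw the plot itself.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical placeholder data, NOT the points from the slide.
avg_temp_c = [18, 21, 24, 27, 30, 33]
visitors = [120, 150, 210, 260, 300, 390]

r = pearson_r(avg_temp_c, visitors)
print(f"r = {r:.2f}")  # positive r: visitors tend to rise with temperature
```

A value of r close to 1 agrees with the "same direction" reading of the plot, but r only captures linear association, so it is worth looking at the scatter plot before trusting the number.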