What are they and what are they used for?

Ben Young
A free video tutorial from Ben Young
Oxford MBA and Tableau Professional

Tableau Expert: Top Visualization Techniques in Tableau 10

Become An Expert In Tableau 10 - Master Visualisation Techniques Including Sankey Diagrams, Viola Charts And More

05:40:19 of on-demand video • Updated August 2020

  • Create charts and visualizations that prove your advanced Tableau abilities.
  • Think creatively to solve complex business problems using Tableau.
  • Build a Viola Chart in Tableau.
  • Construct a Sankey Diagram in Tableau.
  • Develop advanced Hexbin Charts in Tableau.
  • Visualize Likert Scale survey data in Tableau.
  • Implement Hamburger Menu functionality in Tableau.
Hello and welcome back to this course on mastering top visualization techniques in Tableau. If you haven't already, please go to the course content section of the Udemy course page and download the data sets for this course. You'll notice we have individual folders for some sections, and then a dataset, African Mobile Data, that covers all the sections that don't have their own folder. The African Mobile Data set comes from a fictional cell phone provider in Africa, and we've been given data where each transaction has a row ID. We also have the date, country, city, region, segment, sales and profit for each transaction that occurred for that provider.

In this section, we're going to be looking at domain padding and data densification. I already have a connection set up so I can show you an example, but just to review: to connect to a new data source, go to Excel, find the African Mobile Data file, and bring out the Section 2 sheet for this video. Notice we have all the same columns. Let's jump right in.

The first thing we're going to look at is called domain padding. If we bring Date and Sales onto the view and change it to a bar chart, you can see that we have data for 2013, 2014, 2015 and 2016. If I expand that out, we have data across all four quarters for all four years. I'll change it to Entire View so it spaces nicely. Then if I expand it one more time, you'll notice something happened: all of a sudden our quarters are different sizes, because we're actually missing some data. We have January, February, April, May and June, but we do not have March, and we're also missing September in 2013. You'll find a few other missing months throughout the dataset. What's happening here is that Tableau draws a mark on the view, in this case a bar, for every single piece of data it has at the level of detail you're working with. Right now we have data for January, so it can draw a bar for January showing the sum of sales. We have data for February.
So Tableau can draw a bar. We do not have data for March. However, Tableau is intelligent enough to recognize that there are times we actually want to show that missing data, especially in the context of dates. If you come up here to Month and right-click, you'll find an option that says Show Missing Values. Click on that, and we now have domain padding. As I mentioned in the intro, it's a simple concept that a lot of you have probably seen before; it's just a fancy name that honestly confused me a ton when I first started out. Domain padding does just what it says: it pads your data. It gives a value a place on your view even though that value doesn't actually exist inside your data. Notice we now have March, September, May and August: all these months that we didn't have in our data before are now showing up. So we have the complete picture, January through December, for all four years, even though that didn't exist in our dataset before. That's the concept of domain padding as it applies to dates.

It also applies to histograms. If we right-click-drag Sales onto Rows and choose no aggregation (actually, sorry, Ctrl+W to swap rows and columns) and switch to Entire View, we've successfully created a barcode. All joking aside, this is actually just an individual mark for every single sales transaction, down to the cents; we have an individual mark for every one of them. This is not the best way to visualize the distribution of individual sales. So a lot of times people will right-click on Sales, choose Create > Bins, and make bins in units of 10. Once it's created, we have a Sales (bin) field up here. Drag that to Columns, then right-click-drag Number of Records up and choose a count, and we now have our histogram of the distribution of individual sales transactions.
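The Show Missing Values behaviour for months can be sketched outside Tableau. This is a minimal Python illustration, assuming a few made-up rows in place of the African Mobile Data; `sales_by_month` and `show_missing_months` are hypothetical helpers, not Tableau functions. Without padding, only months that have rows get a mark; with padding, every month from 1 to 12 gets a slot, and the missing ones stay blank.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows standing in for the course data: (order date, sales).
ROWS = [
    (date(2013, 1, 5), 120.0),
    (date(2013, 1, 20), 80.0),
    (date(2013, 2, 11), 200.0),
    (date(2013, 4, 2), 50.0),
]

def sales_by_month(rows, year):
    """SUM(Sales) per month, with an entry only for months that have rows."""
    totals = defaultdict(float)
    for d, sales in rows:
        if d.year == year:
            totals[d.month] += sales
    return dict(sorted(totals.items()))

def show_missing_months(marks):
    """Pad the month domain to 1..12; months without rows become None (blank)."""
    return {month: marks.get(month) for month in range(1, 13)}

unpadded = sales_by_month(ROWS, 2013)
padded = show_missing_months(unpadded)
print(list(unpadded))   # [1, 2, 4]: no bar for March
print(padded[3])        # None: March has a slot in the view but no value
```

Tableau knows the month domain is 1 through 12 without being told; here that knowledge is simply hard-coded as `range(1, 13)`.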
You'll also notice that we have domain padding going on here. If we come up to our Sales (bin), right-click, and get rid of the missing values, we have a nice smooth histogram of sorts with no gaps. I'm not really sure what this distribution is called; "random" might be a good word, but at least we don't have any gaps in our data. The only thing that really stands out is this one sales transaction worth 430, or rather this one sales bin worth 430, sorry. But if you come up here and show missing values, you'll notice that we actually don't have any transactions between 140 and 149. The same is true up here at 260, and here at 690. So there is some missing data, but our bin can comprehend it and show the missing values. If you come into your bin and edit it, you'll see it has a minimum value and a maximum value. Bins are what's called range-aware: when a function, or in this case the binning calculation, is range-aware, Tableau can understand that there are individual values between the minimum and the maximum even if they don't exist in the data. This is a very important point. The months earlier worked the same way: they are range-aware because Tableau recognizes that across the 12 months in a year there is a January, a February and a March, even if March doesn't exist in your data. So we have range-aware calculations for date parts, and also for bins.

So that is domain padding. Domain padding is really just taking advantage of those range-aware functions inside Tableau to pad your data and create blank values, or placeholders, in your view, so that you represent the full spectrum of available values. Data densification is the process of using table calculations on top of that to comprehend those blank values and placeholders in your view, making it possible to draw marks even when you don't have data.
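The range-aware bin behaviour above can be sketched the same way. This Python illustration assumes made-up sales amounts, bins labelled by their lower edge, and a range taken from the smallest and largest bins observed, which roughly mirrors the minimum and maximum shown when you edit the bin; `bin_of` and `histogram` are hypothetical names, not Tableau APIs.

```python
import math
from collections import Counter

# Hypothetical individual sales amounts; note nothing falls between 140 and 169 except 150s.
SALES = [131.2, 135.0, 152.9, 158.4, 171.0]

def bin_of(value, size):
    """Lower edge of the bin a value falls into, e.g. 152.9 -> 150 for size 10."""
    return math.floor(value / size) * size

def histogram(sales, size, show_missing=False):
    """Count of records per bin; `size` is assumed to be a positive integer."""
    counts = Counter(bin_of(v, size) for v in sales)
    if not counts:
        return {}
    if not show_missing:
        return dict(sorted(counts.items()))     # only bins that contain rows
    # Range-aware padding: every bin from the lowest to the highest gets a slot,
    # and bins with no rows are left blank (None) rather than zero.
    lo, hi = min(counts), max(counts)
    return {b: counts.get(b) for b in range(lo, hi + size, size)}

print(histogram(SALES, 10))                     # {130: 2, 150: 2, 170: 1}
print(histogram(SALES, 10, show_missing=True))  # {130: 2, 140: None, 150: 2, 160: None, 170: 1}
```

Returning None instead of 0 for the padded bins keeps the distinction the lecture relies on: the slot exists in the view, but there is no data behind it yet.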
The easiest way to show this is back with our date parts. I'll remove Quarter to make it a little simpler, then take SUM(Sales) and duplicate it, so we now have our chart repeated. Again, notice that on both charts we have values for January and values for February, but nothing for March. However, if we come up to one of the SUM(Sales) pills, add a quick table calculation and choose Running Total, you'll see something awesome has happened. Look at January: the marks are just as we'd expect. Look at February: the same. However, if you look at March, we now have a value. This is an incredibly important point. Look at the tooltip: you'll see Month of Date is March and Year of Date is 2013. Our table calculation, the running sum of sales along Table (Across), shows 34,000, the running total of sales so far, but our Sales value is blank, indicating a null in the data, because for March 2013 our data doesn't actually contain anything. This is the process of data densification: you run a table calculation across a padded range, and Tableau, in essence, makes up data to fill in that placeholder. It draws a mark for you without actually increasing the size of your data set. This is all handled inside Tableau's engine, which does the data densification across your padded values.

Now, why is this important? Let's look at an example. I mentioned I have this example set up. A lot of times when creating advanced charts, we'll want to draw marks that don't exist in our data, just like we saw in March. Drawing custom charts often involves drawing a line or some other figure across marks that don't exist in our data. So what we'll do is create a field called To Pad. This is just two values, 1 and 10, that make up the minimum and maximum of our range. Then we'll create a bin off of that. So this is our bin of To Pad.
I generally call it Padded, and then you set the bin size to 1. You see again a minimum value of 1 and a maximum value of 10, but these bins understand that there are other values in between those two, because bins are range-aware. So now if I swap To Pad for Padded, you'll see nothing new; that's because the missing values aren't showing. Then if we show missing values, we have the values 1 all the way through 10, rather than just the 1 and 10 that existed in our data. We could run Sales across this, or really any calculation; we'll just use Number of Records instead. That's fine. You'll see we have values at 1 and values at 10. Don't worry about the distribution of these two; this is just an example, and I'll explain more on that later. But if we then run a table calculation across this, you'll see that Tableau is drawing those marks. The cool thing about this shows up when we start drawing lines: we have two points that exist in our data, but now we have a line that Tableau can draw between them. This is incredibly important functionality. It makes it possible to draw really cool shapes that Tableau doesn't include by default, by plugging in mathematical equations that shape how this line moves and draw some really cool curves.

So that's it for this lecture. That is what domain padding is, along with data densification. I hope you understood it; feel free to review it, because it's a very important concept. I owe a lot of credit to people like Alan Eldridge, Jonathan Drummey and Joe Mako, who have done a lot of work online to explain these things. Feel free to look them up and to walk through this lecture again, and I'll see you in the next lecture, where I'll show you how to make this To Pad field and explain our data prep process. Thanks.
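The two densification demos in this lecture can be sketched in Python as well. This is a rough analogy under stated assumptions, not how Tableau's engine works internally: `running_sum` mimics the running total carrying its value through a blank March, and `densify_line` treats the padded bins 1..10 as slots between two real endpoints. The position `t` plays roughly the role a table calculation built from Tableau's INDEX() and SIZE() would; the smoothstep curve is a hypothetical choice of "mathematical equation" for shaping the line.

```python
def running_sum(values):
    """Running total over a padded domain, where None marks a blank slot.

    A blank slot repeats the previous total instead of breaking the line,
    matching the lecture's March 2013 mark; leading blanks stay None.
    """
    out, total = [], None
    for v in values:
        if v is not None:
            total = v if total is None else total + v
        out.append(total)
    return out

def densify_line(x0, y0, x1, y1, padded):
    """Place one point per padded bin on a curve between two real endpoints.

    `padded` stands in for the hypothetical Padded bin (1..10). The smoothstep
    easing 3t^2 - 2t^3 is an illustrative assumption; any shaping equation works.
    """
    n = len(padded)
    if n < 2:
        raise ValueError("need at least two padded bins")
    points = []
    for i in range(n):
        t = i / (n - 1)              # 0.0 at the first bin, 1.0 at the last
        s = 3 * t**2 - 2 * t**3      # S-shaped ease from 0 to 1
        points.append((x0 + (x1 - x0) * t, y0 + (y1 - y0) * s))
    return points

# January, February, (blank) March, April for a padded year.
print(running_sum([200.0, 200.0, None, 50.0]))  # [200.0, 400.0, 400.0, 450.0]
print(len(densify_line(0, 0, 9, 1, range(1, 11))))  # 10
```

Only the two endpoints come from data; every intermediate point is computed, which is the sense in which densification draws marks without making the data set any larger.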