Pivot Table Demo: Analyzing Wine Tasting Scores

Maven Analytics
A free video tutorial from Maven Analytics
Award-Winning Analytics & Business Intelligence Training
4.6 instructor rating • 23 courses • 564,878 students

Lecture description

In this Excel PivotTable case study we'll analyze information from 130,000+ wine tastings, including average point ratings, prices, and written reviews.   

Learn more from the full course

Microsoft Excel - Data Analysis with Excel Pivot Tables

Master Excel pivots & learn data analysis w/ advanced Excel cases from a top Excel & business intelligence instructor

06:32:18 of on-demand video • Updated January 2021

  • Take your data analysis skills from ZERO to PRO with Excel Pivot Tables
  • Learn how to use Pivot Tables and Pivot Charts to streamline and absolutely revolutionize your workflow in Excel
  • Master unique Pivot Table tips, tools and case studies that you won't find in ANY other course, guaranteed
  • Explore fun, interactive, and highly effective lessons from a best-selling Excel instructor
  • Get LIFETIME access to project files, quizzes, homework exercises, and 1-on-1 expert support
  • Test your skills with 10 real-world Pivot Table case studies (weather, shark attacks, wine tastings, burrito ratings, and more!)
English [Auto] All right, I'm really excited about this next case study because we're going to talk about wine tasting and I found a really fun data set that I'm excited to share with you. And I think you'll have a lot of fun playing with basically it includes almost one hundred and thirty thousand records of individual wine tastings. And it will tell us about the actual wine that was being tasted, what variety it is or what type. The winery came from the subregion, region, province and country. And then on the actual metrics or value side, we're going to actually have the point rating for that wine as well as the price for a bottle. So really interesting data. We've got a bunch of dimensions that we're going to use to explore this data set on top of those geo fields like Country Provence region. We've got, again, like I said, the wine name, the wine variety, and then actual descriptions that were recorded by the taster along with the tasters Twitter handle, if for whatever reason, you want to go ahead and follow that person. So concepts that we're going to cover, we're going to do a bunch of sorting filtering for this one. We're going to use value grouping, which is actually a new type of grouping that we haven't covered yet in the course. And then we're going to do some visualization with pivot charts and talk about some interesting, slightly more advanced analytical concepts like correlation and trend line fitting. So with that, let's pop into our Excel workbook and get started. All right, so here we are in our Pivot Table Case study workbook, I'm in the wine tasting scores tab and you'll see all those fields that we just talked about, the wine names, taster names, Twitter handles, full descriptions here and column and then those points and price values and L and M scrolling down. See, we've got one hundred and twenty nine thousand nine hundred and seventy two rows of data. Each of which represents a unique record or a unique tasting for a given wine, and the other thing to note here, call him a tasting I.D. This is like an index column. It's a unique sequential set of values that uniquely identifies each row in our data set. So from here, let's go ahead and insert a pivot table. New worksheet, OK? And you can rename it anything you'd like, I'm going to name it Wine Tasting Pivot. And just to be consistent, tab color like green and boom, we are good to go. So now that I found this data set, my wheels are turning and thinking about what I might be able to learn from the data. How am I be able to explore the data? And the first thing that came to mind is, well, what types of wines, what varieties do the experts tend to rate the best or the highest? So as a wine drinker, you know, I'm pretty basic. I stick to the common options. Your pinto's, your roses, your Molbeck. But if we go ahead and grab variety and pull it into rows here and our pivot, you'll notice that there are hundreds and hundreds and hundreds of different wine varieties out there that personally I've never tasted. So this is going to be a great chance to use this data set for a little bit of a quantitative analysis to help me identify what the next variety of wine I should taste is. So got our varieties here on Rose. All we need to do is scroll down to our points field and drag those into values. And remember, we don't want the some of the ratings or the points. We want the average so that we're on an even playing field between different varieties. Let's go ahead and give this quick number format number. Think one decimal spot. That's really all we need here. There you go. And I've got average points for each variety I'm going to sort. My variety is descending by that new field. And there we go, I've got a sorted ranked list of different wine varieties now before I go to my wine shop and ask them for all their bottles of Torontos. And I know I'm probably butchering the name apologise in advance. What I need to think about here is, well, how accurate or how realistic is this ninety five point rating? Because what I haven't factored in here is how many times a wine that falls into this variety was tasted. How many records exist in this data set for this particular variety? And to answer that question, we can go ahead and use our trusty tasting I.D. here and pull it in and turn it into a count, which is going to tell me the count of tastings or the count of records in our data set that fall into each of these row labels. So there you go. This first top rated wine variety actually only had one observation in the whole table. So one person tasted it and that one person gave it a ninety five. So I'd be a little bit foolish to jump to a conclusion and say that this is indeed the best rated variety wine in the whole dataset, especially as I scroll down and see, OK, this one had a ninety one point one with twenty seven ratings. That's quite a bit better. Here's one with seven hundred and fifty one, twenty eight hundred and four. Now these point ratings I can start to put a little bit more faith into. Right. So what I'm going to do is actually filter down my list and put a criteria threshold on it so that I can feel more confident in the point rating averages. So let's go ahead and do a value filter greater. Then we're going to say we only want the wines or the varieties or the count of tastings is greater than five hundred and you can pick your own threshold here. I like five hundred as an option. It's going to limit me down to about forty observations. Forty varieties here. And as you can see now, the top rating is this sangiovese say groo again, no promises on the pronunciation there, but that's our top rated variety with a ninety point five rating. So, OK, I'm going to trust this one. It's got seven hundred and fifty one individual tastings, so I know that's pretty legit. Let's go ahead and drill a little bit deeper here. Filter. And keep only that variety. And what I want to learn here is, OK, no, it was tasted seven hundred and fifty times, but I want to hear what people are actually saying about it. And luckily, I've got this description field that I can use here. Now, watch what happens when I drag description as a secondary road label. Boom, it totally explodes my pivot column A is now about a mile wide, I have to use this tiny little scroller. It's a total pain in the butt. So I'm actually going to control and undo that and remind you of that option we have in our pivot table. Tools, options, auto fit column widths on update. Let's go ahead and check that box press. OK, and now when we grab that description field, pull it in. It stays within the bounds of the existing column widths, and you can see that it kind of condensed it or merged it with that same column A. And what I want to do is break these out into separate columns. In other words, I want variety in one column, one field, and I want description and another. And to do that, we're going to go into our pivot table design tab, report, layout, outline, form. To be honest, I'm not sure why this isn't the standard by default, because it really is just a much more useful way to work with Pivot's. So now I got my descriptions in column B here. I can stretch it out a bit and I can select the entire column B and in my home tab and click wrap text, which makes it a little bit more readable. I can see all the way to the end of the string for each of these tastings. So from here, let's go ahead and sort those descriptions descending by the reading. I want to see the top ratings first. So average of points. OK, and there you go. This top rating, this taster gave it a point rating of one hundred. And you can kind of read through and see what kinds of trends you start to see, you know, dark and fleshy, ever evolving aromas of dark fruit, chocolate spice and tobacco. Very, very descriptive wording here, loaded thickly on top of each other. Chocolate fudge, blackberry preserves, rum cake, print. That's a lot of flavor. So maybe this is something that I'm interested in. Maybe it's a little bit too much for my taste, but I'm gathering data by seeing how these top reviewers are describing this particular wine variety. And I wouldn't be doing my due diligence if I didn't look at the bottom ratings as well. So let's apply another filter here and let's actually choose top ten, but select the bottom ten by that point column. OK, and now I'm looking at the lowest 10 rating, so it looks like the lowest one. Got an eighty two. And this particular taster made a comment that tastes like smoked ham and moist tobacco sticky in the mouth. That doesn't sound so good. So you know, whether or not I end up going with this variety is yet to be seen. I'll leave you with a little cliffhanger there, but that was a good way to kind of drill in, see some of the top reviews, some of the bottom reviews, and learn more about how the experts tend to rate certain varieties of wine. So that's that. Let's go ahead and pull description back out. Let's pull variety out. Let's pull the count of tasting it out so that we're left with just average points in the values. Now, the second thing that I really want to explore here that I'm very excited about is I want to see if there's a relationship or correlation between the price point and the point rating. So in other words, do expensive wines tend to consistently rate higher in terms of points, or is there a point where you're maybe paying more money but not seeing any impact on the actual rating? So Pivot's will give us a great way to kind of answer that question very, very quickly. And to do that, we already have our point rating here and values, which is right where we want it. I'm just going to take price and I'm actually going to pull price into Rose. This is a little bit unusual because, you know, in the past, every time we've had a quantitative field value field, we pulled it into the values pain of our field list. This time it's a little bit different. We're going to pull it into rows so that we can see for wines priced at four dollars. What was the average point rating? Five dollars. What was the average rating? All the way down to the highest priced wines in the entire sample, which I believe are up in the three thousands. So we've got a wine here that was rated in our data set that cost three thousand three hundred dollars all the way down to four. So you'll notice if I go back down. There are some blanks here as well, right here in row three ninety four. So we can go ahead and just filter the blanks out. Just so that we don't confuse things, press, OK, and now what I want to do here is actually visualize this pattern because a lot of times it's going to be easier to see the trend visually than by just looking at the numbers themselves. So it's going to be a pivot table tools. Let's pop in a pivot chart. I mean, use the clustered column option here by default. And there we go. If I drag this out, you can already start to see the story here, which is that on average, yes, these low priced wines in the four, 10, 20 dollar range do tend to be rated the lowest. And there's a very clear relationship all the way out into the, you know, maybe 80, 90, one hundred dollar mark where that trend starts to flatten out, see a lot more volatility, a lot more variability in the rating. But generally speaking, the trend from, I would say about one hundred dollars up is pretty flat. And, you know, that's the initial kind of visual pattern that we see here. But we can actually quantify this with tools. Ganor, pivot chart tools design. Let's add a chart element. We're going to add a chart trend line, actually going to go into more trend line options here. Choose the logarithmic trend line. That's a great fit for this type of curve. And you can see that it's kind of rescale my y axis makes that pattern a little bit tough to see. So I'm going to. Right. Click that axis, format it and let's set a minimum to about 70. So this is my point rating and my Y I'm going to say, OK, the minimum point rating is 70. That'll just kind of rescale things so that we see that pattern very clearly. Now in plain English, the way that I interpret this is that if I'm looking for the best bang for my buck and I want a very, very highly rated wine without paying a ton of money, I'm probably going to find a wine right about in this range here, maybe the 60 to 80 or 100 dollars range, because I know from the data, if I go in the four to 10 to 20 dollars range, probably getting something with a pretty low rating. And if I go with something in the three or four hundred dollar range, I know it's really not much of a difference compared to something, you know, around 100. So really interesting story that's kind of presented itself just from a very quick analysis of this raw data. And if you want to get actually pretty advanced, you can. Right click and format the trend line and actually show the equation right there on the chart. And if we just drag this into view here. This is getting into some more advanced analytics concepts here, but this equation is actually the fit of that logarithmic line, which could be used for predictions. So you could say, OK, given a bottle of wine at a certain price point or x value, we would plug that X value in and spit out a predicted Y value based on the path of this line. And that would tell you for any given price point for a bottle of wine. What's the predicted point rating? So some really, really interesting things that we could get into here. Now, let's go ahead. I don't need that equation, so let's delete that one. And one thing I want to show you here is that right now we're looking at a really granular view. We've got an individual row for every single price that's represented in the data set. Now, we don't have to stay this granular. We can group these prices into buckets, kind of like you would with creating bins in a histogram. So if I. Right. Click just like I would with a regular text based label and I click group, now I get these value specific options. And this is something that we haven't seen yet in the course. This allows me to group these values into buckets with a starting point, an ending point and a buy option, which is basically like my bucket width. So here I could say let's start at zero dollars. We can go ahead and end at thirty three hundred. And I want to group these by hundred dollar increments here. Press, OK, now we've got this blank option here, so let's go ahead and just. Filter hide that blank option and there we go. So now we've got the same chart, the same kind of pattern, but now we're grouping the prices into cross all lines that are priced zero to ninety nine. Here's the average rating, one hundred to one ninety nine. Here's the average rating. And from here, you could really take this analysis in a number of different directions. You know, obviously we see some weird things going on here at the end where the average rating actually drops quite a bit in the thirty two to thirty three hundred dollars range. And we know that that's because there's really only a small number of lines. I think it's only one line in that range that appears in our data set. We can check that by pulling in the count of tasting ID if we wanted to. And looking at how the buckets kind of compare in terms of the number of ratings and yes, so there is only one line in that range that was tested. So one thing that we could do here is to kind of filter out some of this low volume noise. We could say, you know, I'm only comfortable with buckets that include more than 10 tastings so we could select those could filter to keep only those items, pull the count back out. And there you go. We've lost our trend line, but we can add that back in. So, again, this is totally up to you. In this case, we used hundred dollar bins. You could break it down by 10 or buy two or buy five hundred, whatever you choose. But really great example of using value groups and pivot charts to expose some really fascinating and interesting patterns in the underlying source data.