Pivot Table Demo: Analyzing Wine Tasting Scores

A free video tutorial from Maven Analytics
Award-Winning Analytics & Business Intelligence Training
Rating: 4.6 out of 5Instructor rating
30 courses
811,845 students
Pivot Table Demo: Analyzing Wine Tasting Scores

Lecture description

In this Excel PivotTable case study we'll analyze information from 130,000+ wine tastings, including average point ratings, prices, and written reviews.   

Learn more from the full course

Microsoft Excel - Data Analysis with Excel Pivot Tables

Master Excel pivots & learn data analysis w/ advanced Excel cases from a top Excel & business intelligence instructor

06:32:18 of on-demand video • Updated August 2022

Take your data analysis skills from ZERO to PRO with Excel Pivot Tables
Learn how to use Pivot Tables and Pivot Charts to streamline and absolutely revolutionize your workflow in Excel
Master unique Pivot Table tips, tools and case studies that you won't find in ANY other course, guaranteed
Explore fun, interactive, and highly effective lessons from a best-selling Excel instructor
Get LIFETIME access to project files, quizzes, homework exercises, and 1-on-1 expert support
Test your skills with 10 real-world Pivot Table case studies (weather, shark attacks, wine tastings, burrito ratings, and more!)
English [Auto]
All right. I'm really excited about this next case study because we're going to talk about wine tasting. And I found a really fun data set that I'm excited to share with you and I think you'll have a lot of fun playing with. Basically, it includes almost 130,000 records of individual wine tastings, and it will tell us about the actual wine that was being tasted, what variety it is or what type the winery came from the subregion, region, province and country. And then on the actual metrics or value side, we're going to actually have the point rating for that wine as well as the price for a bottle. So really interesting data. We've got a bunch of dimensions that we're going to use to explore this data set on top of those geo fields like country, Provence region. We've got, again, like I said, the wine name, the wine variety and then actual descriptions that were recorded by the taster along with the tasters Twitter handle. If for whatever reason you want to go ahead and and follow that person. So concepts that we're going to cover, we're going to do a bunch of sorting filtering for this one. We're going to use value grouping, which is actually a new type of grouping that we haven't covered yet in the course. And then we're going to do some visualization with pivot charts and talk about some interesting, slightly more advanced analytical concepts like correlation and trend line fitting. So with that, let's pop into our Excel workbook and get started. All right. So here we are in our Pivot Table Case Study workbook, I mean, the wine tasting scores tab. And you'll see all those fields that we just talked about, the wine names, the taster names, Twitter handles the full descriptions here in column K and then those points and price values in L and M, so scrolling down. See, we've got 129,972 rows of data. Each of which represents a unique record or a unique tasting for a given wine. And the other thing to note here, column A, this tasting ID, this is like an index column. It's a unique sequential set of values that uniquely identifies each row in our data set. So from here, let's go ahead and insert pivot table new worksheet press. Okay. And you can rename it anything you'd like. I'm going to name it wine tasting. Pivot. And just to be consistent tab color like green and boom, we are good to go. So now that I found this data set, my wheels are turning and thinking about what I might be able to learn from the data, how one might be able to explore the data. And the first thing that came to mind is, well, what types of wines, what varieties do the experts tend to rate the best or the highest? So as a wine drinker, you know, I'm pretty basic. I stick to the common options your Pinos, your Rosés, your Malbec. But if we go ahead and grab a variety and pull it into rows here in our pivot, you'll notice that there are hundreds and hundreds and hundreds of different wine varieties out there that personally I've never tasted. So this is going to be a great chance to use this data set for a little bit of a quantitative analysis to help me identify what the next variety of wine I should taste is. So got our varieties here on rows. All we need to do is scroll down to our points field and drag those into values. And remember, we don't want the some of the ratings or the points. We want the average so that we're on an even playing field between different varieties. Let's go ahead and give this a quick number, format number. I think one decimal spot is really all we need here. There you go. Now I've got average points for each variety. I'm going to sort my varieties descending by that new field. And there we go. I've got a sorted ranked list of different wine varieties. Now, before I go to my wine shop and ask them for all their bottles of Tarantino's and I know I'm probably butchering the name Apologies in advance. What I need to think about here is, well, how accurate or how realistic is this 95 point rating? Because what I haven't factored in here is how many times a wine that falls into this variety was tasted? How many records exist in this data set for this particular variety? And to answer that question, we can go ahead and use our trusty tasting ID here and pull it in and turn it into a count, which is going to tell me the count of tastings or the count of records in our data set that fall into each of these row labels. So there you go. There's first top rated wine variety actually only had one observation in the whole table. So one person tasted it and that one person gave it in 95. So I'd be a little bit foolish to jump to a conclusion and say that this is indeed the best rated variety of wine in the whole data set, especially as I scroll down and see, okay, this one had a 91.1 with 27 ratings. That's quite a bit better. Here's one with 751 2804. Now, these point ratings, I can start to put a little bit more faith into, right? So what I'm going to do is actually filter down my list and put a criteria threshold on it so that I can feel more confident in the point rating averages. So let's go ahead and do a value filter greater. Then we're going to say we only want the wines or the varieties or the count of tastings is greater than 500 and you can pick your own threshold here. I like 500 as an option. It's going to limit me down to about 40 observations, 40 varieties here. And as you can see now, the top rated is this Sangiovese grosso. Again, no promises on the pronunciation there, but that's our top rated variety with a 90.5 rating. So I'm going to trust this one. It's got 751 individual tastings, so I know that's pretty legit. Let's go ahead and drill a little bit deeper here. I'm going to filter. And keep only that variety. And what I want to learn here is note was tasted 750 times, but I want to hear what people are actually saying about it. And luckily I've got this description field that I can use here. Now watch what happens when I drag description as a secondary row label. Boom. It totally explodes. My Pivot column A is now about a mile wide. I have to use this tiny little scroller. It's a total pain in the butt. So I'm actually going to control Z and undo that and remind you of that option we have in our pivot table tools options, auto fit column widths on update. Let's go ahead and uncheck that box press. Okay. And now when we grab that description field, pull it in. It stays within the bounds of the existing column widths. And you can see that it kind of condensed it or merged it with that same column. A And what I want to do is break these out into separate columns. In other words, I want variety in one column, one field, and I want description in another. And to do that, we're going to go into our Pivot Table Design Tab, report, layout, outline form. To be honest, I'm not sure why this isn't the standard by default because it really is just a much more useful way to work with pivots. So now I've got my descriptions in column B here. I can stretch it out a bit and I can select the entire column B and in my home tab I can click wrap text, which makes it a little bit more readable. I can see all the way to the end of the string for each of these tastings. So from here let's go ahead and sort those descriptions descending by the rating. I want to see the top ratings first. So average of points and there you go. This top rating, this taster gave it a point, rating of 100. And you can kind of read through and see what kinds of trends you start to see, you know, dark and fleshy, ever evolving aromas of dark fruit, chocolate, spice and tobacco. Very, very descriptive wording here, loaded thickly on top of each other. Chocolate fudge, blackberry preserves, rum cake, prune. That's a lot of flavor. So maybe this is something that I'm interested in. Maybe it's a little bit too much for my taste. But I'm gathering data by seeing how these top reviewers are describing this particular wine variety. And I wouldn't be doing my due diligence if I didn't look at the bottom ratings as well. So let's apply another filter here and let's actually choose top ten, but select the bottom ten by that point column press. And now I'm looking at the lowest ten ratings. So it looks like the lowest one got an 82. And this particular taster made a comment that tastes like smoked ham and moist tobacco, sticky in the mouth. That doesn't sound so good. So whether or not I end up going with this variety is yet to be seen. I'll leave you with a little cliffhanger there, but that was a good way to kind of drill in, see some of the top reviews, some of the bottom reviews, and learn more about how the experts tend to rate certain varieties of wine. So that's that. Let's go ahead and pull description back out. Let's pull variety out. Let's pull the count of tasting ID out so that we're left with just average points in the values. Now, the second thing that I really want to explore here that I'm very excited about is I want to see if there's a relationship or correlation between the price point and the point rating. So in other words, do expensive wines tend to consistently rate higher in terms of points? Or is there a point where you're maybe paying more money but not seeing any impact on the actual rating? So pivots will give us a great way to kind of answer that question very, very quickly. And to do that, we already have our point rating here in values, which is right where we want it. I'm just going to take price and I'm actually going to pull price into Rose. This is a little bit unusual because in the past, every time we've had a quantitative field, value field, we've pulled it into the values pane of our field list. This time it's a little bit different. We're going to pull it into rows so that we can see four wines priced at $4. What was the average point rating? $5? What was the average rating? All the way down to the highest priced wines in the entire sample, which I believe are up in the three thousands. Yep. So we've got a wine here that was rated in our data set that costs $3,300 all the way down to four. So you'll notice if I go back down, there are some blanks here as well, right here in row 394. So we can go ahead and just filter the blanks out. Just so that we don't confuse things. PRESS Okay. And now what I want to do here is actually visualize this pattern, because a lot of times it's going to be easier to see the trend visually than by just looking at the numbers themselves. So let's go into a pivot table tools and let's pop in a pivot chart. Let me use the clustered column option here by default. And there we go. If I drag this out, you can already start to see the story here, which is that on average, yes, these low priced wines in the four, ten, $20 range do tend to be rated the lowest. And there's a very clear relationship all the way up into the, you know, maybe 80, 90, $100 mark where that trend starts to flatten out. You see a lot more volatility, a lot more variability in the point rating. But generally speaking, the trend from I would say about $100 up is pretty flat. And, you know, that's the initial kind of visual pattern that we see here. But we can actually quantify this with tools. Go into our pivot chart tools design. Let's add a chart element. We're going to add a chart trend line. I'm actually going to go into more trend line options here. Choose the logarithmic trend line. That's a great fit for this type of curve. And you can see that it's kind of re scaled. My y axis makes that pattern a little bit tough to see. So I'm going to right click that axis, format it and let's set a minimum to about 70. So this is my point rating and my Y, I'm going to say, okay, the minimum point rating is 70. That'll just kind of rescale things so that we see that pattern very clearly. Now, in plain English, the way that I interpret this is that if I'm looking for the best bang for my buck and I want a very, very highly rated wine without paying a ton of money, I'm probably going to find a wine right about in this range here, maybe the 60 to 80 or 100 range, because I know from the data, if I go in the four to 10 to $20 range, probably getting something with a pretty low rating. And if I go with something in the three or $400 range, I know it's really not much of a difference compared to something, you know, around 100. So really interesting story that's kind of presented itself just from a very quick analysis of this raw data. And if you want to get actually pretty advanced, you can right click and format the trend line and actually show the equation right there on the chart. And if we just drag this into view here. This is getting into some more advanced analytics concepts here, but this equation is actually the fit of that logarithmic line which could be used for predictions. So you could say, okay, given a bottle of wine at a certain price point or X value, we would plug that x value in and spit out a predicted Y value based on the path of this line. And that would tell you for any given price point for a bottle of wine, what's the predicted point rating? So some really, really interesting things that we could get into here. Now, let's go ahead. I don't need that equation, so let's delete that one. And one thing I want to show you here is that right now we're looking at a really granular view. We've got an individual row for every single price that's represented in the data set. Now, we don't have to stay this granular. We can group these prices into buckets, kind of like you would with creating bins in a histogram. So if I right click just like I would with a regular text based row label and I click group, now I get these value specific options and this is something that we haven't seen yet in the course. This allows me to group these values into buckets with a starting point and ending point and a buy option, which is basically like my bucket width. So here I could say, let's start at $0. We can go ahead and end at 3300 and I want to group these buy $100 increments here. You can press OC and we've got this blank option here. So let's go ahead and just. Boom filter. Hide that blank option. And there we go. So now we've got the same chart, the same kind of pattern, but now we're grouping the prices into across all wines that are priced 0 to 99. Here's the average rating, 100 to 199. Here's the average rating. And from here, you could really take this analysis in a number of different directions. You know, obviously we see some weird things going on here at the end where the average rating actually drops quite a bit in the 32 to 30 $300 range. And we know that that's because there's really only a small number of ones. I think it's only one line in that range that appears in our data set. We can check that by pulling in account of tasting ID we wanted to. And looking at how the buckets kind of compare in terms of the number of ratings. And yes, so there is only one wine in that range that was tasted. So one thing that we could do here is to kind of filter out some of this low volume noise. We could say, you know, I'm only comfortable with buckets that include more than ten tastings. So we could select those good filter to keep only those items. Pull the count back out. And there you go. We've lost our trend line, but we can add that back in. So, again, this is totally up to you. In this case, we used $100 bins. You could break it down by ten or by two or by 500, whatever you choose. But really great example of using value groups and pivot charts to expose some really fascinating and interesting patterns in the underlying source data.