Pivot Table Demo: Analyzing Wine Tasting Scores

A free video tutorial from Maven Analytics
Empowering everyday people with life-changing data skills
Rating: 4.6 out of 5Instructor rating
41 courses
1,282,016 students
Pivot Table Demo: Analyzing Wine Tasting Scores

Lecture description

In this Excel PivotTable case study we'll analyze information from 130,000+ wine tastings, including average point ratings, prices, and written reviews.   

Learn more from the full course

Microsoft Excel: Data Analysis with Excel Pivot Tables

Master Excel Pivot Tables for data analysis, EDA & business intelligence. Learn from a top MS Excel instructor!

06:30:12 of on-demand video • Updated May 2024

Take your data analysis & EDA skills from ZERO to PRO with Excel Pivot Tables
Learn how to use Pivot Tables and Pivot Charts to streamline and simplify your workflow in Excel
Master unique Pivot Table tips, tools and case studies that you won't find in ANY other course
Explore fun, interactive, and highly effective lessons from a best-selling MS Excel instructor
Get LIFETIME access to project files, quizzes, homework exercises, and 1-on-1 expert support
Test your skills with 10 real-world Pivot Table case studies (weather, shark attacks, wine tastings, burrito ratings, and more!)
English [CC]
Instructor: All right, I'm really excited about this next case study, 'cause we're gonna talk about wine tasting and I found a really fun data set that I'm excited to share with you and I think you'll have a lot of fun playing with. Basically it includes almost 130,000 records of individual wine tastings and it will tell us about the actual wine that was being tasted, what variety it is or what type, the winery it came from, the sub region, region, province, and country. And then on the actual metrics or value side, we're gonna actually have the point rating for that wine as well as the price for a bottle. So really interesting data. We've got a bunch of dimensions that we're gonna use to explore this data set. On top of those geo fields like country, province, region, we've got, again like I said, the wine name, the wine variety and then actual descriptions that were recorded by the taster, along with the taster's Twitter handle. If for whatever reason you want to go ahead and follow that person. So concepts that we're gonna cover, we're gonna do a bunch of sorting filtering for this one. We're gonna use value grouping, which is actually a new type of grouping that we haven't covered yet in the course. And then we're gonna do some visualization with pivot charts and talk about some interesting, slightly more advanced analytical concepts like correlation and trendline fitting. So with that, let's pop into our Excel workbook and get started. All right, so here we are in our Pivot table case study workbook. I'm in the wine tasting scores tab and you'll see all those fields that we just talked about the wine names, taster names, the Twitter handles, the full descriptions here in column K and then those points and price values in L and M. So scrolling down, see we've got 129,972 rows of data, each of which represents a unique record or a unique tasting for a given wine. And the other thing to note here, column A is tasting ID. This is like an index column. It's a unique sequential set of values that uniquely identifies each row in our dataset. So from here, let's go ahead and insert pivot table, new worksheet, press okay. And you can rename it anything you'd like. I'm gonna name it Wine Tasting Pivot. And just to be consistent, tab color, light green, and boom we are good to go. So now that I've found this data set, my wheels are turning, I'm thinking about what I might be able to learn from the data, how I might be able to explore the data. And the first thing that came to mind is, well, what types of wines, what varieties do the experts tend to rate the best or the highest? So as a wine drinker, you know I'm pretty basic. I stick to the common options, your Pinots, your Roses, your Malbecs. But if we go ahead and grab variety and pull it into rows here in our pivot, you'll notice that there are hundreds and hundreds and hundreds of different wine varieties out there, that personally I've never tasted. So this is gonna be a great chance to use this data set for a little bit of a quantitative analysis to help me identify what the next variety of wine I should taste is. So got our varieties here on rows. All we need to do is scroll down to our points field and drag those into values. And remember, we don't want the sum of the ratings or the points, we want the average, so that we're on an even playing field between different varieties. Let's go ahead and give this a quick number format, number, I think one decimal spot. That's really all we need here. There you go. Now I've got average points for each variety. I'm gonna sort my varieties descending by that new field. And there we go. I've got a sorted ranked list of different wine varieties. Now, before I go to my wine shop and ask them for all their bottles of Terrantez, and I know I'm probably butchering the name, apologies in advance, what I need to think about here is, well, how accurate or how realistic is this 95 point rating? Because what I haven't factored in here is how many times a wine that falls into this variety was tasted. How many records exist in this dataset for this particular variety? And to answer that question, we can go ahead and use our trusty tasting ID here and pull it in and turn it into a count, which is gonna tell me the count of tastings or the count of records in our dataset that fall into each of these row labels. So there you go. This first top rated wine variety, actually only had one observation in the whole table. So one person tasted it and that one person gave it a 95. So I'd be a little bit foolish to jump to a conclusion and say that this is indeed the best rated variety of wine in the whole data set, especially as I scroll down and see okay, this one had a 91.1 with 27 ratings, that's quite a bit better. Here's one with 751, 2,804. Now these point ratings I can start to put a little bit more faith into, right? So what I'm gonna do is actually filter down my list and put a criteria threshold on it so that I can feel more confident in the point rating averages. So let's go ahead and do a value filter, greater than or we're gonna say we only want the wines or the varieties or the count of tastings, is greater than 500. And you can pick your own threshold here. I like 500 as an option. It's gonna limit me down to, you know, about 40 observations, 40 varieties here. And as you can see now, the top rated is this Sangiovese Grosso. Again, no promises on the pronunciation there but that's our top rated variety with a 90.5 rating. So, okay, I'm gonna trust this one. It's got 751 individual tastings, so I know that's pretty legit. Let's go ahead and drill a little bit deeper here. I'm gonna filter and keep only that variety. And what I wanna learn here is, okay, I know it was tasted 750 times but I want to hear what people are actually saying about it. And luckily I've got this description field that I can use here. Now watch what happens when I drag description as a secondary row label, boom, it totally explodes my pivot, column A is now about a mile wide. I have to use this tiny little scroller. It's a total pain in the butt. So I'm actually gonna Ctrl Z and undo that and remind you of that option We have in our pivot table tools, options, autofit column width on update. Let's go ahead and uncheck that box, press okay. And now when we grab that description field, pull it in, it stays within the bounds of the existing column widths and you can see that it kind of condensed it or merged it with that same column A. And what I want to do is break these out into separate columns. In other words, I want variety in one column, one field, and I want description in another. And to do that, we're gonna go into our pivot table design tab, report layout, outline form. To be honest, I'm not sure why this isn't the standard by default, because it really is just a much more useful way to work with pivots. So now I've got my descriptions and column B here, I can stretch it out a bit and I can select the entire column B. And in my home tab I can click wrap text which makes it a little bit more readable. I can see all the way to the end of the string for each of these tastings. So from here, let's go ahead and sort those descriptions descending by the rating. I wanna see the top ratings first, so average of points. Okay, and there you go. This top rating, this taster gave it a point rating of 100. And you can kind of read through and see what kinds of trends you start to see. You know, dark and fleshy, ever evolving aromas of dark fruit, chocolate, spice, and tobacco. Very, very descriptive wording here. Loaded thickly on top of each other. Chocolate fudged, blackberry preserves, rum cake, prune. That's a lot of flavor. So maybe this is something that I'm interested in, maybe it's a little bit too much for my taste but I'm gathering data by seeing how these top reviewers are describing this particular wine variety. And I wouldn't be doing my due diligence if I didn't look at the bottom ratings as well. So let's apply another filter here and let's actually choose top 10, but select the bottom 10 by that point column. Press okay, and now I'm looking at the lowest 10 ratings. So it looks like the lowest one got an 82 and this particular taster made a comment that tastes like smoked ham and moist tobacco. Sticky in the mouth. That doesn't sound so good. So you know, whether or not I end up going with this variety is yet to be seen. I'll leave you with a little cliffhanger there. But that was a good way to kind of drill in, see some of the top reviews, some of the bottom reviews, and learn more about how the experts tend to rate certain varieties of wine. So that's that. Let's go ahead and pull description back out. Let's pull variety out. Let's pull the count of tasting ID out, so that we're left with just average points in the values. Now the second thing that I really want to explore here, that I'm very excited about is I want to see if there's a relationship or correlation between the price point and the point rating. So in other words, do expensive wines tend to consistently rate higher in terms of points? Or is there a point where you're maybe paying more money but not seeing any impact on the actual rating? So pivots will give us a great way to kind of answer that question very, very quickly. And to do that, we already have our point rating here in values, which is right where we want it. I'm just gonna take price and I'm actually gonna pull price into rows. And this is a little bit unusual because you know, in the past, every time we've had a quantitative field value field, we've pulled it into the values pane of our field list. This time it's a little bit different. We're gonna pull it into rows so that we can see four wines priced at $4. What was the average point rating? $5, what was the average rating? All the way down to the highest priced wines in the entire sample, which I believe are up in the three thousands. So we've got a wine here that was rated in our data set that costs $3,300 all the way down to four. So you'll notice if I go back down, there are some blanks here as well, right here in row 394. So we can go ahead and just filter the blanks out just so that we don't confuse things, press okay. And now what I want to do here is actually visualize this pattern because a lot of times it's gonna be easier to see the trend visually than by just looking at the numbers themselves. So let's go into our pivot table tools and let's pop in a pivot chart. I'm gonna use the clustered column option here by default. And there we go. If I drag this out, you can already start to see the story here, which is that on average, yes, these low priced wines in the four, 10 20 dollar range do tend to be rated the lowest. And there's a very clear relationship all the way up into the maybe 80, 90, a hundred dollars mark where that trend starts to flatten out. See a lot more volatility, a lot more variability in the point rating. But generally speaking, the trend from I would say about a hundred dollars up is pretty flat. And you know, that's the initial kind of visual pattern that we see here. But we can actually quantify this with tools. Go into our pivot chart tools, design, and let's add a chart element. We're gonna add a chart trend line. I'm actually gonna go into more trend line options here and choose the logarithmic trend line. That's a great fit for this type of curve. And you can see that it's kind of rescaled my Y axis, makes that pattern a little bit tough to see. So I'm gonna right click that access, format it and let's set a minimum to about 70. So this is my point rating and in my Y. I'm gonna say, okay, the minimum point rating is 70. That'll just kind of rescale things so that we see that pattern very clearly. Now in plain English, the way that I interpret this, is that if I'm looking for the best bang for my buck and I want a very very highly rated wine without paying a ton of money, I'm probably gonna find a wine right about in this range here, maybe the 60 to 80 or a hundred dollars range. 'Cause I know from the data, if I go in the four to 10 to 20 dollar range, I'm probably getting something with a pretty low rating. And if I go with something in the three or four hundred dollar range, I know it's really not much of a difference, compared to something you know around a hundred. So really interesting story that's kind of presented itself just from a very quick analysis of this raw data. And if you want to get actually pretty advanced, you can right click and format the trend line and actually show the equation right there on the chart. And if we just drag this into view here, this is getting into some more advanced analytics concepts here, but this equation is actually the fit of that logarithmic line, which could be used for predictions. So you could say, okay, given a bottle of wine, at a certain price point or X value, we would plug that X value in and spit out a predicted Y value based on the path of this line. And that would tell you for any given price point, for a bottle of wine, what's the predicted point rating? So some really, really interesting things that we could get into here. Now, let's go ahead, I don't need that equation, so let's delete that one. And one thing I wanna show you here is that right now we're looking at a really granular view. We've got an individual row for every single price, that's represented in the data set. Now we don't have to stay this granular. We can group these prices into buckets. Kind of like you would with creating bins in a histogram. So if I right click, just like I would with a regular text-based row label and I click group, now I get these value specific options. And this is something that we haven't seen yet in the course. This allows me to group these values into buckets with a starting point, an ending point, and a buy option, which is basically like my bucket width. So here I could say let's start at $0, we can go ahead and end at 3,300. And I wanna group these by a hundred dollars increments here. You can press, okay, and we've got this blank option here. So let's go ahead and just boom, filter, hide that blank option. And there we go. So now we've got the same chart, the same kind of pattern, but now we're grouping the prices in two, across all wines that are priced zero to 99, here's the average rating a hundred to 199, here's the average rating. And from here you could really take this analysis in a number of different directions. You know, obviously we see some weird things going on here at the end, where the average rating actually drops quite a bit in the 32 to 33 hundred dollar range. And we know that that's because there's really only a small number of wines. I think it's only one wine in that range, that appears in our data set. We can check that by pulling in the count of tasting ID if we wanted to, and looking at how the buckets kind of compare, in terms of the number of ratings. And yes, so there is only one wine in that range that was tasted. So one thing that we could do here is to kind of filter out some of this low volume noise. We could say, you know, I'm only comfortable with buckets that include more than 10 tasting. So we could select those, could filter to keep only those items. Pull the count back out. And there you go. We've lost our trend line, but we can add that back in. So again, this is totally up to you. In this case we used a hundred dollars bins. You could break it down by 10 or by two or by 500, whatever you choose. But really great example of using value groups and pivot charts to expose some really fascinating and interesting patterns in the underlying source data.