Analyzing Top Traffic Sources

Maven Analytics
A free video tutorial from Maven Analytics
Award-Winning Analytics & Business Intelligence Training
4.6 instructor rating • 20 courses • 466,561 students

Learn more from the full course

Advanced SQL: MySQL Data Analysis & Business Intelligence

Master advanced SQL database coding w/ MySQL Workbench. My SQL course takes your SQL analysis skills to new heights!

08:01:44 of on-demand video • Updated February 2021

  • Write advanced SQL queries to explore and analyze databases with MySQL Workbench
  • Learn how to use subqueries and temporary tables to handle complex multi-step data problems
  • Solve practical hands-on assignments and real-world eCommerce use cases
  • Analyze data across multiple tables with powerful MySQL JOIN statements
  • Master unique tips and tools that you won't find anywhere else
  • Get LIFETIME access to resource files, quizzes, assignments, and expert support
English [Auto] All right now we're going to get into the business concepts the sequel demos and kick off the course project. So our first business concept that you should understand is traffic source analysis traffic source analysis is all about understanding where your customers come from and which of those channels are driving the highest quality traffic. So we'll take a look at our Web site. We'll try to see where our customers are coming from in the real world. We'd be looking at e-mail social media search direct traffic and then we're gonna be looking at conversion rates which is the percentage of those sessions which convert to our sales or revenue activity. We do the conversion rate analysis so that we can understand how highly qualified that traffic is and how valuable each of those traffic sources is to us some of our common use cases here with traffic source analysis are analyzing search data and shifting budgets towards one engine or another looking at different campaigns or keywords to see which have the best conversion rates we could compare user behavior patterns across different traffic sources and we can look for opportunities to eliminate wasted spend within our paid marketing channels. We can also look at channels which are performing very well so that we can try to scale those up so before we get into the sequel I want to do a quick overview of three very important tables for traffic source analysis. So first we have Web site sessions. Then we have Web site page views and then we have Web site orders and I'm going to walk you through what's in each of those here and we'll also do that in a live sequel demo as well. First we've got Web site sessions and I'm going to show you what's in the Web site sessions table for Web site session I.D. ten fifty nine after that I'm going to show you the Web site page use table again limiting to Web sites session I.D. ten fifty nine so you can see what the Web site page use records look like for that same session. And then finally we're going to look at the orders table and again I'm going to limit to orders where Web site session I.D. equals ten fifty nine so that you can see how all three of these tables work together and we can start talking about what we would do with each of these tables first take a look at the Web site sessions data so you see here we have a Web site session I.D. This is the primary key of the Web site sessions table we have a created that timestamp which is when that session happened we have a user I.D. that is linked to the cookie in a user's browser we use that to track users across multiple sessions we have a binary flag here is repeat session. So this is going to be a 0 or a 1 depending on whether or not this customer has been to the Web site before and that's identified and then we have these you team source you team campaign you team content these are tracking parameters that we use to measure our paid marketing activity so fun fact you may have seen these you team parameters used before and not know what your team is this stands for urchin tracking module urchin is the predecessor to Google Analytics that Google actually bought and then rebranded as our own. So when you see you team parameters this is what's used by Google Analytics and many e-commerce companies have adopted similar tracking conventions for their own internal database so that you can use one set of parameters for both the Google Analytics implementation and for the internal database. Finally we have device type and HDTV refer so device type helps us understand whether the user is on their computer or if they're on a mobile device and then HDP refer helps us understand where traffic is coming from this can be especially helpful for traffic that's coming to us which we don't have tagged with paid tracking parameters because it's not through a marketing campaign we can look at HDP refer as another way to try to figure out where that traffic is coming from. Next we have the Web site page views table for that same session. So first thing to note Web site Sessions has a primary key of Web site session I.D. and that is a foreign key in the Web site pages table. So when you want to use these two together that's how you're gonna do your join. So notice all of these records have ten fifty nine as the Web site session I.D. and then we have various Web site page view I.D. and created at timestamps. So the Web site page use table is basically a log of page views that a user saw when they were on the e-commerce Web site. So what we see here is this user first landed on the home page at this time and then roughly three minutes later they went to the products page then they clicked through to the original Mr. Fuzzy page then clicked through to the cart then to the shipping page the billing page. And finally they landed on thank you for your order so this can be really helpful when we start to do things like conversion funnel analysis. Just wanted to give you a quick walkthrough of that. And then finally we have the orders table. I mentioned this table before when we were doing the walkthrough of the database the Orders table is really important for any e-commerce company. This is where the revenue events are tracked. We have here again Web site session I.D.. This is how we join back to Web site sessions so when we want to do things like look at our marketing campaigns for example the G search non branded campaign and understand whether or not those campaigns are driving a large enough number of orders to warrant their marketing spend. This is how we'll do that. We'll use the Web site session I.D. in the orders table as a foreign key that joins back to Web site sessions. So some other things that we have in here we have the price USD and the cogs the cost of goods sold in U.S. dollars as well. So we'll use that to understand revenue and to do margin analysis for our various products. Here we have the primary product I.D. So the primary product being the product that a customer puts in their cart first. A lot of e-commerce companies will use this methodology to label a primary product in the event that customers are able to purchase multiple products. This customer only purchased one item but we have a count here of the total items that customers purchased next will go a little deeper into these UTF tracking parameters. I just want you to understand how these work. That way you'll have the context so that it all makes sense as you're analyzing the data so when businesses run paid marketing campaigns they're obsessed over performance and they want to measure everything how much they spend how well traffic converts to sales any kind of data that they can get on the money that they're spending in marketing. They really really want. So pay traffic is often tagged with these UTF parameters these are appended to the end of the you are ELLs and they allow us to tie the Web site activity back to the specific traffic sources and campaigns that we're driving that activity. So if we were running ABC Web site dot com instead of just sending traffic to ABC Web site dot com we would also append these tracking parameters to the end where we have a parameter UTF source and a value where you would put something in here. These are known as parameter value pairs and they are appended as these you are L string query parameters. After the initial you are L. So one thing to note is the question mark here tells the browser that everything that comes next is not going to impact where the browser should look to find the page. It's just on there for tracking purposes. And so our parameter value pairs are separated by ampersand. So in this case we have two parameter value pairs. One is you team source and one is UTF campaign. This is where you might tag it as the specific social media campaign you're running and then the name of the social network as is typically the format that people will follow. Using these parameters but you could do it however you want. That's just the standard convention. So here's how that data looks in the database. If we were to go to Web site sessions and do a select distinct you team source you T.M. campaign what we'll see here is a number of you team source and your team campaigns. We have B search and G search and social book. Are the three new teams source values that come out and each of them have multiple campaigns associated with them within those sources. So we'll do some analysis on that later on. Just wanted to explain how that works. And then we have some traffic that has a no you team source. So when we see no you team source this is traffic that is not driven by a paid campaign or possibly the marketer who's running the paid campaign forgot to put tracking parameters on their campaigns. Next we'll talk a little bit about specifically how we use sequel to analyze this data. So we're gonna use the UTF parameters that are stored in the database to identify paid Web site sessions and then from that session data we're going to link to our ordered data to understand how much revenue those paid campaigns are driving. This is a very typical analytics requirement. It's something that any business is going to be asking someone to do. They typically will have a dedicated analyst that will serve marketing activities or if it's a really small company they may have one analyst that's serving the entire company. But you're going to be doing stuff like this all the time. My pro tip for doing traffic source analysis is get really familiar with using group buy and then the aggregate functions like count and some and you're going to get very good at understanding what are your largest session drivers. What are your largest order drivers something the revenue that's generated from these traffic sources this just becomes really powerful when you're operating a business at scale to be able to use the data to try and figure all this stuff out. So let's jump into workbench and I'll do a live demo on the stuff right now. All right. So here we've got a very simple query we're naming the Maven fuzzy factory as the schema that we want to use. Just a good best practice so that we don't run into bugs and the next we are going to run a simple select star from Web site sessions. And I've noted here that we're going to limit the data to Web site session entities between 1000 and 2000. This is totally arbitrary. And so you see what the data looks like here. We have a whole bunch of sessions when they were created. The user I.D. whether or not it was a repeat session and then some team tracking the device type of the user and the refer. So one common thing that we may try to do is just size our various traffic sources. I'll give the example of looking into the UTF content where a lot of places will use UTF content to store the name of a specific ad unit that they're running. So let's do a count of sessions and we'll group that by Team content so we can see which ads are driving the most sessions. First I'm gonna go ahead and do UTF content and then I'll do a count of distinct Web site session I.D.. One thing to note here website session I.D. should be unique. Anyway I always use count distinct when I'm trying to size anything just in case something in my query produces duplication that I didn't think was gonna be there. It's kind of like a failsafe to keep yourself from messing things up so that way if something happens and I end up with the same session I.D. getting duplicated onto multiple rows because I'm using the distinct it's going to cover me there in this specific application. It's totally redundant and it doesn't matter but I just wanted to note that method for you guys. So then we'll come here and we'll add a group by and we'll say group by you T.M. content and we run that and you see we've got four different types of Team content here. We have no. So that is traffic that was not tagged with any kind of a YouTube content parameter that could be direct traffic that could be a paid session that the marketing person forgot to tag. Then we've got B add to g add 1 and G add 2. And so one thing that will often want to do here is order by volume descending. So we'll show you how that works. We can do so here we've now reordered the traffic sources so that the largest ones are on top. This is pretty typical for anything that I do I'll usually be looking for the largest buckets as the most important buckets. That doesn't mean we wouldn't care about the things on the bottom of the list. It just means if you have to focus your attention somewhere first you typically want to steer the business towards going after the segments that are generating the lion's share of activity. Here we're doing a count distinct Web site I.D. as our descending criteria. So basically our sessions I want to show you here that you could also do order by Sessions descending. So in this case we've named this with an alias and we can also use that alias in our order by and you see the results are the same. And so that's just a little shortcut. Another thing that we can do here is we can use in our group by or in our order by instead of writing out the column name we can simply write the column position. So within our select statement we have two columns. This is column 1 and this is column 2. So a nice little shorthand that you can do in my school is grouped by one. And so that same group by column 1. And then you could do order by column two descending. And that will do the same thing as if you had typed out your team content and count distinct Web site sessions. So take a look at this and you see yet again we have the same exact output. So these are just handy little shortcuts you can do. And obviously as you know the line breaks don't matter here so when we run this on the same line as the group by that will work too. So now that we have the Web site sessions that drive the most volume I want to show you how we would bring in our orders as well so take a look here. We'll do a left join to the orders table and as we talked about before orders has a Web site session i.e. and so we'll do Web site session Ida equals Web site sessions dot Web site session 90 and what you should note here is once we bring in orders that has Web site session I.D.. So now if we tried to run this we're going to get an error because this column is ambiguous. The server doesn't know which table we're trying to pull it from. So I'm going to just note that explicitly here and once we do multi table queries I'll be explicitly naming most of my columns. Just general best practice to make it really easy to read. Takes a little longer to write but should be OK. So now we've got the order table in here and we can add another column which is a count distinct orders that order Ida as orders. And we'll see what happens there. And so what we see is we've got our sessions and our orders and we see all of our orders have been driven by this g add one during the session period from 1000 to 2000. Next we can translate this to a rate in this case it's obvious because many of them are zeros but we can always do this and this is a good practice to start to analyze conversion rates. So conversion rate analysis in general is just a pretty simple fraction. It's how many successful outcomes we have in this case the number of orders over how many attempts we had to convert. And so in this case that's going to be sessions and I'll name that session to order conversion rate and we'll go ahead and run that and you'll see what we get. So we see that for G add 1 We got a three point five nine percent conversion rate I call that session to order conversion rate. A lot of times we'll just talk about this specifically as conversion rate the percentage of your sessions which convert to a revenue producing sale. So I think that's a pretty good intro to using your web session data and your orders data to size your traffic sources and understand how well those traffic sources are converting to orders at this point you'll be ready to go through the assignments. This course is structured a little different than the course I did previously. You'll have multiple assignments back to back. One of the pieces of feedback students gave me is that they wish there were more assignments. So we've built a lot of assignments into this course. So go ahead and try the assignments you'll go through them on your own and then I'll walk through solution videos and we'll also get to see the business decisions that are made based on your recommendations. Have fun with it. And if anything looks tricky and you don't think you know how to do it you can always come back to this video and try to get a quick walkthrough of these methods again. Good luck.