Left Joins

Boris Paskhaver
A free video tutorial from Boris Paskhaver
Software Engineer | Consultant | Author
4.7 instructor rating • 6 courses • 283,219 students

Lecture description

left join establishes one of the DataFrames as the base dataset for the merge. It attempts to find each value in another DataFrame and drag over that DataFrame's rows when there's a value match. In this lesson, we'll practice executing this join with the .merge() method.

Learn more from the full course

Data Analysis with Pandas and Python

Analyze data quickly and easily with Python's powerful pandas library! All datasets included --- beginners welcome!

20:34:30 of on-demand video • Updated September 2020

  • Perform a multitude of data operations in Python's popular "pandas" library including grouping, pivoting, joining and more!
  • Learn hundreds of methods and attributes across numerous pandas objects
  • Possess a strong understanding of manipulating 1D, 2D, and 3D data sets
  • Resolve common issues in broken or incomplete data sets
English [Auto] Are right in the past few lessons we explored her joints and outer joins and now we're going to focus on left joins. So let's take a look at a visual representation first that'll help us a rapper head around this concept. So as we know an INNER JOIN looks something like this. That's where we focus on the values that exist in both of the data frames if they exist in one but not the other we don't care about them. And we also explored the full outer join where we look at values that exist in either or both. Now in this case we've kind of been focusing on both the data frames they've both had equal weight in this operation that with where we perform the merge on whenever we move to a left merge we basically set one of the data frames as our main focus. So if we pretend here that a is are left out of frame and B is our right data frame what we're doing is saying we want to look at the values in a column in data frame A. And we want to see if they exist in that frame B. If they do then great pull those rows over where we can join based on a common value and if they don't exist in common be in the data frame B then just put no values. So we're looking primarily our focus is on the values within data frame. Now if we have values in data frame B that are exclusively there we really will never care about them at all. When we're doing a left join because we're only caring about the values in the right data frame or data frame B if they already exist in that frame. So this operation is very similar basically the same thing as a V lookup operation in Excel. We're looking for a value in a left hand data set trying to identify it in a Right-Hand data set whatever values in the right hand data set whatever exists there doesn't matter. It only matters as long as we can find the match to the left hand data set. So it's kind of basically using one data frame as the foundation and another one is just kind of an additional reference point to pull in additional data from. So a b in this case the data on the right gets a little bit neglected because it's exc exclusive values are basically ignored only the values in data from a serve as the base for which we want to pull information for and a right join is basically the process in reverse. And in reality there are basically the same thing because you can basically do a left or right join and swap them by simply flipping the order of the data frames and repeating the exact same process. So throughout these tutorials I'm only going to do a left merge because it's basically the same thing if you just want to apply the process in reverse what you can do is just flip what data frame you're considering as you're left one and which when you're considering your right one and just continue calling the left operation and just swapping the data frames to establish which one you want to serve as the foundation and which when you want to serve as the additional data points that you want to pull in but this is all kind of visual and kind of big picture. Let's see this in practice and that'll give us more clarification of how exactly this works. So as I mentioned it operates very similar to a lookup operation. So let's see what kind of thing we can perform a lookup operation on here. I'm going to execute our code. Let's take a look at the first couple of rows of our week one data from. And let's also take a look at the first three rows of our foods data frame. Finally bringing that in in here. So this is the kind of operation that I left join is perfect. You can see here that in the week one that we have a unique food ID and that food ID is tied to the exact same food Id here and we have additional information related to that food item in the separate data offering. So if we want to pull this information over to here for example if we want to say oh this food ID is one but how do I know what that item is and what price that is. All I can say will look up this one right here and pull in all of the other values from that row into my original data frame. Again the values in my second data frame are only relevant if they exist in my first one which is Week 1. So pretend we have a food item here like food ID of 1000 that isnt present at all in week one. And that case we're not going to care about it in a left Amerge because Week one is only going to look at the food ideas here. And these are the only ones that are going to matter if they exist in my foods. That is going to bring them over if if they dont exist in my foods its going to basically say no are not found but the exclusive values in foods are irrelevant. We're focusing on week one. We're making this our foundation and these are the values that we actually care about. So now let's execute this process. I'm going to call the merge operation again on week one I want to merge some kind of data into my week one data frame and the data from that I want to merge with is Foods and now we're going to run into that whole parameter again. We've said it to inner and outer so far. Now we're going to set it to left and again left. In this case represents the data that we're calling the method on and right represents the data frame that we're supplying as well. And this is what I was talking about a few minutes earlier where if we wanted to merge with a focus on foods we can do something like like this. But it's really the same thing as setting this to last and just flipping around foods and week one because it's really just a matter of where you view each data frame and so on. So I'm going to bring this back to week one and bring this back to foods. I generally just don't use right. I use less singularly And then just set my data from that I want to merge into always as the one that I call my merge method on which is what the left out of frame is. So now once again I have a pretty lucky occurrence because in these two data frames we have different columns but the ones that we want to match on have the same name. You can see food Id here and pseudo-ID here are written the same way same case sensitivity and all that. So I can once again use that on parameter and give it the column that I want to search for common values in and that's going to be food Id. Now I want to execute this we'll see that we're basically taking this data for him. You can see that we have the first two rows the exact same for example customer id 5:37 and food Id nine are represented right here but based on the values that is found in foods it dragged those matching values over. So for example where we have this food Id won it looked up one in my food Id column in foods it was able to locate it and thus it brought over sushi and 399 as the new values in the food item and price columns. Now if the food Id existed in week one but it did not exist in foods it wouldn't be able to locate it in my right data frame and we would have a bunch of no values present here in this case. They're all valid food IDs so we can see that they're all going to have the proper operation pulled over so we can see here now that we have that operation all of the values from the right data frame have been pulled in. Now there's one additional thing I want to show you here. I want to provide an additional parameter here and we can see here is called Sort by default. It's set to false. And if I set it equal to true what's going to happen is it's going to sort on the column that we provided as our matching column where it's looking for both those values. So because we provided food Id as our matching columns we matched on food Id But all of those were scattered because we kept the original order of our left out of frame which was week 1. But if we wanted to see them sorted immediately in the order in which in the column rather sorted by the values in the column by which it's matching We can just pass this sort. Equals true parameter into here. It's basically a shortcut to avoid calling the sort values method. And now we can see all of the customer IDs who ordered food wine and we can see that predictably the values that have been pulled from the right data frame which is Foods is consistent. For one it's always sushi and 399. If I come down to two you can see it's always burrito and 999. And of course predictably it would be because it's just locating those food ideas on the right that A-frame drying them back in to the left. Now again this operation creates a brand new data frame so nothing is being overwritten. So if we wanted this to represent our new modified data frame and we wanted it to be stored in week one we would have to reassign it back to work once with the equal sign. And then if I take a look at week one we have that modified data frame where we've combined those values from foods back into week one based the most common occurrences in the food Id column. So that's a quick introduction to left joins We'll dive into another example in the next lesson and it will also allow us to dive a little bit more into some additional parameters that are available on the merge method. When the column names in the two data are friends that were merging very. So I'll see if there.