Adding Rows with append() and pd.concat() (Part 1)

Alexander Hagmann
A free video tutorial from Alexander Hagmann
Data Scientist | Finance Professional | Entrepreneur

The Complete Pandas Bootcamp 2020: Data Science with Python

Pandas fully explained | 150+ Exercises | Must-have skills for Machine Learning & Finance | + Scikit-Learn and Seaborn

31:50:30 of on-demand video • Updated September 2020

  • Bring your Data Handling & Data Analysis skills to an outstanding level.
  • Learn and practice all relevant Pandas methods and workflows with Real-World Datasets
  • Learn Pandas based on NEW Version 1.0 (the days of versions 0.x are over)
  • Import, clean, and merge messy Data and prepare Data for Machine Learning
  • Master a complete Machine Learning Project A-Z with Pandas, Scikit-Learn, and Seaborn
  • Analyze, visualize, and understand your Data with Pandas, Matplotlib, and Seaborn
  • Practice and master your Pandas skills with Quizzes, 150+ Exercises, and Comprehensive Projects
  • Import Financial/Stock Data from Web Sources and analyze them with Pandas
  • Learn and master the most important Pandas workflows for Finance
  • Learn how to best transition from Versions 0.X to new Version 1.0
  • Learn the Basics of Pandas and Numpy Coding (Appendix)
  • Learn and master important Statistical Concepts with scipy
Welcome to the first session on merging, joining, and concatenating data frames. In this video we will add additional rows, or observations, to a data frame. In other words, we will combine two data frames vertically. This sounds complicated, but let's have a look at an example. First of all, we import pandas. Then we have an example with two data frames. We have the data frame 2008 with all rows where the athlete Michael Phelps won a medal in the 2008 edition in Beijing, so in total we have eight gold medals. And we have a second data frame 2012 with all medals from Michael Phelps in the 2012 edition in London, so here we have four gold medals and two silver medals. A quite common operation is to combine those two data frames and put the rows of the second data frame after the rows of the first, that is, to combine them vertically. By doing so we get a combined data frame with all rows of the first data frame and all rows of the second data frame. This is a quite easy operation, and in pandas, if you know how to do it, it is quite straightforward.

Now for our coding example we have two data frames, men2004 and men2008. Let's import both and have a look. Here we have all male athletes who won at least one medal in a swimming event in the 2004 edition. The first column holds the athlete names and the second column the total number of medals in the 2004 edition. The data frame is sorted by the medals column from high to low, so no surprise that Michael Phelps is at the top. In total we have 59 rows, so 59 athletes, with index labels starting from zero. Then we have a second data frame called men2008. Let's also import this one here.
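As a rough sketch of the Phelps example from the start of this session: the two small frames below are built by hand as stand-ins, and the column names Athlete and Medal are assumptions, since the transcript does not show them. Only the medal counts (eight gold in 2008, four gold and two silver in 2012) come from the video. pd.concat, which is covered later in this session, puts the rows of the second frame under the rows of the first.

```python
import pandas as pd

# Stand-in frames: column names are assumed, medal counts follow the video.
phelps_2008 = pd.DataFrame({
    "Athlete": ["Michael Phelps"] * 8,
    "Medal": ["Gold"] * 8,
})
phelps_2012 = pd.DataFrame({
    "Athlete": ["Michael Phelps"] * 6,
    "Medal": ["Gold"] * 4 + ["Silver"] * 2,
})

# Combine vertically: all rows of the first frame, then all rows of the second.
combined = pd.concat([phelps_2008, phelps_2012])
print(combined.shape)  # (14, 2)
```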
No surprise, these are the male athletes who won at least one medal in a swimming event in the 2008 edition in Beijing. Here too the data frame is sorted by the medals column, with Michael Phelps at the top, and in total we have 62 athletes, or 62 rows. Let's assume that you want to combine both data frames and add all rows of the 2008 data frame to the rows of the 2004 data frame. We can do this with the append method. So here we have our data frame men2004, and then we use the append method. Let's have a look at the docstring with Shift+Tab: the append method appends rows of other to the end of this frame. This frame is the men2004 frame and the other frame is the men2008 frame, so to the parameter other we pass men2008. By definition, all rows of the 2004 frame are at the top of the new data frame and all rows of the 2008 frame are at the end. By using the append method we are returning a new object, a new data frame. We also have the parameter ignore_index, which is set to False by default. We will have a look at it later. Let's try this out. Here we can see at the top all the rows of the 2004 data frame and at the end the rows of the 2008 data frame. In total we have 121 athletes, or rows. We can see on the left-hand side that we still have the original index labels from the original data frames. For example, the athlete in the last row has the index label 61, and in the original 2008 data frame this row also had 61 as its index label. However, we can change this behavior by setting the ignore_index parameter to True. By doing so, the original index labels are ignored and pandas creates a brand new RangeIndex from 0 to 120. So let's have a look here.
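The effect of ignore_index can be sketched on two tiny stand-in frames. The athlete names and medal counts below are invented placeholders, not the course data. One caveat that postdates the video: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the hypothetical helper append_rows falls back to pd.concat when the method is missing.

```python
import pandas as pd

# Invented placeholder data; the real men2004 and men2008 have 59 and 62 rows.
men2004 = pd.DataFrame({"Athlete": ["Athlete A", "Athlete B", "Athlete C"],
                        "Medals": [3, 2, 1]})
men2008 = pd.DataFrame({"Athlete": ["Athlete A", "Athlete D", "Athlete E", "Athlete F"],
                        "Medals": [4, 2, 1, 1]})


def append_rows(df, other, ignore_index=False):
    """Hypothetical helper: return a new frame with the rows of `other` after `df`."""
    if hasattr(df, "append"):
        # pandas before 2.0: the method used in the video
        return df.append(other, ignore_index=ignore_index)
    # pandas 2.0 and later: DataFrame.append no longer exists
    return pd.concat([df, other], ignore_index=ignore_index)


kept = append_rows(men2004, men2008)
print(list(kept.index))   # [0, 1, 2, 0, 1, 2, 3]  original labels survive

fresh = append_rows(men2004, men2008, ignore_index=True)
print(list(fresh.index))  # [0, 1, 2, 3, 4, 5, 6]  brand new RangeIndex
```

In both cases men2004 itself is left unchanged, because a new data frame is returned.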
So the index now runs from 0 to 120, and we have successfully combined both data frames vertically. However, one pitfall is that it is quite hard to identify, for each row, which data frame the row originally came from. If we want to split the combined data frame back into the original data frames, that could be quite hard. This is where the concat method comes into play. It is the second alternative for combining two data frames vertically: pd.concat. This is a top-level pandas function, called directly on pd. Let's check the docstring: the concat method concatenates pandas objects along a particular axis. Here we can define whether we want to combine the data frames vertically or horizontally by defining an axis. If we pass 0 or "index" to the axis parameter, we combine the data frames along the index, so vertically. If we pass 1 or "columns", we combine the data frames horizontally. The default setting is 0, so vertically. Then there is the objs parameter, to which we have to pass a list with the data frames that we want to concatenate, in this case men2004 and men2008. But we could also concatenate more than two data frames, so 5, 10, or 100. Here too we have the parameter ignore_index, and the default setting is False. Let's try this out. Again we start with the 2004 rows: by definition, the leftmost data frame that we pass to the objs parameter is at the top, followed by the 2008 rows, and we again have 121 rows. Again we can see on the left-hand side that we have the original index labels, and by passing True to the ignore_index parameter we can change this and create a brand new RangeIndex. So this is the same as with the append method.
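A minimal sketch of the pd.concat parameters just described (objs, axis, ignore_index), again on invented placeholder frames rather than the course data:

```python
import pandas as pd

# Invented placeholder data standing in for men2004 and men2008.
men2004 = pd.DataFrame({"Athlete": ["Athlete A", "Athlete B", "Athlete C"],
                        "Medals": [3, 2, 1]})
men2008 = pd.DataFrame({"Athlete": ["Athlete A", "Athlete D", "Athlete E", "Athlete F"],
                        "Medals": [4, 2, 1, 1]})

# axis=0 (the default, also spelled "index"): stack vertically, first frame on top.
stacked = pd.concat(objs=[men2004, men2008], axis=0)
print(stacked.shape)           # (7, 2)

# ignore_index=True discards the original labels and builds a new RangeIndex.
renumbered = pd.concat(objs=[men2004, men2008], ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2, 3, 4, 5, 6]

# axis=1 (also spelled "columns"): place the frames side by side, aligned on the index.
side_by_side = pd.concat(objs=[men2004, men2008], axis=1)
print(side_by_side.shape)      # (4, 4)
```

With axis=1 the shorter frame is padded with missing values, because rows are matched by index label.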
So now we have a RangeIndex from 0 to 120. Before, with the append method, we had the problem that it was quite hard to identify, for each row, which data frame the row was coming from. With the pd.concat method we can change this. Here we have the parameter keys, and with it we can create a MultiIndex on the left-hand side. We are adding an outer index level, where we can define index labels for our 2004 rows and for our 2008 rows. Here we have two data frames, so to the keys parameter we can pass a list with the two labels that we want to have for the rows of 2004 and for the rows of 2008. Let's create a list with the labels 2004 and 2008. But before we run this, let's set ignore_index to False again, since we don't need it here. Now we have our MultiIndex on the left-hand side. As the outer index level we have 2004 and 2008: we can see all rows from 2004, followed by our rows from 2008, and in total of course we have 121 rows.

So we have two alternatives to add rows, or to combine two data frames vertically: the append method and the pd.concat method. There are two differences that make the concat method more powerful. First, we can define keys, so that after we have combined the data frames we can still identify which data frame each row came from. The second advantage of the concat method shows when you want to concatenate many data frames, for example 10, 50, or 100. Let's assume we have a data frame for each edition, starting from 1896. In that case it is a lot easier with the concat method.
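The keys parameter can be sketched like this, using invented placeholder frames in place of the course data. The outer index level then records which frame each row came from, so the original frames can be selected again by label.

```python
import pandas as pd

# Invented placeholder data standing in for men2004 and men2008.
men2004 = pd.DataFrame({"Athlete": ["Athlete A", "Athlete B", "Athlete C"],
                        "Medals": [3, 2, 1]})
men2008 = pd.DataFrame({"Athlete": ["Athlete A", "Athlete D", "Athlete E", "Athlete F"],
                        "Medals": [4, 2, 1, 1]})

# keys adds an outer index level: one label per frame, in the order of objs.
combined = pd.concat(objs=[men2004, men2008], keys=[2004, 2008])
print(combined.index.nlevels)  # 2

# Select the rows of one edition through the outer level.
rows_2008 = combined.loc[2008]
print(len(rows_2008))          # 4
```

Note that ignore_index must stay at False here; passing True would throw the key labels away again.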
Here we can pass a list with all data frames, and the concat method automatically appends all of them. With the append method we can also pass a list of more than one data frame, so many data frames. But first we have to select one data frame, for example the 2004 frame. On this data frame we apply the append method, and then we pass a list of all other data frames that we want to combine, excluding the very first data frame, because we already have it here. This is a bit error-prone, and you wouldn't do this if you have many data frames. So let's go back to our concat method, and there is actually a third advantage. With the concat method we can combine data frames not only vertically but also horizontally, with the parameter axis. So as a summary, the concat method is the more general method with more functionality, and I always use only the concat method.

All right, there is one more parameter. Before, we created the MultiIndex with the outer index level, where we have the edition for each row, so 2004 or 2008. We can also give names to the levels, so names for the outer index level and the inner index level. That is the parameter names. Here we want to name the outer index level Year. We can do this by passing "Year" within a list. Let's run the cell, and now we have the name Year for our outer index level. Once we are happy with our concatenated data frame, we can also save it, for example in the variable men0408, and of course we can then inspect our saved data frame. It might be the case that we are not really happy with the MultiIndex.
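A short sketch of passing a whole collection of frames together with keys and names. The frames and the dictionary frames_by_year are invented placeholders; with real data the dictionary could hold one frame per edition.

```python
import pandas as pd

# Invented placeholder data standing in for men2004 and men2008.
men2004 = pd.DataFrame({"Athlete": ["Athlete A", "Athlete B", "Athlete C"],
                        "Medals": [3, 2, 1]})
men2008 = pd.DataFrame({"Athlete": ["Athlete A", "Athlete D", "Athlete E", "Athlete F"],
                        "Medals": [4, 2, 1, 1]})

# Hypothetical container: any number of editions could be collected here.
frames_by_year = {2004: men2004, 2008: men2008}

# One call handles all frames; names labels the outer index level.
men0408 = pd.concat(
    objs=list(frames_by_year.values()),
    keys=list(frames_by_year.keys()),
    names=["Year"],
)
print(men0408.index.names[0])  # Year
print(len(men0408))            # 7
```

Only the outer level is named here, as in the video; a second entry in the names list would name the inner level as well.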
So we can also reset the index with the reset_index method. Here we have our data frame again, and with the reset_index method we can transform our MultiIndex into columns and create a brand new RangeIndex. Let's do this. Now we have the column Year and the former inner MultiIndex level as the column level_1, and we have our new RangeIndex. We can then also drop the column level_1, and this could now be our final data frame. At the top we have all athletes, or rows, from the 2004 edition, and at the end we have all rows from the 2008 edition, or the 2008 data frame, in total 121 rows. All right, we are finished with this video, and I hope to see you in the next one. Bye.
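For reference, the final reset_index step from this video can be sketched as follows, again on invented placeholder frames rather than the course data:

```python
import pandas as pd

# Invented placeholder data standing in for men2004 and men2008.
men2004 = pd.DataFrame({"Athlete": ["Athlete A", "Athlete B", "Athlete C"],
                        "Medals": [3, 2, 1]})
men2008 = pd.DataFrame({"Athlete": ["Athlete A", "Athlete D", "Athlete E", "Athlete F"],
                        "Medals": [4, 2, 1, 1]})

men0408 = pd.concat(objs=[men2004, men2008], keys=[2004, 2008], names=["Year"])

# Turn both index levels into columns; the unnamed inner level becomes "level_1".
flat = men0408.reset_index()
print(list(flat.columns))   # ['Year', 'level_1', 'Athlete', 'Medals']

# Drop the old inner labels and keep Year as an ordinary column.
final = flat.drop(columns="level_1")
print(list(final.columns))  # ['Year', 'Athlete', 'Medals']
```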