Joining DataFrames

In Chapter 4, Computing Foundations – Databases, we discussed merging data from two database tables using the JOIN operation. To use a JOIN operation, you need to specify the names of the two tables, along with the type of JOIN (left, right, outer, or inner) and the columns on which to join:

SELECT *
FROM left_table OUTER JOIN right_table
ON left_table.index = right_table.index;

In pandas, you can accomplish table joins using the merge() or join() functions. By default, the join() function joins data on the index of the tables; however, other columns can be used by specifying the on parameter. If column names are overlapping in the two tables being joined, you will need to specify a rsuffix or lsuffix argument that renames the columns so they no longer have identical names:

df_join_df2 = df.join(df2, how='outer', rsuffix='r')
print(df_join_df2)

The output is as follows (note the NaN values in Row 3, a row that was not present in df):

  col3 new_col1  new_col2  new_col3  new_col4  new_col5  new_col6 col3r  
0    x                0.0       5.0       5.0       7.0      10.0     a   
1    y                0.0       7.0       7.0       8.0      11.0     b   
2    z                0.0       9.0       9.0       9.0      12.0     c   
3  NaN      NaN       NaN       NaN       NaN       NaN       NaN     d   

  new_col1r  new_col2r  new_col3r  new_col4r  new_col5r  new_col6r  
0                    0         11         17        7.5         13  
1                    0         13         19        8.5         14  
2                    0         15         21        9.5         15  
3                    0         17         23       10.5         16 

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset