In Chapter 4, Computing Foundations – Databases, we discussed merging data from two database tables using the JOIN operation. To use a JOIN operation, you need to specify the names of the two tables, along with the type of JOIN (left, right, outer, or inner) and the columns on which to join:
SELECT *
FROM left_table OUTER JOIN right_table
ON left_table.index = right_table.index;
In pandas, you can accomplish table joins using the merge() or join() functions. By default, the join() function joins data on the index of the tables; however, other columns can be used by specifying the on parameter. If column names are overlapping in the two tables being joined, you will need to specify a rsuffix or lsuffix argument that renames the columns so they no longer have identical names:
df_join_df2 = df.join(df2, how='outer', rsuffix='r')
print(df_join_df2)
The output is as follows (note the NaN values in Row 3, a row that was not present in df):
col3 new_col1 new_col2 new_col3 new_col4 new_col5 new_col6 col3r 0 x 0.0 5.0 5.0 7.0 10.0 a 1 y 0.0 7.0 7.0 8.0 11.0 b 2 z 0.0 9.0 9.0 9.0 12.0 c 3 NaN NaN NaN NaN NaN NaN NaN d new_col1r new_col2r new_col3r new_col4r new_col5r new_col6r 0 0 11 17 7.5 13 1 0 13 19 8.5 14 2 0 15 21 9.5 15 3 0 17 23 10.5 16