The concat function is used to join multiple pandas data structures along a specified axis and possibly perform union or intersection operations along other axes. The following command explains the concat function:
concat(objs, axis=0, , join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False)
The elements of the concat function can be summarized as follows:
- The objs function: A list or dictionary of Series, DataFrame, or Panel objects to be concatenated.
- The axis function: The axis along which the concatenation should be performed. 0 is the default value.
- The join function: The type of join to perform when handling indexes on other axes. The 'outer' function is the default.
- The join_axes function: This is used to specify exact indexes for the remaining indexes instead of doing an outer/inner join.
- The keys function: This specifies a list of keys to be used to construct a MultiIndex.
For an explanation of the remaining options, please refer to the documentation at http://pandas.pydata.org/pandas-docs/stable/merging.html.
Here is an illustration of the workings of concat using our stock price examples from earlier chapters:
In [53]: stockDataDF=pd.read_csv('./tech_stockprices.csv').set_index(
['Symbol']);stockDataDF Out[53]: Closing price EPS Shares Outstanding(M) P/E Market Cap(B) Beta Symbol AAPL 501.53 40.32 892.45 12.44 447.59 0.84 AMZN 346.15 0.59 459.00 589.80 158.88 0.52 FB 61.48 0.59 2450.00 104.93 150.92 NaN GOOG 1133.43 36.05 335.83 31.44 380.64 0.87 TWTR 65.25 -0.30 555.20 NaN 36.23 NaN YHOO 34.90 1.27 1010.00 27.48 35.36 0.66
We now take various slices of the data:
In [83]: A=stockDataDF.ix[:4, ['Closing price', 'EPS']]; A Out[83]: Closing price EPS Symbol AAPL 501.53 40.32 AMZN 346.15 0.59 FB 61.48 0.59 GOOG 1133.43 36.05 In [84]: B=stockDataDF.ix[2:-2, ['P/E']];B Out[84]: P/E Symbol FB 104.93 GOOG 31.44 In [85]: C=stockDataDF.ix[1:5, ['Market Cap(B)']];C Out[85]: Market Cap(B) Symbol AMZN 158.88 FB 150.92 GOOG 380.64 TWTR 36.23
Here, we perform concatenation by specifying an outer join, which concatenates and performs a union on all three DataFrames and includes entries that do not have values for all the columns by inserting NaN for such columns:
In [86]: pd.concat([A,B,C],axis=1) # outer join Out[86]: Closing price EPS P/E Market Cap(B) AAPL 501.53 40.32 NaN NaN AMZN 346.15 0.59 NaN 158.88 FB 61.48 0.59 104.93 150.92 GOOG 1133.43 36.05 31.44 380.64 TWTR NaN NaN NaN 36.23
We can also specify an inner join that performs concatenation but only includes rows that contain values for all the columns in the final DataFrame by throwing out rows with missing columns; that is, it takes the intersection:
In [87]: pd.concat([A,B,C],axis=1, join='inner') # Inner join Out[87]: Closing price EPS P/E Market Cap(B) Symbol FB 61.48 0.59 104.93 150.92 GOOG 1133.43 36.05 31.44 380.64
The third case enables us to use the specific index from the original DataFrame to join on:
In [102]: pd.concat([A,B,C], axis=1, join_axes=[stockDataDF.index]) Out[102]: Closing price EPS P/E Market Cap(B) Symbol AAPL 501.53 40.32 NaN NaN AMZN 346.15 0.59 NaN 158.88 FB 61.48 0.59 104.93 150.92 GOOG 1133.43 36.05 31.44 380.64 TWTR NaN NaN NaN 36.23 YHOO NaN NaN NaN NaN
In this last case, we see that the YHOO row was included even though it wasn't contained in any of the slices that were concatenated. In this case, however, the values for all the columns are NaN. Here is another illustration of concat, but this time, it is on random statistical distributions. Note that in the absence of an axis argument, the default axis of concatenation is 0:
In[135]: np.random.seed(100) normDF=pd.DataFrame(np.random.randn(3,4));normDF Out[135]: 0 1 2 3 0 -1.749765 0.342680 1.153036 -0.252436 1 0.981321 0.514219 0.221180 -1.070043 2 -0.189496 0.255001 -0.458027 0.435163 In [136]: binomDF=pd.DataFrame(np.random.binomial(100,0.5,(3,4)));binomDF Out[136]: 0 1 2 3 0 57 50 57 50 1 48 56 49 43 2 40 47 49 55 In [137]: poissonDF=pd.DataFrame(np.random.poisson(100,(3,4)));poissonDF Out[137]: 0 1 2 3 0 93 96 96 89 1 76 96 104 103 2 96 93 107 84 In [138]: rand_distribs=[normDF,binomDF,poissonDF] In [140]: rand_distribsDF=pd.concat(rand_distribs,keys=['Normal', 'Binomial', 'Poisson']);rand_distribsDF Out[140]: 0 1 2 3 Normal 0 -1.749765 0.342680 1.153036 -0.252436 1 0.981321 0.514219 0.221180 -1.070043 2 -0.189496 0.255001 -0.458027 0.435163 Binomial 0 57.00 50.00 57.00 50.00 1 48.00 56.00 49.00 43.00 2 40.00 47.00 49.00 55.00 Poisson 0 93.00 96.00 96.00 89.00 1 76.00 96.00 104.00 103.00 2 96.00 93.00 107.00 84.00