The concat function

The concat function is used to join multiple pandas data structures along a specified axis and possibly perform union or intersection operations along other axes. The following command explains the concat function:

concat(objs, axis=0, , join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False)

The elements of the concat function can be summarized as follows:

  • The objs function: A list or dictionary of Series, DataFrame, or Panel objects to be concatenated.
  • The axis function: The axis along which the concatenation should be performed. 0 is the default value.
  • The join function: The type of join to perform when handling indexes on other axes. The 'outer' function is the default.
  • The join_axes function: This is used to specify exact indexes for the remaining indexes instead of doing an outer/inner join.
  • The keys function: This specifies a list of keys to be used to construct a MultiIndex.

For an explanation of the remaining options, please refer to the documentation at http://pandas.pydata.org/pandas-docs/stable/merging.html.

Here is an illustration of the workings of concat using our stock price examples from earlier chapters:

    In [53]: stockDataDF=pd.read_csv('./tech_stockprices.csv').set_index(
['Symbol']);stockDataDF
Out[53]: Closing price EPS Shares Outstanding(M) P/E Market Cap(B) Beta Symbol AAPL 501.53 40.32 892.45 12.44 447.59 0.84 AMZN 346.15 0.59 459.00 589.80 158.88 0.52 FB 61.48 0.59 2450.00 104.93 150.92 NaN GOOG 1133.43 36.05 335.83 31.44 380.64 0.87 TWTR 65.25 -0.30 555.20 NaN 36.23 NaN YHOO 34.90 1.27 1010.00 27.48 35.36 0.66

We now take various slices of the data:

    In [83]: A=stockDataDF.ix[:4, ['Closing price', 'EPS']]; A
    Out[83]:  Closing price  EPS
     Symbol
      AAPL     501.53      40.32
      AMZN     346.15     0.59
      FB      61.48     0.59
       GOOG    1133.43    36.05
    
    In [84]: B=stockDataDF.ix[2:-2, ['P/E']];B
    Out[84]:         P/E
            Symbol
           FB   104.93
           GOOG   31.44
    
    In [85]: C=stockDataDF.ix[1:5, ['Market Cap(B)']];C
    Out[85]:         Market Cap(B)
            Symbol
           AMZN   158.88
           FB   150.92
           GOOG   380.64
           TWTR   36.23

Here, we perform concatenation by specifying an outer join, which concatenates and performs a union on all three DataFrames and includes entries that do not have values for all the columns by inserting NaN for such columns:

    In [86]: pd.concat([A,B,C],axis=1) # outer join
    Out[86]:  Closing price  EPS    P/E   Market Cap(B)
      AAPL   501.53     40.32  NaN   NaN
      AMZN   346.15     0.59   NaN   158.88
       FB   61.48           0.59  104.93 150.92
       GOOG   1133.43       36.05  31.44 380.64
       TWTR   NaN            NaN    NaN    36.23

We can also specify an inner join that performs concatenation but only includes rows that contain values for all the columns in the final DataFrame by throwing out rows with missing columns; that is, it takes the intersection:

    In [87]: pd.concat([A,B,C],axis=1, join='inner') # Inner join
    Out[87]:        Closing price  EPS  P/E   Market Cap(B)
             Symbol
             FB      61.48    0.59 104.93  150.92
          GOOG    1133.43   36.05   31.44   380.64
  

The third case enables us to use the specific index from the original DataFrame to join on:

    In [102]: pd.concat([A,B,C], axis=1, join_axes=[stockDataDF.index])
    Out[102]:       Closing price  EPS    P/E   Market Cap(B)
          Symbol
             AAPL   501.53     40.32  NaN   NaN
            AMZN   346.15     0.59   NaN   158.88
             FB   61.48           0.59  104.93 150.92
             GOOG   1133.43       36.05  31.44 380.64
             TWTR   NaN            NaN    NaN    36.23
            YHOO   NaN            NaN    NaN    NaN
  

In this last case, we see that the YHOO row was included even though it wasn't contained in any of the slices that were concatenated. In this case, however, the values for all the columns are NaN. Here is another illustration of concat, but this time, it is on random statistical distributions. Note that in the absence of an axis argument, the default axis of concatenation is 0:

    In[135]: np.random.seed(100)
            normDF=pd.DataFrame(np.random.randn(3,4));normDF
    Out[135]:    0    1      2    3
      0  -1.749765  0.342680  1.153036  -0.252436
      1   0.981321  0.514219  0.221180  -1.070043
      2  -0.189496  0.255001 -0.458027   0.435163
    
    In [136]: binomDF=pd.DataFrame(np.random.binomial(100,0.5,(3,4)));binomDF
    Out[136]:  0  1  2  3
      0  57  50  57     50
      1  48  56  49     43
      2  40  47  49     55
    
    In [137]: poissonDF=pd.DataFrame(np.random.poisson(100,(3,4)));poissonDF
    Out[137]:  0  1  2  3
       0  93  96  96  89
       1  76  96  104  103
       2  96  93  107   84
    
    In [138]: rand_distribs=[normDF,binomDF,poissonDF]
    In [140]: rand_distribsDF=pd.concat(rand_distribs,keys=['Normal', 'Binomial', 'Poisson']);rand_distribsDF
    Out[140]:         0        1       2          3
      Normal     0  -1.749765   0.342680  1.153036  -0.252436
           1   0.981321   0.514219  0.221180  -1.070043
           2  -0.189496   0.255001 -0.458027   0.435163
      Binomial 0   57.00       50.00     57.00      50.00
           1   48.00       56.00     49.00      43.00
           2   40.00       47.00     49.00      55.00
      Poisson  0   93.00       96.00     96.00      89.00
           1   76.00       96.00    104.00     103.00
           2   96.00       93.00    107.00      84.00
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset