Using the where() method

The where() method ensures that the result of Boolean filtering has the same shape as the original data: entries that fail the condition are replaced with NaN instead of being dropped. First, we set the random number generator seed to 100 so that you can reproduce the same values:

    In [379]: np.random.seed(100)
              normvals = pd.Series([np.random.normal() for i in np.arange(10)])
              normvals
    Out[379]: 0   -1.749765
              1    0.342680
              2    1.153036
              3   -0.252436
              4    0.981321
              5    0.514219
              6    0.221180
              7   -1.070043
              8   -0.189496
              9    0.255001
              dtype: float64

    In [381]: normvals[normvals > 0]
    Out[381]: 1    0.342680
              2    1.153036
              4    0.981321
              5    0.514219
              6    0.221180
              9    0.255001
              dtype: float64

    In [382]: normvals.where(normvals > 0)
    Out[382]: 0         NaN
              1    0.342680
              2    1.153036
              3         NaN
              4    0.981321
              5    0.514219
              6    0.221180
              7         NaN
              8         NaN
              9    0.255001
              dtype: float64
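
The difference between the two results above can be checked directly. The following is a minimal sketch, not part of the original session: it assumes the ten values printed in Out[379] and hard-codes them so that it does not depend on the random number generator. The names filtered and kept are illustrative only.

```python
import pandas as pd

# Values copied from the Out[379] listing (assumption: rounded to 6 decimals).
normvals = pd.Series([-1.749765, 0.342680, 1.153036, -0.252436, 0.981321,
                      0.514219, 0.221180, -1.070043, -0.189496, 0.255001])

# Plain Boolean indexing drops the rows that fail the condition.
filtered = normvals[normvals > 0]

# where() keeps every row and puts NaN where the condition is False.
kept = normvals.where(normvals > 0)

print(len(filtered), len(kept))            # 6 10
print(kept.index.equals(normvals.index))   # True: same index as the original
print(kept.dropna().equals(filtered))      # True: same surviving values
```

Dropping the NaN entries from the where() result recovers the Boolean-indexed Series, so the two differ only in whether the original shape is preserved.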
  

For this kind of filtering, where() seems to be useful mainly for a Series, because indexing a DataFrame with a Boolean DataFrame already preserves the shape:

    In [393]: np.random.seed(100)
              normDF = pd.DataFrame([[round(np.random.normal(), 3) for i in np.arange(5)] for j in range(3)],
                                    columns=['0', '30', '60', '90', '120'])
              normDF
    Out[393]:         0      30      60      90     120
              0  -1.750   0.343   1.153  -0.252   0.981
              1   0.514   0.221  -1.070  -0.189   0.255
              2  -0.458   0.435  -0.584   0.817   0.673
              3 rows × 5 columns

    In [394]: normDF[normDF > 0]
    Out[394]:         0      30      60      90     120
              0     NaN   0.343   1.153     NaN   0.981
              1   0.514   0.221     NaN     NaN   0.255
              2     NaN   0.435     NaN   0.817   0.673
              3 rows × 5 columns

    In [395]: normDF.where(normDF > 0)
    Out[395]:         0      30      60      90     120
              0     NaN   0.343   1.153     NaN   0.981
              1   0.514   0.221     NaN     NaN   0.255
              2     NaN   0.435     NaN   0.817   0.673
              3 rows × 5 columns
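
That the two DataFrame expressions give the same result can also be confirmed programmatically rather than by eye. This is a small sketch under the assumption that normDF holds the rounded values shown in Out[393], which are hard-coded here; by_indexing and by_where are illustrative names.

```python
import pandas as pd

# Values copied from the Out[393] listing.
normDF = pd.DataFrame(
    [[-1.750, 0.343, 1.153, -0.252, 0.981],
     [0.514, 0.221, -1.070, -0.189, 0.255],
     [-0.458, 0.435, -0.584, 0.817, 0.673]],
    columns=['0', '30', '60', '90', '120'])

by_indexing = normDF[normDF > 0]      # Boolean DataFrame used as an indexer
by_where = normDF.where(normDF > 0)   # explicit where() call

# equals() treats NaN values in matching positions as equal.
print(by_indexing.equals(by_where))    # True
print(by_where.shape == normDF.shape)  # True
```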
  

The inverse of the where() method is mask(), which replaces the entries where the condition is True with NaN and keeps the rest:

    In [396]: normDF.mask(normDF > 0)
    Out[396]:         0      30      60      90     120
              0  -1.750     NaN     NaN  -0.252     NaN
              1     NaN     NaN  -1.070  -0.189     NaN
              2  -0.458     NaN  -0.584     NaN     NaN
              3 rows × 5 columns
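
One way to see what "inverse" means here is that mask() with a condition behaves like where() with the negated condition. The sketch below illustrates this, again assuming the rounded values from Out[393]; the variable names are illustrative and not from the original session.

```python
import pandas as pd

# Values copied from the Out[393] listing.
normDF = pd.DataFrame(
    [[-1.750, 0.343, 1.153, -0.252, 0.981],
     [0.514, 0.221, -1.070, -0.189, 0.255],
     [-0.458, 0.435, -0.584, 0.817, 0.673]],
    columns=['0', '30', '60', '90', '120'])

cond = normDF > 0

masked = normDF.mask(cond)       # NaN where cond is True
inverted = normDF.where(~cond)   # NaN where the negated cond is False

print(masked.equals(inverted))   # True

# Each cell survives in exactly one of the two results, so filling the
# gaps of one with the other rebuilds the original frame.
rebuilt = normDF.where(cond).fillna(masked)
print(rebuilt.equals(normDF))    # True
```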
  