The where() method ensures that the result of Boolean filtering has the same shape as the original data: entries that fail the condition are replaced with NaN rather than dropped. First, we set the random number generator seed to 100 so that the same values can be reproduced, as shown next:
In [379]: np.random.seed(100)
          normvals=pd.Series([np.random.normal() for i in np.arange(10)])
          normvals
Out[379]: 0   -1.749765
          1    0.342680
          2    1.153036
          3   -0.252436
          4    0.981321
          5    0.514219
          6    0.221180
          7   -1.070043
          8   -0.189496
          9    0.255001
          dtype: float64

In [381]: normvals[normvals>0]
Out[381]: 1    0.342680
          2    1.153036
          4    0.981321
          5    0.514219
          6    0.221180
          9    0.255001
          dtype: float64

In [382]: normvals.where(normvals>0)
Out[382]: 0         NaN
          1    0.342680
          2    1.153036
          3         NaN
          4    0.981321
          5    0.514219
          6    0.221180
          7         NaN
          8         NaN
          9    0.255001
          dtype: float64
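The difference between plain Boolean indexing and where() on a Series can be checked directly. The following is a minimal sketch that hard-codes the values from the listing above instead of regenerating them. The variable names filtered, kept, and filled are illustrative, and the other argument (a replacement value used in place of NaN) is part of pandas but is not used elsewhere in this section:

```python
import pandas as pd

# Values copied from the Out[379] listing, so no random seed is needed.
normvals = pd.Series([-1.749765, 0.342680, 1.153036, -0.252436, 0.981321,
                      0.514219, 0.221180, -1.070043, -0.189496, 0.255001])

# Boolean indexing drops the entries that fail the condition.
filtered = normvals[normvals > 0]

# where() keeps every index label and puts NaN where the condition fails.
kept = normvals.where(normvals > 0)

# Assumption for illustration: use 0.0 instead of NaN as the replacement.
filled = normvals.where(normvals > 0, other=0.0)

print(len(normvals), len(filtered), len(kept))
print(int(kept.isna().sum()))
```

The filtered Series is shorter than the original, while the where() results keep the original length and index.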
For this shape-preserving behavior, where() seems to be useful mainly in the case of a Series, because indexing a DataFrame with a Boolean DataFrame already gives the same result for free:
In [393]: np.random.seed(100)
          normDF=pd.DataFrame([[round(np.random.normal(),3) for i in np.arange(5)] for j in range(3)],
                              columns=['0','30','60','90','120'])
          normDF
Out[393]:        0     30     60     90    120
          0 -1.750  0.343  1.153 -0.252  0.981
          1  0.514  0.221 -1.070 -0.189  0.255
          2 -0.458  0.435 -0.584  0.817  0.673
          3 rows × 5 columns

In [394]: normDF[normDF>0]
Out[394]:        0     30     60     90    120
          0    NaN  0.343  1.153    NaN  0.981
          1  0.514  0.221    NaN    NaN  0.255
          2    NaN  0.435    NaN  0.817  0.673
          3 rows × 5 columns

In [395]: normDF.where(normDF>0)
Out[395]:        0     30     60     90    120
          0    NaN  0.343  1.153    NaN  0.981
          1  0.514  0.221    NaN    NaN  0.255
          2    NaN  0.435    NaN  0.817  0.673
          3 rows × 5 columns
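That the two DataFrame expressions agree can be verified rather than compared by eye. This sketch rebuilds the frame from the rounded values printed above (hard-coded here as an assumption, to avoid depending on the random number generator) and compares the two results with DataFrame.equals, which treats NaN values in the same positions as equal:

```python
import pandas as pd

# Rounded values copied from the Out[393] listing.
normDF = pd.DataFrame(
    [[-1.750, 0.343, 1.153, -0.252, 0.981],
     [0.514, 0.221, -1.070, -0.189, 0.255],
     [-0.458, 0.435, -0.584, 0.817, 0.673]],
    columns=['0', '30', '60', '90', '120'])

by_indexing = normDF[normDF > 0]     # Boolean DataFrame used as an indexer
by_where = normDF.where(normDF > 0)  # explicit where() call

# Both keep the 3 x 5 shape and hold NaN in the same cells.
same = by_indexing.equals(by_where)
print(by_indexing.shape, by_where.shape, same)
```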
The inverse operation of the where method is mask, which replaces the values that satisfy the condition with NaN and keeps the rest:
In [396]: normDF.mask(normDF>0)
Out[396]:        0     30     60     90    120
          0 -1.750    NaN    NaN -0.252    NaN
          1    NaN    NaN -1.070 -0.189    NaN
          2 -0.458    NaN -0.584    NaN    NaN
          3 rows × 5 columns
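One way to read "inverse" here is that mask with a condition should match where with the negated condition. The sketch below checks this on the same data, again hard-coding the rounded values from the earlier listing. The names cond, masked, and where_negated are illustrative only:

```python
import pandas as pd

# Rounded values copied from the Out[393] listing.
normDF = pd.DataFrame(
    [[-1.750, 0.343, 1.153, -0.252, 0.981],
     [0.514, 0.221, -1.070, -0.189, 0.255],
     [-0.458, 0.435, -0.584, 0.817, 0.673]],
    columns=['0', '30', '60', '90', '120'])

cond = normDF > 0

masked = normDF.mask(cond)            # NaN where cond is True
where_negated = normDF.where(~cond)   # keep values only where cond is False

# Count the surviving (non-NaN) cells under each operation.
kept_by_where = int(normDF.where(cond).notna().sum().sum())
kept_by_mask = int(masked.notna().sum().sum())
print(masked.equals(where_negated), kept_by_where, kept_by_mask)
```

Because no cell is exactly zero, every value survives under exactly one of the two operations, so the two counts add up to the 15 cells of the frame.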