Winsorizing data

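Winsorizing is another technique for dealing with outliers; it is named after Charles Winsor. In effect, winsorization clips outliers to given percentiles, typically in a symmetric fashion: values below the lower percentile are replaced by the value at that percentile, and values above the upper percentile are replaced by the value at that percentile. For instance, we can clip to the 5th and 95th percentiles. SciPy performs this procedure with the winsorize() function in the scipy.stats.mstats module. The data for this recipe is the same as that for the Clipping and filtering outliers recipe.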
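To make the idea concrete, here is a minimal pure-Python sketch of symmetric winsorization. It is not SciPy's implementation: the helper name winsorize_list is hypothetical, and the rule for the cut-off count (k = int(limit * n) values replaced in each tail) is an assumption made for illustration. The k smallest values are raised to the (k + 1)-th smallest value and the k largest values are lowered to the (k + 1)-th largest, so the length and order of the data are preserved.

```python
from typing import List, Sequence


def winsorize_list(values: Sequence[float], limit: float) -> List[float]:
    """Symmetrically winsorize values (hypothetical helper, not SciPy's code).

    limit is the fraction of values replaced in each tail, 0 <= limit < 0.5.
    Assumption: k = int(limit * n) values are replaced per tail.
    """
    if not 0 <= limit < 0.5:
        raise ValueError("limit must be in [0, 0.5)")
    n = len(values)
    k = int(limit * n)  # number of values replaced in each tail
    if k == 0:
        return list(values)
    ordered = sorted(values)
    low = ordered[k]           # (k + 1)-th smallest value
    high = ordered[n - k - 1]  # (k + 1)-th largest value
    # Clip every value into [low, high]; the original order is kept
    return [min(max(v, low), high) for v in values]


data = [2, 4, 3, 100, 5, 1, 6, 7, 8, 9]
print(winsorize_list(data, 0.1))  # → [2, 4, 3, 9, 5, 2, 6, 7, 8, 9]
```

Unlike filtering, winsorizing keeps all ten observations: the outlier 100 is pulled down to 9 and the smallest value, 1, is pulled up to 2.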

How to do it...

Winsorize the data with the following procedure:

  1. The imports are as follows:
    from scipy.stats.mstats import winsorize
    import statsmodels.api as sm
    import seaborn as sns
    import matplotlib.pyplot as plt
    import dautil as dl
    from IPython.display import HTML
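  2. Load and winsorize the data for the effective temperature (the limit is set to 15% in each tail):
    # Download the starsCYG data set from the R package robustbase; cache=True keeps a local copy
    starsCYG = sm.datasets.get_rdataset("starsCYG", "robustbase", cache=True).data
    limit = 0.15
    # Work on a copy so that the original data stays intact
    winsorized_x = starsCYG.copy()
    # A single float limit winsorizes both tails by the same fraction
    winsorized_x['log.Te'] = winsorize(starsCYG['log.Te'], limits=limit)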
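  3. Winsorize the light intensity, and then both variables at once:
    winsorized_y = starsCYG.copy()
    winsorized_y['log.light'] = winsorize(starsCYG['log.light'], limits=limit)
    # apply() calls winsorize() on each column; limits is the (lower, upper) pair of tail fractions
    winsorized_xy = starsCYG.apply(winsorize, limits=[limit, limit])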
  4. Plot the Hertzsprung-Russell diagram with regression lines (not part of the usual astronomical diagram):
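    # context is assumed to be the dautil notebook context defined elsewhere in the notebook (not created in these steps)
    sp = dl.plotting.Subplotter(2, 2, context)
    # Panel 1: the original data
    sp.label()
    sns.regplot(x='log.Te', y='log.light', data=starsCYG, ax=sp.ax)
    
    # Panel 2: only the temperature winsorized
    sp.label(advance=True)
    sns.regplot(x='log.Te', y='log.light', data=winsorized_x, ax=sp.ax)
    
    # Panel 3: only the light intensity winsorized
    sp.label(advance=True)
    sns.regplot(x='log.Te', y='log.light', data=winsorized_y, ax=sp.ax)
    
    # Panel 4: both variables winsorized
    sp.label(advance=True)
    sns.regplot(x='log.Te', y='log.light', data=winsorized_xy, ax=sp.ax)
    plt.tight_layout()
    HTML(dl.report.HTMLBuilder().watermark())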
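The regression lines drawn by regplot() are ordinary least-squares fits, which is why a few extreme points can tilt them. The following self-contained sketch illustrates that effect on made-up toy numbers (not the starsCYG data): it winsorizes only the response, as step 3 does for log.light, and compares the fitted slopes. The helpers winsorize_list and ols_slope are hypothetical, and the per-tail count int(limit * n) is an assumption rather than a statement about SciPy's internals.

```python
from typing import List, Sequence


def winsorize_list(values: Sequence[float], limit: float) -> List[float]:
    """Hypothetical helper: replace int(limit * n) values in each tail."""
    n = len(values)
    k = int(limit * n)  # number of values replaced in each tail
    if k == 0:
        return list(values)
    ordered = sorted(values)
    low, high = ordered[k], ordered[n - k - 1]
    return [min(max(v, low), high) for v in values]


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x: sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Toy data: y equals x except for one extreme response at x = 10
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]

# Winsorize only the response, one value per tail
y_winsorized = winsorize_list(y, 0.1)  # [2, 2, 3, 4, 5, 6, 7, 8, 9, 9]

print(round(ols_slope(x, y), 3))             # → 3.182
print(round(ols_slope(x, y_winsorized), 3))  # → 0.891
```

The raw slope is driven almost entirely by the single extreme response; after winsorizing, the slope is close to the value of 1 that the other nine points follow. Winsorizing both variables, as in the last panel, would apply the same helper to x as well.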

The following screenshot shows the end result (see the winsorising_data.ipynb file in this book's code bundle):


See also
