Chapter 4. Dealing with Data and Numerical Issues

The recipes in this chapter are as follows:

Clipping and filtering outliers
Winsorizing data
Measuring central tendency of noisy data
Normalizing with the Box-Cox transformation
Transforming data with the power ladder
Transforming data with logarithms
Rebinning data
Applying logit() to transform proportions
Fitting a robust linear model
Taking variance into account with weighted least squares
Using arbitrary precision for optimization
Using arbitrary precision for linear algebra

Introduction

In the real world, data rarely matches textbook definitions and examples. We have to deal with issues such as faulty hardware, uncooperative customers, and disgruntled colleagues. It is difficult to predict what kind of issues you will run into, but it is safe to assume that they will be plentiful and challenging. In this chapter, I will sketch some common approaches to dealing with noisy data; these rest more on rules of thumb than on strict science. Luckily, the trial and error part of data analysis is limited.

Most of this chapter is about outlier management. Outliers are values that we consider to be abnormal. They are not the only issue that you will encounter, but they are a sneaky one. Another common issue is missing or invalid values, so I will briefly mention masked arrays and pandas features such as the dropna() function, which I have used throughout this book.
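As a rough illustration of the two tools just mentioned, the sketch below handles the same missing and invalid values once with a NumPy masked array and once with pandas dropna(). The readings and the -999.0 sentinel are made up for this example and do not come from the recipes in this chapter.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings; -999.0 is an assumed sentinel for "invalid".
readings = np.array([1.2, -999.0, 3.4, np.nan, 2.8])

# Masked array: flag the sentinel and the NaN so reductions skip them.
invalid = (readings == -999.0) | np.isnan(readings)
masked = np.ma.masked_where(invalid, readings)
masked_mean = masked.mean()

# pandas: convert the sentinel to NaN, then drop the missing entries.
series = pd.Series(readings).replace(-999.0, np.nan)
clean = series.dropna()
clean_mean = clean.mean()

print(masked.count(), masked_mean)
print(len(clean), clean_mean)
```

Both routes keep the same three valid values. The masked array preserves the original shape and positions, whereas dropna() returns a shorter Series, which is worth keeping in mind when the values must stay aligned with other data.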

I have also written two recipes about using mpmath for arbitrary precision calculations. Because of the performance penalty, I don't recommend using mpmath unless you really have to. Usually we can work around numerical issues, so arbitrary precision libraries are rarely needed.
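To give a sense of what working around a numerical issue can look like, here is a minimal sketch that uses only the standard library. It is my own illustration rather than one of the recipes: it assumes the task is evaluating log(1 + x) for a very small x, where the naive formula loses all precision in double arithmetic but a reformulated function does not.

```python
import math

x = 1e-20  # far below double-precision machine epsilon (about 2.2e-16)

# Naive: 1.0 + x rounds to exactly 1.0, so the logarithm comes out as 0.0.
naive = math.log(1.0 + x)

# Workaround: log1p evaluates log(1 + x) without ever forming 1 + x.
stable = math.log1p(x)

print(naive, stable)
```

In this case ordinary floats are enough once the expression is rewritten, so no arbitrary precision library is needed.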
