Adjusting values – data preprocessing

Raw data collected from a data source usually has its own particularities, such as data range, sampling, and category. Some variables result from direct measurements, while others are summarized or calculated. Preprocessing means adapting these variables' values into a form that neural networks can handle properly.

Regarding weather variables, let's take a look at their range, sampling, and type, shown in the following table:

| Variable         | Unit | Range       | Sampling | Type                                   |
|------------------|------|-------------|----------|----------------------------------------|
| Mean temperature | °C   | 23.86–29.25 | Hourly   | Average of hourly measurements         |
| Precipitation    | mm   | 0–161.20    | Daily    | Accumulation of daily rain             |
| Insolation       | h    | 0–10.40     | Daily    | Count of hours receiving sun radiation |
| Mean humidity    | %    | 65.50–96.00 | Hourly   | Average of hourly measurements         |
| Mean wind speed  | km/h | 0.00–3.27   | Hourly   | Average of hourly measurements         |

Except for insolation and precipitation, the variables are all measured and share the same sampling. If we wanted to use an hourly dataset, for example, we would have to preprocess all the variables so that they use the same sampling rate. Three of the variables are summarized as daily averages of hourly measurements; we could use the hourly measurements directly instead, but their range would be at least as wide, and most likely wider.
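As a rough illustration of bringing variables to the same sampling rate, the following Python sketch averages hourly readings into one value per day. The function name `daily_means` and the sample readings are hypothetical and only serve the example; they are not taken from the dataset described here.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_means(hourly_readings):
    """Aggregate (timestamp, value) pairs into one mean value per calendar day."""
    by_day = defaultdict(list)
    for timestamp, value in hourly_readings:
        by_day[timestamp.date()].append(value)
    # Sort by day so the result is in chronological order.
    return {day: mean(values) for day, values in sorted(by_day.items())}


# Hypothetical temperature readings in °C, made up for illustration.
readings = [
    (datetime(2020, 1, 1, 0), 24.0),
    (datetime(2020, 1, 1, 12), 28.0),
    (datetime(2020, 1, 2, 0), 25.0),
    (datetime(2020, 1, 2, 12), 29.0),
]

daily = daily_means(readings)
for day, value in daily.items():
    print(day, value)
```

Accumulated variables such as precipitation would need a sum rather than a mean, so the aggregation function has to match the variable's type.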

Equalizing data – normalization

Normalization is the process of bringing all the variables into the same data range, usually one with small values, such as 0 to 1 or -1 to 1. This helps keep the values presented to the neural network within the variable zone of activation functions such as the sigmoid or hyperbolic tangent, that is, the region where the function's output still changes noticeably with its input.

Input values that are too high or too low may drive neurons into the flat, saturated regions of the activation function, where the output is pinned near its upper or lower limit. In those regions the derivative is very small, near zero, so the affected neurons barely respond to weight updates during training.
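The effect can be checked numerically. The sketch below, written in Python purely for illustration, evaluates the sigmoid and its derivative at a small input and at two raw values taken from the upper ends of the ranges in the table (29.25 for mean temperature and 161.20 for precipitation). The helper names are assumptions of this example.

```python
import math


def sigmoid(x):
    """Logistic sigmoid; fine for the moderate and positive inputs used here."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Derivative of the sigmoid, expressed through its own output."""
    s = sigmoid(x)
    return s * (1.0 - s)


# A small input versus two raw, unnormalized values from the weather table.
for x in (0.5, 29.25, 161.20):
    print(f"x = {x:>7}: sigmoid = {sigmoid(x):.6f}, "
          f"derivative = {sigmoid_derivative(x):.3e}")
```

For the raw values the derivative is many orders of magnitude smaller than for the small input, which is the saturation that normalization is meant to avoid.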

Normalization should be based on a predefined range for each variable in the dataset. It is performed as follows:

$$
X_{norm} = N_{min} + \frac{X - X_{min}}{X_{max} - X_{min}} \, (N_{max} - N_{min})
$$

Here, $N_{min}$ and $N_{max}$ represent the normalized minimum and maximum limits, respectively; $X_{min}$ and $X_{max}$ denote the minimum and maximum limits of the variable $X$, respectively; $X$ is the original value; and $X_{norm}$ is the normalized value. If we want the normalization to be between 0 and 1, for example, then $N_{min} = 0$ and $N_{max} = 1$, and the equation simplifies as follows:

$$
X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}
$$
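A minimal Python sketch of this min-max normalization follows, using the mean temperature limits from the table. The function name `normalize` and its parameter names are illustrative choices, and the guard against equal limits is an assumption of the example rather than something the text prescribes.

```python
def normalize(x, x_min, x_max, n_min=0.0, n_max=1.0):
    """Map x from [x_min, x_max] to [n_min, n_max] (min-max normalization)."""
    if x_max == x_min:
        # Assumed guard: a constant variable cannot be rescaled.
        raise ValueError("x_max and x_min must differ")
    return n_min + (x - x_min) / (x_max - x_min) * (n_max - n_min)


# Mean temperature limits (°C) from the table above.
t_min, t_max = 23.86, 29.25

print(normalize(23.86, t_min, t_max))             # lower limit, range 0 to 1
print(normalize(29.25, t_min, t_max))             # upper limit, range 0 to 1
print(normalize(26.0, t_min, t_max, -1.0, 1.0))   # same formula, range -1 to 1
```

With the default limits of 0 and 1, the function reduces to the simplified form of the equation.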

Applying the normalization produces a new "normalized" dataset, which is then fed to the neural network. Note that a neural network fed with normalized values will be trained to produce normalized values at its output, so the inverse process (denormalization) becomes necessary as well. It is obtained by solving the normalization equation for $X$:

$$
X = X_{min} + \frac{X_{norm} - N_{min}}{N_{max} - N_{min}} \, (X_{max} - X_{min})
$$

or, for normalization between 0 and 1:

$$
X = X_{min} + X_{norm} \, (X_{max} - X_{min})
$$
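The round trip from original value to normalized value and back can be sketched in Python as below. The function names `normalize` and `denormalize` are hypothetical, the limits are the precipitation limits from the table, and the value 40.3 mm is made up for illustration.

```python
def normalize(x, x_min, x_max, n_min=0.0, n_max=1.0):
    """Map x from [x_min, x_max] to [n_min, n_max]."""
    return n_min + (x - x_min) / (x_max - x_min) * (n_max - n_min)


def denormalize(x_norm, x_min, x_max, n_min=0.0, n_max=1.0):
    """Inverse of normalize: map x_norm from [n_min, n_max] back to [x_min, x_max]."""
    return x_min + (x_norm - n_min) / (n_max - n_min) * (x_max - x_min)


# Precipitation limits (mm) from the table; 40.3 mm is a made-up reading.
p_min, p_max = 0.0, 161.20
original = 40.3

scaled = normalize(original, p_min, p_max)
restored = denormalize(scaled, p_min, p_max)

print(f"original = {original}, scaled = {scaled:.4f}, restored = {restored:.4f}")
```

The same limits must be used in both directions; denormalizing a network output with different limits than those used for the training data would yield values in the wrong scale.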
