Fast Fourier transformations

Another interesting statistic we often want to compute about time series is the Fourier transformation (FT). Without going into the math, a Fourier transformation shows us the amount of oscillation at a particular frequency in a function.

You can imagine this like the tuner on an old FM radio. As you turn the tuner, you search through different frequencies. Every once in a while, you find a frequency that gives you a clear signal of a particular radio station. A Fourier transformation basically scans through the entire frequency spectrum and records at what frequencies there is a strong signal. In terms of a time series, this is useful when trying to find periodic patterns in the data.
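
To make this concrete, here is a minimal sketch, not taken from the chapter, that applies an FFT to a synthetic daily series that repeats once per week; the series and all variable names are invented for illustration. The magnitude of the transformation peaks at a frequency of roughly 1/7, that is, once per week:

import numpy as np
from scipy.fftpack import fft

days = np.arange(550)                         # a hypothetical series of 550 daily measurements
signal = 10 + np.sin(2 * np.pi * days / 7)    # one full cycle every seven days, plus an offset
magnitudes = np.abs(fft(signal))              # magnitude of the complex FFT output

freqs = np.arange(len(signal)) / len(signal)  # frequency of each bin, in cycles per day
half = len(signal) // 2                       # only the first half is informative for real input
peak = freqs[1:half][np.argmax(magnitudes[1:half])]
print(peak)                                   # approximately 0.143, which is about 1/7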

Imagine that we found out that a frequency of one per week gave us a strong pattern. This would mean that knowledge about what the traffic was on the same day one week ago would help our model.

When both the function and its Fourier transform are discrete, which is the case for a series of daily measurements, the transform is called the discrete Fourier transform (DFT). A very fast algorithm for computing the DFT is known as the Fast Fourier Transform (FFT), which has become an important algorithm in scientific computing. The underlying idea was known to the German mathematician Carl Friedrich Gauss as early as 1805, but it was brought to prominence by the American mathematicians James W. Cooley and John Tukey in 1965.

It's beyond the scope of this chapter to explain how and why Fourier transformations work, so in this section we will only give a brief intuition. Imagine our function as a piece of wire. If we wrap this wire around a point so that the number of revolutions around the point matches the frequency of a signal in the function, all of the signal's peaks end up on one side of the point. This means that the center of mass of the wire moves away from the point we wrapped the wire around.

In math, wrapping a function around a point can be achieved by multiplying the function g(n) by $e^{-2\pi i f n}$, where f is the frequency of wrapping, n is the index of the item in the series, and i is the imaginary unit, the square root of -1. Readers who are not familiar with imaginary numbers can think of complex numbers as two-dimensional coordinates, where each number has both a real and an imaginary coordinate.
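
The following tiny sketch, not taken from the chapter, shows what this wrapping factor looks like numerically: each value of $e^{-2\pi i f n}$ is a complex number sitting on the unit circle, and its real and imaginary parts are exactly the two-dimensional coordinates described above. The frequency and sample indices are chosen arbitrarily for illustration:

import numpy as np

f = 1.0 / 7                            # hypothetical wrapping frequency: one revolution per seven samples
n = np.arange(5)                       # a few sample indices
wrap = np.exp(-2j * np.pi * f * n)     # the wrapping factor for each sample

print(np.abs(wrap))                    # all 1.0: every point lies on the unit circle
print(np.real(wrap), np.imag(wrap))    # the two-dimensional coordinates of each point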

To compute the center of mass, we average the coordinates of the points in our discrete function. Dropping the division by the number of points, which only rescales the result and does not change which frequencies stand out, gives the DFT formula:

$$y[f] = \sum_{n=0}^{N-1} x[n] \, e^{-2\pi i \frac{fn}{N}}$$

Here, y[f] is the fth element of the transformed series, and x[n] is the nth element of the input series, x. N is the total number of points in the input series. Note that y[f] will be a complex number with a real and an imaginary part.
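
As a sanity check, here is a minimal sketch, not taken from the chapter, that implements the formula above directly and compares it against SciPy's FFT on a small random series; the variable names are invented for illustration:

import numpy as np
from scipy.fftpack import fft

x = np.random.rand(64)                  # a small example series
N = len(x)
n = np.arange(N)

# y[f] = sum over n of x[n] * exp(-2*pi*i*f*n/N)
y = np.array([np.sum(x * np.exp(-2j * np.pi * f * n / N)) for f in range(N)])

print(np.allclose(y, fft(x)))           # True: the formula and the FFT agree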

To detect frequencies, we are only really interested in the overall magnitude of y[f]. To get this magnitude, we compute the square root of the sum of the squares of the real and imaginary parts. In Python, we do not have to implement any of this math ourselves, as we can use SciPy's fftpack module, which has an FFT function built in.

The next step is to run the following code:

from scipy.fftpack import fft
import numpy as np

data = train.iloc[:,0:-4]   # the daily measurements, without the four global features
fft_complex = fft(data)     # complex FFT of every series, taken along the time axis
fft_mag = [np.sqrt(np.real(x)*np.real(x)+np.imag(x)*np.imag(x)) for x in fft_complex]   # magnitudes

Here, we first extract the time series measurements without the global features from our training set. Then we run the FFT algorithm, before finally computing the magnitudes of the transformation.

After running this code, we have the Fourier transformations of all the time series in our dataset. To get a better insight into their general behavior, we can average them by running the following:

arr = np.array(fft_mag)          # stack the per-series magnitudes into a 2D array
fft_mean = np.mean(arr,axis=0)   # average across series: one mean value per frequency

This first turns the magnitudes into a NumPy array and then computes the mean. We want the mean per frequency, not the mean of all the magnitudes, so we need to specify the axis along which to take the mean.
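
If the axis argument is unfamiliar, this tiny illustration, not taken from the chapter, shows the difference on a made-up array of two series stacked as rows:

import numpy as np

m = np.array([[1, 2, 3],
              [3, 4, 5]])       # two short series stacked as rows
print(np.mean(m, axis=0))       # [2. 3. 4.]: a column-wise mean, one value per position
print(np.mean(m))               # 3.0: a single mean over all values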

In this case, the series are stacked in rows, so taking the mean column-wise (axis zero) results in frequency-wise means. To plot the transformation, we also need a list of the frequencies tested. Each frequency has the form day divided by the number of days in the dataset, so 1/550, 2/550, 3/550, and so on. To create the list, we run:

fft_xvals = [day / fft_mean.shape[0] for day in range(fft_mean.shape[0])]
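
As a hedged cross-check, not part of the chapter's code, NumPy's np.fft.fftfreq produces the same values for the first half of the spectrum, which is the part we are about to keep:

import numpy as np

N = fft_mean.shape[0]
alt_xvals = np.fft.fftfreq(N)                                # bin frequencies in cycles per day
print(np.allclose(alt_xvals[:N // 2], fft_xvals[:N // 2]))   # True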

In this visualization, we only care about the lower half of the frequency range: because the input series are real-valued, the second half of the transformation mirrors the first half and carries no new information, so we can remove it by running:

npts = len(fft_xvals) // 2 + 1   # keep only the first half of the spectrum
fft_mean = fft_mean[:npts]
fft_xvals = fft_xvals[:npts]
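
The claim that the second half is redundant can be verified directly. This small check, not from the chapter, uses an arbitrary real-valued series to show that the spectrum is conjugate-symmetric, so the discarded half contains no additional magnitude information:

import numpy as np
from scipy.fftpack import fft

spectrum = fft(np.random.rand(550))      # FFT of any real-valued series
print(np.allclose(spectrum[1:], np.conj(spectrum[1:][::-1])))   # True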

Finally, we can plot our transformation:

fig, ax = plt.subplots(figsize=(10, 7))
ax.plot(fft_xvals[1:],fft_mean[1:])         # skip the zero-frequency (DC) component
plt.axvline(x=1./7,color='red',alpha=0.3)   # once per week
plt.axvline(x=2./7,color='red',alpha=0.3)   # twice per week
plt.axvline(x=3./7,color='red',alpha=0.3)   # three times per week

Plotting the transformation produces a chart similar to the one shown here:

Fourier transformation of Wikipedia access statistics, with spikes marked by vertical lines

As you can see in the chart we produced, there are spikes at roughly 1/7 (0.14), 2/7 (0.28), and 3/7 (0.42). As a week has seven days, that is a frequency of one time per week, two times per week, and three times per week. In other words, page statistics repeat themselves (approximately) every week, so that, for example, access on one Saturday correlates with access on the previous Saturday.
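
If we prefer not to read the spikes off the chart by eye, a peak finder can locate them programmatically. The following is only a hedged sketch, not part of the chapter's code: it uses scipy.signal.find_peaks, and the prominence threshold is a guess that may need tuning for the actual data:

from scipy.signal import find_peaks

# Look for spikes that stand out clearly from the surrounding spectrum
peaks, _ = find_peaks(fft_mean, prominence=fft_mean.mean())
print([round(fft_xvals[p], 3) for p in peaks])   # ideally includes the weekly harmonics near 0.143, 0.286, and 0.429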
