Plotting a moving average

Look back at each of our charts. We can see that stocks tend to move up and down with jagged shifts. These peaks and valleys represent the day-to-day trading noise. A moving average is a way of looking at the value of a stock without the "ugly" jaggedness of the true price. If we computed the average of the entire past year's data, we would get a single value, which wouldn't be very useful. If we instead sectioned off the first, say, 20 values into a window and averaged them, we would be representing those 20 values as a single data value. We could then shift our window forward by one value and average the next 20 values. The result wouldn't be much different from the first, but the overall trend would be maintained.
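
The windowing idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the function used in the rest of this chapter: mean, slidingWindows, and windowAverages are hypothetical names, the window size is assumed to be positive, and a list shorter than the window simply produces no averages.

```haskell
import Data.List (tails)

-- Hypothetical helper: arithmetic mean of a non-empty list.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Every full window of size n, shifting forward one element at a time.
-- Assumes n is positive; shorter leftovers at the end are dropped.
slidingWindows :: Int -> [a] -> [[a]]
slidingWindows n xs =
  takeWhile ((== n) . length) (map (take n) (tails xs))

-- Average each window to obtain the smoothed series.
windowAverages :: Int -> [Double] -> [Double]
windowAverages n = map mean . slidingWindows n
```

For example, windowAverages 3 [1, 2, 3, 4, 5] evaluates to [2.0, 3.0, 4.0]: five values with a window of three give three averages.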

To create a moving average, we will use the average function that was created at the beginning of Chapter 2, Getting Our Feet Wet. You should find that function in the LearningDataAnalysis02.hs file. The new function, movingAverage, will reside in our LearningDataAnalysis04.hs file.

The movingAverage function takes a list of Double values and a window size (represented by an integer). It uses recursion to march through the list: it computes the average of the first window values of the list and, if values contains more than window elements, passes the tail of values to itself for more processing. Once window or fewer elements remain, it averages them and stops.

-- Requires genericLength and genericTake from Data.List.
movingAverage :: [Double] -> Integer -> [Double]
movingAverage values window
  | window >= genericLength values = [ average values ]
  | otherwise =
      average (genericTake window values)
        : movingAverage (tail values) window

Good. 2D plots require an x-value and a y-value for each point, so we need to assign an x-value to each of these y-values. The following function pairs every moving average with an x-value, starting at an index equal to the window size (call this function with a window size of 20 and the first index will be at position 20):

applyMovingAverageToData ::
  [(Double, Double)] -> Integer ->  [(Double, Double)]
applyMovingAverageToData dataset window =
  zip [fromIntegral window..] $ movingAverage
    (map snd (reverse dataset))
    window

This function is relatively straightforward except for its use of Haskell's lazy evaluation. Note that we have an infinite list at [fromIntegral window..]. We are creating a range of values starting with the value window, but we never told Haskell where to end. The list is conceptually infinite, but it doesn't use infinite memory: zip stops as soon as the list produced by the movingAverage function runs out, so only that many indices are ever generated.

This is Haskell's beautiful (and powerful) trait of being lazy; Haskell will only do the work that is necessary to get the job done, and the job is done when the movingAverage function stops.
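
The behavior of zip with an unbounded range can be seen in isolation. The following is a minimal sketch; indexFrom is a hypothetical helper introduced only for this illustration.

```haskell
-- Pair each value with an index that counts up from `start`.
-- [start ..] has no upper bound; zip stops at the end of the shorter
-- list, so only as many indices as there are values are ever produced.
indexFrom :: Double -> [Double] -> [(Double, Double)]
indexFrom start ys = zip [start ..] ys
```

For example, indexFrom 20 [0.5, 0.7, 0.6] evaluates to [(20.0,0.5),(21.0,0.7),(22.0,0.6)], even though [20 ..] on its own never ends.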

We are going to apply this function to our Apple percent change data from the past year using a 20-day moving average:

> aapl <- pullStockClosingPrices "aapl.sql" "aapl"
> let aapl252 = take 252 aapl
> let aapl252pc = applyPercentChangeToData aapl252
> let aapl252ma20 = applyMovingAverageToData aapl252pc 20

We will be plotting the data with the color red and the 20-day moving average with the color black:

> plot (PNG "aapl_20dayma.png") [Data2D [Title "AAPL - One Year, % Change", Style Lines, Color Red] [] aapl252pc, Data2D [Title "AAPL 20-Day MA", Style Lines, Color Black] [] aapl252ma20 ]
True

This gives the following chart:

[Figure: Plotting a moving average (aapl_20dayma.png)]

Inspect the new image, aapl_20dayma.png. Note how the black line is smooth compared to the original red line. The jaggedness is gone. The moving average line follows the overall shape of the original data, but without the noise. The moving average has been described as just that: a noise removal function. Many algorithms in the field of pattern recognition require a noise removal function to be passed over the original dataset as an initial step, and this is one such approach.

Does this chart help us to answer any questions? We can identify the brief moments when the stock is performing above its average stock value and when it isn't. Many investors will use two moving averages in their work (typically, a 50-day window and a 200-day window) and look for where the moving average lines cross. Investors will then pinpoint these crossings as an indicator of when they should buy or sell a stock.
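
The crossings of two moving averages can be located mechanically. The following is a minimal sketch and not a method taken from this chapter: crossovers is a hypothetical helper, and it assumes that both lists are already aligned so that position i in each refers to the same trading day.

```haskell
-- Positions at which the short-window average moves from one side of the
-- long-window average to the other. Position i means the sign of the
-- difference changed between element i - 1 and element i.
-- Assumes the two lists are aligned element by element.
crossovers :: [Double] -> [Double] -> [Int]
crossovers short long =
  [ i
  | (i, prev, cur) <- zip3 [1 ..] diffs (drop 1 diffs)
  , prev * cur < 0 -- strictly opposite signs; exact touches are ignored
  ]
  where
    diffs = zipWith (-) short long
```

For example, crossovers [1, 3, 3, 1] [2, 2, 2, 2] evaluates to [1, 3]. Points where the two lines merely touch (a difference of exactly zero) are not reported by this sketch.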

This study of images and looking for patterns in the charts is a form of investing known as technical analysis. It is the opinion of this author that technical analysis is an attempt to predict the future of a stock price based on the history of that stock price when the reality is that they are mostly unrelated. Technical analysis should not be used as a substitute for traditional research.

Note

Disclaimer: Your author doesn't own shares in any of the three companies mentioned in this chapter. Using the strategies demonstrated in this chapter, the reader can discover companies with a one-year growth percentage greater than 50 percent. While this practice of discovering high growth companies is exciting, remember to be skeptical of your findings. Don't fall into the trap of interpreting the past growth of a company as being indicative of its future growth. You can discover potential winners just as easily as losers with this method. Every company's fate hinges on tomorrow's good press or bad press. Predicting the past is easy. Predicting the future is not.

Plotting a scatterplot

We are going to close this chapter with a final visualization related to the earthquake dataset that we saw in Chapter 2, Getting Our Feet Wet. For this, I returned to the USGS page and downloaded their listing of earthquakes in the past month, totaling 7,818:

> convertCSVFileToSQL "all_month.csv" "earthquakes.sql" "oneMonth" ["time TEXT", "latitude REAL", "longitude REAL", "depth REAL", "mag REAL", "magType TEXT", "nst INTEGER", "gap REAL", "dmin REAL", "rms REAL", "net TEXT", "id TEXT", "updated TEXT", "place TEXT", "type TEXT"]

Using this dataset of earthquakes, we would like to plot the location of every earthquake in the world that happened in the past month. In this dataset, we have longitude coordinates (similar to an x-axis) and latitude coordinates (similar to a y-axis).

Let's craft a function to retrieve the latitude and longitude coordinates, similar to our pullStockClosingPrices function declared at the beginning of this chapter:

pullLatitudeLongitude :: String -> String -> IO [(Double, Double)]
pullLatitudeLongitude databaseFile database = do
  sqlResult <- queryDatabase
    databaseFile
    ("SELECT latitude, longitude FROM " ++ database)
  -- Column 1 (longitude) is the x-value; column 0 (latitude) is the y-value.
  return $ zip (readDoubleColumn sqlResult 1)
               (readDoubleColumn sqlResult 0)

Good. Now we should pull the coordinates of each earthquake. We will be plotting using the Dots style in EasyPlot:

> coords <- pullLatitudeLongitude "earthquakes.sql" "oneMonth"
> plot (PNG "earthquakes.png") [Data2D [Title "Earthquakes", Color Red, Style Dots] [] coords ]

This gives the following chart:

[Figure: Plotting a scatterplot (earthquakes.png)]

From this plot, we can make out the western coastlines of North America and South America, as well as the eastern coastline of Asia, Indonesia, the South Pacific Islands, and the Aleutian Islands. This is to be expected; these parts of the world get more earthquakes than other parts of the globe. We can also see giant areas of white, where earthquakes are far less frequent.

Where are the strongest earthquakes? Better stated, how might we plot the earthquakes with a magnitude higher than 2 in blue and those with a magnitude less than or equal to 2 in red? That will be an exercise for the reader.
