Plotting data from a SQLite3 database

In order to plot data, we need data to plot! For this, we will be using data representing the stock market of the United States, found on the Yahoo! Finance website. Yahoo! Finance has a wealth of information on every publicly traded company currently being traded in the United States. Conveniently, Yahoo! allows users to download the entire closing share value history of these companies into a csv file format, free of charge.

Let's begin by downloading the entire history of the Apple company from Yahoo! Finance (http://finance.yahoo.com). You can find Apple's content by performing a quote look up from the Yahoo! Finance homepage for the symbol AAPL (that's two As, not two Ps). From this page, find the link for Historical Prices. On the Historical Prices page, identify the link that says Download to Spreadsheet. The full link to Apple's historical prices can be found here:

http://real-chart.finance.yahoo.com/table.csv?s=AAPL&d=10&e=10&f=2014&g=d&a=11&b=12&c=1980&ignore=.csv

We should take a moment to explore our dataset. Here are the column headers in the csv file:

  • Date: A string representing the date of a particular day in Apple's history
  • Open: The opening value of 1 share
  • High: The high trade value over the course of this day
  • Low: The low trade value over the course of this day
  • Close: The final price of the share at the end of this trading day
  • Volume: The total number of shares traded on this day
  • Adj Close: A variation on the closing price that adjusts for dividend payouts and company splits

One other feature of this dataset is that each of the rows are written into the table in a chronologically reverse order. The most recent date in the table is first. The oldest is last.

Yahoo! Finance provides this table under the unhelpful name table.csv. I renamed the csv file provided by Yahoo! Finance to aapl.csv. Within the Haskell command line, convert this file to a SQLite3 database using the convertCSVFileToSQL function written in Chapter 2, Getting Our Feet Wet.

> :l LearningDataAnalysis02.hs
> convertCSVFileToSQL "aapl.csv" "aapl.sql" "aapl" ["date STRING", "open REAL", "high REAL", "low REAL", "close REAL", "volume REAL", "adjclose REAL"]
Successful

For our purposes, the most important column in this table is the Adj Close column, which gives us the clearest depiction of the data. We introduce a new function called pullStockClosingPrices, which pulls data from a database. This function will require a database file and the name of a database containing the financial data of a stock based on the format provided by Yahoo! Finance. This function will grab each record's rowid (which SQLite3 gives all tables automatically) and the adjclose field.

Note that we apply the reverse function to the rowid column; this is because we want the data values to be represented in chronological order:

pullStockClosingPrices :: String -> String -> IO [(Double, Double)]
pullStockClosingPrices databaseFile database = do
  sqlResult <- queryDatabase
    databaseFile ("SELECT rowid, adjclose FROM " ++ database)
    return $ zip
      (reverse $ readDoubleColumn sqlResult 0)
      (readDoubleColumn sqlResult 1)

Great. We have put together enough pieces of the puzzle to create our first visualization. We will be plotting the entire history of Apple's share price. Each rowid represents an x-coordinate in a scatterplot, and each adjclose represents the y-coordinate. We are going to pass a list of doubles defined as [(Double, Double)] to our plot function, where each tuple represents an (x, y) pair.

> :l LearningDataAnalysis04
> aapl <- pullStockClosingPrices "aapl.sql" "aapl"
> plot (PNG "aapl.png") $ Data2D [Title "AAPL"] [] $ aapl
True

Look in your current working directory and you should find a file titled aapl.png. Open it up.

The following screenshot is a typical default chart created by EasyPlot. We see the entire history of Apple's stock price. For most of that history, Apple's adjusted share price was less than $10 per share. At about the 6,000th trading day, we see the quick ascension of the share price to over $100 per share.

Plotting data from a SQLite3 database

Exploring the EasyPlot library

EasyPlot is a wrapper for the gnuplot software and provides a simple interface to create basic plots. We have limited access to the full power of gnuplot using this library, but that's fine. We can create outstanding plots with just this library.

Note

We do not use the more advanced functionality of EasyPlot that does allow us to create plots based on direct gnuplot commands.

Here is the type signature to the plot function. There's only two arguments to the function:

plot :: TerminalType -> a -> IO Bool

A complete documentation of this function can be found on the Hackage page for EasyPlot. We believe that it's beneficial for the reader to cover this library in more detail by navigating here:

http://hackage.haskell.org/package/easyplot-1.0/docs/Graphics-EasyPlot.html

The TerminalType argument is how we specify the output of the graph. The output allows us to tell EasyPlot that we wish to view this plot as a file or as an interactive plot. In our example, we specified that we wanted a PNG image followed by the image name. We could have easily specified JPEG or GIF (for images), LaTeX (for academic publications), PDF (for Adobe PDF documents), or PS (for encapsulated postscript documents).

Each of these arguments requires an output file name similar to the PNG call used in our example. There are also specific interactive plots that can be used for your respective operating system; Aqua is for Mac systems, Windows is for Windows systems, and X11 is for Linux systems with a graphical user interface installed. Each of these arguments can be used by themselves.

The second argument, a, is how we specify one or more datasets for plotting. The argument a can be a list of datasets, a single dataset, or even a gnuplot command, but we will commonly supply each dataset wrapped in either the Graph2D type or the Graph3D type.

The Graph2D type can be constructed using three different types of two dimensional plots: Data2D (for point cloud datasets that we use in this chapter), Function2D (for plotting based on a mathematical Haskell function), and Gnuplot2D (for plotting based on a mathematical function in a string that will be passed directly to gnuplot). The Graph3D type has each of these respective constructors, with the obvious difference that they require three-dimensional data and functions and are named Data3D, Function3D, and Gnuplot3D.

The default style of plotting two-dimensional point cloud data is to use the cross marker for each data point. With 9,000 points, the crosses overlap and our dataset is easily confused. We should express our data using a line plot rather than with crosses:

> plot (PNG "aapl_line.png") $ Data2D [Title "AAPL", Style Lines] [] $ aapl
True

Identify the aapl_line.png plot in your current working directory. This is a better representation of the data in my opinion:

Exploring the EasyPlot library

Plotting a subset of a dataset

This dataset is interesting, but we aren't trying to answer any questions with our data. That's fine. Data visualizations can be used to answer questions or it can be used as a tool for exploring data and discovering questions that need to be answered. We can see in the previous chart that the most interesting part of the data is the most recent jaggedness on right side of the graph representing the most recent trading days.

Investors like to look at data through various windows of time, say, the past five days, one month, six months, or one year of trading. Let's look at this data by using Haskell to take the most recent year of trading data. According to Wikipedia, there are typically 252 trading days in a calendar year on the New York Stock Exchange (NYSE). With this information, we use the take function to plot the data of the past year in the following manner:

> plot (PNG "aapl_oneyear.png") $ Data2D [Title "AAPL", Style Lines] [] $ take 252 aapl
True

This gives the following chart as the output:

Plotting a subset of a dataset

By reading this plot we can see that, about a year ago, AAPL was trading for just under $75 per share and has grown in value to over $105 per share. We can also see the fluctuation in the share price, but from this perspective it's too soon to apply any sort of meaning to these fluctuations.

Plotting data passed through a function

Plotting the share price over time is a quick way of glancing at data and seeing if this data has increased or decreased over time. A company's share price increasing over time is usually associated with the attributes of a good company. If we had invested in Apple a year ago, our share value would have increased to just over $30 per share. Is $30 per share investment over the course of a year a good investment or a bad one? The answer to this depends on the original share value. We could answer this question with the previous chart, but it could be made easier. We need to examine this share value through the perspective of the share's percentage of change over time.

To compute the percentage of change, we need two values: the original value and a newer value. If the newer value is greater than the original, we obviously have a positive percent change. If the newer value is lower, we have a negative percent change.

In the following formula, x1 represents the first data value in our dataset. In the context of our analysis of Apple's share value, it's the share price one year ago from the date which we pulled our data. The value of xi represents any data value in our dataset (including the first value). When every xi equals x1 in our formula, then it evaluates to 0 percent change. Every other value should produce either a increase or a decrease percentage:

Plotting data passed through a function

Here, xi is associated with the variable value and x1 is associated with the first variable value. We should represent this formula as a Haskell command:

percentChange :: Double -> Double -> Double
percentChange value first = 100.0 * (value – first) / first

The percentChange function only computes a single percent change at a given point in our data. We need a function that allows us to map this function to all the data points. The applyPercentChangeToData function does the trick for us in the following manner:

applyPercentChangeToData :: [(Double, Double)] -> [(Double, Double)]
applyPercentChangeToData dataset = zip indices scaledData
  where
    (_, first) = last dataset
    indices = reverse [1.0..(genericLength dataset)]
    scaledData = map
      ((_, value) -> percentChange value first)
      dataset

This is the function that we will use for plotting the percent change of a share price. This function will also renumber the starting value of each data point to begin with 1. This will be essential for aligning the datasets in the next section. We should call our newly created percentChange and applyPercentChangeToData functions. We are also using the combination of snd and last functions to find the first data value in this dataset (remember, this dataset is listed in the chronologically reverse order).

> let aapl252 = take 252 aapl
> let aapl252pc = applyPercentChangeToData aapl252
> plot (PNG "aapl_oneyear_pc.png") $ Data2D [Title "AAPL - One Year, % Change", Style Lines] [] $ aapl252pc
True

Open up aapl_pastyear_pc.png to find the following new image. You might notice that it's the same line as our previous image. That's because the percent change formula (when applied to a full dataset) is a simple scaling of the data. The structure of the data is perfectly retained in the scaling. We've kept all of the peaks and valleys.

Plotting data passed through a function

The only aspects of the graph that have changed are the x-axis and the y-axis, which now report the percent change of the data rather than the share price. The first data point in this chart is a 0 percent change, as we should expect. From this chart, we see how much Apple's share price has grown with respect to our original investment. Most of the chart is above the 0 percent change. At the right extremity of the chart, we see what a one-year investment looks like; we would have seen an almost 50 percent return on our original investment.

We need to express a few ideas on the topic of the company share value; this value represents a bit of fiction. Some of the value comes from the tangible assets of the company and some of the value comes from the public perception of that company (which is quantified in the daily buying and selling of shares due to countless subjective events). The share value multiplied by the total number of outstanding shares is called the market capitalization, or the total value of a company. The market capitalization is still a bit of fiction, since this value still incorporates the public perception of the company into the overall value.

There are many strategies in stock market investing and your author does not advocate one strategy over another. You should investigate every strategy that comes your way with a healthy dose of skepticism. But we do wish to highlight one strategy just for the ease at which it helps us to convey the ideas behind data visualization: simple growth investing. The growth investment strategy encourages the investor to seek out and buy the shares of companies that are growing faster than the average rate at which the overall market grows and to sell when those companies no longer beat the average growth rate.

For example, if you are trying to decide whether to invest in Company A or Company B, you should compare the growth rates of these two companies. If Company A is growing faster than Company B, you should invest in Company A. Otherwise, invest in Company B. Using this strategy, the investor should consistently beat the market average at the time expense of reevaluating the growth rate of holdings at regular intervals.

Before you race to the bank and take out a loan to throw money at the stock market, you should be aware of the weaknesses to the growth investing strategy. Growth investing ignores the market capitalization of a company (small companies tend to have high growth at a larger risk), the dividend payouts by a company (a value returned by the company that is not reflected in the share price of a stock), the multitude of the popular metrics used by investors to evaluate companies, the public perception of the company, the public perception of the industry, or the public perception of the entire market.

Growth investing only looks at the day-to-day changes in the share price of a company. Growth investing helps investors identify companies that outperform the market but does not guarantee that the current performance will continue in the future for any length of time. You have been warned. If you invest, invest with the understanding that every investment is a risk.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset