Feature scaling

In the last section, we explored the essential features of EasyPlot for getting our images publication-ready. In this section, we're going to plot multiple companies on a single plot in a way that accurately reflects their growth. The question we would like to answer is: over the past year, which of these three companies (Apple, Google, or Microsoft) has had the highest percentage of growth in its stock value?

We'll start by trimming each dataset to 252 days. Why 252? Well, there are 365 days in the year. If you cut out the weekends (about 104 days) and the United States holidays on which the New York Stock Exchange doesn't operate (roughly nine), you're left with about 252 days. So, 252 is our magic number to represent one year of trading data. We're then going to introduce feature scaling and plot our three companies after they have been feature scaled. So, here's the formula for feature scaling:
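Written out in symbols, following the description given in this section, the formula looks like this. The notation is ours rather than the original figure's: x_1 through x_n are the values in dataset order, and x'_i is the rescaled value.

```latex
x'_i = \frac{x_i - x_1}{\max(x) - \min(x)}, \qquad i = 1, \dots, n
```

Because x_1 itself lies somewhere between the minimum and the maximum, the first rescaled value is always 0 and every other value lands between -1 and 1.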

It is simply a value, minus the first value in our dataset, divided by the difference between the largest and smallest values. It's a fairly simple formula, and what it does is take any dataset and rescale it so that every value falls between -1 and 1. What's also nice about feature scaling is that it won't change the shape of your data at all: every point is shifted and stretched by the same amounts, so the shape remains the same and only the scale changes.

Now, let's go over to our notebook and fetch the Microsoft and Google datasets from our database. As mentioned at the start of this chapter, I would like you to build a database of three different companies, and I hope you have those datasets ready. If not, I hope you're willing to grab the dataset that accompanies this chapter, as shown in the following screenshot, so that you can follow along:

Here, we have the lines ready to pull our Google and Microsoft datasets. Google's stock symbol is GOOGL, so we have named our table googl; Microsoft's stock symbol is MSFT, so we have named our table msft. What we need to do next is trim our datasets to the most recent 252 trading days, as demonstrated in the following screenshot:
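The notebook's own listing isn't reproduced here, so here is a minimal Python sketch of the same two steps, pulling one company's prices and keeping the most recent 252. Everything specific in it is an assumption made for illustration: the SQLite storage, the date and close column names, the helper name latest_closes, and the made-up prices in the demo table.

```python
import sqlite3

TRADING_DAYS = 252  # one year of trading data, as discussed above


def latest_closes(conn, table, n=TRADING_DAYS):
    """Return the n most recent closing prices from `table`, oldest first.

    Hypothetical helper: assumes each table has `date` (ISO text) and
    `close` columns, which the chapter does not confirm.
    """
    # Table names cannot be bound as SQL parameters, so whitelist them.
    if table not in {"aapl", "googl", "msft"}:
        raise ValueError(f"unexpected table: {table}")
    rows = conn.execute(
        f"SELECT close FROM {table} ORDER BY date DESC LIMIT ?", (n,)
    ).fetchall()
    # Rows come back newest first; reverse so the list runs oldest to newest.
    return [close for (close,) in reversed(rows)]


# Self-contained demo: an in-memory database with 300 made-up prices.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE msft (date TEXT, close REAL)")
conn.executemany(
    "INSERT INTO msft VALUES (?, ?)",
    [(f"2018-{1 + d // 28:02d}-{1 + d % 28:02d}", 100.0 + d) for d in range(300)],
)

msft252 = latest_closes(conn, "msft")
print(len(msft252))  # 252
```

Sorting by date in descending order and then reversing keeps the trimmed list in chronological order, which matters later because the scaling formula treats the first element as the starting point.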

We have named each of these datasets by its symbol followed by the number 252, giving us aapl252, msft252, and googl252. Next, we would like to plot these datasets so that you can get an idea of what they look like in their original shape and scale, as shown in the following example:

The first dataset in the following example will be Apple, the second will be Microsoft, and the third will be Google:

Each plot covers just the most recent 252 trading days. Each of these three companies trades at a particular dollar value, but that value doesn't relate to the price of any of the other companies, and this is true for any stock that's publicly traded on the stock market. Knowing a company's share price on its own really doesn't tell you whether that company is doing well or not. We would like to feature scale these three companies so that they all exist on the same plot and at the same scale. So, let's introduce our feature scale function, which is shown in the following example and is going to be called rescale:
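As a stand-in for the notebook listing, here is a minimal Python sketch of a function with the same name and the same formula. The handling of empty and flat inputs is our own choice, not something the chapter specifies, and the sample prices are made up.

```python
def rescale(values):
    """Feature scale `values`: (x - first) / (max - min) for every x."""
    if not values:
        return []
    first = values[0]
    spread = max(values) - min(values)
    if spread == 0:
        # A flat series has no range to divide by; treat it as all zeros
        # (an assumption of this sketch).
        return [0.0 for _ in values]
    return [(x - first) / spread for x in values]


prices = [10.0, 12.0, 8.0, 11.0]  # made-up numbers, not real stock data
scaled = rescale(prices)
print(scaled)  # [0.0, 0.5, -0.5, 0.25]
```

The output starts at 0, stays between -1 and 1, and rises and falls in the same places as the input.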

It is an implementation of the formula that we saw at the start of this section: it takes one dataset and returns a dataset of equal length. As you will see, the shape of each dataset will remain the same, but its values will fall between -1 and 1, as shown in the following example:

So, we have rescaled each of our three datasets by calling rescale in our plot line, and our plot is shown in the following screenshot:

Now, notice that because the very first element is subtracted from every data point, the first element of all three lists should be 0, and you can see that this is true in the previous plot. So, this is the feature-scaled plot for all three of our datasets. We can tell that the green line corresponds to the company that has grown the most, the blue line to the company that has grown the second most, and the red line to the company that has grown the least. In order, it looks like the green line is Microsoft, the blue line is Google, and the red line is Apple. So, over the past year, Microsoft has done the best and Apple has done the least well. We can also tell from this plot that Microsoft and Google have increased in value over the past year, because their lines finish above 0, whereas Apple appears to have decreased in value, because its line finishes below 0.
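The same reading of the plot can be done numerically by looking at where each rescaled line ends. The Python sketch below uses three made-up series labeled A, B, and C rather than the real stock data, and it assumes each series is non-empty and not flat.

```python
def rescale(values):
    """Feature scale as described in this section: (x - first) / (max - min)."""
    first = values[0]
    spread = max(values) - min(values)  # assumes the series is not flat
    return [(x - first) / spread for x in values]


# Made-up price series, purely for illustration (not real stock data).
series = {
    "A": [100.0, 110.0, 105.0, 120.0],
    "B": [50.0, 45.0, 48.0, 40.0],
    "C": [200.0, 210.0, 190.0, 205.0],
}

# Every rescaled line starts at 0, so its last value shows where it ended
# relative to its start: positive means it rose, negative means it fell.
final_scaled = {name: rescale(prices)[-1] for name, prices in series.items()}
ranking = sorted(final_scaled, key=final_scaled.get, reverse=True)

print(final_scaled)  # {'A': 1.0, 'B': -1.0, 'C': 0.25}
print(ranking)       # ['A', 'C', 'B']
```

Bear in mind that the final scaled value is the net change divided by that series' own high-to-low range over the window, so it is a relative measure rather than literally a percentage of the starting price.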

Feature scaling allows us to take any companies and plot them on the same chart, and to compare how their values have moved without losing the original shape of the lines produced by those values. In our next section, we will cover scatter plots.
