Before applying our multilayer perceptron to the analysis of fluctuations in currency exchange rates, let's get acquainted with some of the key learning parameters introduced in the first section.
Let's take a look at the convergence of the training of the multilayer perceptron. The Monitor trait (refer to the Monitor section under Utility classes in the Appendix A, Basic Concepts) collects and displays some execution parameters. We choose to profile the convergence of the multilayer perceptron using the difference of the backpropagation errors between two consecutive episodes (or epochs).
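The convergence criterion itself is simple. The following is a minimal, self-contained sketch (not the book's Monitor implementation) of the epoch-to-epoch error test:

```scala
// Hedged sketch: declare convergence when the difference between the
// cumulative errors of two consecutive epochs falls below a threshold eps.
def hasConverged(errors: Seq[Double], eps: Double): Boolean =
  errors.size > 1 && math.abs(errors.last - errors(errors.size - 2)) < eps

val profile = Seq(0.9, 0.4, 0.2, 0.2 + 1e-7)   // cumulative error per epoch
val done = hasConverged(profile, 1e-5)          // last delta is ~1e-7
```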
The test profiles the convergence of the MLP using a learning rate of η = 0.03 and a momentum factor of α = 0.3 for a multilayer perceptron with two input values, one hidden layer with three nodes, and one output value. The test relies on synthetically generated random values:
The purpose of the first exercise is to evaluate the impact of the learning rate η on the convergence of the training epoch, as measured by the cumulative error of all output variables. The observations xt (with respect to the labeled output yt) are synthetically generated using several noisy patterns such as the f1 (line 57) and f2 (line 58) functions, as follows:
def f1(x: Double): DblArray = Array[Double](   //57
  0.1 + 0.5*Random.nextDouble, 0.6*Random.nextDouble)
def f2(x: Double): DblArray = Array[Double](   //58
  0.6 + 0.4*Random.nextDouble, 1.0 - 0.5*Random.nextDouble)

val HALF_TEST_SIZE = (TEST_SIZE >> 1)
val xt = Vector.tabulate(TEST_SIZE)(n =>       //59
  if (n < HALF_TEST_SIZE) f1(n) else f2(n - HALF_TEST_SIZE))
val yt = Vector.tabulate(TEST_SIZE)(n =>
  if (n < HALF_TEST_SIZE) Array[Double](0.0)
  else Array[Double](1.0))                     //60
The input values, xt, are synthetically generated by the f1 function for half of the dataset and by the f2 function for the other half (line 59). The data generator for the expected values, yt, assigns the label 0.0 to the input values generated with the f1 function and 1.0 to the input values created with f2 (line 60).
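Stripped of the book's type aliases, the generator can be reproduced as a self-contained snippet (the seed and the reduced TEST_SIZE below are illustrative choices, not the book's):

```scala
import scala.util.Random

val rng = new Random(42)                        // fixed seed for reproducibility
def f1(x: Double): Array[Double] =              // noisy pattern for class 0
  Array(0.1 + 0.5 * rng.nextDouble, 0.6 * rng.nextDouble)
def f2(x: Double): Array[Double] =              // noisy pattern for class 1
  Array(0.6 + 0.4 * rng.nextDouble, 1.0 - 0.5 * rng.nextDouble)

val TEST_SIZE = 100
val HALF = TEST_SIZE >> 1
val xt = Vector.tabulate(TEST_SIZE)(n => if (n < HALF) f1(n) else f2(n - HALF))
val yt = Vector.tabulate(TEST_SIZE)(n => if (n < HALF) Array(0.0) else Array(1.0))
```

Half of the labels are 0.0 and half are 1.0, and every observation carries two features, matching the [2-3-1] topology used in the test.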
The test is run with a sample of TEST_SIZE data points over a maximum of NUM_EPOCHS epochs, a single hidden layer of HIDDEN.head neurons with no softmax transformation, and the following MLP parameters:
val ALPHA = 0.3
val ETA = 0.03
val HIDDEN = Array[Int](3)
val NUM_EPOCHS = 200
val TEST_SIZE = 12000
val EPS = 1e-7

def testEta(eta: Double,
    xt: XVSeries[Double],
    yt: XVSeries[Double]): Option[(ArrayBuffer[Double], String)] = {
  implicit val mode = new MLPBinClassifier   //61
  val config = MLPConfig(ALPHA, eta, NUM_EPOCHS, EPS)
  MLP[Double](config, HIDDEN, xt, yt)
    .counters("err").map((_, s"eta=$eta"))   //62
}
The testEta method generates the error profile for different values of eta.
The operating mode has to be implicitly defined prior to the instantiation of the MLP class (line 61). It is set as a binomial classifier of the MLPBinClassifier type. The execution profile data is collected by the counters method of the Monitor trait (line 62) (refer to the Monitor section under Utility classes in the Appendix A, Basic Concepts).
The driver code for evaluating the impact of the learning rate on the convergence of the multilayer perceptron is quite simple:
val etaValues = List[Double](0.01, 0.02, 0.03, 0.1)
val data = etaValues.flatMap( testEta(_, xt, yt) )
  .map { case (x, s) => (x.toVector, s) }

val legend = new Legend("Err",
  "MLP [2-3-1] training - learning rate", "Epochs", "Error")
LinePlot.display(data, legend, new LightPlotTheme)
The profile is created with the JFreeChart library and displayed in the following chart:
The chart illustrates that the MLP model training converges much faster with a larger learning rate. Keep in mind, however, that a very steep learning rate may lock the training process into a local minimum of the cumulative error, generating weights with lower accuracy. The same configuration parameters are used to evaluate the impact of the momentum factor on the convergence of the gradient descent algorithm.
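The effect can be reproduced on a toy convex function. This sketch, unrelated to the book's library, applies plain gradient descent to f(w) = w² with two learning rates:

```scala
// Illustrative only: plain gradient descent on f(w) = w*w (gradient 2w),
// starting at w = 1.0. A larger learning rate eta closes in on the
// minimum w = 0 in fewer steps.
def descend(eta: Double, steps: Int): Double =
  (0 until steps).foldLeft(1.0)((w, _) => w - eta * 2.0 * w)

val slow = descend(0.01, 100)   // still noticeably away from 0
val fast = descend(0.1, 100)    // much closer to the minimum
```

On a non-convex error surface, of course, the same aggressive step size is what risks overshooting into a local minimum.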
Let's quantify the impact of the momentum factor α on the convergence of the training process toward an optimal model (synapse weights). The testing code is very similar to the evaluation of the impact of the learning rate.
The cumulative error for the entire time series is plotted in the following graph:
The preceding graph shows that the mean square error decreases faster as the momentum factor increases. In other words, the momentum factor has a positive, although limited, impact on the convergence of the gradient descent.
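The momentum term can be sketched independently of the book's implementation. A hedged, minimal version of the standard update rule, dW(t) = alpha*dW(t-1) - eta*gradient, for a single weight:

```scala
// Sketch of the standard momentum update for a single weight (illustrative,
// not the book's MLP): dW(t) = alpha * dW(t-1) - eta * gradient
final case class MomentumState(w: Double, dw: Double)

def update(s: MomentumState, grad: Double, eta: Double, alpha: Double): MomentumState = {
  val dw = alpha * s.dw - eta * grad   // previous change dw is carried over
  MomentumState(s.w + dw, dw)
}

// On f(w) = w*w (gradient 2w), starting at w = 1.0 with eta = 0.03,
// momentum (alpha > 0) approaches the minimum w = 0 faster than alpha = 0
def run(alpha: Double, steps: Int): Double =
  (0 until steps).foldLeft(MomentumState(1.0, 0.0)) { (s, _) =>
    update(s, 2.0 * s.w, 0.03, alpha)
  }.w
```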
Let's consider a multilayer perceptron with two hidden layers (7 and 3 neurons). The execution profile for the training shows that the cumulative error of the output converges abruptly after several epochs during which the gradient descent failed to find a direction:
Let's apply our newfound knowledge regarding neural networks and the classification of variables that impact the exchange rate of a certain currency.
Neural networks have been used in financial applications, from risk management in mortgage applications and hedging strategies for commodity pricing to predictive modeling of the financial markets [9:14].
The objective of the test case is to understand the correlation factors between the exchange rate of some currencies, the spot price of gold, and the S&P 500 index. For this exercise, we will use the following exchange-traded funds (ETFs) as proxies for the exchange rate of currencies:
Practically, the problem to solve is to extract one or more regression models linking one ETF, y, with a basket of other ETFs, {xi}, so that y = f(xi). For example, is there a relationship between the exchange rate of the Japanese yen (FXY) and a combination of the spot price for gold (GLD), the exchange rate of the euro in US dollars (FXE), the exchange rate of the Australian dollar in US dollars (FXA), and so on? If so, the regression f will be defined as FXY = f(GLD, FXE, FXA).
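Before training a network, a quick sanity check on pairwise relations can use the Pearson correlation coefficient between two price series. The helper below is a generic sketch, not part of the book's library:

```scala
// Pearson correlation between two equally sized price series:
// cov(x, y) / (stddev(x) * stddev(y)), in the range [-1, 1].
def pearson(x: Seq[Double], y: Seq[Double]): Double = {
  require(x.size == y.size && x.nonEmpty)
  val n = x.size
  val mx = x.sum / n
  val my = y.sum / n
  val cov = x.zip(y).map { case (a, b) => (a - mx) * (b - my) }.sum
  val sx = math.sqrt(x.map(a => (a - mx) * (a - mx)).sum)
  val sy = math.sqrt(y.map(b => (b - my) * (b - my)).sum)
  cov / (sx * sy)
}
```

A coefficient close to +1 or -1 suggests the ETF is a promising feature for the regression; a value near 0 suggests it adds little.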
The following two charts visualize the fluctuation between currencies over a period of two and a half years. The first chart displays an initial group of potentially correlated ETFs:
The second chart displays another group of currency-related ETFs that share a similar price action behavior. Neural networks do not provide any analytical representation of their internal reasoning; therefore, a visual correlation can be extremely useful to novice engineers to validate their models:
A very simple approach for finding any correlation between the movement of the currency exchange rates and the gold spot price is to select one ticker symbol as the target and a subset of other currency-based ETFs as features.
Let's consider the following problem: finding the correlation between the price of FXE and a range of currencies FXB, CYB, FXA, and FXC, as illustrated in the following diagram:
The first step is to define the configuration parameters for the MLP classifier, as follows:
val path = "resources/data/chap9/"
val ALPHA = 0.8; val ETA = 0.01
val NUM_EPOCHS = 250
val EPS = 1e-3
val THRESHOLD = 0.12
val hiddens = Array[Int](7, 7)   //59
Besides the learning parameters, the network is initialized with its topology configuration (line 59).
Next, let's create the search space of the prices of all the ETFs used in the analysis:
val symbols = Array[String](
  "FXE","FXA","SPY","GLD","FXB","FXF","FXC","FXY","CYB")

val STUDIES = List[Array[String]](   //60
  Array[String]("FXY","FXC","GLD","FXA"),
  Array[String]("FXE","FXF","FXB","CYB"),
  Array[String]("FXE","FXC","GLD","FXA","FXY","FXB"),
  Array[String]("FXC","FXY","FXA"),
  Array[String]("CYB","GLD","FXY"),
  symbols
)
The purpose of the test is to evaluate and compare six different portfolios or studies (line 60). The closing prices of all the ETFs over a period of three years are extracted from the Google Financial tables, using the GoogleFinancials extractor for a basket of ETFs (line 61):
val prices = symbols.map(s => DataSource(s"$path$s.csv"))
  .flatMap(_.get(close).toOption)   //61
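The DataSource and GoogleFinancials classes belong to the book's companion library. A hypothetical, self-contained stand-in for extracting a closing-price column from a simple "Date,Close" CSV might look like this (column layout assumed, not taken from the book):

```scala
// Parse the closing-price column (second field) from CSV content whose
// first row is a header such as "Date,Close".
def closingPrices(csv: String): Vector[Double] =
  csv.split("\n")
    .drop(1)                                 // skip the header row
    .map(_.split(",")(1).trim.toDouble)      // second column is the close
    .toVector

val sample = "Date,Close\n2014-01-02,131.2\n2014-01-03,130.9"
```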
The next step consists of implementing the mechanism to extract the target and the features from a basket of ETFs or studies introduced in the previous paragraph. Let's consider the following study as the list of ETF ticker symbols:
val study = Array[String]("FXE","FXF","FXB","CYB")
The first element of the study, FXE, is the labeled output; the remaining three elements are the observed features. For this study, the network architecture has three input variables (FXF, FXB, and CYB) and one output variable, FXE:
val obs = symbols.flatMap(index.get(_))
  .map(prices(_).toArray)                        //62
val xv = obs.drop(1).transpose                   //63
val expected = Array[DblArray](obs.head).transpose //64
The set of observations, obs, is built using an index (line 62). By convention, the first observation is selected as the label data and the remaining ones as the features for training. As the observations are loaded as an array of time series, the features for each time step are extracted using transpose (line 63). The single target output variable has to be converted into a matrix before transposition (line 64).
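The role of transpose is easier to see on a tiny, hypothetical example (the series values below are made up):

```scala
// Each series is loaded as one row (one ETF over time); transpose turns the
// data into one row per trading day, which is the layout the MLP expects.
val fxe = Array(0.5, 0.6, 0.7)   // target series (label)
val fxf = Array(1.0, 1.1, 1.2)   // feature series
val fxb = Array(2.0, 2.1, 2.2)
val cyb = Array(3.0, 3.1, 3.2)

val obs = Array(fxe, fxf, fxb, cyb)
val xv = obs.drop(1).transpose             // 3 observations x 3 features
val expected = Array(obs.head).transpose   // 3 observations x 1 label
```

Day 0 yields the feature vector (1.0, 2.0, 3.0) with the expected output 0.5, and so on for each trading day.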
Ultimately, the model is built through the instantiation of the MLP class:
implicit val mode = new MLPBinClassifier   //65
val classifier = MLP[Double](config, hiddens, xv, expected)
classifier.fit(THRESHOLD)
The objective or operating mode is implicitly defined as an MLP binary classifier, MLPBinClassifier (line 65). The MLP.fit method is defined in the Training and classification section.
The test consists of evaluating six different models to determine which ones provide the most reliable correlation. It is critical to ensure that the result is somewhat independent of the architecture of the neural network. Different architectures are evaluated as part of the test.
The following charts compare the models for two architectures:
The first chart visualizes the fitness of the six regression models for an architecture with a variable number of inputs (between two and seven), one output variable, and two hidden layers of four nodes each. The features (ETF symbols) are listed on the left-hand side of the arrow => along the y axis. The symbol on the right-hand side of the arrow is the expected output value:
The following chart displays the fitness of the six regression models for an architecture with three hidden layers of eight, five, and six nodes, respectively:
The two network architectures share a lot of similarity; in both cases, the fittest regression models are as follows:
On the other hand, the prediction of the Canadian dollar to US dollar exchange rate (FXC) using the exchange rates for the Japanese yen (FXY) and the Australian dollar (FXA) is poor with both configurations.
The empirical evaluation
These empirical tests use a simple accuracy metric. A formal comparison of the regression models would systematically analyze every combination of input and output variables. The evaluation would also compute the precision, the recall, and the F1 score for each of those models (refer to the Key quality metrics section under Validation in the Assessing a model section in Chapter 2, Hello World!).
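For reference, those metrics derive from the confusion-matrix counts of true positives, false positives, and false negatives. A minimal sketch:

```scala
// Precision = tp/(tp+fp), recall = tp/(tp+fn), and F1 is their
// harmonic mean: 2*precision*recall/(precision + recall).
def metrics(tp: Int, fp: Int, fn: Int): (Double, Double, Double) = {
  val precision = tp.toDouble / (tp + fp)
  val recall = tp.toDouble / (tp + fn)
  val f1 = 2.0 * precision * recall / (precision + recall)
  (precision, recall, f1)
}
```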
The next test consists of evaluating the impact of the configuration of the hidden layer(s) on the accuracy of three models: FXF, FXB, CYB => FXE; FXC, GLD, FXA => FXY; and FXC, GLD, FXA, FXY, FXB => FXE. For this test, the accuracy is computed by selecting a subset of the training data as a test sample, for the sake of convenience. The objective of the test is to compare different network architectures using some metrics, not to estimate the absolute accuracy of each model.
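The accuracy used here can be sketched as the fraction of predictions falling within a threshold of the expected value (the method name and signature below are illustrative, not the book's API):

```scala
// Count a prediction as correct if it lies within `threshold` of the
// expected value, then return the hit ratio over the held-out sample.
def accuracy(predicted: Seq[Double], expected: Seq[Double], threshold: Double): Double = {
  val hits = predicted.zip(expected).count {
    case (p, e) => math.abs(p - e) < threshold
  }
  hits.toDouble / predicted.size
}
```

With the configuration's THRESHOLD = 0.12, an output of 0.9 against an expected label of 1.0 counts as a hit, while 0.4 does not.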
The four network configurations are as follows:
Let's take a look at the following graph:
The complex neural network architectures with two or more hidden layers generate weights with similar accuracy. The single hidden layer architecture with four nodes generates the highest accuracy. Computing the accuracy with a formal cross-validation technique would produce a lower accuracy number.
Finally, we take a look at the impact of the complexity of the network on the duration of the training, as shown in the following graph:
Not surprisingly, the time complexity increases significantly with the number of hidden layers and the number of nodes.
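The growth is easy to quantify: the number of synaptic weights updated per iteration grows with the product of consecutive layer sizes. A hypothetical count (the topologies below are illustrative; one bias weight per non-input node is assumed):

```scala
// Number of weights in a fully connected feed-forward network: for each
// pair of consecutive layers (in, out), there are (in + 1) * out weights
// (the +1 accounts for the bias of each output node).
def numWeights(layers: Seq[Int]): Int =
  layers.sliding(2).map { case Seq(in, out) => (in + 1) * out }.sum

val shallow = numWeights(Seq(8, 4, 1))      // one hidden layer of 4 nodes
val deep = numWeights(Seq(8, 8, 5, 6, 1))   // three hidden layers (8, 5, 6)
```

The deep topology carries roughly four times as many weights as the shallow one, which is consistent with the longer training durations observed in the graph.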