Evaluation

Before applying our multilayer perceptron to the analysis of fluctuations in currency exchange rates, let's get acquainted with some of the key learning parameters introduced in the first section.

The execution profile

Let's take a look at the convergence of the training of the multilayer perceptron. The Monitor trait (refer to the Monitor section under Utility classes in Appendix A, Basic Concepts) collects and displays some execution parameters. We extract the convergence profile of the multilayer perceptron from the difference of the backpropagation errors between two consecutive epochs.
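For reference, this convergence criterion can be expressed as a simple predicate over the sequence of cumulative errors collected at each epoch; the following is a minimal sketch, not the Monitor trait's actual implementation:

// Convergence is declared once the difference between the cumulative errors
// of two consecutive epochs falls below a small threshold eps.
def hasConverged(errors: Seq[Double], eps: Double): Boolean =
  errors.size > 1 && math.abs(errors.last - errors(errors.size - 2)) < eps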

The test profiles the convergence of the MLP using a learning rate of η = 0.03 and a momentum factor of α = 0.3 for a multilayer perceptron with two input values, one hidden layer of three nodes, and one output value. The test relies on synthetically generated random values:

The execution profile of the cumulative error for the MLP

Impact of the learning rate

The purpose of the first exercise is to evaluate the impact of the learning rate η on the convergence of the training, as measured by the cumulative error of all output variables. The observations xt (and their labeled outputs yt) are synthetically generated using two noisy patterns, the functions f1 (line 57) and f2 (line 58), as follows:

def f1(x: Double): DblArray = Array[Double](  //57
  0.1 + 0.5*Random.nextDouble, 0.6*Random.nextDouble)
def f2(x: Double): DblArray = Array[Double](  //58
  0.6 + 0.4*Random.nextDouble, 1.0 - 0.5*Random.nextDouble)

val HALF_TEST_SIZE = TEST_SIZE >> 1
val xt = Vector.tabulate(TEST_SIZE)(n =>   //59
  if (n < HALF_TEST_SIZE) f1(n) else f2(n - HALF_TEST_SIZE))
val yt = Vector.tabulate(TEST_SIZE)(n =>
  if (n < HALF_TEST_SIZE) Array[Double](0.0)
  else Array[Double](1.0))  //60

The input values, xt, are synthetically generated by the f1 function for half of the dataset and by the f2 function for the other half (line 59). The data generator for the expected values yt assigns the label 0.0 for the input values generated with the f1 function and 1.0 for the input values created with f2 (line 60).

The test is run with a sample of TEST_SIZE data points over a maximum of NUM_EPOCHS epochs, a single hidden layer of HIDDEN.head neurons with no softmax transformation, and the following MLP parameters:

val ALPHA = 0.3
val ETA = 0.03
val HIDDEN = Array[Int](3)
val NUM_EPOCHS = 200
val TEST_SIZE = 12000
val EPS = 1e-7

def testEta(eta: Double, 
    xt: XVSeries[Double], 
    yt: XVSeries[Double]): 
    Option[(ArrayBuffer[Double], String)] = {
  
  implicit val mode = new MLPBinClassifier //61
  val config = MLPConfig(ALPHA, eta, NUM_EPOCHS, EPS)
  MLP[Double](config, HIDDEN, xt, yt)  
    .counters("err").map( (_, s"eta=$eta")) //62
}

The testEta method generates the profile of errors for a given value of eta.

The operating mode has to be implicitly defined prior to the instantiation of the MLP class (line 61). It is set as a binomial classifier of the MLPBinClassifier type. The execution profile data is collected by the counters method of the Monitor trait (line 62) (refer to the Monitor section under Utility classes in the Appendix A, Basic Concepts).
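For readers who do not have the Appendix at hand, the bookkeeping performed by the Monitor trait can be pictured roughly as follows; this is an illustrative sketch only, not the library's actual code:

import scala.collection.mutable.{ArrayBuffer, HashMap}

trait SimpleMonitor {
  private[this] val counts = HashMap[String, ArrayBuffer[Double]]()

  // Record one value (for instance, the cumulative error of an epoch) under a key
  protected def count(key: String, value: Double): Unit =
    counts.getOrElseUpdate(key, ArrayBuffer[Double]()) += value

  // Retrieve the profile collected for a given key, if any
  def counters(key: String): Option[ArrayBuffer[Double]] = counts.get(key)
}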

The driver code for evaluating the impact of the learning rate on the convergence of the multilayer perceptron is quite simple:

val etaValues = List[Double](0.01, 0.02, 0.03, 0.1)
val data = etaValues.flatMap( testEta(_, xt, yt))
    .map{ case(x, s) => (x.toVector, s) }

val legend = new Legend("Err", 
   "MLP [2-3-1] training - learning rate", "Epochs", "Error")
LinePlot.display(data, legend, new LightPlotTheme)

The profile is created with the JFreeChart library and displayed in the following chart:

The impact of the learning rate on the MLP training

The chart illustrates that the MLP model training converges much faster with a larger learning rate. Keep in mind, however, that a very large learning rate may lock the training process into a local minimum of the cumulative error, producing less accurate weights. The same configuration parameters are used to evaluate the impact of the momentum factor on the convergence of the gradient descent algorithm.

The impact of the momentum factor

Let's quantify the impact of the momentum factor α on the convergence of the training process toward an optimal model (synapse weights). The testing code is very similar to the evaluation of the impact of the learning rate.
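As a reminder, the momentum term adds a fraction α of the previous weight update to the current gradient step, which smooths the descent. A generic sketch of such an update (not the library's implementation):

// Returns the updated weight and the delta to be reused at the next iteration.
// eta is the learning rate, alpha the momentum factor.
def updateWeight(w: Double, gradient: Double, prevDelta: Double,
    eta: Double, alpha: Double): (Double, Double) = {
  val delta = -eta * gradient + alpha * prevDelta
  (w + delta, delta)
}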

The cumulative error for the entire time series is plotted in the following graph:

The impact of the momentum factor on the MLP training

The preceding graph shows that the mean square error decreases faster as the momentum factor increases. In other words, the momentum factor has a positive, although limited, impact on the convergence of the gradient descent.

The impact of the number of hidden layers

Let's consider a multilayer perceptron with two hidden layers (7 and 3 neurons). The execution profile for the training shows that the cumulative error of the output converges abruptly after several epochs during which the gradient descent failed to find a direction:

The execution profile of the training of an MLP with two hidden layers
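This profile could be reproduced by reusing the testEta harness on the same synthetic data with a two-hidden-layer topology; a minimal sketch, assuming the configuration parameters defined earlier in this section:

val HIDDEN_2 = Array[Int](7, 3)   // two hidden layers of 7 and 3 neurons

implicit val mode = new MLPBinClassifier
val config = MLPConfig(ALPHA, ETA, NUM_EPOCHS, EPS)
val profile = MLP[Double](config, HIDDEN_2, xt, yt).counters("err")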

Let's apply our newfound knowledge of neural networks to the classification of variables that impact the exchange rate of a given currency.

Test case

Neural networks have been used in financial applications from risk management in mortgage applications and hedging strategies for commodities pricing, to predictive modeling of the financial markets [9:14].

The objective of the test case is to understand the correlation factors between the exchange rate of some currencies, the spot price of gold, and the S&P 500 index. For this exercise, we will use the following exchange-traded funds (ETFs) as proxies for the exchange rate of currencies:

  • FXA: This is the rate of the Australian dollar in US dollars
  • FXB: This is the rate of the British pound in US dollars
  • FXE: This is the rate of the euro in US dollars
  • FXC: This is the rate of the Canadian dollar in US dollars
  • FXF: This is the rate of the Swiss franc in US dollars
  • FXY: This is the rate of the Japanese yen in US dollars
  • CYB: This is the rate of the Chinese yuan in US dollars
  • SPY: This is the S&P 500 index
  • GLD: This is the price of gold in US dollars

Practically, the problem to solve is to extract one or more regressive models that link one ETF, y, with a basket of other ETFs, {xi}: y = f(xi). For example, is there a relation between the exchange rate of the Japanese yen (FXY) and a combination of the spot price of gold (GLD), the exchange rate of the euro in US dollars (FXE), the exchange rate of the Australian dollar in US dollars (FXA), and so on? If so, the regression f will be defined as FXY = f(GLD, FXE, FXA).

The following two charts visualize the fluctuation between currencies over a period of two and a half years. The first chart displays an initial group of potentially correlated ETFs:

An example of correlated currency-based ETFs

The second chart displays another group of currency-related ETFs that shares a similar price action behavior. Neural networks do not provide any analytical representation of their internal reasoning; therefore, a visual correlation can be extremely useful to novice engineers to validate their models:

An example of correlated currency-based ETFs

A very simple approach for finding any correlation between the movement of the currency exchange rates and the gold spot price is to select one ticker symbol as the target and a subset of other currency-based ETFs as features.

Let's consider the following problem: finding the correlation between the price of FXE and a range of currencies FXB, CYB, FXA, and FXC, as illustrated in the following diagram:

The mechanism to generate features from ticker symbols

Implementation

The first step is to define the configuration parameters for the MLP classifier, as follows:

val path = "resources/data/chap9/"
val ALPHA = 0.8
val ETA = 0.01
val NUM_EPOCHS = 250
val EPS = 1e-3
val THRESHOLD = 0.12
val hiddenLayers = Array[Int](7, 7) //59

Besides the learning parameters, the topology of the hidden layers of the network is defined as an array of layer sizes (line 59).
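The configuration instance referenced later during the instantiation of the MLP class is built from these parameters, using the same MLPConfig signature as in the testEta method:

val config = MLPConfig(ALPHA, ETA, NUM_EPOCHS, EPS)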

Next, let's create the search space of the prices of all the ETFs used in the analysis:

val symbols = Array[String](
 "FXE","FXA","SPY","GLD","FXB","FXF","FXC","FXY","CYB"
)
val STUDIES = List[Array[String]](   //60
  Array[String]("FXY","FXC","GLD","FXA"),
  Array[String]("FXE","FXF","FXB","CYB"),
  Array[String]("FXE","FXC","GLD","FXA","FXY","FXB"),
  Array[String]("FXC","FXY","FXA"),
  Array[String]("CYB","GLD","FXY"),
  symbols
)

The purpose of the test is to evaluate and compare six different portfolios or studies (line 60). The closing prices of all the ETFs over a period of three years are extracted from the Google Financials tables, using the GoogleFinancials close price extractor for the basket of ETFs (line 61):

val prices = symbols.map(s => DataSource(s"$path$s.csv"))
      .flatMap(_.get(close).toOption) //61

The next step consists of implementing the mechanism to extract the target and the features from a basket of ETFs or studies introduced in the previous paragraph. Let's consider the following study as the list of ETF ticker symbols:

val study = Array[String]("FXE","FXF","FXB","CYB")

The first element of the study, FXE, is the labeled output; the remaining three elements are observed features. For this study, the network architecture has three input variables (FXF, FXB, and CYB) and one output variable, FXE:

val obs = study.flatMap(index.get(_))
               .map(prices(_).toArray)  //62
val xv = obs.drop(1).transpose  //63
val expected = Array[DblArray](obs.head).transpose //64

The set of observations, obs, is built using an index (line 62). By convention, the first observation is selected as the label data and the remaining observations as the features for training. As the observations are loaded as an array of time series (one series per ETF), the sequence of feature vectors, one per trading day, is computed using transpose (line 63). The single target output variable has to be converted into a matrix before transposition (line 64).
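The index used on line 62 is not shown in the listing; it simply maps each ticker symbol to the position of its price series in prices. A minimal sketch of such a map (an assumption about the original implementation):

// Assumption: maps a ticker symbol to the position of its time series in prices
val index: Map[String, Int] = symbols.zipWithIndex.toMap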

Ultimately, the model is built through instantiation of the MLP class:

implicit val mode = new MLPBinClassifier  //65
val classifier = MLP[Double](config, hiddenLayers, xv, expected)
classifier.fit(THRESHOLD)

The objective or operating mode is implicitly defined as an MLP binary classifier, MLPBinClassifier (line 65). The MLP.fit method is defined in the Training and classification section.

Evaluation of models

The test consists of evaluating six different models to determine which ones provide the most reliable correlation. It is critical to ensure that the result is somewhat independent of the architecture of the neural network. Different architectures are evaluated as part of the test.

The following charts compare the models for two architectures:

  • Two hidden layers with four nodes each
  • Three hidden layers with eight, five, and six nodes, respectively

The first chart visualizes the fitness of the six regression models for an architecture consisting of a variable number of input variables (two to seven), one output variable, and two hidden layers of four nodes each. The features (ETF symbols) are listed on the left-hand side of the arrow, =>, along the y axis. The symbol on the right-hand side of the arrow is the expected output value:

The accuracy of the MLP with two hidden layers of four nodes each

The following chart displays the fitness of the six regression models for an architecture with three hidden layers of eight, five, and six nodes, respectively:

The accuracy of the MLP with three hidden layers of 8, 5, and 6 nodes, respectively

The two network architectures share a lot of similarity: in both cases, the fittest regression models are as follows:

  • FXE = f (FXA, SPY, GLD, FXB, FXF, FXC, FXY, CYB)
  • FXE = g (FXC, GLD, FXA, FXY, FXB)
  • FXE = h (FXF, FXB, CYB)

On the other hand, the prediction of the Canadian dollar to US dollar exchange rate (FXC) using the exchange rate of the Japanese yen (FXY) and the Australian dollar (FXA) is poor with both configurations.

Note

The empirical evaluation

These empirical tests use a simple accuracy metric. A formal comparison of the regression models would systematically analyze every combination of input and output variables. The evaluation would also compute the precision, the recall, and the F1 score for each of those models (refer to the Key quality metrics section under Validation in the Assessing a model section in Chapter 2, Hello World!).
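As a reminder, those quality metrics derive from the counts of true positives, false positives, and false negatives; the following sketch uses the standard definitions and is independent of the library:

// Standard binary classification quality metrics computed from raw counts
case class QualityMetrics(tp: Int, fp: Int, fn: Int) {
  def precision: Double = tp.toDouble / (tp + fp)
  def recall: Double = tp.toDouble / (tp + fn)
  def f1: Double = 2.0 * precision * recall / (precision + recall)
}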

Impact of the hidden layers' architecture

The next test consists of evaluating the impact of the configuration of the hidden layer(s) on the accuracy of three models: {FXF, FXB, CYB => FXE}, {FXC, GLD, FXA => FXY}, and {FXC, GLD, FXA, FXY, FXB => FXE}. For this test, the accuracy is computed by selecting a subset of the training data as a test sample, for the sake of convenience. The objective of the test is to compare different network architectures using the same metric, not to estimate the absolute accuracy of each model.

The four network configurations, sketched in code after the following list, are as follows:

  • A single hidden layer with four nodes
  • Two hidden layers with four nodes each
  • Two hidden layers with seven nodes each
  • Three hidden layers with eight, five, and six nodes
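A hedged sketch of how these four topologies might be compared, reusing the MLP instantiation pattern from the Implementation section; the interpretation of the value returned by fit as an accuracy estimate is an assumption here:

// Hypothetical comparison loop over the four hidden-layer topologies
val topologies = List(
  Array[Int](4),        // one hidden layer of 4 nodes
  Array[Int](4, 4),     // two hidden layers of 4 nodes each
  Array[Int](7, 7),     // two hidden layers of 7 nodes each
  Array[Int](8, 5, 6)   // three hidden layers of 8, 5, and 6 nodes
)
val accuracyByTopology = topologies.map(topology =>
  MLP[Double](config, topology, xv, expected).fit(THRESHOLD)
)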

Let's take a look at the following graph:

The impact of the hidden layers' architecture on the MLP accuracy

The complex neural network architectures with two or more hidden layers generate weights with similar accuracy. The single hidden layer of four nodes generates the highest accuracy. The computation of the accuracy using a formal cross-validation technique would produce a lower accuracy number.

Finally, we take a look at the impact of the complexity of the network on the duration of the training, as shown in the following graph:

The impact of the hidden layers' architecture on the duration of training

Not surprisingly, the time complexity increases significantly with the number of hidden layers and number of nodes.
