Chapter 12
Implementations
As we have proposed several online portfolio selection (OLPS) algorithms, we are
interested in whether they work in real markets. To examine their empirical efficacy,
we conducted an extensive set of empirical studies on a variety of real datasets. In our
evaluations, we adopted six real datasets, which were collected from several diverse
financial markets. The performance metrics include cumulative wealth (return) and
risk-adjusted returns (based on volatility risk and drawdown risk). We also compared
the proposed algorithms with various existing algorithms. The results clearly demonstrate that the proposed algorithms consistently surpass the state-of-the-art techniques in terms of both metrics.
This chapter is organized as follows. Section 12.1 describes the experimental platform, that is, the OLPS platform. Section 12.2 details the experimental testbed, including six real datasets. Section 12.3 sets up all the proposed algorithms and illustrates several compared approaches. Section 12.4 introduces the performance metrics used for the empirical studies. Finally, Section 12.5 summarizes this chapter.
12.1 The OLPS Platform
To evaluate the performance of a proposed algorithm, researchers and practitioners usually implement a back-test system that simulates the strategies using historical market data. We also designed a back-test system, named “OLPS,” as follows, and
Appendix A describes the details of the OLPS toolbox. It implements a frame-
work for back-testing and various algorithms for online portfolio selection. Based
on MATLAB, it is compatible with Windows, Linux, and Mac OS. Figure 12.1
illustrates the structure of the OLPS toolkit, which consists of three parts. The first
part on the upper left preprocesses data, that is, it loads a specified dataset and initializes the trading environments, such as log files and timing variables. The second part on
the lower level calls OLPS algorithms and simulates the trading process for strategies
based on the data prepared in the first part. The third part in the upper right postprocesses the outputs from the second part, that is, it statistically analyzes the returns and calculates some risk-adjusted returns.
More details on MATLAB are available at http://www.mathworks.com
T&F Cat #K23731 — K23731_C012 — page 95 — 9/30/2015 — 16:46
[Figure 12.1 depicts three modules. Preprocess: load data; initialize log files. Algorithmic trading: benchmarks (Market, Best stock, BCRP); follow-the-winner approaches (universal portfolios, exponential gradient, online Newton step, switching portfolios); follow-the-loser approaches (anticorrelation, passive–aggressive mean reversion, confidence-weighted mean reversion, online moving average reversion); pattern matching–based approaches (nonparametric kernel-based log-optimal, nonparametric nearest neighbor log-optimal, correlation-driven nonparametric learning). Postprocess: statistical t-test; volatility risk and Sharpe ratio; drawdown analysis and Calmar ratio.]
Figure 12.1 Structure of the OLPS toolbox.
12.1.1 Preprocess
This step aims to prepare trading environments. As existing datasets are often in MAT files, OLPS accepts datasets in MAT format. The dataset often contains an n × m
matrix, where n denotes the number of trading periods and m refers to the number of
assets. It is straightforward to incorporate market feeds
from real markets, such that
the toolkit can handle real-time data and conduct paper or even real trading.
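Concretely, the n × m price-relative matrix can be built from raw closing prices by elementwise division of consecutive rows. The following is a minimal sketch in Python rather than the toolbox's MATLAB, with hypothetical function names:

```python
import numpy as np

def to_price_relatives(prices):
    """Convert an (n+1) x m matrix of closing prices into an n x m matrix
    of price relatives x_t = p_t / p_{t-1} (hypothetical helper)."""
    prices = np.asarray(prices, dtype=float)
    return prices[1:] / prices[:-1]

# Two assets over four days: the first rises 10% daily, the second is flat.
prices = np.array([[10.0, 50.0],
                   [11.0, 50.0],
                   [12.1, 50.0],
                   [13.31, 50.0]])
x = to_price_relatives(prices)
print(x.shape)  # (3, 2)
```

Each row of the result is one trading period, matching the n × m layout the toolbox expects.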
12.1.2 Algorithmic Trading
This step conducts simulations based on historical real-market data. In our framework,
implementing a new strategy generally requires four files: a start file, a run file, a
kernel file, and an expert file. The start (entry) file extracts parameters and calls the
corresponding run file. The run file simulates a whole trading process and calls its
kernel file to construct a portfolio for each period, which is used for rebalancing.
The kernel file outputs a final portfolio, and it facilitates the development of meta-algorithms, which effectively combine multiple experts’ portfolios. The expert file outputs one portfolio depending on the input data and specific parameters. If there is only one expert, the kernel file is unnecessary, and the run file directly calls the expert file.
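The run-file loop described above can be illustrated with a minimal back-test sketch; the function names and interface below are illustrative, not the toolbox's actual API. The strategy sees only past price relatives, constructs a portfolio, and the period return is the inner product of the portfolio with the realized price relatives:

```python
import numpy as np

def backtest(price_relatives, strategy):
    """Minimal back-test loop (illustrative). `strategy(history)` returns
    the portfolio weights (summing to 1) used for the next period."""
    n, _ = price_relatives.shape
    wealth = 1.0
    for t in range(n):
        b = strategy(price_relatives[:t])               # decide before seeing x_t
        wealth *= float(np.dot(b, price_relatives[t]))  # multiply by period return b . x_t
    return wealth

def uniform_crp(history):
    # Uniform constant rebalanced portfolio: equal weight on both assets.
    return np.array([0.5, 0.5])

x = np.array([[1.1, 0.9], [0.9, 1.1], [1.1, 0.9]])
print(backtest(x, uniform_crp))
```

A real kernel file would additionally aggregate several experts' portfolios before rebalancing, but the wealth-update step is the same.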
OLPS implements the following OLPS algorithms:
Benchmarks (Market, Best stock, and BCRP).
Follow the winner approaches (UP, EG, and ONS): make portfolio decisions following the assumption that the next price relatives (or experts for UP) will follow the previous ones.
Follow the loser approaches (Anticor, PAMR, CWMR, and OLMAR): make portfolio decisions by assuming that the next price relatives will revert to previous trends.
A full description of MAT files can be found at http://www.mathworks.com/help/pdf_doc/matlab/matfile_format.pdf
For example, Interactive Brokers (http://www.interactivebrokers.com) provides free APIs.
Both paper and real trading require users to implement an order submission step, while back-testing does not.
Pattern matching–based approaches (B^K, B^NN, and CORN): locate a set containing similar price relatives and make optimal portfolios based on the set.
Others: some are ad hoc algorithms, such as M0/T0.
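As one concrete instance of the follow-the-winner family, the exponential gradient (EG) update of Helmbold et al. (1998) tilts the current portfolio toward assets that just performed well and then renormalizes. A sketch in Python (the learning rate value is illustrative):

```python
import numpy as np

def eg_update(b, x, eta=0.05):
    """Exponential gradient (EG) portfolio update: multiply each weight by
    exp(eta * x_i / (b . x)) and renormalize to the simplex."""
    b = np.asarray(b, dtype=float)
    x = np.asarray(x, dtype=float)
    w = b * np.exp(eta * x / np.dot(b, x))
    return w / w.sum()

b = np.array([0.5, 0.5])
x = np.array([1.2, 0.8])      # asset 0 outperformed last period
b_next = eg_update(b, x)
print(b_next[0] > b_next[1])  # True: weight shifts toward the winner
```

A follow-the-loser method such as PAMR would instead move weight away from the recent winner, betting on mean reversion.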
12.1.3 Postprocess
After the algorithmic trading simulation, this step processes the results by providing
the following performance metrics:
Cumulative return: The metric most widely used in related studies;
Volatility and Sharpe ratio: Typically used to measure risk-adjusted return in the
investment industry;
Drawdown and Calmar ratio: Used to measure downside risk and related risk-
adjusted return;
T-test statistics: Tests whether a strategy’s return is significantly different from that
of the market.
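Two of these risk measures can be sketched as follows (Python for illustration; the helper names are hypothetical, not the toolbox's API). The Sharpe ratio divides mean excess return by its standard deviation, and the maximum drawdown is the largest peak-to-trough decline of the wealth curve; the Calmar ratio then divides annualized return by this drawdown:

```python
import numpy as np

def sharpe_ratio(period_returns, periods_per_year=252, rf=0.0):
    """Annualized Sharpe ratio from simple per-period returns."""
    r = np.asarray(period_returns, dtype=float) - rf / periods_per_year
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)

def max_drawdown(wealth_curve):
    """Largest relative peak-to-trough decline of a cumulative wealth curve."""
    w = np.asarray(wealth_curve, dtype=float)
    peaks = np.maximum.accumulate(w)   # running maximum of wealth so far
    return float(np.max(1.0 - w / peaks))

wealth = np.array([1.0, 1.2, 0.9, 1.5, 1.3])
print(max_drawdown(wealth))  # the 1.2 -> 0.9 fall is the worst decline, 0.25
```

The t-test metric simply compares the strategy's per-period returns against the market's via a standard paired test.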
12.2 Data
In our study, we focus on historical daily closing prices in stock markets, which are easy to obtain from public domains (such as Yahoo Finance and Google Finance), and thus are publicly available to other researchers. Data from other types of markets, such as high-frequency intraday quotes and Forex markets, are either too expensive or hard to obtain and process, and thus may reduce the experimental reproducibility. As summarized in Table 12.1, six real and diverse datasets from several financial markets are employed.
The first dataset, “NYSE (O),” is one “standard” dataset pioneered by Cover
(1991) and followed by others (Helmbold et al. 1998; Borodin et al. 2004; Agarwal et al. 2006; Györfi et al. 2006, 2008). This dataset contains 5651 daily price relatives of 36 stocks in the New York Stock Exchange (NYSE) for a 22-year period from July 3, 1962, to December 31, 1984.
The second dataset is an extended version of the NYSE (O) dataset. For consistency, we collected the latest data in the NYSE from January 1, 1985, to June 30, 2010, a period that consists of 6431 trading days. We denote this new dataset as
“NYSE (N).” Note that the new dataset consists of 23 stocks rather than the previous 36 stocks owing to amalgamations and bankruptcies. All self-collected price relatives are adjusted for splits and dividends, which is consistent with the previous “NYSE (O)” dataset.

Yahoo Finance: http://finance.yahoo.com; and Google Finance: http://www.google.com/finance
We did evaluate certain algorithms using high-frequency data and weekly data, as in Li et al. (2013).
All related codes and datasets, including their compositions, are available at http://stevenhoi.org/olps
Borodin et al. (2004)’s datasets (NYSE (O), TSE, SP500, and DJIA) are also available at http://www.cs.technion.ac.il/rani/portfolios/
According to Helmbold et al. (1998), the dataset was originally collected by Hal Stern. The stocks are mainly large cap stocks in NYSE; however, we do not know the criteria for choosing these stocks.
The dataset before 2007 was collected by Gábor Gelencsér (http://www.cs.bme.hu/oti/portfolio); we collected the remaining data from 2007 to 2010 via Yahoo Finance.

Table 12.1 Summary of the six datasets from real markets

Dataset   Market  Region  Time Frame                          # Periods  # Assets
NYSE (O)  Stock   USA     July 3, 1962–December 31, 1984      5651       36
NYSE (N)  Stock   USA     January 1, 1985–June 30, 2010       6431       23
TSE       Stock   CA      January 4, 1994–December 31, 1998   1259       88
SP500     Stock   USA     January 2, 1998–January 31, 2003    1276       25
MSCI      Index   Global  April 1, 2006–March 31, 2010        1043       24
DJIA      Stock   USA     January 14, 2001–January 14, 2003   507        30
The third dataset, “TSE,” is collected by Borodin et al. (2004), and it consists
of 88 stocks from the Toronto Stock Exchange (TSE) containing price relatives of
1259 trading days, ranging from January 4, 1994, to December 31, 1998. The fourth dataset, “SP500,” is also collected by Borodin et al. (2004), and it consists of the 25 stocks with the largest market capitalizations among the SP500 components. It ranges from January 2, 1998, to January 31, 2003, containing 1276 trading days.
The fifth dataset is “MSCI,” which is a collection of global equity indices that constitute the MSCI World Index. It contains 24 indices that represent the equity markets of 24 countries around the world, and it consists of a total of 1043 trading days, ranging from April 1, 2006, to March 31, 2010. The final dataset is the DJIA dataset (Borodin et al. 2004), which consists of 30 Dow Jones composite stocks. DJIA contains 507 trading days, ranging from January 14, 2001, to January 14, 2003.
Besides the six real-market datasets, in the main experiments (i.e., Experiment 1 in Section 13.1), we also evaluate each dataset in its reversed form (Borodin et al. 2004). For each dataset, we create a reversed dataset, which reverses the original order and inverts the price relatives. We denote these reversed datasets using a −1 superscript on the original dataset names. By nature, these reversed datasets are quite different from the original datasets, and we are interested in the behaviors of the proposed algorithms on such artificial datasets.
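The reversal construction amounts to reversing the rows in time and taking the elementwise reciprocal of the price relatives. A two-line sketch in Python (names illustrative):

```python
import numpy as np

def reverse_dataset(price_relatives):
    """Reversed dataset of Borodin et al. (2004): reverse the time order
    of the rows and invert each price relative (x -> 1/x)."""
    x = np.asarray(price_relatives, dtype=float)
    return 1.0 / x[::-1]

x = np.array([[2.0, 0.5],
              [1.25, 0.8]])
print(reverse_dataset(x))  # rows reversed, each entry inverted
```

Intuitively, an asset that trended up in the original dataset trends down in the reversed one, so momentum and mean-reversion assumptions are stressed in opposite directions.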
Unlike previous studies, the above testbed covers much longer trading peri-
ods from 1962 to 2010 and much more diversified markets, which enables us to
examine the behaviors of the proposed strategies under different events and crises.
For example, it covers several well-known events in the stock markets, such as the dot-com bubble from 1995 to 2000 and the subprime mortgage crisis from 2007 to 2009. The five stock datasets are mainly chosen to test the capability of the proposed algorithms on regional stock markets, while the index dataset aims to test their capability on global indices, which may be potentially applicable to a fund of funds (FOF).

The constituents of the MSCI World Index are available on MSCI Barra (http://www.mscibarra.com), accessed on 28 May 2010.
As a remark, although we numerically test the proposed algorithms on stock
and exchange traded funds (ETF) markets, we note that the proposed strategies could
be generally applied to any type of financial market.
12.3 Setups
In our experiments, we implemented all the proposed approaches: CORN-U,
CORN-K, PAMR, PAMR-1, PAMR-2, CWMR-Var, CWMR-Stdev, OLMAR-1, and
OLMAR-2. For CWMR algorithms, we only present the results achieved by the
deterministic versions. The results of the stochastic versions are presented in Li et al.
(2013). Besides individual algorithms, we also designed their buy and hold (BAH) versions, whose results can be found in their respective studies (Li et al. 2011b, 2012, 2013; Li and Hoi 2012). Without ambiguity, when referring to CORN, PAMR,
CWMR, and OLMAR, we often focus on their representative versions, that is,
CORN-U, PAMR, CWMR-Stdev, and OLMAR-1, respectively.
As the proposed algorithms are all online, we follow the existing work and simply
set the parameters empirically without tuning for each dataset separately. Note that
the best values for these parameters are often dataset dependent, and our choices are
not always the best, as we will further evaluate in Section 13.3. Below, we introduce
the parameter settings of the proposed algorithms.
For the proposed CORN experts, two parameters can affect their performance, that is, the correlation coefficient threshold ρ and the window size w. In our evaluations, we simply fix ρ = 0.1 and w = 5 for the CORN-U algorithm, which is not always the best choice. For the CORN-K algorithm, we first fix W = 5, P = 10, and K = 50, which effectively chooses all experts in the experiments; we denote this setting as “CORN-K1.” We also provide “CORN-K2,” whose parameters are fixed as W = 5, P = 10, and K = 5.
There are two key parameters in the proposed PAMR algorithms. One is the sensitivity parameter ε, and the other is the aggressiveness parameter C. Specifically, for all datasets and experiments, we set the sensitivity parameter ε to 0.5 in the three algorithms and set the aggressiveness parameter C to 500 in both PAMR-1 and PAMR-2, with which the cumulative wealth achieved tends to be stable on most datasets. Our experiments on parameter sensitivity show that the proposed PAMR algorithms are quite robust with respect to different parameter settings.
CWMR has two key parameters, that is, the confidence parameter φ and the sensitivity parameter ε. We set the sensitivity parameter ε to 0.5 and the confidence parameter φ to 2.0, or equivalently a 95% confidence level, in both CWMR-Var and CWMR-Stdev. As the results show, the proposed CWMR algorithm is generally
Note that not every index is tradable through ETFs.
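For reference, the parameter settings stated above can be gathered into a single configuration sketch (a Python dict; the layout and key names are illustrative, while the values are those used in the text):

```python
# Parameter settings used in the experiments, as stated in the text.
PARAMS = {
    "CORN-U":     {"rho": 0.1, "w": 5},
    "CORN-K1":    {"W": 5, "P": 10, "K": 50},   # K = 50 chooses all experts
    "CORN-K2":    {"W": 5, "P": 10, "K": 5},
    "PAMR":       {"epsilon": 0.5},
    "PAMR-1":     {"epsilon": 0.5, "C": 500},
    "PAMR-2":     {"epsilon": 0.5, "C": 500},
    "CWMR-Var":   {"epsilon": 0.5, "phi": 2.0},  # phi = 2.0 ~ 95% confidence
    "CWMR-Stdev": {"epsilon": 0.5, "phi": 2.0},
}
print(PARAMS["PAMR-1"]["C"])  # 500
```

Keeping all settings in one place makes it easy to verify that no parameter was tuned per dataset.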