48 CORRELATION-DRIVEN NONPARAMETRIC LEARNING
8.1 Preliminaries
8.1.1 Motivation
One main idea of existing approaches is to optimize portfolios by mining similar
patterns and information from historical market sequences. Anticor (Borodin et al.
2004) attempts to find statistical relations between pairs of stocks, such as positive
autocovariance and negative cross-covariance, while pattern matching–based
strategies (Györfi et al. 2006, 2008) try to discover similar appearances among histor-
ical markets. Though successful in mining statistical relations among stocks, Anticor
ignores market movements, which are crucial for a portfolio selection task. Moreover,
Anticor is heuristic in nature, which could lead to suboptimal solutions. On the other
hand, existing pattern matching–based strategies (Györfi et al. 2006, 2008) rely on
Euclidean distance to measure the similarity between two market windows. Though
their empirical performance is excellent, the Euclidean distance cannot exploit the
directional information between the two market windows. Therefore, it may detect
some useful price relatives, but often includes some potentially useless or even harmful
price relatives and excludes many beneficial price relatives. Such a similarity set
will ultimately weaken the subsequent portfolio optimization step, resulting in less effective
portfolios.
To better understand the drawbacks of Euclidean distance in measuring the simi-
larity between two market windows, we give a motivating example in Figure 8.1. Let
us assume that all market windows consist of two price relatives, for example, a market of
one asset with a window size of two, or a market of two assets with a window size
of one. Let the latest market window for the t-th period be $\mathbf{x}_{t-2}^{t-1} = (1.10, 1.20)$.
Clearly, $\mathbf{x}_{t-2}^{t-1}$ shows an increasing trend, and we aim to locate similar market windows
that also show increasing trends. Suppose we have three possible pairs of market win-
dows: A1: (0.90, 0.80), A2: (0.80, 0.90); B1: (1.2, 1.1), B2: (1.1, 1.2); C1: (1.4, 1.3),
C2: (1.3, 1.4). Note that in a long-only portfolio, relative trends, rather than absolute
trends, determine the allocations of capital.∗ For example, although A2 contains two
decreasing price relatives (both 0.90 and 0.80 are less than 1), the market sequence is
relatively increasing (0.90 > 0.80). If the vectors instead describe two assets, then for the
latest market window $\mathbf{x}_{t-2}^{t-1}$ it is better to allocate more capital to the second asset
(1.20 > 1.10), which is also the case in A2. However, this is not the case in B1 or C1,
though their absolute price relatives are all increasing.† Among the three pairs, A2,
B2, and C2 show increasing trends, while A1, B1, and C1 show decreasing trends.
Thus, a good similarity measure should classify A2, B2, and C2 as similar appearances,
which will benefit the subsequent portfolio optimization step, and A1, B1, and C1 as
dissimilar appearances, which would harm it.
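The example can be checked numerically. The following is a minimal sketch, not the authors' implementation: the helper names `relative_trend` and `euclidean_distance` are hypothetical, and a window is labeled "increasing" only when its second price relative strictly exceeds its first. It lists the windows that share the relative trend of the latest window and, for comparison, each window's Euclidean distance to the latest window.

```python
import math

# Latest market window and candidate windows, taken from the example in the text.
latest = (1.10, 1.20)
candidates = {
    "A1": (0.90, 0.80), "A2": (0.80, 0.90),
    "B1": (1.20, 1.10), "B2": (1.10, 1.20),
    "C1": (1.40, 1.30), "C2": (1.30, 1.40),
}


def relative_trend(window):
    """Hypothetical helper: label a two-element window by its relative trend.

    Assumption: ties (equal price relatives) are treated as "decreasing".
    """
    first, second = window
    return "increasing" if second > first else "decreasing"


def euclidean_distance(u, v):
    """Plain Euclidean distance between two equal-length windows."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Windows whose relative trend matches that of the latest window.
same_trend = [
    name for name, window in candidates.items()
    if relative_trend(window) == relative_trend(latest)
]

# Euclidean distance of every candidate window to the latest window.
distances = {
    name: euclidean_distance(window, latest)
    for name, window in candidates.items()
}

print("same relative trend:", same_trend)
for name in sorted(distances, key=distances.get):
    print(f"{name}: distance {distances[name]:.3f}, {relative_trend(candidates[name])}")
```

Under these assumptions, B1 lies closer to the latest window than either A2 or C2, even though its relative trend is the opposite one.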
Now let us classify these market sequences via a Euclidean distance measure with
a radius of 0.2,‡ that is, $\|\mathbf{x}_{i-2}^{i-1} - \mathbf{x}_{t-2}^{t-1}\| \le 0.2$. According to Figure 8.1c, a Euclidean
∗ In our problem setting, there are no cash or risk-free assets. In reality, a weaker constraint (e.g., at
most 90% of capital can be put in assets) may appear in mutual funds.
† This is because their first asset is more favorable than the second one, unlike in the latest window $\mathbf{x}_{t-2}^{t-1}$.
‡ The radius is arbitrarily chosen to limit the number of selected price relatives.