132 THREATS TO VALIDITY
prices (Gosnell et al. 1996). However, calculating the midpoint of the best bid and
ask prices requires access to the order book, which is usually private and not freely
available, rather than merely the log of transactions. Another possibility would be to
take into account only "sell-type" (or only "buy-type") transactions, that is, transactions
executed in response to market orders to sell, in which case the buying counterparty is
the one issuing a limit order. However, addressing this possibility also requires knowing
the order type (Keim and Madhavan 1995; Foucault et al. 2005) of each
trade, which is usually not available to the public.
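The quote-midpoint estimate mentioned above can be sketched in a few lines. This is a minimal illustration, assuming (hypothetically) that order-book snapshots are available as best-bid/best-ask pairs; as noted in the text, a transaction log alone would not support this computation.

```python
# Sketch: estimating a "true" price as the bid-ask midpoint.
# Assumes hypothetical access to order-book quotes, which is
# usually private data, as discussed above.

def midpoint(best_bid: float, best_ask: float) -> float:
    """Midpoint of the best bid and best ask quotes."""
    if best_bid <= 0 or best_ask < best_bid:
        raise ValueError("invalid quotes")
    return (best_bid + best_ask) / 2.0

print(midpoint(99.5, 100.5))  # -> 100.0
```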
Back-tests in historical markets may suffer from “data-snooping bias” issues, one
of which is the dataset selection issue. On the one hand, we selected four datasets,
the NYSE (O), TSE, SP500, and DJIA datasets, based on previous studies without
regard to the proposed approaches. On the other hand, we developed the
proposed algorithms solely based on the NYSE (O) dataset, while the other five
datasets (NYSE (N), TSE, SP500, MSCI, and DJIA) were obtained after the algo-
rithms were fully developed. However, even though we are cautious about the dataset
selection issue, it may still appear in the experiments, especially for the datasets with
a relatively long history, that is, NYSE (O) and NYSE (N). The NYSE (O) dataset,
pioneered by Cover (1991) and followed by other researchers, is a “standard” dataset
in the online portfolio selection community. Since it contains 36 large-cap NYSE
stocks that survived for 22 years, it suffers from extreme survival bias. Nevertheless,
it still has merit for comparing different algorithms, as was done in all previous studies. The
NYSE (N) dataset, as a continuation of NYSE (O), contains 23 assets that survived
from the previous 36 stocks for another 25 years. It is therefore even worse
than its predecessor in terms of survival bias. In summary, even though the empirical
results on these datasets clearly show the effectiveness of the proposed algorithms,
one cannot make claims without noticing the deficiencies of these datasets.
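The mechanism of survival bias described above can be illustrated with a toy example (the numbers below are entirely synthetic, not drawn from the NYSE data): averaging returns only over assets that survive inflates the apparent performance of any strategy back-tested on them.

```python
# Toy illustration of survival bias (synthetic numbers only).
# A universe of assets has mixed outcomes; a dataset built from
# survivors alone overstates the average gross return.

full_universe = [1.8, 1.5, 1.2, 0.6, 0.0]  # gross returns; 0.0 = delisted
survivors = [r for r in full_universe if r > 0]  # delisted assets dropped

avg_full = sum(full_universe) / len(full_universe)
avg_surv = sum(survivors) / len(survivors)

# The survivor-only average always looks at least as good.
assert avg_surv > avg_full
print(avg_full, avg_surv)
```

A dataset such as NYSE (O), built only from stocks that lasted the full 22 years, behaves like the `survivors` list here: the delisted outcomes that would drag performance down are simply absent.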
Another common bias is the asset selection issue. Four of the six datasets (the
NYSE (O), TSE, SP500, and DJIA) were collected by others, and to the best of our
knowledge, their assets are mainly the largest blue-chip stocks in their respective
markets. As a continuation of NYSE (O), we collected NYSE (N) ourselves, which again
contains several of the largest surviving stocks from NYSE (O). The remaining dataset
(MSCI)∗ was chosen according to the world indices. In summary, we tried to mitigate
asset selection bias by choosing representative stocks in their respective markets,
which usually have large capitalization and high liquidity and thus reduce the market
impact caused by any proposed portfolio strategy.
Moreover, there are some criticisms regarding the datasets' liquidity: the back-tests
assume that the assets are available in unbounded quantities for buying or selling
at any given trading period. In Table 13.1, we observe cumulative wealth of 10^13
or more, while there are assets with capitalization of less than 10^10; obviously, then,
the liquidity assumption is not fulfilled. In NYSE (O), there are many such assets, and even in
NYSE (N) there are four such assets: SHERW, KODAK, COMME, and KINAR. The
most “dangerous” asset is KINAR, identified as asset #23 in Table 13.5, where there
∗
In fact, we collected this dataset following the review comments of Li et al. (2012), which means the
dataset did not exist before the third-round submission.