307 Spectral Analysis of Qualitative Time Series
where $\{b_j\}$ is an absolutely summable $p \times 1$ filter, is such that the mean square approximation error

$$E\left[(X_t - \hat{X}_t)^{*}(X_t - \hat{X}_t)\right] \tag{13.19}$$

is minimized.
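Let $b(\omega)$ and $c(\omega)$ be the transforms of $\{b_j\}$ and $\{c_j\}$, respectively. For example,

$$c(\omega) = \sum_{j=-\infty}^{\infty} c_j \exp(-2\pi i j\omega), \tag{13.20}$$

and, consequently,

$$c_j = \int_{-1/2}^{1/2} c(\omega) \exp(2\pi i j\omega)\, d\omega. \tag{13.21}$$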
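The transform pair can be checked numerically. The R sketch below assumes the usual convention of a negative exponent in (13.20) and a positive exponent in (13.21); the three filter coefficients, the grid size, and all variable names are hypothetical choices made for the example.

```r
# Hypothetical finite filter: coefficients c_{-1}, c_0, c_1
cj   <- c(0.25, 0.5, 0.25)
lags <- -1:1

# Equally spaced grid on [-1/2, 1/2)
N     <- 64
omega <- -0.5 + (0:(N - 1)) / N

# Forward transform, as in (13.20): c(omega) = sum_j c_j exp(-2 pi i j omega)
c_omega <- sapply(omega, function(w) sum(cj * exp(-2i * pi * lags * w)))

# Inversion, as in (13.21): the integral over [-1/2, 1/2] is approximated
# by the average of the integrand over the grid
c_back <- sapply(lags, function(j) mean(c_omega * exp(2i * pi * j * omega)))

# Largest discrepancy between recovered and original coefficients
max(Mod(c_back - cj))
```

For a finite filter whose lags are smaller than the grid size, the grid average reproduces the integral up to rounding error, so the recovered coefficients should agree with the originals.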
Brillinger (1981, Theorem 9.3.1) shows the solution to the problem is to choose $c(\omega)$ to satisfy (13.15) and to set $b(\omega) = \overline{c}(\omega)$. This is precisely the previous problem, with the solution given by (13.16). That is, we choose $c(\omega) = e_1(\omega)$ and $b(\omega) = \overline{e}_1(\omega)$; the filter values can be obtained via the inversion formula given by (13.21). Using these results, in view of (13.17), we may form the first principal component series, say $Y_{t1}$.
This technique may be extended by requesting another series, say, $Y_{t2}$, for approximating $X_t$ with respect to minimum mean square error, but where the coherency between $Y_{t2}$ and $Y_{t1}$ is zero. In this case, we choose $c(\omega) = e_2(\omega)$. Continuing this way, we can obtain the first $q \le p$ principal component series, say, $Y_t = (Y_{t1}, \ldots, Y_{tq})'$, having spectral density $f_{yy}(\omega) = \mathrm{diag}\{\lambda_1(\omega), \ldots, \lambda_q(\omega)\}$. The series $Y_{tk}$ is the $k$th principal component series.
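As a rough illustration of the eigen-decomposition behind the principal component series, the following R sketch estimates a 3 × 3 spectral matrix at each Fourier frequency and extracts its eigenvalues and leading eigenvector. The simulated data, the plain moving-average smoother of the periodogram, and all variable names are assumptions made for the example; they are not part of the chapter's script, which uses mvspec from astsa instead.

```r
set.seed(1)
n <- 256; p <- 3

# Simulated trivariate series (illustrative only): two components share a
# periodic signal at frequency 0.125, the third is pure noise
s <- 2 * cos(2 * pi * (1:n) * 0.125)
X <- cbind(s + rnorm(n), 0.5 * s + rnorm(n), rnorm(n))
X <- scale(X, center = TRUE, scale = FALSE)

d     <- mvfft(X) / sqrt(n)   # DFT of each column; row j+1 holds frequency j/n
nfreq <- n / 2                # Fourier frequencies 1/n, ..., 1/2
m     <- 3                    # half-width of the moving-average smoother

lambda <- matrix(0, nfreq, p)        # eigenvalues at each frequency
e1     <- matrix(0 + 0i, nfreq, p)   # leading eigenvector at each frequency

for (k in 1:nfreq) {
  rows <- ((k - m):(k + m)) %% n + 1          # neighbouring frequencies, wrapped
  f <- matrix(0 + 0i, p, p)
  for (j in rows) f <- f + outer(d[j, ], Conj(d[j, ]))
  f <- f / length(rows)                        # smoothed periodogram matrix
  ev <- eigen(f, symmetric = TRUE)             # Hermitian eigen-decomposition
  lambda[k, ] <- ev$values                     # real, in decreasing order
  e1[k, ]     <- ev$vectors[, 1]
}

peak <- which.max(lambda[, 1]) / n   # frequency where the first eigenvalue peaks
```

Under these assumptions the first eigenvalue should peak within the smoothing window around frequency 0.125, where the shared periodic component sits, and the corresponding row of e1 weights the first two components most heavily.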
13.8 R Code
The following R script calculates the spectral envelope and optimal scalings, and can also be
used to perform dynamic analysis in a piecewise fashion. It was used to produce Figures 13.4
and 13.5 and Table 13.3. The script requires the R package astsa, which must be downloaded
and installed before running it. The data are included in the package as bnrf1ebv, which
lists the base pairs (bp) using the code A = 1, C = 2, G = 3, T = 4.
require(astsa)
u = factor(bnrf1ebv)                         # first, input the data as factors and then
x = model.matrix(~u-1)[,1:3]                 # make an indicator matrix
Var = var(x)                                 # var-cov matrix
xspec = mvspec(x, spans=c(7,7), plot=FALSE)  # spectral matrices are an array called fxx
fxxr = Re(xspec$fxx)                         # fxxr is real(fxx)
308 Handbook of Discrete-Valued Time Series
#--- compute Q = Var^(-1/2) ---#
ev = eigen(Var)
Q = ev$vectors%*%diag(1/sqrt(ev$values))%*%t(ev$vectors)
#--- compute spectral envelope and scale vectors ---#
num = xspec$n.used                      # effective sample size
nfreq = length(xspec$freq)              # number of frequencies
specenv = matrix(0,nfreq,1)             # initialize the spectral envelope
beta = matrix(0,nfreq,3)                # initialize the scale vectors
for (k in 1:nfreq){
  ev = eigen(2*Q%*%fxxr[,,k]%*%Q/num)   # get evalues of normalized spectral matrix at freq k/n
  specenv[k] = ev$values[1]             # spec env at freq k/n is max evalue
  b = Q%*%ev$vectors[,1]                # beta at freq k/n
  beta[k,] = b/sqrt(sum(b^2))           # helps to normalize beta
}
#--- output and graphics ---#
dev.new(height=3)
par(mar=c(3,3,2,1), mgp=c(1.6,.6,0))
frequency = (0:(nfreq-1))/num
plot(frequency, 100*specenv, type="l", ylab="Spectral Envelope (%)")
title("Epstein-Barr BNRF1")
## add significance threshold to plot ##
m = xspec$kernel$m
nuinv=sqrt(sum(xspec$kernel[-m:m]^2))
thresh=100*(2/num)*exp(qnorm(.9999)*nuinv)*matrix(1,nfreq,1)
lines(frequency, thresh, lty="dashed", col="blue")
#-- details --#
output = cbind(frequency, specenv, beta)   # results
colnames(output) = c("freq","specenv","A", "C", "G")
##--- dynamic part ---##
z= matrix(0,250,8)
output2 = array(0, dim=c(250,5,8))
colnames(output2) = c("freq","specenv","A", "C", "G")
for (j in 1:8){
  ind = (500*(j-1)+1):(500*j)
  if (j==8) ind = 3501:length(bnrf1ebv)
  xx = x[ind,]   # select subsequence -- the rest of this part is the same as above
  Var = var(xx)
  xspec = mvspec(xx, spans=c(3,3), plot=FALSE)
  fxxr = Re(xspec$fxx)
  ev = eigen(Var)
  Q = ev$vectors%*%diag(1/sqrt(ev$values))%*%t(ev$vectors)
  num = xspec$n.used
  nfreq = length(xspec$freq)
  frequency = (0:(nfreq-1))/num
  specenv = matrix(0, nfreq, 1)
  beta = matrix(0, nfreq, 3)
  for (k in 1:nfreq){
    ev = eigen(2*Q%*%fxxr[,,k]%*%Q/num)
    specenv[k] = ev$values[1]
    b = Q%*%ev$vectors[,1]
    beta[k,] = b/sqrt(sum(b^2))
  }
  # the last segment is shorter, so it fills only the first 240 rows
  if (j<8)  { z[,j] = specenv; output2[,,j] = cbind(frequency, specenv, beta) }
  if (j==8) { z[1:240,8] = specenv; output2[1:240,,j] = cbind(frequency, specenv, beta) }
}
#--- output and graphics (results in output2) ---#
zz = 100*t(z)               # zz is 8x250
rowss = rep(1:8, each=2)
zz = zz[rowss,]             # now it's 16x250
# threshold
m = xspec$kernel$m
nuinv = sqrt(sum(xspec$kernel[-m:m]^2))
thresh=100*(2/num)*exp(qnorm(.995)*nuinv)
dev.new(height = 5)
par(mar=c(3,3,3,1), mgp=c(1.6,.6,0))
xa = 0:249/500
ya1 = 1736+500*0:8
rowss = c(1, rep(2:8, each=2), 9)
ya = ya1[rowss]
ya[seq(2,16,by=2)] = ya[seq(2,16,by=2)]-.5
levs = thresh*seq(0, 4.5, by=.5)
colr = gray(c(10,9,5,4.5,4,3,2,1,0)/10)
contour(ya, xa, zz, xlab="base pair", ylab="frequency", levels=levs, col=colr,
main="Epstein-Barr BNRF1", lwd=2, drawlabels=FALSE)
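Two steps of the script, the indicator coding and the computation of Q, can be tried on a small example without astsa. The sketch below is a minimal illustration that assumes a simulated sequence in place of bnrf1ebv; the sequence and its length are made up for the example.

```r
set.seed(42)

# Simulated sequence using the same coding as the script: A = 1, C = 2, G = 3, T = 4
bp <- sample(1:4, 200, replace = TRUE)

u <- factor(bp)                       # treat the codes as categories
x <- model.matrix(~ u - 1)[, 1:3]     # indicators for A, C, G; T is the reference

# Each row has a single 1 in the column of its category, or all zeros for T
head(cbind(bp, x))

# Symmetric inverse square root of the variance-covariance matrix
Var <- var(x)
ev  <- eigen(Var)
Q   <- ev$vectors %*% diag(1 / sqrt(ev$values)) %*% t(ev$vectors)

# Q standardizes the indicators: Q Var Q should be the 3 x 3 identity
check <- Q %*% Var %*% Q
round(check, 10)
```

Dropping the fourth indicator column is what keeps Var nonsingular, since the four indicators always sum to one; this requires every category to appear in the sequence.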
Acknowledgment
This work was supported, in part, by a grant from the U.S. National Science Foundation.
References
Brillinger, D. R. (1981). Time Series: Data Analysis and Theory, vol. 36. Society for Industrial and Applied
Mathematics, Philadelphia.
Buysse, D. J., Reynolds, C. F., Monk, T. H., Berman, S. R., and Kupfer, D. J. (1989). The Pittsburgh
sleep quality index: A new instrument for psychiatric practice and research. Psychiatry Research,
28(2):193–213.
Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications, vol. 66. Chapman & Hall,
London.
Ioshikhes, I., Bolshoy, A., Derenshteyn, K., Borodovsky, M., and Trifonov, E. N. (1996). Nucleosome
DNA sequence pattern revealed by multiple alignment of experimentally mapped sequences.
Journal of Molecular Biology, 262(2):129–139.
Krafty, R. T., Xiong, S., Stoffer, D. S., Buysse, D. J., and Hall, M. (2012). Enveloping spectral surfaces:
Covariate dependent spectral analysis of categorical time series. Journal of Time Series Analysis,
33(5):797–806.
Magnus, J. and Neudecker, H. (1988). Matrix Differential Calculus with Applications in Statistics and
Econometrics. Wiley & Sons, New York.
R Core Team (2013). R: A Language and Environment for Statistical Computing. R Foundation for
Statistical Computing, Vienna, Austria.
Satchwell, S. C., Drew, H. R., and Travers, A. A. (1986). Sequence periodicities in chicken nucleosome
core DNA. Journal of Molecular Biology, 191(4):659–675.
Shumway, R. and Stoffer, D. (2011). Time Series Analysis and Its Applications, 3rd edn. Springer,
New York.
Stoffer, D. S. (1991). Walsh-Fourier analysis and its statistical applications. Journal of the American
Statistical Association, 86(414):461–479.
Stoffer, D. S., Scher, M. S., Richardson, G. A., Day, N. L., and Coble, P. A. (1988). A Walsh-Fourier
analysis of the effects of moderate maternal alcohol consumption on neonatal sleep-state cycling.
Journal of the American Statistical Association, 83(404):954–963.
Stoffer, D. S., Tyler, D. E., and McDougall, A. J. (1993a). Spectral analysis for categorical time series:
Scaling and the spectral envelope. Biometrika, 80(3):611–622.
Stoffer, D. S., Tyler, D. E., McDougall, A. J., and Schachtel, G. (1993b). Spectral analysis of DNA
sequences (with discussion). Bulletin of the International Statistical Institute, 1:345–361.