13: Spectral Analysis of Qualitative Time Series (1/5)

Spectral Analysis of Qualitative Time Series

David Stoffer

CONTENTS

13.1 Introduction
13.2 Scaling Categorical Time Series
13.3 Definition of Spectral Envelope
13.4 Estimation
13.5 Numerical Examples
13.6 Enveloping Spectral Surfaces
13.7 Principal Components
13.8 R Code
References

13.1 Introduction

Qualitative-valued time series are frequently encountered in diverse applications such as

economics, medicine, psychology, geophysics, and genomics, to mention a few. The fact

that the data are categorical does not preclude the need to extract pertinent information in

the same way that is done with quantitative-valued time series. One particular area that

was neglected was the frequency domain, or spectral analysis, of categorical time series.

In this chapter, we explore an approach based on scaling and the spectral envelope, which

was introduced by Stoffer et al. (1993a).

First, we discuss the concept of scaling categorical variables, and then we use the idea

to develop spectral analysis of qualitative time series. In doing so, the spectral envelope

and optimal scaling are introduced, and their properties are discussed. The spectral envelope

and the corresponding optimal scaling are population concepts; consequently, efficient

estimation is presented. Pertinent theoretical results are also summarized. Examples of using

the methodology on a DNA sequence are given. The examples include a piecewise analysis

of a gene in the Epstein–Barr virus (EBV). Often, one collects qualitative-valued time series

in experimental designs. This problem is also explored as we discuss the analysis of replicated

series that depend on covariates. The spectral envelope is intimately associated with

the concept of principal component analysis of time series, and this relationship is explored

in a separate section. Finally, we list an R script that can be used to calculate the spectral

envelope and optimal scalings. The script can also be used to perform the aforementioned

dynamic analysis.

288 Handbook of Discrete-Valued Time Series

13.2 Scaling Categorical Time Series

Our work on the spectral envelope was motivated by collaborations with researchers who

collected categorical-valued time series with an interest in the cyclic behavior of the data.

For example, Table 13.1 shows the per minute sleep state of an infant taken from a study on

the effects of prenatal exposure to alcohol. Details can be found in Stoffer et al. (1988), but

briey, an electroencephalographic () sleep recording of approximately 2 hours (h) is

obtained on a full-term infant 24–36 h after birth, and the recording is scored by a pediatric

neurologist for sleep state. There are two main types of sleep, non-rapid eye movement

(on-), also known as quiet sleep and rapid eye movement (), also known as active

sleep. In addition, there are four stages of on- (NR1–NR4), with NR1 being the “most

active” of the four states, and nally awake (AW), which naturally occurs briey through

the night. This particular infant was never awake during the study.

It is not too difficult to notice a pattern in the data if one concentrates on REM versus

non-REM sleep states. But it would be difficult to try to assess patterns in a longer sequence,

or if there were more categories, without some graphical aid. One simple method would

be to scale the data, that is, assign numerical values to the categories and then draw a time plot

of the scales. Since the states have an order,* one obvious scaling is as follows:

NR4 = 1, NR3 = 2, NR2 = 3, NR1 = 4, REM = 5, AW = 6, (13.1)

and Figure 13.1 (often referred to as a hypnogram) shows the time plot using this scaling.

Another interesting scaling might be to combine the quiet states and the active states:

NR4 = NR3 = NR2 = NR1 = 0, REM = 1, AW = 2. (13.2)

TABLE 13.1

Per Minute Infant EEG Sleep States (Read Down and Across)

REM NR2 NR4 NR2 NR1 NR2 NR3 NR4 NR1 NR1 REM
REM REM NR4 NR1 NR1 NR2 NR4 NR4 NR1 NR1 REM
REM REM NR4 NR1 NR1 REM NR4 NR4 NR1 NR1 REM
REM NR3 NR4 NR1 REM REM NR4 NR4 NR1 NR1 REM
REM NR4 NR4 NR1 REM REM NR4 NR4 NR1 NR1 REM
REM NR4 NR4 NR2 REM NR2 NR4 NR4 NR1 NR1 NR2
REM NR4 NR4 REM REM NR2 NR4 NR4 NR1 REM
NR2 NR4 NR4 NR1 REM NR2 NR4 NR4 NR1 REM
REM NR2 NR4 NR1 REM NR3 NR4 NR2 NR1 REM

* The so-called “ordering” of sleep states is somewhat tenuous. For example, sleep does not progress through these stages in sequence. For a typical normal healthy adult, sleep begins in stage NR1 and progresses into stages NR2, NR3, and NR4. Sleep moves through these stages repeatedly before entering REM sleep. Moreover, sleep typically transitions between REM and stage NR2. Sleep cycles through these stages approximately four or five times throughout the night. On average, adults enter the REM stage approximately 90 min after falling asleep. The first cycle of REM sleep might last only a short amount of time, but each cycle becomes longer.

FIGURE 13.1

Time plot of the EEG sleep state data in Table 13.1 using the scaling in (13.1). Horizontal axis: minute (0 to 100); vertical axis: sleep state (NR4 to REM).

FIGURE 13.2

Periodogram of the EEG sleep state data in Table 13.1 based on the scaling in (13.1). Horizontal axis: frequency (0.0 to 0.5); vertical axis: periodogram ordinates. The peak corresponds to a frequency of approximately one cycle every 60 min.

The time plot using (13.2) would be similar to Figure 13.1 as far as the cyclic (in and out

of REM sleep) behavior of this infant’s sleep pattern is concerned. Figure 13.2 shows the periodogram of

the sleep data using the scaling in (13.1). Note that there is a large peak at the frequency

corresponding to one cycle every 60 min. As one might imagine, the general appearance of

the periodogram using the scaling (13.2) (not shown) is similar to Figure 13.2. Most of us

would feel comfortable with this analysis even though we made an arbitrary and ad hoc

choice about the particular scaling. It is evident from the data (without any scaling) that,

if the interest is in an infant’s sleep cycling, this particular sleep study indicates that an infant

cycles between REM and non-REM sleep at a rate of about one cycle per hour.
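
The scale-then-periodogram procedure described above can be sketched in a few lines. This is a minimal illustration, not the chapter's R script: the function names `periodogram` and `peak_frequency` are hypothetical, the normalization |DFT|^2 / n is one common convention among several, and the input is a made-up toy series (10 min of REM followed by 10 min of NR4, repeated), not the data of Table 13.1.

```python
import cmath
import math

# Scalings (13.1) and (13.2) from the text.
SCALING_13_1 = {"NR4": 1, "NR3": 2, "NR2": 3, "NR1": 4, "REM": 5, "AW": 6}
SCALING_13_2 = {"NR4": 0, "NR3": 0, "NR2": 0, "NR1": 0, "REM": 1, "AW": 2}


def periodogram(x):
    """Return (frequency, ordinate) pairs at the Fourier frequencies j/n, 0 < j <= n/2."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    result = []
    for j in range(1, n // 2 + 1):
        freq = j / n
        dft = sum(v * cmath.exp(-2j * math.pi * freq * t)
                  for t, v in enumerate(centered))
        result.append((freq, abs(dft) ** 2 / n))
    return result


def peak_frequency(states, scaling):
    """Scale the categorical series, then return the frequency of the largest ordinate."""
    x = [scaling[s] for s in states]
    return max(periodogram(x), key=lambda pair: pair[1])[0]


# Hypothetical toy series with a 20 min cycle (assumption for illustration only).
toy_states = (["REM"] * 10 + ["NR4"] * 10) * 6

print(peak_frequency(toy_states, SCALING_13_1))
print(peak_frequency(toy_states, SCALING_13_2))
```

For this toy series both scalings place the peak at the same frequency, one cycle every 20 min, which mirrors the remark that the periodograms under (13.1) and (13.2) look similar for the sleep data.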

The intuition used in the previous example is lost when one considers a long DNA

sequence. Briey, a DNA strand can be viewed as a long string of linked nucleotides. Each

nucleotide is composed of a nitrogenous base, a ve carbon sugar, and a phosphate group.

There are four different bases that can be grouped by size, the pyrimidines, thymine (T)

and cytosine (C), and the purines, adenine (A) and guanine (G). The nucleotides are linked

together by a backbone of alternating sugar and phosphate groups with the 5′ carbon of

one sugar linked to the 3′ carbon of the next, giving the string direction. DNA molecules

occur naturally as a double helix composed of polynucleotide strands with the bases facing

inward. The two strands are complementary, so it is sufficient to represent a DNA molecule

by a sequence of bases on a single strand. Thus, a strand of DNA can be represented as a

sequence of letters, termed base pairs (bp), from the finite alphabet {A, C, G, T}. The order

of the nucleotides contains the genetic information specific to the organism. Expression of

information stored in these molecules is a complex multistage process. One important task

is to translate the information stored in the protein-coding sequences (CDS) of the DNA.

A common problem in analyzing long DNA sequence data is identifying CDS that are

dispersed throughout the sequence and separated by noncoding regions (which make

up most of the DNA). Table 13.2 shows part of the EBV DNA sequence. The entire EBV

DNA sequence consists of approximately 172,000 bp.

One could try scaling according to the pyrimidine–purine alphabet, that is, A = G = 0

and C = T = 1, but this is not necessarily of interest for every CDS of EBV. There are

numerous possible alphabets of interest, for example, one might focus on the strong–weak

hydrogen bonding alphabet* S = {C, G} = 0 and W = {A, T} = 1. While model calculations

as well as experimental data strongly agree that some kind of periodic signal exists in

certain DNA sequences, there is a large disagreement about the exact type of periodicity.

In addition, there is disagreement about which nucleotide alphabets are involved in the

signals; for example, compare Ioshikhes et al. (1996) with Satchwell et al. (1986).

If we consider the naive approach of arbitrarily assigning numerical values (scales) to

the categories and then proceeding with a spectral analysis, the result will depend on the

particular assignment of numerical values. The obvious problem of being arbitrary is illustrated

as follows: Suppose we observe the sequence ATCTACATG ..., then setting A = G = 0

TABLE 13.2

Part of the Epstein–Barr Virus DNA Sequence (Read Across and Down)

AGAATTCGTC TTGCTCTATT CACCCTTACT TTTCTTCTTG CCCGTTCTCT TTCTTAGTAT

GAATCCAGTA TGCCTGCCTG TAATTGTTGC GCCCTACCTC TTTTGGCTGG CGGCTATTGC

CGCCTCGTGT TTCACGGCCT CAGTTAGTAC CGTTGTGACC GCCACCGGCT TGGCCCTCTC

ACTTCTACTC TTGGCAGCAG TGGCCAGCTC ATATGCCGCT GCACAAAGGA AACTGCTGAC

ACCGGTGACA GTGCTTACTG CGGTTGTCAC TTGTGAGTAC ACACGCACCA TTTACAATGC

ATGATGTTCG TGAGATTGAT CTGTCTCTAA CAGTTCACTT CCTCTGCTTT TCTCCTCAGT

CTTTGCAATT TGCCTAACAT GGAGGATTGA GGACCCACCT TTTAATTCTC TTCTGTTTGC

ATTGCTGGCC GCAGCTGGCG GACTACAAGG CATTTACGGT TAGTGTGCCT CTGTTATGAA

ATGCAGGTTT GACTTCATAT GTATGCCTTG GCATGACGTC AACTTTACTT TTATTTCAGT

TCTGGTGATG CTTGTGCTCC TGATACTAGC GTACAGAAGG AGATGGCGCC GTTTGACTGT

TTGTGGCGGC ATCATGTTTT TGGCATGTGT ACTTGTCCTC ATCGTCGACG CTGTTTTGCA

GCTGAGTCCC CTCCTTGGAG CTGTAACTGT GGTTTCCATG ACGCTGCTGC TACTGGCTTT

CGTCCTCTGG CTCTCTTCGC CAGGGGGCCT AGGTACTCTT GGTGCAGCCC TTTTAACATT

GGCAGCAGGT AAGCCACACG TGTGACATTG CTTGCCTTTT TGCCACATGT TTTCTGGACA

CAGGACTAAC CATGCCATCT CTGATTATAG CTCTGGCACT GCTAGCGTCA CTGATTTTGG

GCACACTTAA CTTGACTACA ATGTTCCTTC TCATGCTCCT ATGGACACTT GGTAAGTTTT

CCCTTCCTTT AACTCATTAC TTGTTCTTTT GTAATCGCAG CTCTAACTTG GCATCTCTTT

TACAGTGGTT CTCCTGATTT GCTCTTCGTG CTCTTCATGT CCACTGAGCA AGATCCTTCT

GGCACGACTG TTCCTATATG CTCTCGCACT CTTGTTGCTA GCCTCCGCGC TAATCGCTGG

TGGCAGTATT TTGCAAACAA ACTTCAAGAG TTTAAGCAGC ACTGAATTTA TACCCAGTGA

* S refers to guanine (G) or cytosine (C) for the strong hydrogen bonding interaction between the base pairs. W refers to adenine (A) or thymine (T) for the weak hydrogen bonding interaction between the base pairs.


and C = T = 1 yields the numerical sequence 011101010 ..., which is not very interesting.

However, if we use the strong–weak bonding alphabet, W = {A, T} = 0 and S = {C, G} = 1,

then the sequence becomes 001001001 ..., which is very interesting. It should be clear, then,

that one does not want to focus on only one scaling. Instead, the focus should be on finding

scalings that bring out all of the interesting features in the data. Rather than choose values

arbitrarily, the spectral envelope approach selects scales that help emphasize any periodic

feature that exists in a categorical time series of virtually any length in a quick and automated

fashion. In addition, the technique can help in determining whether a sequence is

merely a random assignment of categories.
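
The two scalings of the example sequence ATCTACATG can be checked directly. The short sketch below simply applies each alphabet to the sequence; the names `PURINE_PYRIMIDINE`, `WEAK_STRONG`, and `scale` are hypothetical and introduced only for this illustration.

```python
# Purine-pyrimidine alphabet: A = G = 0 and C = T = 1.
PURINE_PYRIMIDINE = {"A": 0, "G": 0, "C": 1, "T": 1}
# Strong-weak bonding alphabet as used in the example: W = {A, T} = 0 and S = {C, G} = 1.
WEAK_STRONG = {"A": 0, "T": 0, "C": 1, "G": 1}


def scale(sequence, scaling):
    """Replace each base by its numerical scale and return the digits as a string."""
    return "".join(str(scaling[base]) for base in sequence)


seq = "ATCTACATG"
print(scale(seq, PURINE_PYRIMIDINE))  # 011101010
print(scale(seq, WEAK_STRONG))        # 001001001
```

The second output repeats the pattern 001, a period of three bases, which the first scaling hides.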

13.3 Denition of Spectral Envelope

As a general description, the spectral envelope is a frequency-based, principal component

technique applied to a multivariate time series. In this section, we will focus on the basic

concept and its use in the analysis of categorical time series. Technical details can be found

in Stoffer et al. (1993a).

Briey, in establishing the spectral envelope for categorical time series, we addressed

the basic question of how to efciently discover periodic components in categorical

time series. This was accomplished via nonparametric spectral analysis as follows. Let

; t = 0, ±1, ±2, ...} be a categorical-valued time series with nite state-space C =

, c

, ..., c

k+1

}. Assume that X

is stationary and p

= Pr{X

}> 0for j =1, 2, ..., k + 1.

For β =(β

, β

, ..., β

k+1

)



∈R

k+1

, denote by X

(β) the real-valued stationary time series

corresponding to the scaling that assigns the category c

the numerical value β

,for

j = 1, 2, ..., k + 1. Our goal was to nd scaling β so that the spectral density is in some

sense interesting and to summarize the spectral information by what we called the spectral

envelope.

We chose $\beta$ to maximize the power (variance) at each frequency $\omega$, across frequencies $\omega \in (-1/2, 1/2]$, relative to the total power $\sigma^2(\beta) = \mathrm{Var}\{X_t(\beta)\}$. That is, we chose $\beta(\omega)$, at each $\omega$ of interest, so that

$$\lambda(\omega) = \sup_{\beta} \left\{ \frac{f(\omega; \beta)}{\sigma^2(\beta)} \right\}, \qquad (13.3)$$

over all $\beta$ not proportional to $1_{k+1}$, the $(k+1) \times 1$ vector of ones. Note that $\lambda(\omega)$ is not defined if $\beta = a 1_{k+1}$ for $a \in \mathbb{R}$ because such a scaling corresponds to assigning each category the same value $a$; in this case, $f(\omega; \beta) \equiv 0$ and $\sigma^2(\beta) = 0$. The optimality criterion $\lambda(\omega)$ possesses the desirable property of being invariant under location and scale changes of $\beta$.
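
The invariance property can be illustrated numerically. The sketch below is only a sample analogue of the ratio inside (13.3): it replaces the spectral density by the raw periodogram ordinate and the variance by the sample variance, which is an assumption made for illustration and not the estimator the chapter develops in Section 13.4. The function name `normalized_power` and the test sequence are hypothetical.

```python
import cmath
import math


def normalized_power(categories, beta, freq):
    """Sample analogue of f(omega; beta) / sigma^2(beta) for the scaled series X_t(beta).

    Uses the raw periodogram ordinate at `freq` and the sample variance
    (an illustrative stand-in, not a smoothed spectral estimate).
    """
    x = [beta[c] for c in categories]          # X_t(beta)
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    variance = sum(v * v for v in centered) / n
    if variance == 0:
        # beta proportional to the vector of ones: the ratio is undefined.
        raise ValueError("all categories share one value; ratio undefined")
    dft = sum(v * cmath.exp(-2j * math.pi * freq * t)
              for t, v in enumerate(centered))
    return (abs(dft) ** 2 / n) / variance


# Hypothetical test sequence: the example motif from the text, repeated.
dna = "ATCTACATG" * 4
weak_strong = {"A": 0, "T": 0, "C": 1, "G": 1}
purine_pyrimidine = {"A": 0, "G": 0, "C": 1, "T": 1}

# Location and scale change of beta: beta_j -> 5 - 3 * beta_j.
shifted = {base: 5 - 3 * value for base, value in weak_strong.items()}

ratio_ws = normalized_power(dna, weak_strong, 1 / 3)
ratio_shifted = normalized_power(dna, shifted, 1 / 3)
ratio_pp = normalized_power(dna, purine_pyrimidine, 1 / 3)
print(ratio_ws, ratio_shifted, ratio_pp)
```

The first two ratios agree up to floating-point error, as the invariance property predicts, and at frequency 1/3 the strong–weak scaling yields a larger ratio than the purine–pyrimidine scaling for this sequence. A constant scaling raises an error, matching the exclusion of $\beta$ proportional to $1_{k+1}$.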

As in most scaling problems for categorical data, it was useful to represent the categories in terms of the vectors $e_1, e_2, \ldots, e_{k+1}$, where $e_j$ represents the $(k+1) \times 1$ vector with one in the $j$th row and zeros elsewhere. We then defined a $(k+1)$-dimensional stationary time series $Y_t$ by $Y_t = e_j$ when $X_t = c_j$. The time series $X_t(\beta)$ can be obtained from the $Y_t$ time series by the relationship $X_t(\beta) = \beta' Y_t$. Assume that the vector process $Y_t$ has a continuous spectral density denoted by $f_Y(\omega)$. For each $\omega$, $f_Y(\omega)$ is, of course, a $(k+1) \times (k+1)$ complex-valued Hermitian matrix. Note that the relationship $X_t(\beta) = \beta' Y_t$ implies that
