Chapter 3

Classical Detection

3.1 Formalism of Quantum Information

Fundamental Fact: The noise lies in a high-dimensional space; the signal, by contrast, lies in a much lower-dimensional space.

If a random matrix A has i.i.d. rows Ai, then A*A = ∑i AiAiᵀ, where A* is the adjoint matrix of A. We often study A through the n × n symmetric, positive semidefinite matrix A*A. The eigenvalues of A*A are therefore nonnegative real numbers.

An immediate application of random matrices is the fundamental problem of estimating covariance matrices of high-dimensional distributions [107]. The analysis of row-independent models can be interpreted as a study of sample covariance matrices. For a general distribution in ℝⁿ, its covariance can be estimated from a sample of size N = O(n log n) drawn from the distribution. For sub-Gaussian distributions, we have an even better bound, N = O(n). For low-dimensional distributions, far fewer samples are needed: if a distribution lies close to a subspace of dimension r in ℝⁿ, then a sample of size N = O(r log n) is sufficient for covariance estimation.

There are deep results in random matrix theory. The main motivation of this subsection is to exploit the existing results in this field to better guide the estimation of covariance matrices, using nonasymptotic results [107].

3.2 Hypothesis Detection for Collaborative Sensing

The density operator (matrix) ρ is the basic building block. An operator ρ satisfies: (1) (Trace condition) ρ has trace equal to one, that is, Trρ = 1; (2) (Positivity) ρ is a positive operator, that is, ρ ≥ 0. Abusing terminology, we will use the term “positive” for “positive semidefinite” (denoted A ≥ 0). Covariance matrices satisfy the two necessary conditions. We say A > 0 when A is a positive definite matrix; B > A means that B − A is a positive definite matrix. Similarly, B ≥ A means that B − A is nonnegative definite, B − A ≥ 0. The hypothesis test problems can be written as

H0: R = Rn,
H1: R = Rs + Rn,    (3.1)

where Rn is the covariance matrix of the noise and Rs that of the signal. The signal is assumed to be uncorrelated with the noise. From (3.1), it follows that Rs ≥ 0 and Rn > 0. In our applications, the noise is modeled as additive, so that if x(n) is the “signal” and w(n) the “noise,” the recorded signal is

y(n) = x(n) + w(n).

Often, this additive noise is assumed to have zero mean and to be uncorrelated with the signal. In this case, the covariance of the measured data, y(n), is the sum of the covariances of x(n) and w(n). Specifically, note that since

ry(k) = E{y(n + k)y*(n)} = E{[x(n + k) + w(n + k)][x(n) + w(n)]*},

if x(n) and w(n) are uncorrelated, then

E{x(n + k)w*(n)} = E{w(n + k)x*(n)} = 0,

and it follows that

ry(k) = rx(k) + rw(k), or, in matrix form, Ry = Rx + Rw.    (3.2)

Discrete-time random processes are often represented in matrix form. If

x = [x(0), x(1), ···, x(p)]ᵀ

is a vector of p + 1 values of a process x(n), then the outer product

xx*

is a (p + 1) × (p + 1) matrix. If x(n) is wide-sense stationary, taking the expected value and using the Hermitian symmetry of the covariance sequence, rx(−k) = rx*(k), leads to the (p + 1) × (p + 1) matrix of covariance values

Rx = E{xx*} =
    [ rx(0)       rx*(1)      ···   rx*(p)
      rx(1)       rx(0)       ···   rx*(p − 1)
      ⋮            ⋮                  ⋮
      rx(p)       rx(p − 1)   ···   rx(0) ],    (3.3)

referred to as the covariance matrix. The correlation matrix Rx has the following structure:

3.4 3.4

where I is the identity matrix and TrA represents the trace of A. The covariance matrix has the following basic properties:

1. The covariance matrix of a WSS random process is a Hermitian Toeplitz matrix, Rx = Rx*.
2. The covariance matrix of a WSS random process is positive semidefinite, Rx ≥ 0. In other words, the eigenvalues, λk, of this covariance matrix are real-valued and nonnegative, that is, λk ≥ 0.

A complete list of properties is given in Table 3.1. When the mean values mx and my are zero, the autocovariance and autocorrelation matrices are equal. We will always assume that all random processes have zero mean; therefore, we use the two definitions interchangeably.


Example 3.1 (Covariance matrices for sinusoids and complex exponentials)
An important random process in radar and communications is the harmonic process. An example of a real-valued harmonic process is the random phase sinusoid, which is defined by

x(n) = A sin(nω0 + ϕ),

where A and ω0 are fixed constants and ϕ is a random variable that is uniformly distributed over the interval (−π, π). The mean of this process can easily be shown to be zero; thus, x(n) is a zero-mean process. The covariance of x(n) is

rx(k) = E{x(n + k)x(n)} = A² E{sin[(n + k)ω0 + ϕ] sin(nω0 + ϕ)}.

Using the trigonometric identity

sin α sin β = (1/2)[cos(α − β) − cos(α + β)],

we have

rx(k) = (A²/2) cos(kω0) − (A²/2) E{cos[(2n + k)ω0 + 2ϕ]}.

Note that the first term is the expected value of a constant and the second term is equal to zero. Therefore,

rx(k) = (A²/2) cos(kω0).

As another example, consider the complex harmonic process

x(n) = A e^{j(nω0 + ϕ)},

where, as with the random phase sinusoid, ϕ is a random variable that is uniformly distributed between −π and π. The mean of this process is zero. The covariance is

rx(k) = E{x(n + k)x*(n)} = |A|² e^{jkω0}.

Consider a harmonic process consisting of L sinusoids

images

Assuming the random variables ϕl and Al are uncorrelated, the covariance sequence is

images

The 2 × 2 covariance matrix for L = 1 sinusoid is

images

The 2 × 2 covariance matrix for L sinusoids is

images

where

images

As another example, consider the complex-valued process consisting of a sum of two complex exponentials

images

The covariance sequence for two uncorrelated processes is

images

The 2 × 2 covariance matrix for two complex exponentials is

images
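The closed-form result rx(k) = (A²/2) cos(kω0) for the random phase sinusoid is easy to verify numerically. The following MATLAB sketch (the amplitude, frequency, and number of realizations are illustrative choices, not values from the text) estimates rx(k) by averaging over realizations of ϕ and compares it with the formula.

% Monte Carlo check of r_x(k) = (A^2/2)*cos(k*w0) for the random phase sinusoid
A  = 1;                                  % amplitude (illustrative)
w0 = 0.2*pi;                             % frequency in rad/sample (illustrative)
M  = 10000;                              % number of independent realizations
n  = 0;                                  % reference time index
kmax = 10;
phi = -pi + 2*pi*rand(M, 1);             % phase uniform on (-pi, pi)
r_hat = zeros(1, kmax+1);
for k = 0:kmax
    x_nk = A*sin((n + k)*w0 + phi);      % x(n+k) for each realization
    x_n  = A*sin(n*w0 + phi);            % x(n)
    r_hat(k+1) = mean(x_nk .* x_n);      % sample average over realizations
end
r_theory = (A^2/2)*cos((0:kmax)*w0);
disp([r_hat; r_theory]);                 % the two rows should nearly agree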


Table 3.1 Definition and properties for correlation and covariance matrices [108, p. 39]

Auto-correlation and covariance: Rx = E{xx*}, Cx = E{(x − mx)(x − mx)*}
Symmetry: Rx = Rx*, Cx = Cx*
Positive semidefinite: Rx ≥ 0, Cx ≥ 0
Interrelation: Rx = Cx + mxmx*, that is, Cx = Rx − mxmx*
Cross-correlation and cross-covariance: Rxy = E{xy*}, Cxy = E{(x − mx)(y − my)*}
Relation to Ryx and Cyx: Rxy = Ryx*, Cxy = Cyx*
Interrelation: Rxy = Cxy + mxmy*, that is, Cxy = Rxy − mxmy*
Orthogonal and uncorrelated: x, y orthogonal: Rxy = 0; x, y uncorrelated: Cxy = 0
Sum of x and y: Rx+y = Rx + Ry if x, y orthogonal; Cx+y = Cx + Cy if x, y uncorrelated

Example 3.2 (Covariance matrix for white noise)
The 2 × 2 covariance matrix for white additive noise is

Rw = σw² I2 = [ σw²   0
                0     σw² ].

In practice, we must deal with the form

R̂w = σw² I2 + X,

where the elements of X are approximately zero-mean random variables whose variances are 10 dB lower than the diagonal elements of Rw. At low SNR, such as −20 dB, these random variables make R̂w a random matrix. One realization example is

images
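A realization of this kind is easy to reproduce. A minimal MATLAB sketch (the noise power, dimension, and sample size are illustrative choices): the finite-sample estimate of the white-noise covariance equals σw²I plus a random perturbation with small off-diagonal entries.

sigma2 = 1;                     % noise power (illustrative)
L = 2; N = 200;                 % dimension and number of samples (illustrative)
W = sqrt(sigma2)*randn(L, N);   % N realizations of an L x 1 white noise vector
Rw_hat = (W*W')/N;              % sample covariance matrix
X = Rw_hat - sigma2*eye(L);     % random perturbation about sigma2*I
disp(Rw_hat); disp(X);          % off-diagonal terms are small but nonzero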


Let A be a Hermitian operator with qI ≤ A ≤ QI. The matrices QI − A and A − qI are positive and commute with each other [109, p. 95]. Since Rw is positive (and, of course, Hermitian), we have

images

The random matrix X is Hermitian, but not necessarily positive. Since X is Hermitian, its eigenvalues are real. The Hoffman-Wielandt inequality is relevant in this context.


Lemma 3.1 (Hoffman-Wielandt)
[16, p. 21] Let A and B be N × N Hermitian matrices, with ordered eigenvalues λ1(A) ≤ λ2(A) ≤ ··· ≤ λN(A) and λ1(B) ≤ λ2(B) ≤ ··· ≤ λN(B). Then,

∑_{i=1}^{N} (λi(A) − λi(B))² ≤ Tr[(A − B)²].    (3.5)

(3.5) can be used to bound the differences between the eigenvalues of A and B, for example when A and B are random Hermitian matrices such as sample covariance matrices.
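A quick MATLAB check of Lemma 3.1 on two randomly generated symmetric matrices (a minimal sketch; the matrices and their size are illustrative, not the covariance matrices discussed above):

N = 8;
A = randn(N); A = (A + A')/2;              % random symmetric (Hermitian) matrix
B = randn(N); B = (B + B')/2;
lamA = sort(eig(A)); lamB = sort(eig(B));  % eigenvalues in increasing order
lhs = sum((lamA - lamB).^2);
rhs = trace((A - B)^2);
fprintf('sum of squared eigenvalue gaps = %.4f <= Tr[(A-B)^2] = %.4f\n', lhs, rhs);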

3.3 Sample Covariance Matrix

In Section 3.2, the true covariance matrix is needed for hypothesis detection. In practice, we only have access to the sample covariance matrix which is a random matrix. We first present some basic definitions and properties related to a sample covariance matrix. The determinant of a random matrix S, det S, also called generalized variance, is of special interest. It is an important measure of spread in multidimensional statistical analysis.

3.3.1 The Data Matrix

The general (n × N) data matrix will be denoted X or X(n × N). The element in row i and column j is xij. We write the matrix X = (xij). The rows of X will be written as

images

Or

images

where

images


Example 3.3 (Random matrices)
MATLAB code:
N = 1000; X = randn(N, N);    % generates a random matrix of size 1000 x 1000
r = randn(n) returns an n-by-n matrix containing pseudorandom values drawn from the standard normal distribution; randn (with no argument) returns a scalar, and randn(size(A)) returns an array the same size as A.
(1) Generate values from a normal distribution with mean 1 and standard deviation 2:
r = 1 + 2.*randn(100, 1);
(2) Generate values from a bivariate normal distribution with a specified mean vector and covariance matrix:
mu = [1, 2]; Sigma = [1 .5; .5 2];
R = chol(Sigma);                            % Sigma = R'*R
z = repmat(mu, 100, 1) + randn(100, 2)*R;   % rows of z have mean mu and covariance Sigma

3.3.1.1 The Mean Vector and Covariance Matrix

The sample mean of the ith variable is

x̄i = (1/n) ∑_{r=1}^{n} xri,    (3.6)

and the sample variance of the ith variable is

sii = (1/n) ∑_{r=1}^{n} (xri − x̄i)².    (3.7)

The sample covariance between the ith and jth variable is

sij = (1/n) ∑_{r=1}^{n} (xri − x̄i)(xrj − x̄j).    (3.8)

The vector of means,

x̄ = (x̄1, x̄2, ···, x̄N)ᵀ,    (3.9)

is called the sample mean vector, or simply the “mean vector.” The N × N matrix

S = (sij) = (1/n) ∑_{r=1}^{n} (xr − x̄)(xr − x̄)ᵀ

is called the sample covariance matrix, or simply “covariance matrix.” It is more convenient to express the statistics in matrix notation. Corresponding to (3.6) and (3.9), we have

x̄ = n⁻¹Xᵀ1,    (3.10)

where 1 is a column vector of n ones. On the other hand,

n⁻¹ ∑_{r=1}^{n} xr xrᵀ = n⁻¹XᵀX,

so that

S = n⁻¹XᵀX − x̄x̄ᵀ.    (3.11)

This may be expressed as

S = n⁻¹XᵀX − n⁻²Xᵀ11ᵀX,

using (3.10). Writing

H = I − n⁻¹11ᵀ,

where H is called the centering matrix, we obtain the following standard form

S = n⁻¹XᵀHX,    (3.12)

which is a convenient matrix representation of the sample covariance matrix. We need a total of nN sample points to estimate the sample covariance matrix S. Turning the table around, we can “summarize” the information in nN sample points into one single matrix S. In spectrum sensing, we are given a long record of data about some random variables, or a random vector of large dimensionality.
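The matrix form (3.12) is easy to check numerically. A minimal MATLAB sketch (the data sizes and mixing matrix below are illustrative choices): it builds the centering matrix H, forms S = n⁻¹XᵀHX, and compares the unbiased version Su with MATLAB's cov, which uses the divisor n − 1.

n = 500; Nvar = 4;                   % n observations of Nvar variables (illustrative)
X = randn(n, Nvar)*[2 0 0 0; 1 1 0 0; 0 0 3 0; 0 0 1 1];  % correlated data
H = eye(n) - ones(n)/n;              % centering matrix H = I - (1/n)*1*1'
S  = (X'*H*X)/n;                     % sample covariance with divisor n, cf. (3.12)
Su = (X'*H*X)/(n-1);                 % unbiased version with divisor n-1
disp(norm(Su - cov(X)));             % should be numerically zero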


Example 3.4 (Representation of sample covariance matrix)
Given a total of 10⁵ sample points, how many sample covariance matrices are needed? Collect data segments consisting of 1024 points each to form N-dimensional vectors, where N = 32. These N-dimensional data vectors are used to form a sample covariance matrix S of size N × N.
By doing this, we have K = ⌊10⁵/1024⌋ = 97 segments. From each segment, we estimate a sample covariance matrix. Thus we have K = 97 sample covariance matrices; in other words, a series of K matrices S1, S2, ···, SK is obtained.

Let us check the most important property of S: S is a positive semidefinite matrix. Since H is a symmetric idempotent matrix: H = HT, H = H2, for any N-vector a,

aᵀSa = n⁻¹aᵀXᵀHXa = n⁻¹aᵀXᵀHᵀHXa = n⁻¹yᵀy ≥ 0,

where y = HXa. Thus, the covariance matrix S is positive semidefinite, writing

S ≥ 0.

For continuous data, we expect S to be not only positive semidefinite but positive definite, written

S > 0,

if n ≥ N + 1.

It is often convenient to define the covariance matrix with a divisor of n − 1 instead of n. Set

Su = (n/(n − 1))S = (n − 1)⁻¹XᵀHX.

If the data forms a random vector sample from a multivariate distribution, with finite second moments, then Su is an unbiased estimate of the true covariance matrix. See Theorem 2.8.2 of [110, p. 50].

The sample correlation coefficient between the ith and the jth variables is

ρij = sij/(si sj), where si = √sii.

Unlike sij, the correlation coefficient is invariant under both changes of scale and origin of the ith and the jth variables. This property is the foundation of detection of correlated structures among random vectors. Clearly,

|ρij| ≤ 1,

where |a| is the absolute value of a. Define the sample correlation matrix as

R = (ρij),

with ρii = 1. It follows that

R ≥ 0.

If Σ = I, we say that the variables are uncorrelated. This is the case for white Gaussian noise. If D = diag(si), then

R = D⁻¹SD⁻¹, or equivalently S = DRD.
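A short MATLAB sketch (with an illustrative data matrix) confirms the relation between S and the sample correlation matrix; the built-in corrcoef returns the same matrix because the divisor cancels in the ratio.

n = 500;
X = randn(n, 3)*[1 0 0; 0.8 1 0; 0 0.5 2];  % illustrative correlated data
H = eye(n) - ones(n)/n;
S = (X'*H*X)/n;                  % sample covariance matrix
D = diag(sqrt(diag(S)));         % D = diag(s_i), the sample standard deviations
R = D\S/D;                       % correlation matrix R = D^{-1}*S*D^{-1}
disp(R); disp(corrcoef(X));      % the two matrices agree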

3.3.1.2 Measure of Multivariate Scatter

The matrix S is one possible multivariate generalization of the univariate notion of variance, measuring scatter about the mean. Physically, the variance is equivalent to the power of the random vector. For example, for a white Gaussian random variable, the variance is its power.

Sometimes, for example, for hypothesis testing problems, we would rather have a single number to measure multivariate scatter. Of course, the matrix S contains many more structures (“information”) than this single real number. Two common such measures are

1. the generalized variance det S, or |S|.
2. the total variance, TrS.

A motivation for these measures is principal component analysis (PCA), which will be treated later. For both measures, large values indicate a high degree of scatter about the mean vector x̄ (physically, larger power), while low values represent concentration about the mean vector x̄. The two measures reflect different aspects of the variability in the data. The generalized variance plays an important role in maximum likelihood (ML) estimation, while the total variance is a useful concept in principal component analysis. In the context of low-SNR detection, the total variance appears to be the more sensitive measure for deciding between the two alternative hypotheses.

The necessity for studying the (empirical) sample covariance matrix in statistics arose during the 1950s, when practitioners were searching for a scalar measure of dispersion for multivariate data [111, Chapter 2]. Such a scalar measure of dispersion is relevant in the context of hypothesis testing: we need a scalar measure to set the threshold for the test.

3.3.1.3 Linear Combinations

Linear transformations can simplify the structure of the covariance matrix, making interpretation of the data more straightforward. Consider a linear combination

images

where a1, ···, aN are given. From (3.10), the mean is

images

and the variance is

images

where (3.11) is used.

For a q-dimensional linear transformation, we have

images

which may be written as

images

where Y is a q × q matrix and b is a q-vector. Usually, q ≤ N.

The mean vector and covariance matrix of the new objects yl are

images

If A is nonsingular (so, in particular, q = N), then

images

Here we give the three most important examples: (1) the scaling transform; (2) the Mahalanobis transform; (3) the principal component transform (or principal component analysis).

3.3.1.4 The Scaling Transform

The n vectors of dimension N are objects of interest. Define the scaling transform as

images

This transformation scales each variable to have unit variance and thus eliminates the arbitrariness in the choice of scale. For example, if x(1) measures length, then y(1) will be the same whatever units of length are used. We have

images

3.3.1.5 Mahalanobis Transformation

If S > 0, then S−1 has a unique symmetric positive definite square root S−1/2. See A.6.15 of [110]. We define the Mahalanobis transformation as

zr = S^{−1/2}(xr − x̄), r = 1, ···, n.

Then

z̄ = 0 and Sz = I,

so that this transformation eliminates the correlation between the variables and standardizes the variance of each variable.
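A minimal MATLAB sketch of the Mahalanobis transformation on illustrative data: sqrtm(inv(S)) provides the symmetric square root S^{−1/2}, and the transformed data have sample covariance approximately equal to the identity.

n = 1000;
X = randn(n, 3)*[1 0 0; 0.6 1 0; 0.2 0.4 2];    % illustrative correlated data
xbar = mean(X, 1);                               % sample mean vector (row)
S = cov(X, 1);                                   % sample covariance with divisor n
Z = (X - repmat(xbar, n, 1)) * sqrtm(inv(S));    % z_r = S^{-1/2}(x_r - xbar)
disp(cov(Z, 1));                                 % approximately the identity matrix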

3.3.1.6 Principal Component Analysis

In the era of high-dimensional data processing, PCA is extremely important for reducing the dimension of the data. One is motivated to summarize the total variance using far fewer dimensions. The notion of the rank of the data matrix occurs naturally in this context. For a zero-mean (centered) data matrix, it follows from (3.12) that

S = n⁻¹XᵀX.

This mathematical structure plays a critical role in its applications.

By the spectral decomposition theorem, the covariance matrix S may be written as

S = UΛUᵀ,

where U is an orthogonal matrix and Λ is a diagonal matrix of the eigenvalues of S,

Λ = diag(λ1, λ2, ···, λN),  λ1 ≥ λ2 ≥ ··· ≥ λN ≥ 0.

The principal component transformation is defined by the unitary rotation

W = (X − 1x̄ᵀ)U = HXU.

Since

n⁻¹WᵀW = UᵀSU = Λ,

the columns of W, called principal components, represent uncorrelated linear combinations of the variables. In practice, one hopes to summarize most of the variability in the data using only the principal components with the highest variances, thus reducing the dimensions. This approach is the standard benchmark for dimensionality reduction.

The principal components are uncorrelated with variances

λ1, λ2, ···, λN.

It seems natural to define the “overall” spread of the data by some symmetric, monotonically increasing function of λ1, λ2, ···, λN, such as the geometric mean and the arithmetic mean,

(λ1λ2···λN)^{1/N}  and  (1/N)(λ1 + λ2 + ··· + λN).

Using the properties of linear algebra, we have

det S = λ1λ2···λN  and  TrS = λ1 + λ2 + ··· + λN.

We have used the facts that, for an N × N matrix A,

det A = ∏_{i=1}^{N} λi  and  TrA = ∑_{i=1}^{N} λi,

where λi are the eigenvalues of the matrix A; see [110, A.6] or [112]. The geometric mean of a nonnegative sequence is never larger than its arithmetic mean [113]:

(a1a2···aN)^{1/N} ≤ (1/N)(a1 + a2 + ··· + aN),

where ai are nonnegative real numbers. Moreover, the special structure S ≥ 0 implies that all eigenvalues are nonnegative [114, p. 160]:

λi ≥ 0,  i = 1, ···, N.

The arithmetic mean-geometric mean inequality is thus valid for our case. Finally, we obtain

(det S)^{1/N} ≤ (1/N)TrS.    (3.13)
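A numerical check of (3.13), and of the fact that the principal component variances are the eigenvalues of S, on illustrative data (a sketch, not tied to any particular signal model):

n = 400; N = 5;
X = randn(n, N)*randn(N);            % illustrative data with a random mixing matrix
H = eye(n) - ones(n)/n;
S = (X'*H*X)/n;                      % sample covariance matrix
[U, Lambda] = eig(S);                % spectral decomposition S = U*Lambda*U'
W = H*X*U;                           % principal components
disp(diag(cov(W, 1))');              % variances of the PCs = eigenvalues of S
disp(diag(Lambda)');
fprintf('det(S)^(1/N) = %.4f <= trace(S)/N = %.4f\n', det(S)^(1/N), trace(S)/N);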

The rotation to principal components provides a motivation for the measures of multivariate scatter. Let us consider one application in spectrum sensing. The key idea behind covariance-based detection of the primary user's signal is that the primary signal received at the CR user is usually correlated, because of dispersive channels, the use of multiple receive antennas, or even oversampling. Such correlation can be used by the CR user to differentiate the primary signal from white noise.

Since Sx is a random matrix, det Sx and TrSx are scalar random variables. Girko studied random determinants [111]. Inequality (3.13) relates the determinant to the trace of the random matrix Sx. In Chapter 4, tracial functions of Sx are commonly encountered.


Example 3.5 (Covariance-based detection)
The received signal is

y(n) = θ s(n) + w(n),

where θ = 1 and θ = 0 denote the presence and absence of the primary signal, respectively. The sample covariance matrix of the received signal is estimated as

R̂y(N) = (1/N) ∑_{n=1}^{N} y(n)y(n)*.    (3.14)

When the number of samples N approaches infinity, R̂y(N) converges in probability to

images

where Rs and Rw are, respectively, the L × L covariance matrices of the primary signal vector and the noise vector

images

Our standard problem is

H0: Ry = Rw,
H1: Ry = Rs + Rw,    (3.15)

where Rs and Rw are, respectively, covariance matrices of signal and noise.
Based on the sample covariance matrix R̂y(N), various test statistics can be used. Let μmin and μmax denote the minimum and maximum eigenvalues of R̂y(N). Then

μmax = αmax + σw²  and  μmin = αmin + σw²,

where αmax and αmin are the maximum and minimum eigenvalues of Rs and Rw = σw²IL is assumed, where σw² is the noise power and IL is the L × L identity matrix. Because of the correlation among the sampled signal, αmax > αmin. Thus, if there is no primary signal,

μmax/μmin = 1;

otherwise

μmax/μmin = (αmax + σw²)/(αmin + σw²) > 1.

Based on the above heuristic, the max-min eigenvalue algorithm is formulated as follows:
1. Estimate the covariance matrix of the received signal according to (3.14).
2. Calculate the ratio of the maximum and minimum eigenvalues, μmax/μmin.
3. If the ratio exceeds a preset threshold γ, claim H1; otherwise claim H0.

The max-min eigenvalue algorithm is simple and has fairly good performance in the low-SNR regime. At extremely low SNR, say −25 dB, however, the calculated eigenvalues of the sample covariance matrix are random and look identical, and the algorithm breaks down. Note that the eigenvalues are the variances of the principal components. The problem is that this algorithm depends on the variance of a single dimension (the one associated with the minimum or the maximum eigenvalue).
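A minimal MATLAB sketch of the max-min eigenvalue test on simulated data; the smoothing factor L, the number of samples, the signal amplitude, and the threshold gamma below are illustrative choices, not the values used in the experiments reported later.

L = 8; Ns = 5000; gamma = 1.6;            % illustrative parameters
s = 0.5*sin(0.3*pi*(1:Ns+L-1));           % weak narrowband (correlated) primary signal
w = randn(1, Ns+L-1);                     % unit-power white noise
y = s + w;                                % received samples under H1 (set s = 0 for H0)
Y = zeros(L, Ns);
for n = 1:Ns
    Y(:, n) = y(n:n+L-1).';               % L x 1 smoothed received vector
end
Ry = (Y*Y')/Ns;                           % sample covariance matrix, cf. (3.14)
mu = eig(Ry);
ratio = max(mu)/min(mu);                  % max-min eigenvalue statistic
decide_H1 = (ratio > gamma);              % step 3 of the algorithm
fprintf('max/min eigenvalue ratio = %.2f, decide H1 = %d\n', ratio, decide_H1);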

The variances of different components are uncorrelated random variables. It is thus more natural to use the total variance or total variation.

3.4 Random Matrices with Independent Rows

We focus on a general model of random matrices, where we only assume independence of the rows rather than of all entries. Such matrices are naturally generated by high-dimensional distributions. Indeed, given an arbitrary probability distribution in ℝⁿ, one takes a sample of N independent points and arranges them as the rows of an N × n matrix A. Recall that n is the dimension of the probability space.

Let X be a random vector in ℝⁿ. For simplicity we assume that X is centered, that is, EX = 0, where EX denotes the expectation of X. The covariance matrix of X is the n × n matrix Σ = E[XXᵀ]. The simplest way to estimate Σ is to take N independent samples Xi from the distribution and form the sample covariance matrix ΣN = N⁻¹∑_{i=1}^{N} XiXiᵀ. By the law of large numbers, ΣN → Σ almost surely as N → ∞. So, taking sufficiently many samples, we are guaranteed to estimate the covariance matrix as well as we want. This, however, does not address the quantitative aspect: what is the minimal sample size N that guarantees approximation with a given accuracy?

The relation of this question to random matrix theory becomes clear when we arrange the samples Xi =: Ai as the rows of the N × n random matrix A. Then, the sample covariance matrix is expressed as ΣN = N⁻¹A*A. Note that A is a matrix with independent rows but usually not independent entries. Reference [107] has worked out the analysis of such matrices, separately for sub-Gaussian and general distributions.

We often encounter covariance estimation for a sub-Gaussian distribution due to the presence of Gaussian noise. Consider a sub-Gaussian distribution in ℝⁿ with covariance matrix Σ, and let ε ∈ (0, 1), t ≥ 1. Then with probability at least 1 − 2exp(−t²n) one has

if N ≥ C(t/ε)²n, then ||ΣN − Σ|| ≤ ε.    (3.16)

Here C depends only on the sub-Gaussian norm of a random vector taken from this distribution; the spectral norm of A is denoted by ||A||, which is equal to the maximum singular value of A, that is, ||A|| = smax(A).

Covariance estimation for an arbitrary distribution is also encountered when a general noise-plus-interference model is used. Consider a distribution in ℝⁿ with covariance matrix Σ supported in some centered Euclidean ball whose radius we denote by √m. Let ε ∈ (0, 1) and t ≥ 1. Then, with probability at least 1 − n^{−t²}, one has

if N ≥ C(t/ε)²||Σ||⁻¹m log n, then ||ΣN − Σ|| ≤ ε||Σ||.    (3.17)

Here C is an absolute constant and log denotes the natural logarithm. In (3.17), typically m = O(||Σ||n). The required sample size is therefore N ≥ C(t/ε)²n log n.

Low-rank estimation is used when the distribution of a signal in ℝⁿ lies close to a low-dimensional subspace. In this case, a much smaller sample size suffices for covariance estimation. The intrinsic dimension of the distribution can be measured with the effective rank of the matrix Σ, defined as

r(Σ) = TrΣ/||Σ||,    (3.18)

where TrΣ denotes the trace of Σ. One always has r(Σ) ≤ rank(Σ) ≤ n, and this bound is sharp. The effective rank r = r(Σ) always controls the typical norm of X, as E||X||² = TrΣ = r||Σ||. Most of the distribution is supported in a ball of radius √m, where m = O(r||Σ||). The conclusion of (3.17) then holds with sample size N ≥ C(t/ε)²r log n.

Summarizing the above discussion, (3.16) shows that a sample size N = O(n) suffices to approximate the covariance matrix of a sub-Gaussian distribution in ℝⁿ by the sample covariance matrix, while for an arbitrary distribution N = O(n log n) is sufficient. For distributions that are approximately low-dimensional, such as that of a signal, a much smaller sample size is sufficient: if the effective rank of Σ equals r, then a sample size of N = O(r log n) suffices.

Each observation of the sample covariance matrix is a random matrix, and we can study the expectation of the observed random matrix. Since the expectation of a random matrix can be viewed as a convex combination, and the positive semidefinite cone is convex [115, p. 459], expectation preserves the semidefinite order [116]:

if A ≤ B, then EA ≤ EB.    (3.19)

Noncommutativity of two sample covariance matrices: if positive matrices X and Y commute, their symmetrized product ½(XY + YX) equals XY and is again positive; this is not true if we deal with two sample covariance matrices. A simple MATLAB simulation using two random matrices can verify this observation. It turns out that this observation is of an elementary nature: quantum information is built upon this noncommutativity of operators (matrices). If the matrices A and B commute, the problem of (3.1) is equivalent to the classical likelihood ratio test [117]. A unifying framework including classical and quantum hypothesis testing (first suggested in [117]) is developed here.
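The noncommutativity is immediate to verify in MATLAB. The sketch below (with illustrative sizes) builds two sample covariance matrices from independent data sets and shows that AB ≠ BA.

n = 4; N = 100;                        % dimension and sample size (illustrative)
X = randn(N, n); Y = randn(N, n);
A = (X'*X)/N;  B = (Y'*Y)/N;           % two sample covariance matrices
disp(norm(A*B - B*A, 'fro'));          % nonzero: A and B do not commute
disp(eig((A*B + B*A)/2).');            % symmetrized product: not guaranteed to be >= 0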

When only N samples are available, the sample covariance matrices can be used to approximate the actual ones. A random vector images is used to model the noise or interference. Similarly, a random vector images models the signal. In other words, (3.1) becomes

3.20 3.17

where images

For any A ≥ 0, all the eigenvalues of A are nonnegative. Since some eigenvalues of images and images are negative, they are indefinite matrices of small tracial values. Under extremely low signal-to-noise ratios, the positive term (signal) images in (3.20) is extremely small, compared with the other three terms. All these matrices are random matrices with dimension n.

Our motivation is to use the fundamental relation of (3.19). Consider (sufficiently large) K i.i.d. observations of random matrices A and B:

3.21 3.18

The problem at hand is how the fusion center combines the information from these K observations. The justification for using the expectation is based on the basic observation of (3.20): expectation increases the effective signal-to-noise ratio. Over the K observations, the signal term experiences coherent summation, while the other three random matrix terms go through incoherent summation.

Simulations: In (3.20), images is no bigger than 0.5, so they do not have significant influence on the gap between images and images. Figure 3.1 shows that this gap is very stable. A narrowband signal is used. The covariance matrix images is 4 × 4. About 25 observations are sufficient to recover this matrix with acceptable accuracy. Two independent experiments are performed to obtain images and images. In our algorithms, we need to set the threshold first for the hypothesis test of images; we rely on images for images. To obtain every point in the plot, N = 600 is used in (3.20).

Figure 3.1 The traces of covariances at extremely low SNR as a function of K observations. (a) SNR = −30 dB; (b) SNR = −34 dB.

3.1

Denote the set of positive-definite matrices by images. The following theorem [115, p. 529] provides a framework: let f be a function with f(0) = 0 that is continuous and increasing, and assume that A and B are positive semidefinite with A ≤ B. Then,

3.22 3.19

A trivial case is: f(x) = x. If A and B are random matrices, combining (3.22) with (3.19) leads to the final equation

3.23 3.20


Algorithm 3.1
(1) Claim hypothesis images if matrix inequality (3.22) is satisfied; (2) otherwise, images is claimed.

Consider a general Gaussian detection problem: images, images where images, images, and s and w are independent. The Neyman-Pearson detector decides images if images. This LRT leads to the structure of a prewhitener followed by an estimator-correlator (EC) [118, p. 167]. In our simulation, we assume that we know the signal and noise covariance matrices perfectly; this serves as the upper limit for the LRT detector. It is amazing to discover that Algorithm 3.1 outperforms the LRT by several dB!

Related Work: Several algorithms based on the sample covariance matrix have been proposed for spectrum sensing. Maximum-minimum eigenvalue (MME) [119] and arithmetic-to-geometric mean (AGM) [120] use the eigenvalue information, while feature template matching (FTM) [121] uses eigenvectors as prior knowledge. All these algorithms are based on covariance matrices, and all the thresholds are determined by the probability of false alarm.

Preliminary Results Using Algorithm 3.1: Sinusoidal signals and DTV signals captured in Washington, D.C. are used. For each simulation, zero-mean i.i.d. Gaussian noise is added according to the desired SNR, and 2,000 simulations are performed at each SNR level. The threshold obtained by Monte Carlo simulation is in perfect agreement with that of the derived expression. The total number of samples contained in each segment is Ns = 100,000 (corresponding to about 5 ms of sampling time). The smoothing factor L is chosen to be 32. The probability of false alarm is fixed at Pfa = 10%. For the simulated sinusoidal signal, the same parameters are used.

Hypothesis detection using a function of the matrix, termed function of matrix detection (FMD), is based on (3.23); for more details, see [122]. It is compared with the benchmark EC, together with AGM, FTM, and MME, as shown in Figure 3.2. FMD is 3 dB better than EC, and for the simulated sinusoidal signal the gain of FMD over EC is 5 dB. The longer the data record, the larger this gain.

Figure 3.2 Probability of detection. (a) Simulated Narrowband Signal; (b) Measured DTV Data.

3.2

3.5 The Multivariate Normal Distribution

The multivariate normal (MVN) distribution is the most important distribution in science and engineering [123, p. 55]. The reasons are manifold: Central limit theorems make it the limiting distributions for some sums of random variables, its marginal distributions are normal, linear transformations of multivariate normals are also multivariate normal, and so on. Let

images

denote an N × 1 random vector. The mean of X is

images

The covariance matrix of X is

images

The random vector X is said to be multivariate normal if its density function is

3.24 3.21

We assume that the nonnegative definite matrix R is nonsingular. Since the integral of the density function is unity, this leads to

images

The quadratic form

images

is a weighted norm called Mahalanobis distance from x to m.

Characteristic Function

The characteristic function of X is the multidimensional Fourier transform of the density

images

By some manipulation [123], we have

3.25 3.22

The characteristic function itself is a multivariate normal function of the frequency variable images.

Linear Transforms

Let Y be a linear transformation of a multivariate normal random variable:

images

The characteristic function of Y is

images

Thus, Y is also a multivariate normal random variable with a new mean vector and new variance matrix

images

if the matrix ATRA is nonsingular.

Diagonalizing Transforms

The correlation matrix is symmetric and nonnegative definite. In other words, R ≥ 0. Therefore, there exists an orthogonal matrix U such that

images

The vector Y = UTX is distributed as

images

The random variables Y1, Y2, …, YN are uncorrelated since

images

In fact, Yn are independent normal random variables with mean UTm and variances images:

images

This transformation Y = UTX is called a Karhunen-Loeve or Hotelling transform. It simply diagonalizes the covariance matrix

images

Such a transform can be implemented in MATLAB, using a function called eig or svd.
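A small MATLAB sketch of the diagonalizing (Karhunen-Loeve/Hotelling) transform using eig, with an illustrative covariance matrix R:

R = [2 0.9 0.3; 0.9 1.5 0.4; 0.3 0.4 1];   % illustrative covariance matrix (R > 0)
[U, Lambda] = eig(R);                      % R = U*Lambda*U', U orthogonal
N = 10000;
X = chol(R, 'lower')*randn(3, N);          % zero-mean samples with covariance R
Y = U'*X;                                  % Y = U'*X decorrelates the components
disp(cov(Y', 1));                          % approximately diagonal, close to Lambda
disp(Lambda);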

Quadratic Forms in MVN Random Variables

Linear functions of MVN random vectors remain MVN. In the GLRT, quadratic forms of MVNs are involved, so the natural question is: what about quadratic forms of MVNs? In some important cases, the quadratic forms have a χ² distribution. Let X denote an images random variable. The distribution

images

is images distributed. The characteristic function of Q is

images

which is the characteristic function of a chi-squared distribution with N degrees of freedom, denoted images. The density function for Q is the inverse Fourier transform

images

The mean and variance of Q are obtained from the characteristic function

EQ = N,  var(Q) = 2N.
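These moments are easy to confirm by simulation. A minimal MATLAB sketch (the dimension and the number of trials are illustrative): generate X ~ N(0, I), form Q = XᵀX, and compare the sample mean and variance of Q with N and 2N.

N = 6; M = 200000;               % dimension and number of Monte Carlo trials
X = randn(N, M);                 % columns are independent N(0, I_N) vectors
Q = sum(X.^2, 1);                % quadratic form Q = X'*X for each column
fprintf('mean(Q) = %.3f (theory %d), var(Q) = %.3f (theory %d)\n', ...
        mean(Q), N, var(Q), 2*N);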

Sometimes we encounter more general quadratic forms in the symmetric matrix P:

images

The mean and variance of Q are

images

The characteristic function of Q is

images

If PR is symmetric: PR = RP, the characteristic function is

images

where λn are eigenvalues of PR. This is the characteristic function of a images random variable iff

images

If R = I, meaning X consists of independent components, then the quadratic form Q is images iff P is idempotent:

images

Such a matrix is called a projection matrix. We have the following result: if X is images and P is a rank r projection, the linear transformation Y = PX is images, and the quadratic form YTY = XTPX is images. More generally, if X is images, R = UΛ2UT, images and P is chosen to be images with images, then images and the quadratic form YTY is images.

Let Q = (Xm)TR−1(Xm) where X is images. Equivalently,

images

is a quadratic form in the i.i.d. images random variables X1, X2, ···, XN. We have shown that Q is images. Form the random variable

images

The new random variable V asymptotically has mean 0 and variance 1, that is, asymptotically images.

The Matrix Normal Distribution

Let X(n × N) be a matrix whose rows images, are independently distributed as images. Then X has the matrix normal distribution and represents a random matrix observation from images. Using (3.24), we find that the density function of X is [110]

images

where 1 is a column vector having the number 1 as its elements.

Transformation of Normal Data Matrices

We often encounter random vectors. Let

images

be a random sample from images [110]. We call

images

a data matrix from images or simply a “normal data matrix.” This matrix is a basic building block in spectrum sensing. We must understand it thoroughly. In practice, we deal with data with high dimensionality. We use this notion as our basic information elements in data processing.

Consider linear functions

images

where A(m × n) and B(p × q) are fixed matrices of real numbers. The most important linear function is the sample mean

images

where A = n−11T and B = Ip.


Theorem 3.1 (Sample mean is normal)
If X(n × p) is a data matrix from Np(μ, Σ) and x̄ = n⁻¹Xᵀ1, then x̄ is distributed as Np(μ, n⁻¹Σ).

 


Theorem 3.2 (Y = AXB is a normal data matrix)
If X(n × p) is a normal data matrix from Np(μ, Σ), and if Y = AXB, then Y is a normal data matrix if and only if
1. A1 = α1 for some scalar α, or Bᵀμ = 0, and
2. AAᵀ = βI for some scalar β, or BᵀΣB = 0.
When both conditions are satisfied, Y is a normal data matrix from Nq(αBᵀμ, βBᵀΣB).

 


Theorem 3.3 (The elements of Y = AXB are independent of those of Z = CXD)
If X(n × p) is a normal data matrix from Np(μ, Σ), and if Y = AXB and Z = CXD, then the elements of Y are independent of those of Z if and only if
1. BᵀΣD = 0, or
2. ACᵀ = 0.

Under the conditions of Theorem 3.3, x̄ is independent of HX, and thus is independent of S = n⁻¹XᵀHX.

The Wishart Distribution

We often encounter the form XTCX where C is a symmetric matrix. This is a matrix-valued quadratic function. The most important special case is the sample covariance matrix obtained by putting C = n−1H, where H is the centering matrix. These quadratic forms often lead to the Wishart distribution, which is a matrix generalization of the univariate chi-squared distribution, and has many similar properties.

If M(p × p) can be written as

M = XᵀX,

where X(m × p) is a data matrix from Np(0, Σ), then M is said to have a Wishart distribution with scale matrix Σ and degrees of freedom parameter m. We write

M ~ Wp(Σ, m).

When Σ = Ip, the distribution is said to be in standard form.

When p = 1, the W1(σ², m) distribution is given by xᵀx, where the elements of x(m × 1) are i.i.d. N(0, σ²) variables; that is, the W1(σ², m) distribution is the same as the σ²χ²m distribution.

The scale matrix Σ plays the same role in the Wishart distribution as σ² does in the σ²χ²m distribution. We shall usually assume

Σ > 0.


Theorem 3.4 (The class of Wishart matrix is closed under linear transformation)
If images and B is a (p × q) matrix, then

images


The diagonal submatrices of M themselves have a Wishart distribution. Also,

images

If M ~ Wp(I, m) and B(p × q) satisfies BTB = Iq, then

images


Theorem 3.5 (Ratio transform)
If images, and a is any fixed p-vector such that images then

images


Also, we have images


Theorem 3.6 (The class of Wishart matrix is closed under addition)
If images and images, then

images


 


Theorem 3.7 (Cochran, 1934)
If X(n × p) is a data matrix from images, and if C(n × n) is a symmetric matrix, then
1. XᵀCX has the same distribution as a weighted sum of independent Wp(Σ, 1) matrices, where the weights are the eigenvalues of C.
2. XᵀCX has a Wishart distribution if and only if C is idempotent, in which case

XᵀCX ~ Wp(Σ, r),

where r is the rank of C, r = TrC = rank C;
3. if S = n⁻¹XᵀHX is the sample covariance matrix, then

nS ~ Wp(Σ, n − 1).


 


Theorem 3.8 (Craig, 1943; Lancaster, 1969, p. 23)
If the rows of X(n × p) are i.i.d. images, and if C1, …, Ck are symmetric matrices, then

images

are jointly independent if CrCs = 0 for all r ≠ s.

The Hotelling T2 Distribution

[110, p. 73] Let us study functions such as dᵀM⁻¹d, where d is normal, M is Wishart, and d and M are independent. For example, d may be the sample mean and M proportional to the sample covariance matrix. Hotelling (1931) initiated the work to derive the general distribution of such quadratic forms.

If α is used to represent mdTM−1d where d and M are independently distributed as images and Wp(I, m), respectively, then we say that α has the Hotelling T2 distribution with parameters p and m. We write α ~ T2(p, m).


Theorem 3.9 (T2 distribution)
If x and M are independently distributed as images and images, respectively, then

images


If images and S are the mean vector and covariance matrix of a sample of size n from images and Su = (n/(n − 1))S, then

images

The T2 statistic is invariant under any nonsingular linear transformation x → Ax + b. images where B(·, ·) is a beta variable.


Theorem 3.10 (dTMd is independent of M + dTd)
If d and M are independently distributed as images and W(I, m), respectively. Then, dTMd is independent of M + dTd.

Wilks' Lambda Distribution


Theorem 3.11 (Wilks' lambda distribution)
If images and images are independent, and if m ≥ p and n ≥ p, then

images

is proportional to the product of p independent F variables, of which the ith has degrees of freedom n − i + 1 and m − i + 1.

If images and images are independent, and if m ≥ p, we say that

images

has a Wilks' lambda distribution with parameters p, m, n.

The Λ family of distributions occurs frequently in the context of the likelihood ratio test. The parameter p is the dimension; m represents the “error” degrees of freedom and n the “hypothesis” degrees of freedom, so m + n represents the “total” degrees of freedom. Like the T2 statistic, Wilks' lambda distribution is invariant under changes of the scale parameters of A and B.


Theorem 3.12 (Independent variables Wilks' lambda distribution)

images

where u1, …, un are independent variables and

images


 


Theorem 3.13 (Total degrees of freedom)
The images and images distributions are the same.

If images is independent of images, where m ≥ p, then the largest eigenvalue θ of (A + B)⁻¹B is called the greatest root statistic and its distribution is denoted θ(p, m, n).

If λ is an eigenvalue of A⁻¹B, then λ/(1 + λ) is an eigenvalue of (A + B)⁻¹B. Since this is a monotone function of λ, θ is given by

θ = λ1/(1 + λ1),

where λ1 is the largest eigenvalue of A⁻¹B. Since λ1 > 0, we see that 0 ≤ θ ≤ 1.

For multisample hypotheses, see [110, p. 138].

Geometric Ideas

The multivariate normal distribution in N dimensions has constant density on ellipses or ellipsoids of the form

(x − m)ᵀR⁻¹(x − m) = c²,    (3.26)

where c is a constant. These ellipsoids are called the contours of the distribution, or the ellipsoids of equal concentration. For m = 0, these contours are centered at x = 0, and when R = σ²I the contours are circles or, in higher dimensions, spheres or hyperspheres.

The principal component transformation facilitates interpretation of the ellipsoids of equal concentration. Using the spectral decomposition

images

where images is the matrix of eigenvalues of images, and images is an orthogonal matrix whose columns are the corresponding eigenvectors. As in Section 3.3.1, define the principal component transform by

images

In terms of y, (3.26) becomes

images

so that the components of y represent the axes of the ellipsoid.

In the GLRT, the following difference between two ellipsoids is encountered

3.27 3.24

where d is a constant. When the actual covariance matrices images and images are perfectly known, there is no difficulty. A technical difficulty arises from the fact that the sample covariance matrices images and images are used in place of images and images. The fundamental problem is to guarantee that (3.27) retains its geometric meaning; in other words, this requires

3.28 3.25

As in Section A.3, the trace function, TrA = ∑iaii, satisfies the following properties [110] for matrices A, B, C, D, X and scalar α:

3.29 3.26

Using the fact that for A, B ≥ 0, we have

images

we have the necessary condition (for (3.28) to be valid)

3.30 3.27

since

3.31 3.28

The necessary condition (3.30) is easily satisfied when the actual covariance matrices images and images are known. When, in practice, the sample covariance matrices images and images are used instead, the problem becomes

3.32 3.29

This leads to the natural problems of sample covariance matrix estimation and the related GLRT. The fundamental problem is that the GLRT requires the exact probability distribution functions for the two alternative hypotheses, a condition that is rarely met in practice. Another problem is the implicit assumption that the random vectors follow the exact probability distribution function; in fact, the empirical probability distribution function does not satisfy this condition. This problem is more obvious when we deal with high data dimensionality, such as N = 10⁵ or 10⁶, which is commonly encountered in real-world spectrum sensing problems. For example, N = 100,000 corresponds to about 4.65 ms of sampling time.

3.6 Sample Covariance Matrix Estimation and Matrix Compressed Sensing

Fundamental Fact: The noise lies in a high-dimensional space; the signal, by contrast, lies in a much lower-dimensional space.

If a random matrix A has i.i.d. rows Ai, then

A*A = ∑i AiAiᵀ,

where A* is the adjoint matrix of A. We often study A through the n × n symmetric, positive semidefinite matrix A*A; in other words

images

The absolute matrix is defined as

|A| = (A*A)^{1/2}.

A matrix C is positive semidefinite,

C ≥ 0,

if and only if all its eigenvalues λi are nonnegative,

λi(C) ≥ 0.

The eigenvalues of |A| are therefore nonnegative real numbers, or

λi(|A|) ≥ 0.

An immediate application of random matrices is the fundamental problem of estimating covariance matrices of high-dimensional distributions [107]. The analysis of the row-independent models can be interpreted as a study of sample covariance matrices.

There are deep results in random matrix theory. The main motivation of this subsection is to exploit the existing results in this field to better guide the estimate of covariance matrices, using nonasymptotic results [107].

We focus on a general model of random matrices, where we only assume independence of the rows rather than all entries. Such matrices are naturally generated by high-dimensional distributions. Indeed, given an arbitrary probability distribution in images, one takes a sample of N independent points and arranges them as the rows of an N × n matrix A. Recall that n is the dimension of the probability space.

Let X be a random vector in ℝⁿ. For simplicity we assume that X is centered, or

EX = 0,

where EX denotes the expectation of X. The covariance matrix of X is the n × n matrix

Σ = E[XXᵀ].

The simplest way to estimate Σ is to take N independent samples Xi from the distribution and form the sample covariance matrix

ΣN = N⁻¹ ∑_{i=1}^{N} XiXiᵀ.

By the law of large numbers,

ΣN → Σ

almost surely as N → ∞. So, taking sufficiently many samples, we are guaranteed to estimate the covariance matrix as well as we want. This, however, does not address the quantitative aspect: what is the minimal sample size N that guarantees approximation with a given accuracy?

The relation of this question to random matrix theory becomes clear when we arrange the samples

Xi =: Ai

as the rows of the N × n random matrix A. Then, the sample covariance matrix is expressed as

ΣN = N⁻¹A*A.

Note that A is a matrix with independent rows but usually not independent entries. Reference [107] has worked out the analysis of such matrices, separately for sub-Gaussian and general distributions.

We often encounter covariance matrix estimation for a sub-Gaussian distribution due to the presence of Gaussian noise. Consider a sub-Gaussian distribution in ℝⁿ with covariance matrix Σ, and let ε ∈ (0, 1), t ≥ 1. Then with probability at least

1 − 2exp(−t²n),

one has

if N ≥ C(t/ε)²n, then ||ΣN − Σ|| ≤ ε.    (3.33)

Here C depends only on the sub-Gaussian norm of a random vector taken from this distribution; the spectral norm of A is denoted by ||A||, which is equal to the maximum singular value of A, that is,

||A|| = smax(A).

Covariance matrix estimation for an arbitrary distribution is also encountered when a general noise-plus-interference model is used. Consider a distribution in ℝⁿ with covariance matrix Σ supported in some centered Euclidean ball whose radius we denote by √m. Let ε ∈ (0, 1) and t ≥ 1. Then, with probability at least 1 − n^{−t²}, one has

if N ≥ C(t/ε)²||Σ||⁻¹m log n, then ||ΣN − Σ|| ≤ ε||Σ||.    (3.34)

Here C is an absolute constant and log denotes the natural logarithm. In (3.34), typically

m = O(||Σ||n).

Thus the required sample size is

N ≥ C(t/ε)²n log n.

Low-rank estimation is often used, since the distribution of a signal in ℝⁿ lies close to a low-dimensional subspace. In this case, a much smaller sample size suffices for covariance estimation. The intrinsic dimension of the distribution can be measured with the effective rank of the matrix Σ, defined as

r(Σ) = TrΣ/||Σ||,    (3.35)

where TrΣ denotes the trace of Σ. One always has

r(Σ) ≤ rank(Σ) ≤ n,

and this bound is sharp. The effective rank

r = r(Σ)

always controls the typical norm of X, as

E||X||² = TrΣ = r||Σ||.

Most of the distribution is supported in a ball of radius √m, where

m = O(r||Σ||).

The conclusion of (3.34) holds with sample size

N ≥ C(t/ε)²r log n.

Summarizing the above discussion, (3.33) shows that the sample size

N = O(n)

suffices to approximate the covariance matrix of a sub-Gaussian distribution in ℝⁿ by the sample covariance matrix, while for an arbitrary distribution

N = O(n log n)

is sufficient. For distributions that are approximately low-dimensional, such as that of a signal, a much smaller sample size is sufficient. Namely, if the effective rank of Σ equals r, then a sufficient sample size is

N = O(r log n).

As in Section 3.3.1, our standard problem (3.15) is

H0: Ry = Rw,
H1: Ry = Rs + Rw,

where Rs and Rw are, respectively, the covariance matrices of the signal and the noise. When the sample covariance matrices are used in place of the actual covariance matrices, it follows that

3.36 3.33

We must exploit the fundamental fact that estimating the signal covariance requires only O(r log n) samples, while estimating the noise covariance requires O(n). Here the effective rank r of Rs is small; for one real sinusoidal signal, the rank is only two, that is, r = 2. We can sum up K sample covariance matrices, for example K = 200. Let us consider the following three-step algorithm:

1. Break the long record of data into a total of K segments, each of length p; in other words, a total of pK samples is available for signal processing:

3.37 3.34

2. Choose p such that p > O(r log n), so that the sample covariance matrix of each segment accurately approximates the actual covariance matrix:

3.38 3.35

3. Sum up the K estimated sample covariance matrices:

3.39 3.36

where, without loss of generality, we have assumed Rs,k = Rs,1 for all k.

In Step 3, the signal part adds up coherently while the noise part adds up incoherently. This step enhances the SNR, which is especially relevant for low-SNR signal detection. This basic idea also underlies the chapter on quantum detection.
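A minimal MATLAB sketch of the three-step procedure; the segment length, the number of segments K, the vector dimension, and the toy sinusoid below are illustrative choices. Averaging the K segment-wise sample covariance matrices drives the noise part toward the identity while the rank-2 sinusoidal part is preserved, so its two eigenvalues stand out.

K = 200; p = 256; n = 8;                 % segments, segment length, vector dimension
w0 = 0.25*pi; amp = 0.3;                 % weak narrowband signal (illustrative)
Rsum = zeros(n);
for k = 1:K
    t = (k-1)*p + (1:p+n-1);
    y = amp*sin(w0*t) + randn(1, p+n-1); % one data segment under H1
    Y = zeros(n, p);
    for m = 1:p
        Y(:, m) = y(m:m+n-1).';          % n x 1 smoothed vectors
    end
    Rsum = Rsum + (Y*Y')/p;              % add the k-th sample covariance matrix
end
Ravg = Rsum/K;                           % averaged covariance: noise term -> identity
disp(sort(eig(Ravg), 'descend').');      % two eigenvalues rise above the noise floor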

Another underlying idea is to develop a nonasymptotic theory for signal detection. Given a finite number of (complex) data samples collected into a random vector x, the vector length n is very large but not infinite; in other words, n < ∞. We cannot simply resort to the central limit theorem to argue that the probability distribution function of this vector x is Gaussian, since this theorem requires n → ∞.

The recent work on compressed sensing is highly relevant to this problem. Given n, we can develop a theory that is valid with an overwhelming probability. As a result, one cornerstone of our development here is compressed sensing to recover the “information” from n data points x. Another cornerstone is the concentration of measure to study sums of random matrices. For example what are the statistics of the resultant random matrix images?

This problem is closely related to classical multivariate analysis [110, p. 108]. Section 3.6.1 will show that the sum of the random matrices is the ML estimation of the actual covariance matrix.

3.6.1 The Maximum Likelihood Estimation

The problem of Section 3.6.1 is closely related to classical multivariate analysis [110, p. 108]. Given k independent data matrices

images

where the rows of Xi(n × p) are i.i.d.

images

what is the ML estimate of the covariance matrix?

In practice, the most common constraints are

images

or

images

If (b) holds, we can treat all the data matrices as constituting one matrix sample from a single population (distribution).

Suppose that x1, …, xn is a random vector sample from a population (distribution) with the pdf images, where images is a parameter vector. The likelihood function of the whole sample is

images

Given a matrix sample X, both images and images are considered as functions of the vector parameter images.

Suppose x1, …, xn is a random sample from images. We have

images

and

3.40 3.37

Let us simplify these equations. When the identity

images

is summed over the index i = 1, …, n, the final term on the right-hand side vanishes, giving

3.41 3.38

Since each term images is a scalar, it equals the trace of itself. Thus using

images

we have

3.42 3.39

Summing (3.42) over index i and substituting in (3.41) gives

3.43 3.40

Writing

images

and using (3.40) in (3.43), we have

3.44 3.41

For the special case images and images then (3.44) becomes

3.45 3.42

To calculate the ML estimation if (a) holds, from (3.44), we have

3.46 3.43

where Si is the covariance matrix of the i-th matrix sample, i = 1, .., k, and images Since there is no restriction on the population means, the ML estimation of images is images. Setting

images

(3.46) becomes

3.47 3.44

Differentiating (3.47) with respect to images and equating to zero yields

3.48 3.45

which is the ML estimation of images under the conditions stated.

3.6.2 Likelihood Ratio Test (Wilks' Λ Test) for Multisample Hypotheses

Consider k independent normal matrix samples X1, …, Xk, whose likelihood is considered in Section 3.6.1.

images

In Section 3.6.1, the ML estimation under images is images and S, since the observation can be viewed under images as constituting a single random matrix sample. The ML estimation of images under the alternative hypothesis images is images, the i-th sample mean, and the ML estimation of the common sample covariance matrix is n−1W, where, following (3.47), we have

images

is the “within-groups” sum of squares and products (SSP) matrix and images Using (3.46), the LRT is given by

3.49 3.46

Here

images

is the “total” SSP matrix, derived by regarding all the data matrices as if they constituted a single matrix sample. In contrast, the matrix W is the “within-group” SSP and

images

may be regarded as the “between-groups” SSP matrix. Thus, from (3.49), we have

images

The matrix W−1B is an obvious generalization of the univariate variance ratio. It will tend to zero if images is true.

If n ≥ p + k, then under images,

3.50 3.47

where Wilks' Λ statistics is described in Section 3.5.

Let us derive the statistics of (3.50). Write the k matrix samples as a single data matrix

images

where Xi(ni × p) is the i-th matrix sample, i = 1, …, k. Let 1i be the n-vector with 1 in the places corresponding to the i-th sample and 0 elsewhere, and set Ii = diag(1i). Then I = ∑Ii, and 1 = ∑1i. Let

images

be the centering matrix for the i-th matrix sample. The sample covariance matrix can be expressed as

images

Set

images

We can easily verify that

images

Further, C1 and C2 are idempotent matrices of rank n − k and k − 1, respectively, and C1C2 = 0.

Under images, X is a data matrix from images. Thus, by Theorems 3.7 and 3.8, we have

images

and, furthermore, W and B are independent. Therefore, (3.50) is derived.

3.7 Likelihood Ratio Test

3.7.1 General Gaussian Detection and Estimator-Correlator Structure

The most general signal assumption allows the signal to be composed of a deterministic component and a random component. The signal can then be modeled as a random process, with the deterministic part corresponding to a nonzero mean and the random part corresponding to a zero-mean random process with a given signal covariance matrix. For generality, the noise covariance matrix can be assumed to be arbitrary. These assumptions lead to the general Gaussian detection problem [118, p. 167], which mathematically is written as

images

Define the column vector

3.51 3.48

and similarly for x and w. In vector and matrix form

images

where images, and images, and x and w are independent. The LRT decides images if

images

where

images

Taking the logarithm, retaining only the data-dependent terms, and scaling produces the test statistic

3.52 3.49

From the matrix inverse lemma [114, p. 43], [118, p. 167],

3.53 3.50

for images, (3.52) is rewritten as

images

where images

is the MMSE estimator of x. This is a prewhitener followed by an estimator-correlator.

Using the properties of the trace operation (see Section A.3)

3.54 3.51

where α is a constant, we have

3.55 3.52

where the matrix yy* is of size N × N and of rank one. We can easily choose T0 > 0, since T(y) > 0, as will become obvious later. Therefore, (3.55) can be rewritten as

3.56 3.53

Let A ≥ 0 and B ≥ 0 be of the same size. Then [112, p. 329] (see Section A.6.2)

3.57 3.54

With the help of (3.57), on one hand, (3.56) becomes

images

Or,

3.58 3.55

if

3.59 3.56

Using (3.57), on the other hand, (3.55) becomes

3.60 3.57

3.61 3.58

if

3.62 3.59

(which is identical to (3.59)) whose necessary condition to be valid is:

3.63 3.60

due to the following fact: if A ≤ B for A ≥ 0 and B ≥ 0, then (see Section A.6.2)

3.64 3.61

Note that A−1 + B−1 ≠ (B + A)−1.

Starting from (3.53), we have

3.65 3.62

with

3.66 3.63

where (3.54) is used in the second step and, in the third step, we use the following [114, p. 43]

images

Let A > 0 and B > 0 be of the same size. Then [114, p. 181]

3.67 3.64

where λmax(A) is the largest eigenvalue of A. Due to the facts A > 0 and B > 0 given in (3.66), with the help of (3.65) and (3.67), (3.61) becomes

3.68 3.65

Combining (3.65) and (3.67) yields

3.69 3.66

Using (3.69), (3.58) turns out to be

images

since

images

Here |a| is the absolute value of a. The necessary condition (3.63) is satisfied even if the covariance matrix of the signal, Cx, is extremely small compared with that of the noise, Cw. This is critical to the practical problem at hand: sensing of an extremely weak signal in the spectrum of interest. At first sight, the necessary condition (3.63) seems trivial to achieve; this intuition, however, is false. At extremely low SNR, this condition is too strong to be satisfied, and the lack of sufficient samples to estimate the covariance matrices Cw and Cx is the root of the technical difficulty. Fortunately, the signal covariance matrix Cx lies in a space of much lower rank than that of the noise covariance matrix Cw. This fundamental difference in rank is the departure point for most of the developments in Chapter 4.

(3.68) is expressed in a form that is convenient in practice: only the traces of these covariance matrices are needed! The traces are, of course, positive scalar real values which, in general, are random variables. The necessary condition for (3.68) to be valid is (3.63), that is, Cw < Cx + Cw.

3.7.1.1 Divergence

Let y: images[0, Cy] denote an N × 1 normal random vector with mean zero and covariance matrix Cy. The problem considered here is the test:

images

In particular, we have C0 = Cw and C1 = Cw + Cx. Divergence [123] is a coarse measure of how the log likelihood distinguishes between images and images:

images

where

images

The matrix Q can be rewritten as

images

The log likelihood ratio can be rewritten as

images

The transformed vector z is distributed as images[0, Cz], with Cz = I for images and Cz = S for images. S is called the signal-to-noise-ratio matrix.

3.7.1.2 Orthogonal Decomposition

The matrix S has an orthogonal decomposition

images

where Λ is a diagonal matrix with diagonal elements λi and U satisfies UTU = I. This implies that images solves the generalized eigenvalue problem

images

With this representation for S, the log likelihood ratio may be expressed as

images

where the random vector y has covariance matrix images under images and I under images:

images

3.7.1.3 Rank Reduction

A reduced-rank version of the log likelihood ratio is

images

where Ir and images are reduced-rank versions of I and images that retain r nonzero terms and N − r zero terms.

This rank reduction is fine whenever the discarded eigenvalues λi are unity (noise components). The problem is that nonunity eigenvalues are sometimes discarded at a significant penalty. We introduce a new criterion, divergence, a coarse measure of how the log likelihood distinguishes between images and images

images

We emphasize that it is the sum of images, not images alone, that determines the contribution of an eigenvalue, so discarding small eigenvalues incurs a penalty. At extremely low SNR, the eigenvalues are almost uniformly distributed, and it is difficult to do reduced-rank processing. The trace sum of S and S−1 must be considered as a whole. Obviously, we require that

images

since

images

where A, B are Hermitian matrices. The matrix inequality condition images implies

images

or,

images

The rank-r divergence is identical to the full-rank divergence when N − r of the eigenvalues in the original diagonal matrix images are unity. To illustrate, consider the case

images

The difference between C1 and C0 is

images

Assuming images, then rank(R1 − R0) = rank(L), and a rank(L) detector has the same divergence as a full-rank detector.
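The following sketch illustrates rank reduction guided by the divergence. It assumes the standard per-eigenvalue divergence contribution λi + 1/λi − 2 for Gaussian data (eigenvalues equal to one contribute nothing), and the eigenvalues themselves are made-up values.

import numpy as np

lam = np.array([3.0, 1.6, 1.05, 1.0, 1.0])    # eigenvalues of S (assumed example)

# Contribution of each eigenvalue to the divergence: lambda + 1/lambda - 2
contrib = lam + 1.0 / lam - 2.0
full_rank_div = 0.5 * contrib.sum()

r = 2
keep = np.argsort(contrib)[::-1][:r]          # keep the r largest contributors
reduced_div = 0.5 * contrib[keep].sum()

print("full-rank divergence:", full_rank_div)
print(f"rank-{r} divergence  :", reduced_div)
# Discarding eigenvalues equal to one costs nothing; discarding the others does.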


Example 3.6 (An optimal rank-one detector)
Consider building a low-rank detector for H0 versus H1 when the observed data is distributed as N(0, R0) under H0 and N(0, R1) under H1:

images

After some manipulation, we get the following signal-to-noise-ratio matrix:

images

The eigenvalues of S are

images

A rank-one detector is optimum for this problem. It is constructed using the eigenvector corresponding to λ1. At extremely low SNR, the eigenvector is a more reliable feature to detect than the eigenvalue. In N-dimensional geometric terms, the eigenvector gives the direction of the data point, while the eigenvalue gives its length.
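A numerical sketch of such a rank-one detector is given below: the data are projected onto the dominant eigenvector of the SNR matrix and the energy in that direction is compared with a threshold. The covariances, data, and threshold are assumptions made for illustration rather than the quantities of this example.

import numpy as np

rng = np.random.default_rng(1)
N = 4
R0 = np.eye(N)                                   # noise covariance (assumed)
h = rng.standard_normal(N)
R1 = R0 + 0.5 * np.outer(h, h)                   # rank-one signal added (assumed)

# SNR matrix S = R0^(-1/2) R1 R0^(-1/2); with R0 = I this is just R1
S = R1
lam, U = np.linalg.eigh(S)
u1 = U[:, -1]                                    # eigenvector of the largest eigenvalue

def rank_one_statistic(y):
    # Energy of the data in the direction of the dominant eigenvector
    return (u1 @ y) ** 2

y_h1 = rng.multivariate_normal(np.zeros(N), R1)
y_h0 = rng.multivariate_normal(np.zeros(N), R0)
print("statistic under H1:", rank_one_statistic(y_h1))
print("statistic under H0:", rank_one_statistic(y_h0))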

 


Example 3.7 (Gaussian signal plus Gaussian noise)
We can extend the previous example to a general Gaussian signal problem

images

yielding

images

Clearly, all eigenvalues are greater than one. This does not mean, however, that eigenvalues close to one cannot be discarded in order to approximate the log likelihood with a low-rank detector. At extremely low SNR, this approximation will often fail, since all of the eigenvalues may be close to one.

3.7.2 Tests with Repeated Observations

Consider a binary hypothesis problem [124] with a sequence of independent, identically distributed random vectors y1, …, yK. If

images

denotes the likelihood ratio function of a single observation, then an LRT for this problem takes the form

(3.70)

where τ(K) denotes a threshold that may depend on the number of observations, K. Taking the logarithm of both sides of (3.70) and denoting Zk = ln Λ(yk), we find that

images

When τ(K) tends to a constant as K → ∞, images
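In code, the repeated-observation LRT simply accumulates the per-snapshot log-likelihood ratios Zk and compares the sum with the log threshold. The sketch below assumes real zero-mean Gaussian data and placeholder covariances C0 and C1.

import numpy as np

def gauss_loglik(y, C):
    # Log density of a zero-mean real Gaussian vector y with covariance C
    N = len(y)
    _, logdet = np.linalg.slogdet(C)
    quad = y @ np.linalg.solve(C, y)
    return -0.5 * (N * np.log(2 * np.pi) + logdet + quad)

def lrt_repeated(Y, C0, C1, tau):
    # Sum of per-observation log-likelihood ratios Z_k compared with ln(tau)
    Z = np.array([gauss_loglik(y, C1) - gauss_loglik(y, C0) for y in Y])
    return Z.sum(), Z.sum() > np.log(tau)

# Assumed example: K snapshots of length N drawn under H1
rng = np.random.default_rng(2)
N, K = 4, 50
C0 = np.eye(N)
C1 = C0 + 0.3 * np.ones((N, N))     # correlated signal component (assumed)
Y = rng.multivariate_normal(np.zeros(N), C1, size=K)
print(lrt_repeated(Y, C0, C1, tau=1.0))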

Taking the logarithm, retaining only the data-dependent terms, and scaling produces the test statistic (setting μx = 0 for brevity)

(3.71)

Using the following fact

(3.72)

(3.71) becomes

(3.73)

If the following sufficient condition for the LRT for repeated observations is satisfied

(3.74)

implying (see Section A.6.2)

images

using (3.57), we have

(3.75)

Or,

(3.76)

The covariance matrix of y is defined as

images

The asymptotic case of K → ∞ is

(3.77)

since the sample covariance matrix converges to the true covariance matrix, that is

images

To guarantee (3.74), the following stronger condition suffices

(3.78)

due to (3.64).

3.7.2.1 Case 1. Diagonal Covariance Matrix on images: Equal Variances

Assume that x[n] is a complex Gaussian random process with zero mean and covariance matrix Cx, and that w[n] is complex white Gaussian noise (CWGN) with variance matrix images. The probability density functions (PDFs) are given by

images

where I is the N × N identity matrix and TrI = N.

The log-likelihood ratio is

images

Consider the following special case:

images

and

images

The LRT of (3.79) becomes

(3.79)

Using the matrix inverse lemma

images

we have

(3.80)

when the signal is very weak, or, equivalently, images. TrCy is the total power (variation) of the received signal plus noise. Note that the yk are vectors of length N.

For the single-observation case, K = 1, we have

images

which is the energy detector. Intuitively, if the signal is present, the energy of the received data increases. In fact, the equivalent test statistic images can be regarded as an estimator of the variance. Comparing this statistic to a threshold exploits the fact that the variance under H0 is images but under H1 is images.
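A minimal sketch of the energy detector follows; the noise variance, threshold, and weak-signal model are assumed for illustration (in practice the threshold would be set from the desired probability of false alarm).

import numpy as np

rng = np.random.default_rng(3)
N = 128
sigma2 = 1.0                                  # noise variance (assumed known)

def energy_statistic(y):
    # Average energy (1/N) sum |y[n]|^2, an estimate of the variance of y
    return np.mean(np.abs(y) ** 2)

noise = rng.standard_normal(N)
signal_plus_noise = noise + 0.4 * rng.standard_normal(N)   # weak Gaussian signal (assumed)

gamma = 1.2 * sigma2                          # threshold (assumed)
for label, y in [("H0", noise), ("H1", signal_plus_noise)]:
    T = energy_statistic(y)
    print(label, T, "declare signal" if T > gamma else "declare noise only")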

3.7.2.2 Case 2. Correlated Signal

Now assume N = 2 and

images

where ρ is the correlation coefficient between x[0] and x[1].

images

where

images

where

images

If A > 0, then

images

Obviously, Cw + Cx > 0. From (3.77), we have

(3.81)

3.7.3 Detection Using Sample Covariance Matrices

If we know the covariance matrices perfectly, we have

images

In practice, estimated covariance matrices such as sample covariance matrices must be used:

images

where A0 and A1 are sample covariance matrices for the noise while B is the sample covariance matrix for the signal. The eigenvalues λi of

images

where A is positive semidefinite and B positive definite, satisfy

images

Let A be positive definite and B symmetric such that det(A + B) ≠ 0. Then

images

The inequality, known as Olkin's inequality, is strict if and only if B is nonsingular.

Wilks' lambda test is given by [110, p. 335]

images

where λj are the eigenvalues of A−1B.
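Assuming the standard form of Wilks' lambda, Λ = det A / det(A + B) = ∏j 1/(1 + λj) with λj the eigenvalues of A−1B, the identity is easy to check numerically; the matrices below are random placeholders.

import numpy as np

rng = np.random.default_rng(4)
p = 3
M = rng.standard_normal((p, 2 * p))
A = M @ M.T                                            # positive definite "noise" matrix (assumed)
Mb = rng.standard_normal((p, 2 * p))
B = Mb @ Mb.T                                          # positive semidefinite "signal" matrix (assumed)

lam = np.linalg.eigvals(np.linalg.solve(A, B)).real    # eigenvalues of A^{-1} B

wilks_det = np.linalg.det(A) / np.linalg.det(A + B)
wilks_eig = np.prod(1.0 / (1.0 + lam))
print(wilks_det, wilks_eig)                            # the two expressions agree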

The equicorrelation matrix is [112, p. 241]

images

or,

images

where ρ is any real number and J is the all-ones matrix Jp = 11T, with 1 = (1, …, 1)T. Then eii = 1 and eij = ρ for i ≠ j. For statistical purposes this matrix is most useful for −(p − 1)−1 < ρ < 1. Direct verification shows that, if ρ ≠ 1 and ρ ≠ −(p − 1)−1, then E−1 exists and is given by

images

Its determinant is given by

images

Since J has rank 1, with eigenvalue p and corresponding eigenvector 1, we see that the equicorrelation matrix E = (1 − ρ)I + ρJ has eigenvalues:

images

and the same eigenvectors as J. The quantity logdet A is bounded:

images

where A and A + B are positive definite, with equality if and only if B = 0. If A ≥ 0 and B ≥ 0, then [112, p. 329]

images
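Returning to the equicorrelation matrix, its eigenvalue and determinant structure stated above is easy to verify numerically (the values of p and ρ below are arbitrary choices):

import numpy as np

p, rho = 5, 0.3
J = np.ones((p, p))                              # all-ones matrix J = 1 1^T
E = (1 - rho) * np.eye(p) + rho * J              # equicorrelation matrix

eigvals = np.linalg.eigvalsh(E)
print(np.round(eigvals, 6))
# Expected: 1 + (p-1)*rho once (eigenvector 1) and 1 - rho with multiplicity p-1

# Determinant check: det E = (1 - rho)^(p-1) * (1 + (p-1)*rho)
print(np.isclose(np.linalg.det(E), (1 - rho) ** (p - 1) * (1 + (p - 1) * rho)))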

3.7.4 GLRT for Multiple Random Vectors

The data is modeled as a complex Gaussian random vector images with probability density function

images

mean images, and covariance matrix Rxx. Consider M independent, identically distributed (i.i.d.) random vectors

images

drawn from this distribution. The joint probability density function of these vectors is written as

images

where Sxx is the sample covariance matrix

images

and mx is the sample mean vector

images

Our task is to test whether Rxx has structure 0 or the alternative structure 1:

images

The GLRT statistic is

images

The actual covariance matrices are not known, so they are replaced with their ML estimates computed from the M random vectors. If we denote by images the ML estimate of Rxx under H0 and by images the ML estimate of Rxx under H1, we have

images

If we assume further that images is the set of positive definite matrices (no special restrictions are imposed), then images, and

images

The generalized likelihood ratio for testing whether Rxx has structure images is

images

where a and g are the arithmetic and geometric means of the eigenvalues of images:

images

Based on the desired probability of false alarm or probability of detection, we can choose a threshold l0: if l > l0, we accept hypothesis images, and if l < l0, we reject it. Consider the special case

images

where images is the variance of each component of x. The ML estimate of Rxx under images is images, where the variance is estimated as images. Therefore, the GLRT is

images

This test is invariant under scaling and unitary transformations.
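A sketch of this GLRT in code: form the sample covariance, take the arithmetic and geometric means of its eigenvalues, and compare a monotone function of their ratio with a threshold. The data model, the exact form of the statistic, and the threshold below are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(5)
N, M = 4, 200                                   # vector length and number of snapshots (assumed)

X = rng.multivariate_normal(np.zeros(N), np.eye(N) + 0.3 * np.ones((N, N)), size=M)
Sxx = (X.T @ X) / M                             # sample covariance (zero-mean data assumed)

lam = np.linalg.eigvalsh(Sxx)
a = lam.mean()                                  # arithmetic mean of the eigenvalues
g = np.exp(np.log(lam).mean())                  # geometric mean of the eigenvalues

l = a / g                                       # monotone sphericity-type statistic (assumed form)
l0 = 1.05                                       # threshold (assumed)
print("statistic:", l, "-> departs from sigma^2 I structure" if l > l0 else "-> consistent with sigma^2 I")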

3.7.5 Linear Discrimination Functions

Two random vectors can be stochastically ordered by using their likelihood ratio. Here we freely borrow material from [123]. Linear discrimination may be used to approximate a quadratic likelihood ratio when testing y ~ N(0, Ci) under hypothesis Hi. The Neyman-Pearson test of H0 versus H1 compares the log likelihood ratio to a threshold:

images

We hope that, on average, L(y) will be larger than η under H1 and smaller than η under H0. An incomplete measure of how the test of H0 versus H1 will perform is the difference between the expected values of L(y) under the two hypotheses:

images

This function is the J-divergence between H0 and H1 introduced previously in Section 3.7.1. It is related to the information that a random sample brings about the hypothesis Hi. The J-divergence for the multivariate normal problem Hi: y ~ N(0, Ci) is computed by carrying out the expectations:

images

This expression does not completely characterize the performance of a likelihood ratio statistic, but it does provide useful information about the “distance” between H0 and H1.
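For zero-mean Gaussian hypotheses, a commonly used closed form is J = (1/2) Tr(C0−1C1 + C1−1C0 − 2I). The sketch below assumes that form; the covariances are illustrative.

import numpy as np

def j_divergence(C0, C1):
    # J-divergence between N(0, C0) and N(0, C1), assuming the standard closed form
    N = C0.shape[0]
    A = np.linalg.solve(C0, C1)   # C0^{-1} C1
    B = np.linalg.solve(C1, C0)   # C1^{-1} C0
    return 0.5 * (np.trace(A) + np.trace(B) - 2 * N)

C0 = np.eye(3)
C1 = np.eye(3) + 0.4 * np.ones((3, 3))          # assumed alternative covariance
print(j_divergence(C0, C1))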

3.7.5.1 Linear Discrimination

Assume that the data y is used to form the linear discriminant function (or statistic)

images

This statistic is distributed as images under Hi. If a log likelihood ratio is formed using the new variable z, then the divergence between H0 and H1 is

images

Let us define the following ratio of quadratic forms

images

Then we may rewrite the divergence as

images

It is remarkable that the choice of the discriminant w that maximizes the divergence is also the choice of w that maximizes a function of a quadratic form. The maximization of quadratic forms is formulated as a generalized eigenvalue problem [123, p. 163]. Let us rewrite the divergence as

images

The function J is convex in λ. It achieves its maximum either at λmax or at λmin, where λmax and λmin are, respectively, the maximum and minimum values of

images

The divergence is maximized as follows:

images

The linear discriminant function is either the maximum eigenvector of images or the minimum eigenvector of images, depending on the nature of the maximum and minimum values. It is not always the maximum eigenvector. The choice of w = wmax or w = wmin is also called a principal component analysis (PCA). Without loss of generality, w may be normalized as w*w = 1. Then, if R0 = I, the linear discriminant function is distributed as follows

images
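A sketch of this procedure: solve the generalized eigenvalue problem C1 w = λ C0 w, evaluate the scalar divergence (1/2)(λ + 1/λ) − 1 at the extreme eigenvalues, and keep whichever eigenvector gives the larger value. The covariances are assumed, and the scalar divergence expression follows the Gaussian form discussed above.

import numpy as np
from scipy.linalg import eigh

C0 = np.array([[1.0, 0.3], [0.3, 1.0]])         # covariance under H0 (assumed)
C1 = np.array([[1.5, 0.1], [0.1, 0.7]])         # covariance under H1 (assumed)

lam, W = eigh(C1, C0)                           # generalized eigenvalues/eigenvectors

def scalar_divergence(l):
    # Divergence of the scalar statistic z = w^T y with variance ratio l
    return 0.5 * (l + 1.0 / l) - 1.0

j_min, j_max = scalar_divergence(lam[0]), scalar_divergence(lam[-1])
w = W[:, -1] if j_max >= j_min else W[:, 0]     # pick the better of w_max and w_min
w = w / np.linalg.norm(w)                       # normalize so that w^T w = 1
print("chosen discriminant:", w)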

3.7.6 Detection of Correlated Structure for Complex Random Vectors

For the assessment of multivariate association between two complex random vectors x and y, our treatment here draws material from [125, 126]. Consider two real zero-mean vectors x of dimension m and y of dimension n, with the two correlation matrices

images

We assume both correlation matrices are invertible. The cross-correlation properties between x and y are described by the cross-correlation matrix

images

but this matrix is generally difficult to interpret. In order to illuminate the underlying structure, many correlation analysis techniques transform x and y into p-dimensional internal representations

images

with p = min{m, n}. The full-rank matrices A, B are chosen such that all partial sums of the absolute values of the correlations

images

are maximized

(3.82)

The solution to the maximization problem (3.82) leads to a diagonal cross-correlation matrix between images and images

(3.83)

with

images

In order to summarize the correlation between x and y, an overall correlation coefficient ρ is defined as a function of the diagonal correlations {ki}. This correlation coefficient shares the invariance of the {ki}. Because of the maximization (3.82), the correlation can be assessed in a lower-dimensional subspace of rank

images

A variety of correlation coefficients can be defined based on the first r canonical correlations images for a given rank r.

images

For r = p, these coefficients can be expressed in terms of the original correlation matrices

images

where

images

These coefficients share the invariance of the canonical correlations, that is, they are invariant under a nonsingular linear transformation of x and y. For jointly Gaussian x and y, images determines the mutual information between x and y

images

The complex version of correlation analysis is discussed in [125, 126].
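A numerical sketch of this (real-valued) correlation analysis: the canonical correlations ki are the singular values of the coherence matrix Rxx−1/2 Rxy Ryy−1/2, and for jointly Gaussian vectors the mutual information follows from them, assuming the standard expression −(1/2) Σ log(1 − ki2). The latent-variable data model below is assumed for illustration.

import numpy as np

rng = np.random.default_rng(6)
m, n, T = 3, 4, 2000
# Assumed latent-variable model producing correlated x and y
s = rng.standard_normal((1, T))
x = rng.standard_normal((m, T)) + 0.8 * s
y = rng.standard_normal((n, T)) + 0.8 * s

Rxx = x @ x.T / T
Ryy = y @ y.T / T
Rxy = x @ y.T / T

def inv_sqrt(R):
    # Inverse symmetric square root of a symmetric positive definite matrix
    w, V = np.linalg.eigh(R)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

# Canonical correlations k_i: singular values of the coherence matrix
C = inv_sqrt(Rxx) @ Rxy @ inv_sqrt(Ryy)
k = np.linalg.svd(C, compute_uv=False)
print("canonical correlations:", np.round(k, 3))

# Mutual information for jointly Gaussian x and y (assumed formula)
print("mutual information (nats):", -0.5 * np.sum(np.log(1 - k ** 2)))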

 

 

1 This is true if we know the true covariance matrices rather than the sample covariance matrices.
