Chapter 5

Large Random Matrices

5.1 Large Dimensional Random Matrices: Moment Approach, Stieltjes Transform and Free Probability

The necessity of studying the spectra of large dimensional random matrices, in particular the Wigner matrices, arose in nuclear physics in the 1950s. In quantum mechanics, the energy levels of quantum systems are not directly observable (very similar to many problems in today's wireless communications and the Smart Grid), but can be characterized by the eigenvalues of a matrix of observations [10].

Let the entries Xij of the n × p matrix X be i.i.d. standard normal random variables

images

The sample covariance matrix is defined as

images

where the n rows of X are samples of a p-dimensional zero-mean random vector with population covariance matrix I.

The classical limit theorems are no longer suitable for dealing with large dimensional data analysis. In the early 1980s, major contributions on the existence of the limiting spectral distribution (LSD) were made. In recent years, research on random matrix theory has turned toward second-order limiting theorems, such as the central limit theorem for linear spectral statistics, the limiting distributions of spectral spacings, and extreme eigenvalues.

Many applied problems require an estimate of a covariance matrix and/or of its inverse, where the matrix dimension is large compared to the sample size [20]. In such situations, the usual estimator, the sample covariance matrix, is known to perform poorly. When the matrix dimension p is larger than the number n of observations available, the sample covariance matrix is not even invertible. When the ratio p/n is less than one but not negligible, the sample covariance matrix is invertible but numerically ill-conditioned, which means that inverting it amplifies estimation error dramatically. For a large value of p, it is difficult to find enough observations to make p/n negligible, and therefore it is important to develop a well-conditioned estimator for large-dimensional covariance matrices such as in [20].

Suppose AN is an N × N matrix with eigenvalues λ1(AN), …, λN(AN). If all these eigenvalues are real (e.g., if AN is Hermitian), we can define a one-dimensional distribution function. The empirical cumulative distribution of the eigenvalues, also called the empirical spectral distribution (ESD), of an N × N Hermitian matrix A is denoted by FA and defined as

(5.1) FA(x) = (1/N) Σ_{i=1}^{N} 1{λi(A) ≤ x},

where 1{·} is the indicator function.
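To make (5.1) concrete, the following minimal Python sketch (ours, using NumPy; the matrix and the evaluation grid are arbitrary choices) evaluates the ESD of a Hermitian matrix at a set of points.

import numpy as np

def esd(A, x_grid):
    """Empirical spectral distribution F^A(x) of a Hermitian matrix A,
    evaluated at each point of x_grid, as in (5.1)."""
    eigvals = np.linalg.eigvalsh(A)              # real eigenvalues of a Hermitian matrix
    # F^A(x) = (1/N) * #{ i : lambda_i <= x }
    return np.array([(eigvals <= x).mean() for x in x_grid])

# Example: ESD of a small random symmetric (hence Hermitian) matrix
rng = np.random.default_rng(0)
G = rng.standard_normal((5, 5))
A = (G + G.T) / 2
print(esd(A, np.linspace(-3, 3, 7)))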

Following [10], we divide available techniques into three categories: (1) Moment approach; (2) Stieltjes transform; (3) Free probability. Applications for these basic techniques will be covered.

The significance of the ESD is due to the fact that many important statistics in multivariate analysis can be expressed as functionals of the ESD of some random matrix; the determinant and the rank are the most common examples. The most significant theorem relevant to our applications is the one describing the convergence of the spectrum of the sample covariance matrix: the Marchenko-Pastur law.


Theorem 5.1 (Marchenko-Pastur law [251])
Consider a p × N matrix X, whose entries are independent, zero-mean complex (or real) random variables, with variance 1/N and fourth moments of order O(1/N^2). As

(5.2) p, N → ∞ with p/N → c > 0,

the empirical distribution of XXH converges almost surely to a nonrandom limiting distribution with density

(5.3) fc(x) = (1 − 1/c)^+ δ(x) + (1/(2πcx)) √((x − a)^+ (b − x)^+),

where a = (1 − √c)^2, b = (1 + √c)^2 and (u)^+ = max(u, 0).
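As a quick numerical illustration (our sketch, not part of the reference; the sizes p = 400 and N = 1000 are arbitrary), the extreme eigenvalues of XXH for entries of variance 1/N already sit close to the support edges a and b of Theorem 5.1.

import numpy as np

rng = np.random.default_rng(1)
p, N = 400, 1000                                  # c = p/N = 0.4
c = p / N
X = rng.standard_normal((p, N)) / np.sqrt(N)      # i.i.d. entries with variance 1/N
eig = np.linalg.eigvalsh(X @ X.T)

a, b = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2   # support edges from Theorem 5.1
print("smallest/largest eigenvalue:", eig.min().round(3), eig.max().round(3))
print("predicted edges a, b       :", round(a, 3), round(b, 3))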


 


Example 5.1 (Determinant of a positive definite matrix)
Let AN be an N × N positive definite matrix. Then

images

When N → ∞, the determinant of AN, det(AN), approaches a nonrandom limit.
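A minimal numerical sketch (ours) of how such a determinant functional stabilizes: since log det(AN) = Σi log λi(AN) is N times a functional of the ESD, we track the normalized log-determinant (1/N) log det(AN) for a sample-covariance-type matrix; the aspect ratio c = 0.5 is an arbitrary choice.

import numpy as np

rng = np.random.default_rng(2)

def normalized_logdet(N, c=0.5):
    """(1/N) log det(A_N) for A_N = X X^T, where X is N x n with n = N/c and
    i.i.d. N(0, 1/n) entries, so that E[A_N] = I."""
    n = int(N / c)
    X = rng.standard_normal((N, n)) / np.sqrt(n)
    sign, logabsdet = np.linalg.slogdet(X @ X.T)   # numerically stable log-determinant
    return logabsdet / N

# The normalized log-determinant concentrates around a nonrandom value as N grows
for N in (100, 400, 1600):
    print(N, round(normalized_logdet(N), 4))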

 


Example 5.2 (Hypothesis testing)
Let the covariance matrix of the received signal have the form [14, p. 5]

images

where the dimension of images is p and the rank of images is q(< p). Note that N and p are different. Suppose SN is the sample covariance matrix based on N i.i.d. vector samples drawn from the signal. The eigenvalues of SN are

images

The test statistic for the hypothesis problem

(5.4)

is given by

(5.5)

where T is the variance of the sequence of eigenvalues.

The ultimate goal of hypothesis testing is to find metrics that are “robust” for decision making by setting a threshold. For example, trace functions are commonly used. To represent the trace functions, we suggest four methodologies: the moment method, the Stieltjes transform, orthogonal polynomial decomposition and free probability. We only give the basic definitions and their relevance to our problems of spectrum sensing. We refer to [14] for details.

The goal of random matrix theory is to describe the asymptotics of “macroscopic” quantities of random matrices [252], such as

images

where ik ∈ {1, …, m}, 1 ≤ k ≤ p, and images are some n × n random matrices whose size n goes to infinity. images are most often Wigner matrices, that is, Hermitian matrices with independent entries, or Wishart matrices.

5.2 Spectrum Sensing Using Large Random Matrices

5.2.1 System Model

The most remarkable intuition of random matrices is that in many cases the eigenvalues of matrices with random entries converge to some fixed distribution when both dimensions of the signal matrix tend to infinity with the same order [253]. For Wishart matrices, the limiting joint distribution, called the Marchenko-Pastur law, has been known since 1967 [251]. More recently, the marginal distributions of single ordered eigenvalues have been found. By exploiting these results, we are able to express the largest and the smallest eigenvalues of sample covariance matrices using their asymptotic values in closed form. A closed-form, exact expression for the standard condition number (defined as the ratio of the largest to the smallest eigenvalue) is available.

We often apply the asymptotic results for large matrices to finite-size matrices. The power of large random matrices is such that this approximation is often stunningly precise: if the matrices under consideration are larger than 8 × 8, the asymptotic results are accurate enough to approximate the simulated results.

The received signal contains L vectors yl, l = 1, …, L

(5.6)

where hl[i] is the channel gain (often having a Rayleigh fading distribution) at the i-th sample time of the l-th sensor, and similarly for the signal vector sl and the noise vector wl. Let y be an n × 1 vector modeled as

images

where H is an n × L matrix, s is an L × 1 “signal” vector and w is an n × 1 “noise” vector. This model appears frequently in many signal processing and communications applications. If s and w are modeled as independent Gaussian vectors with independent zero-mean, unit-variance elements (identity covariance matrix), then y is multivariate Gaussian with zero mean and covariance matrix written as

(5.7)

In most practical applications, the true covariance matrix is unknown. Instead, it is estimated from N independent observations (“snapshots”) y1, y2, …, yN as:

images

where Yn = [y1, y2, …, yN] is referred to as the “data matrix” and SY is the sample covariance matrix.

When n is fixed and N → ∞, it is well-known that the sample covariance matrix converges to the true covariance matrix. However, when both n, N → ∞ with

images

this is no longer true. Such a scenario is very relevant in practice where stationarity constraints limit the amount of data (N) that can be used to form the sample covariance matrix. Free probability is an invaluable tool in such situations when attempting to understand the structure of the resulting sample covariance matrices [254].
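The following sketch (ours; the sizes are arbitrary) illustrates why the regime n, N → ∞ with n/N → c > 0 matters: even when the true covariance matrix is the identity, the eigenvalues of the sample covariance matrix spread over a whole interval instead of concentrating at 1.

import numpy as np

rng = np.random.default_rng(3)
n, N = 200, 600                               # c = n/N = 1/3
Y = rng.standard_normal((n, N))               # i.i.d. samples with true covariance I_n
S = Y @ Y.T / N                               # sample covariance matrix
eig = np.linalg.eigvalsh(S)
print("eigenvalue range of S:", eig.min().round(3), eig.max().round(3))
print("true covariance eigenvalues are all equal to 1")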

In matrix form, we have the following L × N matrix:

(5.8)

Similarly, we do this for H, S, W. (5.8) can be rewritten in matrix form as

(5.9)

where X = HS. Using (5.9), (5.6) becomes our standard form:

(5.10)

where we have made the assumption that

(5.11)

(5.11) can be justified rigorously using random matrix theory.

In general, knowing the eigenvalues of two matrices, say A, B, is not enough to find the eigenvalues of the sum or product of the two matrices, unless they commute. Free probability gives us a certain sufficient condition, called asymptotic freeness, under which the asymptotic spectrum of the sum A + B or product AB can be obtained from the individual asymptotic spectra, without involving the structure of the eigenvectors of the matrices [255]. [13, p. 9] [256]


Theorem 5.2 (Wishart matrices)
If W has a Wishart distribution with m degrees of freedom and true covariance matrix images, write images, and C is a q × p matrix of rank q, then

images


The sample covariance matrix SY based on Y, whose N columns are L × 1 sample vectors, is

images

The sample covariance matrix SY is related to the true covariance matrix images by the property of Wishart distribution (see Theorem 5.2)

(5.12)

where Z is an L × N matrix with i.i.d. zero-mean Gaussian entries. In fact,

(5.13)

is a Wishart matrix.

For a standard signal plus noise model, the true covariance matrix images has the form

(5.14)

where images and images are, respectively, the true covariance matrices of the signal and the noise; moreover, images if white noise is assumed.

Comparing the true covariance matrix (5.14) with its sample counterpart (5.11) reveals the fundamental role of a rigorous random matrix theory. In general, we cannot say much about the relation between the two versions for small sample size N. Luckily, when the sample size N is very large, the two versions can be proven equivalent (which will be justified later). This is why random matrix theory is so relevant to wireless communications: a majority of wireless systems can be expressed in the form of (5.9). For example, CDMA, MIMO and OFDM systems can be expressed in such a form.

5.2.2 Marchenko-Pastur Law

The Marchenko-Pastur law stated in Theorem 5.1 serves as a theoretical prediction under the assumption that the matrix is “all noise” [255]. Deviations of the eigenvalue distribution from this theoretical limit should indicate non-noise components; in other words, they suggest that the matrix carries information.


Example 5.3 (Spectrum sensing using the ratio images [119, 255, 257, 258])
We mainly follow [255] in this example. (5.8) is repeated here for convenience. In matrix form, the received signal model is expressed as the following L × N matrix

(5.15)

where N samples are recorded at L sensors.
For fixed L and N → ∞, the sample covariance matrix images converges to σ2I; this is a consequence of the law of large numbers. However, in practice, N can be of the same order of magnitude as L; this is the scenario that random matrix theory addresses.
In the case where the entries of Y are independent, irrespective of the specific probability distribution (this corresponds to the case where no signal is present, images), results from asymptotic random matrix theory can be used. Theorem 5.1, proposed by Marchenko and Pastur (1967), is valid for this case as L, N → ∞ with images.
Interestingly, the support of the eigenvalues is finite, even if there is no signal. The theoretical prediction offered by the Marchenko-Pastur law can be used to set the threshold for detection.
To illustrate, let us consider the case when only one signal is present for images:

images

where s[i] and z[i] = σnl[i] are, respectively, the independent signal and noise with unit variance at instant i and sensor l. Let us denote by T the matrix:

images

Clearly, TTH has only one “significant” eigenvalue

images

The behavior of images is related to the study of the eigenvalues of large sample covariance matrices of spiked population models [26]. Let us define the signal to noise ratio γ as

images

Baik and colleagues [26, 259] showed recently that, when

images

the maximum eigenvalue of images converges almost surely to

images

where b1 is larger than b0, the upper edge of the support defined in Theorem 5.1. The difference between b1 and b0 can be used to sense the spectrum. Whenever the distribution of the eigenvalues of the sample covariance matrix images (all entries are observable and the size of the matrix is finite) departs from the distribution predicted by the Marchenko-Pastur law, the detector knows that a signal is present. This approach of testing against the null hypothesis is standard, but the metric and the mathematical tools are novel.

5.2.2.1 Noise Distribution Unknown, Variance Known

The criterion is

images

where a and b are defined in Theorem 5.1. The results are based on the asymptotic eigenvalue distribution.

5.2.2.2 Both Noise Distribution and Variance are Unknown

The ratio of the maximum and the minimum eigenvalues in the images case does not depend on the variance of the noise. This allows us to circumvent the need for knowledge of the noise variance:

images

The test images provides a good estimator of the SNR γ. The ratio of the largest eigenvalue b1 to the smallest a of images depends solely on γ and α:

images

For extremely low SNR such as γ = 0.01, that is, −20 dB, the above relation becomes

images

Typical values are α = 1/2 and α = 1/10.
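A minimal Monte-Carlo sketch of the blind ratio test (ours; L, N and the noise variance are arbitrary choices): under the noise-only hypothesis, the ratio λmax/λmin of the sample covariance matrix stays close to b/a = (1 + √α)^2/(1 − √α)^2, independently of the unknown noise variance.

import numpy as np

rng = np.random.default_rng(4)
L, N = 50, 500
alpha = L / N
sigma2 = 3.7                                        # unknown to the detector

Y = np.sqrt(sigma2) * rng.standard_normal((L, N))   # noise-only observations (H0)
eig = np.linalg.eigvalsh(Y @ Y.T / N)
ratio = eig.max() / eig.min()

b_over_a = (1 + np.sqrt(alpha))**2 / (1 - np.sqrt(alpha))**2
print("lambda_max / lambda_min :", round(ratio, 3))
print("asymptotic ratio b/a    :", round(b_over_a, 3))   # does not depend on sigma2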


Example 5.4 (Spectrum sensing using the ratio λmax/λmin [260])
The example is continued from Example 5.3. We define the normalized covariance matrix as

images

whose largest and smallest eigenvalues are, respectively, lmax and lmin. In contrast, λmax and λmin are the corresponding eigenvalues of the sample covariance matrix images. Under images, images turns out to be a complex white Wishart matrix and, by the Marchenko-Pastur law, the eigenvalue support is finite [10]. Under images, the covariance matrix belongs to the class of “spiked population models” and its largest eigenvalue increases beyond the Marchenko-Pastur support [26]. This property suggests using

images

as the test statistic for signal detection. Denoting by T0 the decision threshold, the detector claims images if T > T0; otherwise, it claims images.
Example 5.4 uses the asymptotic properties of Wishart matrices. Under images, the smallest and largest eigenvalues of images converge almost surely to

(5.16)

in the limit

(5.17)

where α ∈ (0, 1) is a constant.
A semi-asymptotic approach [257] can be used. It is shown in [22] that under the same assumption of (5.17), the random variable

images

with

images

converges in distribution to the Tracy-Widom law of order 2 defined in (5.50). The decision threshold [257] can be linked to the probability of false alarm, defined as

images

by using the asymptotic limit (5.16) for the smallest eigenvalue and the Tracy-Widom cumulative distribution function for the largest one. The threshold can be expressed as

images

where FTW2−1 is the inverse Tracy-Widom cumulative distribution function of order 2.
Recently, it has been established [261] that the smallest eigenvalue also converges to the Tracy-Widom distribution as K, L → ∞, up to a proper rescaling. In particular, the random variable

images

with

images

converges to the Tracy-Widom cumulative distribution function of order 2.
As a consequence of (5.17), μ is always negative in the considered range of α. The test statistic may be expressed as

images

The test statistic can be linked to the probability of false alarm. For details, we refer to [260].
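Since a Tracy-Widom quantile routine is not always at hand, the sketch below (ours; all sizes and the false-alarm level are arbitrary) calibrates the decision threshold T0 for the ratio statistic directly by Monte Carlo under the noise-only hypothesis, which plays the same role as the Tracy-Widom-based threshold discussed above.

import numpy as np

rng = np.random.default_rng(5)
L, N, trials, pfa = 20, 200, 2000, 0.05

def ratio_statistic(Y):
    """lambda_max / lambda_min of the sample covariance matrix of Y."""
    eig = np.linalg.eigvalsh(Y @ Y.conj().T / Y.shape[1])
    return eig.max() / eig.min()

# Calibrate T0 as the (1 - pfa) quantile of the statistic under H0 (noise only)
t_h0 = [ratio_statistic(rng.standard_normal((L, N))) for _ in range(trials)]
T0 = np.quantile(t_h0, 1 - pfa)

# Apply the detector to one realization of a rank-one "signal plus noise" observation
h = rng.standard_normal((L, 1))
s = rng.standard_normal((1, N))
Y1 = h @ s + rng.standard_normal((L, N))
print("threshold T0:", round(T0, 3), " statistic under H1:", round(ratio_statistic(Y1), 3))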

5.2.2.3 On the Empirical Distribution of Eigenvalues of Large Dimensional Information-Plus-Noise Type Matrices

Sample covariance matrices for systems with noise are the starting point in many problems, for example, spectrum sensing. Multiplicative free deconvolution has been shown in [262] to be a useful method here: it can help express limiting eigenvalue distributions of sample covariance matrices and simplify estimators for eigenvalue distributions of covariance matrices.

We adopt a problem formulation from [263]. Let Xn be an n × N matrix containing i.i.d. complex entries with unit variance (the sum of the variances of the real and imaginary parts equals 1), let σ > 0 be a constant, and let Rn be an n × N random matrix independent of Xn. Assume that, almost surely, as n → ∞, the empirical distribution function (e.d.f.) of the eigenvalues of images converges in distribution to a nonrandom probability distribution function (p.d.f.), and the ratio images tends to a positive number. Then it is shown that, almost surely, the e.d.f. of the eigenvalues of

(5.18)

converges in distribution. The limit is nonrandom and is characterized in terms of its Stieltjes transform, which satisfies a certain equation. n and N both converge to infinity but their ratio images converges to a positive quantity c. The aim of [263] is to show that, almost surely, images converges in distribution to a nonrandom p.d.f. F. (5.18) can be thought of as the sample covariance matrices of random vectors rn + σxn, where rn can be a vector containing the system information and xn is additive noise, with σ a measure of the strength of the noise.

The matrix CN can be viewed as the sample correlation matrix of the columns of Rn + σXn, which models situations where relevant information is contained in the R.i′s and can be extracted from images. Since R.i is corrupted by X.i, the creation of this matrix CN is hindered. If the number of samples N is sufficiently large and if the noise is centered, then CN would be a reasonable estimate of images (I denoting the n × n identity matrix), which could also yield significant (if not all) information. Under the assumption

(5.19)

CN models situations where, due to the size of n, the number of samples N needed to adequately approximate images is unattainable, but is of the same order of magnitude as n. (5.19) is typical of many situations arising in signal processing where one can gather only a limited number of observations during which the characteristics of the signal do not change. This is the case for spectrum sensing when fading changes rapidly.

One application of the matrix CN defined in (5.18) is the problem of spectrum sensing, for example, in Example 5.3. The beauty of the above model is that σ is arbitrary. Of course, this model applies to the low SNR detection problem for spectrum sensing.

Assume that N observations are available from n sensors. The sensor outputs form a random vector rn + σxn, and the observed values yield a realization of the sample covariance matrix CN. Given that CN is known, one is interested in inferring as much as possible about the random vector rn, and hence about the system (5.18). Within this setting, one would like to connect the following quantities:

1. the eigenvalue distribution of Cn;
2. the eigenvalue distribution of images.
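To make the setting concrete, the following sketch (ours; the rank of Rn, the noise level σ and the dimensions are arbitrary) builds the matrix CN of (5.18) and compares its largest eigenvalues with those of the noise-free matrix RnRnH/N.

import numpy as np

rng = np.random.default_rng(6)
n, N, sigma = 100, 300, 0.5

# Low-rank "information" matrix R_n (rank 3 here, an arbitrary choice)
U = rng.standard_normal((n, 3))
V = rng.standard_normal((3, N))
R = U @ V
X = rng.standard_normal((n, N))                 # additive noise matrix

C_N = (R + sigma * X) @ (R + sigma * X).T / N   # information-plus-noise model (5.18)
pure = R @ R.T / N                              # noise-free counterpart

print("top eigenvalues of C_N       :", np.sort(np.linalg.eigvalsh(C_N))[-4:].round(2))
print("top eigenvalues of R R^H / N :", np.sort(np.linalg.eigvalsh(pure))[-4:].round(2))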

5.2.2.4 Statistical Eigen-Inference from Large Wishart Matrices

The measurements are of the form

images

where images denotes a p-dimensional (real or complex) Gaussian noise vector with covariance matrix images, images denotes a k-dimensional zero-mean (real or complex) Gaussian signal vector with covariance matrix images, and A is a p × k unknown nonrandom matrix.

5.3 Moment Approach

Most of the material in this section can be found in [10]. Throughout this section, only Hermitian matrices are considered. Real symmetric matrices are treated as special cases.

Let A be an n × n Hermitian matrix, and its eigenvalues be denoted by

images

Then, from the ESD definition (5.1), the k-th moment of FA can be expressed as

(5.20)

(5.20) plays a fundamental role in random matrix theory. Most results on limiting spectral distributions were obtained by estimating the mean, variance or higher moments of images.
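A short sketch (ours) of (5.20) in action: the k-th moment of the ESD is just the normalized trace of the k-th matrix power; for a normalized Wigner matrix the even moments approach the Catalan numbers of the semicircle law.

import numpy as np

def esd_moment(A, k):
    """k-th moment of the ESD of a Hermitian matrix A, i.e. (1/n) tr(A^k)."""
    n = A.shape[0]
    return np.trace(np.linalg.matrix_power(A, k)) / n

rng = np.random.default_rng(7)
n = 500
G = rng.standard_normal((n, n))
W = (G + G.T) / np.sqrt(2 * n)                  # normalized Wigner matrix
# Even moments approach 1, 2, 5, ... (Catalan numbers) of the semicircle law
print([round(esd_moment(W, k), 3) for k in (2, 4, 6)])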

To motivate our development, let us see an example first.


Example 5.5 (Moments-based hypothesis testing)
Continued from Example 5.2.
The hypothesis problem (5.4) is reformulated as

images

by using the effective rank r

images

where ||A|| is the maximum singular value of A, and the matrix inequality (this bound is sharp) [107]

images

Claim images if the test statistic (5.5) is replaced with the new statistic, the k-th moment

images

For the case of the moment M = 1, it has been found that the algorithm performs very well.

When the sample covariance matrix S, which is a random matrix, is used instead of Σ, the moments of S are scalar random variables. Girko studied random determinants det S for decades [111]. We repeat (5.21) here for convenience:

(5.21)

5.3.1 Limiting Spectral Distribution

To show that FA converges to a limit, say F, we often employ the Moment Convergence Theorem,

images

in some sense, for example, almost surely (a.s.) or in probability, and Carleman's condition

images

Thus the Moment Convergence Theorem can be used to show the existence of the limiting spectral distribution.

5.3.1.1 Wigner Matrix

The celebrated semicircle law (distribution) is related to a Wigner matrix. A Wigner matrix W of order n is defined as an n × n Hermitian matrix whose entries above the diagonal are i.i.d. complex random variables with variance σ2, and whose diagonal elements are i.i.d. real random variables (without any moment requirement). We have the following theorem.


Theorem 5.3 (Semicircle law)
Under the conditions described above, as n → ∞, with probability 1, the empirical spectral distribution of n−1/2W tends to the semicircle law with scale parameter σ, whose density is given by

(5.22)


For each n, the entries above the diagonal of W are independent complex random variables with mean zero and variance σ2, but they need not be identically distributed and may depend on n. We have the following theorem.


Theorem 5.4
If images, and for any δ > 0

(5.23)

where I(·) is the indicator function, then the conclusion of Theorem 5.3 holds.

In Girko's book (1990) [111], (5.23) is stated as a necessary and sufficient condition for the conclusion of Theorem 5.4.

5.3.1.2 Sample Covariance Matrix

Suppose that xjn, j, n = 1, 2, … is a double array of i.i.d. complex random variables with mean zero and variance σ2. Write

images

The sample covariance matrix is defined as

images

Marchenko and Pastur (1967) [251] had the first success in finding the limiting spectral distribution of S. Their work also introduced the tool of the Stieltjes transform. Afterwards, Bai and Yin (1988) [264], Grenander and Silverstein (1977) [265], Jonsson (1982) [266], Wachter (1978) [267] and Yin (1986) [268] did further research on the sample covariance matrix.


Theorem 5.5 ([268])
Suppose that images. Under the assumptions stated at the beginning of this subsection, the empirical spectral distribution of S tends to a limiting distribution with density

images

and a point mass 1 − c−1 at the origin if c > 1, where

images


The limit distribution of Theorem 5.5 is called the Marchenko-Pastur law (distribution) with ratio index c and scale index σ2. The existence of the second moment of the entries is necessary and sufficient for the Marchenko-Pastur law since the limit spectral distribution involves the parameter σ2. The condition of zero mean can be relaxed to have a common mean.

Sometimes, in practice, the entries of X depend on N and for each N, they are independent but not identically distributed. We have the following theorem.


Theorem 5.6
Suppose that for each N, the entries of XN are independent complex variables, with a common mean and variance σ2. Assume that images, and that for any δ > 0,

(5.24)

Then, FS tends almost surely to the Marchenko-Pastur distribution with ratio index c and scale index σ2.

Now consider the case p → ∞, but images. Almost all eigenvalues tend to 1 and thus the empirical spectral distribution of S tends to a degenerate one. For convenience, we consider instead the matrix

images

When X is real, under the existence of the fourth moment, Bai and Yin (1988) [264] showed that its empirical spectral distribution tends to the semicircle law almost surely as p → ∞. Bai (1988) [10] gives a generalization of this result.


Theorem 5.7 ([10])
Suppose that, for each N, the entries of the matrix XN are independent complex random variables with a common mean and variance σ2. Assume that, for any constant δ > 0, as p → ∞ with p/N → 0,

(5.25)

and

(5.26)

Then, with probability 1, the empirical spectral distribution of W tends to the semicircular law with scale index σ2.

Conditions (5.25) and (5.26) hold if the entries of X have bounded fourth moments.


Theorem 5.8 (Theorem 4.10 of [14])
Let F = SN1SN2−1, where SN1 and SN2 are sample covariance matrices with dimension p and sample sizes N1 and N2, with an underlying distribution of mean 0 and variance 1. If SN1 and SN2 are independent and

images

then the limiting spectral distribution Fy1, y2 of F exists and has a density

images

Further, if y1 > 1, then Fy1, y2 has a point mass 1 − 1/y1 at the origin.

 


Example 5.6
Consider an example to apply Theorem 5.8. Consider

images

where WN has entries drawn from an underlying distribution of mean 0 and variance 1, and

images

Under images, we can apply Theorem 5.8 to get the density function. Under images, the density is different from that of images.

5.3.1.3 Product of Two Random Matrices

The motivation for studying products of two random matrices arises from the case where the true covariance matrix images is not a multiple of the identity matrix I, and from the multivariate F-matrix F = S1S2−1. When S1 and S2 are independent Wishart matrices, the limiting spectral distribution of F follows from Wachter (1980) [267].


Theorem 5.9 ([10])
Suppose that the entries of X are independent complex random variables satisfying (5.24), and assume that T(= TN) is a sequence of p × p Hermitian matrices independent of X, such that its empirical spectral distribution tends to a nonrandom and nondegenerate distribution H in probability (or almost surely). Further, assume that

images

Then, the empirical spectral distribution of the matrix product ST tends to a nonrandom limit in probability (or almost surely).

5.3.2 Limits of Extreme Eigenvalues

5.3.2.1 Limits of Extreme Eigenvalues of the Wigner Matrix

The real case of the following theorem is obtained in [269] and the complex case is in [10].


Theorem 5.10 ([10, 269])
Suppose that the diagonal entries of the Wigner matrix W are i.i.d. real random variables, the entries above the diagonal are i.i.d. complex random variables, and all these variables are independent. Then, the largest eigenvalue λ1 of images tends to 2σ > 0 with probability 1 if and only if the following four conditions are true:
1. images;
2. images;
3. images;
4. images;
where x+ = max(x, 0).

For the Wigner matrix, symmetry between the largest and smallest eigenvalues exists. Thus, Theorem 5.10 actually proves the following: the necessary and sufficient conditions (for both the largest and the smallest eigenvalues) to have finite limits almost surely are (1) the diagonal entries have finite second moments; (2) the off-diagonal entries have zero mean and finite fourth moments.

5.3.2.2 Limits of Extreme Eigenvalues of Sample Covariance Matrix

Geman (1980) [270] proved that, as images, the largest eigenvalue of a sample covariance matrix tends to b(c) almost surely, assuming a certain growth condition on the moments of the underlying distribution, where images is defined in Theorem 5.5. The real case of the following theorem is in [271], and their result is extended to the complex case in [10].


Theorem 5.11 ([10, 271])
In addition to the assumptions of Theorem 5.5, we assume that the entries of X have finite fourth moment. Then,

images


If we define the smallest eigenvalue as the (p − N + 1)-th smallest eigenvalue of S when p > N, then from Theorem 5.11 we immediately reach the following conclusion:


Theorem 5.12 ([10])
Under the assumptions of Theorem 5.11, we have

images


The first work to exploit Theorem 5.12 for spectrum sensing is [258] with their conference version published in 2007. Denote the eigenvalues of SN by λ1 ≤ λ2 ≤ ··· ≤ λN. Write λmax = λN and

images

Using the convention above, Theorem 5.12 is true [14] for all c ∈ (0, ∞).


Theorem 5.13 (Theorem 5.9 of [14])
Suppose that the entries of the matrix images are independent (not necessarily identically distributed) and satisfy
1. images;
2. images;
3. images; and
4. images;
where δN → 0 and b > 0. Let images. Then, for any x > images > 0 and integers j, k ≥ 2, we have

images

for some constant C > 0.

5.3.2.3 Limiting Behavior of Eigenvectors

Relatively less work has been done on the limiting behavior of eigenvectors than on that of eigenvalues. See [272] for the latest additions to the literature.

There is a good deal of evidence that the behavior of large random matrices is asymptotically distribution-free. In other words, it is asymptotically equivalent to the case where the basic entries are i.i.d. mean 0 normal, provided that some moment requirements are met.

5.3.2.4 Miscellanea

The norm of (N−1/2X)k is sometimes important.


Theorem 5.14 ([269])
If images, then

images


The following theorem was proven independently in [273] and [269].


Theorem 5.15 ([269, 273])
If images, then

images


5.3.2.5 Circular Law–Non-Hermitian Matrices

We now consider non-Hermitian matrices. Let

images

be an N × N complex matrix with i.i.d. entries xjn of mean zero and variance 1. The eigenvalues of Q are complex and thus the empirical spectral distribution of Q, denoted by FN(x, y), is defined in the complex plane. Since the 1950s, it has been conjectured that FN(x, y) tends to the uniform distribution over the unit disc in the complex plane, called the circular law. The problem remained open until Bai (1997) [274].
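A minimal numerical illustration of the circular law (ours; N = 500 is arbitrary): after the customary 1/√N normalization, the eigenvalues of Q fill the unit disc approximately uniformly.

import numpy as np

rng = np.random.default_rng(8)
N = 500
Q = rng.standard_normal((N, N))                 # i.i.d. entries, mean zero, variance 1
eig = np.linalg.eigvals(Q / np.sqrt(N))         # complex eigenvalues of the normalized matrix
radii = np.abs(eig)
print("largest modulus           :", round(radii.max(), 3))        # close to 1
print("fraction inside |z| <= 0.5:", round((radii <= 0.5).mean(), 3))  # close to the area 0.25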


Theorem 5.16 (Circular Law [274])
Suppose that the entries have finite (4 + images)-th moments, and that the joint distribution of the real and imaginary parts of the entries, or the conditional distribution of the real part given the imaginary part, has a uniformly bounded density. Then, the circular law holds.

5.3.3 Convergence Rates of Spectral Distributions

5.3.3.1 Wigner Matrix

Consider the model of Theorem 5.4, and assume that the entries of W above or on the diagonal are independent and satisfy

(5.27)


Theorem 5.17 ([275])
Under the conditions in (5.27), we have

images

where F is the semicircular law with scale parameter 1.

 


Theorem 5.18 ([276])
Under the four conditions in (5.27), we have

images

where “p” stands for probability.

5.3.3.2 Sample Covariance Matrix

Assume the following conditions are true.

(5.28)


Theorem 5.19 ([275])
Under the assumptions in (5.28), for 0 < θ < Θ < 1 or 1 < θ < Θ < ∞,

images

where cp = p/N and Fc is the Marchenko-Pastur distribution with dimension-ratio c and parameter σ2 = 1.

 


Theorem 5.20 ([275])
Under the assumptions in (5.28), for any 0 < images < 1,

images


 


Theorem 5.21 ([275])
Under the assumptions in (5.28), the conclusions in Theorems 5.19 and 5.20 can be improved to

images

and

images


Consider images, where XN = (xij) is a p × N matrix consisting of independent complex entries with mean zero and variance one, and TN is a p × p nonrandom positive definite Hermitian matrix with spectral norm uniformly bounded in p. If

images

and cN = p/N < 1 uniformly as N → ∞, it is shown in [277] that the rate at which the expected empirical spectral distribution of SN converges to its limiting spectral distribution is images. Under the same assumptions, it can be proved that, for any η > 0, the rates of convergence of the empirical spectral distribution of SN in probability and almost surely are images and images, respectively.

5.3.4 Standard Vector-In, Vector-Out Model

Random vectors are the basic building blocks of our signal processing. We define the standard vector-in, vector-out model (VIVO)1 as

images

where yn is an M × 1 complex vector of observations collected from M sensors, xn is a K × 1 complex vector of transmitted waveforms, H is an M × K matrix, and wn is an M × 1 complex vector of additive Gaussian noise with mean zero and variance images.

Defining

images

we have

images

The sample covariance matrix is defined as

images

For the noise-free case, that is, images, we have

images

We can formulate the problem as a hypothesis testing problem

images
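The following sketch (ours; M, K, N and the noise variance are arbitrary, and complex Gaussian entries are assumed) generates data under both hypotheses of the VIVO model and shows how the largest eigenvalues of the sample covariance matrix separate the two cases.

import numpy as np

rng = np.random.default_rng(9)
M, K, N, sigma2 = 8, 2, 1000, 0.1

H = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
X = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)
W = np.sqrt(sigma2 / 2) * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))

Y0 = W                     # H0: noise only
Y1 = H @ X + W             # H1: signal plus noise

for name, Y in (("H0", Y0), ("H1", Y1)):
    S = Y @ Y.conj().T / N                     # sample covariance matrix
    eig = np.sort(np.linalg.eigvalsh(S))
    print(name, "largest eigenvalues:", eig[-3:].round(3))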

5.3.5 Generalized Densities

In the generalized densities, the moments of the matrix play a critical role. Assume that the matrix A has a density

images

The joint density function of its eigenvalues is of the form

images

For example, for a real Gaussian matrix, β = 1 and hn = 1; for a complex Gaussian matrix, β = 2 and hn = 1; for a quaternion Gaussian matrix, β = 4 and hn = 1; and for a real Wishart matrix with n ≥ p, β = 1 and hn = xn−p. The following examples illustrate this; a short simulation sketch generating such ensembles follows the list.

1. Real Gaussian matrix, that is, symmetric, AT = A:

images

The diagonal entries of A are i.i.d. real images and entries above diagonal are i.i.d. real images.
2. Complex Gaussian matrix, that is, Hermitian, A* = A:

images

The diagonal entries of A are i.i.d. real images and entries above diagonal are i.i.d. complex images (whose real and imaginary parts are i.i.d. images).
3. Real Wishart matrix, of order p × n:

images

The entries of A are i.i.d. real images.
4. Complex Wishart matrix, of order p × n:

images

The entries of A are i.i.d. complex images.
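The sketch below (ours; the dimensions are arbitrary and the overall scaling of the entries is our choice, not necessarily the book's normalization) generates instances of the four ensembles just listed.

import numpy as np

rng = np.random.default_rng(10)
n, p = 6, 4

# 1. Real Gaussian matrix (symmetric): A = A^T
G = rng.standard_normal((n, n))
A_real = (G + G.T) / 2

# 2. Complex Gaussian matrix (Hermitian): A = A^*
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A_complex = (Z + Z.conj().T) / 2

# 3. Real Wishart matrix of order p x p, built from a p x n real Gaussian data matrix
X = rng.standard_normal((p, n))
W_real = X @ X.T

# 4. Complex Wishart matrix of order p x p, built from a p x n complex Gaussian data matrix
Xc = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)
W_complex = Xc @ Xc.conj().T

print(np.allclose(A_real, A_real.T), np.allclose(A_complex, A_complex.conj().T))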

For generalized densities, we have

1. Symmetric matrix:

images

where G(t2) is a polynomial of even orders with a positive leading coefficient, such as G(t2) = 4t4 + 2t2 + 3.
2. Hermitian matrix:

images

where G(t2) is a polynomial of even orders with a positive leading coefficient.
3. Real covariance matrix, of dimension p and degrees of freedom n:

images

where G(t) is a polynomial with a positive leading coefficient, such as G(t) = 4t3 + 2t2 + 3t + 5.
4. Complex covariance matrix, of dimension p and degrees of freedom n:

images

where G(t) is a polynomial with a positive leading coefficient.

The book of [14] mainly concentrates on results without assuming density conditions.

5.4 Stieltjes Transform

We follow [10] closely for the definition of the Stieltjes transform. Let G be a function of bounded variation defined on the real line. Then, its Stieltjes transform is defined by

(5.29)

where z = u + iv with v > 0. Since the integrand in (5.29) is bounded in magnitude by 1/v, the integral always exists, and

images

This is the convolution of G with a Cauchy density with a scale parameter v. If G is a distribution function, then its Stieltjes transform always has a positive imaginary part. Thus, we can easily verify that, for any continuity points x1 < x2 of G,

(5.30)

(5.30) provides a continuity theorem between the family of distribution functions and the family of their Stieltjes transforms.

Also, if Im(m(z)) is continuous at x0 + i0, then G(x) is differentiable at x = x0 and its derivative equals images. (5.30) gives an easy way to find the density of a distribution if its Stieltjes transform is known.

Let G be the empirical spectral distribution of an N × N Hermitian matrix AN. It is seen that

(5.31)

where images is the i-th column vector of A with the i-th entry removed and Ai is the matrix obtained from A with the i-th row and column deleted. (5.31) is a powerful tool in analyzing the spectra of large random matrices. As mentioned above, the mapping from distribution functions to their Stieltjes transforms is continuous.


Example 5.7 (Limiting spectral distribution of the Wigner matrix)
As an illustration of how to use (5.31), let us consider the Wigner matrix to find its limiting spectral distribution.
Let mN(z) be the Stieltjes transform of the empirical spectral distribution of N−1/2W. By (5.31), and noticing wii = 0, we have

images

where

images

For any fixed v0 > 0 and B > 0, with z = u + iv, we have (omitting the proof)

(5.32)

Omitting the middle steps, we have

(5.33)

From (5.33) and (5.32), it follows that, with probability 1, for every fixed z with v > 0,

images

Letting v → 0, we find the density of the semicircle law as given in (5.22).

Let AN be an N × N Hermitian matrix and images be its empirical spectral distribution. If the measure μ admits a density f(x) with support Ω:

images

then the Stieltjes transform of images is given for complex arguments by

(5.34)

where Mk = ∫Ω xk f(x) dx is the k-th moment of F. This provides a link between the Stieltjes transform and the moments of AN. Working with the moments of random Hermitian matrices becomes practical when direct use of the Stieltjes transform is too difficult.

Let images, such that AB is Hermitian. Then, for images, we have [12, p. 37]

images

In particular, we can apply AB = XXH.

Let images be Hermitian and a be a nonzero real. Then, for images

images

There are only a few kinds of random matrices for which the corresponding asymptotic eigenvalue distributions are known explicitly [278]. For a wider class of random matrices, however, explicit calculation of the moments turns out to be unfeasible. The task of finding an unknown probability distribution given its moments is known as the problem of moments. It was addressed by Stieltjes in 1894 using the integral transform defined in (5.34). A simple Taylor series expansion of the kernel of the Stieltjes transform

images

shows how the moments can be found from the Stieltjes transform, without the need for integration. The probability density function can be obtained from the Stieltjes transform by simply taking the limit

images

which is called the Stieltjes inverse formula [11].
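A small numerical sketch (ours) of the Stieltjes inverse formula: the transform of an ESD is the normalized trace of the resolvent, and evaluating its imaginary part just above the real axis recovers a smoothed version of the density. The Wigner matrix, the size N and the smoothing parameter ε = 0.05 are arbitrary choices.

import numpy as np

def stieltjes_empirical(eigvals, z):
    """Stieltjes transform m(z) = (1/N) sum_i 1/(lambda_i - z) of an ESD."""
    return np.mean(1.0 / (eigvals - z))

rng = np.random.default_rng(11)
N = 1000
G = rng.standard_normal((N, N))
W = (G + G.T) / np.sqrt(2 * N)                  # Wigner matrix, semicircle law on [-2, 2]
eig = np.linalg.eigvalsh(W)

# Stieltjes inversion: density(x) ~ Im m(x + i*eps) / pi for small eps
eps = 0.05
for x in (0.0, 1.0, 1.5):
    approx = stieltjes_empirical(eig, x + 1j * eps).imag / np.pi
    exact = np.sqrt(max(4 - x**2, 0.0)) / (2 * np.pi)   # semicircle density for comparison
    print(x, round(approx, 3), round(exact, 3))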

We follow [279] for the following properties:

1. Identical sign for imaginary part

images

where images is the imaginary part of images.
2. Monotonicity. If images, then Ψμ(z) is well defined and

images

3. Inverse formula

(5.35)

Note that if images, then images.
4. Dirac measure. Let δx be the Dirac measure at x

images

Then,

images

An important example is

images

5. Link with the resolvent. Let X be an M × M Hermitian matrix

images

and consider its resolvent Q(z) and spectral measure LM

images

The Stieltjes transform of the spectral measure is the normalized trace of the resolvent

images

Gaussian tools [280] are useful. Let images be independent complex Gaussian random variables, collected in the vector z = (Z1, ···, Zn).

1. Integration by parts formula

images

2. Poincaré-Nash inequality

images

5.4.1 Basic Theorems


Theorem 5.22 ([281])
Let mF(z) be the Stieltjes transform of a distribution function F, then
1. mF is analytic over images;
2. if images, then images;
3. if images, images and images;
4. if F(0) = 0, then mF is analytic over images. Moreover, images implies images and we have the inequalities

images

with dist being the Euclidean distance.
Conversely, if mF(z) is a function analytic on images such that images if images and

images

then mF(z) is the Stieltjes transform of a distribution function F given by

images

If, moreover, images, then F(0) = 0, in which case mF(z) has an analytic continuation on images.

Our version of the above theorem is close to [12] with slightly different notation.

Let t > 0 and mF(z) be the Stieltjes transform of a distribution function F. Then, for images we have [12]

images

Let images and images be Hermitian, nonnegative definite. Then, for images we have [12]

images

The fundamental result in the following theorem [282] states the equivalence between pointwise convergence of Stieltjes transforms and weak convergence of probability measures.


Theorem 5.23 (Equivalence)
Let (μn) be probability measures on images and (images), images the associated Stieltjes transforms. Then the following two statements are equivalent:
1. images for all images
2. images.

Let the random matrix W be square N × N with i.i.d. entries with zero mean and variance images. Let Ω be the set containing the eigenvalues of W. The empirical distribution of the eigenvalues

images

converges to a nonrandom distribution function as N → ∞. Table 5.2 lists commonly used random matrices and their density functions.

Table 5.1 compiles some moments for commonly encountered matrices from [278]. Calculating eigenvalues λk of a matrix X is not a linear operation. Calculation of the moments of the eigenvalue distribution is, however, conveniently done using a normalized trace since

images

Thus, in the large matrix limit, we define tr(X) as

images

Table 5.1 Common random matrices and their moments (The entries of W are i.i.d. with zero mean and variance images; W is square N × N, unless otherwise specified. images)

Convergence Laws Definitions Moments
Full-Circle Law W square N × N
Semicircle Law images images
Quarter Circle Law images images
Q2
Deformed Quarter Circle Law images
R2 images
Haar Distribution images
Inverse Semicircle Law Y = T + TH

Table 5.2 is made self-contained and only some remarks are made here. For the Haar distribution, all eigenvalues lie on the complex unit circle since the matrix T is unitary. The essential feature is that the eigenvalues are uniformly distributed. The Haar distribution demands Gaussian distributed entries in the random matrix W. This condition does not seem to be necessary, but allowing for any complex distribution with zero mean and finite variance is not sufficient.

Table 5.2 Definition of commonly encountered random matrices for convergence laws (the entries of W are i.i.d. with zero mean and variance images; W is square N × N, unless otherwise specified)

Convergence Laws Definitions Density Functions
Full-Circle Law W square N × N images
Semicircle Law images images
Quarter Circle Law images images
Q2 images
Deformed Quarter Circle Law images images
R2 images
Haar Distribution images images
Inverse Semicircle Law Y = T + TH images

Table 5.3 lists some transforms (the Stieltjes, R- and S-transforms) and their properties. The Stieltjes transform is the most fundamental, since both the R-transform and the S-transform can be expressed in terms of it.

Table 5.3 Table of Stieltjes, R- and S-transforms

Stieltjes Transform R-Transform S-Transform
images, images images images
images images images,
GK(z) = images images images
images images images
images images images
images images SY(z) = undefined
images images SAB(z) = SA(z)SB(z)
images images
images, images images
images
images
images

5.4.1.1 Products of Random Matrices

Almost surely, the eigenvalue distribution of the matrix product

images

converges in distribution as K, N → ∞ with the ratio β = K/N kept fixed.

5.4.1.2 Sums of Random Matrices

Consider the limiting distribution of random Hermitian matrices of the form [251, 283]

images

where W(N × K), D(K × K), A(N × N) are independent, with W containing i.i.d. entries having second moments, D is diagonal with real entries, and A is Hermitian. The asymptotic regime is

images

The behavior is expressed using the limiting distribution function images. The remarkable result is the convergence of

images

to a nonrandom F.


Theorem 5.24 ([251, 283])
Let A be an N × N nonrandom Hermitian matrix for which FA(x) converges weakly as N → ∞ to a distribution function images. Let FD(x) converge weakly as N → ∞ to a distribution function denoted images. Suppose the entries of images are i.i.d. for fixed N with unit variance (the sum of the variances of the real and imaginary parts in the complex case). Then, the eigenvalue distribution of A + WDWH converges weakly to a deterministic F. Its Stieltjes transform G(z) satisfies the equation:

images
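As an illustration of the deterministic limit in Theorem 5.24 (our sketch; A, D and all sizes are arbitrary choices, and we do not solve the fixed-point equation itself), two independent realizations of W give almost the same empirical spectral distribution of A + WDWH.

import numpy as np

rng = np.random.default_rng(12)
N, K = 400, 800                                 # the ratio K/N is kept fixed

A = np.diag(np.linspace(0, 1, N))               # deterministic Hermitian matrix
D = np.diag(rng.uniform(0, 2, size=K))          # diagonal D with real entries, fixed below

def esd_at(B, x):
    """Value of the ESD of a Hermitian matrix B at the point x."""
    return (np.linalg.eigvalsh(B) <= x).mean()

# Two independent realizations of W give nearly the same ESD of A + W D W^H
grid = np.linspace(0, 6, 4)
for trial in range(2):
    W = rng.standard_normal((N, K)) / np.sqrt(N)   # entries of variance 1/N (our scaling)
    B = A + W @ D @ W.T
    print([round(esd_at(B, x), 3) for x in grid])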


 


Theorem 5.25 ([284])
Assume
1. images, where 1 ≤ i ≤ n, 1 ≤ j ≤ p, and Xi, j, N are independent real random variables with a common mean and variance σ2, satisfying

images

where I(x) is an indicator function and εn2 is a positive sequence tending to zero;
2. images
3. Tn is a p × p random symmetric matrix with images converging almost surely to a distribution H(t) as n → ∞;
4. Bn = An + XnTnXnH, where An is a random p × p symmetric matrix with images converging almost surely to FA, a (possibly defective) nonrandom distribution;
5. XN, TN, AN are independent.
Then, as n → ∞, images converges almost surely to a nonrandom distribution F, whose Stieltjes transform m(z) satisfies

images


 


Theorem 5.26 ([285])
Let Sn denote the sample covariance matrix of n pure noise vectors distributed images. Let l1 be the largest eigenvalue of Sn. In the joint limit p, n → ∞, with p/n → c ≥ 0, the distribution of the largest eigenvalue of Sn converges to a Tracy-Widom distribution

images

with β = 1 for real-valued noise and β = 2 for complex-valued noise. The centering and scaling parameters μn, p and ξn, p are functions of n and p only.

 


Theorem 5.27 ([285])
Let l1 be the largest eigenvalue as in Theorem 5.26. Then,

images

where

images


Consider the standard model for signals with p sensors. Let images denote p-dimensional i.i.d. observations of the form

5.36 5.22

sampled at n distinct times ti, where A = [a1, …, aK] is the p × K matrix of K linearly independent p-dimensional vectors. The K × 1 vector s(t) = [s1(t), …, sK(t)]T represents the random signals, assumed zero-mean and stationary with a full rank covariance matrix. σ is the unknown noise level, and n(t) is a p × 1 additive Gaussian noise vector, distributed images and independent of s(t).


Theorem 5.28 ([285])
Let Sn denote the sample covariance matrix of n observations from (5.36) with a single signal of strength λ. Then, in the joint limit p, n → ∞, with p/n → c ≥ 0, the largest eigenvalue of Sn converges almost surely to

images


 


Theorem 5.29 ([286])
Let Cp×p be positive semidefinite. Fix an integer l ≤ p and assume that the tail

images

of the spectrum of C decays sufficiently fast that

images

Let images be i.i.d. samples drawn from a images distribution. Define the sample covariance matrix

images

Let κl be the condition number associated with a dominant l-dimensional invariant subspace of C,

images

If

images

then with high probability

images


Theorem 5.29 says that, assuming sufficiently fast decay of the residual eigenvalues, images samples ensure that the top l eigenvalues are captured with relative precision.

5.4.2 Large Random Hankel, Markov and Toeplitz Matrices

The two most significant matrices whose limiting spectral distributions have been extensively studied are the Wigner and the sample covariance matrices. Here, we study other structured matrices. The important papers include Bryc, Dembo, and Jiang [287], Bose et al. [288–294], and Miller et al. [295, 296]. We mainly follow Bryc, Dembo, and Jiang (2006) [287] for this development. For a symmetric n × n matrix A, let λj(A), 1 ≤ j ≤ n, denote the eigenvalues of the matrix A, written in nonincreasing order. The spectral measure of A, denoted images, is the empirical spectral distribution (ESD) of its eigenvalues, namely

images

where δx is the Dirac delta measure at x. So when A is a random matrix, images is a random measure on images.

The ensembles of random matrices are studied here. Let Xk:k = 0, 1, 2, … be a sequence of i.i.d. real-valued random variables. We can visualize the Wigner matrix as

images

It is well known that almost surely, the limiting spectral distribution of n−1/2(Wp) is the semicircle law.

The sample covariance matrix S is defined as

images

where Wp = ((Xij))1≤i≤p, 1≤j≤n.

1. If p → ∞ and p/n → 0, then almost surely, the limiting spectral distribution of images is the semicircle law.
2. If p → ∞ and p/n → c ∈ (0, ∞), then almost surely, the limiting spectral distribution of Sp is the Marchenko-Pastur law.

In view of the above discussion, it is thus natural to study the limiting spectral distribution of matrices of the form images, where Xp is a p × n suitably patterned (asymmetric) random matrix. Asymmetry is used very loosely here; it just means that Xp is not necessarily symmetric. One may ask the following questions [290]:

1. Suppose that p/n → c, 0 < c < ∞. When does the limiting spectral distribution of images exist?
2. Suppose that p/n → 0. When does the limiting spectral distribution of images exist?

For images, define a random n × n Hankel matrix Hn = [Xi+j−1]1≤i, j≤n,

images

and a random n × n Toeplitz matrix Tn = [X|i−j|]1≤i, j≤n,

images
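The sketch below (ours; n = 500 is arbitrary) builds one random Toeplitz and one random Hankel matrix from a single i.i.d. sequence and computes the eigenvalues of the 1/√n-normalized matrices, which is the customary scaling behind Theorems 5.30 and 5.31.

import numpy as np

rng = np.random.default_rng(13)
n = 500
X = rng.standard_normal(2 * n)                 # X_0, X_1, ..., X_{2n-1}

idx = np.arange(n)
T = X[np.abs(idx[:, None] - idx[None, :])]     # Toeplitz: T[i, j] = X_{|i-j|}
H = X[idx[:, None] + idx[None, :] + 1]         # Hankel:   H[i, j] = X_{i+j+1}

eig_T = np.linalg.eigvalsh(T / np.sqrt(n))
eig_H = np.linalg.eigvalsh(H / np.sqrt(n))
print("Toeplitz ESD: mean, std =", eig_T.mean().round(3), eig_T.std().round(3))
print("Hankel   ESD: mean, std =", eig_H.mean().round(3), eig_H.std().round(3))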


Theorem 5.30 (Toeplitz matrices by Bryc, Dembo, and Jiang (2006) [287])
Let Xk: k = 0, 1, 2, … be a sequence of i.i.d. real-valued random variables with variance one, Var(X1) = 1. Then, with probability 1, the empirical spectral distribution of images, or images, converges weakly, as n → ∞, to a nonrandom symmetric probability measure γT, which does not depend on the distribution of X1 and has unbounded support.

 


Theorem 5.31 (Hankel matrices by Bryc, Dembo, and Jiang (2006) [287])
Let Xk: k = 0, 1, 2, … be a sequence of i.i.d. real-valued random variables with variance one, Var(X1) = 1. Then, with probability 1, the empirical spectral distribution of images, or images, converges weakly, as n → ∞, to a nonrandom symmetric probability measure γH, which does not depend on the distribution of X1, has unbounded support and is not unimodal.

A symmetric distribution ν is said to be unimodal, if the function images is convex for x < 0.

To state the theorem on the Markov matrices, define the free convolution of two probability measures μ and ν as the probability measure whose n-th free cumulant is the sum of the n-th free cumulants of μ and ν.

Let us define the Markov matrices Mn. Let {Xij : j > i ≥ 1} be an infinite upper triangular array of i.i.d. random variables and define Xij = Xji for j > i ≥ 1. Let Mn be a random n × n symmetric matrix given by

images

where Xn = [Xij]1≤i, j≤n and images is a diagonal matrix, so that each row of Mn has zero sum. The diagonal values Xii are irrelevant for Mn.
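A minimal construction sketch (ours; the convention Mn = Xn − Dn, with Dn the diagonal of the row sums of Xn, is one standard way to make every row of Mn sum to zero).

import numpy as np

rng = np.random.default_rng(14)
n = 400
G = rng.standard_normal((n, n))
X = np.triu(G, 1)
X = X + X.T                                    # symmetric, zero diagonal (the diagonal is irrelevant)

M = X - np.diag(X.sum(axis=1))                 # Markov matrix: every row of M sums to zero
print("max |row sum| of M:", np.abs(M.sum(axis=1)).max())
eig = np.linalg.eigvalsh(M / np.sqrt(n))
print("ESD of M/sqrt(n): mean, std =", eig.mean().round(3), eig.std().round(3))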

Wigner's classical result says that images converges weakly as n → ∞ to the (standard) semicircle law with density images on (−2, 2). For normal Xn and a normal i.i.d. diagonal images independent of Xn, the weak limit of images is the free convolution of the semicircle and standard normal measures; see [297] and the references therein. The predicted result holds for the Markov matrix Mn, but the problem is nontrivial since Dn depends strongly on Xn.

images


Theorem 5.32 (Markov matrices by Bryc, Dembo, and Jiang (2006) [287])
Let the entries of a Markov matrix Mn be i.i.d. real-valued random variables with mean zero and variance one. Then, with probability one, the empirical spectral distribution of images converges weakly, as n → ∞, to the free convolution of the semicircle and standard normal measures. This measure is a nonrandom symmetric probability measure with a smooth bounded density, which does not depend on the distribution of the underlying random variables and has unbounded support.

5.4.3 Information Plus Noise Model of Random Matrices

We follow [282] for this subsection. We consider images such that M = M(N), M < N and cN = M/N → c ∈ (0, 1) as N → ∞. A Gaussian information plus noise model matrix is an M × N random matrix defined by

(5.37)

where matrix BN is deterministic such that

images

and the entries Wi, j, N of WN are i.i.d. and satisfy

images

Most results can be also extended to the non-Gaussian case.

We study the convergence of the empirical spectral measure of images, defined by

images

where δx is the Dirac measure at point x.

We define the resolvent of matrix images by

images

images. Its normalized trace images can be written as the Stieltjes transform of images defined as

images

The weak convergence of images can be studied by characterizing the convergence of images as N → ∞, with the aid of (5.4.3). The main result is summarized in the following theorem.


Theorem 5.33
There exists a deterministic probability measure μN, satisfying images, and such that images as N → ∞ with probability one. Equivalently, the Stieltjes transform mN(z) of μN satisfies images almost surely images. Moreover, images, mN(z) is the unique solution of the equation

(5.38)

satisfying Im(mN(z)) > 0 for images.

This result was first proven by Girko [298] and later by Dozier and Silverstein [263]. It is also valid in the non-Gaussian case.

If the spectral distribution

images

of matrix images converges to the distribution F(x) as N → ∞, then

images

with μ probability measure, whose Stieltjes transform

images

satisfies

images

The convergence of images can be guaranteed by the following theorem.


Theorem 5.34 (The convergence of images)
For all images,

images

for all large N, with P1, P2 two polynomials with positive coefficients independent of N, z.

According to Theorem 5.33, images is a good approximation of images. The following theorem shows that the entries of QN(z) also approximate the entries of TN(z).


Theorem 5.35 (The entries of QN(z) approximate the entries of TN(z))
Let TN(z) be defined in (5.38). Let (d1, N) and (d2, N) be two sequences of deterministic vectors such that

images

Then,

images

almost surely for all images. Moreover,

images

for all large N, with P1, P2 two polynomials with positive coefficients independent of N, z.

Theorem 5.35 is also valid in the non-Gaussian case, as proven in [299].


Definition 5.1 (Assumption 5.1)
Matrix BN has rank K = K(N) < M, and the eigenvalues of BNBNH have multiplicity one for all N.

 


Definition 5.2 (Assumption 5.2)
The rank K > 0 of BNBNH does not depend on N and for all k = 1, …, K, the positive sequence images is expressed as

images

with

images

and increasing values

images


The support of μN is studied in [300]. Under further assumptions such as Assumption 5.1, it is studied in [299]. Assumption 5.2 is stronger than Assumption 5.1: it requires that the rank of BNBNH be independent of N.


Theorem 5.36 (Exact separation of the eigenvalues for the spiked model [299])
Under Assumption 5.2, define

images

and assume that

images

that is,

images

Thus, for N large enough, the support ΩN has Q = Ks + 1 clusters, that is,

images

The first cluster is associated with images and is given by

images

For q = 2, 3, …, Ks + 1 and k = q − 1, the cluster [xq,N−, xq,N+] is associated with λM−K+k, N and

images

and images is a positive images term.

Under the spiked model assumption, the measure μN is intuitively expected to be very close to the Marchenko-Pastur distribution μ, and in particular ΩN should be close to

images

Theorem 5.36 shows that the first cluster images is very close to the support of the Marchenko-Pastur distribution; additional clusters appear if the eigenvalues of BNBNH are large enough. Indeed, if Ks eigenvalues of BNBNH converge to different limits above the threshold images, then there will be Ks additional clusters in the support ΩN for all large N.

Theorem 5.36 also states that the smallest M − Ks eigenvalues of BNBNH are associated with the first cluster, or equivalently that

images

and that

images

The conditions for the support ΩN to split into several clusters depend in a nontrivial way on σ, the eigenvalues of BNBNH, and the distances between them. However, under the stronger Assumption 5.2 (K independent of N and convergence of the eigenvalues to different limits), explicit conditions for the separation of the eigenvalues are obtained: an eigenvalue of BNBNH is separated from the others if its limit is greater than images. The nonseparated eigenvalues are those associated with images. Therefore, in the spiked model case, the behavior of the clusters of ΩN is completely characterized.

The spectral decompositions of images and images are expressed as

images

with images unitary matrices and images, images. The eigenvalues of images and images are arranged in increasing order, 0 ≤ λ1, N ≤ … ≤ λM, N and images, respectively.


Theorem 5.37
Under Assumption 5.2,

images


Let us now consider the eigenvectors of images and images. We first start with a problem of DOA estimation, and then convert it into the standard information plus noise model defined in (5.37).

The observed M-dimensional time series yn for the n-th sample is expressed as

images

with

images

where sn collects K < M nonobservable “source signals,” the matrix A of size M × K is deterministic with unknown rank K < M, and images is additive white Gaussian noise such that images. Here images denotes the set of all integers.

In matrix form, we have YN = (y1, …, yN), the observation matrix of size M × N. Similarly, we define SN and VN. Then,

images

Using the normalized matrices

images

we obtain the standard model

(5.39)

which is identical to (5.37). Recall that

  • BN is a rank K deterministic matrix;
  • WN is a complex Gaussian matrix with i.i.d. entries having zero mean and variance σ2/N.

The “noise subspace” is defined as

images

that is, the eigenspace associated with the eigenvalue 0 of images, and the “signal space” as the orthogonal complement, that is, the eigenspace associated with the non-null eigenvalues of images. The goal of subspace estimation is to find the projection matrix onto the noise subspace, that is,

images

The subspace estimation problem we consider here is to find a consistent estimation of

images

where (dN) is a sequence of deterministic vectors such that images.

Traditionally, ηN is estimated by

images

in other words, by replacing the eigenvectors of true signal covariance images (information only) with those of their empirical estimates images (information plus noise). This estimator only makes sense in the regime where M does not depend on N (thus cN → 0), because from the classical law of large numbers, we have

images

This convergence no longer holds in general if cN → c > 0; it can be shown that images does not converge to zero.
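
For illustration, here is a minimal Python sketch of this traditional estimator. It assumes η̂N = dNH Π̂N dN, where Π̂N is the orthogonal projector onto the eigenvectors associated with the M − K smallest eigenvalues of ΣNΣNH (equivalently, YNYNH/N); the sizes, the steering matrix and the vector dN below are hypothetical.

import numpy as np

def empirical_noise_projector(Sigma_hat, K):
    """Projector onto the eigenvectors associated with the M - K smallest
    eigenvalues of Sigma_hat @ Sigma_hat^H (the empirical noise subspace)."""
    M = Sigma_hat.shape[0]
    R_hat = Sigma_hat @ Sigma_hat.conj().T          # Hermitian, M x M
    eigvals, eigvecs = np.linalg.eigh(R_hat)        # eigenvalues in ascending order
    U_noise = eigvecs[:, : M - K]                   # M - K smallest eigenvalues
    return U_noise @ U_noise.conj().T

# toy experiment with hypothetical sizes
rng = np.random.default_rng(0)
M, K, N, sigma = 20, 2, 40, 1.0

A = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))                  # steering matrix
S = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)
V = sigma * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
Y = A @ S + V                                       # observations y_1, ..., y_N
Sigma_hat = Y / np.sqrt(N)                          # normalized observation matrix

d = rng.standard_normal((M, 1)) + 1j * rng.standard_normal((M, 1))
d /= np.linalg.norm(d)                              # deterministic unit-norm vector d_N

Pi_hat = empirical_noise_projector(Sigma_hat, K)
eta_hat = (d.conj().T @ Pi_hat @ d).real.item()     # traditional estimate of eta_N
print("traditional subspace estimate:", eta_hat)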

Fortunately, we can derive a consistent estimate of ηN by using the results concerning the convergence of bilinear forms of the resolvent of images.


Theorem 5.38 (Consistent estimate for the spiked model [282])
Let

images


where images, and m(x) is the Stieltjes transform of the Marchenko-Pastur law, expressed as

images

and

images

Then, under Assumption 5.2, if

images

it holds that

images

This theorem can also be derived using a different method [301].

5.4.4 Generalized Likelihood Ratio Test Using Large Random Matrices

The material in this subsection can be found in [302]. Denote by N the number of observed samples

images

where

  • (w[n]), n = 0, …, N − 1 represents an independent and identically distributed (i.i.d.) process of K × 1 vectors with circular complex Gaussian entries with mean zero and covariance matrix σ2IK;
  • vector images is deterministic; the signal s[n], n = 0, …, N − 1, is a scalar i.i.d. circular complex Gaussian process with zero mean and unit variance;
  • (w[n]), n = 0, …, N − 1 and s[n], n = 0, …, N − 1 are assumed to be independent processes.

We stack the observed data into a K × N matrix

images

Denote by images the sample covariance matrix defined as

5.40 5.25

We denote by p0(Y) and p1(Y) the likelihood functions of the observation matrix Y, indexed by the unknown parameters h and σ2, under hypotheses images and images, respectively.

As Y is a K × N matrix whose columns are i.i.d. Gaussian vectors with covariance matrix images

images

When the parameters h and σ2 are known, the Neyman-Pearson procedure gives a uniformly most powerful test, defined by the likelihood ratio

images

In practice, the parameters h and σ2 are not known; we deal with this case in the following. No simple procedure guarantees a uniformly most powerful test, and a classical approach, the generalized likelihood ratio test (GLRT), considers

images

The GLRT rejects hypothesis images when LN is above some threshold ξN

5.41 5.26

where ξN is selected in order that the probability of false alarm images does not exceed a given level α.

With the aid of [303, 304], a closed-form expression of the GLRT LN is derived in [302]. Denote by

images

the ordered eigenvalues of images (all distinct with probability one).


Proposition 5.1
Let TN be defined by

5.42 5.112

Then, the GLRT can be written as

5.43 5.113

where

images


Since TN ∈ (1, K) and ϕ(·) is an increasing function in this interval, (5.43) is equivalent to

5.44 5.27

Using (5.44), (5.41) is rewritten as

5.45 5.28

with

images

The GLRT (5.45) requires setting the threshold γN, which is a function of N. Let pN(t) be the complementary c.d.f. of the statistic TN under the null hypothesis images

images

The threshold γN is thus defined as

images

which guarantees that the probability of false alarm images is kept under a desired level α ∈ (0, 1).

Since pN(t) is continuous and decreasing from 1 to 0 in the interval t ∈ [0, ∞), the threshold images is well defined. It is more convenient to rewrite the GLRT (5.45) as the final form

5.46 5.29

The exact expression required in (5.46) has been derived in [302]. The fundamental point is that TN is a function only of the eigenvalues λ1, …, λK of the sample covariance matrix images defined in (5.40). The adopted approach is to study the asymptotic behavior of the complementary c.d.f. pN as the number of observations N goes to infinity. The asymptotic regime is defined as the joint limit where both the number K of sensors and the number N of snapshots go to infinity at the same speed

5.47 5.30

This asymptotic regime (5.47) is relevant in cases where the sensing system must be able to perform source detection in a moderate amount of time, that is, both the number K of sensors and the number N of snapshots are of the same order. Very often, the number of sensors is lower than the number of snapshots; hence, the ratio c is lower than 1.
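
Before turning to that asymptotic analysis, here is a minimal numerical sketch of the test. The exact expression (5.42) of TN is not reproduced above; the sketch assumes TN equals the largest eigenvalue of the SCM (5.40) divided by the average of all K eigenvalues, which is consistent with the stated range TN ∈ (1, K), and it estimates the threshold γN = pN−1(α) by Monte Carlo simulation under images. All parameter values are hypothetical.

import numpy as np

def glrt_statistic(Y):
    """T_N, assumed here to be the largest eigenvalue of the SCM (5.40)
    divided by the average of all its eigenvalues (so T_N lies in (1, K))."""
    K, N = Y.shape
    lam = np.linalg.eigvalsh(Y @ Y.conj().T / N)    # ascending order
    return lam[-1] / lam.mean()

def mc_threshold(K, N, alpha, n_trials=2000, seed=None):
    """Monte Carlo estimate of gamma_N = p_N^{-1}(alpha) under H_0 (noise only)."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_trials)
    for t in range(n_trials):
        W = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)
        stats[t] = glrt_statistic(W)                # sigma^2 cancels in T_N
    return np.quantile(stats, 1.0 - alpha)

# hypothetical sizes: K sensors and N snapshots of the same order, as in (5.47)
K, N, alpha = 8, 40, 0.05
gamma_N = mc_threshold(K, N, alpha, seed=1)

rng = np.random.default_rng(2)
h = rng.standard_normal((K, 1)) + 1j * rng.standard_normal((K, 1))
s = (rng.standard_normal((1, N)) + 1j * rng.standard_normal((1, N))) / np.sqrt(2)
w = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)
Y = h @ s + w                                       # data generated under H_1
print("T_N =", glrt_statistic(Y), " threshold gamma_N =", gamma_N)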

The regime (5.47) is particularly relevant to the “cognitive radio network as sensors” concept presented in Chapter 12. The basic idea behind this concept is that spectrum sensing is already required in cognitive radio systems, and the wealth of information gathered for spectrum sensing can also be exploited to sense the radio environment; in this manner, the cognitive radio network itself is used as a network of sensors. Note that a cognitive radio network has much more information at its disposal than traditional sensors such as ZigBee and Wi-Fi. The programmability of software defined radios must be exploited: since waveforms are programmable in these systems, waveform diversity for remote sensing is enabled.

Under hypothesis images, images. The sample covariance matrix images is then a complex Wishart matrix, whose mathematical properties are well studied.

Under hypothesis images, images. The sample covariance matrix images then follows a single-spiked model, in which all the population eigenvalues are equal to one except for a few fixed eigenvalues [26].

The sample covariance matrix is central not only to the GLRT but also to multivariate statistics in general. In many practical examples, a few eigenvalues of the sample covariance matrix separate from the rest, indicating that the samples have a non-null covariance structure. It is natural to ask which non-null population models can lead to a few sample eigenvalues separating from the bulk described by the Marchenko-Pastur density.

The simplest non-null case is when the population covariance is a finite-rank perturbation of a multiple of the identity matrix; in other words, all but finitely many eigenvalues of the population covariance matrix are the same, say equal to 1. Such a population model has been called the “spiked population model”: a null or purely noise model “spiked” with a few significant eigenvalues. The spiked population model was first proposed in [22]. The question is how the eigenvalues of the sample covariance matrix depend on the nonunit population eigenvalues as N, K → ∞; for example, a few large population eigenvalues may pull up a few sample eigenvalues.

Since the behavior of TN is not affected if the entries of Y are multiplied by a given constant, we find it convenient to consider the model

images

Define the signal-to-noise ratio (SNR) as

images

The matrix

images

where U is a unitary matrix and

images

The limiting behavior of the largest eigenvalue λ1 changes if the signal-to-noise ratio ρK is large enough, that is, above a certain threshold.

The support of the Marchenko-Pastur distribution is [λ−, λ+], with λ− the left edge and λ+ the right edge, where

5.48 5.31
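
To illustrate the threshold effect mentioned above, the following sketch simulates the rank-one spiked model for several SNR values and compares the largest sample eigenvalue with the right edge λ+. It assumes the standard Marchenko-Pastur edge formula λ+ = σ2(1 + √c)2 (understood to be the content of (5.48)) and the SNR convention ρK = ||h||2/σ2; both are stated here as assumptions of the sketch.

import numpy as np

rng = np.random.default_rng(0)
sigma2, K, N = 1.0, 100, 400
c = K / N
lam_plus = sigma2 * (1 + np.sqrt(c)) ** 2           # assumed right edge of the MP support

def largest_eigenvalue(rho):
    """Largest sample eigenvalue for a rank-one perturbation with SNR rho."""
    h = rng.standard_normal((K, 1)) + 1j * rng.standard_normal((K, 1))
    h *= np.sqrt(rho * sigma2 / (h.conj().T @ h).real)          # so that ||h||^2 = rho * sigma2
    s = (rng.standard_normal((1, N)) + 1j * rng.standard_normal((1, N))) / np.sqrt(2)
    w = np.sqrt(sigma2 / 2) * (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N)))
    Y = h @ s + w
    return np.linalg.eigvalsh(Y @ Y.conj().T / N)[-1]

for rho in [0.1, np.sqrt(c), 4 * np.sqrt(c)]:       # below, at, and above the threshold
    print(f"rho = {rho:5.2f}   largest eigenvalue = {largest_eigenvalue(rho):6.3f}   "
          f"right edge lambda_+ = {lam_plus:6.3f}")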

A further result due to Johnstone [22] and Nadler [305] gives its speed of convergence images. Let Λ1 be defined as

5.49 5.32

then Λ1 converges in distribution toward a standard Tracy-Widom random variable with c.d.f. FTW defined in (5.50). The Tracy-Widom distribution was first introduced in [24, 25] as the asymptotic distribution of the centered and rescaled largest eigenvalue of a matrix from the Gaussian Unitary Ensemble.


Definition 5.3 (Tracy-Widom Law [24])

5.50 5.114

where q(s) is the solution of the Painlevé II differential equation

images

satisfying the condition q(s) ~ − Ai(s) (the Airy function) as s → + ∞.

Tables of the Tracy-Widom law are available, for example, in [306], and a practical algorithm [307] is used to efficiently evaluate (5.50). Refer to [19] for an excellent survey.
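
For readers who prefer to compute FTW directly, the following sketch (in the spirit of, but not reproducing, the algorithm in [307]) integrates the Painlevé II equation with the Airy boundary condition and evaluates FTW(s) = exp(−∫s∞(x − s)q(x)2 dx), the standard representation of the GUE Tracy-Widom c.d.f. in (5.50); the sign of the boundary condition is immaterial since only q2 enters. Because the Hastings-McLeod solution is numerically unstable when integrated toward −∞, this is only a rough sketch over a moderate range of s.

import numpy as np
from scipy.integrate import solve_ivp, trapezoid
from scipy.special import airy

def tracy_widom_cdf(s_eval, s_start=8.0, s_end=-8.0):
    """F_TW(s) = exp(-int_s^inf (x - s) q(x)^2 dx), with q the Hastings-McLeod
    solution of Painleve II, q'' = s*q + 2*q^3, behaving like Ai(s) (up to sign)
    as s -> +infinity; only q^2 enters, so the sign convention does not matter."""
    ai, aip, _, _ = airy(s_start)                    # Airy values used as boundary data
    def painleve_ii(x, y):                           # y = (q, q')
        return [y[1], x * y[0] + 2.0 * y[0] ** 3]
    sol = solve_ivp(painleve_ii, (s_start, s_end), [ai, aip],
                    method="DOP853", dense_output=True, rtol=1e-10, atol=1e-12)
    def F(s):
        x = np.linspace(s, s_start, 2001)
        q = sol.sol(x)[0]
        return np.exp(-trapezoid((x - s) * q ** 2, x))
    return np.array([F(s) for s in np.atleast_1d(s_eval)])

# a few values: the TW (GUE) mean is about -1.77 and F_TW(-3.7) is roughly 0.01
print(tracy_widom_cdf([-3.7, -1.8, 0.0, 2.0]))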


Definition 5.4 (Assumption 5.1)
The following constant images exists

images


We call ρ the limiting SNR. We also define

images

Under hypothesis images, the largest eigenvalue has the following asymptotic behavior [26] as N, K → ∞

images

where images is strictly larger than the right edge λ+. In other words, if the perturbation is large enough, the largest eigenvalue converges outside the support [λ−, λ+] of the Marchenko-Pastur distribution. The condition for the detectability of the rank-one perturbation is

5.51 5.33


Proposition 5.2 (Limiting behavior of TN under images and images)
Let Assumption 5.1 hold true and further assume (5.51) is true, that is, images. Then,

images


In Theorem 5.39, we take advantage of the fundamental fact that the largest eigenvalues of the sample covariance matrix images, defined in (5.40), converge in the asymptotic regime defined in (5.47). The threshold and the p-value of interest can then be expressed in terms of Tracy-Widom quantiles. Related work includes [24, 25, 308–311], Johnstone [9, 19, 22, 312–318], and Nadler [305].


Theorem 5.39 (Limiting behavior of GLRT [319])
Consider a fixed level α ∈ (0, 1) and let γN be the threshold for which the power of (5.45) is maximum, that is,

5.52 5.115

with

images

Then,
1. The following convergence is true

images

2. The probability of false alarm of the following test

images

converges to α.
3. The p-value pN(TN) associated with the GLRT can be approximated by

images

in the sense that

images


 


Definition 5.5 (Hypothesis test of the condition number)
Define the random variable of the condition number χN as

images

where λ1 and λK are the largest and smallest eigenvalues of the sample covariance matrix images defined in (5.40).

A related test [257] uses the ratio of the largest to the smallest eigenvalue of the sample covariance matrix. As with TN, χN does not depend on the unknown noise power σ2. The test based on χN is motivated by the following observation, which relies on (5.48).

Under hypothesis images, the spectral measure of images converges weakly to the Marchenko-Pastur distribution with support (λ−, λ+), with λ− and λ+ defined in (5.48). The largest eigenvalue λ1 of images converges toward λ+ under images, and toward images under images.

The smallest eigenvalue λK of images converges to [26, 271, 320]

images

under both images and images. Therefore, the statistic χN admits the following limit

images

The test is based on the observation that the limit of χN under the alternative images is strictly larger than the ratio images, at least when the SNR ρ is large enough.
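
A minimal sketch of the condition number statistic follows; it assumes the standard edge formulas, so that the limiting ratio under images is λ+/λ− = ((1 + √c)/(1 − √c))2, and uses hypothetical sizes.

import numpy as np

def condition_number_stat(Y):
    """chi_N: ratio of the largest to the smallest eigenvalue of the SCM (5.40)."""
    K, N = Y.shape
    lam = np.linalg.eigvalsh(Y @ Y.conj().T / N)
    return lam[-1] / lam[0]

rng = np.random.default_rng(0)
K, N = 20, 200
c = K / N
limit_H0 = ((1 + np.sqrt(c)) / (1 - np.sqrt(c))) ** 2    # assumed lambda_+ / lambda_-

W = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)
print("chi_N under H_0:", condition_number_stat(W), "  limiting ratio:", limit_H0)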

The threshold must be determined before using the condition number test. It is proven in [302] that TN outperforms χN. Λ1 is defined in (5.49) (repeated below) and ΛK is defined as

images

Then, both Λ1 and ΛK converge toward Tracy-Widom random variables

images

where X and Y are independent random variables, both distributed according to FTW(x). A direct use of the Delta method [321, Chapter 3] gives the following convergence in distribution

images

where

images

The optimal threshold is found to be

images

In particular, ξN is bounded as N, K → ∞.

5.4.5 Detection of High-Dimensional Signals in White Noise

We mainly follow [322] for this development. We observe M samples (“snapshots”) of possibly signal-bearing N-dimensional snapshot vectors x1, …, xM. For each i,

images

where xi are mutually independent. The snapshot vectors are modelled as

images

where

  • images denotes an N-dimensional (real or circularly symmetric complex) Gaussian noise vector whose variance σ2 is assumed to be unknown;
  • images denotes a K-dimensional (real or circularly symmetric complex) Gaussian signal vector with covariance matrix Rs;
  • H is an N × K unknown nonrandom matrix;
  • the j-th column of H encodes the parameter vector associated with the j-th signal, whose magnitude is described by the j-th element of si.

Since the signal and noise vectors are independent of each other, the covariance matrix of xi can be decomposed as

images

where

images

The sample covariance matrix is defined as

images

where

images

is the matrix of observations (samples).

It is assumed that the rank of images is K. Equivalently, the N − K smallest eigenvalues of images are equal to zero. Denote the eigenvalues of R by

images

then the smallest N − K eigenvalues of R are all equal to σ2, so that

images

In practice, we have to estimate the value of K; this is the so-called rank estimation problem.

We assume M > N and images. Similarly to the case of the true covariance matrix, the eigenvalues of images are ordered

images

The estimator developed here is robust to high dimensionality and sample-size constraints.

A central object in the study of large random matrices is the empirical distribution function (e.d.f.) of the eigenvalues. Under images, the e.d.f. of images converges to the Marchenko-Pastur distribution FW(x). The almost sure convergence of the e.d.f. of the signal-free sample covariance matrix (SCM) implies that the moments of the eigenvalues converge almost surely, so that

images

where [266]

images

For finite N and M, the sample moments, that is, images, will fluctuate about these limiting values.
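
As a quick numerical check, the sketch below compares the first two sample moments with their limits. The limiting values from [266] are not reproduced above; the sketch uses the standard first two moments of the Marchenko-Pastur law with variance σ2 and ratio c = N/M, namely σ2 and σ4(1 + c), stated here as an assumption.

import numpy as np

rng = np.random.default_rng(0)
sigma2, N, M = 1.0, 100, 400                         # N-dimensional snapshots, M samples
c = N / M

X = np.sqrt(sigma2) * rng.standard_normal((N, M))    # signal-free observations
lam = np.linalg.eigvalsh(X @ X.T / M)                # eigenvalues of the SCM

# sample moments (1/N) tr(R_hat^k) versus the assumed Marchenko-Pastur limits
print("first moment :", lam.mean(),        "  limit:", sigma2)
print("second moment:", (lam ** 2).mean(), "  limit:", sigma2 ** 2 * (1 + c))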


Proposition 5.3 (Convergence of moments in distribution [322])
Let images denote a signal-free sample covariance matrix formed from an N × M matrix of observations with i.i.d. Gaussian samples of mean zero and variance λ = σ2. For the asymptotic regime

images

we have

images

where the convergence is in distribution.

 


Proposition 5.4 (Convergence of the statistic qN)
Assume images satisfies the hypothesis of Proposition 5.3 for some λ. Consider the statistic

images

Then, as

images

we have

images

where the convergence is in distribution.

Propositions 5.3 and 5.4 deal with images. We now present two propositions for the signal-bearing case images, in which a so-called phase transition phenomenon is observed: the largest eigenvalue converges to a limit different from that in the signal-free case only if the “signal” eigenvalues are above a certain threshold.


Proposition 5.5 (Convergence of the eigenvalues of images)
Let images denote a sample covariance matrix formed from an N × M matrix of observations whose columns are independent and identically distributed Gaussian vectors with mean zero and covariance matrix R. Denote the eigenvalues of R by

images

Let lj be the j-th largest eigenvalue of images. Then, as

images

we have

images


where the convergence is almost sure.

This result appears in [26] for a very general setting. A matrix theoretic proof for the real-valued SCM case may be found in [27], while a determinantal proof for the complex case may be found in [259]. A heuristic derivation appears in [323].

For “signal” eigenvalues strictly below the threshold described in Proposition 5.5, the corresponding sample eigenvalues exhibit, on rescaling, fluctuations described by the Tracy-Widom distribution [24, 25]. An excellent survey is given in [19].


Proposition 5.6 (Convergence of the eigenvalues of images)
Assume R and images satisfy the hypotheses of Proposition 5.5. If images has multiplicity 1 and if images, then

images

where the convergence is in distribution.

A matrix theoretic proof for the real-valued SCM case may be found in [27], while a determinantal proof for the complex case may be found in [259]. The result has been strengthened for non-Gaussian situations by Baik and Silverstein for general c ∈ (0, ∞).


Theorem 5.40 (The eigenvalues of R and images converge to the same limit)
Let R and images be two N × N covariance matrices whose eigenvalues are related as

images

where for some c ∈ (0, ∞), and

images

Let R and images be the associated sample covariance matrices formed from M snapshots. Then, for

images

and

images

where the convergence is almost sure and images is the estimate of the number of signals obtained using the algorithm in [322].

By Proposition 5.3, we heuristically define the effective number of (identifiable) signals as

images

Consider an example of

images

which has the N − 2 smallest eigenvalues λ3 = ··· = λN = σ2 and the two largest eigenvalues

images

respectively. Applying the result in Proposition 5.3, we can express the effective number of signals as

images

In the special situation when

images

we can (in an asymptotic sense) reliably detect the presence of both signals from the sample eigenvalues alone, whenever we have the following condition

images

We define images as

images

which measures the (theoretical) separation of the j-th “signal” eigenvalue from the largest “noise” eigenvalue in standard deviations of the j-th signal eigenvalue's fluctuations. Simulations suggest that reliable detection (with the empirical probability greater than 90%) of the effective number of signals is possible if images lies between 5 and 15.

5.4.6 Eigenvalues of (A + B)−1B and Applications

Roy's largest root test [324] is relevant in this context. We follow [325] for this development. Let X be an m × p normal data matrix: each row is an independent observation from images. The p × p matrix A = XHX is then said to have a Wishart distribution

images

Let

images

Assume that m ≥ p; then A−1 exists and the nonzero eigenvalues of A−1B generalize the univariate F ratio. The scale matrix images has no effect on the distribution of these eigenvalues; without loss of generality we assume that images. We follow the definition of [110, p. 84] for the greatest root statistic.


Definition 5.6 (Greatest root statistic)
Let images be independent of images, where m ≥ p. Then the largest eigenvalue θ of (A + B)−1B is called the greatest root statistic, and a random variate having this distribution is denoted λ1(p, m, n), or λ1,p for short.

Since A is positive definite, 0 < λi < 1 for each eigenvalue λi. Equivalently, λ1(p, m, n) is the largest root of the determinantal equation

images

The parameter p stands for dimension, m the “error” degrees of freedom and n the “hypothesis” degrees of freedom. Thus m + n represents the “total” degrees of freedom.
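
A Monte Carlo sketch of Definition 5.6 is given below: it draws A ∼ Wp(m, I) and B ∼ Wp(n, I) and computes the largest eigenvalue of (A + B)−1B through the equivalent generalized symmetric eigenproblem Bv = θ(A + B)v. The parameter values and the number of draws are hypothetical.

import numpy as np
from scipy.linalg import eigh

def greatest_root(p, m, n, rng):
    """One draw of lambda_1(p, m, n): the largest eigenvalue of (A + B)^{-1} B
    with A ~ W_p(m, I) and B ~ W_p(n, I) independent."""
    Xa = rng.standard_normal((m, p))
    Xb = rng.standard_normal((n, p))
    A, B = Xa.T @ Xa, Xb.T @ Xb
    # equivalent generalized symmetric eigenproblem B v = theta (A + B) v
    return eigh(B, A + B, eigvals_only=True)[-1]

rng = np.random.default_rng(0)
p, m, n = 5, 40, 10                           # hypothetical degrees of freedom
draws = np.array([greatest_root(p, m, n, rng) for _ in range(2000)])
print("Monte Carlo 95th percentile of lambda_1(p, m, n):", np.quantile(draws, 0.95))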

The greatest root distribution has the property

5.53 5.34

which is useful in the case when n < p [110, p. 84].

Assume p is even and that p, m = m(p), n = n(p) all go to infinity together such that

5.54 5.35

Define the logit transform Wp as

images

Johnstone (2008) [325] shows that Wp is, with appropriate centering and scaling, approximately Tracy-Widom distributed:

images

The distribution function images was found by Tracy and Widom to be the limiting law of the largest eigenvalue of a p × p Gaussian symmetric matrix [25].

The centering and scaling parameters are

5.55 5.36

where the angle parameters γ, φ are defined by

5.56 5.37


Theorem 5.41 (Johnstone (2008) [325])
Assume that m(p), n(p) → ∞ as p → ∞ through even values of p according to (5.54). For each images, there exists C > 0 such that for t > t0,

images

Here C depends on (γ, φ) and also on t0 if t0 < 0.

Data matrices X based on complex-valued data arise frequently in signal processing and communications. If the rows of X are drawn independently from a complex normal distribution images, then we say

images

In parallel with the real case definition, if

images

are independent, then the joint density of the eigenvalues

images

of (A + B)−1B, or equivalently

images

is given by [326]

images

The largest eigenvalue of (A + B)−1B is again called the greatest root statistic, and its distribution is denoted λC(p, m, n). The property (5.53) carries over to the complex case.

Again, we define

images


Theorem 5.42 (Johnstone (2008) [325])
Assume that m(p), n(p) → ∞ as p → ∞ according to (5.54). For each images, there exists C > 0 such that for t > t0,

images

Here C depends on (γ, φ) and also on t0 if t0 < 0.

The centering images and scaling images are given in [325]. A software implementation is also available; see [325] for details.

We are now in a position to consider several settings in multivariate statistics using double Wishart models.

5.4.7 Canonical Correlation Analysis

Suppose that there are N observations on each of L + M variables. For definiteness, assume that L ≤ M. The first L variables are grouped into an N × L data matrix

images

and the last M into an N × M data matrix

images

Write

images

for cross-product matrices. Canonical correlation analysis (CCA), or more precisely, the zero-mean version of CCA, seeks the linear combinations aTx and bTy that are most highly correlated, that is, to maximize

5.57 5.38

This leads to a maximal correlation ρ1 and associated canonical vectors a1 and b1, usually each taken to have unit length. The procedure may be iterated. We restrict the search to vectors that are orthogonal to those already found:

images

The successive canonical correlations ρ1 ≥ ρ2 ≥ ··· ≥ ρL ≥ 0 may be found as the roots of the determinantal equation

5.58 5.39

See, for example, [110, p. 284]. A typical question in applications is how many of the ρk are significantly different from zero.

After some manipulations, (5.58) becomes

5.59 5.40

Now assume that Z = [XY] is an N × (L + M) Gaussian data matrix with mean zero. The covariance matrix is partitioned into

images

Under these Gaussian assumptions, the X and Y variable sets are independent if and only if

images

This is equivalent to asserting that

images

The canonical correlations (ρ1, …, ρL) are invariant under block diagonal transformations

images

of the data (for B and C nonsingular L × L and M × M matrices, respectively). It follows that under hypothesis

images

the distribution of the canonical correlations can be found (without loss of generality) by assuming that

images

In this case, the matrices A and B of (5.59) are

images

From the definition, the largest squared canonical correlation images has the Λ(L, N − M, M) distribution under the null hypothesis images.
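
For illustration, the sketch below computes the squared canonical correlations of the zero-mean CCA via one standard numerical route (the squared singular values of QxTQy, with Qx and Qy orthonormal bases of the column spaces of X and Y); these are the roots of (5.58). With X and Y generated independently, the largest value r12 corresponds to the null situation discussed above.

import numpy as np

def squared_canonical_correlations(X, Y):
    """Squared canonical correlations between the columns of X and Y
    (zero-mean version): squared singular values of Qx^T Qy."""
    Qx, _ = np.linalg.qr(X)                   # orthonormal basis of the column space of X
    Qy, _ = np.linalg.qr(Y)
    svals = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.sort(svals ** 2)[::-1]          # r_1^2 >= r_2^2 >= ...

rng = np.random.default_rng(0)
N, L, M = 200, 3, 5                           # N observations, L <= M variables
X = rng.standard_normal((N, L))
Y = rng.standard_normal((N, M))               # independent of X: the null case
r2 = squared_canonical_correlations(X, Y)
print("largest squared canonical correlation r_1^2 =", r2[0])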

In practice, it is more common to allow each variable to have a separate, unknown mean. One can correct for the mean using, for example, the approach in [325].

5.4.8 Angles and Distances between Subspaces

The cosine of the angle between two vectors images is given by

images

(5.57) becomes

images

5.4.9 Multivariate Linear Model

In the multivariate model,

images

where

1. Y of N × M is an observed matrix of M response variables on each of N individuals (sensors);
2. H of N × K is a known design matrix (channel response);
3. X of K × M is a matrix of unknown regression parameters;
4. W of N × M is a matrix of unobserved random disturbances (additive white Gaussian noise). It is assumed that W is a normal data matrix of N vector samples from images, so that the rows are independent Gaussian, each with mean 0 and common covariance matrix images.

The model matrix H remains the same for each response; however, there are separate vectors of unknown coefficients and errors for each response. These are organized into the matrix X of regression coefficients and the N × M matrix E of errors [327]. Assuming for now that the model matrix H has full rank, the least squares estimator is

images

Consider the linear hypothesis

images

where C1 is an r × K matrix of rank r. For more details about C1, we refer to [327].

The hypothesis and error sums of squares and products matrices become

images

It follows [327] that

images

and under hypothesis images,

images

in addition, D and E are independent. Generalizations of the images-test are obtained from the eigenvalues of the matrix E−1D or, equivalently, the eigenvalues of (D + E)−1D.

Thus, under the null hypothesis C1X = 0, Roy's maximum root statistic λ1 has null distribution

images
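
The explicit forms of D and E are not reproduced above. The sketch below uses the standard hypothesis and error sums of squares and products matrices, D = (C1X̂)T[C1(HTH)−1C1T]−1(C1X̂) and E = (Y − HX̂)T(Y − HX̂), stated here as an assumption, and computes Roy's maximum root as the largest eigenvalue of (D + E)−1D.

import numpy as np
from scipy.linalg import eigh, solve

rng = np.random.default_rng(0)
N, M, K, r = 100, 4, 3, 2                     # individuals, responses, regressors, rank of C1

H = rng.standard_normal((N, K))               # known design matrix (full rank)
X_true = np.zeros((K, M))                     # the null hypothesis C1 X = 0 holds here
W = rng.standard_normal((N, M))               # unobserved Gaussian disturbances
Y = H @ X_true + W

X_hat = solve(H.T @ H, H.T @ Y)               # least squares estimator
C1 = rng.standard_normal((r, K))              # hypothesis matrix of rank r

# assumed standard forms of the hypothesis (D) and error (E) SSP matrices
C1X = C1 @ X_hat
D = C1X.T @ solve(C1 @ solve(H.T @ H, C1.T), C1X)
E = (Y - H @ X_hat).T @ (Y - H @ X_hat)

# Roy's maximum root: largest eigenvalue of (D + E)^{-1} D
lambda_1 = eigh(D, D + E, eigvals_only=True)[-1]
print("Roy's maximum root statistic:", lambda_1)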

5.4.10 Equality of Covariance Matrices

Suppose that independent samples from two normal distributions images and images lead to covariance estimates S1 and S2, which are independent and Wishart distributed with N1 and N2 degrees of freedom:

images

Then the largest root test of the null hypothesis

images

is based on the largest eigenvalue λ of (A1 + A2)−1A2, which under images has the Λ(M, N1, N2) distribution [328].

5.4.11 Multiple Discriminant Analysis

Suppose that there are K populations, the i-th population being assumed to follow an M-variate normal distribution images, with the covariance matrix unknown but common to all populations. A sample of Ni (vector) observations is available from the i-th population, leading to a total of N = ∑Ni observations. Multiple discriminant analysis uses the “within-groups” and “between-groups” sums of squares and products matrices W and B to construct linear discriminant functions based on the eigenvectors of W−1B. A test of the null hypothesis that discrimination is not worthwhile

images

can be based, for example, on the largest root of W−1B, which leads to use of the Λ(M, N − K, K − 1) distribution [110, pages 318 and 138].
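
A small simulation sketch of this test under the null hypothesis of equal means is given below. The within-groups matrix W and between-groups matrix B are formed in the standard way (group-centered cross-products and sample-size-weighted outer products of the group-mean deviations, respectively); these standard forms are an assumption of the sketch, since they are not spelled out above.

import numpy as np

rng = np.random.default_rng(0)
M, K = 4, 3                                   # variables, populations
Ns = [30, 40, 50]                             # sample sizes per population
samples = [rng.standard_normal((n, M)) for n in Ns]   # equal means: the null hypothesis holds
grand_mean = np.vstack(samples).mean(axis=0)

# within-groups (W) and between-groups (B) sums of squares and products matrices
W = sum((x - x.mean(axis=0)).T @ (x - x.mean(axis=0)) for x in samples)
B = sum(n * np.outer(x.mean(axis=0) - grand_mean, x.mean(axis=0) - grand_mean)
        for n, x in zip(Ns, samples))

# largest root of W^{-1} B, to be referred to the Lambda(M, N - K, K - 1) distribution
roots = np.linalg.eigvals(np.linalg.solve(W, B))
print("largest root of W^{-1} B:", roots.real.max())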
