5.5 Case Studies and Applications

5.5.1 Fundamental Example of Using Large Random Matrix

We follow [279] for this development. Define an M × N complex matrix as

images

where (Xij)1≤i≤M, 1≤j≤N are MN i.i.d. complex Gaussian variables images. x1, x2, …, xN are the columns of X. The covariance matrix R is

images

The empirical covariance matrix is defined as

images

In practice, we are interested in the behavior of the empirical distribution of the eigenvalues of images for large M and N. For example, how do the histograms of the eigenvalues (λi)i=1, …, M of images behave when M and N increase? It is well known that when M is fixed but N increases, that is, when images is small, the law of large numbers implies

images

In other words, if N ≫ M, the eigenvalues of images are concentrated around σ2.

On the other hand, let us consider the practical case when M and N are of the same order of magnitude. As

(5.60)

it follows that

images

but

images

does not converge toward zero. Here || · || denotes the norm of a matrix. It is a remarkable result of Marchenko and Pastur [251] that the histograms of the eigenvalues of images tend to concentrate around the probability density of the so-called Marchenko-Pastur distribution

(5.61)

with

images

(5.61) remains true in the non-Gaussian case. One application of (5.61) is to evaluate the asymptotic behavior of linear statistics

(5.62)

where f(x) is an arbitrary continuous function. The use of (5.61) allows many problems to be treated in closed form. To illustrate, let us consider several examples:

1. images. Using (5.62), it follows that

images

where mN(− ρ2) is the unique positive solution of the equation

images

2. images. Using (5.62), it is found that the expression

images

is nearly equal to

(5.63)

The fluctuations of the linear statistics (5.62) can be cast into closed forms. The bias of the linear estimator is

images

The variance of the linear estimator is

images

where Δ2 is the variance and images denotes the standard Gaussian distribution. In other words,

images
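As a quick numerical illustration of the Marchenko-Pastur concentration discussed above, the following sketch (a minimal example; the dimensions, the noise variance, and the helper name mp_density are our own choices, not taken from [279]) compares the eigenvalue histogram of the empirical covariance matrix with the Marchenko-Pastur density.

```python
import numpy as np

def mp_density(x, c, sigma2=1.0):
    """Marchenko-Pastur density for the eigenvalues of (1/N) X X^H, with c = M/N <= 1."""
    a = sigma2 * (1 - np.sqrt(c)) ** 2
    b = sigma2 * (1 + np.sqrt(c)) ** 2
    inside = (x > a) & (x < b)
    out = np.zeros_like(x)
    out[inside] = np.sqrt((b - x[inside]) * (x[inside] - a)) / (2 * np.pi * c * sigma2 * x[inside])
    return out

M, N, sigma2 = 200, 800, 1.0                      # c = M/N = 0.25
X = np.sqrt(sigma2 / 2) * (np.random.randn(M, N) + 1j * np.random.randn(M, N))
R_hat = X @ X.conj().T / N                        # empirical covariance matrix
eigs = np.linalg.eigvalsh(R_hat).real

hist, edges = np.histogram(eigs, bins=30, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.c_[centers, hist, mp_density(centers, M / N, sigma2)][:10])   # histogram vs. MP density
```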

5.5.2 Stieltjes Transform

We here follow [329]. Let us consider

images

where images is an N × K matrix with i.i.d. entries with zero mean and variance one, for

images

Denote A = σ2I and D = IK, K. For this case, we have

images

Applying Theorem 5.25, it follows that

(5.64)

images is the Cauchy transform of the eigenvalue distribution of matrix σ2I. The solution of (5.64) gives

images

The asymptotic eigenvalue distribution is given by

images

where δ(x) is a unit mass at 0 and [z]+ = max(0, z).

Another example is the standard vector-input, vector-output (VIVO) model

(5.65)

where x and y are, respectively, the input and output vectors, H is the channel transfer matrix, and n is additive white Gaussian noise with zero mean and variance σ2. Here H is a random matrix. (5.65) covers a number of systems, including CDMA, OFDM, MIMO, cooperative spectrum sensing, and sensor networks. The mutual information between the input vector x and the output vector y is a standard result in information theory

images

Differentiating C with respect to σ2 yields

images

It is interesting to note that we get the closed form in terms of the Stieltjes transform.
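The closed-form expression referred to above is not reproduced here, but one standard way the derivative comes out in terms of the Stieltjes transform is dC/dσ2 = m(−σ2) − 1/σ2, where C = (1/N) log det(I + HHH/σ2) and m(·) is the empirical Stieltjes transform of the eigenvalues of HHH. The sketch below (our notation and normalization, offered as an assumption-laden illustration rather than the book's derivation) checks this identity numerically against a finite difference.

```python
import numpy as np

N, K, sigma2 = 64, 32, 0.5
H = (np.random.randn(N, K) + 1j * np.random.randn(N, K)) / np.sqrt(2 * K)
lam = np.linalg.eigvalsh(H @ H.conj().T).real     # eigenvalues of H H^H

def C(s2):
    """Normalized mutual information C = (1/N) log det(I + H H^H / s2)."""
    return np.mean(np.log(1.0 + lam / s2))

def stieltjes(z):
    """Empirical Stieltjes transform m(z) = (1/N) tr (H H^H - z I)^{-1}."""
    return np.mean(1.0 / (lam - z))

eps = 1e-6
dC_numeric = (C(sigma2 + eps) - C(sigma2 - eps)) / (2 * eps)   # finite-difference derivative
dC_stieltjes = stieltjes(-sigma2) - 1.0 / sigma2               # closed form via the Stieltjes transform
print(dC_numeric, dC_stieltjes)                                # the two values should agree closely
```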

5.5.3 Free Deconvolution

We follow the definitions and notation of the example shown in Section 5.5.1. For more details, see [12, 329, 330]. For N vector observations xi, i = 1, …, N, the sample covariance matrix is defined as

(5.66)

Here, W is an M × N matrix consisting of i.i.d. zero-mean Gaussian vectors of variance 1/N. The main advantage of free deconvolution techniques is that the asymptotics "kick in" at a much earlier stage than for other techniques available up to now [329]. Often, we know the theoretical values of R. We would like to find images. If we know the behavior of the matrix WWH, then with the aid of (5.66), images can be obtained. Thus, our problem of finding images is reduced to understanding WWH. Fortunately, the limiting distribution of the eigenvalues of WWH is well known to be the Marchenko-Pastur law.

Due to our invariance assumption on one of the matrices (here WWH), the eigenvector structure does not matter. The result enables us to compute the eigenvalues of R, by knowing only the eigenvalues of images. The invariance assumption “frees,” in some sense, one matrix from the other by “disconnecting” their eigenspaces.
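A minimal numerical sketch of this idea, restricted to the first two moments, is given below. It assumes the sample covariance model images = R1/2WWHR1/2 with W as in (5.66), and uses the low-order asymptotic moment relations of free multiplicative convolution with the Marchenko-Pastur law (first moment unchanged, second moment inflated by c m12 with c = M/N); the full free deconvolution machinery of [12, 329, 330] is far more general than this two-moment illustration.

```python
import numpy as np

M, N = 100, 400
c = M / N
true_eigs = np.concatenate([np.ones(M // 2), 3 * np.ones(M // 2)])   # spectrum of R
R_sqrt = np.diag(np.sqrt(true_eigs))

W = (np.random.randn(M, N) + 1j * np.random.randn(M, N)) / np.sqrt(2 * N)   # entries of variance 1/N
R_hat = R_sqrt @ (W @ W.conj().T) @ R_sqrt        # sample covariance, (5.66)-style model

m1_hat = np.trace(R_hat).real / M                 # first two empirical eigenvalue moments
m2_hat = np.trace(R_hat @ R_hat).real / M

# "Deconvolve" the Marchenko-Pastur law, to first two moments:
m1_R = m1_hat                                     # first moment is unchanged
m2_R = m2_hat - c * m1_hat ** 2                   # second moment is corrected by c * m1^2
print(m1_R, np.mean(true_eigs))                   # ~2.0
print(m2_R, np.mean(true_eigs ** 2))              # ~5.0
```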

5.5.4 Optimal Precoding of MIMO Systems

Given M receive antennas and N transmit antennas, the standard vector channel model is

(5.67)

where the complex entries of the M × N matrix H are the MIMO channel gains, and H is a nonobservable Gaussian random matrix with known (or well-estimated) second-order statistics. Here, x is the transmitted signal vector and n is the additive Gaussian noise vector at the receiver with images.

The optimum precoding problem is to find the covariance matrix Q of x in order to maximize some figure of merit of the system. For example, the optimization problem can be expressed as

images

A possible alternative is to maximize a large system approximation of I(Q). Closed-form expressions (5.63) can be used [331]. For more details, see [279].
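When H itself (rather than only its statistics) is known, one classical instance of this optimization is the water-filling solution for Q under a sum-power constraint tr Q ≤ P. The following sketch (the helper waterfill and all parameter values are ours; it illustrates the classical known-channel result, not the large-system approximation of [331]) computes the optimal eigenmode power allocation.

```python
import numpy as np

def waterfill(gains, P):
    """Water-filling: p_i = max(mu - 1/g_i, 0) with sum(p_i) = P, returned in input order."""
    order = np.argsort(gains)[::-1]
    g = gains[order]
    p_sorted = np.zeros(len(g))
    for k in range(len(g), 0, -1):                 # try the k strongest eigenmodes
        mu = (P + np.sum(1.0 / g[:k])) / k
        cand = mu - 1.0 / g[:k]
        if cand[-1] >= 0:                          # all k powers nonnegative: valid water level
            p_sorted[:k] = cand
            break
    p = np.zeros(len(g))
    p[order] = p_sorted
    return p

M, N, sigma2, P = 4, 4, 0.1, 1.0
H = (np.random.randn(M, N) + 1j * np.random.randn(M, N)) / np.sqrt(2)
lam, V = np.linalg.eigh(H.conj().T @ H)            # eigenmodes of H^H H
p = waterfill(lam / sigma2, P)                     # effective gains lambda_i / sigma^2
Q = V @ np.diag(p) @ V.conj().T                    # optimal input covariance for known H
sign, logdet = np.linalg.slogdet(np.eye(M) + H @ Q @ H.conj().T / sigma2)
print(np.trace(Q).real, logdet)                    # power constraint is met; I(Q) = logdet
```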

5.5.5 Marchenko and Pastur's Probability Distribution

We follow [279] for this development. The Stieltjes transform is one of the numerous transforms associated with a measure. It is well suited to study large random matrices and was first introduced in this context by Marchenko and Pastur [251]. The Stieltjes transform is defined in (5.34).

Consider

(5.68)

where VN is an M × N matrix with i.i.d. complex Gaussian random variables images. Our aim is to find the limiting spectral distribution of X = WNWNH. Consider the associated resolvent and its Stieltjes transform

(5.69)

The main assumption is: The ratio images is bounded away from zero and upper bounded, as M, N → ∞.

The approach is sketched here. First, one derives the equation satisfied by the Stieltjes transform of the limiting spectral distribution images defined in (5.69). Afterwards, one relies on the inversion formula of the Stieltjes transform (see (5.35)) to obtain the so-called Marchenko-Pastur distribution.

There are three main steps:

1. To prove that images. This enables us to replace images by its expectation images in the derivation.
2. To establish the limiting equation satisfied by images.
3. To recover the probability distribution, with the help of the inverse formula of Stieltjes transform (5.35).

In the study of large random matrices, the Stieltjes transform plays a role analogous to that of the Fourier transform in a linear, time-invariant (LTI) system.
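To make the role of the inversion formula (5.35) concrete, the short sketch below (our own toy example; the value of ε and the dimensions are arbitrary choices) evaluates the empirical Stieltjes transform slightly off the real axis and recovers an approximation to the density via f(x) ≈ Im m(x + jε)/π, which it compares with the Marchenko-Pastur density.

```python
import numpy as np

M, N = 400, 1600                                   # c = M/N = 0.25
c = M / N
V = (np.random.randn(M, N) + 1j * np.random.randn(M, N)) / np.sqrt(2 * N)
lam = np.linalg.eigvalsh(V @ V.conj().T).real      # eigenvalues of W_N W_N^H

def m_emp(z):
    """Empirical Stieltjes transform (1/M) tr (X - zI)^{-1}."""
    return np.mean(1.0 / (lam - z))

eps = 0.01                                         # small distance to the real axis
xs = np.linspace(0.01, 2.5, 6)
dens_from_stieltjes = np.array([m_emp(x + 1j * eps).imag / np.pi for x in xs])

a, b = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
mp = np.where((xs > a) & (xs < b),
              np.sqrt(np.maximum((b - xs) * (xs - a), 0.0)) / (2 * np.pi * c * xs), 0.0)
# The inversion is only approximate at finite M and finite eps; agreement improves as both shrink.
print(np.c_[xs, dens_from_stieltjes, mp])
```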

5.5.6 Convergence and Fluctuations of Extreme Eigenvalues

We follow [279] for this presentation. Consider the matrix WWH defined in (5.68). Denote by

images

the ordered eigenvalues of WWH. The support of the Marchenko-Pastur distribution is

images

One theorem states that if cN → c*, we have

images

where “a.s.” denotes “almost surely.” The ratio of two limit expressions is used for spectrum sensing in Example 5.4.

A central limit theorem holds for the largest eigenvalue of the matrix WWH, as M, N → ∞. The limiting distribution is known as the Tracy-Widom distribution (see (5.50)) for the fluctuations of images.

The function FTW2(s) stands for the Tracy-Widom cumulative distribution function. MATLAB code is available to compute it [332].

Let cN → c*. When properly centered and rescaled, images converges to a Tracy-Widom distribution:

images
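A small simulation (a sketch with our own dimensions and trial count) illustrates the almost sure convergence of the extreme eigenvalues to the edges of the Marchenko-Pastur support quoted above; the residual gap at finite size is of the order of the Tracy-Widom fluctuation scale.

```python
import numpy as np

def extreme_eigs(M, N, trials=50):
    """Average largest and smallest eigenvalues of W W^H over independent trials."""
    lmax, lmin = [], []
    for _ in range(trials):
        W = (np.random.randn(M, N) + 1j * np.random.randn(M, N)) / np.sqrt(2 * N)
        lam = np.linalg.eigvalsh(W @ W.conj().T).real
        lmax.append(lam[-1])
        lmin.append(lam[0])
    return np.mean(lmax), np.mean(lmin)

M, N = 100, 400
c = M / N
lmax, lmin = extreme_eigs(M, N)
print(lmax, (1 + np.sqrt(c)) ** 2)   # largest eigenvalue vs. right edge of the support
print(lmin, (1 - np.sqrt(c)) ** 2)   # smallest eigenvalue vs. left edge of the support
```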

5.5.7 Information plus Noise Model and Spiked Models

We refer to [263, 279, 299, 300, 302, 333–335] for more details. The observed M-dimensional time series yn, for sample index n, is expressed as

images

with

images

where sn collects K < M nonobservable “source signals,” the matrix A is deterministic with an unknown rank K < M, and images is additive white Gaussian noise such that images. Here images denotes the set of all integers.

In matrix form, we have YN = (y1, …, yN)T, the M × N observation matrix. We do the same for SN and VN. Then,

images

Using the normalized matrices

images

we obtain

(5.70)

To simplify, detecting the presence of signal(s) from the matrix images amounts to deciding between K = 1 and K = 0 (noise only). Since K does not scale with M, that is, K ≪ M, a spiked model is reached.

We assume that the number of sources K satisfies K ≪ N. (5.39) is a model of

images

The asymptotic regime is defined as

images

Let us further assume that SN is a random matrix with independent images elements (i.i.d. Gaussian source signals), and that AN is deterministic. It follows that

images

where BN is M × N with independent images elements.

Consider a spectral factorization of images

images

Let PN be the M × M matrix

images

Then

images

where WN is M × N with independent images elements and images denotes weak convergence. Since PN is a fixed-rank perturbation of the identity, we reach the so-called multiplicative spike model

images

Similarly, we can define the additive spike model. Let us assume that SN is a deterministic matrix and

images

is such that

images

The additive spike model is defined as

images

A natural question arises: What is the impact of images on the spectrum of images in the asymptotic regime?

Let images and FN be the distribution functions of the spectral measures of images and images, respectively. Then

images

Thus images and images have identical (Marchenko-Pastur) limit spectral measure, either for the multiplicative or the additive spike model.

We use our measured data to verify the Marchenko-Pastur law. There are five USRP platforms serving as sensor nodes. The data acquired from each USRP platform are segmented into twenty data blocks. All these data blocks are used to build large random matrices. In this way, we emulate a network with 100 sensor nodes. If there is no signal, the spectral distribution of the noise sample covariance matrix is shown in Figure 5.1(a), which follows the Marchenko-Pastur law in (5.3). When a signal exists, the spectral distribution of the sample covariance matrix of signal plus noise is shown in Figure 5.1(b). The experimental results agree well with the theory. The support of the eigenvalues is finite. The theoretical prediction offered by the Marchenko-Pastur law can be used to set the threshold for detection.

Figure 5.1 Spectral distribution. (a) Spectral distribution of noise sample covariance matrix; (b) Spectral distribution of sample covariance matrix of signal plus noise.


The main results on the eigenvalues can be summarized in the following theorem [279].


Theorem 5.43 (Main result on the eigenvalues)
The additive spike model is

images

where BN is a deterministic rank-K matrix such that

images

for k = 1, …, K, and WN is an M × N random matrix with independent images elements. Let i ≤ K be the maximum index for which images. Then, for k = 1, …, i,

images

where images denotes the presence of signal(s) while images the absence of signal(s).
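The following sketch illustrates the phase-transition behavior described by Theorem 5.43 for a rank-one (K = 1) additive spike. The normalization (WN with i.i.d. CN(0, 1/N) entries) and the spike strengths are our own assumptions for the demonstration; weak spikes leave the largest eigenvalue of the sample covariance at the Marchenko-Pastur bulk edge, while strong spikes make it separate.

```python
import numpy as np

def top_eig(s, M=200, N=800):
    """Largest eigenvalue of (B_N + W_N)(B_N + W_N)^H for a rank-one B_N of strength s."""
    c = M / N
    u = np.random.randn(M) + 1j * np.random.randn(M); u /= np.linalg.norm(u)
    v = np.random.randn(N) + 1j * np.random.randn(N); v /= np.linalg.norm(v)
    B = s * np.outer(u, v.conj())                           # deterministic rank-one spike
    W = (np.random.randn(M, N) + 1j * np.random.randn(M, N)) / np.sqrt(2 * N)
    S = B + W
    return np.linalg.eigvalsh(S @ S.conj().T).real[-1], (1 + np.sqrt(c)) ** 2

for s in [0.2, 0.5, 1.0, 2.0]:            # spike strength (single nonzero singular value of B_N)
    lam1, edge = top_eig(s)
    print(s, lam1, edge)                   # weak spikes stay near the bulk edge, strong ones separate
```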

5.5.8 Hypothesis Testing and Spectrum Sensing

This example continues the example shown in Section 5.5.7. For more details, see [263, 279, 299, 300, 302, 333–335]. One motivation is to exploit the asymptotic limiting distribution for spectrum sensing.

The hypothesis test is formulated as

images

Assume further K = 1 source for convenience.

images

is a rank one matrix such that

images

The GLRT is

(5.71)

The natural question is: what is the asymptotic performance of TN in the large random matrix regime?

Under images and images, we have

images

As a consequence of Theorem 5.43, under images, if images, then

images

If images, then

images

Using the above result in (5.71), under images, we have

images

Under images, if images, we have

images

If images, we have

images

Recall that

images

The limit of detectability by the GLRT is given by

images

Defining images, we have

images

For extremely low SNR, it follows that c* must be very small, implying

images

With the help of the Tracy-Widom law, false alarm probability can be evaluated and linked with the decision threshold TN.

For finite, low rank perturbation of large random matrices, the eigenvalues and eigenvectors are studied in [335].
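The explicit form of the GLRT statistic (5.71) is not reproduced above; a commonly used form is the ratio of the largest eigenvalue of the sample covariance matrix to its normalized trace, and that is what the sketch below assumes (together with a rank-one Rayleigh channel and our own choice of SNR and dimensions). It simulates the statistic under both hypotheses to show the separation that the asymptotic analysis predicts.

```python
import numpy as np

def glrt_stat(M, N, snr=None):
    """Assumed GLRT form: largest eigenvalue / normalized trace of the sample covariance."""
    Y = (np.random.randn(M, N) + 1j * np.random.randn(M, N)) / np.sqrt(2)   # unit-variance noise
    if snr is not None:                              # H1: a single (K = 1) source is present
        h = (np.random.randn(M) + 1j * np.random.randn(M)) / np.sqrt(2)     # flat Rayleigh channel
        s = np.sqrt(snr) * (np.random.randn(N) + 1j * np.random.randn(N)) / np.sqrt(2)
        Y = Y + np.outer(h, s)
    lam = np.linalg.eigvalsh(Y @ Y.conj().T / N).real
    return lam[-1] / lam.mean()

M, N = 20, 200
t0 = [glrt_stat(M, N) for _ in range(200)]           # H0: noise only
t1 = [glrt_stat(M, N, snr=0.1) for _ in range(200)]  # H1: one source at -10 dB per antenna
gamma = np.percentile(t0, 99)                        # threshold set from the empirical H0 law
print(gamma, np.mean(np.array(t1) > gamma))          # threshold and empirical detection probability
```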


Example 5.8 (Dozier and Silverstein [263, 279, 300])
According to Dozier and Silverstein [263, 279, 300], there exists a deterministic probability measure μN, given by images, such that

images

Consider the additive spike model (5.69) repeated here for convenience

(5.72)

The approach to characterize μN is sketched here: The Stieltjes transform of μN is defined on images as

images

with

images


5.5.9 Energy Estimation in a Wireless Network

Consider a wireless (primary) network [330] in which K entities are transmitting data simultaneously on the same frequency resource. Transmitter k ∈ {1, …, K} has transmit power Pk and is equipped with nk antennas. We denote by

images

the total number of transmit antennas of the primary network.

Consider a secondary network composed of a total of N, N ≥ n, sensing devices: these may be N single-antenna devices, or fewer devices embedded with multiple antennas whose total number of antennas equals N. The N sensors are collectively called the receiver. To ensure that every sensor in the secondary network captures roughly the same amount of energy from a given transmitter, it is assumed that the respective transmitter-sensor distances are alike. This is a realistic assumption for an in-house femtocell network.

Denote images the multiple antenna channel matrix between transmitter k and the receiver. We assume that the entries of images are independent and identically distributed (i.i.d.), with zero mean, unit variance, and finite fourth-order moment.

At time instant m, transmitter k emits the multi-antenna signal vector images, whose entries are assumed to be i.i.d., with zero mean, unit variance, and finite fourth-order moment.

Further, we assume that at time instant m, the received signal vector is impaired by an additive white Gaussian noise (AWGN) vector, denoted images, whose entries are assumed to be i.i.d., with zero mean, variance σ2, and finite fourth-order moment on every sensor. The entries of images have unit variance.

At time m, the receiver senses the signal images defined as

images

It is assumed that the channel fading coefficients remain constant over at least M consecutive sampling periods. We concatenate M successive signal realizations into

images

we have

images

for every k. This can be further recast into the final form

(5.73)

where images is diagonal with first n1 entries P1, subsequent n2 entries P2, …, last nK entries PK,

images

By convention, it is assumed that

images. H, W, and X have independent entries of finite fourth-order moment. The entries of X need not be identically distributed, but may originate from at most K distinct distributions.

Our objective is to infer the values of P1, ···, PK from the realization of the random matrix Y. The problem at hand is to exploit the eigenvalue distribution of images as N, n and M grow large at the same rate.


Theorem 5.44 (Stieltjes transform of images)
Let

images

where Y is defined in (5.73). Then, for M, N, n growing large with limit ratios

images

the eigenvalue distribution function images of BN, referred to as the empirical spectral distribution (e.s.d.) of BN, converges almost surely to the deterministic distribution function F, referred to as the limit spectral distribution (l.s.d.) of BN, whose Stieltjes transform mF(z) satisfies, for images

images

where mF(z) is the unique solution with positive imaginary part of the implicit equation in mF

images

in which we denote by f the value

images


Assumption 5.3 and Assumption 5.4, which are used in the following theorem, are too long to be stated in this context; we refer to [330].

5.5.10 Multisource Power Inference

Let images be defined as in Theorem 5.44, and

images

be the vector of the ordered eigenvalues of BN. Further assume that the limiting ratios c, c1, …, cK and P are such that Assumptions 5.3 and 5.4 are fulfilled for some images. Then, as N, n, M grow large, we have images, where the estimate images is given by

  • if M ≠ N,

images

  • if M = N,

images

in which

images

where (η1, …, ηN) are the ordered eigenvalues of the matrix images and (μ1, …, μN) are the ordered eigenvalues of the matrix images.

A blind multisource power estimator has been derived in [330]. Under the assumptions that the ratio between the number of sensors and the number of signals is not too small, and that the source transmit powers are sufficiently distinct from one another, a method is derived to infer the individual source powers when the number of sources is known. This novel method outperforms alternative estimation techniques in the medium to high SNR regime. The method is robust to small system dimensions. As such, it is particularly suited to the blind detection of primary mobile users in future cognitive radio networks.

5.5.11 Target Detection, Localization, and Reconstruction

We follow [336] for this development. A point reflector can model a small dielectric anomaly in electromagnetism, a small density anomaly in acoustics, or more generally, a local variation of the index of refraction in the scalar wave equation. The contrast of the anomaly can be of order one, but its volume is small compared to the wavelength. In such a situation, it is possible to expand the solution of the wave equation around the background solution.

Consider the scalar wave equation in a d-dimensional homogeneous medium with index of refraction n0. The reference speed of propagation is denoted by c. It is assumed that the target is a small inclusion D with index of refraction nref ≠ n0. The support of the inclusion is of the form D = xref + B, where B is a domain with small volume. Thus the scalar wave equation with source S(t, x) takes the form

images

where the index of refraction is given by

images

For any yn, zm far from xref, the field Re[(yn, zm)e−jωt], observed at yn when a point source emits a time-harmonic signal with frequency ω at zm, can be expanded in powers of the volume as

images

where k0 = n0ω/c is the homogeneous wavenumber, ρref is the scattering amplitude

images

and images(y, z) is the Green's function, or fundamental solution, of the Helmholtz equation with a point source at z:

images

More explicitly, we have

images

where images is the Hankel function of the first kind of order zero.

When there are M sources (zm)m = 1, …, M and N receivers (yn)n = 1, …, N, the response matrix is the N × M matrix

images

defined by

images

This matrix has rank one:

images

The nonzero singular value is

(5.74)

The associated left and right singular vectors uref and vref are given by

images

where the normalized vectors of Green's functions are defined as

images

where * denotes complex conjugation.

The matrix H0 is the complete data set that can be collected. In practice, the measured matrix is corrupted by electronic or measurement noise in the form of an additive perturbation. The standard acquisition gives

images

where the entries of W are independent complex Gaussian random variables with zero mean and variance images. We assume that N ≥ M.

The detection of a target can be formulated as a standard hypothesis testing problem

images

Without a target (images), the behavior of W has been extensively studied. With a target (images), the singular values of the perturbed random response matrix are of interest. This model is also called the information plus noise model or the spiked population model. The critical regime of practical interest is the one in which the singular values of the unperturbed matrix are of the same order as the singular values of the noise, that is, σref is of the same order of magnitude as σ. Related work is in [24, 25, 308–311], Johnstone [9, 19, 22, 312–318], and Nadler [305].
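The sketch below illustrates the qualitative content of the proposition that follows: the largest singular value of the measured response matrix emerges from the noise bulk only when σref is large enough relative to σ. The noise normalization (entries of W with variance σ2/M) and the parameter values are our assumptions and may differ from the exact convention of [336].

```python
import numpy as np

def max_singular_value(sigma_ref, N=100, M=100, sigma=1.0):
    """Largest singular value of the measured response matrix H0 + W (rank-one H0)."""
    u = np.random.randn(N) + 1j * np.random.randn(N); u /= np.linalg.norm(u)
    v = np.random.randn(M) + 1j * np.random.randn(M); v /= np.linalg.norm(v)
    H0 = sigma_ref * np.outer(u, v.conj())                 # rank-one target contribution
    W = sigma * (np.random.randn(N, M) + 1j * np.random.randn(N, M)) / np.sqrt(2 * M)
    return np.linalg.svd(H0 + W, compute_uv=False)[0]

noise_only = [max_singular_value(0.0) for _ in range(50)]   # no target: bulk of noise singular values
print(np.mean(noise_only))
for s_ref in [0.3, 0.7, 1.5, 3.0]:
    print(s_ref, max_singular_value(s_ref))                 # weak targets hide in the bulk, strong ones emerge
```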


Proposition 5.7 (The singular values of the perturbed random response matrix [336])
In the regime M → ∞,
1. The normalized l2-norm of the singular values satisfies

images

where Z0 follows a Gaussian distribution with zero mean and variance one and “D” denotes convergence in distribution.
2. If σref < γ1/4σ, then the maximum singular value satisfies

images

where Z2 follows a type-2 Tracy-Widom distribution.
3. If σref = γ1/4σ, then the maximum singular value satisfies

images

where Z2 follows a type-3 Tracy-Widom distribution.
4. If σref > γ1/4σ, then the maximal singular value has a Gaussian distribution with the mean and variance given by

images


The type-3 Tracy-Widom distribution has the cdf ΦTW3(z) given by

images

The expectation of Z3 is images and its variance is Var[Z3] = 1.22.

The singular vectors of the perturbed response matrix are described in the following proposition. Define the scalar product as

images


Proposition 5.8 (The singular vectors of the perturbed random response matrix [336])
In the regime M → ∞,
1. If σref < γ1/4σ, then the angles satisfy

images

2. If σref > γ1/4σ, then the angles satisfy

images


A standard imaging function for target localization is the MUSIC function defined by

images

where u(x) is the normalized vector of Green's function. It is a nonlinear function of a weighted subspace migration functional

images

The reconstruction can be formulated in this context. Using Proposition 5.7, we can see that the quantity

(5.75)

is an estimator of σref, provided that images. From (5.74), we can estimate the scattering amplitude ρref of the inclusion by

images

with images the estimator (5.75) of σref and images an estimator of the position of the inclusion. This estimator is asymptotically unbiased, since it compensates for the level repulsion of the first singular value due to the noise.

5.5.12 State Estimation and Malignant Attacker in the Smart Grid

A natural setting in which to use large random matrices is the Smart Grid, where very large networks are encountered. We use one example to illustrate this potential. We follow the model of [337] for our setting. State estimation and a malignant attack on it are considered in the context of large random matrices.

Power network state estimators are broadly used to obtain an optimal estimate from redundant noisy measurements, and to estimate the state of a network branch which, for economical or computational reasons, is not directly monitored.

The state of a power network at a certain instant of time is composed of the voltage angles and magnitudes at all the system buses. Explicitly, let images and images be, respectively, the state and measurements vector. Then, we have

(5.76)

where h(x) is a nonlinear measurement function, and images is a zero mean random vector satisfying

images

The network state could be obtained by directly measuring the voltage phasors by means of phasor measurement devices. We adopt the approximate estimation model that follows from linearizing (5.76) around the origin

images

where

images

Because of the interconnection structure of the power network, the measurement matrix H is sparse.

We assume that zi is available from i = 1 to i = N. We denote by ZN the p × N observation matrix. (5.76) can be rewritten as

(5.77)

where

images

From this matrix ZN, we can define the sample covariance matrix of the observation as

images

while the empirical spatial correlation matrix associated with the noiseless observation will take the form

images

To simplify the notation in the future, we define the matrices

images

so that (5.77) can be equivalently formulated as

(5.78)

where images is the (normalized) matrix of observations, BN is a deterministic matrix containing the signals contribution, and WN is a complex Gaussian white noise matrix with i.i.d. entries that have zero mean and variance σ2/N.

If N → ∞ while M is fixed, the sample covariance matrix of the observations

images

of ZN converges toward the matrix

images

in the sense that

(5.79)

However, in the joint limits

images

which is the practical case, (5.79) is no longer true. Random matrix theory must be used to derive the consequences. (5.78) is a standard form in [282, 333, 338, 339].

Given the distributed nature of a power system and the increasing reliance on local area networks to transmit data to a control center, it is possible for an attacker to disrupt the network functionality by corrupting the measurement vector z. When a malignant agent corrupts some of the measurements, the new state-to-measurements relation becomes

(5.80)

where images is chosen by the attacker, and thus, it is unknown and unmeasurable by any of the monitoring stations.

(5.80) is a standard hypothesis testing problem. The GLRT can thus be used, together with the random matrix theory. Following the same standard procedure as above, we have

images

where

images

By studying the sample covariance matrix

images

we are able to infer different behavior under hypothesis images or images. This result for this example appears to be reported here for the first time.
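One plausible way to operationalize this test (our construction, not necessarily the one in [337]) is to form the least-squares residual of the state estimate and examine the spectrum of its sample covariance: without an attack the residual spectrum is a pure-noise bulk, whereas a persistent attack vector leaves a rank-one spike. The toy dimensions and attack magnitude below are arbitrary.

```python
import numpy as np

def lambda_max_residual(attack, p=60, q=20, N=300, sigma=0.1):
    """Largest eigenvalue of the sample covariance of the LS residual r = z - H x_hat."""
    H = np.random.randn(p, q)                                 # measurement matrix
    a = np.zeros(p)
    if attack:
        a[np.random.choice(p, 3, replace=False)] = 1.0        # corrupt three measurements
    X = np.random.randn(q, N)                                 # state trajectory over N samples
    Z = H @ X + a[:, None] + sigma * np.random.randn(p, N)    # observation matrix, (5.77)-style
    X_hat = np.linalg.lstsq(H, Z, rcond=None)[0]              # least-squares state estimates
    R = Z - H @ X_hat                                         # residuals
    return np.linalg.eigvalsh(R @ R.T / N)[-1]

np.random.seed(0)
print(lambda_max_residual(False))   # no attack: of the order of the noise bulk edge
print(lambda_max_residual(True))    # attack: a clearly separated rank-one spike appears
```

Note that an attack vector lying in the column space of H would be absorbed into the state estimate and remain invisible to this residual statistic, which is why such "stealth" attacks are treated separately in the literature.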

5.5.13 Covariance Matrix Estimation

We refer to [340] for more details. Consider a discrete-time complex-valued K-user N-dimensional vector channel with M channel uses. We define images and images. We assume the system load β < 1 (K < N); otherwise the signal subspace is simply the entire N-dimensional vector space. In the m-th channel use, the signal at the receiver can be represented by an N-vector defined by

(5.81)

where hkm is the channel symbol of user k, having unit power, xk is the signature waveform of user k (note that xk is independent of the sample index m), and w(m) is additive noise. By defining

images

(5.81) is rewritten as

(5.82)

We do not assume specific distribution laws of the entries in H, x, w, thereby making the channel model more general [340]:

  • The entries of X are mutually independent random variables, each having zero expectation and variance images. Therefore, images almost surely, as N → ∞.
  • The entries of h(m) are mutually independent random variables. The random vectors images are mutually independent for different values of m and satisfy

images

  • The entries of w(m) are mutually independent random variables. The random vectors w(m), m = 1, …, M, are mutually independent for different values of m and satisfy

images

  • X, h(m), w(m) are jointly independent.

Such a model is useful for CDMA and MIMO systems.

The covariance matrix of the received signal (5.81) is given by

(5.83)

Based on (5.82) and

images

the unbiased sample covariance matrix estimate is defined as

(5.84)

where

images

By applying the theory of noncrossing partitions, one can obtain explicit expressions for the asymptotic eigenvalue moments of the covariance matrix estimate [340]. Here we only give some key results.

5.5.13.1 Noise-Free Case

When images, the sample covariance matrix is given by

images

The generic eigenvalue of images is denoted by images and one defines the eigenvalue moments as

images

The explicit expressions are derived in [340].


Corollary 5.1 ([340])
The eigenvalue moments of the matrix images, where Z is an M × N matrix with mutually independent entries having unit variance, are the same as those of the matrix images.

The Stieltjes transform of images is denoted by images.


Corollary 5.2 ([340])
When σw2 = 0, the Stieltjes transform of images satisfies

(5.85)


(5.85) can be used to derive the cumulative distribution function (CDF) and the probability density function (PDF) of images, through the inversion formula for the Stieltjes transform.


Lemma 5.1 ([340])
There exist a constant C > 0 and images such that

images


 


Theorem 5.45 ([340])
The distribution of images converges weakly to a unique distribution determined by the eigenvalue moments as K, N, M → ∞.

 


Theorem 5.46 ([340])
When σw2 = 0 and images, the PDF images(x) of the random variable images is given by

images


The closed-form PDF of images within its support has been derived in [340] and is too long to be included here.


Theorem 5.47 ([340])
The PDF images(x) has the following properties:
1. the support of images(x) is given by images, where

images

2. images;
3. for sufficiently large α, images; and
4. for sufficiently small α < β, images.

5.5.13.2 Noisy Case

We extend the analysis to the general case of images. When images, the exact covariance matrix is of full rank and there is a mass point at images with probability 1 − β.


Theorem 5.48 ([340])
The distribution of eigenvalues of the matrix images is the same as that of the matrix images, as K, M, N → ∞, where Z is an M × N matrix whose entries are mutually independent random variables with unit variance.

Similar to the noise-free case, the eigenvalue moments of images are derived in closed form in [340]. Let us give the first four moments:

images

The asymptotic eigenvalue moments of the estimated covariance matrix are larger than those of the exact covariance matrix (except for the expectation). This is true for both noisy and noise-free cases.
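A quick numerical check of this moment inflation is sketched below, using the plain sample covariance as a stand-in for the estimator (5.84) (the exact estimator and its normalization are not reproduced above, so this substitution is an assumption); for k ≥ 2 the empirical moments exceed those of the exact covariance matrix, while the first moment matches in expectation.

```python
import numpy as np

N, M = 50, 200                                     # dimension and number of samples
eigs = np.linspace(0.5, 2.0, N)                    # spectrum of the exact covariance R
R_sqrt = np.diag(np.sqrt(eigs))
Y = R_sqrt @ (np.random.randn(N, M) + 1j * np.random.randn(N, M)) / np.sqrt(2)
R_hat = Y @ Y.conj().T / M                         # sample covariance estimate

for k in range(1, 5):
    exact = np.mean(eigs ** k)                                        # (1/N) tr R^k
    est = np.trace(np.linalg.matrix_power(R_hat, k)).real / N         # (1/N) tr R_hat^k
    print(k, exact, est)                           # for k >= 2 the estimate's moments are inflated
```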

The Stieltjes transform of the eigenvalue images, denoted by images, is given by

images

We define

images

Their counterparts for the exact covariance matrix, denoted by λmin, λmax, and images, are given by images, and images, respectively.


Theorem 5.49 ([340])
There is no mass point for any positive eigenvalue images. The support of images satisfies the following properties:
1. for sufficiently large α, the support of images is not a continuous interval when σw2 > 0;
2. images;
3. for sufficiently large α, images; and
4. for sufficiently small images.

The properties 3 and 4 in Theorem 5.47 are the same as in Theorem 5.49. Property 1 is completely different. The essential reason is the existence of a mass point at images. When images, the mass point at 0 always exists with probability 1 − β and the support on positive eigenvalues is continuous. When images, and 1 < α < ∞, the estimated covariance matrix is of full rank and there is no mass point. When α → ∞, the support of positive eigenvalues has to be separated into at least two disjoint intervals such that the support around images shrinks to a point.

5.5.14 Deterministic Equivalents

Deterministic equivalents for certain functions of large random matrices are of interest. The most important references are [281, 341–344]. Let us follow [281] for this presentation. Consider an N × n random matrix images, where the entries are given by

images

Here (σij(n), 1 ≤ i ≤ N, 1 ≤ j ≤ n) is a bounded sequence of real numbers called a variance profile; the images are centered with unit variance, independent and identically distributed (i.i.d.) with finite 4 + ε moment. Consider now a deterministic N × n matrix An whose columns and rows are uniformly bounded in the Euclidean norm.

Let

images

This model has two interesting features: the random variables are independent but not i.i.d., since the variance may vary, and An, the centering perturbation of Yn, can have a very general form. Our purpose is to study the behavior of

images

that is, the Stieltjes transform of the empirical eigenvalue distribution of images when n → ∞, and N → ∞ in such a way that images.

There exists a deterministic N × N matrix-valued function Tn(z) analytic in images such that, almost surely,

images

In other words, there exists a deterministic equivalent to the empirical Stieltjes transform of the distribution of the eigenvalues of images. It is also proved that images is the Stieltjes transform of a probability measure πn(dλ), and that for every bounded continuous function f, the following convergence holds almost surely

images

where the (λk)1≤k≤N are the eigenvalues of images. The advantage of considering images as a deterministic approximation instead of images (which is deterministic as well) lies in the fact that Tn(z) is in general far easier to compute than images whose computation relies on Monte Carlo simulations. These Monte Carlo simulations become increasingly heavy as the size of the matrix Σn increases.

This work is motivated by MIMO wireless channels. The performance of these systems depends on the so-called channel matrix Hn, whose entries images represent the gains between transmit antenna j and receive antenna i. The matrix Hn is often modeled as a realization of a random matrix. In certain contexts, the Gram matrix images is unitarily equivalent to a matrix (Yn + An)(Yn + An)*, where An is a possibly full-rank deterministic matrix. As an application, we derive a deterministic equivalent to the mutual information:

images

where σ2 is a known parameter.

Let us consider the extension of the above work. Consider

images

where (σij(n), 1 ≤ i ≤ N, 1 ≤ j ≤ n) is a uniformly bounded sequence of real numbers, and the random variables images are complex, centered, i.i.d. with unit variance and finite 8th moment.

We are interested in the fluctuations of the random variable

images

where Yn* is the Hermitian adjoint of Yn and ρ > 0 is an additional parameter. It is proved [342] that, when centered and properly scaled, this random variable satisfies a Central Limit Theorem (CLT) and has a Gaussian limit whose parameters are identified. Understanding its fluctuations, and in particular being able to approximate its standard deviation, is of major interest for various applications, such as the computation of the so-called outage probability.

Consider the following linear statistics of the eigenvalues

images

where λi is the eigenvalue of matrix images. This functional is of course the mutual information for the MIMO channel. The purpose of [342] is to establish a CLT for In(ρ) whenever images.
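A simple Monte Carlo sketch of this CLT is given below, taking In(ρ) = log det(IN + YnYn*/ρ) with a constant unit variance profile (our simplifying assumptions); the un-normalized statistic has a mean that grows with N but O(1), approximately Gaussian, fluctuations.

```python
import numpy as np

def mutual_info(N=64, n=128, rho=0.5):
    """I_n(rho) = log det(I + Y Y^* / rho) for Y with i.i.d. CN(0, 1/n) entries."""
    Y = (np.random.randn(N, n) + 1j * np.random.randn(N, n)) / np.sqrt(2 * n)
    lam = np.linalg.eigvalsh(Y @ Y.conj().T).real
    return np.sum(np.log(1.0 + lam / rho))

samples = np.array([mutual_info() for _ in range(2000)])
z = (samples - samples.mean()) / samples.std()
print(samples.mean(), samples.std())                # mean grows with N; standard deviation stays O(1)
print(np.mean(z ** 3), np.mean(z ** 4) - 3.0)       # skewness and excess kurtosis, both near 0 if Gaussian
```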

There exists a sequence of deterministic probability measures πn such that the mathematical expectation images satisfies

images

We study the fluctuations of

images

and prove that this quantity properly rescaled converges toward a Gaussian random variable. In order to prove the CLT, we study the quantity

images

from which the fluctuations arise and the quantity

images

which yields a bias.

The variance Θ2 of images takes a remarkably simple closed-form expression. In fact, there exists an n × n deterministic matrix An whose entries depend on the variance profile σij such that the variance takes the form:

images

where images is the fourth cumulant of the complex variable X11 and the CLT is expressed as:

images

The bias can also be modeled. There exists a deterministic quantity Bn such that:

images

In [343], the authors study the fluctuations of the random variable:

images

where

images

as the dimensions of the matrices go to infinity at the same pace. Matrices Xn and An are, respectively, random and deterministic N × n matrices; matrices Dn and images are deterministic and diagonal. Matrix Xn has centered, i.i.d. entries with unit variance, either real or complex. The authors study the fluctuations associated with noncentered large random matrices. Their contribution is to establish the CLT regardless of specific assumptions on the real or complex nature of the underlying random variables. In particular, it is not assumed that the random variables are Gaussian, nor that the second moment images of a complex entry Xij is zero, nor that the random variables are circular.

The mutual information In has a strong relationship with the Stieltjes transform

images

of the spectral measure of images:

images

Accordingly, the study of the fluctuations of In is also an important step toward the study of general linear statistics of the eigenvalues of images which can be expressed via the Stieltjes transform:

images

5.5.15 Local Failure Detection and Diagnosis

The joint fluctuations of the extreme eigenvalues and eigenvectors of a large dimensional sample covariance matrix are studied in [345], when the associated population covariance matrix is a finite-rank perturbation of the identity matrix, corresponding to the so-called spiked model in random matrix theory. The asymptotic fluctuations, as the matrix size grows large, are shown to be intimately linked with matrices from the Gaussian unitary ensemble (GUE). When the spiked population eigenvalues have unit multiplicity, the fluctuations follow a central limit theorem. This result is used to develop an original framework for the detection and diagnosis of local failures in large sensor networks, with known or unknown failure magnitude. This approach is relevant to the Cognitive Radio Network and the Smart Grid. The goal is to perform fast and computationally reasonable detection and localization of multiple failures in large sensor networks through this general hypothesis-testing framework. Practical simulations suggest that the proposed algorithms allow for high failure detection and localization performance even for networks of small size, although in general many more observations than theoretically predicted are required.

5.6 Regularized Estimation of Large Covariance Matrices

Estimation of population covariance matrices from samples of multivariate data has always been important for a number of reasons [344, 346, 347]. Principal among these are:

1. estimation of principal components and eigenvalues in order to get an interpretable low-dimensional data representation (principal component analysis, or PCA);
2. construction of linear discriminant functions for classification of Gaussian data (linear discriminant analysis, or LDA);
3. establishing independence and conditional independence relations between components using exploratory data analysis and testing;
4. setting confidence intervals on linear functions of the means of the components.

(1) requires estimation of the eigenstructure of the covariance matrix while (2) and (3) require estimation of the inverse. In signal processing and wireless communication, the covariance matrix is always the starting point.

Exact expressions were cumbersome, and multivariate data were rarely Gaussian. The remedy was asymptotic theory for large samples and fixed, relatively small dimensions. Recently, due to the rising vision of "big data" [1], datasets that do not fit into this framework have become very common—the data are very high-dimensional and sample sizes can be very small relative to the dimension.

It is well known by now that the empirical covariance matrix for samples of size n from a p-variate Gaussian distribution, images, is not a good estimator of the population covariance if p is large. Johnstone and his students [9, 19, 22, 312–318, 325, 327] are relevant here.

The empirical covariance matrix for samples of size n from a p-variate Gaussian distribution has unexpected features if both p and n are large. If p/n → c ∈ (0, 1) and the covariance matrix is images (the identity), then the empirical distribution of the eigenvalues of the sample covariance matrix images follows the Marchenko-Pastur law [348], which is supported on

images

Thus, the larger p/n (thus c), the more spread out the eigenvalues.

Two broad classes of covariance estimators [347] have emerged: (1) those that rely on a natural ordering among variables, and assume that variables far apart in the ordering are only weakly correlated, and (2) those invariant to variable permutations. However, there are many applications for which there is no notion of distance between variables at all.

Implicitly, some approaches, for example [312], postulate different notions of sparsity. Thresholding of the sample covariance matrix has been proposed in [347] as a simple and permutation-invariant method of covariance regularization. A class of regularized estimators of (large) empirical covariance matrices corresponding to stationary (but not necessarily Gaussian) sequences is obtained by banding [344].

We follow [346] for notation, motivation, and background.

We observe X1, …, Xn, i.i.d. p-variate random variables with mean 0 and covariance matrix images, and write

images

For now, we assume that the Xi are multivariate normal. We want to study the behavior of estimates of images as both p and n → ∞. It is well known that the ML estimator of images, the sample covariance matrix,

images

behaves optimally if p is fixed, converging to images at rate n−1/2. If p → ∞, images can behave very badly, unless it is “regularized” in some fashion.

5.6.1 Regularized Covariance Estimates

5.6.1.1 Banding the Sample Covariance Matrix

For any matrix A = [aij]p × p, and any 0 ≤ kp, define

images

and estimate the covariance by images. This kind of regularization is ideal in the situation where the indexes have been arranged in such a way that in images we have

images

This assumption holds, for example, if images is the covariance matrix of Y1, …, Yp, where Y1, …, Yp is a finite inhomogeneous moving average (MA) process,

images

and the xj are i.i.d. with mean 0. Banding an arbitrary covariance matrix does not guarantee positive definiteness.
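A minimal sketch of the banding operator and its effect is given below (the MA(2) coefficients, dimensions, and helper name band are ours). Since the true covariance of a finite moving average is exactly banded, banding the sample covariance at the right bandwidth typically reduces the operator-norm error substantially when p is comparable to or larger than n.

```python
import numpy as np

def band(A, k):
    """Banding operator B_k(A): keep entries with |i - j| <= k, zero out the rest."""
    i, j = np.indices(A.shape)
    return np.where(np.abs(i - j) <= k, A, 0.0)

# MA(2) vector Y = (Y_1, ..., Y_p), Y_t = x_t + 0.6 x_{t-1} + 0.2 x_{t-2}: its covariance is exactly 2-banded.
theta = np.array([1.0, 0.6, 0.2])
p, n, k = 100, 50, 2
acov = np.array([np.sum(theta[: len(theta) - m] * theta[m:]) for m in range(len(theta))])
i, j = np.indices((p, p))
Sigma = np.where(np.abs(i - j) < 3, acov[np.minimum(np.abs(i - j), 2)], 0.0)      # true covariance

eps = np.random.randn(n, p + 2)
X = theta[0] * eps[:, 2:] + theta[1] * eps[:, 1:-1] + theta[2] * eps[:, :-2]      # n samples, p-dimensional
S = X.T @ X / n                                                                   # sample covariance
print(np.linalg.norm(S - Sigma, 2))           # operator-norm error of the raw sample covariance
print(np.linalg.norm(band(S, k) - Sigma, 2))  # error after banding: typically much smaller when p > n
```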

All our sets will be subsets of the class of so-called well-conditioned covariance matrices, images, such that, for all p,

images

Here, images and images are the maximum and minimum eigenvalues of images, and images is independent of p.

Examples of such matrices [349] include

images

where Xi is a stationary ergodic process, and Wi is a noise process independent of images. This model also includes the “spiked model” of Paul [27], since a matrix of bounded rank is Hilbert-Schmidt. We discuss this model in detail elsewhere.

We define the first class of positive definite symmetric well-conditioned matrices images as follows:

(5.86)

The class images in (5.86) contains the Toeplitz class images defined by

images

where fΣ(m) denotes the mth derivative of fΣ. By [350], if images is symmetric and Toeplitz, images, with σ(− k) = σ(k), and images has an absolutely continuous spectral distribution with Radon-Nikodym derivative fΣ(t), which is continuous on (− 1, 1), then

images

A second uniformity class of nonstationary covariance matrices is defined by

images

The bound, which is independent of dimension, identifies any limit as being of "trace class" as an operator for m > 1.

The main work is summarized in the following theorem.


Theorem 5.50 (Bickel and Levina (2008) [346])
Suppose that X is Gaussian and images is the class of covariance matrices defined in (5.86). Then, if kn ≈ (n−1 log p)−1/(2(α+1)),

(5.87)

uniformly on images.

5.6.2 Banding the Inverse

Suppose we have

images

defined on a probability space, with probability measure images, which is images, images. Let

(5.88)

be the images projection of images on the linear span of X1, …Xj−1, with Zj = (X1, …, Xj−1)T the vector of coordinates up to j − 1, and aj = (aj1, …, aj, j−1)T the vector of coefficients. If j = 1, let images. Each vector aj can be computed as

(5.89)

Let the lower triangular matrix A with zeros on the diagonal contain the coefficients aj arranged in rows. Let images, images and let images be a diagonal matrix. The geometry of images or standard regression theory implies independence of the residuals. After applying the covariance operator to the identity

images

we obtain the modified Cholesky decomposition of images and images:

(5.90)

Suppose now that k < p. It is natural to define an approximation to images by restricting the variables in regression (5.88) to

images

In other words, in (5.88), we regress each Xj on its closest k predecessors only. Let Ak be the k-banded lower triangular matrix containing the new vectors of coefficients images, and let images be the diagonal matrix containing the corresponding residual variance. Population k-banded approximations images and images are obtained by plugging in Ak and Dk in (5.90) for A and D.
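The following sketch (our own helper and toy AR(1) data; residual variances are estimated by plain least squares) carries out the k-banded regression construction just described and verifies that the resulting estimate of the inverse covariance is itself banded.

```python
import numpy as np

def banded_cholesky_precision(X, k):
    """k-banded modified Cholesky estimate of the inverse covariance.

    Each variable X_j is regressed on its k closest predecessors; the regression
    coefficients fill a banded lower-triangular A and the residual variances fill D,
    giving  Sigma^{-1} approx (I - A)^T D^{-1} (I - A).
    """
    n, p = X.shape
    A = np.zeros((p, p))
    d = np.zeros(p)
    d[0] = np.var(X[:, 0])
    for j in range(1, p):
        lo = max(0, j - k)
        Z = X[:, lo:j]                                   # the k closest predecessors
        a, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)  # least-squares regression coefficients
        A[j, lo:j] = a
        d[j] = np.var(X[:, j] - Z @ a)                   # residual variance
    T = np.eye(p) - A
    return T.T @ np.diag(1.0 / d) @ T

# Toy example: AR(1)-type data, whose population precision matrix is tridiagonal.
n, p, k = 200, 50, 1
X = np.zeros((n, p))
X[:, 0] = np.random.randn(n)
for j in range(1, p):
    X[:, j] = 0.5 * X[:, j - 1] + np.random.randn(n)
Omega_hat = banded_cholesky_precision(X, k)
print(np.count_nonzero(np.triu(Omega_hat, 2)))           # entries beyond the band are exactly zero
```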

If

images

with images lower triangular, images, let

(5.91)


Theorem 5.51 (Bickel and Levina (2008) [346])
Uniformly for images, if kn≈(n−1logp)−1/(2(α+1)), and images,

(5.92)


 


Corollary 5.3 (Bickel and Levina (2008) [346])
For m ≥ 2, uniformly on images, if kn≈(n−1logp)−1/2m,

(5.93)


5.6.3 Covariance Regularization by Thresholding

Bickel and Levina (2008) [347] consider regularizing a covariance matrix of p variables, estimated from n (vector) observations, by hard thresholding. They show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or sub-Gaussian, and (log p)/n → 0, and they obtain explicit rates.

The approach of thresholding of the sample covariance matrix is a simple and permutation-invariant method of covariance regularization. We define the thresholding operator by

images

which we refer to as A thresholded at s. Ts preserves symmetry and is invariant under permutations of variable labels, but does not necessarily preserve positive definiteness. However, if

images

then Ts(A) is necessarily positive definite, since for all vectors v with ||v||2 = 1, we have

images

Here, λmin(A) stands for the minimum eigenvalue of A.
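A short sketch of the thresholding operator Ts and the positive-definiteness remark above is given below; the sparse covariance, the threshold level, and the dimensions are our own illustrative choices.

```python
import numpy as np

def threshold(A, s):
    """Hard-thresholding operator T_s(A): keep entries with |a_ij| >= s, zero the rest."""
    return np.where(np.abs(A) >= s, A, 0.0)

p, n, s = 50, 100, 0.3
Sigma = np.eye(p)
Sigma[0, 1] = Sigma[1, 0] = 0.6                       # sparse true covariance
L = np.linalg.cholesky(Sigma)
X = np.random.randn(n, p) @ L.T                       # n i.i.d. samples with covariance Sigma
S = X.T @ X / n                                       # sample covariance
S_t = threshold(S, s)

print(np.linalg.norm(S - Sigma, 2), np.linalg.norm(S_t - Sigma, 2))  # operator-norm errors
print(np.linalg.eigvalsh(S_t).min())   # stays positive when lambda_min(Sigma) exceeds the thresholding error
```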

images in (5.86) defines the uniformity class of “approximately bandable” covariance matrices. Here, we define the uniformity class of covariance matrices invariant under permutations by

images

If q = 0, we have

images

is a class of sparse matrices. Naturally, there is a class of covariance matrices images that satisfy both banding and thresholding conditions. Define a subset of images by

images

for α > 0.

We consider n i.i.d. p-dimensional observations X1, …, Xn distributed according to a distribution images, with images (without loss of generality), and images. We define the empirical (sample) covariance matrix by

images

where images and write images.


Theorem 5.52 (Bickel and Levina (2008) [347])
Suppose images is Gaussian. Then, uniformly on images, for sufficiently large M′, if

images

and images, then

images

and uniformly on images,

images


This theorem is in parallel with the banding result of Theorem 5.50.

5.6.4 Regularized Sample Covariance Matrices

Let us follow [344] to state a certain central limit theorem for regularized sample covariance matrices. We just treated how to band the covariance matrix images; here we consider how to band the sample covariance matrix images. We consider regularization by banding, that is, by replacing with 0 those entries of XTX that are at a distance exceeding b = b(p) from the diagonal. Let Y = Y(p) denote the regularized empirical matrix thus obtained.

Let X1, …, Xk be real random variables on a common probability space with moments of all orders, in which the characteristic function

images

is an infinitely differentiable function of the real variables t1, …, tk. One defines the joint cumulant C(X1, …, Xk) by the formula

(5.94)

(The middle expression is a convenient abbreviated notation.) The quantity C(X1, …, Xk) depends symmetrically and images-multilinearly on X1, …, Xk. Moreover, the dependence is continuous with respect to the images-norm. One has in particular

images


Lemma 5.2
If there exists 0 < l < k such that the σ-fields images and images are independent, then C(X1, …, Xk) = 0.

 


Lemma 5.3
The random vector (X1, …, Xk) has a Gaussian joint distribution if and only if images for every integer r ≥ 3 and every sequence i1, …, ir ∈ {1, …, k}.

Let

images

be a stationary sequence of real random variables, satisfying the following conditions:

1. Assumption 5.5. As p → ∞, we have b → ∞, n → ∞ and b/n → 0, with bp.
2. Assumption 5.6.

(5.95)

(5.96)

(5.97)

Let us turn to random matrices. Let

images

be an i.i.d. family of copies of images. Let X = X(p) be the n × p random matrix with entries

images

Let B = B(p) be the p × p deterministic matrix with entries

images

Let Y = Y(p) be the p × p random symmetric matrix with entries

(5.98)

and eigenvalues images.

For integers j, let

images

For integers m > 0 and all integers i and j, we write

(5.99)

Here, the convolution images is defined for any two summable functions images :

images

Now we are in a position to state a central limit theorem.


Theorem 5.53 (Anderson and Zeitouni (2008) [344])
Let Assumption 5.5 and Assumption 5.6 hold. Let Y = Y(p) be as in (5.98). Let Qij and Ri(m) be as in (5.99). Then the process

images

converges in distribution, as p → ∞, to a zero mean Gaussian process images, with covariance specified by

images


 


Example 5.9 (Some stationary sequences satisfying Assumption 5.6 [344])
Fix a summable function images and an i.i.d. sequence images of mean zero random variables with moments of all orders. Now convolve: put

images

It is obvious that (5.95) and (5.96) hold. To see the summability condition (5.97) on joint cumulants, assume at first that h has finite support. Then, by standard properties of joint cumulants (Lemma 5.2), we get the formula

images

By a straightforward calculation, this leads to the analogous formula without the assumption of finite support of h, whence in turn verification of (5.97).

5.6.5 Optimal Rates of Convergence for Covariance Matrix Estimation

Despite recent progress on covariance matrix estimation, there has been remarkably little fundamental theoretical study of optimal estimation. Cai, Zhang and Zhou (2010) [351] establish the optimal rates for estimating the covariance matrix under both the operator norm and the Frobenius norm. The optimal procedures under the two norms are different, and consequently matrix estimation under the operator norm is fundamentally different from vector estimation. The minimax upper bound is obtained by constructing a special class of tapering estimators and by studying their risk properties. The banding estimator treated previously in Section 5.6.1 is suboptimal, and its performance can be significantly improved using the technique to be covered now.

We write anbn if there are positive constants c and C independent of n such that can/bnC. For matrix A, its operator norm is defined as images. We assume that p ≤ exp(γn) for some constant γ > 0.

(5.100)

where images is the maximum eigenvalue of the matrix images, and α > 0, M > 0 and M0 > 0.


Theorem 5.54 (Minimax risk by Cai, Zhang and Zhou (2010) [351])
The minimax risk of estimating the covariance matrix images over the class images satisfies

(5.101)


The proposed procedure does not attempt to treat each row/column optimally as a vector; it does not optimally trade off bias and variance for each row/column. The proposed estimator has good numerical performance; it nearly uniformly outperforms the banding estimator.


Example 5.10 (Tapering estimator [351])
For a given even integer k with 1 ≤ k ≤ p, we define a tapering estimator

(5.102)

where σij* are the entries in the ML estimator images and the weights

images

where kh = k/2. Without loss of generality, we assume that k is even. The weights wij can be rewritten as

(5.103)

The tapering estimators are different from the banding estimators used in [346]. See also Section 5.6.1.
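The explicit weights in (5.103) are not reproduced above; the commonly cited form is wij = 1 for |i − j| ≤ kh, wij = 2 − |i − j|/kh for kh < |i − j| < k, and wij = 0 otherwise, with kh = k/2. Taking that form as an assumption, the sketch below applies the tapering estimator to a covariance with geometrically decaying entries and compares its operator-norm error with that of the raw sample covariance.

```python
import numpy as np

def tapering_weights(p, k):
    """Tapering weights: flat up to k/2, linear decay to 0 at k (one commonly cited form)."""
    kh = k / 2.0
    d = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    return np.clip(2.0 - d / kh, 0.0, 1.0)        # 1 for d <= kh, linear for kh < d < k, 0 beyond

def taper(S, k):
    return tapering_weights(S.shape[0], k) * S    # entrywise (Hadamard) weighting of the MLE

p, n, k = 100, 80, 10
Sigma = np.fromfunction(lambda i, j: 0.6 ** np.abs(i - j), (p, p))   # geometrically decaying truth
X = np.random.randn(n, p) @ np.linalg.cholesky(Sigma).T
S = X.T @ X / n                                   # sample covariance (ML estimator up to centering)
print(np.linalg.norm(S - Sigma, 2), np.linalg.norm(taper(S, k) - Sigma, 2))
```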

 


Lemma 5.4
The tapering estimator images given in (5.102) can be expressed as

(5.104)


Assume that the distribution of the X1's is sub-Gaussian in the sense that there is ρ > 0 such that

(5.105)

Let images denote the set of distributions of X1 that satisfy (5.100) and (5.105).


Theorem 5.55 (Upper bound by Cai, Zhang and Zhou (2010) [351])
The tapering estimator images, defined in (5.104), of the covariance matrix images with p > n1/(2α+1) satisfies

(5.106)

for k = o(n), logp = o(n) and some constant C > 0. In particular, the estimator images with k = n1/(2α+1) satisfies

(5.107)


From (5.106), it is clear that the optimal choice of k is of order n1/(2α+1). The upper bound given in (5.107) is thus rate optimal among the class of tapering estimators defined in (5.104). The minimax lower bound derived in Theorem 5.56 shows that the estimator images with k = n1/(2α+1) is in fact rate optimal among all estimators.


Theorem 5.56 (Lower bound by Cai, Zhang and Zhou (2010) [351])
Suppose p ≤ exp(γn) for some constant γ > 0. The minimax risk for estimating the covariance matrix images over images under the operator norm satisfies

images


Theorem 5.55 and Theorem 5.56 together show that the minimax risk for estimating the covariance matrices over the distribution space images satisfies, for p > n1/(2α+1),

(5.108)

The results also show that the tapering estimator images with tapering parameter k = n1/(2α+1) attains the optimal rate of convergence images.

It is interesting to compare the tapering estimator with the banding estimator of [346]. A banding estimator with bandwidth images was proposed and the rate of convergence of images was proven.

Both the tapering estimator and the banding estimator are not necessarily positive semidefinite. A practical proposal to avoid this would be to project the estimator images onto the space of positive semidefinite matrices under the operator norm. One may first diagonalize images and then replace the negative eigenvalues by zeros. The resulting estimator will then be positive semidefinite.

In addition to the operator norm, the Frobenius norm is another commonly used matrix norm. The Frobenius norm of a matrix A is defined as the l2 vector norm of all entries in the matrix

images

This is equivalent to treating the matrix A as a vector of length p2. It is easy to see that the operator norm is bounded by the Frobenius norm, that is, ||A|| ≤ ||A||F.

Consider estimating the covariance matrix images from the sample images. We have considered the parameter space images defined in (5.100). Other similar parameter spaces can also be considered. For example, in time series analysis it is often assumed that the covariance |σij| decays at the rate |i − j|−(α−1) for some α > 0. Consider the collection of positive-definite symmetric matrices satisfying the following conditions

(5.109)

where images is the maximum eigenvalue of the matrix images. images is a subset of images as long as M1 ≤ αM.

Let images denote the set of distributions of X1 that satisfy (5.105) and (5.109).


Theorem 5.57 (Minimax risk under Frobenius norm by Cai, Zhang and Zhou (2010) [351])
The minimax risk under Frobenius norm satisfies

(5.110)


The inverse of the covariance matrix images is of significant interest. For this purpose, we require the minimum eigenvalue of images to be bounded away from zero. For δ > 0, we define

(5.111)

Let images denote the set of distributions of X1 that satisfy (5.100), (5.105), and (5.111), and similarly let images denote the distributions that satisfy (5.105), (5.109), and (5.111).


Theorem 5.58 (Minimax risk of estimating the inverse covariance matrix by Cai, Zhang and Zhou (2010) [351])
The minimax risk under Frobenius norm satisfies

(5.112)

where images denotes either images or images.

5.6.6 Banding Sample Autocovariance Matrices of Stationary Processes

Nonstationary covariance estimators obtained by banding a sample covariance matrix or its Cholesky factor were considered in [352] and [346] in the context of longitudinal and multivariate data. Estimation of covariance matrices of stationary processes was considered in [353]. Under a short-range dependence condition for a wide class of nonlinear processes, it is shown that the banded covariance matrix estimates converge, in operator norm, to the true covariance matrix with explicit rates of convergence. Their consistency was established under some regularity conditions when

images

where n and p are the number of subjects and variables, respectively. Many good references are included in [353].

Given a realization of X1, …, Xn of a mean-zero stationary process {Xt}, its autocovariance function σk = cov(X0, Xk) can be estimated by

(5.113)

It is known that for fixed images, under ergodicity condition, images in probability. Entry-wise convergence, however, does not automatically imply that images is a good estimator of images. Indeed, although positive definite, images is not uniformly close to the population (true) covariance matrix images, in the sense that the largest eigenvalue or the operator norm of images does not converge to zero. Such uniform convergence is important when studying the rate of convergence of the finite predictor coefficients and performance of various classification methods in time series.

Not necessarily positive definite, the covariance matrix estimator is of the form

(5.114)

where l ≥ 0 is an integer. It is a truncated version of images preserving the diagonal and the 2l main subdiagonals; if l ≥ n − 1, then images. Following [346], images is called the banded covariance matrix estimate and l its band parameter.
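A small sketch of the banded estimate is given below, assuming the standard form images = (1/n) Σ XiXi+k of the sample autocovariance (5.113) and an AR(1) toy process (our choices). It also computes the unbanded estimate, whose operator-norm error does not shrink, in line with Theorem 5.59.

```python
import numpy as np

def banded_autocov_estimate(x, l):
    """Banded Toeplitz estimate: sample autocovariances sigma_hat_k kept up to lag l."""
    n = len(x)
    acov = np.array([np.sum(x[: n - k] * x[k:]) / n for k in range(l + 1)])
    i, j = np.indices((n, n))
    d = np.abs(i - j)
    return np.where(d <= l, acov[np.minimum(d, l)], 0.0)

# AR(1) toy process X_t = 0.5 X_{t-1} + e_t, whose true autocovariance is 0.5^|k| / 0.75.
n, l = 500, 20
e = np.random.randn(n + 200)
x = np.zeros(n + 200)
for t in range(1, n + 200):
    x[t] = 0.5 * x[t - 1] + e[t]
x = x[200:]                                            # drop burn-in
i, j = np.indices((n, n))
Sigma = 0.5 ** np.abs(i - j) / 0.75                    # true covariance matrix

err_full = np.linalg.norm(banded_autocov_estimate(x, n - 1) - Sigma, 2)   # all lags kept
err_band = np.linalg.norm(banded_autocov_estimate(x, l) - Sigma, 2)       # banded at lag l
print(err_full, err_band)                              # the banded estimate is typically much closer
```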

Hannan and Deistler (1988) [354] have considered certain linear ARMA processes and obtained the uniform bound

images

Here, we consider the comparable results for nonlinear processes, mainly following the notation and results of [353].

Let images be independent and identically distributed (i.i.d.) random variables. Assume that images is a causal process of the form

5.115 5.83

where g is a measurable function such that Xi is well-defined and images. Many stationary processes fall within the framework of (5.115).

To introduce the dependence structure, let images be an independent copy of images and ξi = (···, εi−1, εi). Following [355], for i ≥ 0, let

images

For α > 0, define the physical dependence measure

5.116 5.84

Here, for a random variable Z, we write images, if

images

and write || · || = || · ||2. Observe that images is a coupled version of Xi = g(ξi) with ε0 in the latter replaced by an i.i.d. copy images. The quantity δp(i) measures the dependence of Xi on ε0. We say that images is short-range dependent with moment α if

5.117 5.85

That is, the cumulative impact of ε0 on future values of the process, or images, is finite, thus implying short-range dependence.


Example 5.11 ([353])
Let

images

where ai are real coefficients with images, εi are i.i.d. with images, and g is a Lipschitz continuous function. Then, images is a well-defined random variable and images. Hence we have (5.117).
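The physical dependence measure of this example can be checked by simulation. The sketch below assumes g is the identity, so that Xi = Σj aj εi−j and δ2(i) reduces to |ai| ||ε0 − ε0'||2; the coefficients aj = 0.5^j are illustrative.

import numpy as np

def delta_2(a, i, n_mc=200_000, rng=None):
    """Monte Carlo estimate of delta_2(i) = || X_i - X_i* ||_2 for the linear
    process X_i = sum_j a_j eps_{i-j}, where X_i* replaces eps_0 by an
    independent copy.  Only the eps_0 coordinate differs between the two,
    so X_i - X_i* = a_i * (eps_0 - eps_0')."""
    rng = np.random.default_rng(0) if rng is None else rng
    eps0 = rng.standard_normal(n_mc)        # eps_0
    eps0_star = rng.standard_normal(n_mc)   # independent copy eps_0'
    diff = a[i] * (eps0 - eps0_star)
    return np.sqrt(np.mean(diff**2))

a = np.array([0.5**j for j in range(20)])   # a_j = 0.5^j, absolutely summable
print(delta_2(a, 3))                        # ~ |a_3| * sqrt(2) = 0.177
print(abs(a[3]) * np.sqrt(2))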

 


Example 5.12 ([353])
Let εi be i.i.d. random variables and set

images

where g is a bivariate function. Many nonlinear time series models follow this framework.

Let ρ2(A) be the largest eigenvalue of ATA; the n × n matrix A then has operator norm ρ(A).

We define the projection operator images as

images


Theorem 5.59 (No convergence in probability [353])
Assume that the process Xi in (5.115) satisfies

images

If images, then images does not converge to zero in probability.

 


Theorem 5.60 (Convergence in probability [353])
Let 2 < α ≤ 4 and q = α/2. Assume (5.117) and 0 ≤ l < n − 1. Then,

5.118 5.129

where cα > 0 is a constant depending only on α.

5.7 Free Probability

In quantum detection, tensor products are needed. For a large number of random matrices, tensor products are too computationally expensive for our problem at hand. Free probability is a highly noncommutative probability theory, with independence based on free products instead of tensor products [356]. Basic examples include the asymptotic behavior of large Gaussian random matrices. Freeness, with its beauty and fruitfulness, is the central concept [357].

Independent symmetric Gaussian random matrices (which are noncommutative matrix-valued random variables) are asymptotically free. See Appendix A.5 for details on noncommutative matrix-valued random variables; random matrices are a special case.

In this subsection, we take the liberty of drawing material from [12, 13]. Here we are motivated by spectrum sensing and (possibly) other applications in cognitive radio networks. Free probability is a mathematical theory that studies noncommutative random variables. “Freeness” is the analogue of the classical notion of independence, and it is connected with free products. This theory was initiated by Dan Voiculescu around 1986, who made the statement [16]:

images

His first motivation was to study the von Neumann algebras of free groups. One of Voiculescu's central observations was that such groups can be equipped with tracial states (also called states), which resemble expectations in classical probability.

What is the spectrum of the sum A + B [358]? For deterministic matrices A and B one cannot in general determine the eigenvalues of A + B from those of A and B alone, as they depend on the eigenvectors of A and B as well. However, it turns out that for large random matrices A and B satisfying a property called freeness, the limiting spectrum of the sum A + B can indeed be determined from the individual spectra of A and B. This is a central result in free probability theory.

Define the functional φ as

images

φ stands for the normalized expected trace of a random matrix.

The matrices A1, …, Am are called free if

images

whenever

  • p1, …, pk are polynomials in one variable;
  • i1 ≠ i2, i2 ≠ i3, …, ik−1 ≠ ik (only neighboring indices are required to be distinct);
  • images for all j = 1, …, k.

For independent random variables, the joint distribution can be specified completely by the marginal distributions [359]. For free random variables, the same result can be proven directly from the definition. In particular, if X and Y are free, then the moments φ[(X + Y)n] can be completely specified by the moments of X and the moments of Y. The distribution of X + Y is naturally called the free convolution of the two marginal distributions. Classical convolution can be computed via transforms: the log moment generating function of the distribution of X + Y is the sum of the log moment generating functions of the individual distributions of X and Y. In contrast, for free convolution, the appropriate transform is called the R-transform. This is defined via the Stieltjes transform given by (5.34).
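Since the R-transform of a free sum is the sum of the R-transforms (Theorem 5.61 below), the free convolution of two semicircle laws is again a semicircle law with the variances added. The following minimal numerical sketch (matrix size illustrative) checks this for two independent Wigner matrices:

import numpy as np

def wigner(n, rng):
    """Symmetric Gaussian (Wigner) matrix normalized so that the spectrum
    converges to the semicircle law on [-2, 2]."""
    g = rng.standard_normal((n, n))
    return (g + g.T) / np.sqrt(2 * n)

rng = np.random.default_rng(1)
n = 2000
A, B = wigner(n, rng), wigner(n, rng)
eig_sum = np.linalg.eigvalsh(A + B)

# free additive convolution of two standard semicircles is a semicircle of
# variance 2, i.e. support approximately [-2*sqrt(2), 2*sqrt(2)]
print(eig_sum.min(), eig_sum.max())        # roughly -2.83, 2.83
print(np.mean(eig_sum**2))                 # roughly 2 (second moment)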

Asymptotic Freeness

To apply the theory of free probability to random matrix theory, we need to extend the definition of freeness to asymptotic freeness, by replacing the state functional φ by ϕ

images

The expected asymptotic pth moment is ϕ(Ap) and ϕ(I) = 1. The definition of asymptotic freeness is analogous to the concept of independent random variables. However, statistical independence does not imply asymptotic freeness.

The Hermitian random matrices A and B are asymptotically free if, for all l and for all polynomials pi(·) and qi(·) with 1 ≤ i ≤ l such that

images

We state the following useful relationships for asymptotically free A and B

images

One approach to characterizing the asymptotic spectrum of a random matrix is to obtain its moments of all orders. The moments of a noncommutative polynomial p(A, B) of two asymptotically free random matrices can be computed from the individual moments of A and B. Thus, if p(A, B), A and B are Hermitian, the asymptotic spectrum of p(A, B) depends only on those of A and B, even if they do not have the same eigenvectors!
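As a numerical illustration, the sketch below checks two such mixed-moment formulas that follow from freeness, φ(AB) → φ(A)φ(B) and φ(ABAB) → φ(A2)φ(B)2 + φ(A)2φ(B2) − φ(A)2φ(B)2. It assumes that two independent sample-covariance (Wishart-type) matrices are asymptotically free, one of the standard instances listed after Example 5.13; the sizes are illustrative.

import numpy as np

rng = np.random.default_rng(2)
n, N = 1000, 2000

def wishart(n, N, rng):
    """Sample-covariance type matrix X X^T / N with i.i.d. unit-variance entries."""
    X = rng.standard_normal((n, N))
    return X @ X.T / N

A, B = wishart(n, N, rng), wishart(n, N, rng)
phi = lambda M: np.trace(M) / n          # normalized trace, finite-N proxy for phi

# for (asymptotically) free A and B: phi(AB) -> phi(A) * phi(B)
print(phi(A @ B), phi(A) * phi(B))       # both approximately 1
# a mixed moment: phi(ABAB) -> phi(A^2)phi(B)^2 + phi(A)^2 phi(B^2) - phi(A)^2 phi(B)^2
lhs = phi(A @ B @ A @ B)
rhs = phi(A @ A) * phi(B)**2 + phi(A)**2 * phi(B @ B) - phi(A)**2 * phi(B)**2
print(lhs, rhs)                          # both approximately 2 for n/N = 1/2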


Example 5.13 (Moments of polynomial matrix function p(A, B) = A + B)
Let us consider the important special case of p(A, B) = A + B. Under images, the sample covariance matrix has the form

images

All higher moments can be computed analogously.

[13] compiles a list of some of the most useful instances of asymptotic freeness that have been shown so far. Let us list some here:

1. Any random matrix and the identity are asymptotically free.
2. Independent Gaussian standard Wigner matrices are asymptotically free.
3. Let X and Y be independent standard Gaussian matrices. Then images and images are asymptotically free.
4. Independent standard Wigner matrices are asymptotically free.

Sum of Asymptotic Free Random Matrices

Free probability is useful mainly due to the following theorem.


Theorem 5.61 (Sum of two asymptotically free random matrices)
If A and B are asymptotically free random matrices, then the R-transform of their sum satisfies

images


In particular, the following translation property is valid

images


Theorem 5.62 (Free probability central limit theorem)
Let A1, A2, … be a sequence of N × N asymptotically free random matrices. Assume that ϕ(Ai) = 0 and ϕ(Ai2) = 1. Further assume that images for all k. Then, as m, N → ∞, the asymptotic spectrum of

images

converges in distribution to the semicircle law, that is, for every k,

images


Let us revisit the problem of the sum of K random matrices in Section 3.6. The K sample covariance matrices are asymptotically free.


Example 5.14 (HHH [13])
Let H be an N × m random matrix whose entries are zero-mean i.i.d. Gaussian random variables with variance images and denote

images

We can represent

5.119 5.130

with si an N-dimensional vector whose entries are zero-mean i.i.d. with variance images. It can be shown that, as N, m → ∞ with images, the asymptotic spectrum of the matrix

images

is the semicircle law.
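A minimal numerical sketch of this example compares the empirical second and fourth moments with those of the semicircle law. The use of real Gaussian entries of variance 1/m and the scaling sqrt(m/N)(HH^T − I), with N/m small, are assumptions consistent with the description above.

import numpy as np

rng = np.random.default_rng(3)
N, m = 200, 20000                              # N/m small, illustrative sizes
H = rng.standard_normal((N, m)) / np.sqrt(m)   # i.i.d. entries of variance 1/m
M = np.sqrt(m / N) * (H @ H.T - np.eye(N))     # sqrt(m/N) (H H^T - I)

eig = np.linalg.eigvalsh(M)
# moments of the standard semicircle law on [-2, 2]: m2 = 1, m4 = 2
print(np.mean(eig**2), np.mean(eig**4))        # approximately 1 and 2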

 


Example 5.15 (Sum of K (random) sample covariance matrices in Section 3.3)
The sample covariance matrices have the form

images

where each Yk has m rows and N columns. A long data record is divided into K segments; each segment can be used to estimate the sample covariance matrix. The sum of K sample covariance matrices is

images

Under images (only Gaussian noise is present), each Sk is of the form (5.119). Thus the sum has the form

images

The sum of K sample covariance matrices makes the asymptotic spectrum closer to the semicircle law, since in practice images at a faster rate.

Products of Asymptotic Free Random Matrices

The S-transform plays an analogous role to the R-transform for products (instead of sums) of asymptotically free matrices.


Theorem 5.63
Let A and B be nonnegative asymptotically free random matrices. The S-transform of their product satisfies

images


The S-transform is the free analog of the Mellin transform in classical probability theory, whereas the R-transform is the free analog of the log-moment generating function in classical probability theory.

There are useful theorems [11] to calculate ϕ[(A + B)n] and ϕ[(AB)n].

Moments of the Sums and Products


Theorem 5.64 ([13])
Consider matrices A1, …, Al whose sizes are such that the product A1 ··· Al is defined. Some of these matrices are allowed to be identical. Omitting repetitions, assume that the matrices are asymptotically free. Let ρ be the partition of images determined by the equivalence relation j ∼ k if ij = ik. For each partition images of images, let

images

There exist universal coefficients images such that

images

where images indicates that images is finer than ρ.

Finding an explicit formula for the coefficients images is a nontrivial combinatorial problem that has been solved by Speicher [360]. From Theorem 5.64, ϕ(A1 ··· Al) is completely determined by the moments of the individual matrices.


Theorem 5.65 ([11])
Let A and B be nonnegative asymptotically free random matrices. Then, the moments of their sum A + B are expressed by the free cumulants of A and B as

images

where the summation is over all noncrossing partitions of 1, …, n, cl(A) denotes the lth free cumulant of A, and |V| denotes the cardinality of V.

Theorem 5.65 is based on the fact that, if A and B are nonnegative asymptotically free random matrices, the free cumulants of the sum satisfy

images


Theorem 5.66 ([11])
Let A and B be nonnegative asymptotically free random matrices. Then, the moments of their sum A + B are expressed as

images

where the summation is over all noncrossing partitions of 1, …, n.

5.7.1 Large Random Matrices and Free Convolution

5.7.1.1 Random Matrices and Free Random Variables

In free probability, large random matrices are an example of “free” random variables. Let AN be an N × N symmetric (or Hermitian) random matrix with real eigenvalues. In this way, the two-dimensional complex problem is converted into a one-dimensional real-valued problem. The probability measure on the set of its eigenvalues

images

(counted with multiplicities) is given by

images

We are interested in the limiting spectral measure μA as N → ∞. This limiting spectral measure is uniquely characterized by its moments, when compactly supported. We refer to A as an element of the “algebra” with probability measure μA and moments above.

For two random matrices AN and BN with limiting probability distribution μA and μB, we would like to compute the limiting probability distribution for AN + BN and ANBN in terms of the moments of μA and μB. As treated above, the appropriate structure of “freeness,” analogous to independence for “classical” random variables, is what we need to impose on AN and BN, in order to compute these distributions. Since A and B do not commute we are dealing with noncommutative algebra. Since all possible products of A and B are allowed, we have the “free” products, that is, all words in A and B are allowed. We have already dealt with how to compute the moments of these products. The connection with random matrices comes in, because a pair of random matrices AN and BN are asymptotically free, that is, in the limit of N → ∞, so long as at least one of AN or BN has what amounts to eigenvectors that are uniformly distributed with Haar measure. This result is stated precisely in [356].

Table 5.3 lists definitions of R-transform and S-transform and their properties.

5.7.1.2 Free Additive Convolution

When AN and BN are asymptotically free, the (limiting) spectral measure μA+B for random matrices of the form

images

is given by the free additive convolution of the probability measures μA and μB and written as [356]

5.120 5.86

An algorithm in terms of the so-called R-transform exists for computing μA+B from μA and μB. See [356] for details and [361] for computational issues.

5.7.1.3 Free Multiplicative Convolution

When AN and BN are asymptotically free, the (limiting) spectral measure μAB for random matrices of the form

images

is given by the free multiplicative convolution of the probability measures μA and μB and written as [356]

5.121 5.87

The algorithm for computing μAB is given in [254, 361–364].

Convolution operators on the noncommutative algebra of large random matrices exist and can be computed efficiently (e.g., in MATLAB code). Symbolic computational tools are now available to perform these nontrivial computations efficiently [361, 362]. These tools enable us to analyze the structure of sample covariance matrices and design algorithms that take advantage of this structure [254].

5.7.1.4 Applications to Rank Estimation and Spectrum Sensing

Since the Wishart matrix so formed in (5.13) has eigenvectors that are uniformly distributed with Haar measure, the matrices R and W(α) are asymptotically free! Thus the limiting probability measure images can be obtained using free multiplicative convolution as

5.122 5.88

where images is the limiting probability measure on the true covariance matrix R and μW is the Marchenko-Pastur density [251], which is defined in (5.3). As given in (5.7), the limiting spectral measure of R is simply

images

The free probability results are exact when N → ∞, but the predictions are very accurate even for N ≈ 8 in rank estimation [254].


Example 5.16 (Rank estimation)
Let HHH in (5.7) have np of its eigenvalues of magnitude ρ and n(1 − p) of its eigenvalues of magnitude 0, where p < 1. This corresponds to H being an n × L matrix with L < n and images, so that L of its singular values are of magnitude images while the eigenvectors of H are unknown or random. Since free multiplicative convolution predicts the spectrum of the sample covariance matrix images so accurately, we can use free multiplicative deconvolution to infer the parameters of the underlying covariance matrix model from just one realization of the sample covariance matrix!
The first three moments of images can be analytically parameterized in terms of the unknown parameters p, ρ and the known parameter c = n/N as

images

Given an n × N observation matrix Yn, we can compute estimates of the first three moments as

images

Since we know c = n/N, we can estimate ρ, p by simply solving the nonlinear system of equations (in the least-squares sense)

images

For the example of n = 200 and p = 0.5, the estimated rank is within one dimension of the true rank of the system, which is np = 100.
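A sketch of this moment-matching procedure is given below. Because the closed-form moments of the example are not reproduced here, the sketch substitutes standard free multiplicative convolution relations, s1 = r1, s2 = r2 + c r1^2, s3 = r3 + 3c r1 r2 + c^2 r1^3 (with rk the moments of the true covariance matrix), and it assumes a hypothetical spiked model in which the true covariance has np eigenvalues equal to 1 + ρ and n(1 − p) eigenvalues equal to 1; both are stand-in assumptions, not the exact formulas of [254].

import numpy as np
from scipy.optimize import least_squares

def predicted_moments(p, rho, c):
    """Asymptotic moments of the sample covariance under the assumed spiked
    model, via free multiplicative convolution with the Marchenko-Pastur law."""
    r = [p * (1 + rho) ** k + (1 - p) for k in (1, 2, 3)]
    return np.array([r[0],
                     r[1] + c * r[0] ** 2,
                     r[2] + 3 * c * r[0] * r[1] + c ** 2 * r[0] ** 3])

# one realization of the sample covariance S = R^(1/2) W W^T R^(1/2) / N
# (real Gaussian W for simplicity; sizes illustrative)
rng = np.random.default_rng(4)
n, N, p_true, rho_true = 200, 400, 0.5, 1.0
c = n / N
d = np.where(rng.permutation(n) < p_true * n, 1.0 + rho_true, 1.0)  # eigenvalues of R
W = rng.standard_normal((n, N))
S = (np.sqrt(d)[:, None] * W) @ (np.sqrt(d)[:, None] * W).T / N

s_hat = np.array([np.trace(np.linalg.matrix_power(S, k)) / n for k in (1, 2, 3)])

# least-squares fit of (p, rho) to the three empirical moments
fit = least_squares(lambda t: predicted_moments(t[0], t[1], c) - s_hat,
                    x0=[0.3, 0.5], bounds=([0.0, 0.0], [1.0, 10.0]))
p_hat, rho_hat = fit.x
print(p_hat, rho_hat, round(p_hat * n))   # estimated rank, close to n * p_true = 100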

 


Example 5.17 (Spectrum sensing)
Consider the standard form of (5.10), which is repeated here for convenience,

5.123 5.131

The true covariance matrices are

5.124 5.132

The conventional approach to finding the power of the received signal plus noise is to use (5.124). In practice, the usual approach is to use large sample covariance matrices through (5.123). Indeed, the sample covariance matrix is connected with the true covariance matrix by the properties of the Wishart distribution through (5.12).
Using (5.122), we can convert the problem of calculating the sample covariance matrix images into the problem of calculating the true covariance matrix R, with the help of the Wishart matrix W(c)! Recall that images is formed from an n × N Gaussian random matrix. Once again, c is defined as the limit n/N → c > 0 as n, N → ∞. Under images, we have the form of (5.7). We can thus calculate the limiting probability measure μimages using (5.12).

5.7.3 Vandermonde Matrices

For notation and some key theorems, we follow [365] closely. Vandermonde matrices play a central role in signal processing, for example in the fast Fourier transform or Hadamard transforms. A Vandermonde matrix with complex entries on the unit circle has the following form

5.125 5.89

where the factor images and the assumption images are included to ensure that the analysis gives a limiting asymptotic behavior in the asymptotic regime

5.126 5.90

We are interested in the case where ω1, …, ωL are independent and identically distributed (i.i.d.), taking values in [0, 2π]. The common distribution of the ωi is called the phase distribution. In this section, V will be used only to denote Vandermonde matrices with a given phase distribution, and the dimensions of the Vandermonde matrices will always be N × L.

[111] has some related results. The overwhelming majority of the known results are concerned with Gaussian matrices or matrices with independent entries. Very few results are available in the literature on matrices whose structure is strongly related to the Vandermonde case.

Often, we are interested in only the moments. It will be shown that asymptotically, the moments of the Vandermonde matrices V depend only on the ratio c and the phase distributions, and have explicit expressions. Moments are useful for performing deconvolution.
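A minimal sketch of computing the empirical Vandermonde moments (1/L) tr((VHV)k) for a uniform phase distribution is given below; it assumes the 1/sqrt(N) normalization and entries of the form e^{-jnω}, consistent with (5.125), and the sizes N = 1000 and L = 500 (so c = 1/2) are illustrative.

import numpy as np

def vandermonde(omega, N):
    """N x L Vandermonde matrix with unit-circle entries as in (5.125):
    V[n, l] = exp(-1j * n * omega_l) / sqrt(N)."""
    n = np.arange(N)[:, None]
    return np.exp(-1j * n * omega[None, :]) / np.sqrt(N)

rng = np.random.default_rng(5)
N, L = 1000, 500                      # aspect ratio c = L/N = 1/2
omega = rng.uniform(0, 2 * np.pi, L)  # uniform phase distribution
V = vandermonde(omega, N)
G = V.conj().T @ V                    # L x L Gram matrix V^H V

# empirical moments (1/L) tr((V^H V)^k), k = 1, 2, 3
Gk = np.eye(L)
for k in range(1, 4):
    Gk = Gk @ G
    print(k, (np.trace(Gk) / L).real)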

The normalized trace is defined as

images

The matrices Dr(N), 1 ≤ r ≤ n, will denote nonrandom diagonal L × L matrices, where we implicitly assume that images.

We say that the images have a joint limit distribution as N → ∞ if the limit

images

exists for all choices of images.

Some concepts from partition theory are needed. We denote by images the set of all partitions of images, and use ρ as notation for a partition in images. We write images, where Wj denotes the blocks of ρ, |ρ| = k denotes the number of blocks in ρ, and |Wj| represents the number of entries in a given block.

For images, with images, we define

images

For images, define

images

where

5.127 5.91

are i.i.d. (indexed by the blocks of ρ), all with the same distribution as ω, and where b(k) is the block of ρ which contains k (the notation is cyclic, that is, b(0) = b(n)). If the limit

images

exists, then we call it a Vandermonde mixed moment expansion coefficient.


Theorem 5.67 ([365])
Assume that the images have a joint limit distribution as N → ∞. Assume also that all Vandermonde mixed moment expansion coefficients Kρ, ω exist. Then, the limit

images

also exists when images, and equals

images


For the case of Vandermonde matrices with uniform phase distribution, the noncrossing partitions play a central role. Let u denote the uniform distribution on [0, 2π].


Theorem 5.68 ([365])
Assume D1(N) = D2(N) = ··· = Dn(N), set images, and define

images

When ω = u, we have that

images


Let us consider generalized Vandermonde matrices defined as

5.128 5.92

where f is called the power distribution, and is a function from [0, 1] to [0, 1]. We also consider the more general case when f is replaced with a random variable λ,

5.129 5.93

with the λi i.i.d. and distributed as λ, defined on and taking values in [0, 1], and also independent of the ωj.

For (5.128) and (5.129), define

images

where images are defined as in (5.127). If the limits

images

exist, then they are called Vandermonde mixed moment expansion coefficients.


Theorem 5.69 ([365])
Theorem 5.67 also holds when the Vandermonde matrices (5.125) are replaced with generalized Vandermonde matrices of either form (5.128) or (5.129), and with Kρ, ω replaced with either Kρ, ω, f or Kρ, ω, λ.

 


Theorem 5.70 ([365])
Assume that the images have a joint limit distribution as N → ∞. Assume also that V1, V2, … are independent Vandermonde matrices with the same phase distribution ω, and that the density of ω is continuous. Then, the limit

images

exists when images. The limit is 0 when n is odd, and equals

5.130 5.133

where

images

is the partition whose two blocks are the even numbers and the odd numbers.

 


Corollary 5.4 ([365])
The first three mixed moments

images

of independent Vandermonde matrices V1, V2 are given by

images

where

images


In particular, when the phase distribution is uniform, the first three moments are given by

images


Theorem 5.71 ([365])
Assume that images are independent Vandermonde matrices, where Vi has continuous phase distribution ωi. Denote by images the density of ωi. Then, (5.130) still holds, with Kρ, ω replaced by

images

where ρi consists of all numbers k such that ik = i.

 


Example 5.18 (Detection of the number of sources [365])
In this example, d is the distance between the antennas, whereas λ is the wavelength. The ratio images is a measure of the resolution with which the system will be able to separate users in space. Let us consider a central node equipped with N receiving antennas and L mobiles (each with a single antenna). The received signal at the central node is given by

5.131 5.134

where
  • yi is the N × 1 received vector,
  • xi is the L × 1 vector transmitted by the L users; we assume images,
  • wi is N × 1 additive, white, Gaussian noise of variance images, and
  • all components in xi and wi are assumed independent.
In the case of line of sight between the users and the central node, for a uniform linear array (ULA), the matrix V has the following form

5.132 5.135

Here, θi is the angle of the user and is supposed to be uniformly distributed over [ − α, α]. P1/2 is an L × L diagonal power matrix due to the different distances from which the users emit. The phase distribution has been assumed to have the form images with θ uniformly distributed on [ − α, α].
By taking the inverse function, the density is, for images,

images

on images, and 0 elsewhere.
By defining

5.133 5.136

(5.131) is rewritten as

images

The sample covariance matrix can be written as

images

If we have only the sample covariance matrix S, in order to get an estimate of P, we have three independent parts to deal with: X, W, V. We can achieve this by combining Gaussian decomposition [366] and Vandermonde deconvolution by the following steps:
1. Estimate the moments of images using multiplicative free convolution [262]. This is the denoising part.
2. Estimate the moments of PVVH, using multiplicative free convolution.
3. Estimate the moments of P using the Vandermonde deconvolution of [365].

 


Proposition 5.9 ([365])
Define

images

and denote the moments of P and S by

images

Then, the equations

images

provide an asymptotically unbiased estimator for the moments Pi from the moments of Si (or vice versa) when

images


 


Example 5.19 (Estimation of the number of paths [365])
Consider a multipath channel

images

Here xi are i.i.d. Gaussian random variables with power Pi and the τi are delays uniformly distributed over [0, T]. The xi represent the attenuation factors due to different physical mechanisms such as reflections, refractions, or diffractions. L is the total number of paths. In the frequency domain, the channel is

images


A generalized multipath model that takes into account per-path pulse distortion [367–373] is relevant in this context. The so-called scattering centers used in the radar community are mathematically modeled by the multipaths used in wireless communications. As a result, this work bridges the gap between the two communities. Deeper research can be pursued using this mathematical analogy between the two different systems. Physically, the two systems are equivalent.

By sampling the continuous frequency signal at sampling rate images

where B is the bandwidth (in Hertz), we have (for a given channel realization)

5.134 5.94

where

images

We set here B = T = 1, which implies that the ωi of (5.125) are uniformly distributed over [0, 2π). When additive noise w is taken into account, our model again becomes that of (5.131): The only difference is that the phase distribution of the Vandermonde matrix now is uniform. L now is the number of paths, N the number of frequency samples, and P is the unknown L × L diagonal power matrix. Taking K observations, we reach the same form as in (5.133). We can do even better than Proposition 5.9. Our estimators for the moments are unbiased for any number of observations K and frequency samples N.


Proposition 5.10 ([365])
Assume that V has uniform phase distribution, and let Pi be the moments of P, and Si = tr(Si) the moments of the sample covariance matrix. Define also

images

Then,

images

The wavelength in (5.132) can also be estimated. See [365] for details.

 


Example 5.20 (Signal reconstruction and estimation of the sampling distribution [365])
Consider the signal y(t) as a superposition of its N frequency components

5.135 5.137

We sample the continuous signal y(t) at time instants t = [t1, …, tL] with ti ∈ [0, 1]. (5.135) can be written equivalently as

images

In the presence of noise, one has

images

with

images

and x and w are defined in (5.131). V is defined as our standard model (5.125). [374] has a similar analysis for such cases.
We define

images

Consider the asymptotic regime

images


 


Proposition 5.11 ([365])

5.136 5.138

where In is defined in Proposition 5.19.

Consider a phase distribution ω which is uniform on [0, α], and 0 elsewhere. The density is thus images on [0, α], and 0 elsewhere. In this case we have

images

The first of these equations, combined with (5.136), enables us to estimate α.

Certain matrices similar to Vandermonde matrices have analytical expressions for their moments. In [375], matrices with entries of the form Ai, j = F(ωi, ωj) are considered. This is relevant to the Vandermonde matrices since

images


Example 5.21 (Vandermonde matrices with unit complex entries [376])
Consider the network with M mobile users talking to a base station with N antenna elements, arranged in a uniform linear array. The antenna array response is a Vandermonde matrix. We refer to [376] for this example.

5.7.4 Convolution and Deconvolution with Vandermonde Matrices

In the large dimensional limit, certain random matrices exhibit a deterministic behavior of the eigenvalue distribution [377]. In particular, one can obtain the eigenvalue distribution of AB and A + B based only on the individual eigenvalue distributions of A and B, when the matrices are independent and large. This operation is called convolution, and the inverse operation is called deconvolution.

Gaussian-like matrices fit into this setting, since the concept of freeness [11] can be used. Large Wishart matrices were used in [364]. Random matrix theory was used in [9], and other deterministic equivalents [17, 281, 298, 378] have been used. Although applied successfully [366], all these techniques can only treat very simple models, that is, models in which one of the considered matrices is unitarily invariant.

The method of moments, which is the focus of this section, is very appealing and powerful when freeness does not apply, a setting for which we still do not have a general framework. It requires combinatorial skills and can be used for a large class of random matrices. Compared with the Stieltjes transform, this approach has the main drawback that it rarely provides the exact eigenvalue distribution. In many applications, however, we only need a subset of the moments. We mainly follow Ryan and Debbah (2011) [377] for our development.

An N × L Vandermonde matrix V is defined in (5.125). We repeat it here for convenience:

5.137 5.95

The ω1, …, ωL, whose common distribution is again called the phase distribution, will be assumed i.i.d., taking values in [0, 2π]. As before, we consider the asymptotic regime defined in (5.126): N and L go to infinity at the same rate, and we write images.

In Section 5.7.2, the limit eigenvalue distributions of combinations of VHV and diagonal matrices D(N) were shown to be dependent on the limit eigenvalue distributions of the two matrices.

Define

5.138 5.96

where V1, V2, … are assumed independent, with phase distributions ω1, ω2, ….

Consider the following four expressions:

1. images
2. images
3. images
4. images.

Theorem 5.72 ([377])
Let Vi be independent Ni × L Vandermonde matrices with aspect ratios images and phase distributions ωi with continuous densities in [0, 2π]. The limit

5.139 5.139

always exists when the Di(N) have a joint limit distribution, whenever the matrix product is well-defined and square. Moreover, (5.138) converges almost surely in distribution to the limit in (5.139). When σ ≥ [0, 1]n (that is, there are no terms of the form VrHVs with Vr and Vs independent and with different phase distributions), (5.139) can be expressed as a formula in the aspect ratios ci, σ, and the individual moments

5.140 5.140


A special case of Theorem 5.72 is considered here. This theorem states in particular that

images

depends only on the moments. This expression characterizes the singular law of a sum of independent Vandermonde matrices. Also, expressions 1 and 3 are found to only rely on the spectra of the component matrices. For convolution expression 1, we have the following corollary.


Corollary 5.5 ([377])
Assume that V has a phase distribution with continuous density, and define

images

where images. Whenever either images or images are known, and images (or images) are known, then images (or images) are uniquely determined.

For expression 3, we have the following corollary.


Corollary 5.6 ([377])
Assume that V1 and V2 are independent Vandermonde matrices where the phase distributions have continuous densities, and set

images

Mn and Nn are completely determined by V2(i), V3(i), … and the aspect ratios

images

Also, whenever either images or images are known, and images are known, then images are uniquely determined.

For expression 4, we have the following corollary.


Corollary 5.7 ([377])
Assume that V1 and V2 are independent Vandermonde matrices with the same continuous density, and set

images

Then, images are uniquely determined from images.

Spectral separability seems to be a large-N phenomenon. We are only aware of Gaussian and deterministic matrices for which spectral separability occurs in the finite case [379]. The moments of Hankel, Markov, and Toeplitz matrices [287] are relevant in this context.

A practical example is studied in [377]:

1. From observations of the form images or images, one can infer either the spectrum of D(N), or the spectrum or phase distribution of V, when exactly one of these is unknown.
2. From observations of the form images or images, one can infer the spectrum or phase distribution of one of the Vandermonde matrices, when the other Vandermonde matrix is known.

The example only estimates the first moments of the component matrix D(N). These moments can give valuable information: in cases where it is known that there are few distinct eigenvalues, and the multiplicities are known, only some lower-order moments are needed in order to estimate these eigenvalues.

5.7.5 Finite Dimensional Statistical Inference

We follow [379] for the development here, converting to our notation. Let X and Y be two N × N independent square Hermitian (or symmetric) random matrices:

1. Can one derive the eigenvalue distribution of X from those of X + Y and Y? If feasible in the large-N limit, this operation is called additive free deconvolution.
2. Can one derive the eigenvalue distribution of X from those of XY and Y? If feasible in the large-N limit, this operation is called multiplicative free deconvolution.

The method of moments [380] and the Stieltjes transform method [381] can be used. The expressions are simple if some kind of asymptotic freeness [11] of the matrices is assumed. Freeness, however, is not valid for finite matrices. Remarkably, the method of moments can still be used for this purpose. A general finite-dimensional statistical inference framework was proposed in [379], and MATLAB code for its implementation is available [382]. The calculations are tedious, and only Gaussian matrices are addressed, but other matrices such as Vandermonde matrices can be handled in the same vein. The general case is more difficult.

Consider the doubly correlated Wishart matrix [383]. Let M, N be positive integers, let W be an M × N standard complex Gaussian matrix, and let D (deterministic, M × M) and E (N × N) be given. For any positive integer p, the following moments

images

exist and can be calculated [379].

The framework of [379] enables us to compute the moments of many types of combinations of independent Gaussian and Wishart random matrices, without any assumptions on the matrix dimensions. Since the method of moments only encodes information about the lower-order moments, it lacks much of the information that is naturally encoded in the Stieltjes transform; spectrum estimation based on the Stieltjes transform is more accurate than estimation based on a few moments. One interesting question is how many moments are typically required in order to reach performance close to that of the Stieltjes transform.


Example 5.22 (MIMO rate estimation [379])
One has K noisy symbol-observations of the channel

images

where D is an M × N deterministic channel matrix, Wi is an M × N standard complex Gaussian matrix representing the noise, and σ is the noise variance. The channel D is assumed to stay constant during the K symbol measurements. The rate estimator is given by

images

where images is the SNR, and λi are the eigenvalues of images.
The problem falls within the framework suggested before. The extra parameter did not appear in any of the main theorems of [379]. An unbiased estimator for the expression of

images

has been derived in [379].
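Since the exact estimator of [379] is not reproduced here, the sketch below only implements a naive plug-in version: it averages log2(1 + λi/σ2) over the eigenvalues of the averaged Gram matrix (1/(KN)) Σi HiHiH. The averaging convention, the use of 1/σ2 as the SNR, and the synthetic observations are all assumptions for illustration; the estimator of [379] is unbiased, which this plug-in generally is not.

import numpy as np

def rate_estimate(H_obs, sigma2):
    # naive plug-in: average log2(1 + lambda_i / sigma2) over the eigenvalues of
    # the averaged Gram matrix (1/(K*N)) * sum_i H_i H_i^H  (an assumption, not
    # the unbiased estimator of [379])
    K = len(H_obs)
    M, N = H_obs[0].shape
    G = sum(Hi @ Hi.conj().T for Hi in H_obs) / (K * N)
    lam = np.linalg.eigvalsh(G)
    return float(np.mean(np.log2(1.0 + lam / sigma2)))

# usage with synthetic observations H_i = D + sigma * W_i (illustrative values)
rng = np.random.default_rng(6)
M, N, K, sigma = 4, 8, 10, 0.1
cgauss = lambda shape: (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
D = cgauss((M, N))
H_obs = [D + sigma * cgauss((M, N)) for _ in range(K)]
print(rate_estimate(H_obs, sigma**2))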

 


Example 5.23 (Understanding network in a finite time [379])
In a cognitive MIMO network, one must learn and control the “black box” (the wireless channel) with vector inputs and vector outputs. Let y be the output vector, and x and w, respectively, the input signal and the noise vector,

5.141 5.141

By defining

images

we have

images

In the Gaussian case, the rate is given by

5.142 5.142

where RY is the covariance of the output signal vector, and RW is the covariance of the noise vector. According to (5.142), one can fully determine the information transfer of the system by knowing only the eigenvalues of RY and RW. Unfortunately, the receiver only has access to a limited number N of observations (samples) of the output vector y, not to the covariance matrix RY. In other words, the system has access only to the sample covariance matrix imagesY, not to the true covariance matrix RY. Here, we define

images

When x and w in (5.141) are both Gaussian vectors, we can write y as

5.143 5.143

where z is an i.i.d. standard Gaussian vector. The problem falls, therefore, in the realm of inference with a correlated Wishart model defined by

images

where

images


 


Example 5.24 (Power estimation [379])
Under the assumption of a large number of observations, the finite-dimensional inference framework was not strictly required in the above two examples. The observations can, instead, be stacked into a large matrix, for which asymptotic results are more suitable. This example illustrates a model where it is unclear how to apply such a stacking strategy, thus making the finite-dimensional results more useful. In many multiuser MIMO applications, one needs to determine the power of each user. Consider the system given by

images

where H, P, si, wi are, respectively, the N × M channel gain matrix, the M × M diagonal power matrix due to the different distances from which the users transmit, the M × 1 vector of signals, and the N × 1 vector representing the noise with variance σ. In particular, H, si, wi are independent standard complex Gaussian matrices and vectors. We suppose that we have K observations of the received signal vector yi, during which the channel gain matrix H stays constant.
Consider the 2 × 2 matrix

images

We can estimate the moments of the matrix P from the moments of the matrix YYH, where Y = [y1, …, yK] is the component observation matrix.
We assume that we have an increasing number of observations K of the matrix Y, and perform an averaging over the estimated moments—we average across a number of block fading channels. From the estimated moments of P, we can then estimate its eigenvalues. When K increases, the prediction is close to the true eigenvalues of P. K = 1, 200 was considered in [379].

 

 

1 Multiple-input, multiple-output (MIMO) has a special meaning in wireless communications.

2 This table is primarily compiled from [278].

3 MIMO has a special meaning in the context of wireless communications. This informal name VIVO captures our perception of the problem. Vector nature is fundamental. Vector space is the fundamental mathematical space for us to optimize the system.
