24. MIMO PLC Hardware Feasibility Study

To prove the concept of multiple-input multiple-output (MIMO) for power line communications (PLC), a feasibility study was carried out. The MIMO PLC system was designed at Sony EuTEC laboratories, and its basic parameters are based on an existing single-input single-output (SISO) PLC system implementation [1]. The design targets two goals: comparing SISO and MIMO, and rapid development. Hence, it is not based on any PLC standard. It is a proprietary system that serves to investigate implementation-specific issues related to current broadband PLC systems as introduced in Chapters 12 through 14. The implemented demonstrator system consists of two modems, a transmitter and a receiver. It does not include a full multi-node media access control (MAC) layer. As an application, the system allows high-definition (HD) video streaming and the monitoring of several system parameters such as measured maximum bitrate and bit error ratio (BER). First, an overview of the implemented system is provided (see Section 24.2). A detailed description of the implemented MIMO blocks is given in Sections 24.3 through 24.6. Section 24.7 discusses possible changes of the implemented system architecture. Finally, the verification of the system (Section 24.8) concludes the chapter.

24.2 System Architecture

Figure 24.1 shows the setup of the demonstrator system. The left PC serves as transmitter and the PC shown on the right acts as the receiver. Four analog front-ends are mounted on each of the two PCs, allowing each PC to act as a transmitter or receiver. Only two of the front-ends are used in transmit mode. Two coaxial cables connect the front-ends of the transmitter to the Tx MIMO coupler (delta-style coupler). The transmit coupler is connected to an artificial MIMO channel (MIMO artificial mains [MAM] network; for details see Section 24.8). The receiver coupler is also connected to the MAM (bottom right-hand side in the figure) and couples the signals of the four receive ports to the four analog front-ends of the receiver. Each of the transmitter and receiver modems consists of a standard personal computer (PC) running Linux as an operating system. Each PC includes a peripheral component interconnect (PCI) board containing two Xilinx Virtex 5 field programmable gate arrays (FPGAs). The FPGAs realise the real-time functions of the demonstrator system, while the PC implements the control functions in software. The functions implemented in software are described in [1]; the key parameters are briefly summarised here. MATLAB® applications running on a third remote PC (not depicted in Figure 24.1) allow the control and monitoring of the PLC modems via hypertext transfer protocol (HTTP). Each embedded MIMO PLC modem computer includes a web server running on the Linux system, which communicates with the PLC drivers. Several applications like video streaming sources generate user datagram protocol (UDP) payload data to be transmitted and received via power line. The PLC driver is an extension of a standard Linux Ethernet driver implementing additional application programming interfaces (APIs) and hardware access functions.

FIGURE 24.1
Hardware setup.

The PLC driver implements both a network driver as an interface to the host PC’s Internet protocol (IP) stack and a device driver as an interface to the FPGA hardware. The remote PC might render the video stream.

The system architecture shown in Figure 24.2 focuses on the MIMO-specific blocks. The top row illustrates the blocks of the transmitter, while the bottom row illustrates the receiver (from right to left). Both the transmitter and receiver chains are implemented on the FPGAs. The MIMO PLC channel connects the two transmit ports of the transmitter to the four receive ports of the receiver. As described earlier, a MATLAB application controls and monitors the PLC driver which, in turn, controls the FPGAs of the transmitter and receiver. Note that although the PLC driver and interface are depicted only once in the figure, they are physically running on each of the PCs, whereas the MATLAB graphical user interface (GUI) is located on the third, remote PC. The input bit stream is first processed by the forward error correction (FEC) block, which inserts redundancy bits (details are given later). From a functional point of view, at this location, the bit stream would be split into the two (logical) MIMO streams. However, the two MIMO streams are multiplexed into one signal path at a higher clock rate. The number of MIMO streams is indicated at the signal paths in Figure 24.2. By choosing this design, only one quadrature amplitude modulation (QAM) mapper and only one orthogonal frequency division multiplexing (OFDM) modulation block are required. The adaptive QAM assigns the input bits to complex QAM symbols according to the constellation of each subcarrier. The PLC driver registers the constellation to be used by each subcarrier. Depending on the QAM constellation, the block decides how many input bits are modulated for each subcarrier. The QAM mapping is realised by a look-up table (LUT) which assigns a complex QAM symbol to each input bit combination. The next unit is the power allocation (PA). Here, three different PA coefficients, 0, 1 and $\sqrt{2}$, are used (refer to the simplified PA algorithm in Chapter 8).
The PA coefficient is derived from the constellations. If one stream is not bit loaded, the other stream is assigned twice the power. The complex symbols of each stream are assigned to a vector which is multiplied by the precoding matrix V depending on the codebook index of each subcarrier (see Section 24.6). The codebook comprises a predefined set of different precoding matrices, and the index defines which precoding matrix is selected from this set. The codebook indices are programmed by the PLC driver. The next block inserts training symbols before the OFDM modulation. Each burst consists of 4 training OFDM symbols and up to 20 data OFDM symbols. The OFDM modulation applies a 2048-point inverse fast Fourier transform (IFFT) and inserts an OFDM guard interval of length 1/16 of the symbol length. The guard interval consists of a cyclic prefix, which is a copy of the tail of the OFDM symbol. The guard interval is needed to prevent intersymbol interference (ISI) that would otherwise be caused by the multipath channel. The block transmitter (Tx) front-end summarises several blocks: A digital quarter period mixer shifts the complex baseband signal to 0–40 MHz to obtain the real-valued signal to be transmitted. A preamble is inserted before each burst. The preamble consists of a constant amplitude with zero auto correlation (CAZAC) sequence and is needed for time synchronisation in the receiver. Finally, the analog front-ends feed the signals to the channel. They consist of two digital-to-analogue (D/A) converters, low-pass filters, line driver amplifiers and couplers.
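The PA rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `pa_coefficients` and its arguments are hypothetical, the coefficients are taken to be amplitude factors (so that $\sqrt{2}$ corresponds to twice the power), and the actual simplified PA algorithm is the one referenced in Chapter 8.

```python
import math

def pa_coefficients(bits_stream1, bits_stream2):
    """Return hypothetical amplitude PA coefficients (c1, c2) for one subcarrier.

    bits_streamN is the number of bits loaded on logical stream N
    (0 means the stream is not bit loaded on this subcarrier).
    """
    loaded1 = bits_stream1 > 0
    loaded2 = bits_stream2 > 0
    if loaded1 and loaded2:
        return 1.0, 1.0                  # both streams active: unit amplitude
    if loaded1:
        return math.sqrt(2.0), 0.0       # stream 2 unused: stream 1 gets twice the power
    if loaded2:
        return 0.0, math.sqrt(2.0)       # stream 1 unused: stream 2 gets twice the power
    return 0.0, 0.0                      # subcarrier carries no data

# Example: 1024-QAM on stream 1, nothing on stream 2
c1, c2 = pa_coefficients(10, 0)
```

In this sketch, the total power $c_1^2 + c_2^2$ stays at 2 whenever at least one stream is loaded.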

Similar blocks follow in reverse order in the receiver. The receiver (Rx) front-end includes four analog front-ends (couplers, band-pass filters, automatic gain controls [AGCs] and analog-to-digital [A/D] converters) and the quarter period mixers. The band-pass filter limits the input signal to the frequency range of interest between 4 and 30 MHz. The AGC includes a programmable gain amplifier (PGA) to control the signal level at the A/D. A quarter period mixer converts the received signal to the complex baseband. A correlation function, applied individually to each reception path, detects the CAZAC sequence of the preamble, which is needed for time synchronisation and to define the individual gain level for the AGC of each Rx path. The OFDM demodulation cuts the cyclic prefix and transforms the signal back to the frequency domain by applying the fast Fourier transform (FFT). A splitter separates the training and data symbols. Besides channel estimation, the training symbols are used to estimate the sample clock offset (not shown in the figure). The clock offset arises from different oscillator frequencies in the transmitter and receiver. The clock offset estimation controls the voltage-controlled crystal oscillator (VCXO) of the A/D and ensures that the clock frequencies of the transmitter and receiver are synchronised. Instead of using an expensive VCXO, digital clock offset compensation might be realised by phase rotating the QAM symbols, depending on the subcarrier index and OFDM symbol index. Based on the received training symbols, the channel and signal-to-noise ratio (SNR) estimation derive the channel matrix of each subcarrier and SNR values of the two logical MIMO streams, respectively (see Section 24.5). The channel matrices serve as input to the pseudoinverse block which calculates the detection matrices. The detection (or equaliser) matrices are applied to the received data symbols by the MIMO detection unit (see Section 24.3). The training symbols are not precoded.
Hence, MIMO detection is unable to take the precoding into account and separate compensation of the precoding is required. Changes in the design if the training symbols are also precoded are discussed in Section 24.7. The QAM demodulation recovers the bit sequence obtained from the received QAM symbols, depending on the constellations programmed by the PLC driver. The FEC corrects bit errors and removes the redundancy bits. Based on the detection matrix, the codebook index of each subcarrier is derived in the codebook search (see Section 24.6). The codebook indices and SNR values are passed to the PLC driver. The PLC driver determines the constellations, according to the SNR values, and programmes the constellations and codebook indices into the transmitter and receiver. It is also possible to monitor several internal signals, for example, the received training symbols or channel estimation results.

FIGURE 24.2
System: Functional overview.

Figure 24.3 shows details of the FEC chain in the transmitter. Basically, the reverse order is used in the receiver. First, a Reed–Solomon (RS) code [2] adds redundancy to the incoming bits to be transmitted. The RS code is a block code, which corrects a maximum number of errors depending on the length of the code. The bit errors can be located anywhere inside the block. The RS code serves as an outer code to correct remaining errors which could not be corrected by the inner code. The time interleaver rearranges the order of the bits to spread burst errors into separated and isolated bit errors, which are then corrected by the RS code. Burst errors may occur in the case of impulsive noise. A convolutional encoder is used as an inner code, which adds further redundancy to the input bits. A Viterbi decoder [2] is applied in the receiver. The frequency interleaver changes the order of the bits to ensure that adjacent bits are not mapped to adjacent subcarriers. Frequency-selective noise or deep fades of the transfer function may only affect a few subcarriers, resulting in burst errors if no frequency interleaver is used. More advanced codes like turbo codes or low-density parity check (LDPC) codes may be used to improve the performance further. However, as already discussed, this demonstrator implementation focuses on the MIMO-specific blocks.

FIGURE 24.3
FEC in the transmitter.

TABLE 24.1

Basic Physical Layer Parameters

MIMO parameters
MIMO setup	Up to 2 Tx and up to 4 Rx ports
MIMO modes	Two-stream or one-stream beamforming (based on 7 bit codebook for matrix quantisation), spatial multiplexing (beamforming switched off), SISO
OFDM parameters
Sampling frequency (MHz)	80
Frequency band (MHz)	0–40, active subcarriers: 4–30
FFT points	2048
Number of active subcarriers (4–30 MHz)	1296
Carrier spacing (kHz)	19.53
Symbol length (μs)	51.2
Guard interval (μs)	3.2 (1/16)
QAM and FEC parameters
Adaptive modulation (per subcarrier)	QPSK, 16-, 64-, 256-, 1024-QAM
Forward error correction	RS (204, 188), convolutional coding (Viterbi, code rate 1/2 or 3/4)
Maximum transmission speed, gross physical (PHY) layer bitrate (Mbit/s)	506

The basic system parameters are summarised in Table 24.1. Note that the OFDM parameters are the same as used in Chapter 9.
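As a quick plausibility check, several entries of Table 24.1 can be reproduced from one another. The sketch below rests on two assumptions that the chapter does not state explicitly: that the 2048-point FFT operates on a complex baseband signal at half the 80 MHz sampling frequency, and that the gross PHY bitrate counts 1024-QAM on every active subcarrier of both streams without guard interval, training or FEC overhead.

```python
FFT_POINTS = 2048
SAMPLING_HZ = 80e6
BASEBAND_HZ = SAMPLING_HZ / 2     # assumption: complex baseband rate covering 0-40 MHz
ACTIVE_SUBCARRIERS = 1296
MAX_BITS_PER_SUBCARRIER = 10      # 1024-QAM
STREAMS = 2                       # two-stream MIMO

carrier_spacing_hz = BASEBAND_HZ / FFT_POINTS     # about 19.53 kHz
symbol_length_s = 1.0 / carrier_spacing_hz        # about 51.2 us
guard_interval_s = symbol_length_s / 16           # about 3.2 us

# Assumption: gross rate ignores guard interval, training symbols and FEC
bits_per_ofdm_symbol = ACTIVE_SUBCARRIERS * MAX_BITS_PER_SUBCARRIER * STREAMS
gross_bitrate_bps = bits_per_ofdm_symbol / symbol_length_s   # about 506 Mbit/s

print(carrier_spacing_hz, symbol_length_s, guard_interval_s, gross_bitrate_bps)
```

Under these assumptions, the computed values agree with the carrier spacing, symbol length, guard interval and gross bitrate listed in the table.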

24.3 Detection and Pseudoinverse Calculation

24.3.1 Introduction

Figure 24.2 shows the zero-forcing (ZF)-detection process at the receiver. First, the pseudoinverse of the estimated channel matrix of each subcarrier is calculated. These detection matrices are applied to the received data vectors. According to Equation 8.18 in Chapter 8, the pseudoinverse [⋅]+ is calculated as

$\begin{matrix} H^{+} & = & (H^{H} H)^{-1} & H^{H}, \\ (2 \times 4) & = & ((2 \times 4)(4 \times 2))^{-1} & (2 \times 4). \end{matrix}$

(24.1)

The calculation involves a matrix inversion of the 2 × 2 matrix:

$\begin{array}{l} H^{H} H = \begin{bmatrix} h_{1}^{H} \\ h_{2}^{H} \end{bmatrix} \begin{bmatrix} h_{1} & h_{2} \end{bmatrix} = \begin{bmatrix} h_{1}^{H} h_{1} & h_{1}^{H} h_{2} \\ h_{2}^{H} h_{1} & h_{2}^{H} h_{2} \end{bmatrix} = \begin{bmatrix} h_{1}^{H} h_{1} & h_{1}^{H} h_{2} \\ (h_{1}^{H} h_{2})^{H} & h_{2}^{H} h_{2} \end{bmatrix} \\ \doteq A = \begin{bmatrix} a_{11} & a_{12} \\ a_{12}^{*} & a_{22} \end{bmatrix}. \end{array}$

(24.2)

NOTE: a₁₁ and a₂₂ are real. There is a closed-form expression of the inverse of A:

$A^{-1} = \frac{1}{a_{11} a_{22} - |a_{12}|^{2}} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{12}^{*} & a_{11} \end{bmatrix}.$

(24.3)
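For illustration, Equations 24.1 through 24.3 can be evaluated directly in floating point. The following is a minimal sketch, not the demonstrator's implementation; the function name `pinv_closed_form` and the example channel values are hypothetical. It checks that $H^{+} H$ equals the 2 × 2 identity matrix.

```python
def pinv_closed_form(H):
    """Pseudoinverse of a 4x2 complex matrix via Equations 24.1 to 24.3.

    H is a list of 4 rows [h_m1, h_m2]; the result is a 2x4 list of lists.
    """
    h1 = [row[0] for row in H]
    h2 = [row[1] for row in H]
    a11 = sum(abs(x) ** 2 for x in h1)                       # real
    a22 = sum(abs(x) ** 2 for x in h2)                       # real
    a12 = sum(x.conjugate() * y for x, y in zip(h1, h2))     # h1^H h2
    det = a11 * a22 - abs(a12) ** 2                          # small if columns are nearly parallel
    a_inv = [[a22 / det, -a12 / det],
             [-a12.conjugate() / det, a11 / det]]            # Equation 24.3
    h_herm = [[x.conjugate() for x in h1],
              [x.conjugate() for x in h2]]                   # H^H, size 2x4
    return [[a_inv[i][0] * h_herm[0][m] + a_inv[i][1] * h_herm[1][m]
             for m in range(len(H))]
            for i in range(2)]

# Hypothetical example channel matrix for one subcarrier
H = [[1 + 1j, 0.5 - 0.2j],
     [0.3 - 0.7j, -1 + 0.4j],
     [-0.6 + 0.1j, 0.2 + 0.9j],
     [0.8j, 0.7 - 0.3j]]
H_pinv = pinv_closed_form(H)
product = [[sum(H_pinv[i][m] * H[m][k] for m in range(4)) for k in range(2)]
           for i in range(2)]
```

In double precision this works well for a well-conditioned channel. The division by `det` is the step that becomes critical in fixed point.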

Equations 24.1 through 24.3 describe a possible implementation. However, a fixed-point implementation of this approach faces numerical problems. In particular, the calculation of the matrix inverse can be numerically unstable. If the two products a₁₁a₂₂ and $|a_{12}|^{2}$ are of the same order of magnitude, their difference becomes very small and $1 / (a_{11} a_{22} - |a_{12}|^{2})$ becomes large. Calculations of ‘square products’ of the form H^HH should also be avoided for numerical reasons: for full precision, the word width of the multiplication output is double the input word width. Also, the calculation requires many multiplications, which consume many hardware resources. These reasons motivate the use of another, more efficient algorithm for detection, based on a QR decomposition. The implementation of this algorithm uses numerically stable unitary matrix operations and is well suited for a parallel and efficient realisation [3,4]. The QR decomposition decomposes the channel matrix H into an upper triangular matrix R and a unitary matrix Q:

$\begin{matrix} H & = & Q & R & = Q \begin{bmatrix} r_{11} & r_{12} \\ 0 & r_{22} \end{bmatrix}. \\ (4 \times 2) & = & (4 \times 2) & (2 \times 2) & \end{matrix}$

(24.4)

Inserting Equation 24.4 into Equation 24.1 and using $Q^{H} Q = I$ results in the pseudoinverse

$H^{+} = (R^{H} Q^{H} Q R)^{-1} R^{H} Q^{H} = R^{-1} R^{-H} R^{H} Q^{H} = R^{-1} Q^{H}.$

(24.5)

The calculation of Q involves only unitary matrix operations or rotations. Although the inversion of the triangular matrix R can be computed efficiently, the fixed-point parameters have to be chosen carefully to achieve good numerical results. More sophisticated algorithms avoid the calculation of R⁻¹ [5,6]. These algorithms also support more complex detection algorithms such as minimum mean squared error (MMSE) or ordered successive interference cancellation (OSIC). However, the increased computational complexity requires more hardware resources.

The detection algorithm can be expressed using the results above (the noise is neglected):

$\check{s} = H^{+} r = R^{-1} \underbrace{Q^{H} r}_{\tilde{r}} = R^{-1} \tilde{r} \quad \left( = R^{-1} Q^{H} \underbrace{H}_{QR} s = s \right).$

(24.6)

The intermediate result $\tilde{r} = Q^{H} r$ is introduced in Equation 24.6 since Q is not calculated explicitly. Therefore, H⁺ is also not calculated explicitly. Q is characterised by several rotation angles, and the multiplication of r by Q^H to obtain $\tilde{r}$ is actually a rotation of r by the rotation angles describing Q and Q^H, respectively (as described in Section 24.3.2). Figure 24.2 shows the block diagram of the described operations; see the blocks pseudoinverse and MIMO detection. First, the QR decomposition is applied to the estimated channel matrix H, resulting in R and the rotation angles describing Q. Next, R⁻¹ is calculated. Q^H and R⁻¹ are stored in a memory. For detection, the stored results are applied to the received data vectors.

24.3.2 Pseudoinverse Calculation Based on QR Decomposition

Starting from a mathematical description of the idea of the QR decomposition, the hardware architecture is derived in the following.

The goal of the QR decomposition is to transform H into an upper triangular matrix R:

$H = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \\ h_{31} & h_{32} \\ h_{41} & h_{42} \end{bmatrix} \to R = \begin{bmatrix} r_{11} & r_{12} \\ 0 & r_{22} \\ 0 & 0 \\ 0 & 0 \end{bmatrix}.$

(24.7)

The algorithm works iteratively, introducing zeros in each step by applying unitary rotation matrices (so-called Givens rotations).

In a first step, the entries of the first column of H are turned into real values by multiplying by the matrix

$\begin{bmatrix} e^{j \alpha_{1}^{(1)}} & 0 & 0 & 0 \\ 0 & e^{j \alpha_{2}^{(1)}} & 0 & 0 \\ 0 & 0 & e^{j \alpha_{3}^{(1)}} & 0 \\ 0 & 0 & 0 & e^{j \alpha_{4}^{(1)}} \end{bmatrix} \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \\ h_{31} & h_{32} \\ h_{41} & h_{42} \end{bmatrix} = \begin{bmatrix} \bar{h}_{11} & h'_{12} \\ \bar{h}_{21} & h'_{22} \\ \bar{h}_{31} & h'_{32} \\ \bar{h}_{41} & h'_{42} \end{bmatrix},$

(24.8)

where the bar ($\bar{x}$) indicates real numbers and the prime ($x'$) indicates which matrix entries are also affected by the operations. The rotation angles are calculated as

$\begin{matrix} \alpha_{1}^{(1)} = -\arctan \frac{\Im\{h_{11}\}}{\Re\{h_{11}\}}, \\ \vdots \\ \alpha_{4}^{(1)} = -\arctan \frac{\Im\{h_{41}\}}{\Re\{h_{41}\}}. \end{matrix}$

(24.9)

The superscript (1) in Equation 24.9 indicates that the angles correspond to the first column of H. Based on the results obtained by the right-hand side of Equation 24.8, the first 0 is introduced in the next step:

$\begin{bmatrix} \cos \beta_{1}^{(1)} & -\sin \beta_{1}^{(1)} & 0 & 0 \\ \sin \beta_{1}^{(1)} & \cos \beta_{1}^{(1)} & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \bar{h}_{11} & h'_{12} \\ \bar{h}_{21} & h'_{22} \\ \bar{h}_{31} & h'_{32} \\ \bar{h}_{41} & h'_{42} \end{bmatrix} = \begin{bmatrix} \bar{h}'_{11} & h''_{12} \\ 0 & h''_{22} \\ \bar{h}_{31} & h'_{32} \\ \bar{h}_{41} & h'_{42} \end{bmatrix},$

(24.10)

$\beta_{1}^{(1)} = -\arctan \frac{\bar{h}_{21}}{\bar{h}_{11}}.$

(24.11)

The next two zeros are introduced similarly by

$\begin{bmatrix} \cos \beta_{2}^{(1)} & 0 & -\sin \beta_{2}^{(1)} & 0 \\ 0 & 1 & 0 & 0 \\ \sin \beta_{2}^{(1)} & 0 & \cos \beta_{2}^{(1)} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \bar{h}'_{11} & h''_{12} \\ 0 & h''_{22} \\ \bar{h}_{31} & h'_{32} \\ \bar{h}_{41} & h'_{42} \end{bmatrix} = \begin{bmatrix} \bar{h}''_{11} & h'''_{12} \\ 0 & h''_{22} \\ 0 & h''_{32} \\ \bar{h}_{41} & h'_{42} \end{bmatrix},$

(24.12)

$\beta_{2}^{(1)} = -\arctan \frac{\bar{h}_{31}}{\bar{h}'_{11}}.$

(24.13)

and

$\begin{bmatrix} \cos \beta_{3}^{(1)} & 0 & 0 & -\sin \beta_{3}^{(1)} \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \sin \beta_{3}^{(1)} & 0 & 0 & \cos \beta_{3}^{(1)} \end{bmatrix} \begin{bmatrix} \bar{h}''_{11} & h'''_{12} \\ 0 & h''_{22} \\ 0 & h''_{32} \\ \bar{h}_{41} & h'_{42} \end{bmatrix} = \begin{bmatrix} \bar{h}'''_{11} & h''''_{12} \\ 0 & h''_{22} \\ 0 & h''_{32} \\ 0 & h''_{42} \end{bmatrix},$

(24.14)

$\beta_{3}^{(1)} = -\arctan \frac{\bar{h}_{41}}{\bar{h}''_{11}}.$

(24.15)

After these steps, the manipulation of the first-column vector is finished and $r_{11} = \bar{h}'''_{11}$ and $r_{12} = h''''_{12}$ are determined. In the equations above, the number of primes indicates the number of iterations that have affected the variable. This iterative nature of the algorithm is reflected later by feedback paths in the hardware design.

Next, zeros have to be introduced into the second column. The operations described for the first column are repeated for the second column, with the dimension reduced by one. Again, $h''_{22}$, $h''_{32}$ and $h''_{42}$ first have to be made real, delivering the rotation angles $\alpha_{1}^{(2)}$, $\alpha_{2}^{(2)}$ and $\alpha_{3}^{(2)}$. Introducing the zeros provides the rotation angles $\beta_{1}^{(2)}$ and $\beta_{2}^{(2)}$ and the last element of R: r₂₂. The product of all rotation matrices is equal to Q^H since the rotation matrices are applied from the left-hand side:

$\begin{array}{l} Q^{H} H = R \\ \Leftrightarrow H = Q R. \end{array}$

(24.16)

The calculated rotation angles describe Q^H. Note that r₁₁ and r₂₂ are real, while r₁₂ is in general complex.
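The sequence of phase rotations and Givens rotations in Equations 24.8 through 24.15 can be traced in floating point as follows. This is a sketch for illustration only: the function names and the example values are hypothetical, the received vector is appended as a third column so that it is rotated by Q^H along with H, and detection solves R s = Q^H r by back substitution instead of multiplying by a stored R⁻¹.

```python
import cmath
import math

def givens_triangularise(rows):
    """Apply the phase and Givens rotations to a 4-row complex matrix.

    Columns 0 and 1 hold H; any further columns are rotated along with them.
    """
    m = [list(row) for row in rows]
    n = len(m)
    for col in range(2):
        # Phase rotations: make the entries of this column real.
        # cmath.phase is the four-quadrant form of arctan(Im/Re).
        for k in range(col, n):
            phase_rot = cmath.exp(-1j * cmath.phase(m[k][col]))
            m[k] = [phase_rot * x for x in m[k]]
        # Givens rotations: zero the entries below the diagonal one by one.
        for k in range(col + 1, n):
            beta = -math.atan2(m[k][col].real, m[col][col].real)
            c, s = math.cos(beta), math.sin(beta)
            upper = [c * x - s * y for x, y in zip(m[col], m[k])]
            lower = [s * x + c * y for x, y in zip(m[col], m[k])]
            m[col], m[k] = upper, lower
    return m

def zf_detect(H, r):
    """ZF detection: triangularise [H | r], then solve R s = Q^H r."""
    m = givens_triangularise([list(h) + [x] for h, x in zip(H, r)])
    r11, r12, r22 = m[0][0], m[0][1], m[1][1]
    s2 = m[1][2] / r22
    s1 = (m[0][2] - r12 * s2) / r11
    return (s1, s2), (r11, r12, r22)

# Hypothetical channel matrix and transmit symbols (noise neglected)
H = [[1 + 1j, 0.5 - 0.2j],
     [0.3 - 0.7j, -1 + 0.4j],
     [-0.6 + 0.1j, 0.2 + 0.9j],
     [0.8j, 0.7 - 0.3j]]
s_tx = (1 - 1j, -3 + 1j)
r = [h[0] * s_tx[0] + h[1] * s_tx[1] for h in H]
s_hat, (r11, r12, r22) = zf_detect(H, r)
```

In the noise-free case `s_hat` reproduces `s_tx`, and `r11` and `r22` come out real up to rounding, in line with the remark above.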

FIGURE 24.4
Rotation: Rotation of input vector (x, y) by ϕ (a) and corresponding block diagram (b).

Basically, the algorithm of the QR decomposition as described earlier includes two operations: on the one hand, the application of rotations by the given rotation angles and, on the other hand, the calculation of these rotation angles.

24.3.2.1 Rotation

The rotation of an input vector (x, y) by the angle ϕ is described by

$\begin{array}{l} x' = x \cos \phi - y \sin \phi, \\ y' = x \sin \phi + y \cos \phi. \end{array}$

(24.17)

Note that the application of the rotation matrices in Equations 24.8 and 24.10 through 24.14 affects two elements in each column. Figure 24.4a and b illustrates the rotation and the corresponding block diagram, respectively.

24.3.2.2 Vectoring

To calculate the rotation angles, one output of the rotation of (x, y) is set to 0. In Equations 24.8 and 24.9, the imaginary part is forced to be 0, while in Equations 24.10 through 24.15, zeros are introduced to the matrix. Forcing y’ = 0 in Equation 24.17 results in

$\begin{array}{l} x' = \sqrt{x^{2} + y^{2}} = x \cos \phi - y \sin \phi, \\ y' = 0 = x \sin \phi + y \cos \phi \end{array}$

(24.18)

with

$\phi = -\arctan \frac{y}{x}.$

(24.19)

The rotation shown in Figure 24.5a is performed in such a way that the output vector lies on the $\tilde{x}$-axis. Figure 24.5b shows another interpretation of the vectoring unit. Assuming that $\tilde{x}$ and $\tilde{y}$ represent the real and imaginary part in the complex plane, the equations above calculate the phase angle and absolute value of the complex input x + jy.
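The two elementary operations can be written down directly from Equations 24.17 through 24.19. The sketch below uses hypothetical function names and the four-quadrant `math.atan2` in place of arctan(y/x), which gives the same result for x > 0 and also covers the case x = 0.

```python
import math

def rotate(x, y, phi):
    """Rotation of (x, y) by the angle phi (Equation 24.17)."""
    return (x * math.cos(phi) - y * math.sin(phi),
            x * math.sin(phi) + y * math.cos(phi))

def vectoring(x, y):
    """Rotate (x, y) onto the x-axis; return the magnitude and the angle used."""
    phi = -math.atan2(y, x)              # Equation 24.19
    x_out, y_out = rotate(x, y, phi)     # y_out is 0 up to rounding
    return x_out, phi

magnitude, phi = vectoring(3.0, 4.0)     # magnitude is 5, phi is -arctan(4/3)
passthrough, phi0 = vectoring(0.0, 2.0)  # x = 0: value passes through, angle is -pi/2
```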

FIGURE 24.5
Vectoring: Rotation of the input vector (x, y) to the $\tilde{x}$-axis (a) and corresponding block diagram (b).

FIGURE 24.6
System architecture of the QR decomposition.

Figure 24.6 shows the architecture of the QR decomposition using the rotation and vectoring blocks of Figures 24.4 and 24.5, respectively [3,4]. The architecture is a so-called systolic array implementation, in which different processing units are placed in parallel. There are several data paths between the different processing units in contrast to a linear data path. The design includes two blocks labelled outer cell 1 and outer cell 2. An outer cell block consists of two vectoring blocks as defined in Figure 24.5b. Furthermore, the inner cell block consists of three rotation blocks as introduced in Figure 24.4b. The first column of the channel matrix H serves as input to outer cell 1, while the second column serves as input to the inner cell. The feedback paths contain delay elements (labelled by Δ) which are needed to align the signals.

First, the entries of the first column of H are phase rotated to become real in the upper vectoring unit of outer cell 1 according to Equation 24.8. The vectoring unit provides the corresponding rotation angles and the real entries of the first-column vector. Equation 24.8 shows that the second column of H has to be phase rotated by the same angles. This rotation is applied in the first rotation block of the inner cell. The output of the first vectoring unit serves as input to the second vectoring unit (input y) of outer cell 1. For the processing of the first element $\bar{h}_{11}$, the x input of the vectoring unit is set to 0 via the multiplexer. Since this input is 0, $\bar{h}_{11}$ passes through the vectoring unit unchanged, resulting in an output angle of −π/2. When the next element $\bar{h}_{21}$ arrives at the y input, $\bar{h}_{11}$ is available at the x input via the feedback and the first 0 is introduced according to Equations 24.10 and 24.11. The result of this operation is $\bar{h}'_{11}$, which is combined with the next input $\bar{h}_{31}$ to introduce the second 0 according to Equations 24.12 and 24.13. After the next step, the third 0 is introduced, resulting in r₁₁. The obtained rotation angles are used to manipulate the second column of H according to Equations 24.10, 24.12 and 24.14 in the two lower rotation units of the inner cell. Since the entries of the second column are complex, two rotation units are needed, one each for the real and imaginary parts. The output of the inner cell provides the manipulated second-column vector which includes r₁₂. The output is connected to the second outer cell which introduces the remaining zeros and calculates the corresponding rotation angles.

The vectoring and rotation algorithms can be efficiently implemented by the coordinate rotation digital computer (CORDIC) algorithm [7]. The CORDIC algorithm iteratively solves trigonometric equations.* It iteratively applies predefined rotation angles, which become smaller in each iteration, using only shift and add operations. In a pipelined implementation, cascaded CORDIC stages realise the iterations. The higher the number of stages, the higher the precision. To achieve an m-bit precision CORDIC operation, m + 1 iterations are needed and the data path word length has to be m + 2 + log₂(m) [9]. The number of CORDIC stages mainly determines the latency. The vectoring units use the CORDIC algorithm. The latency L of the vectoring unit has to be considered in the design of the pseudoinverse calculation of Figure 24.6. The feedback signal of the second vectoring unit (input x) and the second input signal (input y) have to be aligned. Because of the feedback loop and the latency of the vectoring unit (L > 1), the input signals $\bar{h}_{11}$, $\bar{h}_{21}$, $\bar{h}_{31}$ and $\bar{h}_{41}$ of the input y have to be spread by the latency of the vectoring unit. This spreading would decrease the throughput by a factor equal to the latency. However, the calculations have to be performed for each subcarrier, and changing the order of the input avoids any idle cycles of the vectoring unit. First, $\bar{h}_{11}$ of the first L subcarriers serves as input, followed by $\bar{h}_{21}$ of the first L subcarriers and so on. By changing the order in that way, the feedback signal of the vectoring unit is aligned and the design is fully pipelined without any idle cycles. The rotation unit could also be implemented with the help of CORDIC. However, here, the rotation according to Equation 24.17 is realised by multipliers and LUTs to implement the sine and cosine functionality.
The FPGA provides digital signal processing (DSP) slices, which are used to implement the multipliers. The combination of multipliers and sine/cosine LUTs has the advantage of smaller latency compared to the CORDIC algorithm and the available resources of DSP slices of the FPGA are fully utilised. The implementation makes use of the Xilinx Cores available for the CORDIC algorithm [10], the sine/cosine LUTs [11] and multipliers [12].
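To make the shift-and-add principle of CORDIC concrete, the following sketch runs the algorithm in vectoring mode on integer (fixed-point) inputs. It is a textbook-style illustration and not the Xilinx core used in the demonstrator: the function names are hypothetical, the angle is accumulated in floating point for readability (hardware would use a small table of fixed-point angles), and the input is assumed to have x ≥ 0.

```python
import math

def cordic_vectoring(x, y, iterations=16):
    """Drive y towards 0 with micro-rotations by +/- atan(2^-i).

    x, y are integers (fixed-point values) with x >= 0. Returns the scaled
    magnitude, the residual y and the accumulated rotation angle in radians.
    """
    angle = 0.0
    for i in range(iterations):
        if y > 0:
            # rotate clockwise (negative angle), shifts replace multiplications
            x, y = x + (y >> i), y - (x >> i)
            angle -= math.atan(2.0 ** -i)
        else:
            # rotate counter-clockwise (positive angle)
            x, y = x - (y >> i), y + (x >> i)
            angle += math.atan(2.0 ** -i)
    return x, y, angle

def cordic_gain(iterations=16):
    """Constant magnitude gain of the micro-rotations (about 1.6468)."""
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 4.0 ** -i)
    return gain

FRACTION_BITS = 16
x_out, y_out, angle = cordic_vectoring(3 << FRACTION_BITS, 4 << FRACTION_BITS)
magnitude = x_out / cordic_gain() / (1 << FRACTION_BITS)
```

The accumulated `angle` approximates ϕ of Equation 24.19, and `magnitude` approximates $\sqrt{x^{2} + y^{2}}$ once the constant CORDIC gain is removed. Each additional iteration roughly halves the remaining angle error.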

24.3.2.3 Calculation of R⁻¹

So far, the matrix R has been calculated. Now, the inverse of R has to be found. The inverse of the upper triangular 2 × 2 matrix $R = \begin{bmatrix} r_{11} & r_{12} \\ 0 & r_{22} \end{bmatrix}$ is calculated by

$R^{-1} = \begin{bmatrix} \tilde{r}_{11} & \tilde{r}_{12} \\ 0 & \tilde{r}_{22} \end{bmatrix} = \begin{bmatrix} \frac{1}{r_{11}} & -r_{12} \cdot \frac{1}{r_{11}} \cdot \frac{1}{r_{22}} \\ 0 & \frac{1}{r_{22}} \end{bmatrix}.$

(24.20)

The implemented pipeline design uses one divider [13] and three multipliers to realise the calculations according to Equation 24.20. Details can be found in [14].
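Equation 24.20 translates into a few operations, sketched below in floating point with a hypothetical function name and example values. Counting real operations in this sketch gives two reciprocals and three real multiplications (one for the product of the reciprocals, two for scaling the complex r₁₂), which is at least consistent with the stated divider and multiplier count. How the hardware actually schedules these operations is described in [14] and is not assumed here.

```python
def invert_upper_triangular(r11, r12, r22):
    """Inverse of R = [[r11, r12], [0, r22]] with real r11, r22 and complex r12."""
    inv_r11 = 1.0 / r11                  # reciprocal (divider)
    inv_r22 = 1.0 / r22                  # reciprocal (divider)
    scale = inv_r11 * inv_r22            # one real multiplication
    return [[inv_r11, -r12 * scale],     # two real multiplications (Re and Im of r12)
            [0.0, inv_r22]]

# Hypothetical example values
r11, r12, r22 = 2.0, 0.5 - 1.5j, 0.25
R = [[r11, r12], [0.0, r22]]
R_inv = invert_upper_triangular(r11, r12, r22)
identity = [[sum(R[i][k] * R_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
```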

24.3.3 ZF Detection

Figure 24.2 shows the ZF detection in hardware. First, Q^H is applied to the received data vector r. Figure 24.7 shows the implementation of the application of the rotation angles; r_m denotes the received symbol of receive port m (m = 1, …,4). Two inner cells are cascaded. The inner cells have the same structure as the inner cells introduced in Figure 24.6.

The multiplication by R⁻¹ requires eight real multiplications. The eight multiplications are realised by two hardware multipliers utilising the four-times-multiplexed format of the signals. This serialisation at a higher clock rate is similar to the multiplication by the precoding matrix (see later in Section 24.4). Details can be found in [14].

24.3.4 Hardware Parameters

The pipeline structure of the different processing units described in the previous sections requires some further data processing to make sure that the signals have a suitable format when they are passed between the processing units. Also, the latency of the blocks has to be considered when aligning the signals at the different points of the design. Details of the implementation, including the data formats and the control logic, can be found in [14].

FIGURE 24.7
Detection: Rotation by Q^H.

FIGURE 24.8
BER performance of the fixed-point pseudoinverse implementation.

To verify the fixed-point implementation, the model under test is embedded into the double-precision environment. Figure 24.8 shows the BER, depending on the SNR, for the fixed-point implementation of the pseudoinverse. 1024-QAM is chosen as the modulation scheme since it is the one most sensitive to errors. Neither FEC nor precoding is applied. The figure shows the average performance for randomly generated channel matrices. The channels were normalised in such a way that the norm of the rows is equal to 1, that is, the SNR value corresponds to the SNR per receiving port. Figure 24.8 compares the double-precision (64 bit) detection matrix (exact detection matrix) to the detection matrix where only the rotation angles are obtained from the fixed-point model (Q quantised) and to the complete fixed-point detection matrix (R⁻¹ and Q quantised). The performance loss is very small, especially for operation points of the BER between 10⁻³ and 10⁻².

24.4 Precoding: Multiplication by V

Figure 24.9 shows the codebook-based precoding at the transmitter (for details about the codebook, see Section 24.6). The codebook indices of all subcarriers are stored in the index memory (block ‘index memory’ in Figure 24.9). This memory is written by the PLC driver whenever new codebook indices are available. The codebook index selects the precoding matrix V from the codebook. In the matrix multiplication block, the precoding matrix V is applied to the transmit symbol vectors. The compensation of the precoding at the receiver works similarly.

FIGURE 24.9
Codebook-based precoding at the transmitter.

The matrix multiplication is described by

$s = V b = \begin{bmatrix} \upsilon_1 & -\upsilon_2^{*} \\ \upsilon_2 & \upsilon_1 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} \upsilon_1 b_1 - \upsilon_2^{*} b_2 \\ \upsilon_2 b_1 + \upsilon_1 b_2 \end{bmatrix},$

(24.21)

where the precoding matrix is constructed from the codebook entries υ₁ and υ₂ (for details, see [14]).

In order to derive the design of the implemented matrix multiplication, Equation 24.21 is rewritten with real and imaginary part notation:

$\begin{aligned} s = V b &= \begin{bmatrix} \Re\{s_1\} + j\Im\{s_1\} \\ \Re\{s_2\} + j\Im\{s_2\} \end{bmatrix} \\ &= \begin{bmatrix} \Re\{\upsilon_1\} + j\Im\{\upsilon_1\} & -\Re\{\upsilon_2\} + j\Im\{\upsilon_2\} \\ \Re\{\upsilon_2\} + j\Im\{\upsilon_2\} & \Re\{\upsilon_1\} + j\Im\{\upsilon_1\} \end{bmatrix} \begin{bmatrix} \Re\{b_1\} + j\Im\{b_1\} \\ \Re\{b_2\} + j\Im\{b_2\} \end{bmatrix} \\ &= \begin{bmatrix} \Re\{\upsilon_1\}\Re\{b_1\} - \Im\{\upsilon_1\}\Im\{b_1\} - \Re\{\upsilon_2\}\Re\{b_2\} - \Im\{\upsilon_2\}\Im\{b_2\} \\ \Re\{\upsilon_2\}\Re\{b_1\} - \Im\{\upsilon_2\}\Im\{b_1\} + \Re\{\upsilon_1\}\Re\{b_2\} - \Im\{\upsilon_1\}\Im\{b_2\} \end{bmatrix} \\ &\quad + j \begin{bmatrix} \Im\{\upsilon_1\}\Re\{b_1\} + \Re\{\upsilon_1\}\Im\{b_1\} + \Im\{\upsilon_2\}\Re\{b_2\} - \Re\{\upsilon_2\}\Im\{b_2\} \\ \Im\{\upsilon_2\}\Re\{b_1\} + \Re\{\upsilon_2\}\Im\{b_1\} + \Im\{\upsilon_1\}\Re\{b_2\} + \Re\{\upsilon_1\}\Im\{b_2\} \end{bmatrix}. \end{aligned}$

(24.22)

The matrix multiplication requires 16 real multiplications. Figure 24.10 shows the design using four multiply and accumulate units. Due to the faster clock rate domain, four clock cycles are used to describe one subcarrier. The four values describing the transmit symbol vector b are multiplexed and serve as input to all four multipliers. The four values $\Re\{\upsilon_1\}$, $\Im\{\upsilon_1\}$, $\Re\{\upsilon_2\}$ and $\Im\{\upsilon_2\}$ describing the precoding matrix have to be repeated and reordered to realise the matrix multiplication according to Equation 24.22. The reordering is implemented by an appropriate memory access of the codebook memory. The control logic provides the sign of each of the multiply and accumulate units, depending on the clock cycle, to realise the plus/minus signs in Equation 24.22.
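The serialised multiplication can be illustrated with a small software model. The following is a minimal sketch, assuming that the four multiplexed input values arrive in the order Re b1, Im b1, Re b2, Im b2 and that each multiply and accumulate unit collects one real or imaginary output component; the schedule table and the function names are hypothetical and do not reproduce the demonstrator's actual control logic or memory layout.

```python
# Hypothetical model of four multiply-accumulate (MAC) units that share one
# multiplexed input value per clock cycle. Names and table layout are
# illustrative assumptions, not the demonstrator's control logic.

# Index into the coefficient list [Re v1, Im v1, Re v2, Im v2].
RE_V1, IM_V1, RE_V2, IM_V2 = range(4)

# SCHEDULE[mac][cycle] = (sign, coefficient index).
# Assumed cycle order of the multiplexed input: Re b1, Im b1, Re b2, Im b2.
SCHEDULE = [
    [(+1, RE_V1), (-1, IM_V1), (-1, RE_V2), (-1, IM_V2)],  # accumulates Re s1
    [(+1, IM_V1), (+1, RE_V1), (+1, IM_V2), (-1, RE_V2)],  # accumulates Im s1
    [(+1, RE_V2), (-1, IM_V2), (+1, RE_V1), (-1, IM_V1)],  # accumulates Re s2
    [(+1, IM_V2), (+1, RE_V2), (+1, IM_V1), (+1, RE_V1)],  # accumulates Im s2
]


def precode(v1, v2, b1, b2):
    """Return (s1, s2) = V b using 4 MAC units over 4 cycles (16 real products)."""
    v1, v2, b1, b2 = complex(v1), complex(v2), complex(b1), complex(b2)
    coeffs = [v1.real, v1.imag, v2.real, v2.imag]
    inputs = [b1.real, b1.imag, b2.real, b2.imag]
    acc = [0.0, 0.0, 0.0, 0.0]
    for cycle, x in enumerate(inputs):      # four clock cycles per subcarrier
        for mac in range(4):                # the four MAC units work in parallel
            sign, idx = SCHEDULE[mac][cycle]
            acc[mac] += sign * coeffs[idx] * x
    return complex(acc[0], acc[1]), complex(acc[2], acc[3])


def precode_reference(v1, v2, b1, b2):
    """Direct complex evaluation of s = V b with V = [[v1, -v2*], [v2, v1]]."""
    v1, v2 = complex(v1), complex(v2)
    return v1 * b1 - v2.conjugate() * b2, v2 * b1 + v1 * b2
```

Comparing `precode` against `precode_reference` for a few test vectors is a simple way to check that the sign and coefficient schedule matches the complex product.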

Table 24.2 defines special codebook entries to support different MIMO modes without precoding. The corresponding codebook index allows several MIMO modes to be selected without additional implementation effort. If the first entry is used, the precoding matrix is equal to the identity matrix and no precoding is applied. The second entry is of interest for the MISO mode, where the second MIMO stream carries no information and the second transmit symbol is set to 0. In this case, this codebook entry applies no precoding for the MISO mode. By selecting the corresponding codebook index of Table 24.2 for all subcarriers, one of the two described MIMO modes without precoding is selected in the demonstrator system.

FIGURE 24.10
Block diagram of precoding: Multiplication by V.

TABLE 24.2

Special Codebook Entries

Codebook Index	1	2
Codebook entry $\begin{bmatrix} \upsilon_1 \\ \upsilon_2 \end{bmatrix}$	$\begin{bmatrix} 1 \\ 0 \end{bmatrix}$	$\begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$
Description	No precoding	No precoding for MISO (second input must be 0)

24.5 Channel and SNR Estimation

24.5.1 Estimation of the Channel Matrix

Channel estimation is based on four OFDM training symbols. These four training symbols are included at the beginning of each burst and are known at the receiver. MIMO requires a special format of the training symbols to estimate all MIMO paths simultaneously. In the following, only one receive port is considered. The calculation of the other receive ports proceeds likewise. Assume s_t to be the training symbol of one subcarrier. The following orthogonal sequence is transmitted:

$\begin{matrix} \text{Tx port 1:} & +s_t & +s_t & -s_t & -s_t, \\ \text{Tx port 2:} & +s_t & -s_t & +s_t & -s_t. \end{matrix}$

The columns represent the time, that is, Tx port 1 transmits two positive training symbols in the first two time instances, followed by two negative training symbols.

Basically, only two training symbols are needed to separate the two MIMO paths from each transmit port to one receive port. Averaging over four training symbols improves the performance of the estimation. r⁽¹⁾ and r⁽²⁾ are the two received training symbols (the superscript denotes the time slot) which consecutively arrive at receive port one:

$\begin{aligned} r^{(1)} &= h_{11} s_t + h_{12} s_t, \\ r^{(2)} &= h_{11} s_t - h_{12} s_t. \end{aligned}$

(24.23)

h₁₁ and h₁₂ are the channel coefficients from transmit port 1 and 2 to receive port 1, respectively. The noise is neglected. It is assumed that the channel coefficients do not change during the four time instances. This assumption holds for the quasi-static PLC channel. Combining the two consecutively received symbols results in

$\begin{matrix} r^{(1)} + r^{(2)} = 2 h_{11} s_t & \Rightarrow & h_{11} = \dfrac{r^{(1)} + r^{(2)}}{2 s_t}, \\ r^{(1)} - r^{(2)} = 2 h_{12} s_t & \Rightarrow & h_{12} = \dfrac{r^{(1)} - r^{(2)}}{2 s_t}. \end{matrix}$

(24.24)

The division by the training symbol simplifies to a multiplication by the complex conjugate of the training symbol if the absolute value of the training symbol is equal to 1. The division is further simplified if the training symbols are binary phase-shift keying (BPSK) modulated, that is, s_t ∈ {+1, −1}. In this case, only the sign has to be changed. The channel estimation is implemented according to Equation 24.24 with $|s_t| = 1$. Training symbols are allocated to each subcarrier. The channel estimation may be improved by more advanced algorithms. In particular, the duration of the training phase may be reduced by utilising the correlation of neighbouring subcarriers. In that case, only a few subcarriers, which are called pilot subcarriers, carry training symbols. The spacing of the pilot subcarriers should be linked to the coherence bandwidth of the channel, and the channel coefficients between the pilot subcarriers have to be interpolated.
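The estimation for one receive port and one subcarrier can be sketched in a few lines. This is a minimal illustration, assuming that the four received training symbols are combined by correlating them with the sign sequence of each transmit port and averaging, and that |s_t| = 1 so that the division becomes a multiplication by the complex conjugate; the function name and argument layout are hypothetical.

```python
# Sign sequences of the orthogonal training pattern over four time slots.
TX1_SIGNS = (+1, +1, -1, -1)
TX2_SIGNS = (+1, -1, +1, -1)


def estimate_channel(received, s_t):
    """Estimate (h11, h12) for one receive port and one subcarrier.

    received: the four consecutively received training symbols r(1)..r(4)
    s_t:      the known training symbol, assumed to satisfy |s_t| = 1
    """
    inv = complex(s_t).conjugate()  # 1 / s_t when |s_t| = 1
    # Correlating with the sign sequence of one port cancels the other port,
    # because the two sequences are orthogonal; dividing by 4 averages the
    # four observations.
    h11 = sum(sign * r for sign, r in zip(TX1_SIGNS, received)) * inv / 4
    h12 = sum(sign * r for sign, r in zip(TX2_SIGNS, received)) * inv / 4
    return h11, h12
```

In the noise-free case the estimate returns the channel coefficients exactly; with noise, the averaging over four symbols reduces the estimation error compared with using only two symbols.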

24.5.2 SNR Estimation

Adaptive QAM modulation requires knowledge about the SNR per subcarrier. Figure 24.11 shows the SNR estimation based on the received training symbols.

First, the received training symbols

$r = H s_t + n$

(24.25)

are equalised by the MIMO detection block, which is the same as used for the data symbols (see Section 24.3). In the next step, the training symbols are subtracted. After ZF detection and subtraction of the training symbols, only the effect of the detection on the noise remains:

$H^{+} r - s_t = H^{+} H s_t + H^{+} n - s_t = H^{+} n.$

(24.26)

FIGURE 24.11
SNR estimation.

The training symbols are not precoded. Therefore, the effect of precoding on the SNR has to be taken into account by a multiplication by V^H. This results in the detection matrix W = V^H H⁺, which includes the effect of precoding. The variance calculation (implemented by squaring the absolute values) results in the SNR of the two logical MIMO streams. The SNR values are averaged over several bursts to get more accurate results.
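The processing chain described here (subtract the training symbols, multiply by V^H, square the absolute values, average over bursts) can be sketched as follows. This is a simplified model under stated assumptions: the input is already the ZF output H⁺r for one subcarrier, the training symbols have unit power, the noise has zero mean so that the mean square equals the variance, and the precoding matrix has the form V = [[υ₁, −υ₂*], [υ₂, υ₁]]. The function name and data layout are hypothetical.

```python
import math


def estimate_snr_db(equalised_bursts, s_t, v1, v2):
    """Estimate the SNR in dB of the two logical MIMO streams of one subcarrier.

    equalised_bursts: one (y1, y2) pair per burst, the ZF output H+ r
    s_t:              (s_t1, s_t2), the known training symbols (unit power assumed)
    v1, v2:           entries of the precoding matrix V = [[v1, -v2*], [v2, v1]]
    """
    v1, v2 = complex(v1), complex(v2)
    noise_power = [0.0, 0.0]
    for y1, y2 in equalised_bursts:
        # After subtracting the training symbols only H+ n remains.
        e1, e2 = y1 - s_t[0], y2 - s_t[1]
        # Apply V^H = [[v1*, v2*], [-v2, v1*]] to include the precoding.
        n1 = v1.conjugate() * e1 + v2.conjugate() * e2
        n2 = -v2 * e1 + v1.conjugate() * e2
        noise_power[0] += abs(n1) ** 2
        noise_power[1] += abs(n2) ** 2
    count = len(equalised_bursts)
    # Unit signal power divided by the averaged noise power, in dB.
    return [
        math.inf if p == 0.0 else 10.0 * math.log10(count / p)
        for p in noise_power
    ]
```

Averaging over more bursts reduces the variance of the estimate, which is why the text averages the SNR values over several bursts.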

As shown in Figure 24.11, SNR estimation requires MIMO detection of the received training symbols. According to Figure 24.2, the system design comprises two instances of MIMO detection, one for the received data symbols and one for the received training symbols (within SNR estimation). The second detection unit for the training path may be unnecessary if the training symbols are equalised by the detection unit for the data symbols. The detection unit for the data symbols does not work during the processing of the training symbols and may be reused for the detection of the training symbols. This alternative design would require some reorganisation of the data flow of the design.

24.6 Codebook Search

As introduced in Section 24.2, the unitary precoding (beamforming) is based on a predefined set of precoding matrices (codebook). A matrix quantisation based on a codebook is the optimum quantisation of the precoding matrix since it takes into account the statistical distribution of the precoding matrices in the precoding space. Thus, the feedback overhead to represent the precoding matrix can be reduced to a minimum. A direct scalar quantisation (rounding) of the parameters describing the precoding matrix is also possible (see Chapter 14). In the implementation presented here, 7 bits per matrix are used to quantise the precoding matrix. For this small number of bits, the codebook-based quantisation achieves notably better performance than the direct quantisation. If the precoding matrix is quantised with a higher resolution (e.g. 12 bits per matrix), the performance gain of the codebook-based approach over the direct quantisation gets smaller. The design of the codebook which is used here is described in [14].

Depending on the channel conditions of each subcarrier, the optimum precoding matrix out of the codebook (defined by the index within the codebook) has to be determined for each subcarrier separately. By exploiting the correlation between neighbouring subcarriers, it may not be necessary to determine the precoding matrix for each subcarrier separately. The precoding matrix may be defined only for a subset of subcarriers (pilot subcarriers) and the precoding matrix of the other subcarriers may be obtained via interpolation. Alternatively, one precoding matrix may be defined for a group of neighbouring subcarriers. This reduces the feedback information about the precoding matrices and saves memory to store the precoding matrices. A detailed analysis of these approaches can be found in [14]. This section explains how the best precoding matrix is found for each subcarrier based on the channel matrix of each subcarrier. Assume $\tilde{V}$ is a precoding matrix out of the codebook and H is the channel matrix of one subcarrier. Multiplying the channel matrix H by $\tilde{V}$ leads to the equivalent channel $\tilde{H} = H \tilde{V}$. ZF detection applies the pseudoinverse of the equivalent channel matrix to obtain the detection matrix (refer also to Chapter 8):

$\begin{aligned} W = \tilde{H}^{+} &= (\tilde{H}^{H} \tilde{H})^{-1} \tilde{H}^{H} = ((H \tilde{V})^{H} H \tilde{V})^{-1} (H \tilde{V})^{H} \\ &= (\tilde{V}^{H} H^{H} H \tilde{V})^{-1} \tilde{V}^{H} H^{H}. \end{aligned}$

(24.27)

If $\tilde{V}$ is a unitary matrix, the inverse matrix $\tilde{V}^{-1} = \tilde{V}^{H}$ exists. Then, Equation 24.27 is equivalent to

$W = \tilde{H}^{+} = \tilde{V}^{H} (H^{H} H)^{-1} \tilde{V} \tilde{V}^{H} H^{H} = \tilde{V}^{H} H^{+},$

(24.28)

that is, the precoding matrix can be separated from the pseudoinverse of the channel matrix.

If $i$ $(1 \leq i \leq 2^{q})$ is the index within the codebook (with q the number of bits to quantise the precoding matrix), each matrix T_i out of the codebook will result in a different detection matrix W_i according to Equation 24.28. The SNR of the two MIMO streams depends on the squared norms $\|w_{1,i}\|^{2}$ and $\|w_{2,i}\|^{2}$ of the rows of the detection matrix W_i. The lower $\|w_{1,i}\|^{2}$ and $\|w_{2,i}\|^{2}$, the higher the SNR (see also Chapter 8). The optimum precoding matrix out of the codebook is then found by

$c' = \arg \min_{T_i} \{ \|w_{1,i}\|^{2}, \|w_{2,i}\|^{2} \}, \quad 1 \leq i \leq 2^{q}.$

(24.29)

The unitary precoding matrix $\tilde{V}$ can be represented by two real parameters $\upsilon_1$ $(0 \leq \upsilon_1 \leq 1)$ and $\phi_2$ $(-\pi < \phi_2 \leq \pi)$. With υ₁ and $\upsilon_2 = \sqrt{1 - \upsilon_1^{2}}\, e^{j \phi_2}$, $\tilde{V}$ may be given by [14]

$\tilde{V} = \begin{bmatrix} \tilde{v}_1 & \tilde{v}_2 \end{bmatrix} = \begin{bmatrix} \upsilon_1 & -\upsilon_2^{*} \\ \upsilon_2 & \upsilon_1 \end{bmatrix}.$

(24.30)

Using Equation 24.30 in Equation 24.28 results in

$W W^{H} = \tilde{V}^{H} H^{+} (\tilde{V}^{H} H^{+})^{H} \underset{H^{+} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix}}{=} \begin{bmatrix} \tilde{v}_1^{H} \\ \tilde{v}_2^{H} \end{bmatrix} \begin{bmatrix} p_1 p_1^{H} & p_1 p_2^{H} \\ p_2 p_1^{H} & p_2 p_2^{H} \end{bmatrix} \begin{bmatrix} \tilde{v}_1 & \tilde{v}_2 \end{bmatrix},$

(24.31)

and

$\begin{aligned} \|w_1\|^{2} = [W W^{H}]_{11} &= p_1 p_1^{H} \upsilon_1^{2} + p_1 p_2^{H} \upsilon_1 \upsilon_2 + (p_1 p_2^{H} \upsilon_1 \upsilon_2)^{H} + p_2 p_2^{H} |\upsilon_2|^{2} \\ &= \|p_1\|^{2} \upsilon_1^{2} + 2 \Re\{p_1 p_2^{H} \upsilon_1 \upsilon_2\} + \|p_2\|^{2} |\upsilon_2|^{2}, \end{aligned}$

(24.32)

$\|w_2\|^{2} = [W W^{H}]_{22} = \|p_1\|^{2} |\upsilon_2|^{2} - 2 \Re\{p_1 p_2^{H} \upsilon_1 \upsilon_2\} + \|p_2\|^{2} \upsilon_1^{2},$

(24.33)

where

p₁ and p₂ are the two row vectors of H⁺

υ₁ is real

In order to find the optimum precoding matrix (which maximises the SNR), the minimum of $\|w_1\|^{2}$ and $\|w_2\|^{2}$ has to be found for the different codebook entries.

Figure 24.12 visualises $\|w_1\|^{2}$ (contour lines) depending on the precoding matrix, as described by the two parameters υ₁ and ϕ₂. Note that ϕ₂ is periodic with 2π. The contour lines of $\|w_1\|^{2}$ differ for different channel matrices H. There is one minimum and one maximum (for −π < ϕ₂ ≤ π). The minimum (diamond symbol ♦) is achieved by the optimum precoding vector (or precoding matrix). The maximum of $\|w_1\|^{2}$ (star symbol *) is the minimum of $\|w_2\|^{2}$ and represents the orthogonal vector (or the optimum precoding matrix with swapped columns). The squares □ represent the codebook entries. Here, the number of feedback bits is q = 7, that is, there are 128 codebook entries. The codebook search delivers the codebook entry closest to the optimum precoding matrix, which is marked by the filled black square ■. Evaluating $\|w_2\|^{2}$ results in the codebook entry closest to the orthogonal vector (marked by the circle •). Here, the overall minimum is obtained by the first index (filled black square ■). Note that $\|w_1\|^{2}$ gives an indication of the SNR at the receiver. If a ZF equaliser is used, the SNR is given as $\Lambda_1 = \frac{1}{\|w_1\|^{2}} \frac{\rho}{N_T}$ with ρ the transmit power to noise power ratio (refer also to Chapter 8). The lower $\|w_1\|^{2}$, the higher the SNR.

FIGURE 24.12
Codebook search: Influence of the precoding matrix on $\|w_1\|^{2}$, number of codebook entries: 2^q = 2⁷ = 128.

A search of all possible codebook entries would be computationally complex and time-consuming. A faster and more efficient codebook search was developed in which a subset of codebook entries is determined first and the search for the minimum is limited to this subset. Details of the algorithm may be found in [14].
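For reference, the exhaustive search that the faster algorithm avoids can be sketched as follows. This is a brute-force illustration only, not the accelerated search of [14]. It assumes a channel matrix with two columns (two transmit ports), a codebook given as a list of hypothetical (υ₁, ϕ₂) pairs, and row norms of W = Ṽ^H H⁺ expressed through the rows p₁ and p₂ of H⁺; all function names are illustrative.

```python
import cmath
import math


def pinv_rows(H):
    """Rows p1, p2 of H+ = (H^H H)^-1 H^H for an N_R x 2 complex matrix H."""
    a = sum(abs(row[0]) ** 2 for row in H)              # [H^H H]_11
    d = sum(abs(row[1]) ** 2 for row in H)              # [H^H H]_22
    b = sum(row[0].conjugate() * row[1] for row in H)   # [H^H H]_12
    det = a * d - abs(b) ** 2
    # Inverse of [[a, b], [b*, d]] is (1/det) [[d, -b], [-b*, a]].
    p1 = [(d * r[0].conjugate() - b * r[1].conjugate()) / det for r in H]
    p2 = [(-b.conjugate() * r[0].conjugate() + a * r[1].conjugate()) / det
          for r in H]
    return p1, p2


def stream_norms(p1, p2, v1, v2):
    """Squared row norms (||w1||^2, ||w2||^2) of W = V^H H+ for real v1."""
    n1 = sum(abs(x) ** 2 for x in p1)                         # ||p1||^2
    n2 = sum(abs(x) ** 2 for x in p2)                         # ||p2||^2
    cross = sum(x * y.conjugate() for x, y in zip(p1, p2))    # p1 p2^H
    mixed = 2.0 * (cross * v1 * v2).real
    w1 = n1 * v1 ** 2 + mixed + n2 * abs(v2) ** 2
    w2 = n1 * abs(v2) ** 2 - mixed + n2 * v1 ** 2
    return w1, w2


def codebook_search(H, codebook):
    """Return (index, stream, norm) of the smallest row norm over all entries.

    codebook: list of (v1, phi2) pairs; indices are counted from 1.
    """
    p1, p2 = pinv_rows(H)
    best = None
    for index, (v1, phi2) in enumerate(codebook, start=1):
        v2 = math.sqrt(max(0.0, 1.0 - v1 ** 2)) * cmath.exp(1j * phi2)
        norms = stream_norms(p1, p2, v1, v2)
        for stream, norm in enumerate(norms, start=1):
            if best is None or norm < best[2]:
                best = (index, stream, norm)
    return best
```

The cost of this loop grows linearly with the codebook size and has to be repeated for every subcarrier, which is the motivation for restricting the search to a subset of entries.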

24.7 Alternative Approach: Precoded Training Symbols

The training symbols are not precoded in the design described earlier. If the training symbols are also precoded, the V matrix block in the receiver is not needed anymore, since the channel estimation works on the equivalent channel matrix and the detection algorithm processes this equivalent channel. There is another advantage: The pseudoinverse calculation of the equivalent channel improves the numerical accuracy since the condition number of the equivalent channel matrix is increased compared to the actual channel matrix without precoding. This becomes especially important for correlated channels. The precoding of the training symbols or the preamble improves the coverage: The probability to reach a certain receiver is increased. It has to be ensured that no hidden nodes are created by optimising the preamble for only one receiver since the preamble has to be received by all modems in the network. The beamforming of the preamble might be optimised to the receiver with the worst channel conditions to improve the coverage and avoid potential hidden nodes. It has to be noted that an arbitrary precoding (no precoding) might also cause hidden nodes.

The codebook search has to be adapted since the precoding needs to be optimised to the actual channel and not to the equivalent channel.

Assume that H₁ is the current channel matrix with the optimum precoding matrix V₁. If the channel changes to H₂, the channel estimation observes the equivalent channel matrix H₂V₁. The optimum precoding matrix V₂ of H₂ has to be found. Performing a singular value decomposition (SVD) of the new equivalent channel results in

$H_2 V_1 = U D V^{H}.$

(24.34)

Replacing H₂ by the SVD yields

$H_2 V_1 = U_2 D_2 V_2^{H} V_1 = U D V^{H}.$

(24.35)

Note that H₂ is not known in the receiver and is used only for the purpose of derivation. From Equation 24.35 follows

$\begin{aligned} & V_2^{H} V_1 = V^{H} \\ \Leftrightarrow\ & V_2 = V_1 V. \end{aligned}$

(24.36)

The new precoding matrix V₂ is calculated via the SVD of the new equivalent channel and V₁. The receiver knows which precoding V₁ is applied in the transmitter since the precoding information was sent back from the receiver to the transmitter.

The algorithm is easily extended to the codebook-based precoding. Assume ind1 is the codebook index corresponding to V₁. The codebook search on the equivalent channel H₂V₁ derives ind corresponding to V according to Equation 24.35. The task is to derive ind2 corresponding to V₂, according to Equation 24.36 if ind1 and ind are known. Because of the quantisation of the codebook, there is a finite set of possible combinations. For each possible combination of ind1 and ind, the best index ind2 is precalculated and stored in a 2D LUT. The receiver knows which precoding matrix is used in the transmitter (V₁ or ind1) and the codebook search delivers ind of the equivalent channel. The LUT provides ind2 for the input combination ind1 and ind.
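The precalculation of the 2D LUT can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the source: the codebook is a list of hypothetical (υ₁, ϕ₂) pairs, each entry defines a matrix of the form [[υ₁, −υ₂*], [υ₂, υ₁]], and the best index ind2 is chosen as the codebook matrix whose first column is closest to the first column of V₁V, measured by the absolute inner product. The actual selection criterion of the demonstrator may differ.

```python
import cmath
import math


def codebook_matrix(v1, phi2):
    """Unitary 2x2 matrix [[v1, -v2*], [v2, v1]] with v2 = sqrt(1 - v1^2) e^{j phi2}."""
    v2 = math.sqrt(max(0.0, 1.0 - v1 * v1)) * cmath.exp(1j * phi2)
    return [[complex(v1), -v2.conjugate()], [v2, complex(v1)]]


def matmul(A, B):
    """Product of two 2x2 complex matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def closeness(A, B):
    """|a1^H b1| of the first columns; 1.0 means equal up to a phase factor."""
    return abs(A[0][0].conjugate() * B[0][0] + A[1][0].conjugate() * B[1][0])


def build_lut(codebook):
    """lut[ind1 - 1][ind - 1] = ind2, with all indices counted from 1."""
    mats = [codebook_matrix(v1, phi2) for v1, phi2 in codebook]
    lut = []
    for V1 in mats:
        row = []
        for V in mats:
            target = matmul(V1, V)  # V2 = V1 V
            best = max(range(len(mats)),
                       key=lambda i: closeness(mats[i], target))
            row.append(best + 1)
        lut.append(row)
    return lut
```

With q feedback bits the table has 2^q × 2^q entries of q bits each, so for q = 7 it holds 16,384 indices; at run time the receiver only performs the lookup `lut[ind1 - 1][ind - 1]`.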

24.8 Verification of the MIMO PLC Demonstrator

This section describes the verification of the implemented prototype system. A configurable MIMO PLC channel (see Section 24.8.1) is used to test the system in the laboratory. Results of the demonstrator in the laboratory and in buildings under real conditions are presented in Section 24.8.2.

24.8.1 MIMO Artificial Mains Network

The MAM is an artificial and configurable MIMO PLC channel [15]. The basic schematic of the MAM is shown in Figure 24.13. The common-mode (CM) and differential-mode (DM) channels of each MIMO path can be configured by several filters (low-pass, high-pass or band-pass filters) or attenuators to model typical PLC transfer functions. The use of the MAM allows simple testing of the system in the laboratory. The MAM consists of three units. The first and third units are responsible for generating asymmetries in the mains network. This causes CM currents (see Figure 1.3 in Chapter 1) to flow toward ground. The other CM channel unit picks up these currents from ground and forwards them to the second MIMO PLC modem. The DM channel unit provides the symmetrical channels from transmitter to receiver. The MIMO channel can be adjusted by pluggable filter units. Low-pass, high-pass, band-pass or band-gap filters with one or many poles are available as filter units. Attenuators are also needed to model a typical PLC transfer function. Furthermore, the MAM includes several coaxial plugs to which coax cables of individual lengths can be connected. These cables cause signal reflections analogous to the stubs found in the mains grid of private homes. Each CM channel unit embeds a filtered connection to the power supply, which allows the MIMO PLC modems connected to the MAM to be powered.

FIGURE 24.13
MAM network.

24.8.2 Results of the Demonstrator System

Figure 24.1 shows the demonstrator system connected to the MAM. In the following, results of the system are presented, in which the MAM was used as the MIMO PLC channel. Figure 24.14 shows the absolute value of the channel estimation output, which is proportional to the absolute value of the channel transfer function (magnitude response) from each transmit port (two columns) to each receive port (four rows). The scale of the y-axis is a result of the 16 bit signed fixed-point data format used here and has to be scaled by a factor depending on the AGC setting to obtain the magnitude responses. The magnitude responses show the typical frequency-selective shape of power line channels. A large variation between the individual paths is visible.

Figure 24.15 shows some of the signals of the detection algorithm. The norm of the rows of the detection matrix is illustrated for the two MIMO streams. Figure 24.15a shows $\|w_1\|^{2}$ and Figure 24.15b shows $\|w_2\|^{2}$, each without and with precoding. The calculation is based on R⁻¹ (see Section 24.3). R⁻¹ is the result obtained from the pseudoinverse calculation. As explained in Section 8.5.1 of Chapter 8, $\|w_p\|^{2}$, $p = 1, 2$, describes the two logical MIMO streams and is inversely proportional to the SNR. The solid lines show the results without precoding. The training symbols are not precoded and the results of the ZF detection do not include the precoding. The dashed lines show the results if precoding is included. The codebook indices obtained from the codebook search of the demonstrator system are used for this calculation. The difference between precoding and no precoding shows the gain of precoding. The first stream is increased significantly (up to 20 dB in some frequency ranges), while the second stream is decreased only slightly. This results in a gain of an equivalent number of dB in SNR.

The demonstrator system also supports SISO transmission, where only one transmit port and one receive port are used. The data throughput of SISO transmission is compared to the throughput of a MIMO transmission with two transmit and four receive ports. The adaptive modulation is adjusted for error-free transmission with maximum throughput. An HD video is streamed and no bit errors are observed in the rendered video at the receiver. Also, the RS decoder provides an output which indicates whether all bit errors are corrected. This flag is monitored continuously to ensure that the BER after the FEC is equal to 0.

Figure 24.16 shows a screenshot of the MATLAB application during SISO transmission. The application monitors the following key parameters of the transmission (four subplots in Figure 24.16 from top to bottom):

• SNR values and constellations of the OFDM subcarriers

• BER of the transmission

• Data throughput

• AGC settings

FIGURE 24.14
Absolute value of the channel estimation results of the demonstrator system which are proportional to the absolute value of the channel transfer functions from each transmit port (columns) to each receive port (rows).

The thin line of the top figure shows the SNR in dB of each subcarrier in the frequency range from 4 to 30 MHz. The SNR information is derived out of the training symbols. The selected constellations are shown by the bold line (for QAM steps, see right-hand side of the figure). The constellations of this transmission range from QPSK to 1024-QAM. If the SNR of a subcarrier is too low, the subcarrier is not used (e.g. some subcarriers around 18 MHz). The following three subplots show the past 60 s on the x-axis. The second figure shows several parameters of the BER. The BER is monitored at two different points: Firstly, before the Viterbi decoder (BER Viterbi) and, secondly, before the RS decoder (BER RS). The value BER RS indicates how many bits are corrected for one RS block. If the RS decoder is not able to correct an RS block, the corresponding value shows the number of errors (RS failed). RS failed equal to 0 indicates that the RS decoder is able to correct every transmission error.*

FIGURE 24.15
$\|w_p\|^{2}$, $p = 1, 2$, of the two MIMO streams (a) stream 1 and (b) stream 2, depending on the frequency: without and with precoding.

FIGURE 24.16
SISO transmission, screenshot of the monitor application: top figure: measured SNR (thin line) and selected constellations (bold line); second figure: actual BER of the past 60 s, BER before Viterbi decoder (BER Viterbi) and BER before RS decoder (BER RS), RS = 0 indicates that all remaining errors are corrected by the RS decoder; third figure: actual data rate of the past 60 s, raw PHY data rate, including OFDM preamble and including FEC; bottom figure: AGC settings of the past 60 s of all four receivers.

FIGURE 24.17
MIMO transmission, screenshot of the monitor application: top figure: measured SNR (thin line) and selected constellations (bold line) of the two MIMO streams (black and grey); second figure: actual BER of the past 60 s, BER before Viterbi decoder (BER Viterbi) and BER before RS decoder (BER RS), RS = 0 indicates that all remaining errors are corrected by the RS decoder; third figure: actual data rate of the past 60 s, raw PHY data rate, including OFDM preamble and including FEC; bottom figure: AGC settings of the past 60 s of all four receivers.

The third figure shows the actual data throughput. The raw physical (PHY) rate (dash-dotted line) is calculated as the number of bits per OFDM symbol divided by the OFDM symbol duration. Taking the OFDM preamble (CAZAC sequence and training symbols) and the guard interval into account results in the data rate shown by the dashed line (legend: incl. OFDM sync). Multiplying by the code rates of the FEC results in the net PHY data rate (solid line). The bottom figure shows the AGC settings of the four front ends. Note that all front-ends are connected to the coupler and receive the signals of all receive ports. However, in case of SISO transmission, the signals of three of the receive ports are not used during detection. The data rate for error-free HD video transmission is 90 Mbit/s.

Figure 24.17 shows the screenshot of the MIMO transmission for the same channel directly after SISO transmission. No channel change was observed. Basically, the screenshot shows the same parameters as described before. Of course, MIMO transmission uses two (logical) MIMO streams. The SNR of these two streams is shown in the top figure (thin lines). The black line represents the SNR of the first MIMO stream, while the grey line represents the second MIMO stream. The bold lines (black and grey) show the selected constellations of the two MIMO streams. The data rate is 255 Mbit/s.

FIGURE 24.18
Setup of the field test: (a) transmitter and (b) receiver.

The comparison between SISO and MIMO is not completely fair since, in this demonstrator system, the transmit power of each of the two transmit ports for MIMO transmission is the same as for SISO transmission. The total transmit power of MIMO is thus 3 dB higher than that of SISO. To keep the total transmit power the same, the transmit power of SISO transmission has to be increased by 3 dB. As discussed in Chapter 7, a back-off of 3 dB is an expected upper limit. It was observed that increasing the transmit power by 3 dB enhances the data rate by approximately 15 Mbit/s.* Comparing the corrected SISO bitrate (≈105 Mbit/s) to the bitrate of MIMO shows an improvement of the bitrate by more than a factor of 2. However, the different feeding powers of the SISO and MIMO tests in Figures 24.16 and 24.17 are balanced by the receiver's AGC settings. The lowest graph in each figure shows these AGC settings. In the MIMO case, all AGC amplifiers are set to a 6 dB reduced (voltage) amplitude. As long as the AGC is within its operating range, it is acceptable to compare the SISO and MIMO throughput rates.
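As a quick check, the corrected comparison can be written out. This only restates the figures quoted in the text (90 Mbit/s measured for SISO, approximately 15 Mbit/s for the 3 dB power increase and 255 Mbit/s measured for MIMO); the symbols for the rates are introduced here for illustration:

$R_{\text{SISO,corr}} \approx 90 + 15 = 105\ \text{Mbit/s}, \qquad \frac{R_{\text{MIMO}}}{R_{\text{SISO,corr}}} \approx \frac{255}{105} \approx 2.4.$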

The demonstrator system was also verified in private homes. Here, the apartment covers two levels with an area of 120 m². Figure 24.18 shows the transmitter (Figure 24.18a) and Figure 24.18b depicts the receiver located in the building. The transmitter is connected via a delta-style coupler to the mains network on the upper level, while the receiver is connected via a star-style coupler on the lower level of the apartment. Two transmit ports are used. The receiver uses all four receive ports (the three star-style ports and the CM port). To receive the CM signals, the coupler is mounted on a wooden board which is covered by a fleece made of copper. This construction can be easily transported. A metal plate could be used as well. A counterpoise with the size of around 1 m² was found to ensure a proper CM reception.

Schneider et al. [16] show the results of the verification in this building. For SISO transmission, a bitrate of 143 Mbit/s and, for MIMO transmission, a bitrate of 315 Mbit/s were documented.

24.9 Conclusions

The implementation of a MIMO PLC feasibility study in hardware was described in this chapter. The demonstrator system allows up to 2 × 4 MIMO with beamforming. The system allows several system parameters to be monitored, including the bit rate and channel estimation results. The system also supports a SISO mode. Comparing SISO and MIMO transmissions, the gain of MIMO is proven. The gain of precoding is shown by this demonstrator system by activating beamforming on the fly. Also, further aspects of MIMO PLC transmission, such as the influence of the number of receive ports and the influence of the noise, were investigated. The verification of the demonstrator system in buildings under real conditions shows the gain of MIMO versus SISO to be more than a factor of two. The demonstrator system confirms the theoretical investigations of MIMO PLC and supported the standardisation work at HomePlug AV2.

References

1 A. Schwager, Powerline communications: Significant technologies to become ready for integration, Dr.-Ing. dissertation, Universität Duisburg-Essen, Essen, Germany, May 2010.

2 D. J. C. MacKay, Information Theory, Inference and Learning Algorithms. Cambridge University Press, New York, 2003.

3 C. Rader, VLSI systolic arrays for adaptive nulling, IEEE Signal Processing Magazine, 13(4), 29–49, 1996.

4 R. Walke, R. Smith and G. Lightbody, Architectures for adaptive weight calculation on ASIC and FPGA, in Asilomar Conference on Signals, Systems, and Computers, vol. 2, Pacific Grove, CA, 1999, pp. 1375–1380.

5 B. Hassibi, An efficient square-root algorithm for BLAST, in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, Istanbul, Turkey, 2000, pp. 737–740.

6 Z. Guo and P. Nilsson, A VLSI architecture of the square root algorithm for V-BLAST detection, The Journal of VLSI Signal Processing, 44(3), 219–230, September 2006.

7 J. E. Volder, The CORDIC trigonometric computing technique, IRE Transactions on Electronic Computers, EC-8(3), 330–334, 1959.

8 J. S. Walther, A unified algorithm for elementary functions, in Spring Joint Computer Conference, Atlantic City, NJ, 1971, pp. 379–385.

9 J. Valls, M. Kuhlmann and K. K. Parhi, Evaluation of CORDIC algorithms for FPGA design, Journal of VLSI Signal Processing Systems, 32(3), 207–222, 2002.

10 Xilinx, CORDIC v3.0, Xilinx, 2005.

11 Xilinx, Sine/Cosine Look-Up Table v5.0, Xilinx, 2004.

12 Xilinx, Multiplier v10.1, Xilinx, 2008.

13 Xilinx, Divider v2.0, Xilinx, 2008.

14 D. Schneider, Inhome power line communications using multiple input multiple output principles, Dr.-Ing. dissertation, Verlag Dr. Hut, Munich, Germany, January 2012.

15 A. Schwager, D. Schneider, W. Bäschlin, A. Dilly and J. Speidel, MIMO PLC: Theory, measurements and system setup, in International Symposium on Power Line Communications and Its Applications, Udine, Italy, 2011.

16 D. Schneider, A. Schwager, J. Speidel and A. Dilly, Implementation and results of a MIMO PLC feasibility study, in International Symposium on Power Line Communications and Its Applications, Udine, Italy, 2011.

* The CORDIC algorithm can also be used to solve a broad range of equations, including hyperbolic and square root equations [8].

* An RS failure is not detected if the RS decoder finds a valid code word even though many bits in the RS block are corrupted. This case is extremely unlikely for large block sizes. In practice, it occurs only for a severely corrupted transmission with a high BER, in which case the high BER of the Viterbi decoder indicates the problem.

* The SNR required by uncoded QAM for a given BER changes by approximately 3 dB if the QAM constellation is increased by 1 bit. Hence, the bit load of each subcarrier is increased by 1 bit if the transmit power is increased by ≈3 dB. This results in an increase of the bitrate of 1296/51.2 μs ≈ 25 Mbit/s. The use of only even QAM constellations, and the fact that not all subcarriers exceed the corresponding SNR thresholds, results in the usually observed enhancement of approximately 15 Mbit/s.
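The arithmetic in this footnote can be reproduced with a short sketch. It assumes that 1296 is the number of subcarriers whose bit load rises by 1 bit and that 51.2 μs is the OFDM symbol duration; both interpretations are inferred from the footnote's expression. The function and variable names are hypothetical, and coding overhead is ignored.

```python
# Assumed interpretation of the footnote's figures (not confirmed by the text):
num_subcarriers = 1296          # subcarriers that each gain 1 bit
symbol_duration_s = 51.2e-6     # OFDM symbol duration in seconds
extra_bits_per_subcarrier = 1   # bit-load increase per ~3 dB of transmit power


def bitrate_increase_mbps(n_sub: int, extra_bits: int, t_sym: float) -> float:
    """Raw bitrate increase in Mbit/s when every subcarrier gains extra_bits."""
    return n_sub * extra_bits / t_sym / 1e6


upper_bound_mbps = bitrate_increase_mbps(
    num_subcarriers, extra_bits_per_subcarrier, symbol_duration_s
)
print(f"Upper bound on bitrate increase: {upper_bound_mbps:.1f} Mbit/s")
```

Under these assumptions the result is about 25 Mbit/s, an upper bound that applies only if every subcarrier gains a bit. The observed enhancement of approximately 15 Mbit/s is lower for the two reasons given in the footnote.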
