MIMO PLC Hardware Feasibility Study
CONTENTS
24.3 Detection and Pseudoinverse Calculation
24.3.2 Pseudoinverse Calculation Based on QR Decomposition
24.4 Precoding: Multiplication by V
24.5 Channel and SNR Estimation
24.5.1 Estimation of the Channel Matrix
24.7 Alternative Approach: Precoded Training Symbols
24.8 Verification of the MIMO PLC Demonstrator
24.8.1 MIMO Artificial Mains Network
24.8.2 Results of the Demonstrator System
To prove the concept of multiple-input multiple-output (MIMO) for power line communications (PLC), a feasibility study was implemented. The MIMO PLC system was designed at Sony EuTEC laboratories, and its basic parameters are based on an existing single-input single-output (SISO) PLC system implementation [1]. The design focuses on comparing SISO and MIMO and on rapid development. Hence, the design is not based on any PLC standard. It is a proprietary system that serves to understand and investigate implementation-specific issues related to current broadband PLC systems as introduced in Chapters 12 through 14. The implemented demonstrator system consists of two modems, a transmitter and a receiver. It does not include a full multi-node media access control (MAC) layer. As an application, the system allows high-definition (HD) video streaming and the monitoring of several system parameters such as the measured maximum bitrate and the bit error ratio (BER). First, an overview of the implemented system is provided (see Section 24.2). A detailed description of the implemented MIMO blocks is given in Sections 24.3 through 24.6. Section 24.7 discusses possible changes to the implemented system architecture. Finally, the verification of the system (Section 24.8) concludes the chapter.
Figure 24.1 shows the setup of the demonstrator system. The left PC serves as transmitter and the PC shown on the right acts as the receiver. Four analog front-ends are mounted on each of the two PCs, allowing each PC to act as a transmitter or receiver. Only two of the front-ends are used in transmit mode. Two coaxial cables connect the front-ends of the transmitter to the Tx MIMO coupler (delta-style coupler). The transmit coupler is connected to an artificial MIMO channel (MIMO artificial mains [MAM] network; for details see Section 24.8). The receiver coupler is also connected to the MAM (bottom right-hand side in the figure) and couples the signals of the four receive ports to the four analog front-ends of the receiver. Each of the transmitter and receiver modems consists of a standard personal computer (PC) running Linux as an operating system. Each PC includes a peripheral component interconnect (PCI) board containing two Xilinx Virtex 5 field programmable gate arrays (FPGAs). The FPGAs realise the real-time functions of the demonstrator system, while the PC implements the control functions in software. The functions implemented in software are described in [1]; the key parameters are briefly summarised here. MATLAB® applications running on a third remote PC (not depicted in Figure 24.1) allow the control and monitoring of the PLC modems via hypertext transfer protocol (HTTP). Each embedded MIMO PLC modem computer includes a web server running on the Linux system, which communicates with the PLC drivers. Several applications like video streaming sources generate user datagram protocol (UDP) payload data to be transmitted and received via power line. The PLC driver is an extension of a standard Linux Ethernet driver implementing additional application programming interfaces (APIs) and hardware access functions.
The PLC driver implements both a network driver as an interface to the host PC’s Internet protocol (IP) stack and a device driver as an interface to the FPGA hardware. The remote PC might render the video stream.
The system architecture shown in Figure 24.2 focuses on the MIMO-specific blocks. The top row illustrates the blocks of the transmitter, while the bottom row illustrates the receiver (from right to left). Both the transmitter chain and the receiver chain are implemented on the FPGAs. The MIMO PLC channel connects the two transmit ports of the transmitter to the four receive ports of the receiver. As described earlier, a MATLAB application controls and monitors the PLC driver which, in turn, controls the FPGA of the transmitter and receiver. Note that although the PLC driver and interface are depicted only once in the figure, they are physically running on each of the PCs, whereas the MATLAB graphical user interface (GUI) is located on the third, remote PC. The input bit stream is first processed by the forward error correction (FEC) block, which essentially inserts redundancy bits (for details, see later). From a functional point of view, the bit stream would be split at this point into the two (logical) MIMO streams. However, the two MIMO streams are multiplexed into one signal path at a higher clock rate. The number of MIMO streams is indicated at the signal paths in Figure 24.2. By choosing this design, only one quadrature amplitude modulation (QAM) mapper and only one orthogonal frequency division multiplex (OFDM) modulation block are required. The adaptive QAM assigns the input bits to complex QAM symbols according to the constellation of each subcarrier. The PLC driver registers the constellation to be used by each subcarrier. Depending on the QAM constellation, the block decides how many input bits are modulated for each subcarrier. The QAM mapping is realised by a look-up table (LUT) which assigns a complex QAM symbol to each input bit combination. The next unit is the power allocation (PA). Here, three different PA coefficients 0,1,
Similar blocks follow in reverse order in the receiver. The receiver (Rx) front-end includes four analog front-ends (couplers, band-pass filters, automatic gain controls [AGCs] and analog-to-digital [A/D] converters) and the quarter period mixers. The band-pass filter limits the input signal to the frequencies of interest between 4 and 30 MHz. The AGC includes a programmable gain amplifier (PGA) to control the signal level at the A/D. A quarter period mixer converts the received signal to the complex baseband. A correlation function, applied individually to each reception path, detects the CAZAC sequence of the preamble, which is needed for time synchronisation and to define the individual gain level for the AGC of each Rx path. The OFDM demodulation removes the cyclic prefix and transforms the signal back to the frequency domain by applying the fast Fourier transform (FFT). A splitter separates the training and data symbols. Besides channel estimation, the training symbols are used to estimate the sample clock offset (not shown in the figure). The clock offset arises from different oscillator frequencies in the transmitter and receiver. The clock offset estimation controls the voltage-controlled crystal oscillator (VCXO) of the A/D and ensures that the clock frequencies of the transmitter and receiver are synchronised. Instead of using an expensive VCXO, digital clock offset compensation might be realised by phase rotating the QAM symbols, depending on the subcarrier index and OFDM symbol index. Based on the received training symbols, the channel and signal-to-noise ratio (SNR) estimation derive the channel matrix of each subcarrier and the SNR values of the two logical MIMO streams, respectively (see Section 24.5). The channel matrices serve as input to the pseudoinverse block, which calculates the detection matrices. The detection (or equaliser) matrices are applied to the received data symbols by the MIMO detection unit (see Section 24.3). The training symbols are not precoded. Hence, MIMO detection is unable to take the precoding into account and separate compensation of the precoding is required. Changes in the design if the training symbols are also precoded are discussed in Section 24.7. The QAM demodulation recovers the bit sequence from the received QAM symbols, depending on the constellations programmed by the PLC driver. The FEC corrects bit errors and removes the redundancy bits. Based on the detection matrix, the codebook index of each subcarrier is derived in the codebook search (see Section 24.6). The codebook indices and SNR values are passed to the PLC driver. The PLC driver determines the constellations according to the SNR values and programmes the constellations and codebook indices into the transmitter and receiver. It is also possible to monitor several internal signals, for example, the received training symbols or channel estimation results.
Figure 24.3 shows details of the FEC chain in the transmitter. Essentially, the reverse order is used in the receiver. First, redundancy is added to the incoming bits by a Reed–Solomon (RS) code [2]. The RS code is a block code, which corrects a maximum number of errors depending on the length of the code. The bit errors can be located anywhere inside the block. The RS code serves as an outer code to correct remaining errors which could not be corrected by the inner code. The time interleaver rearranges the order of the bits to spread burst errors into separated and isolated bit errors, which are then corrected by the RS code. Burst errors may occur in the case of impulsive noise. A convolutional encoder is used as an inner code, which adds redundancy to the input bits. A Viterbi decoder [2] is applied in the receiver. The frequency interleaver changes the order of the bits to ensure that adjacent bits are not mapped to adjacent subcarriers. Frequency-selective noise or deep fades of the transfer function may affect only a few subcarriers, resulting in burst errors if no frequency interleaver is used. More advanced codes like turbo codes or low-density parity check (LDPC) codes may be used to improve the performance further. However, as already discussed, this demonstrator implementation focuses on the MIMO-specific blocks.
Table 24.1 Basic system parameters

| Parameter | Value |
|---|---|
| MIMO parameters | |
| MIMO setup | Up to 2 Tx and up to 4 Rx ports |
| MIMO modes | Two-stream or one-stream beamforming (based on 7 bit codebook for matrix quantisation), spatial multiplexing (beamforming switched off), SISO |
| OFDM parameters | |
| Sampling frequency (MHz) | 80 |
| Frequency band (MHz) | 0–40, active subcarriers: 4–30 |
| FFT points | 2048 |
| Number of active subcarriers (4–30 MHz) | 1296 |
| Carrier spacing (kHz) | 19.53 |
| Symbol length (μs) | 51.2 |
| Guard interval (μs) | 3.2 (1/16) |
| QAM and FEC parameters | |
| Adaptive modulation (per subcarrier) | QPSK, 16-, 64-, 256-, 1024-QAM |
| Forward error correction | RS (204, 188), convolutional coding (Viterbi, code rate 1/2 or 3/4) |
| Maximum transmission speed, gross physical (PHY) layer bitrate (Mbit/s) | 506 |
The basic system parameters are summarised in Table 24.1. Note that the OFDM parameters are the same as used in Chapter 9.
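The relationship between the OFDM entries of Table 24.1 can be checked with a few lines of arithmetic. The following Python sketch is illustrative only. It assumes that the complex baseband runs at half the 80 MHz converter rate (this is inferred from the listed carrier spacing, not stated in the text), and that the gross bitrate is obtained with 1024-QAM on every active subcarrier of both MIMO streams. All variable names are hypothetical.

```python
# Values taken from Table 24.1.
SAMPLING_HZ = 80e6
FFT_POINTS = 2048
ACTIVE_SUBCARRIERS = 1296
GUARD_FRACTION = 1 / 16
MAX_BITS_PER_SUBCARRIER = 10  # 1024-QAM carries 10 bits per symbol
MIMO_STREAMS = 2

# Assumption (not stated in the text): the complex baseband is processed
# at half the converter sampling rate.
BASEBAND_HZ = SAMPLING_HZ / 2

carrier_spacing_hz = BASEBAND_HZ / FFT_POINTS
symbol_length_s = 1.0 / carrier_spacing_hz
guard_interval_s = symbol_length_s * GUARD_FRACTION

# Assumption: the gross rate counts the largest constellation on every
# active subcarrier of both streams.
bits_per_ofdm_symbol = ACTIVE_SUBCARRIERS * MAX_BITS_PER_SUBCARRIER * MIMO_STREAMS
gross_rate_without_guard = bits_per_ofdm_symbol / symbol_length_s
gross_rate_with_guard = bits_per_ofdm_symbol / (symbol_length_s + guard_interval_s)

print(f"carrier spacing : {carrier_spacing_hz / 1e3:.2f} kHz")
print(f"symbol length   : {symbol_length_s * 1e6:.2f} us")
print(f"guard interval  : {guard_interval_s * 1e6:.2f} us")
print(f"gross rate, guard interval ignored : {gross_rate_without_guard / 1e6:.2f} Mbit/s")
print(f"gross rate, guard interval included: {gross_rate_with_guard / 1e6:.2f} Mbit/s")
```

Under these assumptions, the carrier spacing, symbol length and guard interval reproduce the table entries, and the 506 Mbit/s figure corresponds to the rate computed without the guard interval.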
24.3 Detection and Pseudoinverse Calculation
Figure 24.2 shows the zero-forcing (ZF)-detection process at the receiver. First, the pseudoinverse of the estimated channel matrix of each subcarrier is calculated. These detection matrices are applied to the received data vectors. According to Equation 8.18 in Chapter 8, the pseudoinverse [⋅]+ is calculated as
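$$\mathbf{H}^{+} = \left(\mathbf{H}^{H}\mathbf{H}\right)^{-1}\mathbf{H}^{H} \tag{24.1}$$
The calculation involves a matrix inversion of the 2 × 2 matrix:
$$\mathbf{A} = \mathbf{H}^{H}\mathbf{H} = \begin{bmatrix} a_{11} & a_{12} \\ a_{12}^{*} & a_{22} \end{bmatrix} \tag{24.2}$$
Note that a11 and a22 are real. There is a closed-form expression of the inverse of A:
$$\mathbf{A}^{-1} = \frac{1}{a_{11}a_{22} - |a_{12}|^{2}} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{12}^{*} & a_{11} \end{bmatrix} \tag{24.3}$$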
Equations 24.1 through 24.3 describe a possible implementation. However, a fixed-point implementation of this approach faces numerical problems. In particular, the calculation of the matrix inverse can be numerically unstable. If the two products a11a22 and
$$\mathbf{H} = \mathbf{Q}\mathbf{R} \tag{24.4}$$
Substituting Equation 24.4 into Equation 24.1 results in the pseudoinverse
$$\mathbf{H}^{+} = \mathbf{R}^{-1}\mathbf{Q}^{H} \tag{24.5}$$
The calculation of Q involves only unitary matrix operations or rotations. Although the inversion of the triangular matrix R can be computed efficiently, the fixed-point parameters have to be chosen carefully to achieve good numerical results. More sophisticated algorithms avoid the calculation of R−1 [5,6]. These algorithms also support more complex detection algorithms such as minimum mean squared error (MMSE) or ordered successive interference cancellation (OSIC). However, the increased computational complexity requires more hardware resources.
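The two routes to the pseudoinverse discussed above, the closed-form inverse of the 2 × 2 matrix and the QR-based form, can be compared numerically. The sketch below is a double-precision illustration with NumPy, not the fixed-point hardware design. The random 4 × 2 channel matrix and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical channel matrix of one subcarrier: 4 receive ports, 2 transmit ports.
H = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))

# Route 1: invert the 2 x 2 matrix A = H^H H in closed form.
A = H.conj().T @ H
det = A[0, 0].real * A[1, 1].real - abs(A[0, 1]) ** 2  # difference of two products
A_inv = np.array([[A[1, 1], -A[0, 1]],
                  [-A[1, 0], A[0, 0]]]) / det
pinv_closed_form = A_inv @ H.conj().T

# Route 2: reduced QR decomposition, H = Q R with Q of size 4 x 2 and R of size 2 x 2.
Q, R = np.linalg.qr(H)
pinv_qr = np.linalg.inv(R) @ Q.conj().T

print(np.allclose(pinv_closed_form, pinv_qr))
```

In double precision both routes agree. The difference between them only becomes relevant when the determinant is formed from two nearly equal products in a short fixed-point word length.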
The detection algorithm can be expressed using the earlier results (the noise is neglected):
(24.6) |
The intermediate result
24.3.2 Pseudoinverse Calculation Based on QR Decomposition
Starting from a mathematical description of the idea of the QR decomposition, the hardware architecture is derived in the following.
The goal of the QR decomposition is to transform H into an upper triangular matrix R:
(24.7) |
The algorithm works iteratively, introducing zeros in each step by applying unitary rotation matrices (so-called Givens rotations).
In a first step, the entries of the first column of H are turned into real values by multiplying by the matrix
(24.8) |
where the bar
(24.9) |
The superscript (1) in Equation 24.9 indicates that the angles correspond to the first column of H. Based on the results obtained by the right-hand side of Equation 24.8, the first 0 is introduced in the next step:
(24.10) |
(24.11) |
The next two zeros are introduced similarly by
(24.12) |
(24.13) |
and
(24.14) |
(24.15) |
After these steps, the manipulation of the first-column vector is finished and
Next, zeros have to be introduced into the second column. The operations described for the first column have to be repeated for the second column. The dimension is reduced by one. Again, first
(24.16) |
The calculated rotation angles describe QH. Note that r11 and r22 are real, while r12 is in general complex.
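The column-by-column procedure can be illustrated in floating point. The following Python sketch first phase-rotates the entries of a column so that they become real and then introduces zeros with real Givens rotations. The pairing of rows in each rotation and the sign convention of the rotation matrix are assumptions made for the example and need not match the order used in Equations 24.10 through 24.15. The function name `givens_qr` is hypothetical.

```python
import numpy as np

def givens_qr(H):
    """Triangularise a complex matrix with phase rotations and real Givens rotations.

    Returns (QH, R) with QH @ H == R, where QH is unitary and R is upper triangular.
    """
    R = np.array(H, dtype=complex)
    rows, cols = R.shape
    QH = np.eye(rows, dtype=complex)
    for c in range(cols):
        # Step 1: rotate the phases of rows c.. so that column c becomes real.
        phases = np.exp(-1j * np.angle(R[c:, c]))
        R[c:, :] *= phases[:, None]
        QH[c:, :] *= phases[:, None]
        # Step 2: real Givens rotations zero the entries below the diagonal.
        for r in range(c + 1, rows):
            theta = np.arctan2(R[r, c].real, R[c, c].real)
            cs, sn = np.cos(theta), np.sin(theta)
            G = np.array([[cs, sn], [-sn, cs]])
            R[[c, r], :] = G @ R[[c, r], :]
            QH[[c, r], :] = G @ QH[[c, r], :]
    return QH, R

rng = np.random.default_rng(1)
H = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
QH, R = givens_qr(H)
print(np.round(R, 3))
```

The printed matrix has real diagonal entries and a complex upper right entry, with zeros (up to rounding) elsewhere.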
Basically, the algorithm of the QR decomposition as described earlier includes two operations: on the one hand, the application of rotations by the given rotation angles and, on the other hand, the calculation of these rotation angles.
The rotation of an input vector (x, y) by the angle ϕ is described by
(24.17) |
Note that the application of the rotation matrices in Equations 24.8 and 24.10 through 24.14 affects two elements in each column. Figure 24.4a and b illustrates the rotation and the corresponding block diagram, respectively.
To calculate the rotation angles, one output of the rotation of (x, y) is set to 0. In Equations 24.8 and 24.9, the imaginary part is forced to be 0, while in Equations 24.10 through 24.15, zeros are introduced to the matrix. Forcing y’ = 0 in Equation 24.17 results in
(24.18) |
with
(24.19) |
The rotation shown in Figure 24.5a is performed in such a way that the output vector lies on the
Figure 24.6 shows the architecture of the QR decomposition using the rotation and vectoring blocks of Figures 24.4 and 24.5, respectively [3,4]. The architecture is a so-called systolic array implementation, in which different processing units are placed in parallel. There are several data paths between the different processing units in contrast to a linear data path. The design includes two blocks labelled outer cell 1 and outer cell 2. An outer cell block consists of two vectoring blocks as defined in Figure 24.5b. Furthermore, the inner cell block consists of three rotation blocks as introduced in Figure 24.4b. The first column of the channel matrix H serves as input to outer cell 1, while the second column serves as input to the inner cell. The feedback paths contain delay elements (labelled by Δ) which are needed to align the signals.
First, the entries of the first column of H are phase rotated to become real in the upper vectoring unit of outer cell 1 according to Equation 24.8. The vectoring unit provides the corresponding rotation angles and the real entries of the first-column vector. Equation 24.8 shows that the second column of H has to be phase rotated by the same angles. This rotation is applied in the first rotation block of the inner cell. The output of the first vectoring unit serves as input to the second vectoring unit (input y) of outer cell 1. For the processing of the first element
The vectoring and rotation algorithms can be efficiently implemented by the coordinate rotation digital computer (CORDIC) algorithm [7]. The CORDIC algorithm iteratively solves trigonometric equations.* The CORDIC algorithm applies iteratively predefined rotation angles, which become smaller in each iteration by only using shift and add operations. In a pipelined implementation, cascaded CORDIC stages realise the iterations. The higher the number of stages, the higher the precision. To satisfy an m-bit precision CORDIC operation, m + 1 iterations are needed and the data path word length has to be m + 2 + log2(m) [9]. The number of CORDIC stages mainly determines the latency. The vectoring units use the CORDIC algorithm. The latency L of the vectoring unit has to be considered in the design of the pseudoinverse calculation of Figure 24.6. The feedback signal of the second vectoring (input x) and the second input signal (input y) have to be aligned. Because of the feedback loop and the latency of the vectoring unit (L > 1), the input signals
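The shift-and-add iterations of the CORDIC algorithm can be sketched as follows. This is a floating-point emulation for illustration, not the pipelined fixed-point design of the demonstrator. The multiplications by powers of two stand in for hardware shifts, the stage count of 16 is an arbitrary assumption, and the function names are hypothetical. The vectoring function assumes a non-negative x input.

```python
import math

N_STAGES = 16  # assumption: arbitrary stage count chosen for the example
ANGLES = [math.atan(2.0 ** -i) for i in range(N_STAGES)]
# Each micro-rotation stretches the vector; the accumulated gain is constant.
GAIN = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(N_STAGES))

def cordic_vectoring(x, y):
    """Rotate (x, y) onto the x-axis; return (magnitude, angle). Assumes x >= 0."""
    angle = 0.0
    for i, a in enumerate(ANGLES):
        d = -1.0 if y > 0 else 1.0  # rotate towards y = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        angle -= d * a
    return x / GAIN, angle

def cordic_rotation(x, y, angle):
    """Rotate (x, y) counter-clockwise by angle (about +/- 1.74 rad at most)."""
    z = angle
    for i, a in enumerate(ANGLES):
        d = 1.0 if z >= 0 else -1.0  # drive the residual angle to zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * a
    return x / GAIN, y / GAIN

magnitude, phi = cordic_vectoring(3.0, 4.0)
x_rot, y_rot = cordic_rotation(1.0, 0.0, math.pi / 6)
print(magnitude, phi)
print(x_rot, y_rot)
```

The vectoring mode delivers the angle that the rotation mode then applies to the other matrix entries, which mirrors the split between the vectoring and rotation blocks.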
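So far, the matrix R has been calculated. Now, the inverse of R has to be found. The inverse of the upper triangular 2 × 2 matrix R is
$$\mathbf{R}^{-1} = \begin{bmatrix} r_{11} & r_{12} \\ 0 & r_{22} \end{bmatrix}^{-1} = \begin{bmatrix} \dfrac{1}{r_{11}} & -\dfrac{r_{12}}{r_{11}r_{22}} \\ 0 & \dfrac{1}{r_{22}} \end{bmatrix} \tag{24.20}$$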
The implemented pipeline design uses one divider [13] and three multipliers to realise the calculations according to Equation 24.20. Details can be found in [14].
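A minimal sketch of the inversion of an upper triangular 2 × 2 matrix with real diagonal entries is given below. The comments indicate one possible way of mapping the operations to a single reciprocal unit and three real multiplications. This mapping is an assumption for illustration and not necessarily the one used in [14]. The function name is hypothetical.

```python
import numpy as np

def invert_upper_triangular_2x2(r11, r12, r22):
    """Invert [[r11, r12], [0, r22]] for real r11, r22 and complex r12."""
    inv11 = 1.0 / r11      # reciprocal unit, first use
    inv22 = 1.0 / r22      # reciprocal unit, second use
    scale = inv11 * inv22  # one real multiplication
    inv12 = complex(-r12.real * scale, -r12.imag * scale)  # two real multiplications
    return np.array([[inv11, inv12], [0.0, inv22]], dtype=complex)

r11, r12, r22 = 2.0, 1.0 + 1.0j, 4.0
R = np.array([[r11, r12], [0.0, r22]], dtype=complex)
R_inv = invert_upper_triangular_2x2(r11, r12, r22)
print(R_inv @ R)
```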
Figure 24.2 shows the ZF detection in hardware. First, QH is applied to the received data vector r. Figure 24.7 shows the implementation of the application of the rotation angles; rm denotes the received symbol of receive port m (m = 1, …,4). Two inner cells are cascaded. The inner cells have the same structure as the inner cells introduced in Figure 24.6.
The multiplication by R−1 requires eight real multiplications. The eight multiplications are realised by two hardware multipliers utilising the four-times-multiplexed format of the signals. This serialisation at a higher clock rate is similar to the multiplication by the precoding matrix (see later in Section 24.4). Details can be found in [14].
The pipeline structure of the different processing units described in the previous sections requires some further data processing to ensure that the signals have a suitable format when they are passed between the processing units. Also, the latency of the blocks has to be considered when aligning the signals at the different points of the design. Details of the implementation, including the data formats and the control logic, can be found in [14].
To verify the fixed-point implementation, the model under test is embedded into the double-precision environment. Figure 24.8 shows the BER, depending on the SNR, for the fixed-point implementation of the pseudoinverse. 1024-QAM is chosen as the modulation scheme since it is the most sensitive to errors. No FEC and no precoding are applied. The figure shows the average performance for randomly generated channel matrices. The channels were normalised in such a way that the norm of the rows is equal to 1, that is, the SNR value corresponds to the SNR per receiving port. Figure 24.8 compares the double-precision (64 bit) detection matrix (exact detection matrix) to the detection matrix where only the rotation angles are obtained from the fixed-point model (Q quantised) and to the complete fixed-point detection matrix (R−1 and Q quantised). The performance loss is very small, especially for operating points with a BER between 10−3 and 10−2.
24.4 Precoding: Multiplication by V
Figure 24.9 shows the codebook-based precoding at the transmitter (for details about the codebook, see Section 24.6). The codebook indices of all subcarriers are stored in the index memory (block ‘index memory’ in Figure 24.9). This memory is written by the PLC driver whenever new codebook indices are available. The codebook index selects the precoding matrix V from the codebook. In the matrix multiplication block, the precoding matrix V is applied to the transmit symbol vectors. The compensation of the precoding at the receiver works similarly.
The matrix multiplication is described by
(24.21) |
where the precoding matrix is constructed from the codebook entries υ1 and υ2 (for details, see [14]).
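The product of a complex 2 × 2 matrix and a complex two-element vector expands into 16 real multiplications, four for each of the four complex products. The sketch below counts them explicitly. The matrix V used here is an arbitrary unitary matrix chosen for illustration, not an entry of the codebook of [14], and the function name is hypothetical.

```python
import numpy as np

def precode_real_arithmetic(V, b):
    """Multiply the complex 2 x 2 matrix V by the complex vector b using real operations only."""
    out_re = [0.0, 0.0]
    out_im = [0.0, 0.0]
    multiplications = 0
    for i in range(2):
        for k in range(2):
            v, s = V[i][k], b[k]
            # One complex product costs four real multiplications.
            out_re[i] += v.real * s.real - v.imag * s.imag
            out_im[i] += v.real * s.imag + v.imag * s.real
            multiplications += 4
    return np.array(out_re) + 1j * np.array(out_im), multiplications

# Arbitrary unitary matrix (assumption for the example).
V = np.array([[1.0, 1.0], [1.0j, -1.0j]]) / np.sqrt(2.0)
b = np.array([1.0 + 1.0j, -3.0 + 1.0j])  # two hypothetical QAM symbols
x, n_mult = precode_real_arithmetic(V, b)
print(x, n_mult)
```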
In order to derive the design of the implemented matrix multiplication, Equation 24.21 is rewritten with real and imaginary part notation:
The matrix multiplication requires 16 real multiplications. Figure 24.10 shows the design using four multiply and accumulate units. Due to the faster clock rate domain, four clock cycles are used to describe one subcarrier. The four values describing the transmit symbol vector b are multiplexed and serve as input to all of the four multipliers. The four values
Table 24.2 defines special codebook entries to support different MIMO modes without precoding. Selecting the corresponding codebook index enables these MIMO modes without additional implementation effort. If the first entry is used, the precoding matrix is equal to the identity matrix and no precoding is applied. The second entry is relevant for the MISO mode, where the second MIMO stream carries no information and the second transmit symbol is set to 0. In this case, this codebook entry applies no precoding for the MISO mode. By selecting the corresponding codebook index of Table 24.2 for all subcarriers, one of the two described MIMO modes without precoding is selected in the demonstrator system.
Table 24.2 Special codebook entries

| Codebook index | 1 | 2 |
|---|---|---|
| Codebook entry | | |
| Description | No precoding | No precoding for MISO (second input must be 0) |
24.5 Channel and SNR Estimation
24.5.1 Estimation of the Channel Matrix
Channel estimation is based on four OFDM training symbols. These four training symbols are included at the beginning of each burst and are known at the receiver. MIMO requires a special format of the training symbols to estimate all MIMO paths simultaneously. In the following, only one receive port is considered. The calculation of the other receive ports proceeds likewise. Assume st to be the training symbol of one subcarrier. The following orthogonal sequence is transmitted:
The columns represent the time, that is, two positive training symbols are transmitted in the first two time instances, followed by two negative training symbols.
Basically, only two training symbols are needed to separate the two MIMO paths from each transmit port to one receive port. Averaging over four training symbols improves the performance of the estimation. r(1) and r(2) are the two received training symbols (the superscript denotes the time slot) which consecutively arrive at receive port one:
(24.23) |
h11 and h12 are the channel coefficients from transmit port 1 and 2 to receive port 1, respectively. The noise is neglected. It is assumed that the channel coefficients do not change during the four time instances. This assumption holds for the quasi-static PLC channel. Combining the two consecutively received symbols results in
(24.24) |
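The separation of the two MIMO paths by an orthogonal training sequence can be sketched numerically. The sign pattern below is an assumption chosen for illustration (two orthogonal rows over four time slots, with the last two slots being the negated first two). It is not necessarily the exact sequence used in the demonstrator. A BPSK training symbol is assumed, so that the division by the training symbol reduces to a sign change. All names are hypothetical.

```python
import numpy as np

# Assumed sign pattern: rows are transmit ports, columns are time slots.
PATTERN = np.array([[1.0, 1.0, -1.0, -1.0],
                    [1.0, -1.0, -1.0, 1.0]])

def estimate_channel(received, s_t):
    """Estimate the channel matrix of one subcarrier.

    received: array (n_rx, 4) with the four received training symbols per port.
    s_t: BPSK training symbol (+1 or -1) of this subcarrier.
    """
    # Removing a BPSK symbol is a sign change. Correlating with the orthogonal
    # rows separates the two paths, and the factor 1/4 averages over four slots.
    return (received * s_t) @ PATTERN.T / 4.0

rng = np.random.default_rng(4)
H = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
s_t = -1.0
received = H @ (PATTERN * s_t)  # noise neglected, channel constant over the four slots
H_est = estimate_channel(received, s_t)
print(np.allclose(H_est, H))
```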
The division by the training symbol simplifies to a multiplication of the conjugate complex training symbol if the absolute value of the training symbol is equal to 1. The division is further simplified if the training symbols are binary phase-shift keying (BPSK) modulated, that is, st ∈ {+1, −1}. In this case, only the sign has to be changed. The channel estimation is implemented according to Equation 24.24 with
Adaptive QAM modulation requires knowledge about the SNR per subcarrier. Figure 24.11 shows the SNR estimation based on the received training symbols.
First, the received training symbols
(24.25) |
are equalised by the MIMO detection block, which is the same as used for the data symbols (see Section 24.3). In the next step, the training symbols are subtracted. After ZF detection and subtraction of the training symbols, only the effect of the detection on the noise remains:
(24.26) |
The training symbols are not precoded. Therefore, the effect of precoding on the SNR has to be considered by multiplication by V^H. This results in the detection matrix W = V^H H^+, which takes into account the effect of precoding. The variance calculation (implemented by squaring the absolute values) results in the SNR of the two logical MIMO streams. The SNR values are averaged over several bursts to get more accurate results.
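The processing chain of the SNR estimation (detection, subtraction of the known training symbols, compensation of the precoding and variance calculation) can be sketched as follows. This is a double-precision illustration under several assumptions: BPSK training symbols of unit power, a perfectly known channel, white Gaussian noise, and a large number of training symbols standing in for the averaging over several bursts. The function and variable names are hypothetical.

```python
import numpy as np

def estimate_stream_snr(received, training, H_pinv, V):
    """Estimate the SNR of the two logical MIMO streams.

    received: (n_rx, n_sym) received training symbols.
    training: (2, n_sym) known training symbols.
    """
    residual = H_pinv @ received - training  # ZF detection, then subtraction
    noise = V.conj().T @ residual            # effect of W = V^H H^+ on the noise
    noise_power = np.mean(np.abs(noise) ** 2, axis=1)
    signal_power = np.mean(np.abs(training) ** 2)
    return signal_power / noise_power

rng = np.random.default_rng(2)
H = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
_, _, Vh = np.linalg.svd(H)
V = Vh.conj().T  # unquantised precoding matrix (assumption for the example)
sigma = 0.1      # noise standard deviation per receive port
n_sym = 20000

training = rng.choice([-1.0, 1.0], size=(2, n_sym))
noise = sigma / np.sqrt(2.0) * (rng.standard_normal((4, n_sym))
                                + 1j * rng.standard_normal((4, n_sym)))
received = H @ training + noise

H_pinv = np.linalg.pinv(H)
snr = estimate_stream_snr(received, training, H_pinv, V)

# Reference: noise power after detection scales with the squared row norms of W.
W = V.conj().T @ H_pinv
snr_expected = 1.0 / (sigma ** 2 * np.sum(np.abs(W) ** 2, axis=1))
print(10 * np.log10(snr), 10 * np.log10(snr_expected))
```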
As shown in Figure 24.11, SNR estimation requires MIMO detection of the received training symbols. According to Figure 24.2, the system design comprises two instances of MIMO detection, one for the received data symbols and one for the received training symbols (within SNR estimation). The second detection unit for the training path may be unnecessary if the training symbols are equalised by the detection unit for the data symbols. The detection unit for the data symbols does not work during the processing of the training symbols and may be reused for the detection of the training symbols. This alternative design would require some reorganisation of the data flow of the design.
As introduced in Section 24.2, the unitary precoding (beamforming) is based on a predefined set of precoding matrices (codebook). A matrix quantisation based on a codebook is the optimum quantisation of the precoding matrix since it takes into account the statistical distribution of the precoding matrices in the precoding space. Thus, the feedback overhead to represent the precoding matrix can be reduced to a minimum. A direct rounding scalar quantisation of the parameters describing the precoding matrix is also possible (see Chapter 14). In the implementation presented here, 7 bits per matrix are used to quantise the precoding matrix. For this small number of bits, the codebook-based quantisation achieves notably better performance than the direct quantisation. If the precoding matrix is quantised with a higher resolution (e.g. 12 bits per matrix), the performance gain of the codebook-based approach over the direct quantisation becomes smaller. The design of the codebook used here is described in [14].
Depending on the channel conditions of each subcarrier, the optimum precoding matrix out of the codebook (defined by the index within the codebook) has to be determined for each subcarrier separately. By exploiting the correlation between neighbouring subcarriers, it may not be necessary to determine the precoding matrix for each subcarrier separately. The precoding matrix may be defined only for a subset of subcarriers (pilot subcarriers) and the precoding matrix of the other subcarriers may be obtained via interpolation. Alternatively, one precoding matrix may be defined for a group of neighbouring subcarriers. This reduces the feedback information about the precoding matrices and the memory needed to store them. A detailed analysis of these approaches can be found in [14]. This section explains how the best precoding matrix is found for each subcarrier based on the channel matrix of each subcarrier. Assume
(24.27) |
If
(24.28) |
that is, the precoding matrix can be separated from the pseudoinverse of the channel matrix.
If
(24.29) |
The unitary precoding matrix
(24.30) |
Using Equation 24.30 in Equation 24.28 results in
(24.31) |
and
(24.32) |
(24.33) |
where
p1 and p2 are the two row vectors of H+
υ1 is real
In order to find the optimum precoding matrix (which maximises the SNR), the minimum of
Figure 24.12 visualises
A search of all possible codebook entries would be computationally complex and time-consuming. A faster and more efficient codebook search was developed where first a sub-set of codebook entries is determined and the search to find the minimum is limited to this subset. Details of the algorithm may be found in [14].
24.7 Alternative Approach: Precoded Training Symbols
The training symbols are not precoded in the design described earlier. If the training symbols are also precoded, the V matrix block in the receiver is not needed anymore, since the channel estimation works on the equivalent channel matrix and the detection algorithm processes this equivalent channel. There is another advantage: The pseudoinverse calculation of the equivalent channel improves the numerical accuracy since the condition number of the equivalent channel matrix is increased compared to the actual channel matrix without precoding. This becomes especially important for correlated channels. The precoding of the training symbols or the preamble improves the coverage: The probability to reach a certain receiver is increased. It has to be ensured that no hidden nodes are created by optimising the preamble for only one receiver since the preamble has to be received by all modems in the network. The beamforming of the preamble might be optimised to the receiver with the worst channel conditions to improve the coverage and avoid potential hidden nodes. It has to be noted that an arbitrary precoding (no precoding) might also cause hidden nodes.
The codebook search has to be adapted since the precoding needs to be optimised to the actual channel and not to the equivalent channel.
Assume that H1 is the current channel matrix with the optimum precoding matrix V1. If the channel changes to H2, the channel estimation observes the equivalent channel matrix H2V1. The optimum precoding matrix V2 of H2 has to be found. Performing a singular value decomposition (SVD) of the new equivalent channel results in
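$$\mathbf{H}_2\mathbf{V}_1 = \mathbf{U}\mathbf{D}\mathbf{V}^{H} \tag{24.34}$$
Replacing H2 by its SVD, H2 = U2 D2 V2^H, yields
$$\mathbf{U}_2\mathbf{D}_2\mathbf{V}_2^{H}\mathbf{V}_1 = \mathbf{U}\mathbf{D}\mathbf{V}^{H} \tag{24.35}$$
Note that H2 is not known in the receiver and is used only for the purpose of derivation. From Equation 24.35 it follows that
$$\mathbf{V}_2 = \mathbf{V}_1\mathbf{V} \tag{24.36}$$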
The new precoding matrix V2 is calculated via the SVD of the new equivalent channel and V1. The receiver knows which precoding V1 is applied in the transmitter since the precoding information was sent back from the receiver to the transmitter.
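The update of the precoding matrix from the equivalent channel can be checked numerically. In the sketch below, V denotes the matrix of right singular vectors of the equivalent channel H2 V1, and the new precoding matrix is formed as the product of V1 and V. The random channel matrices and the unquantised precoding matrices are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_channel():
    """Hypothetical 4 x 2 channel matrix of one subcarrier."""
    return rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))

H1 = random_channel()
H2 = random_channel()

# Precoding matrix that is optimum for the old channel H1.
V1 = np.linalg.svd(H1)[2].conj().T

# The receiver only observes the equivalent channel H2 V1.
H_eq = H2 @ V1
V = np.linalg.svd(H_eq)[2].conj().T

# New precoding matrix derived from V1 and the SVD of the equivalent channel.
V2 = V1 @ V

# With the new precoding, the columns of the equivalent channel are orthogonal.
gram = (H2 @ V2).conj().T @ (H2 @ V2)
print(np.round(np.abs(gram), 6))
```

The off-diagonal entries of the printed Gram matrix vanish up to rounding, and its diagonal holds the squared singular values of H2, which indicates that V2 diagonalises the new channel without H2 being known explicitly.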
The algorithm is easily extended to the codebook-based precoding. Assume ind1 is the codebook index corresponding to V1. The codebook search on the equivalent channel H2V1 derives ind corresponding to V according to Equation 24.35. The task is to derive ind2 corresponding to V2, according to Equation 24.36 if ind1 and ind are known. Because of the quantisation of the codebook, there is a finite set of possible combinations. For each possible combination of ind1 and ind, the best index ind2 is precalculated and stored in a 2D LUT. The receiver knows which precoding matrix is used in the transmitter (V1 or ind1) and the codebook search delivers ind of the equivalent channel. The LUT provides ind2 for the input combination ind1 and ind.
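The precalculation of the 2D LUT can be sketched as follows. The example uses a small random codebook with eight entries instead of the 7 bit codebook of [14], and it selects the best index with a column-wise correlation metric that ignores the phase of each column. Both the codebook and the distance metric are assumptions made for illustration, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)

def random_unitary():
    """Random 2 x 2 unitary matrix obtained from a QR decomposition."""
    Z = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
    Q, _ = np.linalg.qr(Z)
    return Q

def distance(A, B):
    """Assumed metric: 0 if the columns of A and B agree up to a phase factor."""
    return 2.0 - sum(abs(np.vdot(A[:, k], B[:, k])) for k in range(2))

def build_index_lut(codebook):
    """For every pair (ind1, ind), store the index closest to V1 @ V."""
    n = len(codebook)
    lut = np.zeros((n, n), dtype=int)
    for ind1, V1 in enumerate(codebook):
        for ind, V in enumerate(codebook):
            target = V1 @ V
            lut[ind1, ind] = min(range(n), key=lambda k: distance(codebook[k], target))
    return lut

# Hypothetical codebook: the identity matrix plus seven random unitary matrices.
codebook = [np.eye(2, dtype=complex)] + [random_unitary() for _ in range(7)]
lut = build_index_lut(codebook)

ind1, ind = 3, 5
ind2 = lut[ind1, ind]  # table look-up replaces the matrix product at run time
print(ind2)
```

With such a table, the receiver needs only the two indices at run time and no matrix multiplication.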
24.8 Verification of the MIMO PLC Demonstrator
This section describes the verification of the implemented prototype system. A configurable MIMO PLC channel (see Section 24.8.1) is used to test the system in the laboratory. Results of the demonstrator in the laboratory and in buildings under real conditions are presented in Section 24.8.2.
24.8.1 MIMO Artificial Mains Network
The MAM is an artificial and configurable MIMO PLC channel [15]. The basic schematic of the MAM is shown in Figure 24.13. The common-mode (CM) and differential-mode (DM) channels of each MIMO path can be configured by several filters (low-pass, high-pass or band-pass filters) or attenuators to model typical PLC transfer functions. The use of the MAM allows simple testing of the system in the laboratory. The MAM consists of three units. The first and third units are responsible for introducing asymmetries into the mains network. This causes CM currents (see Figure 1.3 in Chapter 1) to flow towards ground. The other CM channel unit picks up these currents from ground and forwards them to the second MIMO PLC modem. The DM channel unit provides the symmetrical channels from transmitter to receiver. The MIMO channel can be adjusted by pluggable filter units. As filter units, low-pass, high-pass, band-pass or band-gap filters with one or many poles are available. Attenuators are also needed to model a typical PLC transfer function. Furthermore, the MAM includes several coaxial plugs to which coaxial cables of individual lengths can be connected. These cables cause signal reflections analogous to those of the stubs found in the mains grid of private homes. Each CM channel unit embeds a filtered connection to the power supply, which allows the MIMO PLC modems connected to the MAM to be powered.
24.8.2 Results of the Demonstrator System
Figure 24.1 shows the demonstrator system connected to the MAM. In the following, results of the system are presented for which the MAM was used as the MIMO PLC channel. Figure 24.14 shows the absolute value of the channel estimation output, which is proportional to the absolute value of the channel transfer function (magnitude response) from each transmit port (two columns) to each receive port (four rows). The scale of the y-axis results from the 16-bit signed fixed-point data format used here and has to be scaled by a factor depending on the AGC setting to obtain the magnitude responses. The magnitude responses show the typical frequency-selective shape of power line channels. Considerable variation between the individual paths is visible.
Figure 24.15 shows some of the signals of the detection algorithm. The norm of the rows of the detection matrix is illustrated for the two MIMO streams. Figure 24.15a shows
The demonstrator system also supports SISO transmission, where only one transmit port and one receive port are used. The data throughput of SISO transmission is compared to the throughput of a MIMO transmission with two transmit and four receive ports. The adaptive modulation is adjusted for error-free transmission with maximum throughput. An HD video is streamed and no bit errors are observed in the rendered video at the receiver. In addition, the RS decoder provides an output which indicates whether all bit errors have been corrected. This flag is monitored continuously to ensure that the BER after the FEC is equal to 0.
Figure 24.16 shows a screenshot of the MATLAB application during SISO transmission. The application monitors the following key parameters of the transmission (four subplots in Figure 24.16 from top to bottom):
• SNR values and constellations of the OFDM subcarriers
• BER of the transmission
• Data throughput
• AGC settings
The thin line of the top subplot shows the SNR in dB of each subcarrier in the frequency range from 4 to 30 MHz. The SNR information is derived from the training symbols. The selected constellations are shown by the bold line (for QAM steps, see right-hand side of the figure). The constellations of this transmission range from QPSK to 1024-QAM. If the SNR of a subcarrier is too low, the subcarrier is not used (e.g. some subcarriers around 18 MHz). The following three subplots show the past 60 s on the x-axis. The second subplot shows several parameters of the BER. The BER is monitored at two different points: firstly, before the Viterbi decoder (BER Viterbi) and, secondly, before the RS decoder (BER RS). The value BER RS indicates how many bits are corrected for one RS block. If the RS decoder is not able to correct an RS block, the corresponding value shows the number of errors (RS failed). RS failed equal to 0 indicates that the RS decoder is able to correct every transmission error.*
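The SNR-to-constellation mapping described above can be illustrated with a small sketch. The threshold values are assumptions made only for this example: `qpsk_threshold_db` is hypothetical, and the 6 dB step between even QAM constellations follows from the rule of thumb of approximately 3 dB per additional bit for uncoded QAM. The thresholds actually used by the demonstrator are not given here.

```python
def bits_for_snr(snr_db, qpsk_threshold_db=10.0, step_db=6.0, max_bits=10):
    """Map a subcarrier SNR in dB to bits per symbol (0 means the subcarrier is unused)."""
    if snr_db < qpsk_threshold_db:
        return 0  # SNR too low: the subcarrier carries no data
    steps = int((snr_db - qpsk_threshold_db) // step_db)
    return min(2 + 2 * steps, max_bits)  # even constellations only: 2, 4, ..., 10 bits


def constellation_name(bits):
    """Human-readable name for a bit load."""
    if bits == 0:
        return "unused"
    return "QPSK" if bits == 2 else f"{2 ** bits}-QAM"


snr_per_subcarrier_db = [4.0, 12.5, 17.0, 23.0, 29.5, 41.0]  # illustrative values only
loading = [bits_for_snr(snr) for snr in snr_per_subcarrier_db]
names = [constellation_name(bits) for bits in loading]
bits_per_symbol = sum(loading)  # total bits carried by one OFDM symbol
```

Summing the bit load over all subcarriers gives the number of bits per OFDM symbol, which is the quantity that determines the raw PHY rate.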
The third subplot shows the actual data throughput. The raw physical (PHY) rate (dash-dotted line) is calculated as the number of bits per OFDM symbol divided by the OFDM symbol duration. Taking the OFDM preamble (CAZAC sequence and training symbols) and the guard interval into account results in the data rate shown by the dashed line (legend: incl. OFDM sync). Multiplying by the code rates of the FEC results in the net PHY data rate (solid line). The bottom subplot shows the AGC settings of the four front-ends. Note that all front-ends are connected to the coupler and receive the signals of all receive ports. However, in the case of SISO transmission, the signals of three of the receive ports are not used during detection. The data rate for error-free HD video transmission is 90 Mbit/s.
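The three data rates can be reproduced with a short calculation. The sketch below is illustrative only: the burst structure (one preamble followed by a fixed number of data symbols, each with a guard interval) is an assumption, and the guard interval, preamble length, burst length and code rates passed in the example call are hypothetical placeholders. The 51.2 μs value is assumed to be the OFDM symbol duration without guard interval.

```python
from math import prod


def phy_rates(bits_per_symbol, symbol_s, guard_s, preamble_s, symbols_per_burst, code_rates):
    """Return (raw, incl_sync, net) PHY rates in bit/s for an assumed burst structure."""
    # Raw PHY rate: bits per OFDM symbol divided by the OFDM symbol duration
    raw = bits_per_symbol / symbol_s
    # Include preamble and guard intervals in the time needed for one burst
    burst_bits = bits_per_symbol * symbols_per_burst
    burst_time = preamble_s + symbols_per_burst * (symbol_s + guard_s)
    incl_sync = burst_bits / burst_time
    # Net PHY rate: multiply by the code rates of the FEC stages
    net = incl_sync * prod(code_rates)
    return raw, incl_sync, net


raw, incl_sync, net = phy_rates(
    bits_per_symbol=6000,          # hypothetical bit load of one OFDM symbol
    symbol_s=51.2e-6,              # assumed OFDM symbol duration
    guard_s=5.0e-6,                # hypothetical guard interval
    preamble_s=200e-6,             # hypothetical preamble duration
    symbols_per_burst=40,          # hypothetical burst length
    code_rates=(3 / 4, 239 / 255), # hypothetical convolutional and RS code rates
)
print(f"raw {raw / 1e6:.1f}, incl. sync {incl_sync / 1e6:.1f}, net {net / 1e6:.1f} Mbit/s")
```

The ordering raw > incl. OFDM sync > net always holds for non-zero overhead and code rates below 1, which matches the ordering of the three lines in the throughput subplot.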
Figure 24.17 shows the screenshot of the MIMO transmission for the same channel directly after the SISO transmission. No channel change was observed. The screenshot shows the same parameters as described before, except that MIMO transmission uses two (logical) MIMO streams. The SNR of these two streams is shown in the top subplot (thin lines). The black line represents the SNR of the first MIMO stream, while the grey line represents the second MIMO stream. The bold lines (black and grey) show the selected constellations of the two MIMO streams. The data rate is 255 Mbit/s.
The comparison between SISO and MIMO is not completely fair since, in this demonstrator system, the transmit power of each of the two transmit ports for MIMO transmission is the same as for SISO transmission. The total transmit power of MIMO is thus 3 dB higher than that of SISO. To keep the total transmit power the same, the transmit power of SISO transmission has to be increased by 3 dB. As discussed in Chapter 7, a back-off of 3 dB is an expected upper limit. It was observed that increasing the transmit power by 3 dB enhances the data rate by approximately 15 Mbit/s.* Comparing the corrected SISO bitrate (≈105 Mbit/s) to the bitrate of MIMO shows an improvement of the bitrate by more than a factor of 2. However, the different feeding power of the SISO and MIMO tests in Figures 24.16 and 24.17 is balanced by the receiver's AGC settings. The lowest graph in each figure shows these AGC settings. In the MIMO case, all AGC amplifiers are set to a (voltage) amplitude reduced by 6 dB. As long as the AGC remains within its operating range, it is acceptable to compare SISO and MIMO throughput rates.
The demonstrator system was also verified in private homes. Here, the apartment covers two levels with an area of 120 m². Figure 24.18a shows the transmitter and Figure 24.18b the receiver located in the building. The transmitter is connected via a delta-style coupler to the mains network on the upper level, while the receiver is connected via a star-style coupler on the lower level of the apartment. Two transmit ports are used. The receiver uses all four receive ports (the three star-style ports and the CM port). To receive the CM signals, the coupler is mounted on a wooden board which is covered by a copper fleece. This construction can be easily transported. A metal plate could be used as well. A counterpoise with a size of around 1 m² was found to ensure proper CM reception.
Schneider et al. [16] show the results of the verification in this building. For SISO transmission, a bitrate of 143 Mbit/s and, for MIMO transmission, a bitrate of 315 Mbit/s were documented.
The implementation of a MIMO PLC feasibility study in hardware was described in this chapter. The demonstrator system allows up to 2 × 4 MIMO with beamforming. The system allows monitoring of several system parameters, including the bitrate and channel estimation results. The system also supports a SISO mode. Comparing SISO and MIMO transmissions demonstrates the gain of MIMO. The gain of precoding is shown by activating beamforming on the fly. Further aspects of MIMO PLC transmission, such as the influence of the number of receive ports and the influence of the noise, were also investigated. The verification of the demonstrator system in buildings under real conditions shows the gain of MIMO versus SISO to be more than a factor of two. The demonstrator system confirms the theoretical investigations of MIMO PLC and supported the standardisation work at HomePlug AV2.
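The throughput figures quoted for the laboratory and the building can be cross-checked with a few lines of arithmetic. The sketch uses only numbers stated in the text and footnotes; the variable names are illustrative. The ratio for the building is taken directly from the two reported bitrates, since no power correction is stated for that measurement.

```python
import math

SISO_LAB, MIMO_LAB = 90.0, 255.0      # Mbit/s, laboratory (MAM) results
SISO_HOME, MIMO_HOME = 143.0, 315.0   # Mbit/s, results reported for the building
GAIN_PER_3DB = 15.0                   # Mbit/s, observed gain for +3 dB transmit power

# Two transmit ports at equal power double the total power: 10*log10(2) ≈ 3.01 dB
power_offset_db = 10 * math.log10(2)

# Corrected SISO bitrate for equal total transmit power
siso_lab_corrected = SISO_LAB + GAIN_PER_3DB   # 105 Mbit/s

lab_gain = MIMO_LAB / siso_lab_corrected       # ≈ 2.43
home_gain = MIMO_HOME / SISO_HOME              # ≈ 2.20

# Upper bound from the footnote: 1 extra bit on each of 1296 subcarriers per 51.2 μs
upper_bound_mbit = 1296 / 51.2e-6 / 1e6        # ≈ 25.3 Mbit/s

print(f"lab gain {lab_gain:.2f}, home gain {home_gain:.2f}")
```

Both ratios exceed 2, consistent with the statement that MIMO improves the bitrate by more than a factor of two.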
1 A. Schwager, Powerline communications: Significant technologies to become ready for integration, Dr.-Ing. dissertation, Universität Duisburg-Essen, Essen, Germany, May 2010.
2 D. J. C. MacKay, Information Theory, Inference and Learning Algorithms. Cambridge University Press, New York, 2003.
3 C. Rader, VLSI systolic arrays for adaptive nulling, IEEE Signal Processing Magazine, 13(4), 29–49, 1996.
4 R. Walke, R. Smith and G. Lightbody, Architectures for adaptive weight calculation on ASIC and FPGA, in Asilomar Conference on Signals, Systems, and Computers, vol. 2, Pacific Grove, CA, 1999, pp. 1375–1380.
5 B. Hassibi, An efficient square-root algorithm for BLAST, in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, Istanbul, Turkey, 2000, pp. 737–740.
6 Z. Guo and P. Nilsson, A VLSI architecture of the square root algorithm for V-BLAST detection, The Journal of VLSI Signal Processing, 44(3), 219–230, September 2006.
7 J. E. Volder, The CORDIC trigonometric computing technique, IRE Transactions on Electronic Computers, EC-8(3), 330–334, 1959.
8 J. S. Walther, A unified algorithm for elementary functions, in Spring Joint Computer Conference, Atlantic City, NJ, 1971, pp. 379–385.
9 J. Valls, M. Kuhlmann and K. K. Parhi, Evaluation of CORDIC algorithms for FPGA design, Journal of VLSI Signal Processing Systems, 32(3), 207–222, 2002.
10 Xilinx, CORDIC v3.0, Xilinx, 2005.
11 Xilinx, Sine/Cosine Look-Up Table v5.0, Xilinx, 2004.
12 Xilinx, Multiplier v10.1, Xilinx, 2008.
13 Xilinx, Divider v2.0, Xilinx, 2008.
14 D. Schneider, Inhome power line communications using multiple input multiple output principles, Dr.-Ing. dissertation, Verlag Dr. Hut, Munich, Germany, January 2012.
15 A. Schwager, D. Schneider, W. Bäschlin, A. Dilly and J. Speidel, MIMO PLC: Theory, measurements and system setup, in International Symposium on Power Line Communications and Its Applications, Udine, Italy, 2011.
16 D. Schneider, A. Schwager, J. Speidel and A. Dilly, Implementation and results of a MIMO PLC feasibility study, in International Symposium on Power Line Communications and Its Applications, Udine, Italy, 2011.
* The CORDIC algorithm can also be used to solve a broad range of equations, including hyperbolic and square root equations [8].
* RS failed is not indicated if the RS decoder finds a valid code word even though many bits in the RS block are corrupted. This case is extremely unlikely for large block sizes. In practice, it will happen only if the transmission is highly erroneous, with a high BER. In that case, a high BER at the Viterbi decoder will indicate the problem.
* For uncoded QAM, the SNR required to reach a given BER increases by approximately 3 dB for each additional bit per symbol. Hence, the bit load of each subcarrier is increased by 1 bit if the transmit power is increased by ≈3 dB. This results in an increase of the bitrate of 1296/51.2 μs ≈ 25 Mbit/s. The use of only even QAM constellations, and the fact that not all subcarriers exceed the corresponding SNR thresholds, results in the usually observed enhancement of approximately 15 Mbit/s.