In this chapter, the requirements of communication capacity are studied. Both deterministic and stochastic physical dynamics are considered. Various approaches in different setups are taken, such as entropy-based analysis and anytime capacity. Finally, related topics in statistical mechanics are discussed.
Keywords: Stability; Capacity; Topological entropy; Cybernetics; Communication complexity; Entropy propagation
In this chapter, we study the communication capacity requirements (usually in terms of bits) for controlling the physical dynamics in cyber-physical systems (CPSs). This is the first step of communication network design in CPSs, since we need to understand how much communication is needed before we design the details of the communication network.
There is no unified framework to study communication requirements, since this is still an open question. Moreover, the problem may be formulated in different ways for different contexts. In this chapter, we will consider the following methodologies and contexts:
• Deterministic physical dynamics with uncertain initial states, which is appropriate for modeling physical dynamics with a sudden random perturbation but deterministic subsequent evolution: topological entropy is used to describe the communication requirements. The introduction mainly follows the work of Ref. [11].
• State estimation of stochastic systems, which is appropriate to model a system subject to random perturbations and a control policy of separated estimation and control: We study the communication requirements for reliably estimating the system state in stochastic systems. The introduction mainly follows the work of Ref. [14].
• Stability control for stochastic systems: It has been found that the anytime capacity measures the capability of communication channels for feedback control to stabilize the physical dynamics [13]. We provide a brief introduction to the concept of anytime capacity and the corresponding applications.
• Shannon entropy-based approach: The above approaches can provide the communication requirements for system stability. However, it is useful to further study the communication requirements subject to different levels of control precision (e.g., the disorderliness of the system state). We adopt the concept of entropy and the intuition from the second law of thermodynamics [15] to study the communication requirements if a certain entropy of the physical dynamics needs to be achieved.
• Communication complexity: Here we study the case of two agents wanting to achieve the same desired system state and consider it as a distributed computing problem. We follow the study in Ref. [70] to apply the theory of communication complexity to obtain bounds for the communication requirements.
• Thermodynamics argument: We consider a physical system governed by thermodynamics laws, and use the theory of nonequilibrium statistical mechanics to study the impact of communications on entropy generation. This mainly originates from Ref. [71].
The contents of the different sections are summarized in Table 5.1.
Table 5.1
Summary of Methodologies and Setups of the Studies in Different Sections
Section | Dynamics Types | Communication Type | Key Concept | Goal | Source |
5.2 | Deterministic | Sensor to controller | Topological entropy | Stability | [11] |
5.3 | Stochastic | Sensor to controller | Rate distortion | Estimation precision | [14] |
5.4 | Stochastic | Sensor to controller | Anytime capacity | Stability | [13] |
5.5 | Stochastic | Sensor to controller | Shannon entropy | Control precision | [15] |
5.6 | Deterministic | Controller to controller | Communication complexity | Achieving desired state | [70] |
5.7 | Thermodynamics | Sensor to controller | Fluctuation theorem | Entropy generation | [71] |
We denote by x the N-dimensional system state. For the generic case, the system state of physical dynamics evolves in the following manner:
for the continuous-time case, where the functions f and g represent the evolution of physical dynamics and the observation mechanism, and
for the discrete-time case.
A simpler but very useful model is that of linear dynamics, where the physical dynamics evolve as follows. For the continuous-time case we have ẋ(t) = Ax(t) + Bu(t) + w(t) and y(t) = Cx(t) + v(t),
while for the discrete-time case, we have x(t + 1) = Ax(t) + Bu(t) + w(t) and y(t) = Cx(t) + v(t), where w and v denote the process and observation noises (absent in the deterministic case).
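As a concrete illustration of the discrete-time linear model, the following sketch simulates x(t + 1) = Ax(t) + Bu(t) + w(t) with a noisy scalar observation; all matrices, noise levels, and the naive feedback law are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal simulation of the discrete-time linear model
#   x(t+1) = A x(t) + B u(t) + w(t),  y(t) = C x(t) + v(t)
# (all matrices below are illustrative assumptions).
A = np.array([[1.1, 0.2],
              [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

x = np.array([1.0, -1.0])
ys = []
for t in range(50):
    u = np.array([-0.3 * x[1]])                  # a naive feedback on one mode
    y = C @ x + 0.01 * rng.standard_normal(1)    # noisy observation
    ys.append(float(y[0]))
    x = A @ x + B @ u + 0.01 * rng.standard_normal(2)  # process noise w(t)

print(len(ys), x.shape)
```
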
We begin with the case of a deterministic dynamical system, in which there is no random perturbation in the system state evolution and the observation mechanism. For simplicity, we consider only the discrete-time system. Then the system dynamics can be described as
If everything is deterministic, then there is no need for communications, since we can calculate the control action in advance. In this section, we assume that the initial system state x(0) is unknown to us and the distribution of x is unknown. Then communications are needed to convey the information about x(0) from the observation and thus estimate the current system state x(t). The uncertainty incurred by the initial state x(0) will be transformed by the system dynamics and thus creates a need for communications. If the uncertainty in the system state is quickly removed (e.g., the system state converges to a deterministic value in open-loop control), then there is no need for communications in order to stabilize the system dynamics; in this case, we say that the system is simple. If the uncertainty in the system dynamics is amplified, then we refer to the system as complex. The complexity of the physical dynamics is measured by the topological entropy, which will be explained subsequently. As we will see, the topological entropy creates a fundamental limit for the communication capacity requirement. Consider the illustrations in Fig. 5.1, which show the state trajectory in two-dimensional phase space; we observe that the linear case is much simpler than the Lorenz system.
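The contrast between simple and complex dynamics can be illustrated numerically: under a contracting linear map two nearby initial states merge, while under a chaotic map their separation is amplified. The logistic map below stands in for a chaotic system such as the Lorenz dynamics; the maps, initial states, and tolerances are illustrative assumptions.

```python
# "Simple" vs "complex" dynamics: a stable linear map kills the initial
# uncertainty (no communication needed to track the state), while a
# chaotic map amplifies it.  Illustrative sketch.
def linear(x):   return 0.5 * x              # |slope| < 1: contracting
def logistic(x): return 4.0 * x * (1.0 - x)  # chaotic on [0, 1]

eps = 1e-9
a, b = 0.2, 0.2 + eps       # nearby initial states, linear map
c, d = 0.2, 0.2 + eps       # nearby initial states, logistic map
max_sep = 0.0
for _ in range(60):
    a, b = linear(a), linear(b)
    c, d = logistic(c), logistic(d)
    max_sep = max(max_sep, abs(c - d))

print(abs(a - b) < eps)     # True: uncertainty shrank ("simple")
print(max_sep > 1e-3)       # True: uncertainty amplified ("complex")
```
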
In this subsection, we define topological entropy. There are three equivalent definitions, which will be introduced separately. For all these definitions, we consider a topological dynamical system, which is represented by the triplet (X, T, S). X is a metric space (namely, a distance is defined for any pair of points), in which the metric is denoted by d. We assume that X is compact. We define a transformation T which maps from X to X. Moreover, we assume that T is continuous; i.e., the preimage under T of any open set is open.
We define the nth order ball of point x ∈ X, denoted by Bn(x, ϵ), as the ball around x with radius ϵ with respect to metric dn defined as
Fig. 5.2 shows a trajectory beginning from x2 that stays within Bn(x1, ϵ).
We say a set F is (n, ϵ)-spanning if it intersects every (n, ϵ)-ball (i.e., intersecting Bn(x, ϵ) for every x). Then we define
Then we define the spanning orbit-based topological entropy as follows, where the subscript s denotes “spanning”:
We say a subset F of X is (n, ϵ)-separated if dn(xi, xj) > ϵ for all xi, xj ∈ F with xi ≠ xj. Since X is compact, every (n, ϵ)-separated set has finitely many elements, which is illustrated in Fig. 5.3. We denote by s(n, ϵ) the maximum cardinality over all (n, ϵ)-separated sets, namely
Then we define the separated orbit-based topological entropy as follows:
The cover-based definition of topological entropy does not require the definition of a metric; hence it is valid even in spaces without a metric structure. We define a cover of X as a family of open sets whose union contains X. Since X is compact, we can always find a finite subfamily of any open cover that still covers X. We say that a cover β is a subcover of a cover α if every element of β is also an element of α. The minimum cardinality of a finite subcover of α is denoted by N(α).
We define the join of two covers α and β, denoted by α ∨ β, as
We further define
Based on the above definitions, we can now define the cover-based topological entropy:
The above three definitions of topological entropy look quite different, focusing on different aspects of the dynamical system. Interestingly, they are equivalent in metric spaces:
The proof is highly nontrivial. The details can be found in chapter 6 of Ref. [12]. Due to their equivalence, we use only the notation hs(T) in the subsequent discussion.
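A brute-force sketch of the separated-set definition: for the doubling map T(x) = 2x mod 1, whose topological entropy is 1 bit per step, we greedily build maximal (n, ϵ)-separated sets on a grid of initial states and watch (1/n) log2 s(n, ϵ) approach the entropy from above. The grid size and ϵ are illustrative assumptions.

```python
import math

# Greedy construction of (n, eps)-separated sets for the doubling map
# T(x) = 2x mod 1 (topological entropy: 1 bit/step).  Illustrative sketch.
def orbit(x, n):
    xs = []
    for _ in range(n):
        xs.append(x)
        x = (2.0 * x) % 1.0
    return xs

def dn(ox, oy):
    # the metric d_n: worst per-step circle distance along two orbits
    return max(min(abs(p - q), 1.0 - abs(p - q)) for p, q in zip(ox, oy))

def separated_count(n, eps, grid=1000):
    chosen = []
    for i in range(grid):               # greedy maximal separated set
        o = orbit(i / grid, n)
        if all(dn(o, oc) > eps for oc in chosen):
            chosen.append(o)
    return len(chosen)

for n in (2, 4, 6):
    s = separated_count(n, 0.1)
    print(n, s, round(math.log2(s) / n, 2))   # ratio decreases toward 1
```
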
As we have seen, the topological entropy measures the complexity of the transform in the dynamical system. Intuitively, a more complex dynamical system requires more communications for estimating and controlling the dynamics, since the controller needs more information to obtain the precise status. In this subsection, we will prove that the topological entropy provides a tight bound for the communication requirements in deterministic dynamical systems having uncertain initial states. The argument follows chapter 2 of Ref. [11].
The model of the CPS is given in Fig. 5.4. We consider the following system dynamics:
which is a simpler version of Eq. (5.1). The initial value x(0) is unknown. An encoder observes the system state and sends out a message h(jT) every T time slots, where j is the index of the message. The total number of possible messages is denoted by L. Hence the average transmission rate is given by (log2 L)/T bits per time slot. If the capacity of the communication channel is R, then we have (log2 L)/T ≤ R.
The transmitted message is essentially a function of the observation history, i.e.,
where the subscript j in Fj means that the encoding mechanism can be time varying.
Upon receiving a message from the encoder, the controller decodes it and carries out the following possible actions:
• System state estimation: The controller estimates the system states between times (j − 1)T + 1 and jT (i.e., the system states in the period covered by the previous message). The outcome of the estimator is given by
where the subscript j in Gj means that the estimation algorithm can be time varying.
• System state control: The controller computes the control action in the next T time slots (i.e., before receiving the next message), namely u(jT + 1), …, u((j + 1)T). The outcome of the controller is given by
where the subscript j in Gj means that the control algorithm can be time varying.
For simplicity, we assume that there is no transmission error during the communications. This is reasonable if powerful error correction codes are applied and a single sparse error causes only negligible impact on the dynamics. Hence the limit of communications is focused only on the capacity, not the reliability.
Then the question is: How much communication is needed for a precise system state estimation or a stable system dynamics?
We first focus on the system state estimation. We define the observability of the system as follows:
For the observability of the system, we have the following conclusion (Theorem 2.3.6 in Ref. [11]):
Here we provide an intuitive explanation of the encoding and decoding mechanism to achieve observability, as well as the reason for the unobservability when the transmission rate is too low. A rigorous proof can be found in chapter 2 of Ref. [11].
• R < hs(T): Assume, for contradiction, that a coder-decoder pair can make the system observable. Then for n time slots and any ϵ > 0, the decoder can output at most 2^⌈nR⌉ distinct estimated orbits in time [0, n − 1], and by the definition of observability these orbits form an (n, ϵ)-spanning set. Then the topological entropy of the system will be no larger than lim sup (1/n) log2 2^⌈nR⌉ = R < hs(T), which contradicts the assumption that the topological entropy is hs(T).
• R > hs(T): Due to the definition of hs(T), for any n > 1 and ϵ > 0, we can always find an (n, ϵ)-spanning set with cardinality no larger than 2^nR in the state space. Then we can simply transmit the index of the corresponding point in the spanning set (thus taking a transmission rate no larger than R) and achieve an error less than ϵ. This encoding procedure is illustrated in Fig. 5.5.
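The spanning-set encoding idea can be sketched numerically for the doubling map (entropy 1 bit per step): the encoder quantizes the initial state to a k-bit codebook point, and since each step doubles the quantization error, roughly one extra bit is needed per step of prediction horizon, matching the entropy. The map and bit budgets are illustrative assumptions.

```python
# Spanning-set encoding sketch for the doubling map T(x) = 2x mod 1:
# quantize x(0) to k bits, reconstruct the orbit, and measure the worst
# orbit error over n steps.  Each step doubles the quantization error,
# so the error behaves like 2**(n - 1 - k).
def orbit(x, n):
    out = []
    for _ in range(n):
        out.append(x)
        x = (2.0 * x) % 1.0
    return out

def circle_dist(a, b):
    return min(abs(a - b), 1.0 - abs(a - b))

x0, n = 0.3141592, 8
for k in (8, 12, 16):                       # bits spent on encoding x(0)
    xhat = round(x0 * 2**k) / 2**k          # index in a 2**k-point codebook
    err = max(circle_dist(a, b)
              for a, b in zip(orbit(x0, n), orbit(xhat, n)))
    print(k, err)    # error shrinks by ~2**-4 for every 4 extra bits
```
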
The communication requirements for control are more complicated. We focus on only the special case of linear systems without noise and with direct observation of the system state:
The control action is for the purpose of system stabilization, which is defined as follows:
Intuitively, that the system is stabilizable means that we can arbitrarily control the system state given sufficient time. The following two assumptions are subsequently made:
• The initial state x(0) lies in a compact set which has the origin as an interior point; i.e., there exists a δ > 0 such that ∥x(0)∥ ≤ δ implies that x(0) belongs to this set.
• The system (A, B) is stabilizable given a perfect system state feedback.
Then the following theorem provides conditions for the stabilizability of the linear system:
The first conclusion (R < hs(A)) in the above theorem can be obtained from contradiction, as illustrated in Fig. 5.6. A detailed proof can be found in Section 2.7 of Ref. [11]. Here we provide a sketch of the proof. We assume that the linear system is still stabilizable when R < hs(A). Since R < hs(A), we can find a constant ϵ > 0 such that
where s(k, ϵ) denotes the maximum cardinality of a (k, ϵ)-separated set. This implies that there is a (jT, ϵ)-separated set S of cardinality N such that
and for any two different points x1 and x2 in S satisfying
On the other hand, we set ϵ0 = ϵ/2. Since the system is stabilizable, we can find, for each element x in S, a sequence of control actions such that the resulting trajectory satisfies Eq. (5.23) with respect to ϵ0. There are no more than 2^jRT possible control action sequences. Hence, if the cardinality of S exceeds 2^jRT, we can always find a sequence of controls, u*(0), …, u*(jT), and two different points x1 and x2 in S that are both well controlled by the sequence {u*}; i.e., both x1′(t) = x1(t) + xu(t) and x2′(t) = x2(t) + xu(t) satisfy Eq. (5.23) with respect to ϵ0. Then it is easy to verify that the distance between the two orbits is at most 2ϵ0 = ϵ. This contradicts the assumption that S is (jT, ϵ)-separated. Hence the cardinality of the message set cannot be less than s(jT, ϵ).
The second conclusion (R > hs(A)) in the theorem can be derived from the conclusions in the subsequent treatment of the optimal control of linear systems.
In the previous discussion, the feedback is to stabilize the system, regardless of the corresponding cost (e.g., a very large control action u). In the optimal control of linear systems, the costs of both system state magnitude and control action power are considered; i.e., the system cost with initial state x(0) is given by
where both Q and R are positive definite matrices.
When the feedback is perfect (i.e., there is no constraint on the communication rate), it is well known that the optimal feedback control is given in [72]
where
and the matrix P satisfies the following equation:
The optimal cost function is given by
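With perfect feedback, the LQR solution above can be computed by iterating the Riccati recursion to a fixed point P and then forming the gain; a minimal sketch with illustrative matrices (not values from the text):

```python
import numpy as np

# Perfect-feedback LQR sketch: value-iterate the Riccati recursion
#   P <- Q + A'P(A - BK),  K = (R + B'PB)^{-1} B'PA,
# then check that the optimal feedback u = -Kx stabilizes the loop.
# Matrices are illustrative assumptions.
A = np.array([[1.2, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)            # state cost weight
R = np.array([[1.0]])    # control cost weight

P = np.eye(2)
for _ in range(500):     # iterate to the algebraic Riccati fixed point
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)

K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
rho = max(abs(np.linalg.eigvals(A - B @ K)))   # closed-loop spectral radius
print(rho < 1.0)         # True: the optimal feedback stabilizes the loop
```

For production use, a dedicated solver (e.g., a discrete algebraic Riccati equation routine) would replace the naive iteration.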
Now we consider the constraint on the communication rate; i.e., R is bounded. We define the solvability of the optimal control as follows [11]:
The conclusion on the communication requirements for the solvable optimal control is given in the following theorem:
The conclusion for R < hs(A) is a straightforward conclusion of Theorem 3, since the system is not even stabilizable when R < hs(A). The proof for the case of R > hs(A) is much more involved (note that the corresponding conclusion in Theorem 3 is merely a special case). A rigorous proof can be found in Section 2.7 of Ref. [11]. Here we provide a sketch of the proof. First we choose a reasonable period T. Then we consider the system:
We can find a set Ω of initial points with cardinality 2^((T+1)α), where hs(A) < α < R, with the following property: For any initial state x(0), we can always find a point in Ω such that the trace in Eq. (5.34) triggered by this point is very close to that triggered by x(0) within the time duration [0, T]. For each point in Ω, we can compute the corresponding control actions within the time period [0, T], namely u*(0), …, u*(T − 1).
Then we define the following encoder and controller pair for time interval [0, T − 1]:
• Encoder: Given the initial state x(0), we choose the corresponding point in Ω such that the traces are sufficiently close to each other in [0, T − 1]:
• Controller: In the time interval [0, T − 1], the control action is set as the optimal control action for the chosen point in Ω, namely
For the jth time period, namely [(j − 1)T, jT − 1], we can use the same encoding and control mechanism by scaling the system state and control actions, namely
and
The whole procedure is illustrated in Fig. 5.7.
The above discussions show the importance of the topological entropy on the communication requirements for controlling the physical dynamics in a CPS. Hence it is very important to evaluate the topological entropy of a given dynamical system.
The topological entropy of linear systems is fully explained in the following theorem (the rigorous proof is given in Theorem 2.4.2 of Ref. [11]):
An immediate conclusion is that, when all the eigenvalues of A are within the unit circle, the corresponding topological entropy is zero. Hence the corresponding communication requirement is zero; i.e., no communication is needed for observing or stabilizing the physical systems. This is obvious since the system state will converge to zero spontaneously.
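For a linear map x ↦ Ax, the topological entropy (in bits) equals the sum of log2 |λ| over the eigenvalues of A outside the unit circle, and is zero when all eigenvalues lie inside; a minimal sketch:

```python
import math
import numpy as np

def h_top_linear(A):
    """Topological entropy (bits) of x -> Ax: sum of log2|lambda| over
    eigenvalues outside the unit circle (zero if all are inside)."""
    return sum(max(0.0, math.log2(abs(l))) for l in np.linalg.eigvals(A))

print(h_top_linear(np.diag([0.5, 0.9])))        # 0.0: stable, no bits needed
print(h_top_linear(np.array([[2.0, 0.0],
                             [0.0, 0.5]])))     # 1.0: one doubling mode
```
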
For the generic case of dynamical systems, there is no systematic approach to computing the topological entropy. The existing approaches can be categorized into the following two classes:
• Consider some special types of dynamics, such as one-dimensional piecewise monotone mappings [73].
• Discretize the mapping and then use the approaches for calculating the topological entropy of symbolic dynamics.
In this book, we provide a brief introduction to the second approach by following the argument in Ref. [74]. We consider a compact topological space M and partition it into a set of subsets Q = {A1, …, Aq}. Consider a continuous mapping T on M and a discrete time period {0, …, t − 1}. We define
namely the set of all possible t-strings, each of which represents a series of regions in Q that a trajectory of T passes. An example is given in Fig. 5.8, where the trajectory generates a 6-string (1, 4, 2, 6, 7, 3).
Then we define the following quantity:
We further define the concept of generating partition:
Then we obtain the following conclusion that provides bounds for the topological entropy of T:
The proof of the theorem can be found in Ref. [74].
Based on this theorem, an algorithm for computing an upper bound of the topological entropy was proposed in Ref. [74], consisting of the following four steps:
• Step 1: Select a coarse partition of M, which is denoted by A.
• Step 2: Select a partition B = {B1, …, BK} that is much finer than A. Each element in A is the union of a set of elements in B. Then we define a topological Markov chain. The transition matrix is given by
which means that Bij equals 1 if Bi and Bj can be two successive regions visited by a trajectory of T. A sequence (b0, …, bN−1) is called a B-word if the transition matrix entry equals 1 for every consecutive pair (bi, bi+1). Each B-word corresponds to an A-word (a0, …, aN−1), where Aai is the element of A that contains Bbi. Then we define the set of A-words as
It is easy to verify
which implies
It has been shown that h(B, A) converges to h(T, A) if the diameter of B tends to zero.
• Step 3. The set of all B-words forms a sofic shift, whose detailed definition can be found in Ref. [74]. First a directed labeled graph can be constructed, in which each node corresponds to one element in B and an edge exists from node i to node j if Bij = 1. Moreover, if Bi is contained in α ∈ A, we label the edge ij (if it exists) as α. The following example from Ref. [74] illustrates the procedure:
• Step 4. We compute the entropy of the sofic shift, which is shown to be equal to the topological entropy of the corresponding Markov chain. To that end, we need to find a new graph G′(B, A) satisfying
• All outgoing edges of the same node have different labels.
The following algorithm from Ref. [74] has been proposed to find such a graph G′(B, A), which is called a reduced right-resolving presentation of G:
1. Begin from an empty graph R. Add a single node in G(B, A) to R.
2. If there is at least one hyper node without outgoing hyper edges, choose one of them and label it by H. Otherwise, go to Step 6.
3. Since the hyper node H consists of multiple nodes in G(B, A), define H′ as the set of all nodes in G(B, A) reached by the outgoing edges of the nodes in H that have the same label (say, Ai). Define a new hyper node H′ and a hyper edge pointing from H to H′. Label it as Ai.
4. Repeat Step 3 for all possible labels in A.
5. Return to Step 2.
6. If a hyper node has no incoming hyper edges, remove it.
7. If a hyper node is removed, return to Step 6. Otherwise, stop and output R.
Consider the example in Fig. 5.9. If we begin from node B1, then we label B1 as H1. Since all the outgoing edges of B1 are labeled A1, in Step 3 we form a hyper node H2 consisting of B1 and B2 (since the two outgoing edges reach B1 and B2). Since H2 does not have any outgoing hyper edges, we form a hyper node H3 consisting of B1, B2, B3, and B4. Then we consider H3, whose outgoing edges have two labels, A1 and A2. For A1, we form a hyper node H4 consisting of B1, B2, and B3; for A2, the outgoing edge returns to H4. This results in the graph shown in Fig. 5.10. Finally, we remove the hyper node H1.
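The hyper-node procedure above is essentially a subset construction on the labeled graph; a minimal sketch on a tiny illustrative graph (not the one in Fig. 5.9) is given below. Note that this version creates only reachable hyper nodes, so the pruning of Steps 6 and 7 is implicit.

```python
# Subset-construction sketch of the reduced right-resolving presentation:
# every hyper node of the result has at most one outgoing edge per label.
def right_resolving(edges, start):
    """edges: dict node -> list of (label, target).
    Returns dict hyper-node(frozenset) -> {label: hyper-node}."""
    R, todo = {}, [frozenset([start])]
    while todo:
        H = todo.pop()
        if H in R:
            continue
        out = {}
        for node in H:                      # merge same-label targets
            for label, tgt in edges[node]:
                out.setdefault(label, set()).add(tgt)
        R[H] = {lab: frozenset(t) for lab, t in out.items()}
        todo.extend(R[H].values())          # explore new hyper nodes
    return R

# Illustrative graph: node 1 has two outgoing edges with the same label 'a'.
edges = {1: [('a', 1), ('a', 2)],
         2: [('a', 1), ('b', 2)]}
R = right_resolving(edges, 1)
for H, out in sorted(R.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(H), {lab: sorted(t) for lab, t in sorted(out.items())})
```
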
From the graph R, we can also obtain its adjacency matrix. The example in Fig. 5.10 has the matrix R given by
The following theorem [74] relates the eigenvalues of R to the topological entropy that we want to compute:
Note that although the above procedure provides a systematic approach for the concrete calculation of topological entropy, it does not completely solve the problem, for the following reasons: (1) It is still not clear how to verify that a partition is generating. (2) It is still not clear how to choose the radius of the partitions. Hence we can only choose as small a radius as the available computational capability permits.
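Steps 2-4 can be sketched end to end for a map whose entropy is known: partition [0, 1) into q cells, record the observed cell-to-cell transitions of T(x) = 3x mod 1 (entropy log2 3) along a long trajectory, and take log2 of the largest eigenvalue of the resulting transition matrix. The partition size, initial state, and trajectory length are illustrative assumptions.

```python
import numpy as np

# Estimate the topological entropy of T(x) = 3x mod 1 (true value
# log2 3 ~ 1.585 bits) from the transition matrix of the induced
# topological Markov chain on q uniform cells.
q, steps = 6, 100_000
M = np.zeros((q, q))
x = 0.1234567
for _ in range(steps):
    i = int(q * x)              # current cell
    x = (3.0 * x) % 1.0
    j = int(q * x)              # next cell
    M[i, j] = 1.0               # record that this transition occurs

h = float(np.log2(max(abs(np.linalg.eigvals(M)))))
print(round(h, 3))              # close to log2(3) ~ 1.585
```
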
Many control systems use the structure of separated estimation and control, as illustrated in Fig. 5.11, in which an estimator provides an estimation of the system state, and then a controller considers the estimate as the true system state and computes the corresponding control action. However, such a separated structure may not be optimal, although it is more feasible in practice. On the other hand, source coding is needed for the communication of observations y in order to estimate the system state x, which is usually lossy when the system state is continuous. In traditional communication systems, the corresponding information-theoretic communication requirement for lossy source coding is based on infinite block length and thus infinite delay, which is infeasible in the context of real-time control. Hence in this section, we follow the research in Ref. [14] and study the following two fundamental questions:
• When the feedback is passed through a communication channel with limited capacity, is the structure of separated estimation and control still optimal?
• When the estimation is real-time with limited delay, what are the communication requirements for the system state estimation?
We first introduce the system model, which is adopted in Ref. [14]. The overall system has the same structure as that in Fig. 5.4.
We assume that the system state dynamics is described by the linear system dynamics in Eq. (5.4). The noise {w(t)}t is a sequence of i.i.d. Gaussian distributed random variables. The initial state x(0) is assumed to be Gaussian with zero expectation and covariance matrix Σx(0). For simplicity, we assume that the system state can be observed directly by the encoder at the sensor.
The communication channel is assumed to be stochastic, characterized by stochastic kernels P(dy|x), where y and x are the output and input of the channel. For simplicity we assume that the channel is time-invariant and memoryless. The following two special channels are discussed in Ref. [14]:
• Noiseless digital channel with rate R: The input and the output of the channel are the same. There are a total of 2^R messages.
• Memoryless vector Gaussian channel: Both the input and output alphabets are d-dimensional real spaces. The output of the channel is given by
where v(t) is a series of i.i.d. Gaussian distributed random vectors with zero expectation and covariance matrix Σv. The input is average power constrained; i.e., E[∥x∥2] ≤ P0.
At each time slot t, the sensor observes the system state x(t), generates a channel input symbol, and transmits it through the communication channel. The controller receives the channel output symbol y(t), decodes it, and obtains a system state estimation. Based on this estimation, the controller computes the control action u(t).
The following three possible information patterns are considered, where Enc denotes the function of encoding:
• The encoder output is a function of all previous knowledge, namely
• The encoder does not know the channel and decoder outputs, namely
• The encoder uses only the current system state to encode the message:
The decoder uses a stochastic mechanism for decoding, namely generating the system state estimation according to the following conditional distribution:
namely, it is dependent on the previous received communication symbols, the previous system state estimations and control actions.
The controller is also stochastic: the control action is generated according to a conditional probability that depends only on the current system state estimation.
Now we study whether the communication channel will affect the optimality of the separated structure of system state estimation and control. In traditional linear quadratic Gaussian (LQG) control with perfect feedback, the certainty equivalence guarantees the optimality of the following control law:
and
which represent separated control and system estimation. We define the estimation error as
We say that the control has no dual effect if the estimation error covariance conditioned on the channel outputs coincides with that of the uncontrolled system (i.e., the system in which no feedback control is applied), which means that the error covariance matrix is independent of the control action taken by the controller. An intuitive explanation of the no dual effect is given in Ref. [14]: the control action can be used both to control the system state evolution and to probe the system in order to estimate the system state; the no dual effect means that the control action is fully used to control the system state (since the system state estimation error is irrelevant to the control action). The following theorem shows that the no dual effect is equivalent to the certainty equivalence [14]:
The following lemma provides a sufficient condition for the no dual effect, which can be used to judge whether a control action with certainty equivalence is optimal:
Based on this lemma, Ref. [14] discussed the no dual effect in the two communication channels introduced above:
• Noiseless digital channel: Since the input and the output of the communication channel are the same (while the number of bits is limited), the sensor can know exactly the control action that the controller will take. It is shown in Ref. [14] that the optimal encoder has the following form:
where qt is a quantizer. Note that the second equation is due to the sensor’s perfect knowledge of the control action. By applying induction, we can prove the following lemma:
A rigorous proof is given in Ref. [14]. Here we provide the proof for t = 1, as illustrated in Fig. 5.12. When t = 0, the claimed property obviously holds. Since the control action is completely determined by the received message, the uncontrolled system state, the channel output, and the state estimate form a Markov chain; subtracting the known control contribution from the state preserves this Markov structure.
By applying Lemma 1, we draw the conclusion that the control through a noiseless digital channel has no dual effect.
• Memoryless vector Gaussian channel: The optimal encoder for the vector Gaussian channel is still unknown. Ref. [14] restricted the encoder to be linear and deterministic:
where {Dnt}n=1, 2, 3, 4 are matrices of appropriate dimensions. Since the communication channel output is given by y(t) = x(t) + v(t), it can also be written as a linear combination of the initial state, dynamics noise, communication noise, and control actions:
where {Fnt}n=1, 2, 3, 4 are the corresponding matrices. The uncontrolled channel output can also be written as
since the controls are all zero. Then the uncontrolled channel outputs can be obtained from the actual ones by subtracting a linear function Gt u(0 : t − 1) of the controls, where Gt is an appropriate matrix. When estimating the uncontrolled system state, all the related information in y(0 : t) and u(0 : t − 1) is summarized in the uncontrolled channel outputs. Hence the condition in Lemma 1 is valid, and thus the control through the memoryless vector Gaussian channel also has no dual effect.
Based on the above arguments, we reach the conclusion that, for both the noiseless digital communication channels and memoryless vector Gaussian channels, the structure of separated system state estimation and control is still optimal in linear dynamics.
When the structure of separated system state estimation and control is optimal, the controller can use the same design as that in the case of perfect feedback (e.g., the LQG control). However, the system state estimation needs substantial revisions due to the limited communication capacity. We discuss the communication requirements for system state estimation at the controller in this subsection.
For the linear dynamics in Eq. (5.4), system state estimation can be considered as a lossy source coding problem, since the sensor encodes the observations and the decoder output always has errors except in trivial cases. Lossy source coding is traditionally studied using rate-distortion theory [33], a branch of information theory that characterizes the minimum communication rate needed to satisfy a distortion constraint at the decoder output. However, the information-theoretic argument is based on the assumption of infinite codeword length and thus infinite delays, which is reasonable for traditional data communications (e.g., one packet encoded as a codeword may have thousands of bits). Obviously such arguments are not suitable for CPSs, since the encoding procedure must have a limited delay. Hence in contrast to the block structure of traditional data communications, the encoding procedure in a CPS should have a sequential structure, as illustrated in Fig. 5.13.
Before we study sequential rate distortion, we provide a review of traditional rate-distortion theory. When the true information symbol is x while the recovered one is y, we denote the distortion by D(x, y). Then the rate-distortion function is defined as follows:
Intuitively, the rate-distortion function means the minimum average number of bits needed to achieve a distortion less than or equal to D. Meanwhile, the rate-distortion function can also determine whether a communication channel with a certain capacity can support the transmission of an information source with a given distortion bound. Roughly speaking, when transmitting the information of a Gauss-Markov process through a communication channel having capacity C, distortion D can be achieved if R(D) < C, while it cannot be achieved if R(D) > C. This dichotomy is illustrated in Fig. 5.14.
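The dichotomy can be checked numerically in the simplest scalar case: a memoryless Gaussian source with variance σ2 has R(D) = (1/2) log2(σ2/D) for D ≤ σ2, and an AWGN channel with signal-to-noise ratio snr has capacity C = (1/2) log2(1 + snr); a target distortion is supportable (in the classical block-coding sense) iff R(D) < C. The numbers below are illustrative.

```python
import math

# Scalar Gaussian sketch of the R(D)-vs-C dichotomy.
def rate_distortion(sigma2, D):
    """R(D) of a memoryless Gaussian source, squared-error distortion."""
    return 0.5 * math.log2(sigma2 / D) if D < sigma2 else 0.0

def awgn_capacity(snr):
    """Capacity of a scalar AWGN channel in bits per use."""
    return 0.5 * math.log2(1.0 + snr)

C = awgn_capacity(3.0)                 # 1 bit per channel use
print(rate_distortion(1.0, 0.5) < C)   # R = 0.5 bit   -> achievable
print(rate_distortion(1.0, 0.1) < C)   # R ~ 1.66 bits -> not achievable
```
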
Similarly to the traditional definition of rate-distortion function with infinite delay, Ref. [14] defined the sequential rate-distortion function as follows:
We notice that the sequential rate-distortion function has the following features:
• The necessary rate may be dependent on the time T.
• Only the causal information is considered in the mutual information, which is due to the requirement of sequential decoding.
• The expected distortion is no larger than D for each time slot, instead of being averaged over all time slots.
The following theorem shows the application of the sequential rate-distortion function in the lossy transmission of information sources [14]. Note that a sufficient condition is much more difficult to find.
Since the sequential rate-distortion function provides a lower bound for the channel capacity of the communication channel, it is important to calculate the sequential rate-distortion function for various information sources for the purpose of online system state estimation. We follow the argument in Ref. [14] to discuss the sequential rate-distortion function for a d-dimensional Gauss-Markov source where
and
We assume a Gaussian communication channel with dimension d.
The following lemma shows the necessary property for the infimizing channel in Eq. (5.68) in the definition of a sequential rate-distortion function:
Compared with the standard form P(dy(t)|x(0 : t − 1), y(0 : t − 1)), in the form P(dy(t)|x(t), y(0 : t − 1)) the distribution of y(t) depends only on the current input of the communication channel, instead of on all the history. Hence the recovered symbols have the following form:

y(t) = Fx(t) + Gy(0 : t − 1) + z(t),
where F is a d × d matrix, G is a d × (t − 1)d matrix, and {z(t)} is a series of independent Gaussian random vectors of dimension d.
Such a channel can be realized by a memoryless Gaussian channel with perfect feedback, which is illustrated in Fig. 5.15. The encoder obtains the innovation, which is given by

x̃(t) = x(t) − E[x(t)|y(0 : t − 1)].

Since the communication channel output y(0 : t − 1) can be sent to the encoder via the perfect feedback channel, the encoder can easily compute x̃(t) as above. The decoder receives y(t) = x̃(t) + z(t) and then obtains the system state estimation:

x̂(t) = E[x(t)|y(0 : t)].
The channel input x̃(t) should satisfy the power constraint:

E[x̃T(t)x̃(t)] = trace(Σx(t)|y(0:t−1)) ≤ P,

where Σx(t)|y(0:t−1) is the covariance matrix of the prediction error, namely

Σx(t)|y(0:t−1) = E[(x(t) − E[x(t)|y(0 : t − 1)])(x(t) − E[x(t)|y(0 : t − 1)])T].
For the realized channel, Ref. [14] has proved that
because
where the second equation is due to the form of the estimation in Eq. (5.72), and
where the second equation is because
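The feedback realization can be sanity-checked by simulation for a scalar Gauss-Markov source. All parameters, variable names, and the simple MMSE scaling at the decoder below are assumptions for illustration, not the exact construction of Ref. [14]:

```python
import numpy as np

# Sketch: scalar Gauss-Markov source x(t+1) = a*x(t) + w(t), estimated over a
# memoryless Gaussian channel with perfect feedback.  The encoder transmits
# the innovation; the decoder applies MMSE scaling to the channel output.
rng = np.random.default_rng(1)
a, sw2, sz2, T = 0.9, 1.0, 0.5, 200_000   # illustrative parameters

x, x_hat, sq_err, D = 0.0, 0.0, 0.0, 0.0
for t in range(T):
    x = a * x + rng.normal(0.0, np.sqrt(sw2))    # source evolution
    innov = x - a * x_hat               # innovation; feedback gives x_hat
    y = innov + rng.normal(0.0, np.sqrt(sz2))    # memoryless Gaussian channel
    s2 = a * a * D + sw2                # analytic innovation variance
    x_hat = a * x_hat + (s2 / (s2 + sz2)) * y    # MMSE update at the decoder
    D = s2 * sz2 / (s2 + sz2)           # analytic estimation-error variance
    sq_err += (x - x_hat) ** 2

print(sq_err / T, D)   # empirical vs analytic error variance (close)
```

The empirical squared error tracks the analytic recursion, illustrating that the innovation scheme turns state estimation into repeated transmission of a zero-mean Gaussian variable.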
Before we proceed to compute the sequential rate-distortion function, we provide a review of the traditional rate-distortion function. Consider a d-dimensional Gaussian source with independent samples and d × d covariance matrix Σx. The distortion function is given by D(x, y) = ‖x − y‖². Then we have [33]

R(D) = Σn=1,…,d (1/2) log2(λn/δn),

where {λn}n=1, …, d are the eigenvalues of Σx and

δn = min(c, λn),

where the constant c is chosen such that Σn=1,…,d δn = D. In particular, when d = 1, we have

R(D) = (1/2) log2(σx²/D), for D ≤ σx²,

where σx² is the variance of the source.
Hence δn can be considered as the portion of distortion assigned to the nth eigenvector of Σx; then the assignment of {δn} can be considered as water filling, as illustrated in Fig. 5.16.
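The water-filling assignment of {δn} can be sketched numerically as follows; the bisection on the water level c and the example eigenvalues are illustrative:

```python
import math

def reverse_waterfill(eigs, D, tol=1e-12):
    """Return (rate_bits, deltas) for a Gaussian vector source with covariance
    eigenvalues `eigs` and total squared-error distortion budget D."""
    lo, hi = 0.0, max(eigs)
    while hi - lo > tol:                  # bisect on the water level c
        c = (lo + hi) / 2
        if sum(min(c, lam) for lam in eigs) < D:
            lo = c
        else:
            hi = c
    c = (lo + hi) / 2
    deltas = [min(c, lam) for lam in eigs]
    rate = sum(0.5 * math.log2(lam / d) for lam, d in zip(eigs, deltas))
    return rate, deltas

print(reverse_waterfill([4.0, 1.0], 1.0))   # water level 0.5 -> rate 2 bits
```

With eigenvalues (4, 1) and budget D = 1, the water level is c = 0.5, each eigenvector receives distortion 0.5, and the total rate is 2 bits.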
The optimal channel to minimize the rate R given the distortion D is given as follows [75]. Two channels are defined:
• Backward channel P(dx|y): It is given by

x = y + δ,

where δ is the error, which is Gaussian distributed with zero mean and covariance matrix Σx|y and is independent of y. Hence the output y is also Gaussian distributed with zero mean and covariance matrix

Σy = Σx − Σx|y.
• Forward channel P(dy|x): It is given by

y = Hx + z,

where H = I − Σx|yΣx⁻¹, and z is Gaussian with zero mean and covariance matrix

Σz = HΣx|y.
To realize the forward channel P(dy|x), one can define an invertible matrix Γ such that ΓHΓ⁻¹ = I. Define b = Γy, a = Γx, and n = Γz. Then the channel b = a + n with power constraint E[aTa] ≤ trace(ΓΣxΓT) is matched to the information source.
We also need the concept of a matched channel. The stochastic kernel P(dy(t)|x(t), y(0 : t − 1)) can be considered as a communication channel with input x(t) and output y(t) [76]. If the real communication channel is equal to the channel that minimizes the rate given the distortion, then the communication channel is said to be matched to the information source. When the communication channel is matched to the source, there is no need for a structure with separated source encoder and channel encoder: the information can be sent in an uncoded manner without any loss in performance.
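A scalar sanity check of the matched-channel claim, under assumed variances: sending x uncoded over an AWGN channel (with MMSE scaling at the receiver) attains exactly the distortion σx²2^{−2C} promised by coding at the channel capacity C:

```python
import math

sx2, sz2 = 4.0, 1.0    # assumed source and channel-noise variances
# Uncoded: transmit x itself; the MMSE receiver attains this distortion.
D_uncoded = sx2 * sz2 / (sx2 + sz2)
# Coded benchmark: D = sx2 * 2^(-2C) at capacity C = 0.5*log2(1 + SNR).
C = 0.5 * math.log2(1.0 + sx2 / sz2)
D_coded = sx2 * 2 ** (-2 * C)
print(D_uncoded, D_coded)   # equal: no loss from uncoded transmission
```

The two distortions coincide for any choice of variances here, which is exactly the "no loss in performance" property of a matched channel.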
Then for the sequential rate-distortion function, we can convert it to the traditional rate-distortion expression. We begin with the scalar Gaussian source case. The distortion is given by D(x, y) = (x − y)². The encoder computes the innovation information:

x̃(t) = x(t) − E[x(t)|y(0 : t − 1)] = aδ(t − 1) + w(t),

where δ(t) = x(t) − x̂(t) is the estimation error. Since the error variance is equal to the distortion, namely D(t) = E[δ²(t)], the variance of the innovation information is given by

σ̃²(t) = a²D(t − 1) + σw²,

where σw² is the scalar notation of Σw.
In the above argument for the traditional rate-distortion function, we have shown that a scalar Gaussian variable with variance σ² can be reconstructed with distortion σ²2^{−2R} when a Gaussian channel having capacity R is matched to the source. Hence the distortions at different times can be obtained recursively as follows:

D(t) = (a²D(t − 1) + σw²)2^{−2R(t)}.
For the sequential rate-distortion function, we have D(t) = D for all t. Hence we have

R(t) = (1/2) log2(a² + σw²/D), t ≥ 1.

When T is sufficiently large, the impact of R(0) will vanish. Hence we have

R^{SRD}(D) = (1/2) log2(a² + σw²/D).
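The distortion recursion and its steady state can be verified numerically; the parameters below are illustrative:

```python
import math

a, sw2, D_target = 1.2, 1.0, 0.5   # illustrative (unstable) scalar source
# Steady-state rate that sustains distortion D_target at every step.
R = 0.5 * math.log2(a * a + sw2 / D_target)

D = 2.0                            # arbitrary initial distortion
for _ in range(100):
    D = (a * a * D + sw2) * 2 ** (-2 * R)   # distortion recursion
print(R, D)                        # D converges to D_target
```

Starting from any initial distortion, iterating D(t) = (a²D(t − 1) + σw²)2^{−2R} with this constant rate drives the distortion to the target value.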
Now we handle the generic case of a d-dimensional Gauss-Markov source. The covariance matrix of the distortion error δ(t) is denoted by Σ(t) = E[δ(t)δT(t)]. The evolution of the prediction-error covariance matrix can be easily shown to be

E[x̃(t)x̃T(t)] = AΣ(t − 1)AT + Σw.

Consider the unitary matrix U(t) that diagonalizes the matrix AΣ(t − 1)AT + Σw:

U(t)(AΣ(t − 1)AT + Σw)UT(t) = diag(μ1(t), …, μd(t)).

From the previous water-filling argument, to achieve the distortion D, the coding rate needs to be

R(t) = Σn=1,…,d (1/2) log2(μn(t)/δn(t)),

where

δn(t) = min(c(t), μn(t)),

where the constant c(t) is chosen such that Σn=1,…,d δn(t) = D.

Then by using the rate-distortion function for the scalar Gaussian source, the error covariance matrix of the d-dimensional source at time slot t is given by

Σ(t) = UT(t) diag(δ1(t), …, δd(t))U(t).
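A compact numerical sketch of this per-step water-filling recursion; the matrices A, Σw and the budget D below are assumed for illustration:

```python
import numpy as np

# Sketch: per-step reverse water-filling for a 2-D Gauss-Markov source
# x(t+1) = A x(t) + w(t), w ~ N(0, Sw); parameters are illustrative.
A = np.array([[1.1, 0.2], [0.0, 0.8]])
Sw = np.eye(2)
D = 0.4                                   # per-step distortion budget

def waterfill(mu, D, iters=200):
    """Distortions delta_n = min(c, mu_n) with sum(delta) = D (bisection)."""
    lo, hi = 0.0, float(mu.max())
    for _ in range(iters):
        c = (lo + hi) / 2
        lo, hi = (c, hi) if np.minimum(c, mu).sum() < D else (lo, c)
    return np.minimum((lo + hi) / 2, mu)

Sigma = np.zeros((2, 2))                  # error covariance Sigma(t)
for t in range(50):
    P = A @ Sigma @ A.T + Sw              # prediction-error covariance
    mu, U = np.linalg.eigh(P)             # eigendecomposition of P
    delta = waterfill(mu, D)
    rate = 0.5 * np.log2(mu / delta).sum()
    Sigma = U @ np.diag(delta) @ U.T      # post-coding error covariance

print(rate, np.trace(Sigma))              # steady-state rate; trace equals D
```

After a few iterations the rate settles to a steady-state value, and the trace of the error covariance matrix equals the budget D at every step.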
The evolution of the error covariance matrix is explained in Fig. 5.17. The encoding and decoding procedures are given as follows:
1. The sensor receives x(t) and then computes the innovation e(t) = x(t) − E[x(t)|y(0 : t − 1)].
2. The sensor computes the decoupling unitary matrix U(t) and obtains the decoupled components e′(t) = U(t)e(t).
3. The sensor uses the encoding approach for the scalar Gaussian source to encode each component of e′(t) and sends out the signal through the communication channel.
4. The estimator receives the signals of each component and reconstructs the transferred innovation. The estimation is denoted by ê′(t).
5. The estimator estimates the innovation by ê(t) = UT(t)ê′(t) and then obtains the estimation x̂(t) = E[x(t)|y(0 : t − 1)] + ê(t).
When the distortion is sufficiently small, we have c(t) < μn(t) for all n, and hence

δn(t) = D/d, n = 1, …, d,

and thus the error covariance matrix evolution is given by

Σ(t) = (D/d)I.

Based on the above equality, we have

R(t) = (1/2) log2 det((D/d)AAT + Σw) − (d/2) log2(D/d).

Hence the total rate is given by

R = (1/T) Σt=0,…,T−1 R(t).

Then when the distortion D is small and as T tends to infinity, we have

R^{SRD}(D) = (1/2) log2 det((D/d)AAT + Σw) − (d/2) log2(D/d).
When the communication channel is digital and noiseless, it is not matched to the information source. Since the communication channel is digital, it is necessary to quantize the information source and then transmit the discrete bits through the digital channel. For the sequential rate-distortion function, we need a sequential quantizer as follows [14]:
For practical applications, we consider the following “operational sequential rate-distortion function,” which explicitly considers the quantizer structures:
Intuitively, the operational sequential rate-distortion function minimizes the rate of the quantized reconstruction while keeping the expected distortion at each time below D. However, it is more difficult to compute the exact value of the operational sequential rate-distortion function, since it involves the structures of the quantizers. We can use the sequential rate-distortion function as an approximation. It is easy to verify that the sequential rate-distortion function provides a lower bound for the operational sequential rate-distortion function, since the mutual information between the source and the reconstruction is upper bounded by the entropy of the quantizer output.
Moreover, Ref. [14] has shown that, for very low distortion (thus very high coding rate), the operational sequential rate-distortion and sequential rate-distortion functions are infinitesimally close to each other: