In this chapter, the requirements of communication capacity are studied. Both deterministic and stochastic physical dynamics are considered. Various approaches in different setups are taken, such as entropy-based analysis and anytime capacity. Finally, related topics in statistical mechanics are discussed.
Keywords: Stability; Capacity; Topological entropy; Cybernetics; Communication complexity; Entropy propagation
In this chapter, we study the communication capacity requirements (usually in terms of bits) for controlling the physical dynamics in cyber-physical systems (CPSs). This is the first step of communication network design in CPSs, since we need to understand how much communication is needed before we design the details of the communication network.
There is no unified framework to study communication requirements, since this is still an open question. Moreover, the problem may be formulated in different ways for different contexts. In this chapter, we will consider the following methodologies and contexts:
• Deterministic physical dynamics with uncertain initial states, which is appropriate for modeling physical dynamics with a sudden random perturbation but deterministic subsequent evolution: topological entropy is used to describe the communication requirements. The introduction mainly follows the work of Ref. [11].
• State estimation of stochastic systems, which is appropriate to model a system subject to random perturbations and a control policy of separated estimation and control: We study the communication requirements for reliably estimating the system state in stochastic systems. The introduction mainly follows the work of Ref. [14].
• Stability control for stochastic systems: It has been found that the anytime capacity measures the capability of communication channels for feedback control to stabilize the physical dynamics [13]. We provide a brief introduction to the concept of anytime capacity and the corresponding applications.
• Shannon entropy-based approach: The above approaches can provide the communication requirements for system stability. However, it is useful to further study the communication requirements subject to different levels of control precision (e.g., the disorderliness of the system state). We adopt the concept of entropy and the intuition from the second law of thermodynamics [15] to study the communication requirements if a certain entropy of the physical dynamics needs to be achieved.
• Communication complexity: Here we study the case of two agents wanting to achieve the same desired system state and consider it as a distributed computing problem. We follow the study in Ref. [70] to apply the theory of communication complexity to obtain bounds for the communication requirements.
• Thermodynamics argument: We consider a physical system governed by thermodynamics laws, and use the theory of nonequilibrium statistical mechanics to study the impact of communications on entropy generation. This mainly originates from Ref. [71].
The contents of the different sections are summarized in Table 5.1.
Table 5.1
Summary of Methodologies and Setups of the Studies in Different Sections
Section | Dynamics Types | Communication Type | Key Concept | Goal | Source |
5.2 | Deterministic | Sensor to controller | Topological entropy | Stability | [11] |
5.3 | Stochastic | Sensor to controller | Rate distortion | Estimation precision | [14] |
5.4 | Stochastic | Sensor to controller | Anytime capacity | Stability | [13] |
5.5 | Stochastic | Sensor to controller | Shannon entropy | Control precision | [15] |
5.6 | Deterministic | Controller to controller | Communication complexity | Achieving desired state | [70] |
5.7 | Thermodynamics | Sensor to controller | Fluctuation theorem | Entropy generation | [71] |
We denote by x the N-dimensional system state. For the generic case, the system state of physical dynamics evolves in the following manner:
for the continuous-time case, where the functions f and g represent the evolution of physical dynamics and the observation mechanism, and
for the discrete-time case.
A simpler but very useful model is that of linear dynamics, where the physical dynamics evolve as follows. For the continuous-time case we have ẋ(t) = Ax(t) + Bu(t) + w(t) and y(t) = Cx(t) + v(t),
while for the discrete-time case, we have x(t + 1) = Ax(t) + Bu(t) + w(t) and y(t) = Cx(t) + v(t), where w and v denote the process and observation noises (absent in the deterministic case).
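As a concrete illustration of the discrete-time linear model, the following sketch simulates x(t + 1) = Ax(t) + Bu(t) + w(t) with a noisy scalar observation; all matrices, noise levels, and the naive feedback law are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal simulation of the discrete-time linear model
#   x(t+1) = A x(t) + B u(t) + w(t),  y(t) = C x(t) + v(t)
# (all matrices below are illustrative assumptions).
A = np.array([[1.1, 0.2],
              [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

x = np.array([1.0, -1.0])
ys = []
for t in range(50):
    u = np.array([-0.3 * x[1]])                  # a naive feedback on one mode
    y = C @ x + 0.01 * rng.standard_normal(1)    # noisy observation
    ys.append(float(y[0]))
    x = A @ x + B @ u + 0.01 * rng.standard_normal(2)  # process noise w(t)

print(len(ys), x.shape)
```
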
We begin with the case of a deterministic dynamical system, in which there is no random perturbation in the system state evolution and the observation mechanism. For simplicity, we consider only the discrete-time system. Then the system dynamics can be described as
If everything is deterministic, then there is no need for communications, since we can calculate the control action in advance. In this section, we assume that the initial system state x(0) is unknown to us and the distribution of x is unknown. Then communications are needed to convey the information about x(0) from the observation and thus estimate the current system state x(t). The uncertainty incurred by the initial state x(0) will be transformed by the system dynamics and thus creates a need for communications. If the uncertainty in the system state is quickly removed (e.g., the system state converges to a deterministic value in open-loop control), then there is no need for communications in order to stabilize the system dynamics; in this case, we say that the system is simple. If the uncertainty in the system dynamics is amplified, then we refer to the system as complex. The complexity of the physical dynamics is measured by the topological entropy, which will be explained subsequently. As we will see, the topological entropy creates a fundamental limit for the communication capacity requirement. Consider the illustrations in Fig. 5.1, which show the state trajectory in two-dimensional phase space; we observe that the linear case is much simpler than the Lorenz system.
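The contrast between simple and complex dynamics can be illustrated numerically: under a contracting linear map two nearby initial states merge, while under a chaotic map their separation is amplified. The logistic map below stands in for a chaotic system such as the Lorenz dynamics; the maps, initial states, and tolerances are illustrative assumptions.

```python
# "Simple" vs "complex" dynamics: a stable linear map kills the initial
# uncertainty (no communication needed to track the state), while a
# chaotic map amplifies it.  Illustrative sketch.
def linear(x):   return 0.5 * x              # |slope| < 1: contracting
def logistic(x): return 4.0 * x * (1.0 - x)  # chaotic on [0, 1]

eps = 1e-9
a, b = 0.2, 0.2 + eps       # nearby initial states, linear map
c, d = 0.2, 0.2 + eps       # nearby initial states, logistic map
max_sep = 0.0
for _ in range(60):
    a, b = linear(a), linear(b)
    c, d = logistic(c), logistic(d)
    max_sep = max(max_sep, abs(c - d))

print(abs(a - b) < eps)     # True: uncertainty shrank ("simple")
print(max_sep > 1e-3)       # True: uncertainty amplified ("complex")
```
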
In this subsection, we define topological entropy. There are three equivalent definitions, which will be introduced separately. For all these definitions, we consider a topological dynamical system, which is represented by the triplet (X, T, S). X is a metric space (namely, a distance is defined for any pair of points), in which the metric is denoted by d. We assume that X is compact. We define a transformation T which maps from X to X. Moreover, we assume that T is continuous; i.e., the preimage under T of any open set is open.
We define the nth order ball of point x ∈ X, denoted by Bn(x, ϵ), as the ball around x with radius ϵ with respect to metric dn defined as
Fig. 5.2 shows a trajectory beginning from x2 that stays within Bn(x1, ϵ).
We say a set F is (n, ϵ)-spanning if it intersects every (n, ϵ)-ball (i.e., intersecting Bn(x, ϵ) for every x). Then we define
Then we define the spanning orbit-based topological entropy as follows, where the subscript s denotes “spanning”:
We say a subset F of X is (n, ϵ)-separated if dn(xi, xj) > ϵ for all xi, xj ∈ F with xi ≠ xj. Since X is compact, every (n, ϵ)-separated set has finitely many elements, which is illustrated in Fig. 5.3. We denote by s(n, ϵ) the maximum cardinality over all (n, ϵ)-separated sets, namely
Then we define the separated orbit-based topological entropy as follows:
The cover-based definition of topological entropy does not require the definition of a metric; hence it is valid even in spaces without a metric structure. We define a cover of X as a family of open sets whose union contains X. Since X is compact, we can always find a finite subfamily of any open cover that still covers X. We say that a cover β is a subcover of a cover α if every element of β is also an element of α. The minimum cardinality of a finite subcover of α is denoted by N(α).
We define the join of two covers α and β, denoted by α ∨ β, as
We further define
Based on the above definitions, we can now define the cover-based topological entropy:
The above three definitions of topological entropy look quite different, focusing on different aspects of the dynamical system. Interestingly, they are equivalent in metric spaces:
The proof is highly nontrivial. The details can be found in chapter 6 of Ref. [12]. Due to their equivalence, we use only the notation hs(T) in the subsequent discussion.
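A brute-force sketch of the separated-set definition: for the doubling map T(x) = 2x mod 1, whose topological entropy is 1 bit per step, we greedily build maximal (n, ϵ)-separated sets on a grid of initial states and watch (1/n) log2 s(n, ϵ) approach the entropy from above. The grid size and ϵ are illustrative assumptions.

```python
import math

# Greedy construction of (n, eps)-separated sets for the doubling map
# T(x) = 2x mod 1 (topological entropy: 1 bit/step).  Illustrative sketch.
def orbit(x, n):
    xs = []
    for _ in range(n):
        xs.append(x)
        x = (2.0 * x) % 1.0
    return xs

def dn(ox, oy):
    # the metric d_n: worst per-step circle distance along two orbits
    return max(min(abs(p - q), 1.0 - abs(p - q)) for p, q in zip(ox, oy))

def separated_count(n, eps, grid=1000):
    chosen = []
    for i in range(grid):               # greedy maximal separated set
        o = orbit(i / grid, n)
        if all(dn(o, oc) > eps for oc in chosen):
            chosen.append(o)
    return len(chosen)

for n in (2, 4, 6):
    s = separated_count(n, 0.1)
    print(n, s, round(math.log2(s) / n, 2))   # ratio decreases toward 1
```
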
As we have seen, the topological entropy measures the complexity of the transform in the dynamical system. Intuitively, a more complex dynamical system requires more communications for estimating and controlling the dynamics, since the controller needs more information to obtain the precise status. In this subsection, we will prove that the topological entropy provides a tight bound for the communication requirements in deterministic dynamical systems having uncertain initial states. The argument follows chapter 2 of Ref. [11].
The model of the CPS is given in Fig. 5.4. We consider the following system dynamics:
which is a simpler version of Eq. (5.1). The initial value x(0) is unknown. An encoder observes the system state and sends out a message h(jT) every T time slots, where j is the index of the message. The total number of possible messages is denoted by L. Hence the average transmission rate is given by (log2 L)/T bits per time slot. If the capacity of the communication channel is R, then we have (log2 L)/T ≤ R.
The transmitted message is essentially a function of the observation history, i.e.,
where the subscript j in Fj means that the encoding mechanism can be time varying.
Upon receiving a message from the encoder, the controller decodes it and carries out the following possible actions:
• System state estimation: The controller estimates the system states between times (j − 1)T + 1 and jT (i.e., the system states in the period covered by the previous message). The outcome of the estimator is given by
where the subscript j in Gj means that the estimation algorithm can be time varying.
• System state control: The controller computes the control action in the next T time slots (i.e., before receiving the next message), namely u(jT + 1), …, u((j + 1)T). The outcome of the controller is given by
where the subscript j in Gj means that the control algorithm can be time varying.
For simplicity, we assume that there is no transmission error during the communications. This is reasonable if powerful error correction codes are applied and a single sparse error causes only negligible impact on the dynamics. Hence the limit of communications is focused only on the capacity, not the reliability.
Then the question is: How much communication is needed for a precise system state estimation or a stable system dynamics?
We first focus on the system state estimation. We define the observability of the system as follows:
For the observability of the system, we have the following conclusion (Theorem 2.3.6 in Ref. [11]):
Here we provide an intuitive explanation of the encoding and decoding mechanism to achieve observability, as well as the reason for the unobservability when the transmission rate is too low. A rigorous proof can be found in chapter 2 of Ref. [11].
• R < hs(T): Assume, for contradiction, that a coder-decoder pair can make the system observable. Then for n time slots and any ϵ > 0, the decoder can output at most 2^⌈nR⌉ distinct estimated orbits in time [0, n − 1], and by the definition of observability these orbits form an (n, ϵ)-spanning set. Then the topological entropy of the system will be no larger than lim sup (1/n) log2 2^⌈nR⌉ = R < hs(T), which contradicts the assumption that the topological entropy is hs(T).
• R > hs(T): Due to the definition of hs(T), for any n > 1 and ϵ > 0, we can always find an (n, ϵ)-spanning set with cardinality no larger than 2^nR in the state space. Then we can simply transmit the index of the corresponding point in the spanning set (thus taking a transmission rate no larger than R) and achieve an error less than ϵ. This encoding procedure is illustrated in Fig. 5.5.
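The spanning-set encoding idea can be sketched numerically for the doubling map (entropy 1 bit per step): the encoder quantizes the initial state to a k-bit codebook point, and since each step doubles the quantization error, roughly one extra bit is needed per step of prediction horizon, matching the entropy. The map and bit budgets are illustrative assumptions.

```python
# Spanning-set encoding sketch for the doubling map T(x) = 2x mod 1:
# quantize x(0) to k bits, reconstruct the orbit, and measure the worst
# orbit error over n steps.  Each step doubles the quantization error,
# so the error behaves like 2**(n - 1 - k).
def orbit(x, n):
    out = []
    for _ in range(n):
        out.append(x)
        x = (2.0 * x) % 1.0
    return out

def circle_dist(a, b):
    return min(abs(a - b), 1.0 - abs(a - b))

x0, n = 0.3141592, 8
for k in (8, 12, 16):                       # bits spent on encoding x(0)
    xhat = round(x0 * 2**k) / 2**k          # index in a 2**k-point codebook
    err = max(circle_dist(a, b)
              for a, b in zip(orbit(x0, n), orbit(xhat, n)))
    print(k, err)    # error shrinks by ~2**-4 for every 4 extra bits
```
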
The communication requirements for control are more complicated. We focus on only the special case of linear systems without noise and with direct observation of the system state:
The control action is for the purpose of system stabilization, which is defined as follows:
Intuitively, that the system is stabilizable means that we can arbitrarily control the system state given sufficient time. The following two assumptions are subsequently made:
• The initial state x(0) lies in a compact set which has the origin as an interior point; i.e., there exists a δ > 0 such that ∥x(0)∥ ≤ δ implies that x(0) belongs to this set.
• The system (A, B) is stabilizable given a perfect system state feedback.
Then the following theorem provides conditions for the stabilizability of the linear system:
The first conclusion (R < hs(A)) in the above theorem can be obtained from contradiction, as illustrated in Fig. 5.6. A detailed proof can be found in Section 2.7 of Ref. [11]. Here we provide a sketch of the proof. We assume that the linear system is still stabilizable when R < hs(A). Since R < hs(A), we can find a constant ϵ > 0 such that
where s(k, ϵ) denotes the maximum cardinality of a (k, ϵ)-separated set. This implies that there is a (jT, ϵ)-separated set S of cardinality N such that
and for any two different points x1 and x2 in S satisfying
On the other hand, we set ϵ0 = ϵ/2. Since the system is stabilizable, we can find, for each element x in S, a sequence of control actions such that the resulting trajectory satisfies Eq. (5.23) with respect to ϵ0. There are no more than 2^jRT possible control action sequences. Hence, if the cardinality of S exceeds 2^jRT, we can always find a sequence of controls, u*(0), …, u*(jT), and two different points x1 and x2 in S that are both well controlled by the sequence {u*}; i.e., both x1′(t) = x1(t) + xu(t) and x2′(t) = x2(t) + xu(t) satisfy Eq. (5.23) with respect to ϵ0. Then it is easy to verify that the distance between the two orbits is at most 2ϵ0 = ϵ. This contradicts the assumption that S is (jT, ϵ)-separated. Hence the cardinality of the message set cannot be less than s(jT, ϵ).
The second conclusion (R > hs(A)) in the theorem can be derived from the conclusions in the subsequent treatment of the optimal control of linear systems.
In the previous discussion, the feedback is to stabilize the system, regardless of the corresponding cost (e.g., a very large control action u). In the optimal control of linear systems, the costs of both system state magnitude and control action power are considered; i.e., the system cost with initial state x(0) is given by
where both Q and R are positive definite matrices.
When the feedback is perfect (i.e., there is no constraint on the communication rate), it is well known that the optimal feedback control is given in [72]
where
and the matrix P satisfies the following equation:
The optimal cost function is given by
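With perfect feedback, the LQR solution above can be computed by iterating the Riccati recursion to a fixed point P and then forming the gain; a minimal sketch with illustrative matrices (not values from the text):

```python
import numpy as np

# Perfect-feedback LQR sketch: value-iterate the Riccati recursion
#   P <- Q + A'P(A - BK),  K = (R + B'PB)^{-1} B'PA,
# then check that the optimal feedback u = -Kx stabilizes the loop.
# Matrices are illustrative assumptions.
A = np.array([[1.2, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)            # state cost weight
R = np.array([[1.0]])    # control cost weight

P = np.eye(2)
for _ in range(500):     # iterate to the algebraic Riccati fixed point
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)

K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
rho = max(abs(np.linalg.eigvals(A - B @ K)))   # closed-loop spectral radius
print(rho < 1.0)         # True: the optimal feedback stabilizes the loop
```

For production use, a dedicated solver (e.g., a discrete algebraic Riccati equation routine) would replace the naive iteration.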
Now we consider the constraint on the communication rate; i.e., R is bounded. We define the solvability of the optimal control as follows [11]:
The conclusion on the communication requirements for the solvable optimal control is given in the following theorem:
The conclusion for R < hs(A) is a straightforward conclusion of Theorem 3, since the system is not even stabilizable when R < hs(A). The proof for the case of R > hs(A) is much more involved (note that the corresponding conclusion in Theorem 3 is merely a special case). A rigorous proof can be found in Section 2.7 of Ref. [11]. Here we provide a sketch of the proof. First we choose a reasonable period T. Then we consider the system:
We can find a set Ω of initial points with cardinality 2^((T+1)α), where hs(A) < α < R, with the following property: For any initial state x(0), we can always find a point in Ω such that the trace in Eq. (5.34) triggered by this point is very close to that triggered by x(0) within the time duration [0, T]. For each point in Ω, we can compute the corresponding control actions within the time period [0, T], namely u*(0), …, u*(T − 1).
Then we define the following encoder and controller pair for time interval [0, T − 1]:
• Encoder: Given the initial state x(0), we choose the corresponding point in Ω such that the traces are sufficiently close to each other in [0, T − 1]:
• Controller: In the time interval [0, T − 1], the control action is set as the optimal control action for the chosen point in Ω, namely
For the jth time period, namely [(j − 1)T, jT − 1], we can use the same encoding and control mechanism by scaling the system state and control actions, namely
and
The whole procedure is illustrated in Fig. 5.7.
The above discussions show the importance of the topological entropy on the communication requirements for controlling the physical dynamics in a CPS. Hence it is very important to evaluate the topological entropy of a given dynamical system.
The topological entropy of linear systems is fully explained in the following theorem (the rigorous proof is given in Theorem 2.4.2 of Ref. [11]):
An immediate conclusion is that, when all the eigenvalues of A are within the unit circle, the corresponding topological entropy is zero. Hence the corresponding communication requirement is zero; i.e., no communication is needed for observing or stabilizing the physical systems. This is obvious since the system state will converge to zero spontaneously.
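For a linear map x ↦ Ax, the topological entropy (in bits) equals the sum of log2 |λ| over the eigenvalues of A outside the unit circle, and is zero when all eigenvalues lie inside; a minimal sketch:

```python
import math
import numpy as np

def h_top_linear(A):
    """Topological entropy (bits) of x -> Ax: sum of log2|lambda| over
    eigenvalues outside the unit circle (zero if all are inside)."""
    return sum(max(0.0, math.log2(abs(l))) for l in np.linalg.eigvals(A))

print(h_top_linear(np.diag([0.5, 0.9])))        # 0.0: stable, no bits needed
print(h_top_linear(np.array([[2.0, 0.0],
                             [0.0, 0.5]])))     # 1.0: one doubling mode
```
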
For the generic case of dynamical systems, there is no systematic approach to computing the topological entropy. The existing approaches can be categorized into the following two classes:
• Consider some special types of dynamics, such as one-dimensional piecewise monotone mappings [73].
• Discretize the mapping and then use the approaches for calculating the topological entropy of symbolic dynamics.
In this book, we provide a brief introduction to the second approach by following the argument in Ref. [74]. We consider a compact topological space M and partition it into a set of subsets Q = {A1, …, Aq}. Consider a continuous mapping T on M and a discrete time period {0, …, t − 1}. We define
namely the set of all possible t-strings, each of which represents a series of regions in Q that a trajectory of T passes. An example is given in Fig. 5.8, where the trajectory generates a 6-string (1, 4, 2, 6, 7, 3).
Then we define the following quantity:
We further define the concept of generating partition:
Then we obtain the following conclusion that provides bounds for the topological entropy of T:
The proof of the theorem can be found in Ref. [74].
Based on this theorem, an algorithm for computing an upper bound of the topological entropy was proposed in Ref. [74], consisting of the following four steps:
• Step 1: Select a coarse partition of M, which is denoted by A.
• Step 2: Select a partition B = {B1, …, BK} that is much finer than A. Each element in A is the union of a set of elements in B. Then we define a topological Markov chain. The transition matrix is given by
which means that Bij equals 1 if Bi and Bj can be two successive regions visited by a trajectory of T. A sequence (b0, …, bN−1) is called a B-word if the transition matrix entry equals 1 for every consecutive pair (bi, bi+1). Each B-word corresponds to an A-word (a0, …, aN−1), where Aai is the element of A that contains Bbi. Then we define the set of A-words as
It is easy to verify
which implies
It has been shown that h(B, A) converges to h(T, A) if the diameter of B tends to zero.
• Step 3. The set of all B-words forms a sofic shift, whose detailed definition can be found in Ref. [74]. First a directed labeled graph can be constructed, in which each node corresponds to one element in B and an edge exists from node i to node j if Bij = 1. Moreover, if Bi is contained in α ∈ A, we label the edge ij (if it exists) as α. The following example from Ref. [74] illustrates the procedure:
• Step 4. We compute the entropy of the sofic shift, which is shown to be equal to the topological entropy of the corresponding Markov chain. To that end, we need to find a new graph G′(B, A) satisfying
• All outgoing edges of the same node have different labels.
The following algorithm from Ref. [74] has been proposed to find such a graph G′(B, A), which is called a reduced right-resolving presentation of G:
1. Begin from an empty graph R. Add a single node in G(B, A) to R.
2. If there is at least one hyper node without outgoing hyper edges, choose one of them and label it by H. Otherwise, go to Step 6.
3. Since the hyper node H consists of multiple nodes in G(B, A), define H′ as the set of all nodes in G(B, A) reached by the outgoing edges of the nodes in H that have the same label (say, Ai). Define a new hyper node H′ and a hyper edge pointing from H to H′. Label it as Ai.
4. Repeat Step 3 for all possible labels in A.
5. Return to Step 2.
6. If a hyper node has no incoming hyper edges, remove it.
7. If a hyper node is removed, return to Step 6. Otherwise, stop and output R.
Consider the example in Fig. 5.9. If we begin from node B1, then we label B1 as H1. Since all the outgoing edges of B1 are labeled A1, in Step 3 we form a hyper node H2 consisting of B1 and B2 (since the two outgoing edges reach B1 and B2). Since H2 does not have any outgoing hyper edges, we form a hyper node H3 consisting of B1, B2, B3, and B4. Then we consider H3, whose outgoing edges have two labels, A1 and A2. For A1, we form a hyper node H4 consisting of B1, B2, and B3; for A2, the outgoing edge returns to H4. This results in the graph shown in Fig. 5.10. Finally, we remove the hyper node H1.
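The hyper-node procedure above is essentially a subset construction on the labeled graph; a minimal sketch on a tiny illustrative graph (not the one in Fig. 5.9) is given below. Note that this version creates only reachable hyper nodes, so the pruning of Steps 6 and 7 is implicit.

```python
# Subset-construction sketch of the reduced right-resolving presentation:
# every hyper node of the result has at most one outgoing edge per label.
def right_resolving(edges, start):
    """edges: dict node -> list of (label, target).
    Returns dict hyper-node(frozenset) -> {label: hyper-node}."""
    R, todo = {}, [frozenset([start])]
    while todo:
        H = todo.pop()
        if H in R:
            continue
        out = {}
        for node in H:                      # merge same-label targets
            for label, tgt in edges[node]:
                out.setdefault(label, set()).add(tgt)
        R[H] = {lab: frozenset(t) for lab, t in out.items()}
        todo.extend(R[H].values())          # explore new hyper nodes
    return R

# Illustrative graph: node 1 has two outgoing edges with the same label 'a'.
edges = {1: [('a', 1), ('a', 2)],
         2: [('a', 1), ('b', 2)]}
R = right_resolving(edges, 1)
for H, out in sorted(R.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(H), {lab: sorted(t) for lab, t in sorted(out.items())})
```
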
From the graph R, we can also obtain its adjacency matrix. The example in Fig. 5.10 has the matrix R given by
The following theorem [74] relates the eigenvalues of R to the topological entropy that we want to compute:
Note that although the above procedure provides a systematic approach for the concrete calculation of topological entropy, it does not completely solve the problem, for the following reasons: (1) It is still not clear how to verify that a partition is generating. (2) It is still not clear how to choose the radius of the partitions. Hence we can only choose as small a radius as the available computational capability permits.
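Steps 2-4 can be sketched end to end for a map whose entropy is known: partition [0, 1) into q cells, record the observed cell-to-cell transitions of T(x) = 3x mod 1 (entropy log2 3) along a long trajectory, and take log2 of the largest eigenvalue of the resulting transition matrix. The partition size, initial state, and trajectory length are illustrative assumptions.

```python
import numpy as np

# Estimate the topological entropy of T(x) = 3x mod 1 (true value
# log2 3 ~ 1.585 bits) from the transition matrix of the induced
# topological Markov chain on q uniform cells.
q, steps = 6, 100_000
M = np.zeros((q, q))
x = 0.1234567
for _ in range(steps):
    i = int(q * x)              # current cell
    x = (3.0 * x) % 1.0
    j = int(q * x)              # next cell
    M[i, j] = 1.0               # record that this transition occurs

h = float(np.log2(max(abs(np.linalg.eigvals(M)))))
print(round(h, 3))              # close to log2(3) ~ 1.585
```
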
Many control systems use the structure of separated estimation and control, as illustrated in Fig. 5.11, in which an estimator provides an estimation of the system state, and then a controller considers the estimate as the true system state and computes the corresponding control action. However, such a separated structure may not be optimal, although it is more feasible in practice. On the other hand, source coding is needed for the communication of observations y in order to estimate the system state x, which is usually lossy when the system state is continuous. In traditional communication systems, the corresponding information-theoretic communication requirement for lossy source coding is based on infinite block length and thus infinite delay, which is infeasible in the context of real-time control. Hence in this section, we follow the research in Ref. [14] and study the following two fundamental questions:
• When the feedback is passed through a communication channel with limited capacity, is the structure of separated estimation and control still optimal?
• When the estimation is real-time with limited delay, what are the communication requirements for the system state estimation?
We first introduce the system model, which is adopted in Ref. [14]. The overall system has the same structure as that in Fig. 5.4.
We assume that the system state dynamics is described by the linear system dynamics in Eq. (5.4). The noise {w(t)}t is a sequence of i.i.d. Gaussian distributed random variables. The initial state x(0) is assumed to be Gaussian with zero expectation and covariance matrix Σx(0). For simplicity, we assume that the system state can be observed directly by the encoder at the sensor.
The communication channel is assumed to be stochastic, characterized by stochastic kernels P(dy|x), where y and x are the output and input of the channel. For simplicity we assume that the channel is time-invariant and memoryless. The following two special channels are discussed in Ref. [14]:
• Noiseless digital channel with rate R: The input and the output of the channel are the same. There are a total of 2^R messages.
• Memoryless vector Gaussian channel: Both the input and output alphabets are d-dimensional real spaces. The output of the channel is given by
where v(t) is a series of i.i.d. Gaussian distributed random vectors with zero expectation and covariance matrix Σv. The input is average power constrained; i.e., E[∥x∥2] ≤ P0.
At each time slot t, the sensor observes the system state x(t), generates a channel input symbol, and transmits it through the communication channel. The controller receives the channel output symbol y(t), decodes it, and obtains a system state estimation. Based on this estimation, the controller computes the control action u(t).
The following three possible information patterns are considered, where Enc denotes the function of encoding:
• The encoder output is a function of all previous knowledge, namely
• The encoder does not know the channel and decoder outputs, namely
• The encoder uses only the current system state to encode the message:
The decoder uses a stochastic mechanism for decoding, namely generating the system state estimation according to the following conditional distribution:
namely, it is dependent on the previous received communication symbols, the previous system state estimations and control actions.
The controller is also stochastic: the control action is generated according to a conditional probability that depends only on the current system state estimation.
Now we study whether the communication channel will affect the optimality of the separated structure of system state estimation and control. In traditional linear quadratic Gaussian (LQG) control with perfect feedback, the certainty equivalence guarantees the optimality of the following control law:
and
which represent separated control and system estimation. We define the estimation error as
We say that the control has no dual effect if the estimation error covariance conditioned on the channel outputs coincides with that of the uncontrolled system (i.e., the system in which no feedback control is applied), which means that the error covariance matrix is independent of the control action taken by the controller. An intuitive explanation of the no dual effect is given in Ref. [14]: the control action can be used both to control the system state evolution and to probe the system in order to estimate the system state; the no dual effect means that the control action is fully used to control the system state (since the system state estimation error is irrelevant to the control action). The following theorem shows that the no dual effect is equivalent to the certainty equivalence [14]:
The following lemma provides a sufficient condition for the no dual effect, which can be used to judge whether a control action with certainty equivalence is optimal:
Based on this lemma, Ref. [14] discussed the no dual effect in the two communication channels introduced above:
• Noiseless digital channel: Since the input and the output of the communication channel are the same (while the number of bits is limited), the sensor can know exactly the control action that the controller will take. It is shown in Ref. [14] that the optimal encoder has the following form:
where qt is a quantizer. Note that the second equation is due to the sensor’s perfect knowledge of the control action. By applying induction, we can prove the following lemma:
A rigorous proof is given in Ref. [14]. Here we provide the proof for t = 1, as illustrated in Fig. 5.12. When t = 0, the claimed property obviously holds. Since the control action is completely determined by the received message, the uncontrolled system state, the channel output, and the state estimate form a Markov chain; subtracting the known control contribution from the state preserves this Markov structure.
By applying Lemma 1, we draw the conclusion that the control through a noiseless digital channel has no dual effect.
• Memoryless vector Gaussian channel: The optimal encoder for the vector Gaussian channel is still unknown. Ref. [14] restricted the encoder to be linear and deterministic:
where {Dnt}n=1, 2, 3, 4 are matrices of appropriate dimensions. Since the communication channel output is given by y(t) = x(t) + v(t), it can also be written as a linear combination of the initial state, dynamics noise, communication noise, and control actions:
where {Fnt}n=1, 2, 3, 4 are the corresponding matrices. The uncontrolled channel output can also be written as
since the controls are all zero. Then the uncontrolled channel outputs can be obtained from the actual ones by subtracting a linear function Gt u(0 : t − 1) of the controls, where Gt is an appropriate matrix. When estimating the uncontrolled system state, all the related information in y(0 : t) and u(0 : t − 1) is summarized in the uncontrolled channel outputs. Hence the condition in Lemma 1 is valid, and thus the control through the memoryless vector Gaussian channel also has no dual effect.
Based on the above arguments, we reach the conclusion that, for both the noiseless digital communication channels and memoryless vector Gaussian channels, the structure of separated system state estimation and control is still optimal in linear dynamics.
When the structure of separated system state estimation and control is optimal, the controller can use the same design as that in the case of perfect feedback (e.g., the LQG control). However, the system state estimation needs substantial revisions due to the limited communication capacity. We discuss the communication requirements for system state estimation at the controller in this subsection.
For the linear dynamics in Eq. (5.4), system state estimation can be considered as a lossy source coding problem, since the sensor encodes the observations and the decoder output always has errors except in trivial cases. Lossy source coding is traditionally studied using rate-distortion theory [33], a branch of information theory that characterizes the minimum communication rate needed to satisfy a distortion constraint at the decoder output. However, the information-theoretic argument is based on the assumption of infinite codeword length and thus infinite delays, which is reasonable for traditional data communications (e.g., one packet encoded as a codeword may have thousands of bits). Obviously such arguments are not suitable for CPSs, since the encoding procedure must have a limited delay. Hence in contrast to the block structure of traditional data communications, the encoding procedure in a CPS should have a sequential structure, as illustrated in Fig. 5.13.
Before we study sequential rate distortion, we provide a review of traditional rate-distortion theory. When the true information symbol is x while the recovered one is y, we denote the distortion by D(x, y). Then the rate-distortion function is defined as follows:
Intuitively, the rate-distortion function means the minimum average number of bits needed to achieve a distortion less than or equal to D. Meanwhile, the rate-distortion function can also determine whether a communication channel with a certain capacity can support the transmission of an information source with a given distortion bound. Roughly speaking, when transmitting the information of a Gauss-Markov process through a communication channel having capacity C, distortion D can be achieved if R(D) < C, while it cannot be achieved if R(D) > C. This dichotomy is illustrated in Fig. 5.14.
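The dichotomy can be checked numerically in the simplest scalar case: a memoryless Gaussian source with variance σ2 has R(D) = (1/2) log2(σ2/D) for D ≤ σ2, and an AWGN channel with signal-to-noise ratio snr has capacity C = (1/2) log2(1 + snr); a target distortion is supportable (in the classical block-coding sense) iff R(D) < C. The numbers below are illustrative.

```python
import math

# Scalar Gaussian sketch of the R(D)-vs-C dichotomy.
def rate_distortion(sigma2, D):
    """R(D) of a memoryless Gaussian source, squared-error distortion."""
    return 0.5 * math.log2(sigma2 / D) if D < sigma2 else 0.0

def awgn_capacity(snr):
    """Capacity of a scalar AWGN channel in bits per use."""
    return 0.5 * math.log2(1.0 + snr)

C = awgn_capacity(3.0)                 # 1 bit per channel use
print(rate_distortion(1.0, 0.5) < C)   # R = 0.5 bit   -> achievable
print(rate_distortion(1.0, 0.1) < C)   # R ~ 1.66 bits -> not achievable
```
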
Similarly to the traditional definition of rate-distortion function with infinite delay, Ref. [14] defined the sequential rate-distortion function as follows:
We notice that the sequential rate-distortion function has the following features:
• The necessary rate may be dependent on the time T.
• Only the causal information is considered in the mutual information, which is due to the requirement of sequential decoding.
• The expected distortion is no larger than D for each time slot, instead of being averaged over all time slots.
The following theorem shows the application of the sequential rate-distortion function in the lossy transmission of information sources [14]. Note that a sufficient condition is much more difficult to find.
Since the sequential rate-distortion function provides a lower bound for the channel capacity of the communication channel, it is important to calculate the sequential rate-distortion function for various information sources for the purpose of online system state estimation. We follow the argument in Ref. [14] to discuss the sequential rate-distortion function for a d-dimensional Gauss-Markov source where
and
We assume a Gaussian communication channel with dimension d.
The following lemma shows the necessary property for the infimizing channel in Eq. (5.68) in the definition of a sequential rate-distortion function:
Compared with the standard form P(dy(t)|x(0 : t − 1), y(0 : t − 1)), in the form P(dy(t)|x(t), y(0 : t − 1)) the distribution of y(t) depends only on the current input of the communication channel, instead of on all the history. Hence the recovered symbols have the following form:

y(t) = Fx(t) + Gy(0 : t − 1) + z(t),
where F is a d × d matrix, G is a d × (t − 1)d matrix, and {z(t)} is a series of independent Gaussian random vectors of dimension d.
Such a channel can be realized by a memoryless Gaussian channel with perfect feedback, which is illustrated in Fig. 5.15. The encoder obtains the innovation, which is given by

x̃(t) = x(t) − E[x(t)|y(0 : t − 1)].

Since the communication channel output y(0 : t − 1) can be sent to the encoder via the perfect feedback channel, the encoder can easily compute x̃(t) as above. The decoder receives y(t) = x̃(t) + z(t) and then obtains the system state estimation:

x̂(t) = E[x(t)|y(0 : t)].
The channel input x̃(t) should satisfy the power constraint:

E[x̃T(t)x̃(t)] = trace(Σx(t)|y(0:t−1)) ≤ P,

where Σx(t)|y(0:t−1) is the covariance matrix of the prediction error, namely

Σx(t)|y(0:t−1) = E[(x(t) − E[x(t)|y(0 : t − 1)])(x(t) − E[x(t)|y(0 : t − 1)])T].
For the realized channel, Ref. [14] has proved that
because
where the second equation is due to the form of the estimation in Eq. (5.72), and
where the second equation is because
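The feedback realization can be sanity-checked by simulation for a scalar Gauss-Markov source. All parameters, variable names, and the simple MMSE scaling at the decoder below are assumptions for illustration, not the exact construction of Ref. [14]:

```python
import numpy as np

# Sketch: scalar Gauss-Markov source x(t+1) = a*x(t) + w(t), estimated over a
# memoryless Gaussian channel with perfect feedback.  The encoder transmits
# the innovation; the decoder applies MMSE scaling to the channel output.
rng = np.random.default_rng(1)
a, sw2, sz2, T = 0.9, 1.0, 0.5, 200_000   # illustrative parameters

x, x_hat, sq_err, D = 0.0, 0.0, 0.0, 0.0
for t in range(T):
    x = a * x + rng.normal(0.0, np.sqrt(sw2))    # source evolution
    innov = x - a * x_hat               # innovation; feedback gives x_hat
    y = innov + rng.normal(0.0, np.sqrt(sz2))    # memoryless Gaussian channel
    s2 = a * a * D + sw2                # analytic innovation variance
    x_hat = a * x_hat + (s2 / (s2 + sz2)) * y    # MMSE update at the decoder
    D = s2 * sz2 / (s2 + sz2)           # analytic estimation-error variance
    sq_err += (x - x_hat) ** 2

print(sq_err / T, D)   # empirical vs analytic error variance (close)
```

The empirical squared error tracks the analytic recursion, illustrating that the innovation scheme turns state estimation into repeated transmission of a zero-mean Gaussian variable.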
Before we proceed to compute the sequential rate-distortion function, we provide a review of the traditional rate-distortion function. Consider a d-dimensional Gaussian source with independent samples and d × d covariance matrix Σx. The distortion function is given by D(x, y) = ‖x − y‖². Then we have [33]

R(D) = Σn=1,…,d (1/2) log2(λn/δn),

where {λn}n=1, …, d are the eigenvalues of Σx and

δn = min(c, λn),

where the constant c is chosen such that Σn=1,…,d δn = D. In particular, when d = 1, we have

R(D) = (1/2) log2(σx²/D), for D ≤ σx²,

where σx² is the variance of the source.
Hence δn can be considered as the portion of distortion assigned to the nth eigenvector of Σx; then the assignment of {δn} can be considered as water filling, as illustrated in Fig. 5.16.
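The water-filling assignment of {δn} can be sketched numerically as follows; the bisection on the water level c and the example eigenvalues are illustrative:

```python
import math

def reverse_waterfill(eigs, D, tol=1e-12):
    """Return (rate_bits, deltas) for a Gaussian vector source with covariance
    eigenvalues `eigs` and total squared-error distortion budget D."""
    lo, hi = 0.0, max(eigs)
    while hi - lo > tol:                  # bisect on the water level c
        c = (lo + hi) / 2
        if sum(min(c, lam) for lam in eigs) < D:
            lo = c
        else:
            hi = c
    c = (lo + hi) / 2
    deltas = [min(c, lam) for lam in eigs]
    rate = sum(0.5 * math.log2(lam / d) for lam, d in zip(eigs, deltas))
    return rate, deltas

print(reverse_waterfill([4.0, 1.0], 1.0))   # water level 0.5 -> rate 2 bits
```

With eigenvalues (4, 1) and budget D = 1, the water level is c = 0.5, each eigenvector receives distortion 0.5, and the total rate is 2 bits.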
The optimal channel to minimize the rate R given the distortion D is given as follows [75]. Two channels are defined:
• Backward channel P(dx|y): It is given by

x = y + δ,

where δ is the error, which is Gaussian distributed with zero mean and covariance matrix Σx|y and is independent of y. Hence the output y is also Gaussian distributed with zero mean and covariance matrix

Σy = Σx − Σx|y.
• Forward channel P(dy|x): It is given by

y = Hx + z,

where H = I − Σx|yΣx⁻¹, and z is Gaussian with zero mean and covariance matrix

Σz = HΣx|y.
To realize the forward channel P(dy|x), one can define an invertible matrix Γ such that ΓHΓ⁻¹ = I. Define b = Γy, a = Γx, and n = Γz. Then the channel b = a + n with power constraint E[aTa] ≤ trace(ΓΣxΓT) is matched to the information source.
We also need the concept of a matched channel. The stochastic kernel P(dy(t)|x(t), y(0 : t − 1)) can be considered as a communication channel with input x(t) and output y(t) [76]. If the real communication channel is equal to the channel that minimizes the rate given the distortion, then the communication channel is said to be matched to the information source. When the communication channel is matched to the source, there is no need for a structure with separated source encoder and channel encoder: the information can be sent in an uncoded manner without any loss in performance.
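A scalar sanity check of the matched-channel claim, under assumed variances: sending x uncoded over an AWGN channel (with MMSE scaling at the receiver) attains exactly the distortion σx²2^{−2C} promised by coding at the channel capacity C:

```python
import math

sx2, sz2 = 4.0, 1.0    # assumed source and channel-noise variances
# Uncoded: transmit x itself; the MMSE receiver attains this distortion.
D_uncoded = sx2 * sz2 / (sx2 + sz2)
# Coded benchmark: D = sx2 * 2^(-2C) at capacity C = 0.5*log2(1 + SNR).
C = 0.5 * math.log2(1.0 + sx2 / sz2)
D_coded = sx2 * 2 ** (-2 * C)
print(D_uncoded, D_coded)   # equal: no loss from uncoded transmission
```

The two distortions coincide for any choice of variances here, which is exactly the "no loss in performance" property of a matched channel.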
Then for the sequential rate-distortion function, we can convert it to the traditional rate-distortion expression. We begin with the scalar Gaussian source case. The distortion is given by D(x, y) = (x − y)². The encoder computes the innovation information:

x̃(t) = x(t) − E[x(t)|y(0 : t − 1)] = aδ(t − 1) + w(t),

where δ(t) = x(t) − x̂(t) is the estimation error. Since the error variance is equal to the distortion, namely D(t) = E[δ²(t)], the variance of the innovation information is given by

σ̃²(t) = a²D(t − 1) + σw²,

where σw² is the scalar notation of Σw.
In the above argument for the traditional rate-distortion function, we have shown that a scalar Gaussian variable with variance σ² can be reconstructed with distortion σ²2^{−2R} when a Gaussian channel having capacity R is matched to the source. Hence the distortions at different times can be obtained recursively as follows:

D(t) = (a²D(t − 1) + σw²)2^{−2R(t)}.
For the sequential rate-distortion function, we have D(t) = D for all t. Hence we have

R(t) = (1/2) log2(a² + σw²/D), t ≥ 1.

When T is sufficiently large, the impact of R(0) will vanish. Hence we have

R^{SRD}(D) = (1/2) log2(a² + σw²/D).
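The distortion recursion and its steady state can be verified numerically; the parameters below are illustrative:

```python
import math

a, sw2, D_target = 1.2, 1.0, 0.5   # illustrative (unstable) scalar source
# Steady-state rate that sustains distortion D_target at every step.
R = 0.5 * math.log2(a * a + sw2 / D_target)

D = 2.0                            # arbitrary initial distortion
for _ in range(100):
    D = (a * a * D + sw2) * 2 ** (-2 * R)   # distortion recursion
print(R, D)                        # D converges to D_target
```

Starting from any initial distortion, iterating D(t) = (a²D(t − 1) + σw²)2^{−2R} with this constant rate drives the distortion to the target value.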
Now we handle the generic case of a d-dimensional Gauss-Markov source. The covariance matrix of the distortion error δ(t) is denoted by Σ(t) = E[δ(t)δT(t)]. The evolution of the prediction-error covariance matrix can be easily shown to be

E[x̃(t)x̃T(t)] = AΣ(t − 1)AT + Σw.

Consider the unitary matrix U(t) that diagonalizes the matrix AΣ(t − 1)AT + Σw:

U(t)(AΣ(t − 1)AT + Σw)UT(t) = diag(μ1(t), …, μd(t)).

From the previous water-filling argument, to achieve the distortion D, the coding rate needs to be

R(t) = Σn=1,…,d (1/2) log2(μn(t)/δn(t)),

where

δn(t) = min(c(t), μn(t)),

where the constant c(t) is chosen such that Σn=1,…,d δn(t) = D.

Then by using the rate-distortion function for the scalar Gaussian source, the error covariance matrix of the d-dimensional source at time slot t is given by

Σ(t) = UT(t) diag(δ1(t), …, δd(t))U(t).
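A compact numerical sketch of this per-step water-filling recursion; the matrices A, Σw and the budget D below are assumed for illustration:

```python
import numpy as np

# Sketch: per-step reverse water-filling for a 2-D Gauss-Markov source
# x(t+1) = A x(t) + w(t), w ~ N(0, Sw); parameters are illustrative.
A = np.array([[1.1, 0.2], [0.0, 0.8]])
Sw = np.eye(2)
D = 0.4                                   # per-step distortion budget

def waterfill(mu, D, iters=200):
    """Distortions delta_n = min(c, mu_n) with sum(delta) = D (bisection)."""
    lo, hi = 0.0, float(mu.max())
    for _ in range(iters):
        c = (lo + hi) / 2
        lo, hi = (c, hi) if np.minimum(c, mu).sum() < D else (lo, c)
    return np.minimum((lo + hi) / 2, mu)

Sigma = np.zeros((2, 2))                  # error covariance Sigma(t)
for t in range(50):
    P = A @ Sigma @ A.T + Sw              # prediction-error covariance
    mu, U = np.linalg.eigh(P)             # eigendecomposition of P
    delta = waterfill(mu, D)
    rate = 0.5 * np.log2(mu / delta).sum()
    Sigma = U @ np.diag(delta) @ U.T      # post-coding error covariance

print(rate, np.trace(Sigma))              # steady-state rate; trace equals D
```

After a few iterations the rate settles to a steady-state value, and the trace of the error covariance matrix equals the budget D at every step.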
The evolution of the error covariance matrix is explained in Fig. 5.17. The encoding and decoding procedures are given as follows:
1. The sensor receives x(t) and then computes the innovation e(t) = x(t) − E[x(t)|y(0 : t − 1)].
2. The sensor computes the decoupling unitary matrix U(t) and obtains the decoupled components e′(t) = U(t)e(t).
3. The sensor uses the encoding approach for the scalar Gaussian source to encode each component of e′(t) and sends out the signal through the communication channel.
4. The estimator receives the signals of each component and reconstructs the transferred innovation. The estimation is denoted by ê′(t).
5. The estimator estimates the innovation by ê(t) = UT(t)ê′(t) and then obtains the estimation x̂(t) = E[x(t)|y(0 : t − 1)] + ê(t).
When the distortion is sufficiently small, we have c(t) < μn(t) for all n, and hence

δn(t) = D/d, n = 1, …, d,

and thus the error covariance matrix evolution is given by

Σ(t) = (D/d)I.

Based on the above equality, we have

R(t) = (1/2) log2 det((D/d)AAT + Σw) − (d/2) log2(D/d).

Hence the total rate is given by

R = (1/T) Σt=0,…,T−1 R(t).

Then when the distortion D is small and as T tends to infinity, we have

R^{SRD}(D) = (1/2) log2 det((D/d)AAT + Σw) − (d/2) log2(D/d).
When the communication channel is digital and noiseless, it is not matched to the information source. Since the communication channel is digital, it is necessary to quantize the information source and then transmit the discrete bits through the digital channel. For the sequential rate-distortion function, we need a sequential quantizer as follows [14]:
For practical applications, we consider the following “operational sequential rate-distortion function,” which explicitly considers the quantizer structures:
Intuitively, the operational sequential rate-distortion function minimizes the rate of the quantized reconstruction while keeping the expected distortion at each time below D. However, it is more difficult to compute the exact value of the operational sequential rate-distortion function, since it involves the structures of the quantizers. We can use the sequential rate-distortion function as an approximation. It is easy to verify that the sequential rate-distortion function provides a lower bound for the operational sequential rate-distortion function, since the mutual information between the source and the reconstruction is upper bounded by the entropy of the quantizer output.
Moreover, Ref. [14] has shown that, for very low distortion (thus very high coding rate), the operational sequential rate-distortion and sequential rate-distortion functions are infinitesimally close to each other: