Chapter 5

Communication capacity requirements

Abstract

In this chapter, the requirements of communication capacity are studied. Both deterministic and stochastic physical dynamics are considered. Various approaches in different setups are taken, such as entropy-based analysis and anytime capacity. Finally, related topics in statistical mechanics are discussed.

Keywords

Stability; Capacity; Topological entropy; Cybernetics; Communication complexity; Entropy propagation

5.1 Introduction

In this chapter, we study the communication capacity requirements (usually in terms of bits) for controlling the physical dynamics in cyber physical systems (CPSs). This is the first step of communication network design in CPSs, since we need to understand how much communication is needed before we design the details of the communication network.

5.1.1 Methodologies and Contexts

There is no unified framework to study communication requirements, since this is still an open question. Moreover, the problem may be formulated in different ways for different contexts. In this chapter, we will consider the following methodologies and contexts:

Deterministic physical dynamics with uncertain initial states, which is appropriate for modeling physical dynamics subject to a sudden random perturbation followed by a deterministic evolution: topological entropy is used to describe the communication requirements. The introduction mainly follows the work of Ref. [11].

 State estimation of stochastic systems, which is appropriate to model a system subject to random perturbations and a control policy of separated estimation and control: We study the communication requirements for reliably estimating the system state in stochastic systems. The introduction mainly follows the work of Ref. [14].

 Stability control for stochastic systems: It has been found that the anytime capacity measures the capability of communication channels for feedback control to stabilize the physical dynamics [13]. We provide a brief introduction to the concept of anytime capacity and the corresponding applications.

 Shannon entropy-based approach: The above approaches can provide the communication requirements for system stability. However, it is useful to further study the communication requirements subject to different levels of control precision (e.g., the disorderliness of the system state). We adopt the concept of entropy and the intuition from the second law of thermodynamics [15] to study the communication requirements if a certain entropy of the physical dynamics needs to be achieved.

 Communication complexity: Here we study the case of two agents wanting to achieve the same desired system state and consider it as a distributed computing problem. We follow the study in Ref. [70] to apply the theory of communication complexity to obtain bounds for the communication requirements.

 Thermodynamics argument: We consider a physical system governed by thermodynamics laws, and use the theory of nonequilibrium statistical mechanics to study the impact of communications on entropy generation. This mainly originates from Ref. [71].

The contents of the different sections are summarized in Table 5.1.

Table 5.1

Summary of Methodologies and Setups of the Studies in Different Sections

Section | Dynamics Type | Communication Type | Key Concept | Goal | Source
5.2 | Deterministic | Sensor to controller | Topological entropy | Stability | [11]
5.3 | Stochastic | Sensor to controller | Rate distortion | Estimation precision | [14]
5.4 | Stochastic | Sensor to controller | Anytime capacity | Stability | [13]
5.5 | Stochastic | Sensor to controller | Shannon entropy | Control precision | [15]
5.6 | Deterministic | Controller to controller | Communication complexity | Achieving desired state | [70]
5.7 | Thermodynamics | Sensor to controller | Fluctuation theorem | Entropy generation | [71]


5.1.2 Basic Models

We denote by x the N-dimensional system state. For the generic case, the system state of physical dynamics evolves in the following manner:

\dot{x}(t) = f(x(t), u(t), w(t)), \qquad y(t) = g(x(t), n(t)),   (5.1)

for the continuous-time case, where the functions f and g represent the evolution of physical dynamics and the observation mechanism, and

x(t+1) = f(x(t), u(t), w(t)), \qquad y(t) = g(x(t), n(t)),   (5.2)

for the discrete-time case.

A simpler but very useful model is that of linear dynamics, where the physical dynamics evolve as follows. For the continuous-time case we have

\dot{x}(t) = Ax(t) + Bu(t) + w(t), \qquad y(t) = Cx(t) + n(t),   (5.3)

while for the discrete-time case, we have

x(t+1) = Ax(t) + Bu(t) + w(t), \qquad y(t) = Cx(t) + n(t).   (5.4)
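For concreteness, the following minimal Python sketch simulates the discrete-time linear model of Eq. (5.4); the matrices A, B, C and the noise levels are illustrative values chosen for this sketch, not taken from the text.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative system matrices (assumed values, not from the text)
A = np.array([[1.2, 0.1],
              [0.0, 0.8]])   # one unstable mode (|1.2| > 1)
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])

def step(x, u):
    """One step of x(t+1) = A x(t) + B u(t) + w(t), y(t) = C x(t) + n(t)."""
    w = rng.normal(0.0, 0.1, size=2)      # process noise w(t)
    n = rng.normal(0.0, 0.05, size=1)     # observation noise n(t)
    y = C @ x + n                         # observation of the current state
    x_next = A @ x + B @ u + w            # state update
    return x_next, y

x = np.array([1.0, -0.5])                 # initial state x(0)
for t in range(5):
    u = np.array([0.0])                   # open loop: no control applied
    x, y = step(x, u)
    print(f"t={t}: y(t)={y}, x(t+1)={x}")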

5.2 Deterministic system: stability

We begin with the case of a deterministic dynamical system, in which there is no random perturbation in the system state evolution and the observation mechanism. For simplicity, we consider only the discrete-time system. Then the system dynamics can be described as

x(t+1)=f(x(t),u(t)),y(t)=g(x(t)).

si5_e  (5.5)

If everything is deterministic, then there is no need for communications since we can calculate the control action in advance. In this section, we assume that the initial system state x(0) is unknown to us and the distribution of x(0) is unknown. Then communications are needed to convey the information about x(0) from the observation and thus estimate the current system state x(t). The uncertainty incurred by the initial state x(0) will be changed by the system dynamics and thus will create a need for communications. If the uncertainty in the system state is quickly removed (e.g., the system state converges to a deterministic value in open-loop control), then there is no need for communications in order to stabilize the system dynamics; at this point, we say that the system is simple. If the uncertainty in the system dynamics is amplified, then we refer to the system as complex. The complexity of the physical dynamics is measured by the topological entropy, which will be explained subsequently. As we will see, the topological entropy creates a fundamental limit for the communication capacity requirement. Consider the illustrations in Fig. 5.1, which show the state trajectory in two-dimensional phase space; we observe that the linear case is much simpler than the Lorenz system.

Fig. 5.1 Comparison of simple and complex dynamics. (A) Linear transformation. (B) Lorenz system.

5.2.1 Topological Entropy

In this subsection, we define topological entropy. There are three equivalent definitions, which will be introduced separately. For all these definitions, we consider a topological dynamical system, represented by the triplet $(X, d, T)$: $X$ is a metric space (namely, a distance is defined for any pair of points), whose metric is denoted by $d$, and we assume that $X$ is compact. $T$ is a transformation mapping $X$ to $X$. Moreover, we assume that $T$ is continuous; i.e., $T^{-1}$ maps open sets to open sets.

Spanning orbit-based definition

We define the $n$th order ball of a point $x \in X$, denoted by $B_n(x, \epsilon)$, as the ball around $x$ with radius $\epsilon$ with respect to the metric $d_n$ defined as

d_n(x, y) = \max\{d(T^i x, T^i y),\ i = 0, \dots, n-1\}.   (5.6)

Fig. 5.2 shows that a track beginning from $x_2$ is within $B_n(x_1, \epsilon)$.

Fig. 5.2 Illustration of two close tracks.

We say a set $F$ is $(n, \epsilon)$-spanning if it intersects every $(n, \epsilon)$-ball (i.e., it intersects $B_n(x, \epsilon)$ for every $x \in X$). Then we define

r(n, \epsilon) = \min\{\#F : F \text{ is } (n, \epsilon)\text{-spanning}\}.   (5.7)

Then we define the spanning orbit-based topological entropy as follows, where the subscript s denotes “spanning”:

Definition 1

The topological entropy is then defined as

H_s(T, n, \epsilon) = \log r(n, \epsilon), \qquad h_s(T, \epsilon) = \lim_{n \to \infty} \frac{1}{n} H_s(T, n, \epsilon), \qquad h_s(T) = \lim_{\epsilon \to 0} h_s(T, \epsilon).   (5.8)
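As a simple worked example (not taken from Ref. [11]), consider the doubling map $Tx = 2x \bmod 1$ on the circle $X = [0, 1)$ with the arc-length metric. Since the map locally doubles distances, two orbits stay within $\epsilon$ of each other for $n$ steps only if the initial points are within about $\epsilon\, 2^{-(n-1)}$ of each other, so an $(n, \epsilon)$-spanning set needs on the order of $2^{n-1}/(2\epsilon)$ points. Therefore

h_s(T, \epsilon) = \lim_{n \to \infty} \frac{1}{n} \log \frac{2^{n-1}}{2\epsilon} = \log 2, \qquad h_s(T) = \lim_{\epsilon \to 0} h_s(T, \epsilon) = \log 2;

the number of distinguishable length-$n$ orbits grows like $2^n$, i.e., the map generates one bit of new information per step (when the logarithm is taken to base 2).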

Separated orbit-based definition

We say a subset $F$ of $X$ is $(n, \epsilon)$-separated if $d_n(x_i, x_j) > \epsilon$ for all $x_i, x_j \in F$ with $x_i \ne x_j$. Since $X$ is compact, every $(n, \epsilon)$-separated set has finitely many elements, as illustrated in Fig. 5.3. The maximal cardinality of all $(n, \epsilon)$-separated sets is denoted by

s(n, \epsilon) = \max\{|F| : F \text{ is } (n, \epsilon)\text{-separated}\}.   (5.9)

Fig. 5.3 Illustration of an $(n, \epsilon)$-separated set. Note that the circle representing the neighborhood is just an illustration, not a real geometric image.

We denote by $s(n, \epsilon)$ the maximum cardinality of a product $(n, \epsilon)$-separated set, namely

s(n, \epsilon) = \min\{\#G(F) : F \in \Omega_{n, \epsilon}\}.   (5.10)

Then we define the metric-based product topological entropy as follows:

Definition 2

We define the topological entropy as

H_m(n, \epsilon) = \log s(n, \epsilon), \qquad h_m(T, \epsilon) = \lim_{n \to \infty} \frac{1}{n} H_m(n, \epsilon), \qquad h_m(T) = \lim_{\epsilon \to 0} h_m(T, \epsilon).   (5.11)

Cover-based definition

The cover-based definition of topological entropy does not require a metric; hence it is valid even in spaces without a metric structure. We define a cover of $X$ as a family of open sets whose union contains $X$. Since $X$ is compact, we can always find a finite subset of the open sets of $\mathcal{U}$ to cover $X$. We say that a cover $\mathcal{V}$ is a subcover of a cover $\mathcal{U}$ if $\mathcal{V} \subseteq \mathcal{U}$. The minimum cardinality of a subcover of $\mathcal{U}$ is denoted by $N(\mathcal{U})$.

We define the join of two covers $\mathcal{U}$ and $\mathcal{V}$ as

\mathcal{U} \vee \mathcal{V} = \{U \cap V : U \in \mathcal{U},\ V \in \mathcal{V}\}.   (5.12)

We further define

\mathcal{U}^n = \bigvee_{i=0}^{n-1} T^{-i}\,\mathcal{U}.   (5.13)

Based on the above definitions, we can now define the cover-based topological entropy:

Definition 3

The topological entropy is defined as

H_c(\mathcal{U}) = \log N(\mathcal{U}), \qquad h_c(T, \mathcal{U}) = \lim_{n \to \infty} \frac{1}{n} H_c(\mathcal{U}^n), \qquad h_c(T) = \sup_{\mathcal{U}} h_c(T, \mathcal{U}).   (5.14)

Equivalence

The above three definitions of topological entropy look quite different, focusing on different aspects of the dynamical system. Interestingly, they are equivalent in metric spaces:

Theorem 1

In metric spaces, we have

h_s(T) = h_m(T) = h_c(T).   (5.15)

The proof is highly nontrivial. The details can be found in chapter 6 of Ref. [12]. Due to their equivalence, we use only the notation hs(T) in the subsequent discussion.

5.2.2 Communication Capacity Requirements

As we have seen, the topological entropy measures the complexity of the transform in the dynamical system. Intuitively, a more complex dynamical system requires more communications for estimating and controlling the dynamics, since the controller needs more information to obtain the precise status. In this subsection, we will prove that the topological entropy provides a tight bound for the communication requirements in deterministic dynamical systems having uncertain initial states. The argument follows chapter 2 of Ref. [11].

System model

The model of the CPS is given in Fig. 5.4. We consider the following system dynamics:

x(t+1) = f(x(t), u(t)),   (5.16)

which is a simpler version of Eq. (5.2). The initial value x(0) is unknown. An encoder observes the system state and sends out a message h(jT) every T time slots, where j is the index of the message. The total number of possible messages is denoted by L. Hence the average transmission rate is given by $\log_2 L / T$. If the capacity of the communication channel is R, then we have

\frac{\log_2 L}{T} \le R.   (5.17)

The transmitted message is essentially a function of the observation history, i.e.,

h(jT) = F_j(x(1:jT)),   (5.18)

where the subscript j in Fj means that the encoding mechanism can be time varying.

Fig. 5.4 Communication system model for topological entropy.

Upon receiving a message from the encoder, the controller decodes it and carries out the following possible actions:

System state estimation: The controller estimates the system states between times (j − 1)T + 1 and jT (i.e., the system states in the period preceding the message), namely $\hat{x}((j-1)T+1), \dots, \hat{x}(jT)$. The outcome of the estimator is given by

\hat{x}((j-1)T+1 : jT) = G_j(h(T : jT)),   (5.19)

where the subscript j in Gj means that the estimation algorithm can be time varying.

 System state control: The controller computes the control action in the next T time slots (i.e., before receiving the next message), namely u(jT + 1), …, u((j + 1)T). The outcome of the controller is given by

u(jT+1 : (j+1)T) = U_j(h(T : jT)),   (5.20)

where the subscript j in U_j means that the control algorithm can be time varying.

For simplicity, we assume that there is no transmission error during the communications. This is reasonable if powerful error correction codes are applied and a single sparse error causes only negligible impact on the dynamics. Hence the limit of communications is focused only on the capacity, not the reliability.

Then the question is: How much communication is needed for a precise system state estimation or a stable system dynamics?

Communication requirements for estimation

We first focus on the system state estimation. We define the observability of the system as follows:

Definition 4

The system in Eq. (5.16) is observable if, for all ϵ > 0, there exist a message period $T \in \mathbb{N}$ and an encoding-decoding mechanism such that

\|x(t) - \hat{x}(t)\| < \epsilon, \qquad t = 1, 2, 3, \dots   (5.21)

For the observability of the system, we have the following conclusion (Theorem 2.3.6 in Ref. [11]):

Theorem 2

Consider a compact state space X and the transformation T in Eq. (5.16). Then we have:

 If R < hs(T), the system is not observable.

 If R > hs(T), the system is observable.

Here we provide an intuitive explanation of the encoding and decoding mechanism to achieve observability, as well as the reason for the unobservability when the transmission rate is too low. A rigorous proof can be found in chapter 2 of Ref. [11].

R < hs(T): We assume that a coder-decoder pair can make the system observable. Then for any n time slots and any ϵ > 0, the decoder outputs provide at most $2^{nR}$ points whose orbits are within ϵ of every true orbit over the time window [0, n − 1]; i.e., they form an (n, ϵ)-spanning set, due to the definition of observability. Then the topological entropy of the system would be no larger than $\lim_{n\to\infty} \frac{1}{n}\log_2 2^{nR} = R < h_s(T)$, which contradicts the assumption that the topological entropy is hs(T).

R > hs(T): Due to the definition of hs(T), for any n > 1 and ϵ > 0, we can always find an (n, ϵ)-spanning set with cardinality no larger than $2^{nR}$ in the state space. Then we can simply transmit the index of the corresponding point in the spanning set (thus requiring a transmission rate no larger than R) and achieve an estimation error less than ϵ. This encoding procedure is illustrated in Fig. 5.5; a numerical sketch of the rate threshold is given after the figure.

Fig. 5.5 Illustration of the encoding process.
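The following Python sketch illustrates the rate threshold of Theorem 2 for the scalar linear map x(t+1) = a x(t), using a simple interval-refinement coder in the spirit of the argument above (the construction and the parameter values are assumptions of this sketch, not taken from Ref. [11]). The encoder and decoder share an interval known to contain the state; every step the encoder sends R bits naming the subinterval (out of $2^R$) that contains the state, and both sides propagate the chosen subinterval through the dynamics. The interval length is multiplied by $|a|/2^R$ per step, so the estimation error shrinks if and only if $R > \log_2|a| = h_s(a)$.

import numpy as np

def track(a, R, x0, steps=30):
    """Quantized observation of x(t+1) = a*x(t) with R bits per step.

    Encoder and decoder both maintain an interval [lo, hi] guaranteed to
    contain the true state; each step the encoder sends the index of the
    subinterval (among 2**R) containing x, and both sides map the chosen
    subinterval through the dynamics.
    """
    lo, hi = -1.0, 1.0          # x(0) is only known to lie in [-1, 1]
    x = x0
    for _ in range(steps):
        cells = np.linspace(lo, hi, 2**R + 1)
        idx = min(np.searchsorted(cells, x, side="right") - 1, 2**R - 1)
        lo, hi = cells[idx], cells[idx + 1]   # decoder's refined interval
        x, lo, hi = a * x, a * lo, a * hi     # propagate through dynamics
        if lo > hi:
            lo, hi = hi, lo
    x_hat = 0.5 * (lo + hi)                   # decoder's estimate
    return abs(x - x_hat)

a = 3.0                                       # h_s(a) = log2(3) ~ 1.58 bits
for R in (1, 2):
    print(f"R={R} bits: final error = {track(a, R, x0=0.37):.3e}")
# R=1 < log2(3): the error grows; R=2 > log2(3): the error decays.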

Communication requirements for linear system control

The communication requirements for control are more complicated. We focus on only the special case of linear systems without noise and with direct observation of the system state:

x(t+1) = Ax(t) + Bu(t), \qquad y(t) = x(t).   (5.22)

The control action is for the purpose of system stabilization, which is defined as follows:

Definition 5

The system in Eq. (5.22) is stabilizable if there exist a communication period T ≥ 1 and a 3-tuple of encoder, decoder and controller such that the system is stable in the following sense: for any ϵ > 0 there is an integer t0 such that

\|x(t)\| < \epsilon, \qquad \forall t \ge t_0,   (5.23)

for any traces of the dynamics and any initial condition x(0).

Intuitively, stabilizability means that we can drive the system state arbitrarily close to the origin given sufficient time. The following two assumptions are subsequently made:

The initial state x(0) lies in a compact set $X_1$ which has the origin as an interior point; i.e., there exists a δ > 0 such that $\|x\| < \delta$ implies $x \in X_1$.

The system (A, B) is stabilizable given perfect system state feedback.

Then the following theorem provides conditions for the stabilizability of the linear system:

Theorem 3

Suppose that the above two assumptions hold. We denote by hs(A) the topological entropy of system x(t + 1) = Ax(t). Then the following two statements hold:

 When R < hs(A), the system is not stabilizable.

 When R > hs(A), the system is stabilizable.

The first conclusion (R < hs(A)) in the above theorem can be obtained by contradiction, as illustrated in Fig. 5.6. A detailed proof can be found in Section 2.7 of Ref. [11]. Here we provide a sketch of the proof. We assume that the linear system is still stabilizable when R < hs(A). Since R < hs(A), we can find a constant ϵ > 0 such that

\limsup_{k \to \infty} \frac{1}{k} \log_2 s(k, \epsilon) > R,   (5.24)

where s(k, ϵ) is the maximal cardinality of a (k, ϵ)-separated set, as in the metric-based definition. This implies that there is a (jT, ϵ)-separated set S of cardinality N such that

\frac{\log_2 N}{jT} > R,   (5.25)

and any two distinct points x1 and x2 in S generate trajectories satisfying

\|x_1(t) - x_2(t)\| > \epsilon \quad \text{for some } t \in [0, jT].   (5.26)

On the other hand, we set $\epsilon_0 = \frac{1}{2}\epsilon$. Since the system is stabilizable, we can always find control actions for each element x in S such that eventually $\|x(t)\| \le \epsilon_0$. There are no more than $2^{jTR}$ possible control action sequences. Hence, since N exceeds this number, we can always find one sequence of controls, u*(0), …, u*(jT), that works for two different initial states x1 and x2 in S; i.e., by linearity, both $x_1(t) = \bar{x}_1(t) + x_u(t)$ and $x_2(t) = \bar{x}_2(t) + x_u(t)$ satisfy Eq. (5.23) with respect to ϵ0, where $\bar{x}_i(t)$ denotes the uncontrolled trajectory started from $x_i$ and $x_u(t)$ denotes the component of the state driven by the control sequence {u*}. Then it is easy to verify

\|x_1(t) - x_2(t)\| = \|\bar{x}_1(t) - \bar{x}_2(t)\| \le 2\epsilon_0 = \epsilon.   (5.27)

This contradicts the assumption that S is (jT, ϵ)-separated. Hence the cardinality of the message set over a period jT cannot be less than $2^{jT h_s(A)}$; i.e., the rate R cannot be less than hs(A).

Fig. 5.6 Illustration of the proof of Theorem 3.

The second conclusion (R > hs(A)) in the theorem can be derived from the conclusions in the subsequent treatment of the optimal control of linear systems.

Communication requirements for optimal control

In the previous discussion, the feedback is to stabilize the system, regardless of the corresponding cost (e.g., a very large control action u). In the optimal control of linear systems, the costs of both system state magnitude and control action power are considered; i.e., the system cost with initial state x(0) is given by

J(x(0)) = \sum_{t=1}^{\infty} x^T(t) Q x(t) + u^T(t) R u(t),   (5.28)

where both Q and R are positive definite matrices.

When the feedback is perfect (i.e., there is no constraint on the communication rate), it is well known that the optimal feedback control is given by [72]

u(t) = -Kx(t),   (5.29)

where

K = (R + B^T P B)^{-1} B^T P A,   (5.30)

and the matrix P satisfies the following equation:

A^T\left(P - P B (R + B^T P B)^{-1} B^T P\right) A + Q - P = 0.   (5.31)

The optimal cost function is given by

J_{\mathrm{opt}}(x(0)) = x^T(0) P x(0).   (5.32)
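As a numerical illustration of the perfect-feedback baseline in Eqs. (5.29)-(5.32), the following Python sketch solves the Riccati equation (5.31) by fixed-point iteration for assumed matrices A, B, Q, R, then forms the gain of Eq. (5.30) and the optimal cost of Eq. (5.32); all numerical values are illustrative.

import numpy as np

# Illustrative problem data (assumed values)
A = np.array([[1.1, 0.2],
              [0.0, 0.9]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                 # state cost weight
R = np.array([[1.0]])         # control cost weight

# Fixed-point (value) iteration on the Riccati equation (5.31)
P = np.eye(2)
for _ in range(500):
    M = np.linalg.inv(R + B.T @ P @ B)
    P_next = A.T @ (P - P @ B @ M @ B.T @ P) @ A + Q
    if np.max(np.abs(P_next - P)) < 1e-12:
        P = P_next
        break
    P = P_next

K = np.linalg.inv(R + B.T @ P @ B) @ B.T @ P @ A   # gain of Eq. (5.30)
x0 = np.array([1.0, -1.0])
print("P =\n", P)
print("K =", K)
print("J_opt(x0) =", x0 @ P @ x0)                  # cost of Eq. (5.32)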

Now we consider the constraint on the communication rate; i.e., R is bounded. We define the solvability of the optimal control as follows [11]:

Definition 6

The optimal control of the linear system in Eqs. (5.29)–(5.32) is solvable with respect to the communication model introduced above if, for any ϵ > 0, there exist a T ≥ 1 and a 3-tuple of encoder, decoder, and controller such that

 The closed-loop controlled system is stable in the following sense: For any ϵ > 0, there exists a t0 ≥ 1 such that Eq. (5.23) holds.

 The cost satisfies

J(x(0)) \le J_{\mathrm{opt}}(x(0)) + \epsilon,   (5.33)

where Jopt(x(0)) is given in Eq. (5.32).

The conclusion on the communication requirements for the solvable optimal control is given in the following theorem:

Theorem 4

Suppose that the assumptions in Theorem 3 hold and the pair $(A, Q^{1/2})$ has no unobservable modes on the unit circle. Then the following statements are correct:

 If R < hs(A), then the optimal control is not solvable.

 If R > hs(A), then the optimal control is solvable.

The conclusion for R < hs(A) follows directly from Theorem 3, since the system is not even stabilizable when R < hs(A). The proof for the case of R > hs(A) is much more involved (note that the corresponding conclusion in Theorem 3 is merely a special case). A rigorous proof can be found in Section 2.7 of Ref. [11]. Here we provide a sketch of the proof. First we choose a reasonable period T. Then we consider the system:

x_a(t+1) = A x_a(t).   (5.34)

We can find a set Ω of initial points $\{x_n^*(0)\}$ with cardinality $2^{(T+1)\alpha}$, where hs(A) < α < R, with the following property: for any initial state x(0), we can always find a point $x_n^*(0)$ in Ω such that the trajectory of Eq. (5.34) triggered by $x_n^*(0)$ is very close to that triggered by x(0) within the time duration [0, T]. For each point $x_n^*(0)$ in Ω, we can compute the corresponding control actions within the time period [0, T], namely $u_n^*(0), \dots, u_n^*(T-1)$.

Then we define the following encoder and controller pair for time interval [0, T − 1]:

Encoder: Given the initial state x(0), we choose the corresponding point $x_n^*(0)$ in Ω such that the two trajectories are sufficiently close to each other in [0, T − 1], and transmit its index:

h(1) = \mathrm{Enc}(x(0)) = n.   (5.35)

Controller: In the time interval [0, T − 1], the control action is set as the optimal control action for $x_n^*(0)$, namely

\big(u^T(0), \dots, u^T(T-1)\big)^T = \big((u_n^*)^T(0), \dots, (u_n^*)^T(T-1)\big)^T.   (5.36)

For the jth time period, namely [(j − 1)T, jT − 1], we can use the same encoding and control mechanism by scaling the system state and control actions, namely

h(j) = \mathrm{Enc}\!\left(\frac{1}{b_j}\, x(0)\right)   (5.37)

and

\big(u^T(jT), \dots, u^T((j+1)T-1)\big)^T = b_j \big((u_n^*)^T(0), \dots, (u_n^*)^T(T-1)\big)^T.   (5.38)

The whole procedure is illustrated in Fig. 5.7.

Fig. 5.7 Illustration of the encoding, decoding, and control action procedure.

5.2.3 Calculation of Topological Entropy

The above discussions show the importance of topological entropy for the communication requirements of controlling the physical dynamics in a CPS. Hence it is very important to evaluate the topological entropy of a given dynamical system.

Linear systems

The topological entropy of linear systems is fully explained in the following theorem (the rigorous proof is given in Theorem 2.4.2 of Ref. [11]):

Theorem 5

The topological entropy of linear system x(t + 1) = Ax(t), where A is an N × N square matrix, is given by

h_s(A) = \sum_{n=1}^{N} \log_2\left(\max\{1, |\lambda_n|\}\right),   (5.39)

where {λn}n=1, …, N are the eigenvalues of A.

An immediate conclusion is that, when all the eigenvalues of A are within the unit circle, the corresponding topological entropy is zero. Hence the corresponding communication requirement is zero; i.e., no communication is needed for observing or stabilizing the physical systems. This is obvious since the system state will converge to zero spontaneously.
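Theorem 5 makes the communication requirement of a linear system a one-line computation; a minimal Python sketch with an assumed matrix A:

import numpy as np

def linear_topological_entropy(A):
    """h_s(A) = sum over eigenvalues of log2(max(1, |lambda|)), Eq. (5.39)."""
    eigvals = np.linalg.eigvals(A)
    return float(np.sum(np.log2(np.maximum(1.0, np.abs(eigvals)))))

A = np.array([[2.0, 1.0],
              [0.0, 0.5]])              # eigenvalues 2 and 0.5 (assumed example)
print(linear_topological_entropy(A))    # 1.0 bit per step: only |2| > 1 contributes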

Generic systems

For the generic case of dynamical systems, there have been no systematic approaches to compute the topological entropy. The existing approaches can be categorized into the following two classes:

 Consider some special types of dynamics, such as one-dimensional piecewise monotone mappings [73].

Discretize the mapping and then use the approaches for calculating the topological entropy of symbolic dynamics.

In this book, we provide a brief introduction to the second approach by following the argument in Ref. [74]. We consider a compact topological space M and partition it into a set of subsets Q = {A1, …, Aq}. Consider a continuous mapping T on M and a discrete time period {0, …, t − 1}. We define

W_t(T, Q) = \left\{(a_0, \dots, a_{t-1}) \;\middle|\; \exists x \in M \text{ s.t. } T^i x \in A_{a_i} \text{ for all } 0 \le i \le t-1\right\},   (5.40)

namely the set of all possible t-strings, each of which represents a series of regions in Q that a trajectory of T passes. An example is given in Fig. 5.8, where the trajectory generates a 6-string (1, 4, 2, 6, 7, 3).

Fig. 5.8 Illustration of the partition and trajectory.

Then we define the following quantity:

h^*(T, Q) = \lim_{t \to \infty} \frac{\log |W_t(T, Q)|}{t}.   (5.41)

We further define the concept of generating partition:

Definition 7

We say that a partition Q of the space M is generating if

\bigvee_{i=0}^{\infty} T^{-i} Q = \mathcal{B} \quad \text{or} \quad \bigvee_{i=-\infty}^{\infty} T^{-i} Q = \mathcal{B},   (5.42)

where $\mathcal{B}$ is the Borel σ-algebra of M (the first condition applies to noninvertible T and the second to invertible T).

Then we obtain the following conclusion that provides bounds for the topological entropy of T:

Theorem 6

Denote by h(T) the topological entropy of transform T. Then it satisfies

h(T) \ge \liminf_{\mathrm{diam}(Q) \to 0} h^*(T, Q).   (5.43)

Moreover, if Q is generating, we have

h(T) \le h^*(T, Q).   (5.44)

The proof of the theorem can be found in Ref. [74].

Based on this theorem, an algorithm for computing an upper bound of the topological entropy was proposed in Ref. [74], consisting of the following four steps:

 Step 1: Select a coarse partition of M, which is denoted by A.

 Step 2: Select a partition B = {B1, …, BK} that is much finer than A. Each element in A is the union of a set of elements in B. Then we define a topological Markov chain. The transition matrix is given by

B_{ij} = \begin{cases} 1, & \text{if } B_i \cap T^{-1} B_j \ne \emptyset, \\ 0, & \text{otherwise}, \end{cases}   (5.45)

which means that $B_{ij}$ equals 1 if $B_i$ and $B_j$ can be two successive regions visited by a trajectory of T. A sequence (b0, …, bN−1) is called a B-word if $B_{b_i, b_{i+1}} = 1$ for all i. Each B-word corresponds to an A-word (a0, …, aN−1), where $B_{b_i} \subseteq A_{a_i}$. Then we define the set of A-words as

W_N(B, A) = \{a \mid \text{there exists a } B\text{-word } b \text{ generating } a\}.   (5.46)

It is easy to verify

W_N(T, A) \subseteq W_N(B, A),   (5.47)

which implies

h(B, A) = \lim_{N \to \infty} \frac{\log |W_N(B, A)|}{N} \ge \lim_{N \to \infty} \frac{\log |W_N(T, A)|}{N} = h^*(T, A).   (5.48)

It has been shown that h(B, A) converges to h*(T, A) if the diameter of the cells of B tends to zero.

Step 3. The set of all B-words forms a sofic shift, whose detailed definition can be found in Ref. [74]. First a directed labeled graph can be constructed, in which each node corresponds to one element in B and an edge exists between nodes i and j if Bij = 1. Moreover, if $B_i \subseteq A_\alpha$, we label the edge i → j (if it exists) as α. The following example from Ref. [74] illustrates the procedure:

Example 1

Consider a one-dimensional mapping T from [0, 1] to [0, 1], which is given by

Tx = \begin{cases} 2x, & x < 1/2, \\ 1.5(x - 0.5), & x \ge 1/2. \end{cases}   (5.49)

The partitions A and B are selected as

A = \{A_1, A_2\} = \{[0, 0.5), [0.5, 1)\}, \qquad B = \{B_1, B_2, B_3, B_4\} = \{[0, 0.25), [0.25, 0.5), [0.5, 0.75), [0.75, 1)\}.   (5.50)

It is easy to verify that the matrix B is given by

B = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \end{pmatrix}.   (5.51)

The graph generated by the matrix B is given in Fig. 5.9. A path in the graph can represent a string.

Fig. 5.9 The graph generated in the example.
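Since both branches of the map in Eq. (5.49) are monotone, the image of each cell of B is an interval that can be computed exactly, and the transition matrix of Eq. (5.45) follows by checking which images overlap which cells. A Python sketch of this check, which reproduces the matrix in Eq. (5.51) (the interval arithmetic below is specific to this piecewise-linear example):

import numpy as np

# Partition B of [0, 1) into four cells, Eq. (5.50)
cells = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]

def T(x):
    """The map of Eq. (5.49)."""
    return 2.0 * x if x < 0.5 else 1.5 * (x - 0.5)

def image(lo, hi):
    """Image of an interval that lies entirely in one increasing branch."""
    return (T(lo), T(hi - 1e-12))        # open right end point

B = np.zeros((4, 4), dtype=int)
for i, (lo, hi) in enumerate(cells):
    img_lo, img_hi = image(lo, hi)
    for j, (clo, chi) in enumerate(cells):
        # B_ij = 1 if T(B_i) intersects B_j, i.e., B_i and T^{-1}(B_j) overlap
        if img_lo < chi and clo < img_hi:
            B[i, j] = 1
print(B)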

Step 4. We compute the entropy of the sofic shift, which is shown to be equal to the topological entropy of the corresponding Markov chain. To that end, we need to find a new graph G′(B, A) satisfying

$W_N(G'(B, A)) = W_N(G(B, A))$;

 All outgoing edges of the same node have different labels.

The following algorithm from Ref. [74] has been proposed to find such a graph G′, which is called a reduced right-resolving presentation of G:

1. Begin from an empty graph R. Add a single node of G(B, A) to R.

2. If there is at least one hyper node without outgoing hyper edges, choose one of them and label it by H. Otherwise, go to Step 6.

3. Since the hyper node H consists of multiple nodes in G(B, A), define H′ as the set of all nodes in G(B, A) reached by the outgoing edges of the nodes in H that have the same label (say, Ai). Define a new hyper node H′ and a hyper edge pointing from H to H′. Label it as Ai.

4. Repeat Step 3 for all possible labels in A.

5. Return to Step 2.

6. If a hyper node has no incoming hyper edges, remove it.

7. If a hyper node is removed, return to Step 6. Otherwise, stop and output R.

Take the example in Fig. 5.9, for instance. If we begin from node B1, then we label B1 as H1. Since all the outgoing edges of B1 are labeled as A1, in Step 3 we form a hyper node H2 consisting of B1 and B2 (since the two outgoing edges reach B1 and B2). Since H2 does not have any outgoing hyper edges yet, we continue and form a hyper node H3 consisting of B1, B2, B3, and B4. Then we consider H3, whose outgoing edges have two labels, A1 and A2. For A1, we form a hyper node H4 consisting of B1, B2, and B3; for A2, the outgoing edge returns to H4. This results in the graph shown in Fig. 5.10. Finally, we remove the hyper node H1, which has no incoming hyper edges.
From the graph R, we can also obtain its adjacency matrix. The example in Fig. 5.10 has the matrix R given by

R = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}.   (5.52)

Fig. 5.10 The graph R generated in the example.

The eigenvalues of R are related to the topological entropy that we want to compute, as shown in the following theorem [74]:

Theorem 7

Denote by $\lambda_{\max}(R)$ the maximum eigenvalue of R. If T is transitive, we have

h^*(T, A) \le h(B, A) = \log \lambda_{\max}(R).   (5.53)
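Given the adjacency matrix of the reduced right-resolving presentation, the bound in Theorem 7 is a one-line computation; a Python sketch for the matrix in Eq. (5.52), with the entropy measured in bits (base-2 logarithm assumed):

import numpy as np

R = np.array([[0, 0, 1],
              [1, 0, 1],
              [0, 1, 1]])          # adjacency matrix from Eq. (5.52)
lam_max = max(abs(np.linalg.eigvals(R)))
print(np.log2(lam_max))            # upper bound on h*(T, A) in bits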

Note that although the above procedure provides a systematic approach for the concrete calculation of topological entropy, it does not completely solve the problem, for the following reasons: (1) It is still not clear how to verify that a partition is generating. (2) It is still not clear how to choose the diameters of the partition cells. Hence we can only choose cells as small as possible, within the available computational capability.

5.3 Stochastic systems: estimation

Many control systems use the structure of separated estimation and control, as illustrated in Fig. 5.11, in which an estimator provides an estimate $\hat{x}$ of the system state, and then a controller treats the estimate $\hat{x}$ as the true system state and computes the corresponding control action. However, such a separated structure may not be optimal, although it is more feasible in practice. On the other hand, source coding is needed for the communication of the observations y in order to estimate the system state x, and this coding is usually lossy when the system state is continuous. In traditional communication systems, the corresponding information-theoretic communication requirement for lossy source coding is based on infinite block length and thus infinite delay, which is infeasible in the context of real-time control. Hence in this section, we follow the research in Ref. [14] and study the following two fundamental questions:

Fig. 5.11 Illustration of separated estimation and control.

 When the feedback is passed through a communication channel with limited capacity, is the structure of separated estimation and control still optimal?

 When the estimation is real-time with limited delay, what are the communication requirements for the system state estimation?

5.3.1 System Model

We first introduce the system model, which is adopted in Ref. [14]. The overall system has the same structure as that in Fig. 5.4.

We assume that the system state dynamics is described by the linear system dynamics in Eq. (5.4). The noise {w(t)}t is a sequence of i.i.d. Gaussian distributed random variables. The initial state x(0) is assumed to be Gaussian with zero expectation and covariance matrix Σx(0). For simplicity, we assume that the system state can be observed directly by the encoder at the sensor.

The communication channel is assumed to be stochastic, characterized by stochastic kernels P(dy|x), where y and x are the output and input of the channel. For simplicity we assume that the channel is time-invariant and memoryless. The following two special channels are discussed in Ref. [14]:

Noiseless digital channel with rate R: The input and the output of the channel are identical, and there are a total of $2^R$ possible messages.

 Memoryless vector Gaussian channel: Both the input and output alphabets are d-dimensional real spaces. The output of the channel is given by

y(t) = \tilde{x}(t) + v(t),   (5.54)

where $\tilde{x}(t)$ denotes the channel input and v(t) is a series of i.i.d. Gaussian distributed random vectors with zero expectation and covariance matrix Σv. The input is average power constrained; i.e., $E[\|\tilde{x}(t)\|^2] \le P_0$.

At each time slot t, the sensor obtains the system state x(t), and generates and transmits a channel input symbol $\tilde{x}(t)$ through the communication channel. Then the controller receives a symbol y(t), decodes it, and obtains a system state estimate $\hat{x}(t)$. Based on $\hat{x}(t)$, the controller computes the control action u(t).

The following three possible information patterns are considered, where Enc denotes the function of encoding:

 The encoder output is a function of all previous knowledge, namely

\tilde{x}(t) = \mathrm{Enc}\big(x(0:t), \tilde{x}(0:t-1), y(0:t-1), \hat{x}(0:t-1), u(0:t-1)\big).   (5.55)

 The encoder does not know the channel and decoder outputs, namely

\tilde{x}(t) = \mathrm{Enc}\big(x(0:t), \tilde{x}(0:t-1), u(0:t-1)\big).   (5.56)

 The encoder uses only the current system state to encode the message:

\tilde{x}(t) = \mathrm{Enc}\big(x(t)\big).   (5.57)

The decoder uses a stochastic mechanism for decoding, namely generating the system state estimation according to the following conditional distribution:

P\big(d\hat{x}(t) \,\big|\, y(0:t), \hat{x}(0:t-1), u(0:t-1)\big);   (5.58)

namely, it depends on the previously received communication symbols, the previous system state estimates, and the previous control actions.

The controller is also stochastic: the control action is generated according to the conditional probability $P(du(t) \mid \hat{x}(t))$, which depends only on the current system state estimate.

5.3.2 Separation

Now we study whether the communication channel will affect the optimality of the separated structure of system state estimation and control. In traditional linear quadratic Gaussian (LQG) control with perfect feedback, the certainty equivalence guarantees the optimality of the following control law:

u(t) = -K\hat{x}(t)   (5.59)

and

\hat{x}(t) = E[x(t) \mid y(0:t-1), u(0:t-1)],   (5.60)

which represent separated control and system estimation. We define the estimation error as

\delta x(t) = x(t) - E[x(t) \mid y(0:t-1), u(0:t-1)].   (5.61)

We say that the control has no dual effect if (here $\bar{y}(t)$ denotes the channel output when there is no feedback control)

E\big[\delta x(t)\, \delta x^T(t) \,\big|\, y(0:t), u(0:t-1)\big] = E\big[\delta x(t)\, \delta x^T(t) \,\big|\, \bar{y}(0:t)\big],   (5.62)

which means that the error covariance matrix is independent of the control actions taken by the controller. An intuitive explanation of the no dual effect is given in Ref. [14]: the control action can be used both to control the system state evolution and to probe the system in order to estimate the system state; the no dual effect means that the control action is fully used to control the system state (since the system state estimation error is unaffected by the control action). The following theorem shows that the no dual effect is equivalent to the certainty equivalence [14]:

Theorem 8

The optimal control law for the linear system has the property of certainty equivalence if and only if it has no dual effect.

The following lemma provides a sufficient condition for the no dual effect, which can be used to judge whether a control action with certainty equivalence is optimal:

Lemma 1

Denote by $\bar{x}(t)$ the uncontrolled system state. If $\sigma(\bar{y}(0:t)) \subseteq \sigma(y(0:t), u(0:t-1))$ and $E[\bar{x}(t) \mid \bar{y}(0:t)] = E[\bar{x}(t) \mid y(0:t), \bar{y}(0:t), u(0:t-1)]$, then the control has no dual effect.

Based on this lemma, Ref. [14] discussed the no dual effect in the two communication channels introduced above:

 Noiseless digital channel: Since the input and the output of the communication channel are the same (while the number of bits is limited), the sensor can know exactly the control action that the controller will take. It is shown in Ref. [14] that the optimal encoder has the following form:

\tilde{x}(t) = q_t\big(x(t) - E[x(t) \mid y(0:t-1), u(0:t-1)]\big) = q_t\big(\bar{x}(t) - E[\bar{x}(t) \mid y(0:t-1), u(0:t-1)]\big),   (5.63)

where qt is a quantizer. Note that the second equality is due to the sensor's perfect knowledge of the control actions. By applying induction, we can prove the following lemma:

Lemma 2

For a noiseless digital channel, $\bar{y}(t) = y(t)$ and $\bar{x}(t) \to y(0:t) \to u(0:t-1)$ is a Markov chain.

A rigorous proof is given in Ref. [14]. Here we provide the proof for t = 1, as illustrated in Fig. 5.12. When t = 0, obviously we have $\bar{y}(0) = y(0)$. Since the control action is completely determined by the received message, $(\bar{x}(0), w(0)) \to y(0) \to u(0)$ forms a Markov chain. Since $\bar{x}(1) = A\bar{x}(0) + w(0)$, $\bar{x}(1) \to y(0) \to u(0)$ also forms a Markov chain.

Fig. 5.12 Illustration of Markov chains in the noiseless digital channel.

By applying Lemma 1, we draw the conclusion that the control through a noiseless digital channel has no dual effect.

 Memoryless vector Gaussian channel: The optimal encoder for the vector Gaussian channel is still unknown. Ref. [14] restricted the encoder to be linear and deterministic:

\tilde{x}(t) = D_{1t}\, x(0:t) + D_{2t}\, u(0:t-1) + D_{3t}\, \tilde{x}(0:t-1) + D_{4t}\, y(0:t-1),   (5.64)

where $\{D_{nt}\}_{n=1,2,3,4}$ are matrices of appropriate dimensions. Since the communication channel output is given by $y(t) = \tilde{x}(t) + v(t)$, it can also be written as a linear combination of the initial state, dynamics noise, communication noise, and control actions:

y(t) = F_{1t}\, x(0) + F_{2t}\, w(0:t-1) + F_{3t}\, v(0:t) + F_{4t}\, u(0:t-1),   (5.65)

where {Fnt}n=1, 2, 3, 4 are the corresponding matrices. The uncontrolled channel output can also be written as

\bar{y}(t) = F_{1t}\, x(0) + F_{2t}\, w(0:t-1) + F_{3t}\, v(0:t),   (5.66)

since the controls are all zero. Then we have $\bar{y}(0:t) = y(0:t) - G_t\, u(0:t-1)$, where Gt is an appropriate matrix. When estimating the uncontrolled system state $\bar{x}(t)$, all the related information in y(0:t) and u(0:t-1) is summarized in $\bar{y}(0:t)$. Hence the condition in Lemma 1 is valid, and thus the control through the memoryless vector Gaussian channel also has no dual effect.

Based on the above arguments, we reach the conclusion that, for both the noiseless digital communication channels and memoryless vector Gaussian channels, the structure of separated system state estimation and control is still optimal in linear dynamics.

5.3.3 Sequential Estimation

When the structure of separated system state estimation and control is optimal, the controller can use the same design as that in the case of perfect feedback (e.g., the LQG control). However, the system state estimation needs substantial revisions due to the limited communication capacity. We discuss the communication requirements for system state estimation at the controller in this subsection.

For the linear dynamics in Eq. (5.4), system state estimation can be considered as a lossy source coding problem, since the sensor encodes the observations and the decoder output always has errors except in trivial cases. The lossy source coding is traditionally studied using rate-distortion theory [33], as a branch of information theory, which discusses the minimum communication requirement to satisfy the requirement on the distortion at the decoder output. However, the information-theoretic argument is based on the assumption of infinite codeword length and thus infinite delays, which is reasonable for traditional data communications (e.g., one packet encoded as a codeword may have thousands of bits). Obviously such arguments are not suitable for CPSs, since the encoding procedure must have a limited delay. Hence in contrast to the block structure of traditional data communications, the encoding procedure in a CPS should have a sequential structure, as illustrated in Fig. 5.13.

Fig. 5.13 Comparison between block and sequential estimation.

Before we study sequential rate distortion, we provide a review of traditional rate-distortion theory. When the true information symbol is x while the recovered one is y, we denote the distortion by d(x, y). Then the rate-distortion function is defined as follows:

Definition 8

For a certain information source generating symbols {X(t)}, the rate-distortion function is defined as

R(D) = \inf_{Q_{Y|X}(y|x):\ D_Q \le D} I(Y; X),   (5.67)

where $Q_{Y|X}$ is the conditional probability of generating the recovered symbol Y given the true information symbol X, $D_Q$ is the expected distortion incurred by the conditional distribution $Q_{Y|X}$, and I is the mutual information.

Intuitively, the rate-distortion function means the minimum average number of bits needed to achieve a distortion less than or equal to D. Meanwhile, the rate-distortion function can also determine whether a communication channel with a certain capacity can support the transmission of an information source with a given distortion bound. Roughly speaking, when transmitting the information of a Gauss-Markov process through a communication channel having capacity C, distortion D can be achieved if R(D) < C, while it cannot be achieved if R(D) > C. This dichotomy is illustrated in Fig. 5.14.

Fig. 5.14 Dichotomy of the rate-distortion function.

Similarly to the traditional definition of rate-distortion function with infinite delay, Ref. [14] defined the sequential rate-distortion function as follows:

Definition 9

For information source {X(t)} (the recovered symbols are {Y (t)}), the sequential rate-distortion function is defined as

R_T^s(D) = \inf_{P \in \mathcal{P}_T(D)} \frac{1}{T} I\big(X(0:T-1); Y(0:T-1)\big),   (5.68)

where the family of conditional probabilities $\mathcal{P}_T(D)$ is given by

\mathcal{P}_T(D) = \Big\{ \{P(dY(t) \mid x(0:t-1), y(0:t-1))\}_{t=0,\dots,T-1} \;:\; E[d(X(t), Y(t))] \le D,\ \forall t \Big\}.   (5.69)

We notice that the sequential rate-distortion function has the following features:

 The necessary rate may be dependent on the time T.

 Only the causal information is considered in the mutual information, which is due to the requirement of sequential decoding.

 The expected distortion is no larger than D for each time slot, instead of being averaged over all time slots.

The following theorem shows the application of the sequential rate-distortion function in the lossy transmission of information sources [14]. Note that a sufficient condition is much more difficult to find.

Theorem 9

Consider a memoryless communication channel with channel capacity C. A necessary condition for achieving the distortion D causally is $R_T^s(D) \le C$ for all T.

Gauss-Markov source

Since the sequential rate-distortion function provides a lower bound on the required channel capacity of the communication channel, it is important to calculate the sequential rate-distortion function for various information sources for the purpose of online system state estimation. We follow the argument in Ref. [14] to discuss the sequential rate-distortion function for a d-dimensional Gauss-Markov source where

x(t+1) = Ax(t) + w(t)   (5.70)

and

d(x, y) = \|x - y\|_2^2.   (5.71)

We assume a Gaussian communication channel with dimension d.

The following lemma shows the necessary property for the infimizing channel in Eq. (5.68) in the definition of a sequential rate-distortion function:

Lemma 3

The infimizing channel in the Gauss-Markov information source in Eq. (5.70), namely P(dY (t)|x(0 : t − 1), y(0 : t − 1)), is a Gaussian channel of the form P(dY (t)|x(t), y(0 : t − 1)).

Compared with the standard form P(dY (t)|x(0 : t − 1), y(0 : t − 1)), in the form P(dY (t)|x(t), y(0 : t − 1)) the distribution of Y is dependent only on the current input of the communication channel, instead of all the history. Hence the recovered symbols have the following form:

\hat{x}(t) = F x(t) + G\, y(0:t-1) + z(t),   (5.72)

where F is a d × d matrix, G is a d × (t − 1)d matrix, and {z(t)} is a series of independent Gaussian random vectors of dimension d.

Such a channel can be realized for a memoryless Gaussian channel with perfect feedback, which is illustrated in Fig. 5.15. The encoder obtains the innovation, which is given by

\tilde{x}(t) = F_t\big(x(t) - E[x(t) \mid y(0:t-1)]\big).   (5.73)

Since the communication channel outputs y(0 : t − 1) can be sent to the encoder via the perfect feedback channel, the encoder can easily compute $\tilde{x}(t)$ as above. The decoder receives $y(t) = \tilde{x}(t) + z(t)$ and then obtains the system state estimation:

\hat{x}(t) = F_t\big(x(t) - E[x(t) \mid y(0:t-1)]\big) + F_t\, E[x(t) \mid y(0:t-1)] + G_t\, y(0:t-1) + z(t).   (5.74)

The channel input should satisfy the power constraint:

P \ge E\big[\|\tilde{x}(t)\|^2\big] = \mathrm{trace}\big[F_t\, \Sigma_{x(t)|y(0:t-1)}\, F_t^T\big],   (5.75)

where Σx(t)|y(0:t−1) is the covariance matrix of the prediction error, namely

\Sigma_{x(t)|y(0:t-1)} = \mathrm{Cov}\big(x(t) - E[x(t) \mid y(0:t-1)]\big).   (5.76)

Fig. 5.15 Realization of channel P(dY (t)|x(t), y(0 : t − 1)) in a memoryless Gaussian channel with perfect feedback.
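A Monte Carlo sketch of this realization for a scalar source (an illustration under the stated assumptions, with assumed parameter values): the innovation is scaled to meet the channel power constraint P, transmitted uncoded over the AWGN channel, and estimated at the decoder by the scalar MMSE rule. The per-step distortion then follows the recursion $D(t) = (A^2 D(t-1) + \sigma_w^2)\,2^{-2C}$ with $C = \frac{1}{2}\log_2(1 + P/\sigma_v^2)$, which the simulation reproduces empirically.

import numpy as np

rng = np.random.default_rng(1)
a, sigma_w2 = 0.9, 1.0        # scalar source x(t+1) = a x(t) + w(t)
P, sigma_v2 = 4.0, 1.0        # channel power constraint and noise variance
C = 0.5 * np.log2(1.0 + P / sigma_v2)   # channel capacity in bits per use

x, x_pred, err2 = 0.0, 0.0, []
var_pred = sigma_w2           # variance of x(t) - E[x(t)|y(0:t-1)]
for t in range(100000):
    innov = x - x_pred                        # innovation at the encoder
    F = np.sqrt(P / var_pred)                 # scale to meet the power constraint
    y = F * innov + rng.normal(0.0, np.sqrt(sigma_v2))   # AWGN channel use
    # scalar MMSE estimate of the innovation from y
    innov_hat = (F * var_pred / (F**2 * var_pred + sigma_v2)) * y
    x_hat = x_pred + innov_hat                # decoder's state estimate
    err2.append((x - x_hat) ** 2)
    D = var_pred * 2.0 ** (-2 * C)            # posterior error variance
    var_pred = a**2 * D + sigma_w2            # next-step prediction variance
    x_pred = a * x_hat                        # decoder's prediction
    x = a * x + rng.normal(0.0, np.sqrt(sigma_w2))       # source step

print("empirical MSE:", np.mean(err2), " predicted D:", D)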

For the realized channel, Ref. [14] has proved that

I\big(x(0:T-1); \hat{x}(0:T-1)\big) = \sum_{t=0}^{T-1} I\big(x(t); y(t)\big),   (5.77)

because

I\big(x(0:T-1); \hat{x}(0:T-1)\big) = \sum_{t=0}^{T-1} I\big(x(0:T-1); \hat{x}(t) \mid \hat{x}(0:t-1)\big) = \sum_{t=0}^{T-1} I\big(x(t); \hat{x}(t) \mid \hat{x}(0:t-1)\big),   (5.78)

where the second equality is due to the form of the estimation in Eq. (5.72), and

I\big(x(t); \hat{x}(t) \mid \hat{x}(0:t-1)\big) = h\big(\hat{x}(t) \mid \hat{x}(0:t-1)\big) - h\big(\hat{x}(t) \mid x(t), \hat{x}(0:t-1)\big) = h\big(y(t) \mid \hat{x}(0:t-1)\big) - h\big(y(t) \mid x(t), \hat{x}(0:t-1)\big) = h(y(t)) - h(y(t) \mid x(t)) = I\big(x(t); y(t)\big),   (5.79)

where the second equality holds because

\hat{x}(t) = y(t) + F_t\, E[x(t) \mid \hat{x}(0:t-1)] + G_t\, \hat{x}(0:t-1).   (5.80)

Before we proceed to compute the sequential rate-distortion function, we provide a review of the traditional rate-distortion function. Consider a d-dimensional Gaussian source with independent samples and d × d covariance matrix Σx. The distortion function is given by $d(x, \hat{x}) = \|x - \hat{x}\|^2$. Then we have [33]

R(D) = \frac{1}{2} \sum_{n=1}^{d} \log \frac{\lambda_n}{\delta_n},   (5.81)

where {λn}n=1, …, d are the eigenvalues of Σx and

\delta_n = \begin{cases} c, & \text{if } c \le \lambda_n, \\ \lambda_n, & \text{if } c > \lambda_n, \end{cases}   (5.82)

where the constant c is chosen such that $\sum_{n=1}^{d} \delta_n = D$. In particular, when d = 1, we have

R(D) = \frac{1}{2} \log \frac{\lambda_1}{D}.   (5.83)

Hence δn can be considered as the portion of distortion assigned to the nth eigenvector of Σx; then the assignment of {δn} can be considered as water filling, as illustrated in Fig. 5.16.

Fig. 5.16 Illustration of water filling in the rate-distortion function of a Gaussian information source.
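A Python sketch of the reverse water-filling computation in Eqs. (5.81)-(5.82): given the eigenvalues of $\Sigma_x$ and a target total distortion D, it finds the water level c by bisection and returns R(D) in bits (the base-2 logarithm and the eigenvalue list are assumptions of this sketch).

import numpy as np

def reverse_water_filling(eigvals, D):
    """Rate-distortion function of a Gaussian vector source, Eqs. (5.81)-(5.82)."""
    lam = np.asarray(eigvals, dtype=float)
    lo, hi = 0.0, float(lam.max())
    for _ in range(200):                    # bisection on the water level c
        c = 0.5 * (lo + hi)
        if np.minimum(c, lam).sum() > D:
            hi = c
        else:
            lo = c
    delta = np.minimum(c, lam)              # distortion assigned per eigenvector
    rate = 0.5 * np.sum(np.log2(lam / delta))
    return rate, delta

lam = [4.0, 1.0, 0.25]                      # eigenvalues of Sigma_x (assumed)
for D in (0.5, 1.5, 3.0):
    R, delta = reverse_water_filling(lam, D)
    print(f"D={D}: R(D)={R:.3f} bits, delta={np.round(delta, 3)}")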

The optimal channel to minimize the rate R given the distortion D is given as follows [75]. Two channels are defined:

Backward channel $P(x \mid \hat{x})$: It is given by

x = \hat{x} + \delta,   (5.84)

where δ is the error, which is Gaussian distributed with zero mean and covariance matrix $\Sigma_{x|y}$. Hence the output $\hat{x}$ is also Gaussian distributed, with zero mean and covariance matrix

\Sigma_{\hat{x}} = \Sigma_x - \Sigma_{x|y}.   (5.85)

Forward channel $P(\hat{x} \mid x)$: It is given by

\hat{x} = Hx + z,   (5.86)

where $H = E[\hat{x} x^T]\big(E[x x^T]\big)^{-1}$, and z is Gaussian with zero mean and covariance matrix

\Sigma_z = E[\hat{x} \hat{x}^T] - E[\hat{x} x^T]\big(E[x x^T]\big)^{-1} E[x \hat{x}^T].   (5.87)

To realize the forward channel $\hat{x} = Hx + z$, one can define an invertible matrix Γ such that $\Gamma H \Gamma^{-1} = I$. Define $b = \Gamma \hat{x}$, $a = \Gamma x$, and $n = \Gamma z$. Then the channel b = a + n with power constraint $E[a^T a] \le \mathrm{trace}(\Gamma \Sigma_x \Gamma^T)$ is matched to the information source.

We also need the concept of a matched channel. The stochastic kernel $P(d\hat{x}(t) \mid x(0:t-1), \hat{x}(t-1))$ can be considered as a communication channel with input x(t) and output $\hat{x}(t)$ [76]. If the real communication channel is equal to the channel that minimizes the rate given the distortion, then the communication channel is said to be matched to the information source. When the communication channel is matched to the source, there is no need for the structure of separated source encoder and channel encoder: the information can be sent in an uncoded manner without any loss in performance.

Then the sequential rate-distortion function can be computed via the traditional rate-distortion expression. We begin with the case of a scalar Gaussian source, for which the distortion is given by $\sigma_x^2 2^{-2R}$. The encoder computes the innovation information:

x(t) - E[x(t) \mid \hat{x}(0:t-1)] = A x(t-1) + w(t-1) - A\hat{x}(t-1) = A\big(x(t-1) - \hat{x}(t-1)\big) + w(t-1) = A\,\delta(t-1) + w(t-1),   (5.88)

where $\delta(t) = x(t) - \hat{x}(t)$ is the estimation error. Since the error variance is equal to the distortion, namely D(t) = E[δ²(t)], the variance of the innovation information is given by

E\Big[\big(x(t) - E[x(t) \mid \hat{x}(0:t-1)]\big)^2\Big] = A^2 D(t-1) + \sigma_w^2,   (5.89)

where $\sigma_w^2$ is the scalar notation of Σw.

In the above argument for the traditional rate-distortion function, we have seen that a scalar Gaussian variable with variance $\sigma_x^2$ incurs distortion $\sigma_x^2 2^{-2R}$ when a Gaussian channel having capacity R is matched to the source. Hence the distortions at different times can be obtained recursively as follows:

D(t) = \big(A^2 D(t-1) + \sigma_w^2\big)\, 2^{-2R(t)}, \quad t \ge 1, \qquad D(0) = \sigma_{x_0}^2\, 2^{-2R(0)}.   (5.90)

For the sequential rate-distortion function, we have D(t) = D. Hence we have

R(t) = \max\big\{0, \tfrac{1}{2}\log\big(A^2 + \sigma_w^2 / D\big)\big\}, \quad t \ge 1, \qquad R(0) = \max\big\{0, \tfrac{1}{2}\log\big(\sigma_{x_0}^2 / D\big)\big\}.   (5.91)

When T is sufficiently large, the impact of R(0) will vanish. Hence we have

\lim_{T \to \infty} R_T^s(D) = \max\left\{0, \frac{1}{2}\log\left(A^2 + \frac{\sigma_w^2}{D}\right)\right\}.   (5.92)
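The scalar recursion is easy to check numerically; the following Python sketch iterates the distortion recursion (5.90) with the rate choice (5.91) and verifies that the per-step distortion stays at the target D while the rate equals the limit of Eq. (5.92). The source parameters and D are assumed values, and the logarithms are taken to base 2.

import numpy as np

A, sigma_w2, sigma_x02 = 1.5, 1.0, 2.0     # assumed scalar source parameters
D = 0.4                                     # target per-step distortion

R0 = max(0.0, 0.5 * np.log2(sigma_x02 / D))          # Eq. (5.91), t = 0
Rt = max(0.0, 0.5 * np.log2(A**2 + sigma_w2 / D))    # Eq. (5.91), t >= 1

Dt = sigma_x02 * 2.0 ** (-2 * R0)                    # Eq. (5.90), t = 0
for t in range(1, 20):
    Dt = (A**2 * Dt + sigma_w2) * 2.0 ** (-2 * Rt)   # Eq. (5.90), t >= 1
print("per-step distortion:", Dt, "(target", D, ")")
print("rate per step      :", Rt, "bits  (the limit in Eq. (5.92))")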

Now we handle the generic case of a d-dimensional Gauss-Markov source. The covariance matrix of the distortion error δ(t) is denoted by $\Sigma(t) = E[\delta(t)\delta^T(t)]$. The evolution of the covariance matrix can be easily shown to be

\Sigma(t) = A \Sigma(t-1) A^T + \Sigma_w.   (5.93)

Consider the unitary matrix U(t) that diagonalizes the matrix $A\Sigma(t-1)A^T + \Sigma_w$:

U(t)\big(A\Sigma(t-1)A^T + \Sigma_w\big)U^T(t) = \mathrm{diag}[\lambda_1(t), \dots, \lambda_d(t)].   (5.94)

From the previous water-filling argument, to achieve the distortion D, the coding rate needs to be

R(t) = \sum_{n=1}^{d} \frac{1}{2} \log \frac{\lambda_n(t)}{\sigma_n(t)},   (5.95)

where

\sigma_n(t) = \begin{cases} c(t), & \text{if } c(t) \le \lambda_n(t), \\ \lambda_n(t), & \text{if } c(t) > \lambda_n(t), \end{cases}   (5.96)

where the constant c(t) is chosen such that $\sum_{n=1}^{d} \sigma_n(t) = D$.

Then by using the rate-distortion function for the scalar Gaussian source, the error covariance matrix of the d-dimensional source at time slot t is given by

\Sigma(t) = U^T(t)\Big[U(t)\big(A\Sigma(t-1)A^T + \Sigma_w\big)U^T(t) \times \mathrm{diag}\big(2^{-2R_1(t)}, \dots, 2^{-2R_d(t)}\big)\Big]U(t).   (5.97)

The evolution of the error covariance matrix is explained in Fig. 5.17. The encoding and decoding procedures are given as follows:

Fig. 5.17 The coding and decoding procedure for a d-dimensional Gaussian source.

1. The sensor receives x(t) and then computes the innovation e(t) = x(t) − E[x(t)|y(0 : t − 1)].

2. The sensor computes the decoupling unitary matrix U(t) and obtains the decoupled components $\tilde{x}(t) = U(t)\, e(t)$.

3. The sensor uses the encoding approach of the scalar Gaussian source to encode each component of $\tilde{x}(t)$ and sends out the signal through the communication channel.

4. The estimator receives the signals of the components and reconstructs the transmitted decoupled innovation. The reconstruction is denoted by $\hat{\tilde{x}}(t)$.

5. The estimator estimates the innovation by $\hat{e}(t) = U^T(t)\, \hat{\tilde{x}}(t)$ and then obtains the state estimate $\hat{x}(t) = \hat{e}(t) + E[x(t) \mid y(0:t-1)]$.

When the distortion is sufficiently small, we have

U(t)\, \Sigma(t)\, U^T(t) = \mathrm{diag}(D/d, \dots, D/d),   (5.98)

and thus the error covariance matrix evolution is given by

\Sigma(t) = U^T(t)\Big[U(t)\Big(\frac{D}{d} A A^T + \Sigma_w\Big)U^T(t) \times \mathrm{diag}\big(2^{-2R_1(t)}, \dots, 2^{-2R_d(t)}\big)\Big]U(t) = \frac{D}{d} I.   (5.99)

Based on the above equality, we have

R_n(t) = \frac{1}{2} \log \frac{\lambda_n\big(\frac{D}{d} A A^T + \Sigma_w\big)}{D/d} = \frac{1}{2} \log \lambda_n\Big(A A^T + \frac{d}{D}\Sigma_w\Big).   (5.100)

Hence the total rate is given by

\sum_{n=1}^{d} R_n(t) = \sum_{n=1}^{d} \frac{1}{2} \log \lambda_n\Big(A A^T + \frac{d}{D}\Sigma_w\Big) = \frac{1}{2} \log \left| A A^T + \frac{d}{D}\Sigma_w \right|.   (5.101)

Then when the distortion D is small and as T tends to infinity, we have

\lim_{T \to \infty} R_T^s(D) = \frac{1}{2} \log \left| A A^T + \frac{d}{D}\Sigma_w \right|.   (5.102)
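For the vector case, the high-resolution limit (5.102) is a single determinant evaluation; the following Python sketch computes it for an assumed two-dimensional source (base-2 logarithm assumed) and checks it against the per-mode sum of Eq. (5.101).

import numpy as np

A = np.array([[1.2, 0.3],
              [0.0, 0.7]])                  # assumed source matrix
Sigma_w = 0.5 * np.eye(2)                   # assumed noise covariance
d, D = 2, 0.01                              # dimension and small target distortion

M = A @ A.T + (d / D) * Sigma_w
# Eq. (5.102): 0.5 * log det(A A^T + (d/D) Sigma_w), in bits
rate_det = 0.5 * np.log2(np.linalg.det(M))
# Eq. (5.101): the same value as a sum over the eigenvalues of M
rate_sum = 0.5 * np.sum(np.log2(np.linalg.eigvalsh(M)))
print(rate_det, rate_sum)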

Noiseless digital channel

When the communication channel is digital and noiseless, it is not matched to the information source. Since the communication channel is digital, it is necessary to quantize the information source and then transmit the discrete bits through the digital channel. For the sequential rate-distortion function, we need a sequential quantizer as follows [14]:

Definition 10

A sequential quantizer is a series of functions mapping from $\mathbb{R}^{(t+1) \times d} \times \mathbb{R}^{t \times d}$ to $\mathbb{R}^d$, whose range is at most countable.

For practical applications, we consider the following “operational sequential rate-distortion function,” which explicitly considers the quantizer structures:

Definition 11

The operational sequential rate-distortion function is defined as

R_T^{s,o}(D) = \inf_{(q_0, \dots, q_{T-1}) \in \mathcal{F}_T^o} \frac{1}{T} H\big(\hat{x}(0:T-1)\big),   (5.103)

where $\mathcal{F}_T^o$ is the set of quantizers satisfying the distortion constraint, namely

\mathcal{F}_T^o = \big\{ (q_0, \dots, q_{T-1}) : E[d(x(t), \hat{x}(t))] \le D,\ \forall t \big\}.   (5.104)

Intuitively, the operational sequential rate-distortion function minimizes the uncertainty of the reconstruction while keeping the expected distortion at each time below D. However, it is more difficult to compute the exact value of the operational sequential rate-distortion function since it involves the structures of the quantizers. We can use the sequential rate-distortion function as an approximation. It is easy to verify that the sequential rate-distortion function provides a lower bound for the operational sequential rate-distortion function since

H\big(\hat{x}(0:T-1)\big) \ge I\big(x(0:T-1); \hat{x}(0:T-1)\big).   (5.105)

Moreover, Ref. [14] has shown that, for very low distortion (and thus a very high coding rate), the operational sequential rate-distortion function and the sequential rate-distortion function become arbitrarily close to each other:

Theorem 10

For each T, we have

\lim_{D \to 0} \frac{R_T^{s,o}(D)}{R_T^s(D)} = 1.   (5.106)
