2

Basics of Differential Games

The theory of games was developed around 1930 by Von Neumann and Morgenstern [136], and differential games around 1950, at about the same time that optimal control theory was being developed. The theory immediately found application in warfare and economics; for instance, certain types of battles, airplane dog-fighting, a torpedo pursuing a ship, a missile intercepting an aircraft, and a gunner guarding a target against an invader [73] are typical models of differential games. Similarly, financial planning between competing sectors of an economy, competition among products in a manufacturing system, and meeting demand with adequate supply in a market system also fall into the realm of differential games.

Game theory involves multi-person decision-making. It comprises a task, an objective, and players or decision makers. The task may range from shooting down a plane, to steering a ship, to controlling the amount of money in an economy. The task is accomplished by the player(s), while the objective, which measures their performance, may vary from how fast? to how cheaply? to how closely? or a combination of these.

When the game involves more than one player, the players may play cooperatively or noncooperatively. It is noncooperative if each player pursues his or her own interest, which partly conflicts with those of the others; and it is dynamic if the order in which the decisions are made is important. When the game involves only one player, the problem becomes one of optimal control, which can be solved by Pontryagin's "maximum principle" or Bellman's "dynamic programming principle" [73]. Indeed, the theories of differential games, optimal control, the calculus of variations and dynamic programming are all approaches to solving variational problems, and have now become merged into the theory of "modern control."

There are two types of differential games that we shall be concerned with in this chapter; namely, nonzero-sum and zero-sum noncooperative dynamic games. We shall not discuss static games in this book. Furthermore, we shall only be concerned with games in which the players have perfect knowledge of the current and past states of the system, and their individual decisions are based on this. This is also known as closed-loop perfect information structure or pattern. On the other hand, when the decision is only based on the current value of the state of the system, we shall add the adjective “memoryless.”

We begin with the fundamental dynamic programming principle which forms the basis for the solution to almost all unconstrained differential game problems.

2.1    Dynamic Programming Principle

The dynamic programming principle was developed by Bellman [64] as the discrete-time equivalent of Hamilton-Jacobi theory. In fact, the term dynamic programming has become synonymous with Hamilton-Jacobi theory. It also provides a sufficient condition for optimality of an optimal decision process. To derive this condition, we consider a discrete-time dynamic system defined on a manifold $\mathcal{X} \subseteq \Re^n$ which is open and contains the origin $x = 0$, with local coordinates $(x_1, \ldots, x_n)$:

$$x_{k+1} = f_k(x_k, u_k), \qquad x_{k_0} = x^0, \quad k \in \{k_0, \ldots\} \subset \mathbb{Z},$$

(2.1)

where $u_k \in \mathcal{U}$ is the control input or decision variable, which belongs to the set $\mathcal{U}$ of all admissible controls, and $f_k : \mathcal{X} \times \mathcal{U} \to \mathcal{X}$ is a $C^0(\mathcal{X} \times \mathcal{U})$ function of its arguments for each $k \in \mathbb{Z}$. We shall also assume that $f_k(\cdot\,,\cdot)$ satisfies the following global existence and uniqueness conditions for the solutions of (2.1).

Assumption 2.1.1 For the function $f_k(x, u)$ in (2.1), for each $k \in \mathbb{Z}$, there exists a constant $c_{1k}$ (depending on $k$) such that for any fixed $u$,

$$\|f_k(x_1, u) - f_k(x_2, u)\| \le c_{1k}\|x_1 - x_2\| \quad \forall x_1, x_2 \in \mathcal{X}, \; \forall u \in \mathcal{U}, \; k \in \mathbb{Z}.$$

To formulate an optimal decision process (or an optimal control problem), we introduce the following cost functional:

$$J(x^0, k_0; u_{[k_0,K]}) = \sum_{k=k_0}^{K} L_k(x_{k+1}, x_k, u_k) \rightarrow \min,$$

(2.2)

where $L_k : \mathcal{X} \times \mathcal{X} \times \mathcal{U} \to \Re$, $k = k_0, \ldots, K$, are real $C^0$ functions of their arguments, and the functional is to be minimized over the time span $\{k_0, \ldots, K\}$ as a basis for making the decision. The minimal cost-to-go from any initial state $x_k$ and initial time $k \in \{k_0, \ldots, K\}$ (also known as the value-function) is then defined as

$$V(x, k) = \inf_{u_{[k,K]} \in \mathcal{U}} \left\{ \sum_{j=k}^{K} L_j(x_{j+1}, x_j, u_j) \right\},$$

(2.3)

and satisfies the boundary condition $V(x, K+1) = 0$, where $x = x_k$ and $u_{[k,K]} \triangleq \{u_k, \ldots, u_K\}$.

Remark 2.1.1 For brevity, we shall henceforth use the notation $[k_1, k_2]$ to mean the subset $\{k_1, \ldots, k_2\}$ of the integers $\mathbb{Z}$ wherever there is no confusion.

We then have the following theorem.

Theorem 2.1.1 Consider the nonlinear discrete-time system (2.1) and the optimal control problem of minimizing the cost functional (2.2) subject to the dynamics of the system. Then, there exists an optimal control $u^\star_k$, $k \in [k_0, K]$, to the problem if there exists a $C^0(\mathcal{X} \times [k_0, K+1])$ function $V : \mathcal{X} \times [k_0, K+1] \to \Re$ which corresponds to the value-function (2.3) for each $k$, and satisfies the following recursive (dynamic programming) equation subject to the boundary condition:

$$V(x, k) = \inf_{u_k \in \mathcal{U}} \left\{ L_k(f_k(x, u_k), x, u_k) + V(f_k(x, u_k), k+1) \right\};$$

(2.4)

$$V(x, K+1) = 0.$$

Furthermore,

$$J^\star(x^0, k_0; u^\star_{[k_0,K]}) = V(x^0, k_0)$$

is the optimal cost of the decision process.

Proof: The proof of the above theorem is based on the principle of optimality [164] and can be found in [64, 164]. □

Equation (2.4) is the discrete dynamic programming principle and governs the solution of most of the discrete-time problems that we shall be dealing with in this book.
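To make the recursion concrete, the following sketch (ours, not the book's; the scalar dynamics, stage cost, and grids are purely illustrative assumptions) computes (2.4) backward on a discretized state space:

```python
import numpy as np

# A minimal sketch of the backward dynamic programming recursion (2.4).
# The scalar dynamics f_k, stage cost L_k and the grids below are
# illustrative assumptions, not taken from the text.
X = np.linspace(-2.0, 2.0, 81)    # discretized state space
U = np.linspace(-1.0, 1.0, 21)    # discretized admissible control set
k0, K = 0, 20

def f(x, u):                       # x_{k+1} = f_k(x_k, u_k)
    return 0.9 * x + u

def L(x_next, x, u):               # stage cost L_k(x_{k+1}, x_k, u_k)
    return x_next**2 + 0.1 * u**2

V = np.zeros((K + 2, X.size))      # boundary condition V(x, K+1) = 0
for k in range(K, k0 - 1, -1):
    for ix, x in enumerate(X):
        xn = f(x, U)                              # successors for all u_k
        Vn = np.interp(xn, X, V[k + 1])           # V(f_k(x, u_k), k+1)
        V[k, ix] = np.min(L(xn, x, U) + Vn)       # infimum over u_k
```

The optimal cost of the process started at any grid point $x$ is then $V(x, k_0)$, in agreement with Theorem 2.1.1.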

Next, we discuss the continuous-time equivalent of the dynamic programming equation, which is a first-order nonlinear partial-differential equation (PDE) known as the Hamilton-Jacobi-Bellman equation (HJBE). In this regard, consider the continuous-time nonlinear dynamic system defined on an open subset $\mathcal{X} \subseteq \Re^n$ containing the origin $x = 0$, with coordinates $(x_1, \ldots, x_n)$:

$$\dot{x}(t) = f(x(t), u(t), t), \qquad x(t_0) = x^0,$$

(2.5)

where $f : \mathcal{X} \times \mathcal{U} \times \Re \to \Re^n$ is a $C^1$ measurable function, $\mathcal{U} \subset \Re^m$ is the admissible control set, which is also measurable, and $x(t_0) = x^0$ is the initial state, which is assumed known. Similarly, for the sake of completeness, we have the following assumption for the global existence of solutions of (2.5) [234].

Assumption 2.1.2 The function $f(\cdot, \cdot, \cdot)$ in (2.5) is piecewise continuous with respect to $t$, and for each $t \in [0, \infty)$ there exists a constant $C_{1t}$ such that

$$\|f(x_1, u, t) - f(x_2, u, t)\| \le C_{1t}\|x_1 - x_2\| \quad \forall x_1, x_2 \in \mathcal{X}, \; \forall u \in \mathcal{U}.$$

For the purpose of optimally controlling the system, we associate to it the following cost functional:

$$J(x^0, t_0; u_{[t_0,T]}) = \int_{t_0}^{T} L(x, u, t)\, dt \rightarrow \min,$$

(2.6)

for some real $C^0$ function $L : \mathcal{X} \times \mathcal{U} \times \Re \to \Re$, which is to be minimized over a time-horizon $[t_0, T] \subseteq \Re$. Similarly, define the value-function (or minimum cost-to-go) from any initial state $x$ and initial time $t$ as

$$V(x, t) = \inf_{u_{[t,T]}} \left\{ \int_{t}^{T} L(x(s), u(s), s)\, ds \right\}$$

(2.7)

which satisfies the boundary condition $V(x, T) = 0$. Then we have the following theorem.

Theorem 2.1.2 Consider the nonlinear system (2.5) and the optimal control problem of minimizing the cost functional (2.6) subject to the dynamics of the system and initial condition. Suppose there exists a $C^1(\mathcal{X} \times [t_0, T])$ function $V : \mathcal{X} \times [t_0, T] \to \Re$, which corresponds to the value-function (2.7), satisfying the Hamilton-Jacobi-Bellman equation (HJBE):

$$-V_t(x, t) = \min_{u \in \mathcal{U}} \left\{ V_x(x, t) f(x, u, t) + L(x, u, t) \right\}, \qquad V(x, T) = 0.$$

(2.8)

Then, there exists an optimal control $u^\star$ to the problem. Moreover, the optimal cost of the policy is given by

$$J^\star(x^0, t_0; u^\star_{[t_0,T]}) = V(x^0, t_0).$$

Proof: The proof of the theorem can also be found in [47, 175, 164]. □
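As a simple illustration (ours, not the book's), consider the scalar plant $\dot{x} = ax + bu$ with $L(x, u, t) = \frac{1}{2}(qx^2 + ru^2)$, $r > 0$. The minimization in (2.8) is attained at $u^\star = -(b/r)V_x$, and the quadratic ansatz $V(x, t) = \frac{1}{2}p(t)x^2$ reduces the HJBE to the scalar Riccati ODE

$$-\dot{p} = 2ap - \frac{b^2}{r}p^2 + q, \qquad p(T) = 0,$$

which, integrated backward from $t = T$, yields the familiar LQR feedback $u^\star(t) = -(b/r)p(t)x(t)$.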

The above development has considered single player decision processes or optimal control problems. In the next section, we discuss dynamic games, which involve multiple players. Again, we shall begin with the discrete-time case.

2.2    Discrete-Time Nonzero-Sum Dynamic Games

Dynamic games were first introduced by Isaacs [136, 137] around the years 1954-1956, within the framework of two-person zero-sum games. Nonzero-sum games were later popularized in the works of Starr and Ho [250, 251] and Friedman [107]. Stochastic games were developed later still [59].

We consider a discrete-time deterministic N-person dynamic game of duration $K - k_0$ described by the state equation (which we refer to as the task):

$$x_{k+1} = f_k(x_k, u^1_k, \ldots, u^N_k), \qquad x(k_0) = x^0, \quad k \in \{k_0, \ldots, K\},$$

(2.9)

where $x_k \in \mathcal{X} \subset \Re^n$ is the state-vector of the system for $k = k_0, \ldots, K$, which belongs to the state-space $\mathcal{X}$, with $x^0$ the initial state, which is known a priori; $u^i_k \in \mathcal{U}^i$, $i = 1, \ldots, N$, $N \in \mathbb{N}$, are the decision variables or control inputs of the players $1, \ldots, N$, which belong to the measurable control sets $\mathcal{U}^i \subset \Re^{m_i}$, $m_i \in \mathbb{N}$; and $f_k : \mathcal{X} \times \mathcal{U}^1 \times \cdots \times \mathcal{U}^N \to \mathcal{X}$ are real $C^1$ functions of their arguments. To the game are associated the $N$ objectives or pay-offs or cost functionals $J^i : \Gamma^1 \times \cdots \times \Gamma^N \to \Re$, where $\Gamma^i$, $i = 1, \ldots, N$, are the permissible strategy spaces of the players,

$$J^i(u^1, \ldots, u^N) = \sum_{k=k_0}^{K} L^i_k(x_{k+1}, x_k, u^1_k, \ldots, u^N_k), \quad i = 1, \ldots, N,$$

(2.10)

and where the $u^i_k \in \Gamma^i$, $i = 1, \ldots, N$, are functions of $x_k$, and $L^i_k : \mathcal{X} \times \mathcal{X} \times \mathcal{U}^1 \times \cdots \times \mathcal{U}^N \to \Re$, $i = 1, \ldots, N$, $k = k_0, \ldots, K$, are real $C^1$ functions of their arguments. The players play noncooperatively, and the objective is to minimize each of the above pay-off functions. Such a game is called a nonzero-sum game or Nash-game. The optimal decisions $u^{i\star}_k$, $i = 1, \ldots, N$, which minimize each pay-off function at each stage of the game, are called Nash-equilibrium solutions.

Definition 2.2.1 An N-tuple of strategies $\{u^{i\star} \in \Gamma^i, \; i = 1, \ldots, N\}$, where $\Gamma^i$, $i = 1, \ldots, N$, is the strategy space of each player, is said to constitute a noncooperative Nash-equilibrium solution for the above N-person game if

$$J^{i\star} := J^i(u^{1\star}, \ldots, u^{N\star}) \le J^i(u^{1\star}, \ldots, u^{(i-1)\star}, u^i, u^{(i+1)\star}, \ldots, u^{N\star}) \quad \forall u^i \in \Gamma^i, \; i = 1, \ldots, N,$$

(2.11)

where $u^{i\star} = \{u^{i\star}_k, \; k = k_0, \ldots, K\}$ and $u^i = \{u^i_k, \; k = k_0, \ldots, K\}$.

In this section, we discuss necessary and sufficient conditions for the existence of Nash-equilibrium solutions of such a game under closed-loop (or feedback) no-memory perfect-state-information structure, i.e., when the decision is based on the current state information, which is perfectly known. The following theorem, based on the dynamic programming principle, gives necessary and sufficient conditions for a set of strategies to be a Nash-equilibrium solution [59].

Theorem 2.2.1 For the N-person discrete-time nonzero-sum game (2.9)-(2.11), an N-tuple of strategies $\{u^{i\star} \in \Gamma^i, \; i = 1, \ldots, N\}$ provides a feedback Nash-equilibrium solution if, and only if, there exist $N \times (K - k_0)$ functions $V^i : \mathcal{X} \times \mathbb{Z} \to \Re$ such that the following recursive equations are satisfied:

$$\begin{aligned}
V^i(x, k) &= \min_{u^i_k \in \mathcal{U}^i} \left\{ L^i_k(x_{k+1}, x, u^{1\star}_k, \ldots, u^i_k, \ldots, u^{N\star}_k) + V^i(x_{k+1}, k+1) \right\} \\
&= \min_{u^i_k \in \mathcal{U}^i} \Big\{ L^i_k(f_k(x, u^{1\star}_k, \ldots, u^i_k, \ldots, u^{N\star}_k), x, u^{1\star}_k, \ldots, u^i_k, \ldots, u^{N\star}_k) \\
&\qquad\qquad + V^i(f_k(x, u^{1\star}_k, \ldots, u^i_k, \ldots, u^{N\star}_k), k+1) \Big\}, \\
V^i(x, K+1) &= 0, \quad i = 1, \ldots, N, \; k \in \{k_0, \ldots, K\}.
\end{aligned}$$

(2.12)

Notice that, in the above equations, there are $N \times (K - k_0)$ functions, since at each stage of the decision process a different function is required for each of the players. However, it is often possible to satisfy the above recursive equations with a single function for each player, i.e., with $N$ functions in all. This is particularly true in the linear case.

More specifically, let us consider the case of a two-player nonzero-sum game (the type to which most of the problems in this book belong), described by the state equation:

$$x_{k+1} = f_k(x_k, w_k, u_k), \qquad x(k_0) = x^0, \quad k \in \{k_0, \ldots, K\},$$

(2.13)

where $w_k \in \mathcal{W} \subseteq \Re^{m_w}$ and $u_k \in \mathcal{U} \subseteq \Re^{m_u}$ represent the decisions of the two players respectively. The pay-offs $J_1$, $J_2$ are given by

$$J_1(w, u) = \sum_{k=k_0}^{K} L^1_k(x_{k+1}, x_k, w_k, u_k),$$

(2.14)

$$J_2(w, u) = \sum_{k=k_0}^{K} L^2_k(x_{k+1}, x_k, w_k, u_k),$$

(2.15)

where $w = w_{[k_0,K]}$, $u = u_{[k_0,K]}$. Then, a pair of strategies $w^\star := \{w^\star_k, \; k = k_0, \ldots, K\}$, $u^\star := \{u^\star_k, \; k = k_0, \ldots, K\}$ will constitute a Nash-equilibrium solution for the game if

$$J_1(w^\star, u^\star) \le J_1(w^\star, u) \quad \forall u \in \mathcal{U},$$

(2.16)

$$J_2(w^\star, u^\star) \le J_2(w, u^\star) \quad \forall w \in \mathcal{W}.$$

(2.17)

Furthermore, the conditions of Theorem 2.2.1 reduce to:

$$\begin{aligned}
V_1(x, k) &= \min_{u_k \in \mathcal{U}} \left\{ L^1_k(f_k(x, w^\star_k, u_k), x, w^\star_k, u_k) + V_1(f_k(x, w^\star_k, u_k), k+1) \right\}; \\
V_1(x, K+1) &= 0, \quad k = k_0, \ldots, K,
\end{aligned}$$

(2.18)

$$\begin{aligned}
V_2(x, k) &= \min_{w_k \in \mathcal{W}} \left\{ L^2_k(f_k(x, w_k, u^\star_k), x, w_k, u^\star_k) + V_2(f_k(x, w_k, u^\star_k), k+1) \right\}; \\
V_2(x, K+1) &= 0, \quad k = k_0, \ldots, K.
\end{aligned}$$

(2.19)

Remark 2.2.1 Equations (2.18), (2.19) represent a pair of coupled HJ-equations. They are difficult to solve, and not much work has been done to study their behavior. However, these equations will play an important role in the derivation of the solution to the discrete-time mixed $H_2/H_\infty$ control problem in later chapters of the book.

Another special class of the nonzero-sum games is the two-person zero-sum game, in which the objective functionals (2.14), (2.15) are such that

$$J_1(w, u) = -J_2(w, u) = J(w, u) := \sum_{k=k_0}^{K} L_k(x_{k+1}, x_k, w_k, u_k).$$

(2.20)

If this is the case, then $J_1(w, u) + J_2(w, u) = 0$, and while $u$ is the minimizer, $w$ is the maximizer. Thus, we might refer to $u$ as the minimizing player and $w$ as the maximizing player. Furthermore, the "Nash-equilibrium" conditions (2.16), (2.17) reduce to the saddle-point conditions:

$$J(w, u^\star) \le J(w^\star, u^\star) \le J(w^\star, u) \quad \forall w \in \mathcal{W}, \; u \in \mathcal{U},$$

(2.21)

where the pair $(w^\star, u^\star)$ is the optimal strategy pair and is called a "saddle-point." Consequently, the set of necessary and sufficient conditions (2.18), (2.19) reduces to the single condition given in the following theorem [57, 59].

Theorem 2.2.2 For the two-person discrete-time zero-sum game defined by (2.13), (2.20), a pair of strategies $(w^\star, u^\star)$ provides a feedback saddle-point solution if, and only if, there exists a set of $K - k_0$ functions $V : \mathcal{X} \times \mathbb{Z} \to \Re$ such that the following recursive equations are satisfied for each $k \in [k_0, K]$:

$$\begin{aligned}
V(x, k) &= \min_{u_k \in \mathcal{U}} \max_{w_k \in \mathcal{W}} \left\{ L_k(f_k(x, w_k, u_k), x, w_k, u_k) + V(f_k(x, w_k, u_k), k+1) \right\} \\
&= \max_{w_k \in \mathcal{W}} \min_{u_k \in \mathcal{U}} \left\{ L_k(f_k(x, w_k, u_k), x, w_k, u_k) + V(f_k(x, w_k, u_k), k+1) \right\} \\
&= L_k(f_k(x, w^\star_k, u^\star_k), x, w^\star_k, u^\star_k) + V(f_k(x, w^\star_k, u^\star_k), k+1), \\
V(x, K+1) &= 0,
\end{aligned}$$

(2.22)

where $x = x_k$. Equation (2.22) is known as the Isaacs equation, being the discrete-time version of the one developed by Isaacs [136, 59]. Furthermore, the interchangeability of the "max" and "min" operations above is also known as the "Isaacs condition." The above equation will play a significant role in the derivation of the solution to the $H_\infty$-control problem for discrete-time nonlinear systems in later chapters of the book.
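The content of the Isaacs condition is already visible in a static game. The following sketch (the payoff matrix is an arbitrary illustrative assumption) tests whether min and max can be interchanged, i.e., whether a saddle-point in pure strategies exists:

```python
import numpy as np

# Test the "Isaacs condition" (interchangeability of min and max) for a
# static zero-sum matrix game. The payoff matrix J[iw, iu] is an arbitrary
# illustrative assumption; rows index w-choices, columns index u-choices.
J = np.array([[3.0, 1.0, 4.0],
              [2.0, 1.0, 3.0],
              [0.0, 0.5, 2.0]])

upper = J.max(axis=0).min()    # min over u of max over w
lower = J.min(axis=1).max()    # max over w of min over u
print("min-max =", upper, ", max-min =", lower)
print("pure saddle-point exists:", upper == lower)
```

In the dynamic case, (2.22) requires this interchange to hold at every stage $k$, with the value-function acting as the terminal reward of each stage game.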

2.2.1    Linear-Quadratic Discrete-Time Dynamic Games

We now specialize the above results on N-person nonzero-sum dynamic games to the linear-quadratic (LQ) case, in which the optimal decisions can be expressed explicitly. For this purpose, the dynamics of the system are described by a linear state equation:

$$x_{k+1} = f_k(x_k, u^1_k, \ldots, u^N_k) = A_k x_k + \sum_{i=1}^{N} B^i_k u^i_k, \quad k \in [k_0, K],$$

(2.23)

where $A_k \in \Re^{n \times n}$, $B^i_k \in \Re^{n \times p_i}$, and all the variables have their previous meanings. The pay-offs $J^i$, $i = 1, \ldots, N$, are described by

$$J^i(u^1_{[k_0,K]}, \ldots, u^N_{[k_0,K]}) = \sum_{k=k_0}^{K} L^i_k(x_{k+1}, u^1_k, \ldots, u^N_k),$$

(2.24)

$$L^i_k(x_{k+1}, u^1_k, \ldots, u^N_k) = \frac{1}{2}\left[ x^T_{k+1} Q^i_{k+1} x_{k+1} + \sum_{j=1}^{N} (u^j_k)^T R^{ij}_k u^j_k \right],$$

(2.25)

where $Q^i_{k+1}$, $R^{ij}_k$ are matrices of appropriate dimensions, with $Q^i_{k+1} \ge 0$ symmetric and $R^{ii}_k > 0$, for all $k = k_0, \ldots, K$, $i, j = 1, \ldots, N$.

The following corollary gives necessary and sufficient conditions for the existence of Nash-equilibrium for the LQ-game [59].

Corollary 2.2.1 For the N-player LQ game described by the equations (2.23)-(2.25), suppose $Q^i_{k+1} \ge 0$, $i = 1, \ldots, N$, $R^{ii}_k > 0$ and $R^{ij}_k \ge 0$, $i, j = 1, \ldots, N$, $j \ne i$, $k = k_0, \ldots, K$. Then, there exists a unique feedback Nash-equilibrium solution to the game if, and only if, there exist unique solutions $P^i_k$, $i = 1, \ldots, N$, $k = k_0, \ldots, K$, to the recursive equations:

$$\left[ R^{ii}_k + (B^i_k)^T Z^i_{k+1} B^i_k \right] P^i_k + (B^i_k)^T Z^i_{k+1} \sum_{j=1, j \ne i}^{N} B^j_k P^j_k = (B^i_k)^T Z^i_{k+1} A_k, \quad i = 1, \ldots, N,$$

(2.26)

where

$$Z^i_k = F^T_k Z^i_{k+1} F_k + \sum_{j=1}^{N} (P^j_k)^T R^{ij}_k P^j_k + Q^i_k, \qquad Z^i_{K+1} = Q^i_{K+1}, \quad i = 1, \ldots, N,$$

(2.27)

$$F_k = A_k - \sum_{i=1}^{N} B^i_k P^i_k, \quad k = k_0, \ldots, K.$$

(2.28)

Furthermore, the equilibrium strategies are given by

$$u^{i\star}_k = -P^i_k x_k, \quad i = 1, \ldots, N.$$

(2.29)

Remark 2.2.2 The positive-semidefiniteness conditions on $Q^i_{k+1}$ and $R^{ij}_k$, $j \ne i$, guarantee the convexity of the functionals $J^i$, and therefore also guarantee the existence of a minimizing solution in (2.12) for all $k \in [k_0, K]$. Furthermore, there has not been much work in the literature regarding the existence of solutions to the coupled discrete Riccati equations (2.26)-(2.28). The only relevant work we could find is given in [84].
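For $N = 2$, the equations (2.26) are, at each stage $k$, a linear system in the pair $(P^1_k, P^2_k)$, so the recursion is easily implemented. A sketch under assumed constant problem data (the matrices below are illustrative, not from the text):

```python
import numpy as np

# Backward recursion for the coupled equations (2.26)-(2.28) with N = 2.
# The data A, B1, B2, Q1, Q2, Rij are constant illustrative assumptions;
# the equations themselves allow all of them to vary with k.
n, K = 2, 10
A  = np.array([[1.0, 0.1], [0.0, 1.0]])
B1 = np.array([[0.0], [1.0]]);  B2 = np.array([[0.1], [0.5]])
Q1, Q2   = np.eye(n), 2.0 * np.eye(n)
R11, R22 = np.eye(1), np.eye(1)
R12, R21 = 0.5 * np.eye(1), 0.5 * np.eye(1)

Z1, Z2 = Q1.copy(), Q2.copy()          # Z^i_{K+1} = Q^i_{K+1}
for k in range(K, -1, -1):
    # Stack (2.26) for i = 1, 2 as one block-linear system in [P1; P2].
    M = np.block([[R11 + B1.T @ Z1 @ B1, B1.T @ Z1 @ B2],
                  [B2.T @ Z2 @ B1,       R22 + B2.T @ Z2 @ B2]])
    rhs = np.vstack([B1.T @ Z1 @ A, B2.T @ Z2 @ A])
    P   = np.linalg.solve(M, rhs)
    P1, P2 = P[:1, :], P[1:, :]        # gains: u^i_k = -P^i_k x_k, cf. (2.29)
    F = A - B1 @ P1 - B2 @ P2          # closed-loop matrix (2.28)
    Z1 = F.T @ Z1 @ F + P1.T @ R11 @ P1 + P2.T @ R12 @ P2 + Q1   # (2.27)
    Z2 = F.T @ Z2 @ F + P1.T @ R21 @ P1 + P2.T @ R22 @ P2 + Q2
```

Uniqueness of the stage solution corresponds to the invertibility of the block matrix M at every $k$.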

Again, the results of the above corollary can easily be specialized to the case of two players. But more importantly, we shall specialize the results of Theorem 2.2.2 to the case of the two-player LQ zero-sum discrete-time dynamic game described by the state equation:

$$x_{k+1} = A_k x_k + B^1_k w_k + B^2_k u_k, \qquad x(k_0) = x^0, \quad k \in [k_0, K],$$

(2.30)

and the objective functionals

$$J_1(u, w) = -J_2(u, w) = J(u, w) = \sum_{k=k_0}^{K} L_k(x_{k+1}, x_k, u_k, w_k),$$

(2.31)

$$L_k(x_{k+1}, x_k, u_k, w_k) = \frac{1}{2}\left[ x^T_{k+1} Q_{k+1} x_{k+1} + u^T_k u_k - w^T_k w_k \right].$$

(2.32)

We then have the following corollary [57, 59].

Corollary 2.2.2 For the two-player LQ zero-sum dynamic game described by (2.30)-(2.32), suppose $Q_{k+1} \ge 0$, $k = k_0, \ldots, K$. Then there exists a unique feedback saddle-point solution if, and only if,

$$I + (B^2_k)^T M_{k+1} B^2_k > 0, \quad k = k_0, \ldots, K,$$

(2.33)

$$I - (B^1_k)^T M_{k+1} B^1_k > 0, \quad k = k_0, \ldots, K,$$

(2.34)

where

$$S_k = \left[ I - \left( B^1_k (B^1_k)^T - B^2_k (B^2_k)^T \right) M_{k+1} \right],$$

(2.35)

$$M_k = Q_k + A^T_k M_{k+1} S^{-1}_k A_k, \qquad M_{K+1} = Q_{K+1}.$$

(2.36)

Moreover, the unique equilibrium strategies are given by

$$u^\star_k = -(B^2_k)^T M_{k+1} S^{-1}_k A_k x_k,$$

(2.37)

$$w^\star_k = (B^1_k)^T M_{k+1} S^{-1}_k A_k x_k,$$

(2.38)

and the corresponding unique state trajectory is given by the recursive equation:

$$x_{k+1} = S^{-1}_k A_k x_k.$$

(2.39)
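The recursion (2.35)-(2.36), together with the existence tests (2.33)-(2.34), is straightforward to implement. A sketch with assumed constant data:

```python
import numpy as np

# Backward recursion (2.35)-(2.36) for the discrete-time zero-sum LQ game,
# checking the saddle-point conditions (2.33)-(2.34) at every stage.
# A, B1, B2, Q are constant illustrative assumptions.
n, K = 2, 20
A  = np.array([[0.95, 0.1], [0.0, 0.9]])
B1 = np.array([[0.1], [0.2]])       # maximizing player w
B2 = np.array([[0.0], [1.0]])       # minimizing player u
Q  = np.eye(n)

M = Q.copy()                         # M_{K+1} = Q_{K+1}
for k in range(K, -1, -1):           # within the loop, M holds M_{k+1}
    cu = np.eye(1) + B2.T @ M @ B2   # (2.33): must be positive definite
    cw = np.eye(1) - B1.T @ M @ B1   # (2.34): must be positive definite
    assert np.all(np.linalg.eigvalsh(cu) > 0), "condition (2.33) fails"
    assert np.all(np.linalg.eigvalsh(cw) > 0), "condition (2.34) fails"
    S = np.eye(n) - (B1 @ B1.T - B2 @ B2.T) @ M          # (2.35)
    SinvA = np.linalg.solve(S, A)
    Ku = -B2.T @ M @ SinvA           # u*_k = Ku x_k, cf. (2.37)
    Kw =  B1.T @ M @ SinvA           # w*_k = Kw x_k, cf. (2.38)
    M = Q + A.T @ M @ SinvA          # (2.36)
```

After the loop, Ku and Kw hold the saddle-point gains at the initial stage, and M holds $M_{k_0}$.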

In the next section we consider the continuous-time counterparts of the discrete-time dynamic games.

2.3    Continuous-Time Nonzero-Sum Dynamic Games

In this section, we discuss the continuous-time counterparts of the results that we presented in the previous section for discrete-time dynamic games. Indeed the name “differential game” stems from the continuous-time models of dynamic games.

We consider at the outset an N-player nonzero-sum differential game defined by the state equation:

$$\dot{x}(t) = f(x, u^1, \ldots, u^N, t), \qquad x(t_0) = x^0,$$

(2.40)

where $x(t) \in \mathcal{X} \subseteq \Re^n$ is the state of the system for any $t \in \Re$, which belongs to the state-space manifold $\mathcal{X}$, and $x(t_0)$ is the initial state, which is assumed to be known a priori by all the players; $u^i \in \mathcal{U}^i$, $i = 1, \ldots, N$, $N \in \mathbb{N}$, are the decision variables or control inputs of the players $1, \ldots, N$, which belong to the control sets $\mathcal{U}^i \subset \Re^{m_i}$, $m_i \in \mathbb{N}$, which are also measurable; and $f : \mathcal{X} \times \mathcal{U}^1 \times \cdots \times \mathcal{U}^N \times \Re \to \Re^n$ is a real $C^1$ function of its arguments.

To each player $i = 1, \ldots, N$ is associated an objective functional or pay-off $J^i : \Gamma^1 \times \cdots \times \Gamma^N \to \Re$, which he tries to minimize, where $\Gamma^i$, $i = 1, \ldots, N$, is the strategy space of the player; it is defined by

$$J^i(u^1, \ldots, u^N) = \phi^i(x(t_f), t_f) + \int_{t_0}^{t_f} L^i(x, u^1, \ldots, u^N, t)\, dt, \quad i = 1, \ldots, N,$$

(2.41)

where $L^i : \mathcal{X} \times \mathcal{U}^1 \times \cdots \times \mathcal{U}^N \times \Re_+ \to \Re$ and $\phi^i : \mathcal{X} \times \Re_+ \to \Re$, $i = 1, \ldots, N$, are real $C^1$ functions of their arguments respectively. The final or terminal time $t_f$ may be variable or fixed; here, we shall assume it is fixed for simplicity. The functions $\phi^i(\cdot\,,\cdot)$, $i = 1, \ldots, N$, are also known as the terminal cost functions.

The players play noncooperatively, but each player knows the current value of the state vector as well as all system parameters and cost functions. However, he does not know the strategies of the rival players. Our aim is to derive sufficient conditions for the existence of Nash-equilibrium solutions to the above game under closed-loop memoryless perfect information structure. For this purpose, we redefine Nash-equilibrium for continuous-time dynamic games as follows.

Definition 2.3.1 An N-tuple of strategies $\{u^{i\star} \in \Gamma^i, \; i = 1, \ldots, N\}$, where $\Gamma^i$, $i = 1, \ldots, N$, are the sets of strategies, constitutes a Nash-equilibrium for the game (2.40)-(2.41) if

$$J^{i\star} := J^i(u^{1\star}, \ldots, u^{N\star}) \le J^i(u^{1\star}, \ldots, u^{(i-1)\star}, u^i, u^{(i+1)\star}, \ldots, u^{N\star}) \quad \forall u^i \in \Gamma^i, \; i = 1, \ldots, N,$$

(2.42)

where $u^{i\star} = \{u^{i\star}(t), \; t \in [t_0, t_f]\}$ and $u^i = \{u^i(t), \; t \in [t_0, t_f]\}$.

To derive the optimality conditions, we consider an N-tuple of piecewise-continuous strategies $u^\star := \{u^{1\star}, \ldots, u^{N\star}\}$ and the value-function for the i-th player

$$V^i(x, t) = \inf_{u \in \mathcal{U}} \left\{ \phi^i(x(t_f), t_f) + \int_{t}^{t_f} L^i(x, u, \tau)\, d\tau \right\} = \phi^i(x^\star(t_f), t_f) + \int_{t}^{t_f} L^i(x^\star, u^\star, \tau)\, d\tau.$$

(2.43)

Then, by applying Definition 2.3.1 and the dynamic-programming principle, it can be shown that the value-functions $V^i$, $i = 1, \ldots, N$, are solutions of the following Hamilton-Jacobi PDEs:

$$-\frac{\partial V^i}{\partial t} = \inf_{u^i \in \mathcal{U}^i} H^i\!\left(x, t, u^{1\star}, \ldots, u^{(i-1)\star}, u^i, u^{(i+1)\star}, \ldots, u^{N\star}, \frac{\partial V^i}{\partial x}\right),$$

$$V^i(x(t_f), t_f) = \phi^i(x(t_f), t_f),$$

(2.44)

$$H^i(x, t, u, \lambda^T_i) = L^i(x, u, t) + \lambda^T_i f(x, u, t), \quad i = 1, \ldots, N,$$ where $u = (u^1, \ldots, u^N)$.

(2.45)

The strategies $u^\star = (u^{1\star}, \ldots, u^{N\star})$ which minimize the right-hand-sides of the above equations (2.44) are the equilibrium strategies. The equations (2.44) are integrated backwards in time from the terminal manifold $x(t_f)$, and at each $(x, t)$ one must solve a static game for the Hamiltonians $H^i$, $i = 1, \ldots, N$, to find the Nash-equilibrium. This is not always possible, except for a class of differential games called normal [250]. For these, it is possible to find a unique Nash-equilibrium point $u^\star$ for the vector $H = [H^1, \ldots, H^N]$ for all $x, \lambda, t$, such that when the equations

$$-V^i_t(x, t) = H^i\!\left(x, t, u^\star(x, t, V^1_x, \ldots, V^N_x), V^i_x\right),$$

(2.46)

$$\dot{x} = f(x, u^{1\star}, \ldots, u^{N\star}, t)$$

(2.47)

are integrated backward from all the points on the terminal surface, feasible trajectories are obtained. Thus, for a normal game, the following theorem provides sufficient conditions for the existence of a Nash-equilibrium solution of the N-player game under closed-loop memoryless perfect-state information pattern [57, 250].

Theorem 2.3.1 Consider the N-player nonzero-sum continuous-time dynamic game (2.40)-(2.41) of fixed duration $[t_0, t_f]$, under closed-loop memoryless perfect-state information pattern. An N-tuple $u^\star = (u^{1\star}, \ldots, u^{N\star})$ of strategies provides a Nash-equilibrium solution if there exist $N$ $C^1$-functions $V^i : \mathcal{X} \times \Re_+ \to \Re$, $i = 1, \ldots, N$, satisfying the HJEs (2.44)-(2.45).

Remark 2.3.1 The above result can easily be specialized to the two-person nonzero-sum continuous-time dynamic game. This case will play a significant role in the derivation of the solution to the mixed $H_2/H_\infty$ control problem in a later chapter.

Next, we specialize the above result to the case of the two-person zero-sum continuous-time dynamic game. For this case, we have the dynamic equation and the objective functional described by

$$\dot{x}(t) = f(x, u, w, t), \qquad x(t_0) = x^0,$$

(2.48)

$$J_1(u, w) = -J_2(u, w) := J(u, w) = \phi(x(t_f), t_f) + \int_{t_0}^{t_f} L(x, u, w, t)\, dt,$$

(2.49)

where $u \in \mathcal{U}$, $w \in \mathcal{W}$ are the strategies of the two players, which belong to the measurable sets $\mathcal{U} \subset \Re^{m_u}$, $\mathcal{W} \subset \Re^{m_w}$ respectively, and $x^0$ is the initial state, which is known a priori to both players. Then we have the following theorem, which is the continuous-time counterpart of Theorem 2.2.2.

Theorem 2.3.2 Consider the two-person zero-sum continuous-time differential game (2.48)-(2.49) of fixed duration $[t_0, t_f]$, under closed-loop memoryless information structure. A pair of strategies $(u^\star, w^\star)$ provides a feedback saddle-point solution to the game if there exists a $C^1$-function $V : \mathcal{X} \times \Re \to \Re$ of both of its arguments satisfying the PDE:

$$\begin{aligned}
-\frac{\partial V(x,t)}{\partial t} &= \inf_{u \in \mathcal{U}} \sup_{w \in \mathcal{W}} \left\{ \frac{\partial V(x,t)}{\partial x} f(x, u, w, t) + L(x, u, w, t) \right\} \\
&= \sup_{w \in \mathcal{W}} \inf_{u \in \mathcal{U}} \left\{ \frac{\partial V(x,t)}{\partial x} f(x, u, w, t) + L(x, u, w, t) \right\} \\
&= \frac{\partial V(x,t)}{\partial x} f(x, u^\star, w^\star, t) + L(x, u^\star, w^\star, t), \qquad V(x(t_f), t_f) = \phi(x(t_f), t_f),
\end{aligned}$$

(2.50)

known as Isaacs equation [136] or Hamilton-Jacobi-Isaacs equation (HJIE).

Remark 2.3.2 The HJIE (2.50) will play a significant role in the derivation of the solution to the $H_\infty$-control problem for continuous-time nonlinear systems in later chapters.

Next, we similarly specialize the above results to linear systems with quadratic objective functions, also known as the “linear-quadratic (LQ)” continuous-time dynamic game.

2.3.1    Linear-Quadratic Continuous-Time Dynamic Games

In this section, we specialize the results of the previous section to linear-quadratic (LQ) continuous-time dynamic games. For the linear case, the system (2.40) and the cost functionals (2.41) reduce to the following:

$$\dot{x}(t) = f(x, u^1, \ldots, u^N, t) = A(t)x(t) + \sum_{i=1}^{N} B^i(t)u^i(t), \qquad x(t_0) = x^0,$$

(2.51)

$$J^i(u^1, \ldots, u^N) = \frac{1}{2}x^T(t_f)Q^i_f x(t_f) + \frac{1}{2}\int_{t_0}^{t_f} \left[ x^T(t)Q^i(t)x(t) + \sum_{j=1}^{N} (u^j)^T(t)R^{ij}(t)u^j(t) \right] dt,$$

(2.52)

where $A(t) \in \Re^{n \times n}$, $B^i(t) \in \Re^{n \times p_i}$, and $Q^i(t)$, $Q^i_f$, $R^{ij}(t)$ are matrices of appropriate dimensions for each $t \in [t_0, t_f]$, $i, j = 1, \ldots, N$. Moreover, $Q^i(t)$, $Q^i_f$ are symmetric positive-semidefinite and $R^{ii}(t)$ is positive-definite for $i = 1, \ldots, N$. It is then possible to derive a closed-form expression for the optimal strategies, as stated in the following corollary to Theorem 2.3.1.

Corollary 2.3.1 Consider the N-person LQ continuous-time dynamic game (2.51), (2.52). Suppose $Q^i(t) \ge 0$, $Q^i_f \ge 0$ and symmetric, $R^{ii}(t) > 0$ and $R^{ij}(t) \ge 0$, $j \ne i$, for all $t \in [t_0, t_f]$, $i, j = 1, \ldots, N$. Then there exists a linear feedback Nash-equilibrium solution to the differential game under closed-loop memoryless perfect-state information structure, if there exists a set of symmetric solutions $P^i(t) \ge 0$ to the $N$ coupled time-varying matrix Riccati ordinary differential equations (ODEs):

$$\dot{P}^i(t) + P^i(t)\tilde{A}(t) + \tilde{A}^T(t)P^i(t) + \sum_{j=1}^{N} P^j(t)B^j(t)(R^{jj}(t))^{-1}R^{ij}(t)(R^{jj}(t))^{-1}(B^j(t))^T P^j(t) + Q^i(t) = 0, \qquad P^i(t_f) = Q^i_f, \quad i = 1, \ldots, N,$$

(2.53)

$$\tilde{A}(t) = A(t) - \sum_{i=1}^{N} B^i(t)(R^{ii}(t))^{-1}(B^i(t))^T P^i(t).$$

(2.54)

Furthermore, the optimal strategies and costs for the players are given by

$$u^{i\star}(t) = -(R^{ii}(t))^{-1}(B^i(t))^T P^i(t) x(t), \quad i = 1, \ldots, N,$$

(2.55)

$$J^{i\star} = \frac{1}{2}x^T_0 P^i(t_0) x_0, \quad i = 1, \ldots, N.$$

(2.56)

The equations (2.53) represent a system of coupled ODEs which could be integrated backwards using numerical schemes such as the Runge-Kutta method. However, not much is known about the behavior of such a system, although some work has been done for the two-player case [2, 108, 221]. Since these results are not widely known, we shall review them briefly here.

Consider the system equation with two players given by

$$\dot{x}(t) = A(t)x(t) + B_1(t)w(t) + B_2(t)u(t), \qquad x(t_0) = x_0,$$

(2.57)

and the cost functions:

$$J_1(u, w) = x^T(t_f)Q_{1f}x(t_f) + \int_{t_0}^{t_f} \left( x^T Q_1 x + w^T R_{11} w + u^T R_{12} u \right) dt,$$

(2.58)

$$J_2(u, w) = x^T(t_f)Q_{2f}x(t_f) + \int_{t_0}^{t_f} \left( x^T Q_2 x + u^T R_{22} u + w^T R_{21} w \right) dt,$$

(2.59)

where $Q_{if} \ge 0$, $Q_i \ge 0$, $R_{ij} \ge 0$, $i \ne j$, $R_{ii} > 0$, $i, j = 1, 2$. Further, assume that the matrices $A(t) = A$, $B_1(t) = B_1$, and $B_2(t) = B_2$ are constant and of appropriate dimensions. Then, the system of coupled ODEs corresponding to the two-player nonzero-sum game is given by

$$\dot{P}_1 = -A^T P_1 - P_1 A + P_1 S_{11} P_1 + P_1 S_{22} P_2 + P_2 S_{22} P_1 - P_2 S_{12} P_2 - Q_1, \qquad P_1(t_f) = Q_{1f},$$

(2.60)

$$\dot{P}_2 = -A^T P_2 - P_2 A + P_2 S_{22} P_2 + P_2 S_{11} P_1 + P_1 S_{11} P_2 - P_1 S_{21} P_1 - Q_2, \qquad P_2(t_f) = Q_{2f},$$

(2.61)

where

$$S_{ij} = B_j (R_{jj})^{-1} R_{ij} (R_{jj})^{-1} B_j^T, \quad i, j = 1, 2.$$
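Equations (2.60)-(2.61) can be integrated backward with any standard ODE solver. A sketch using scipy, integrating in the reversed time $\tau = t_f - t$ (the problem data are illustrative assumptions; they reproduce the special case $B_1 = B_2$, $R_{ij} = I$ discussed next, for which global existence is guaranteed [221]):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Backward integration of the coupled Riccati ODEs (2.60)-(2.61).
# All problem data below are constant illustrative assumptions.
n, tf = 2, 5.0
A  = np.array([[0.0, 1.0], [0.0, 0.0]])
B  = {1: np.array([[0.0], [1.0]]), 2: np.array([[0.0], [1.0]])}
R  = {(1, 1): 1.0, (1, 2): 1.0, (2, 1): 1.0, (2, 2): 1.0}  # scalar R_ij
Q1, Q2, Q1f, Q2f = np.eye(n), 0.5 * np.eye(n), np.eye(n), np.eye(n)

def S(i, j):              # S_ij = B_j R_jj^{-1} R_ij R_jj^{-1} B_j^T
    return (R[(i, j)] / R[(j, j)]**2) * (B[j] @ B[j].T)

def rhs(tau, y):          # tau = tf - t, hence dP/dtau = -dP/dt
    P1, P2 = y[:n*n].reshape(n, n), y[n*n:].reshape(n, n)
    dP1 = (-A.T @ P1 - P1 @ A + P1 @ S(1, 1) @ P1 + P1 @ S(2, 2) @ P2
           + P2 @ S(2, 2) @ P1 - P2 @ S(1, 2) @ P2 - Q1)       # (2.60)
    dP2 = (-A.T @ P2 - P2 @ A + P2 @ S(2, 2) @ P2 + P2 @ S(1, 1) @ P1
           + P1 @ S(1, 1) @ P2 - P1 @ S(2, 1) @ P1 - Q2)       # (2.61)
    return np.concatenate([-dP1.ravel(), -dP2.ravel()])

yf  = np.concatenate([Q1f.ravel(), Q2f.ravel()])               # P_i(tf) = Q_if
sol = solve_ivp(rhs, (0.0, tf), yf, rtol=1e-8, atol=1e-10)
P1_0 = sol.y[:n*n, -1].reshape(n, n)                           # P_1(t0)
P2_0 = sol.y[n*n:, -1].reshape(n, n)                           # P_2(t0)
```

By Lemma 2.3.1 below, the computed $P_1$, $P_2$ should remain positive-semidefinite along the whole trajectory, which provides a useful numerical check.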

In [221], global existence results for the solution of the above system (2.60)-(2.61) are established for the special case $B_1 = B_2$ and $R_{11} = R_{22} = R_{12} = R_{21} = I_n$, the identity matrix. It is shown that $P_1 - P_2$ satisfies a linear Lyapunov-type ODE, while $P_1 + P_2$ satisfies a standard Riccati equation. Therefore, $P_1 + P_2$, and hence $P_1$, $P_2$, cannot blow up in any finite time, and as $t_f \to \infty$, $P_1(0) + P_2(0)$ goes to a stabilizing, positive-semidefinite solution of the corresponding algebraic Riccati equation (ARE). In the following result, we derive upper and lower bounds for the solutions $P_1$ and $P_2$ of (2.60), (2.61).

Lemma 2.3.1 Suppose $P_1$, $P_2$ are solutions of (2.60)-(2.61) on the interval $[t_0, t_f]$. Then $P_1(t) \ge 0$, $P_2(t) \ge 0$ for all $t \in [t_0, t_f]$.

Proof: Let x(t0) = x1 ≠ 0 and let x be a solution of the initial-value problem

$$\dot{x}(t) = \left[ A - S_{11}P_1(t) - S_{22}P_2(t) \right] x(t), \qquad x(t_0) = x_1.$$

(2.62)

Then for i = 1, 2

$$\begin{aligned}
\frac{d}{dt}\left[ x^T P_i(t) x \right] &= \dot{x}^T P_i(t) x + x^T \dot{P}_i(t) x + x^T P_i(t) \dot{x} \\
&= x^T \Big\{ A^T P_i(t) - P_1(t) S_{11} P_i(t) - P_2(t) S_{22} P_i(t) + P_i(t) A - P_i(t) S_{11} P_1(t) - P_i(t) S_{22} P_2(t) \\
&\qquad - A^T P_i(t) - P_i(t) A - Q_i + P_i(t) S_{ii} P_i(t) - \sum_{j \ne i} P_j(t) S_{ij} P_j(t) + \sum_{j \ne i}\big( P_i(t) S_{jj} P_j(t) + P_j(t) S_{jj} P_i(t) \big) \Big\} x \\
&= -x^T \left\{ Q_i + P_1(t) S_{i1} P_1(t) + P_2(t) S_{i2} P_2(t) \right\} x,
\end{aligned}$$

where the previous equation follows from (2.60), (2.61) and (2.62). Now integrating from t0 to tf yields

$$x_1^T P_i(t_0) x_1 = \int_{t_0}^{t_f} x^T(\tau)\tilde{Q}_i(\tau)x(\tau)\, d\tau + x^T(t_f)Q_{if}x(t_f) \ge 0,$$

(2.63)

where $\tilde{Q}_i := Q_i + P_1 S_{i1} P_1 + P_2 S_{i2} P_2$. Since $x_1$ is arbitrary and $Q_i$, $S_{i1}$, $S_{i2}$, $P_i(t_f) \ge 0$, the result follows. □

The next theorem gives sufficient conditions for no finite escape times for the solutions $P_1$, $P_2$.

Theorem 2.3.3 Let $Q \in \Re^{n \times n}$ be symmetric, and suppose that for all $t \le t_f$

$$\begin{aligned}
F(P_1(t), P_2(t), Q) :=\; & Q + (P_1(t) + P_2(t))(S_{11} + S_{22})(P_1(t) + P_2(t)) - P_1(t) S_{21} P_1(t) - P_2(t) S_{12} P_2(t) \\
& - \left( P_1(t) S_{22} P_1(t) + P_2(t) S_{11} P_2(t) \right) \ge 0
\end{aligned}$$

(2.64)

whenever $P_1(t)$, $P_2(t)$ exist. Then the solutions $P_1(t)$, $P_2(t)$ of (2.60), (2.61) exist for all $t < t_f$, with

$$0 \le P_1(t) + P_2(t) \le R_Q(t),$$

(2.65)

where $R_Q(t)$ is the unique positive-semidefinite matrix solution of the terminal-value problem:

$$\dot{R}_Q(t) = -R_Q(t)A - A^T R_Q(t) - (Q_1 + Q_2 + Q), \qquad R_Q(t_f) = Q_{1f} + Q_{2f},$$

(2.66)

which exists for all t < tf.

Proof: The first inequality in (2.65) follows from Lemma 2.3.1. Further

$$(\dot{P}_1 + \dot{P}_2) = -A^T(P_1 + P_2) - (P_1 + P_2)A - (Q_1 + Q_2 + Q) + F(P_1(t), P_2(t), Q),$$

and by the monotonicity of solutions to Riccati equations [292], it follows that the second inequality holds as long as $F(P_1(t), P_2(t), Q) \ge 0$. □

Remark 2.3.3 For a more extensive discussion of the existence of the solutions $P_1$, $P_2$ to the coupled Riccati ODEs, the reader is referred to reference [108]. However, we shall take up the subject for the infinite-horizon case, where $t_f \to \infty$, in a later chapter.

We now give an example of a pursuit-evasion problem which can be solved as a two-player nonzero-sum game [251].

Example 2.3.1 We consider a pursuit-evasion problem with the following dynamics

$$\dot{r} = v, \qquad \dot{v} = a_p - a_e,$$

where $r$ is the relative position vector of the two adversaries, and $a_p$, $a_e$ are the accelerations of the pursuer and evader respectively, which also serve as the controls of the players.

The cost functions are taken to be

$$J_p = \frac{1}{2}q_p^2 r_f^T r_f + \frac{1}{2}\int_0^T \left( \frac{1}{c_p} a_p^T a_p + \frac{1}{c_{pe}} a_e^T a_e \right) dt,$$

$$J_e = -\frac{1}{2}q_e^2 r_f^T r_f + \frac{1}{2}\int_0^T \left( \frac{1}{c_e} a_e^T a_e + \frac{1}{c_{ep}} a_p^T a_p \right) dt,$$

where $r_f = r(T)$ and the final time $T$ is fixed, while $c_p$, $c_e$, $c_{pe}$, $c_{ep}$ are weighting constants. The Nash-equilibrium controls are obtained by applying the results of Corollary 2.3.1 as

$$a_p = -c_p \begin{bmatrix} 0 & I \end{bmatrix} P_p \begin{bmatrix} r \\ v \end{bmatrix}, \qquad a_e = c_e \begin{bmatrix} 0 & I \end{bmatrix} P_e \begin{bmatrix} r \\ v \end{bmatrix},$$

where $P_p$, $P_e$ are solutions of the coupled Riccati ODEs:

$$\dot{P}_p(t) = -P_p \begin{bmatrix} 0 & I \\ 0 & 0 \end{bmatrix} - \begin{bmatrix} 0 & 0 \\ I & 0 \end{bmatrix} P_p + c_p P_p \begin{bmatrix} 0 & 0 \\ 0 & I \end{bmatrix} P_p + c_e P_p \begin{bmatrix} 0 & 0 \\ 0 & I \end{bmatrix} P_e + c_e P_e \begin{bmatrix} 0 & 0 \\ 0 & I \end{bmatrix} P_p - \frac{c_e^2}{c_{pe}} P_e \begin{bmatrix} 0 & 0 \\ 0 & I \end{bmatrix} P_e,$$

$$\dot{P}_e(t) = -P_e \begin{bmatrix} 0 & I \\ 0 & 0 \end{bmatrix} - \begin{bmatrix} 0 & 0 \\ I & 0 \end{bmatrix} P_e + c_e P_e \begin{bmatrix} 0 & 0 \\ 0 & I \end{bmatrix} P_e + c_p P_e \begin{bmatrix} 0 & 0 \\ 0 & I \end{bmatrix} P_p + c_p P_p \begin{bmatrix} 0 & 0 \\ 0 & I \end{bmatrix} P_e - \frac{c_p^2}{c_{ep}} P_p \begin{bmatrix} 0 & 0 \\ 0 & I \end{bmatrix} P_p,$$

$$P_p(T) = q_p^2 \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}, \qquad P_e(T) = -q_e^2 \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}.$$

It can be easily verified that the solutions to the above ODEs are given by

$$P_p(t) = \frac{1}{c_p}\tilde{p}(t)\begin{bmatrix} I & \tau I \\ \tau I & \tau^2 I \end{bmatrix}, \qquad P_e(t) = \frac{1}{c_e}\tilde{e}(t)\begin{bmatrix} I & \tau I \\ \tau I & \tau^2 I \end{bmatrix},$$

where $\tau = T - t$ is the time-to-go, and $\tilde{p}(t)$, $\tilde{e}(t)$ are solutions of the following ODEs:

$$\frac{d\tilde{p}}{d\tau} = -\tau^2\left( \tilde{p}^2 + 2\tilde{p}\tilde{e} - a\tilde{e}^2 \right), \qquad \tilde{p}(0) = c_p q_p^2, \quad a = \frac{c_p}{c_{pe}},$$

$$\frac{d\tilde{e}}{d\tau} = -\tau^2\left( \tilde{e}^2 + 2\tilde{p}\tilde{e} - b\tilde{p}^2 \right), \qquad \tilde{e}(0) = -c_e q_e^2, \quad b = \frac{c_e}{c_{ep}}.$$
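These two scalar ODEs are easily integrated forward in the time-to-go $\tau$; a sketch (the weighting constants and horizon are illustrative assumptions, and the equations are as reconstructed above):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Integrate the scalar ODEs for p~(tau) and e~(tau) of Example 2.3.1.
# The weights cp, ce, cpe, cep, qp, qe and the horizon T are illustrative
# assumptions.
cp, ce, cpe, cep = 1.0, 1.0, 2.0, 2.0
qp, qe, T = 1.0, 0.8, 1.0
a, b = cp / cpe, ce / cep

def rhs(tau, y):
    p, e = y
    dp = -tau**2 * (p**2 + 2.0*p*e - a*e**2)
    de = -tau**2 * (e**2 + 2.0*p*e - b*p**2)
    return [dp, de]

sol = solve_ivp(rhs, (0.0, T), [cp*qp**2, -ce*qe**2], rtol=1e-9)
p0, e0 = sol.y[:, -1]      # values at tau = T, i.e., at initial time t = 0
print("p~(t=0) =", p0, "  e~(t=0) =", e0)
```

The feedback gains at any $t$ then follow from the expressions for $P_p$, $P_e$ above.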

Again, we can specialize the result of Theorem 2.3.2 to the LQ case. We consider for this case the system equation given by (2.57), and cost functionals represented as

$$J_1(u, w) = -J_2(u, w) := J = \frac{1}{2}x^T(t_f)Q_f x(t_f) + \frac{1}{2}\int_{t_0}^{t_f} \left[ x^T(t)Qx(t) + u^T(t)R_u u(t) - w^T(t)R_w w(t) \right] dt,$$

(2.67)

where $u$ is the minimizing player and $w$ is the maximizing player, and without any loss of generality we can assume $R_w = I$. Then, if we assume $V(x(t_f), t_f) = \phi(x(t_f), t_f) = \frac{1}{2}x^T Q_f x$ is quadratic, and also

$$V(x, t) = \frac{1}{2}x^T(t)P(t)x(t),$$

(2.68)

where $P(t)$ is symmetric and $P(t_f) = Q_f$, then substituting these in the HJIE (2.50), we get the following matrix Riccati ODE:

$$\dot{P}(t) + P(t)A(t) + A^T(t)P(t) + P(t)\left[ B_1(t)(B_1(t))^T - B_2(t)R_u^{-1}(B_2(t))^T \right]P(t) + Q = 0, \qquad P(t_f) = Q_f.$$

(2.69)

Furthermore, we have the following corollary to Theorem 2.3.2.

Corollary 2.3.2 Consider the two-player zero-sum continuous-time LQ dynamic game described by equations (2.57) and (2.67). Suppose there exists a bounded symmetric solution to the matrix ODE (2.69) for all $t \in [t_0, t_f]$. Then there exists a unique feedback saddle-point solution to the dynamic game under closed-loop memoryless information structure. Further, the unique strategies for the players are given by

$$u^\star(t) = -R_u^{-1}(B_2(t))^T P(t)x(t),$$

(2.70)

$$w^\star(t) = (B_1(t))^T P(t)x(t),$$

(2.71)

and the optimal value of the game is given by

$$J^\star(u^\star, w^\star) = \frac{1}{2}x_0^T P(t_0)x_0.$$

(2.72)

Remark 2.3.4 In Corollary 2.3.1, the requirement that $Q^i_f \ge 0$ and $Q^i(t) \ge 0$, $i = 1, \ldots, N$, has been included to guarantee the existence of the Nash-equilibrium solution; whereas in Corollary 2.3.2 the existence of a bounded solution to the matrix Riccati ODE (2.69) is used instead. It is, however, possible to remove this assumption if, in addition to the requirement that $Q_f \ge 0$ and $Q \ge 0$, we also have

$$\left[ B_1(t)(B_1(t))^T - B_2(t)R_u^{-1}(B_2(t))^T \right] \ge 0 \quad \forall t \in [t_0, t_f].$$

Then the matrix Riccati equation (2.69) admits a unique positive-semidefinite solution [59].
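A sketch for integrating (2.69) backward and forming the saddle-point gains (2.70)-(2.71) (the data are illustrative assumptions, chosen so that (2.69) has a bounded solution on the horizon):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Backward integration of the zero-sum Riccati ODE (2.69) and the
# saddle-point gains (2.70)-(2.71). A, B1, B2, Q, Ru, Qf are constant
# illustrative assumptions.
n, tf = 2, 2.0
A  = np.array([[0.0, 1.0], [-1.0, -0.5]])
B1 = np.array([[0.0], [0.5]])          # maximizing player w
B2 = np.array([[0.0], [1.0]])          # minimizing player u
Q, Qf, Ru = np.eye(n), np.zeros((n, n)), np.eye(1)
N = B1 @ B1.T - B2 @ np.linalg.solve(Ru, B2.T)   # bracket in (2.69)

def rhs(tau, y):                        # tau = tf - t
    P = y.reshape(n, n)
    dP = -(P @ A + A.T @ P + P @ N @ P + Q)      # (2.69): dP/dt
    return (-dP).ravel()                         # dP/dtau = -dP/dt

sol = solve_ivp(rhs, (0.0, tf), Qf.ravel(), rtol=1e-8)
P0 = sol.y[:, -1].reshape(n, n)                  # P(t0)
Ku = -np.linalg.solve(Ru, B2.T @ P0)             # u*(t) = Ku x, cf. (2.70)
Kw = B1.T @ P0                                   # w*(t) = Kw x, cf. (2.71)
print("P(t0) =\n", P0)
```

The optimal value (2.72) is then $\frac{1}{2}x_0^T P(t_0)x_0$ for any initial state $x_0$.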

2.4    Notes and Bibliography

The first part of the chapter, on the dynamic programming principle, is based on the books [47, 64, 164, 175], while the material on dynamic games is based for the most part on the books by Basar and Bernhard [57] and by Basar and Olsder [59]. The remaining material is based on the papers [250, 251]. On the other hand, the discussion on the existence, definiteness and boundedness of the solutions to the coupled matrix Riccati ODEs is based on Reference [108]. Moreover, results on the stochastic case can be found in [2].

We have not discussed the minimum principle of Pontryagin [47], because all the problems discussed here are unconstrained. For such problems, the minimum principle provides a set of necessary conditions for optimality; details of these conditions can be found in [59]. Furthermore, a treatment of constrained problems using the minimum principle can be found in [73].
