The theory of games was developed around 1930 by Von Neumann and Morgenstern [136] and differential games approximately around 1950, the same time that optimal control theory was being developed. The theory immediately found application in warfare and economics; for instance, certain types of battles, airplane dog-fighting, a torpedo pursuing a ship, a missile intercepting an aircraft, a gunner guarding a target against an invader [73], are typical models of differential games. Similarly, financial planning between competing sectors of an economy, or competing products in a manufacturing system, meeting demand with adequate supply in a market system, also fall into the realm of differential games.
Game theory involves multi-person decision-making. It comprises a task, an objective, and players or decision makers. The task may range from shooting down a plane, to steering a ship, to controlling the amount of money in an economy. The objective, which measures the performance of the players, may be how fast, how cheaply, or how closely the task is accomplished, or a combination of these.
When the game involves more than one player, the players may play cooperatively or noncooperatively. It is noncooperative if each player involved pursues his or her own interest, which partly conflicts with the interests of the others; and it is dynamic if the order in which the decisions are made is important. When the game involves only one player, the problem reduces to an optimal control problem, which can be solved by Pontryagin's "maximum principle" or Bellman's "dynamic programming principle" [73]. Indeed, the theories of differential games, optimal control, the calculus of variations and dynamic programming are all approaches to solving variational problems, and have now merged into the theory of "modern control."
There are two types of differential games that we shall be concerned with in this chapter; namely, nonzero-sum and zero-sum noncooperative dynamic games. We shall not discuss static games in this book. Furthermore, we shall only be concerned with games in which the players have perfect knowledge of the current and past states of the system, and their individual decisions are based on this. This is also known as closed-loop perfect information structure or pattern. On the other hand, when the decision is only based on the current value of the state of the system, we shall add the adjective “memoryless.”
We begin with the fundamental dynamic programming principle which forms the basis for the solution to almost all unconstrained differential game problems.
2.1 Dynamic Programming Principle
The dynamic programming principle was developed by Bellman [64] as the discrete-time equivalent of Hamilton-Jacobi theory. In fact, the term dynamic programming has become synonymous with Hamilton-Jacobi theory. It also provides a sufficient condition for optimality of an optimal decision process. To derive this condition, we consider a discrete-time dynamic system defined on a manifold X
with local coordinates (x1,…, xn):
x_{k+1} = f_k(x_k, u_k),   x_{k_0} = x_0,   k ∈ {k_0, …} ⊂ Z,    (2.1)
where u_k ∈ U is the control input.
Assumption 2.1.1 For the function f_k(x, u) in (2.1), for each k ∈ Z, there exists a constant C_{1k} (depending on k) such that, for any fixed u,

‖f_k(x_1, u) − f_k(x_2, u)‖ ≤ C_{1k}‖x_1 − x_2‖   ∀x_1, x_2 ∈ X, ∀u ∈ U, k ∈ Z.
To formulate an optimal decision process (or an optimal control problem), we introduce the following cost functional:
J(x_0, k_0; u_{[k_0,K]}) = Σ_{k=k_0}^{K} L_k(x_{k+1}, x_k, u_k) → min,    (2.2)
where L_k : X × X × U → ℜ, k = k_0, …, K, are the stage costs. Define the value-function
V(x, k) = inf_{u_{[k,K]} ∈ U} { Σ_{j=k}^{K} L_j(x_{j+1}, x_j, u_j) },    (2.3)
and satisfies the boundary condition V (x, K + 1) = 0, where x = xk and u[k,K] ≜ {uk,…, uK}.
Remark 2.1.1 For brevity, we shall henceforth use the notation [k_1, k_2] to mean the subset {k_1, …, k_2} ⊂ Z where no confusion will arise.
We then have the following theorem.
Theorem 2.1.1 Consider the nonlinear discrete-time system (2.1) and the optimal control problem of minimizing the cost functional (2.2) subject to the dynamics of the system. Then, there exists an optimal control u⋆_k, k ∈ [k_0, K], if there exists a function V(·, k) satisfying the recursive (dynamic programming) equation:
V(x, k) = inf_{u_k} { L_k(f_k(x, u_k), x, u_k) + V(f_k(x, u_k), k+1) };   V(x, K+1) = 0.    (2.4)
Furthermore,
J⋆(x_0, k_0; u⋆_{[k_0,K]}) = V(x_0, k_0)
is the optimal cost of the decision process.
Proof: The proof of the above theorem is based on the principle of optimality [164] and can be found in [64, 164]. □
Equation (2.4) is the discrete dynamic programming principle and governs the solution of most of the discrete-time problems that we shall be dealing with in this book.
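As an illustration, the recursion (2.4) can be implemented directly when the state and control sets are finite; the dynamics, stage cost and horizon below are hypothetical stand-ins for f_k and L_k, chosen only to make the sketch self-contained.

```python
# Tabular backward dynamic programming implementing the recursion (2.4).
# The dynamics f, stage cost L, state space X and control set U are
# hypothetical; V(x, K+1) = 0 is the boundary condition of Theorem 2.1.1.
X = range(-3, 4)            # finite state space
U = (-1, 0, 1)              # finite control set
K0, K = 0, 4                # decision stages k0, ..., K

def f(x, u):                # x_{k+1} = f_k(x_k, u_k), clipped to X
    return max(-3, min(3, x + u))

def L(x_next, x, u):        # stage cost L_k(x_{k+1}, x_k, u_k)
    return x_next ** 2 + u ** 2

V = {x: 0.0 for x in X}     # V(., K+1) = 0
policy = {}
for k in range(K, K0 - 1, -1):
    Vk = {}
    for x in X:
        cost, u_star = min(
            (L(f(x, u), x, u) + V[f(x, u)], u) for u in U)
        Vk[x] = cost        # V(x, k) of (2.4)
        policy[(x, k)] = u_star
    V = Vk
# V[x0] now equals the optimal cost J* = V(x0, k0) of the decision process.
```

By Theorem 2.1.1, V[x0] coincides with the minimum of (2.2) over all control sequences, which can be confirmed by enumeration for a problem this small.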
Next, we discuss the continuous-time equivalent of the dynamic programming equation, which is a first-order nonlinear partial differential equation (PDE) known as the Hamilton-Jacobi-Bellman equation (HJBE). In this regard, consider the continuous-time nonlinear dynamic system defined on an open subset X ⊆ ℜⁿ:
ẋ(t) = f(x(t), u(t), t),   x(t_0) = x_0,    (2.5)
where f : X × U × ℜ → ℜⁿ and u(t) ∈ U is the control input.
Assumption 2.1.2 The function f(·, ·, ·) in (2.5) is piecewise continuous with respect to t, and for each t ∈ [0, ∞) there exist constants C_{1t}, C_{2t} such that

‖f(x_1, u, t) − f(x_2, u, t)‖ ≤ C_{1t}‖x_1 − x_2‖,   ‖f(x, u, t)‖ ≤ C_{2t}(1 + ‖x‖)   ∀x, x_1, x_2 ∈ X, ∀u ∈ U.
For the purpose of optimally controlling the system, we associate to it the following cost functional:
J(x_0, t_0; u_{[t_0,T]}) = ∫_{t_0}^{T} L(x, u, t) dt → min,    (2.6)
for some real C⁰ function L : X × U × ℜ → ℜ. Define the value-function
V(x, t) = inf_{u_{[t,T]}} { ∫_t^T L(x(s), u(s), s) ds }    (2.7)
and satisfying the boundary condition V (x, T ) = 0. Then we have the following theorem.
Theorem 2.1.2 Consider the nonlinear system (2.5) and the optimal control problem of minimizing the cost functional (2.6) subject to the dynamics of the system and initial condition. Suppose there exists a C¹(X × [t_0, T]) function V satisfying the HJBE:
−V_t(x, t) = min_{u ∈ U} { V_x(x, t) f(x, u, t) + L(x, u, t) },   V(x, T) = 0.    (2.8)
Then, there exists an optimal solution u⋆ to the problem. Moreover, the optimal cost of the policy is given by
J⋆(x_0, t_0; u⋆_{[t_0,T]}) = V(x_0, t_0).
Proof: The proof of the theorem can also be found in [47, 175, 164]. □
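For a concrete instance, consider the scalar LQ problem ẋ = u with L = x² + u² (hypothetical data). With the quadratic ansatz V(x, t) = p(t)x², the HJBE (2.8) reduces to a scalar Riccati ODE that can be integrated backward from the boundary condition p(T) = 0:

```python
# The HJBE (2.8) for xdot = u, L = x^2 + u^2 and V(x,t) = p(t) x^2 gives
# u* = -p x and the scalar Riccati ODE  p'(t) = p^2 - 1,  p(T) = 0,
# integrated here backward in time by the explicit Euler method.
T, n = 1.0, 100000
dt = T / n

p = 0.0                        # boundary condition p(T) = 0
for _ in range(n):
    pdot = p * p - 1.0         # p' = p^2 - 1
    p -= dt * pdot             # step backward from T toward t0 = 0
# The exact solution is p(t) = tanh(T - t), so p(0) ~ tanh(1) ~ 0.7616,
# and the optimal feedback is u*(x, t) = -p(t) x.
```

This is only a sketch of how the HJBE collapses to an ODE under a quadratic ansatz; the general nonlinear PDE (2.8) of course has no such reduction.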
The above development has considered single player decision processes or optimal control problems. In the next section, we discuss dynamic games, which involve multiple players. Again, we shall begin with the discrete-time case.
2.2 Discrete-Time Nonzero-Sum Dynamic Games
Nonzero-sum dynamic games were first introduced by Isaacs [136, 137] around the years 1954-1956 within the framework of two-person zero-sum games. They were later popularized in the works of Starr and Ho [250, 251] and Friedman [107]. Stochastic games were developed later [59].
We consider a discrete-time deterministic N-person dynamic game of duration K − k0 described by the state equation (which we referred to as the task):
x_{k+1} = f_k(x_k, u^1_k, …, u^N_k),   x(k_0) = x_0,   k ∈ {k_0, …, K},    (2.9)
where x_k ∈ X is the state of the system. With each player i we associate the cost functional
J^i(u^1, …, u^N) = Σ_{k=k_0}^{K} L^i_k(x_{k+1}, x_k, u^1_k, …, u^N_k),   i = 1, …, N,    (2.10)
and where u^i ∈ Γ^i, i = 1, …, N, with Γ^i the strategy space of player i.
Definition 2.2.1 An N-tuple of strategies {u^{i⋆} ∈ Γ^i, i = 1, …, N} is said to constitute a Nash-equilibrium solution of the game if
J^{i⋆} := J^i(u^{1⋆}, …, u^{N⋆}) ≤ J^i(u^{1⋆}, …, u^{(i−1)⋆}, u^i, u^{(i+1)⋆}, …, u^{N⋆}),   i = 1, …, N,    (2.11)
where u^{i⋆} = {u^{i⋆}_k, k = k_0, …, K} and u^i = {u^i_k, k = k_0, …, K}.
In this section, we discuss necessary and sufficient conditions for the existence of Nash-equilibrium solutions of such a game under closed-loop (or feedback) no-memory perfect-state-information structure, i.e., when the decision is based on the current state information, which is perfectly known. The following theorem, based on the dynamic programming principle, gives necessary and sufficient conditions for a set of strategies to be a Nash-equilibrium solution [59].
Theorem 2.2.1 For the N-person discrete-time nonzero-sum game (2.9)-(2.11), an N-tuple of strategies {u^{i⋆} ∈ Γ^i, i = 1, …, N} provides a feedback Nash-equilibrium solution if, and only if, there exist N × (K − k_0) functions V^i(·, k), i = 1, …, N, k = k_0, …, K, satisfying the recursive equations:

V^i(x, k) = min_{u^i_k} { L^i_k(f^{i⋆}_k(x, u^i_k), x, u^{1⋆}_k, …, u^{(i−1)⋆}_k, u^i_k, u^{(i+1)⋆}_k, …, u^{N⋆}_k) + V^i(f^{i⋆}_k(x, u^i_k), k+1) };   V^i(x, K+1) = 0,    (2.12)

where f^{i⋆}_k(x, u^i_k) := f_k(x, u^{1⋆}_k, …, u^{(i−1)⋆}_k, u^i_k, u^{(i+1)⋆}_k, …, u^{N⋆}_k).
Notice that, in the above equations, there are N × (K − k_0) functions, since at each stage of the decision process a different function is required for each of the players. However, it is often possible to satisfy the above recursive equations with a single function for each player, valid at every stage. This is particularly true in the linear case.
More specifically, let us consider the case of a two-player nonzero-sum game, the type to which most of the problems in this book belong, described by the state equation:
x_{k+1} = f_k(x_k, w_k, u_k),   x(k_0) = x_0,   k ∈ {k_0, …, K},    (2.13)
where w_k ∈ W and u_k ∈ U are the decision variables of the two players, with the cost functionals:
J^1(w, u) = Σ_{k=k_0}^{K} L^1_k(x_{k+1}, x_k, w_k, u_k),    (2.14)

J^2(w, u) = Σ_{k=k_0}^{K} L^2_k(x_{k+1}, x_k, w_k, u_k),    (2.15)
where w = w_{[k_0,K]}, u = u_{[k_0,K]}. Then, a pair of strategies w⋆ := {w⋆_k, k = k_0, …, K}, u⋆ := {u⋆_k, k = k_0, …, K} will constitute a Nash-equilibrium solution for the game if
J^1(w⋆, u⋆) ≤ J^1(w⋆, u)   ∀u ∈ U,    (2.16)

J^2(w⋆, u⋆) ≤ J^2(w, u⋆)   ∀w ∈ W.    (2.17)
Furthermore, the conditions of Theorem 2.2.1 reduce to:
V^1(x, k) = min_{u_k ∈ U} { L^1_k(f_k(x, w⋆_k, u_k), x, w⋆_k, u_k) + V^1(f_k(x, w⋆_k, u_k), k+1) };   V^1(x, K+1) = 0,   k = k_0, …, K,    (2.18)

V^2(x, k) = min_{w_k ∈ W} { L^2_k(f_k(x, w_k, u⋆_k), x, w_k, u⋆_k) + V^2(f_k(x, w_k, u⋆_k), k+1) };   V^2(x, K+1) = 0,   k = k_0, …, K.    (2.19)
Remark 2.2.1 Equations (2.18), (2.19) represent a pair of coupled HJ-equations. They are difficult to solve, and not much work has been done to study their behavior. However, these equations will play an important role in the derivation of the solution to the discrete-time mixed H2/H∞-control problem.
Another special class of the nonzero-sum games is the two-person zero-sum game, in which the objective functionals (2.14), (2.15) are such that
J^1(w, u) = −J^2(w, u) = J(w, u) := Σ_{k=k_0}^{K} L_k(x_{k+1}, x_k, w_k, u_k).    (2.20)
If this is the case, then J^1(w⋆, u⋆) + J^2(w⋆, u⋆) = 0, and u is the minimizer while w is the maximizer of J. Thus, we refer to u as the minimizing player and w as the maximizing player. Furthermore, the "Nash-equilibrium" conditions (2.16), (2.17) reduce to the saddle-point conditions:
J(w,u⋆)≤J(w⋆,u⋆)≤J(w⋆,u), ∀w∈W, u∈U, |
(2.21) |
where the pair (w⋆, u⋆) is the optimal strategy pair and is called a "saddle-point." Consequently, the set of necessary and sufficient conditions (2.18), (2.19) reduces to the single condition given in the following theorem [57, 59].
Theorem 2.2.2 For the two-person discrete-time zero-sum game defined by (2.13), (2.20), a pair of strategies (w⋆, u⋆) provides a feedback saddle-point solution if, and only if, there exists a set of K − k_0 functions V(·, k) : X → ℜ satisfying the recursive equation:

V(x, k) = min_{u_k ∈ U} max_{w_k ∈ W} { L_k(f_k(x, w_k, u_k), x, w_k, u_k) + V(f_k(x, w_k, u_k), k+1) }
        = max_{w_k ∈ W} min_{u_k ∈ U} { L_k(f_k(x, w_k, u_k), x, w_k, u_k) + V(f_k(x, w_k, u_k), k+1) };   V(x, K+1) = 0,    (2.22)
where x = x_k. Equation (2.22) is known as the Isaacs equation, being the discrete-time version of the equation developed by Isaacs [136, 59]. Furthermore, the interchangeability of the "max" and "min" operations above is also known as the "Isaacs condition." The above equation will play a significant role in the derivation of the solution to the discrete-time H∞-control problem.
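At a fixed state and stage, the bracketed expression in the Isaacs recursion is just a finite table of cost-to-go values when U and W are finite sets, and the Isaacs condition amounts to this table having a pure saddle point. A toy check, with hypothetical numbers:

```python
# One stage of the Isaacs recursion reduced to a matrix game: for fixed x
# and k the bracketed quantity is a finite table G[u][w] of cost-to-go
# values (hypothetical numbers). The Isaacs condition asks that min-max
# and max-min coincide, i.e. that G has a (pure) saddle point.
G = [[3.0, 1.0],
     [4.0, 2.0]]   # rows: choices of minimizer u; columns: maximizer w

minmax = min(max(row) for row in G)                             # min_u max_w
maxmin = max(min(G[u][w] for u in range(2)) for w in range(2))  # max_w min_u
isaacs_condition = (minmax == maxmin)   # here both equal 3.0, saddle at (0, 0)
```

When the condition fails (minmax > maxmin), no pure saddle point exists at that stage and the value of the game is not defined without mixed strategies.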
2.2.1 Linear-Quadratic Discrete-Time Dynamic Games
We now specialize the above results on N-person nonzero-sum dynamic games to the linear-quadratic (LQ) case, in which the optimal decisions can be expressed explicitly. For this purpose, the dynamics of the system are described by a linear state equation:
x_{k+1} = f_k(x_k, u^1_k, …, u^N_k) = A_k x_k + Σ_{i=1}^{N} B^i_k u^i_k,   k ∈ [k_0, K],    (2.23)
where A_k ∈ ℜ^{n×n}, B^i_k ∈ ℜ^{n×p_i}, and the cost functionals (2.10) take the form:
J^i(u^1_{[k_0,K]}, …, u^N_{[k_0,K]}) = Σ_{k=k_0}^{K} L^i_k(x_{k+1}, u^1_k, …, u^N_k),    (2.24)

L^i_k(x_{k+1}, u^1_k, …, u^N_k) = ½[xᵀ_{k+1} Q^i_{k+1} x_{k+1} + Σ_{j=1}^{N} (u^j_k)ᵀ R^{ij}_k u^j_k],    (2.25)
where Q^i_{k+1}, R^{ij}_k are weighting matrices of appropriate dimensions.
The following corollary gives necessary and sufficient conditions for the existence of Nash-equilibrium for the LQ-game [59].
Corollary 2.2.1 For the N-player LQ game described by the equations (2.23)-(2.25), suppose Q^i_{k+1} ≥ 0 and R^{ii}_k > 0 for each i and k. Then, a feedback Nash-equilibrium solution exists if there exist matrix sequences P^i_k, i = 1, …, N, satisfying the coupled equations:
[R^{ii}_k + (B^i_k)ᵀ Z^i_{k+1} B^i_k] P^i_k + (B^i_k)ᵀ Z^i_{k+1} Σ_{j=1, j≠i}^{N} B^j_k P^j_k = (B^i_k)ᵀ Z^i_{k+1} A_k,   i = 1, …, N,    (2.26)
where
Z^i_k = Fᵀ_k Z^i_{k+1} F_k + Σ_{j=1}^{N} (P^j_k)ᵀ R^{ij}_k P^j_k + Q^i_k,   Z^i_{K+1} = Q^i_{K+1},   i = 1, …, N,    (2.27)

where

F_k := A_k − Σ_{j=1}^{N} B^j_k P^j_k.    (2.28)
Furthermore, the equilibrium strategies are given by
u^{i⋆}_k = −P^i_k x_k,   i = 1, …, N,  k ∈ [k_0, K].    (2.29)
Remark 2.2.2 The positive-semidefinite conditions on Q^i_{k+1}, together with R^{ii}_k > 0, ensure that each player's cost functional is strictly convex in his own decision variable, so that the minima in the dynamic programming recursions are attained.
Again, the results of the above corollary can easily be specialized to the case of two players. But more importantly, we shall specialize the results of Theorem 2.2.2 to the case of the two-player LQ zero-sum discrete-time dynamic game described by the state equation:
x_{k+1} = A_k x_k + B^1_k w_k + B^2_k u_k,   x(k_0) = x_0,   k ∈ [k_0, K],    (2.30)
and the objective functionals
J^1(u, w) = −J^2(u, w) = J(u, w) = Σ_{k=k_0}^{K} L_k(x_{k+1}, x_k, u, w),    (2.31)

L_k(x_{k+1}, x_k, u, w) = ½[xᵀ_{k+1} Q_{k+1} x_{k+1} + uᵀ_k u_k − wᵀ_k w_k].    (2.32)
We then have the following corollary [57, 59].
Corollary 2.2.2 For the two-player LQ zero-sum dynamic game described by (2.30)-(2.31), suppose Qk+1 ≥ 0, k = k0,…, K. Then there exists a unique feedback saddle-point solution if, and only if,
I + (B^2_k)ᵀ M_{k+1} B^2_k > 0,   ∀k = k_0, …, K,    (2.33)

I − (B^1_k)ᵀ M_{k+1} B^1_k > 0,   ∀k = k_0, …, K,    (2.34)
where
S_k = [I + (B^2_k (B^2_k)ᵀ − B^1_k (B^1_k)ᵀ) M_{k+1}],    (2.35)

M_k = Q_k + Aᵀ_k M_{k+1} S_k^{-1} A_k,   M_{K+1} = Q_{K+1}.    (2.36)
Moreover, the unique equilibrium strategies are given by
(2.37) |
(2.38) |
and the corresponding unique state trajectory is given by the recursive equation:
(2.39) |
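A scalar (n = 1) sketch of the backward recursion (2.36) together with the existence checks (2.33)-(2.34) is straightforward; the sign convention below has the minimizer's input channel entering S_k positively, as in Basar and Olsder [59], and all numerical values are hypothetical.

```python
# Scalar (n = 1) backward recursion for the zero-sum LQ game (2.30)-(2.32):
# M_k = Q_k + A_k^T M_{k+1} S_k^{-1} A_k as in (2.36), with the minimizer's
# channel b2 entering S_k with a plus sign and the maximizer's b1 with a
# minus sign, plus the saddle-point existence checks (2.33)-(2.34).
# All numerical values are hypothetical.
a, b1, b2 = 0.5, 0.2, 1.0        # A_k, B1_k (maximizer w), B2_k (minimizer u)
q, q_term = 0.1, 0.1             # Q_k and Q_{K+1}
steps = 10                       # number of stages K - k0 + 1

m = q_term                       # M_{K+1} = Q_{K+1}
conditions_hold = True
for _ in range(steps):
    conditions_hold = (conditions_hold and
                       1 + b2 * b2 * m > 0 and    # (2.33)
                       1 - b1 * b1 * m > 0)       # (2.34)
    s = 1 + (b2 * b2 - b1 * b1) * m               # scalar S_k
    m = q + a * m * a / s                         # scalar (2.36)
```

If either check fails at some stage, no feedback saddle point exists for that horizon; here the recursion settles quickly to a small fixed point.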
In the next section we consider the continuous-time counterparts of the discrete-time dynamic games.
2.3 Continuous-Time Nonzero-Sum Dynamic Games
In this section, we discuss the continuous-time counterparts of the results that we presented in the previous section for discrete-time dynamic games. Indeed the name “differential game” stems from the continuous-time models of dynamic games.
We consider at the outset an N-player nonzero-sum differential game defined by the state equation:
ẋ(t) = f(x, u^1, …, u^N, t),   x(t_0) = x_0,    (2.40)
where x(t) ∈ X is the state of the system and u^i(t) ∈ U^i, i = 1, …, N, are the control inputs of the players.
To each player i = 1,…, N is associated an objective functional or pay-off Ji : Γ1 ×. .. × ΓN → ℜ, where Γi, i = 1,…, N is the strategy space of the player, which he tries to minimize, and is defined by
J^i(u^1, …, u^N) = ϕ^i(x(t_f), t_f) + ∫_{t_0}^{t_f} L^i(x, u^1, …, u^N, t) dt,   i = 1, …, N,    (2.41)
where L^i : X × U^1 × ⋯ × U^N × ℜ_+ → ℜ and ϕ^i : X × ℜ_+ → ℜ, i = 1, …, N, are given real-valued functions.
The players play noncooperatively, but each player knows the current value of the state vector as well as all system parameters and cost functions. However, he does not know the strategies of the rival players. Our aim is to derive sufficient conditions for the existence of Nash-equilibrium solutions to the above game under closed-loop memoryless perfect information structure. For this purpose, we redefine Nash-equilibrium for continuous-time dynamic games as follows.
Definition 2.3.1 An N-tuple of strategies {u^{i⋆} ∈ Γ^i, i = 1, …, N} is said to constitute a Nash-equilibrium solution of the game if
J^{i⋆} := J^i(u^{1⋆}, …, u^{N⋆}) ≤ J^i(u^{1⋆}, …, u^{(i−1)⋆}, u^i, u^{(i+1)⋆}, …, u^{N⋆})   ∀i = 1, …, N,    (2.42)
where u^{i⋆} denotes the Nash-equilibrium strategy of the i-th player.
To derive the optimality conditions, we consider an N-tuple of piecewise continuous
strategies u := {u1,…, uN } and the value-function for the i-th player
V^i(x, t) = inf_{u ∈ U} { ϕ^i(x(t_f), t_f) + ∫_t^{t_f} L^i(x, u, τ) dτ } = ϕ^i(x(t_f), t_f) + ∫_t^{t_f} L^i(x, u⋆, τ) dτ.    (2.43)
Then, by applying Definition 2.3.1 and the dynamic-programming principle, it can be shown that the value-functions V i, i = 1,…, N are solutions of the following Hamilton-Jacobi PDEs:
∂V^i/∂t = −inf_{u^i ∈ U^i} H^i(x, t, u^{1⋆}, …, u^{(i−1)⋆}, u^i, u^{(i+1)⋆}, …, u^{N⋆}, ∂V^i/∂x),    (2.44)

where

H^i(x, t, u, λ_i) = L^i(x, u, t) + λᵀ_i f(x, u, t),   i = 1, …, N.    (2.45)
The strategies u⋆ = (u^{1⋆}, …, u^{N⋆}) are obtained from the minimizations in (2.44). When the resulting equations
∂V^i/∂t (x, t) = −H^i(x, t, u⋆(x, t, ∂V^1/∂x, …, ∂V^N/∂x), ∂V^i/∂x),    (2.46)

V^i(x(t_f), t_f) = ϕ^i(x(t_f), t_f),   i = 1, …, N,    (2.47)
are integrated backward from all the points on the terminal surface, feasible trajectories are obtained. Thus, for a normal game, the following theorem provides sufficient conditions for the existence of Nash-equilibrium solution for the N-player game under closed-loop memoryless perfect-state information pattern [57, 250].
Theorem 2.3.1 Consider the N-player nonzero-sum continuous-time dynamic game (2.40)-(2.41) of fixed duration [t_0, t_f], under closed-loop memoryless perfect state information pattern. An N-tuple u⋆ = (u^{1⋆}, …, u^{N⋆}) provides a Nash-equilibrium solution if there exist C¹ functions V^i : X × ℜ → ℜ, i = 1, …, N, satisfying the coupled PDEs (2.46), (2.47).
Remark 2.3.1 The above result can easily be specialized to the two-person nonzero-sum continuous-time dynamic game. This case will play a significant role in the derivation of the solution to the mixed H2/H∞-control problem.
Next, we specialize the above result to the case of the two-person zero-sum continuous-time dynamic game. For this case, we have the dynamic equation and the objective functional described by
ẋ(t) = f(x, u, w, t),   x(t_0) = x_0,    (2.48)
J^1(u, w) = −J^2(u, w) := J(u, w) = ϕ(x(t_f), t_f) + ∫_{t_0}^{t_f} L(x, u, w, t) dt,    (2.49)
where u ∈ U and w ∈ W are the strategies of the minimizing and the maximizing player respectively. We then have the following theorem.
Theorem 2.3.2 Consider the two-person zero-sum continuous-time differential game (2.48)-(2.49) of fixed duration [t0, tf], and under closed-loop memoryless information structure. A pair of strategies (u⋆, w⋆) provides a Nash-feedback saddle-point solution to the game, if there exists a C1-function V : X × ℜ → ℜ of both of its arguments, satisfying the PDE:
−∂V(x,t)/∂t = inf_{u ∈ U} sup_{w ∈ W} { (∂V(x,t)/∂x) f(x, u, w, t) + L(x, u, w, t) }

= sup_{w ∈ W} inf_{u ∈ U} { (∂V(x,t)/∂x) f(x, u, w, t) + L(x, u, w, t) }

= (∂V(x,t)/∂x) f(x, u⋆, w⋆, t) + L(x, u⋆, w⋆, t),   V(x(t_f), t_f) = ϕ(x(t_f), t_f),    (2.50)
known as Isaacs equation [136] or Hamilton-Jacobi-Isaacs equation (HJIE).
Remark 2.3.2 The HJIE (2.50) will play a significant role in the derivation of the solution to the nonlinear H∞-control problem.
Next, we similarly specialize the above results to linear systems with quadratic objective functions, also known as the “linear-quadratic (LQ)” continuous-time dynamic game.
2.3.1 Linear-Quadratic Continuous-Time Dynamic Games
In this section, we specialize the results of the previous section to linear-quadratic (LQ) continuous-time dynamic games. For the linear case, the system (2.40) and the cost functionals (2.41) reduce to the following:
ẋ(t) = f(x, u^1, …, u^N, t) = A(t)x(t) + Σ_{i=1}^{N} B^i(t)u^i(t),   x(t_0) = x_0,    (2.51)
J^i(u^1, …, u^N) = ½xᵀ(t_f)Q^i_f x(t_f) + ½∫_{t_0}^{t_f} [xᵀ(t)Q^i(t)x(t) + Σ_{j=1}^{N} (u^j)ᵀ(t)R^{ij}(t)u^j(t)] dt,    (2.52)
where A(t) ∈ ℜ^{n×n}, B^i(t) ∈ ℜ^{n×p_i}, and Q^i(t), Q^i_f, R^{ij}(t) are matrices of appropriate dimensions for each t ∈ [t_0, t_f], i, j = 1, …, N. Moreover, Q^i(t), Q^i_f are symmetric positive-semidefinite and R^{ii}(t) is positive-definite for i = 1, …, N. It is then possible to derive a closed-form expression for the optimal strategies, as stated in the following corollary to Theorem 2.3.1.
Corollary 2.3.1 Consider the N-person LQ continuous-time dynamic game (2.51), (2.52). Suppose Q^i(t) ≥ 0, Q^i_f ≥ 0 and symmetric, and R^{ii}(t) > 0 for all t ∈ [t_0, t_f], i = 1, …, N. Then there exists a linear feedback Nash-equilibrium solution to the differential game under closed-loop memoryless perfect state information structure, if there exists a set of symmetric solutions P^i(t) ≥ 0 to the N-coupled time-varying matrix Riccati ordinary differential equations (ODEs):

Ṗ^i(t) = −P^i(t)Ã(t) − Ãᵀ(t)P^i(t) − Σ_{j=1}^{N} P^j(t)B^j(t)(R^{jj}(t))^{-1}R^{ij}(t)(R^{jj}(t))^{-1}(B^j(t))ᵀP^j(t) − Q^i(t),   P^i(t_f) = Q^i_f,   i = 1, …, N,    (2.53)

where
Ã(t) = A(t) − Σ_{i=1}^{N} B^i(t)(R^{ii}(t))^{-1}(B^i(t))ᵀP^i(t).    (2.54)
Furthermore, the optimal strategies and costs for the players are given by
u^{i⋆}(t) = −(R^{ii}(t))^{-1}(B^i(t))ᵀP^i(t)x(t),   i = 1, …, N,    (2.55)
J^{i⋆} = ½xᵀ_0 P^i(t_0)x_0,   i = 1, …, N.    (2.56)
The equations (2.53) represent a system of coupled ODEs which could be integrated backwards using numerical schemes such as the Runge-Kutta method. However, not much is known about the behavior of such a system, although some work has been done for the two-player case [2, 108, 221]. Since these results are not widely known, we shall review them briefly here.
Consider the system equation with two players given by
ẋ(t) = A(t)x(t) + B_1(t)w(t) + B_2(t)u(t),   x(t_0) = x_0,    (2.57)
and the cost functions:
J^1(u, w) = xᵀ(t_f)Q_{1f}x(t_f) + ∫_0^{t_f} (xᵀQ_1x + wᵀR_{11}w + uᵀR_{12}u) dt,    (2.58)

J^2(u, w) = xᵀ(t_f)Q_{2f}x(t_f) + ∫_0^{t_f} (xᵀQ_2x + uᵀR_{22}u + wᵀR_{21}w) dt,    (2.59)
where Q_{if} ≥ 0, Q_i ≥ 0, R_{ij} ≥ 0 for i ≠ j, and R_{ii} > 0, i, j = 1, 2. Further, assume that the matrices A(t) = A, B_1(t) = B_1, and B_2(t) = B_2 are constant and of appropriate dimensions. Then, the system of coupled ODEs corresponding to the two-player nonzero-sum game is given by
Ṗ_1 = −AᵀP_1 − P_1A + P_1S_{11}P_1 + P_1S_{22}P_2 + P_2S_{22}P_1 − P_2S_{12}P_2 − Q_1,   P_1(t_f) = Q_{1f},    (2.60)

Ṗ_2 = −AᵀP_2 − P_2A + P_2S_{22}P_2 + P_2S_{11}P_1 + P_1S_{11}P_2 − P_1S_{21}P_1 − Q_2,   P_2(t_f) = Q_{2f},    (2.61)
where
S_{ij} = B_j(R_{jj})^{-1}R_{ij}(R_{jj})^{-1}(B_j)ᵀ,   i, j = 1, 2.
In [221], global existence results for the solution of the above system (2.60)-(2.61) are established for the special case B_1 = B_2 and R_{11} = R_{22} = −R_{12} = −R_{21} = I_n, the identity matrix. It is shown that P_1 − P_2 satisfies a linear Lyapunov-type ODE, while P_1 + P_2 satisfies a standard Riccati equation. Therefore, P_1 + P_2, and hence P_1, P_2, cannot blow up in any finite time, and as t_f → ∞, P_1(0) + P_2(0) tends to a stabilizing, positive-semidefinite solution of the corresponding algebraic Riccati equation (ARE). In the following result, we derive upper and lower bounds for the solutions P_1 and P_2 of (2.60), (2.61).
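The special-case claim of [221] is easy to check numerically for scalar data: with B_1 = B_2 = 1 and R_11 = R_22 = −R_12 = −R_21 = 1 we have S_11 = S_22 = 1 and S_12 = S_21 = −1, so P_1 − P_2 obeys a linear ODE and P_1 + P_2 a standard Riccati equation. A backward Euler sketch with hypothetical coefficients:

```python
# Scalar backward-Euler check of the special case of (2.60)-(2.61) with
# B1 = B2 = 1 and R11 = R22 = -R12 = -R21 = 1, so S11 = S22 = 1 and
# S12 = S21 = -1. Then P1 + P2 obeys a single standard Riccati equation and
# P1 - P2 a linear one. Parameter values are hypothetical.
a = 1.0
q1, q2 = 1.0, 1.0                    # Q1, Q2
p1, p2 = 1.0, 0.5                    # terminal values Q1f, Q2f
s = p1 + p2                          # separately integrated P1 + P2
tf, n = 1.0, 100000
dt = tf / n

for _ in range(n):
    # scalar (2.60): P1' = -2a P1 + P1^2 + 2 P1 P2 + P2^2 - Q1, and likewise
    p1dot = -2 * a * p1 + p1 * p1 + 2 * p1 * p2 + p2 * p2 - q1
    p2dot = -2 * a * p2 + p2 * p2 + 2 * p1 * p2 + p1 * p1 - q2
    sdot = -2 * a * s + 2 * s * s - (q1 + q2)   # Riccati for the sum
    p1 -= dt * p1dot                            # step backward from tf
    p2 -= dt * p2dot
    s -= dt * sdot
# With q1 = q2, the difference D = p1 - p2 solves the linear ODE D' = -2a D,
# so D(t0) = D(tf) e^{2a tf}, while p1 + p2 stays bounded.
```

The run confirms that the directly integrated sum coincides with p1 + p2 and stays bounded, while the difference grows only like the linear ODE predicts, i.e., no finite-time escape.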
Lemma 2.3.1 Suppose P 1, P 2 are solutions of (2.60)-(2.61) on the interval [t0, tf ]. Then, P1(t)≥0, P2(t)≥0 for all t ∈ [t0, tf].
Proof: Let x(t0) = x1 ≠ 0 and let x be a solution of the initial-value problem
ẋ(t) = [A − S_{11}P_1(t) − S_{22}P_2(t)]x(t),   x(t_0) = x_1.    (2.62)
Then for i = 1, 2
d/dt[xᵀP_i(t)x] = ẋᵀP_i(t)x + xᵀṖ_i(t)x + xᵀP_i(t)ẋ

= xᵀ{AᵀP_i(t) − P_1(t)S_{11}P_i(t) − P_2(t)S_{22}P_i(t) + P_i(t)A − P_i(t)S_{11}P_1(t) − P_i(t)S_{22}P_2(t) − AᵀP_i(t) − P_i(t)A − Q_i + P_i(t)S_{ii}P_i(t) − Σ_{j≠i}P_j(t)S_{ij}P_j(t) + Σ_{j≠i}(P_i(t)S_{jj}P_j(t) + P_j(t)S_{jj}P_i(t))}x

= −xᵀ{Q_i + P_1(t)S_{i1}P_1(t) + P_2(t)S_{i2}P_2(t)}x,
where the last equality follows from (2.60), (2.61) and (2.62). Now, integrating from t_0 to t_f yields
xᵀ_1 P_i(t_0) x_1 = ∫_{t_0}^{t_f} xᵀ(τ)Q̃_i(τ)x(τ) dτ + xᵀ(t_f)Q_{if}x(t_f) ≥ 0,    (2.63)

where Q̃_i := Q_i + P_1S_{i1}P_1 + P_2S_{i2}P_2 ≥ 0, since Q_i ≥ 0 and S_{ij} ≥ 0. As x_1 ≠ 0 and t_0 are arbitrary, the result follows. □
The next theorem gives sufficient conditions for no finite-escape times for the solutions P 1, P 2.
Theorem 2.3.3 Let Q ∈ ℜ^{n×n} be symmetric, and suppose that for all t ≤ t_f for which the solutions P_1(t), P_2(t) of (2.60), (2.61) exist,

F(P_1(t), P_2(t), Q) := Q + P_1S_{11}P_1 + P_2S_{22}P_2 + P_1S_{11}P_2 + P_2S_{11}P_1 + P_1S_{22}P_2 + P_2S_{22}P_1 − P_1S_{21}P_1 − P_2S_{12}P_2 ≥ 0.    (2.64)

Then the solutions P_1(t), P_2(t) of (2.60), (2.61) exist for all t < t_f, with

0 ≤ P_i(t) ≤ P_1(t) + P_2(t) ≤ R_Q(t),   i = 1, 2,    (2.65)
for some unique positive-semidefinite matrix function R_Q(t), the solution of the terminal-value problem:

Ṙ_Q(t) = −R_Q(t)A − AᵀR_Q(t) − (Q_1 + Q_2 + Q),   R_Q(t_f) = Q_{1f} + Q_{2f},    (2.66)
which exists for all t < tf.
Proof: The first inequality in (2.65) follows from Lemma 2.3.1. Further,

(Ṗ_1 + Ṗ_2) = −Aᵀ(P_1 + P_2) − (P_1 + P_2)A − (Q_1 + Q_2 + Q) + F(P_1(t), P_2(t), Q),

and by the monotonicity of solutions to Riccati equations [292], it follows that the second inequality holds whenever F(P_1(t), P_2(t), Q) ≥ 0. □
Remark 2.3.3 For a more extensive discussion of the existence of the solutions P 1, P 2 to the coupled Riccati-ODEs, the reader is referred to reference [108]. However, we shall take up the subject for the infinite-horizon case where tf → ∞ in a later chapter.
We now give an example of a pursuit-evasion problem which can be solved as a two-player nonzero-sum game [251].
Example 2.3.1 We consider a pursuit-evasion problem with the following dynamics
ṙ = v,   v̇ = a_p − a_e,
where r is the relative position vector of the two adversaries, ap, ae are the accelerations of the pursuer and evader respectively, which also serve as the controls to the players.
The cost functions are taken to be
J_p = ½q²_p rᵀ_f r_f + ½∫_0^T ((1/c_p)aᵀ_p a_p − (1/c_pe)aᵀ_e a_e) dt,

J_e = −½q²_e rᵀ_f r_f + ½∫_0^T ((1/c_e)aᵀ_e a_e − (1/c_ep)aᵀ_p a_p) dt,
where rf = r(T ) and the final time is fixed, while cp, ce, cpe, cep are weighting constants. The Nash-equilibrium controls are obtained by applying the results of Corollary 2.3.1 as
a_p = −c_p [0  I] P_p [r; v],   a_e = c_e [0  I] P_e [r; v],
where Pp, Pe are solutions to the coupled Riccati-ODES:
Ṗ_p(t) = −P_p[0 I; 0 0] − [0 0; I 0]P_p + c_p P_p[0 0; 0 I]P_p + c_e P_p[0 0; 0 I]P_e + c_e P_e[0 0; 0 I]P_p − (c²_e/c_pe) P_e[0 0; 0 I]P_e,

Ṗ_e(t) = −P_e[0 I; 0 0] − [0 0; I 0]P_e + c_e P_e[0 0; 0 I]P_e + c_p P_e[0 0; 0 I]P_p + c_p P_p[0 0; 0 I]P_e − (c²_p/c_ep) P_p[0 0; 0 I]P_p,

P_p(T) = q²_p[I 0; 0 0],   P_e(T) = −q²_e[I 0; 0 0].
It can be easily verified that the solutions to the above ODEs are given by
P_p(t) = (1/c_p) p̃(t) [I τI; τI τ²I],   P_e(t) = (1/c_e) ẽ(t) [I τI; τI τ²I],
where τ = T − t is the time-to-go, and p̃(t), ẽ(t) are scalar functions satisfying
dp̃/dτ = −τ²(p̃² + 2p̃ẽ − a ẽ²),   p̃(0) = c_p q²_p,   a = c_p/c_pe,

dẽ/dτ = −τ²(ẽ² + 2p̃ẽ − b p̃²),   ẽ(0) = −c_e q²_e,   b = c_e/c_ep.
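The claimed solution structure can be verified numerically in the scalar case (r, v ∈ ℜ, so P_p and P_e are 2×2 matrices) by integrating the coupled matrix ODEs backward from the terminal conditions; all constants below are hypothetical.

```python
# Backward Euler integration of the coupled Riccati ODEs of Example 2.3.1
# for scalar r and v (2x2 matrices), checking that the computed Pp, Pe have
# the structure Pp = (1/c_p) p~(t) [[1, tau],[tau, tau^2]] (and likewise Pe).
# The weighting constants and final time T are hypothetical.
cp, ce, cpe, cep = 1.0, 1.0, 2.0, 2.0
qp2, qe2 = 1.0, 0.5                  # q_p^2 and q_e^2
T, n = 1.0, 20000
dt = T / n

def mul(a, b):                        # 2x2 matrix product
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def lin(*terms):                      # linear combination: sum of c * m
    return [[sum(c * m[i][j] for c, m in terms) for j in range(2)]
            for i in range(2)]

M  = [[0.0, 1.0], [0.0, 0.0]]         # [[0, I],[0, 0]]
Mt = [[0.0, 0.0], [1.0, 0.0]]         # [[0, 0],[I, 0]]
N  = [[0.0, 0.0], [0.0, 1.0]]         # [[0, 0],[0, I]]
Pp = [[qp2, 0.0], [0.0, 0.0]]         # Pp(T) =  q_p^2 [[I, 0],[0, 0]]
Pe = [[-qe2, 0.0], [0.0, 0.0]]        # Pe(T) = -q_e^2 [[I, 0],[0, 0]]

for _ in range(n):
    PpN, PeN = mul(Pp, N), mul(Pe, N)
    dPp = lin((-1.0, mul(Pp, M)), (-1.0, mul(Mt, Pp)),
              (cp, mul(PpN, Pp)), (ce, mul(PpN, Pe)),
              (ce, mul(PeN, Pp)), (-ce * ce / cpe, mul(PeN, Pe)))
    dPe = lin((-1.0, mul(Pe, M)), (-1.0, mul(Mt, Pe)),
              (ce, mul(PeN, Pe)), (cp, mul(PeN, Pp)),
              (cp, mul(PpN, Pe)), (-cp * cp / cep, mul(PpN, Pp)))
    Pp = lin((1.0, Pp), (-dt, dPp))   # step backward from t = T
    Pe = lin((1.0, Pe), (-dt, dPe))

tau = T                               # time-to-go at t = 0
```

At t = 0 the off-diagonal and lower-right entries of the integrated matrices should relate to the upper-left entry through the factors τ and τ², which is exactly the claimed ansatz.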
Again we can specialize the result of Theorem 2.3.2 to the LQ case. We consider for this case the system equation given by (2.57) and cost functionals represented as
J^1(u, w) = −J^2(u, w) := J(u, w) = ½xᵀ(t_f)Q_f x(t_f) + ½∫_{t_0}^{t_f} [xᵀ(t)Qx(t) + uᵀ(t)R_u u(t) − wᵀ(t)R_w w(t)] dt,    (2.67)
where u is the minimizing player and w is the maximizing player, and without any loss of generality we can assume R_w = I. Then, if we assume V(x(t_f), t_f) = ϕ(x(t_f), t_f) = ½xᵀ(t_f)Q_f x(t_f) and

V(x, t) = ½xᵀP(t)x,    (2.68)
where P (t) is symmetric and P (T ) = Qf, then substituting these in the HJIE (2.50), we get the following matrix Riccati ODE:
Ṗ(t) + P(t)A(t) + Aᵀ(t)P(t) + P(t)[B_1(t)(B_1(t))ᵀ − B_2(t)R_u^{-1}(B_2(t))ᵀ]P(t) + Q = 0,   P(t_f) = Q_f.    (2.69)
Furthermore, we have the following corollary to Theorem 2.3.2.
Corollary 2.3.2 Consider the two-player zero-sum continuous-time LQ dynamic game described by equations (2.57) and (2.67). Suppose there exist bounded symmetric solutions to the matrix ODE (2.69) for all t ∈ [t0, tf ], then there exists a unique feedback saddle-point solution to the dynamic game under closed-loop memoryless information-structure. Further, the unique strategies for the players are given by
u⋆(t) = −R_u^{-1}(B_2(t))ᵀP(t)x(t),    (2.70)
w⋆(t) = (B_1(t))ᵀP(t)x(t),    (2.71)
and the optimal value of the game is given by
J⋆(u⋆, w⋆) = ½xᵀ_0 P(t_0)x_0.    (2.72)
Remark 2.3.4 In Corollary 2.3.1, the requirement that Qf ≥ 0 and Qi ≥ 0, i = 1,…, N has been included to guarantee the existence of the Nash-equilibrium solution; whereas, in Corollary 2.3.2 the existence of bounded solutions to the matrix Riccati ODE (2.69) is used. It is however possible to remove this assumption if in addition to the requirement that Qf ≥ 0 and Q ≥ 0, we also have
[B_1(t)(B_1(t))ᵀ − B_2(t)R_u^{-1}(B_2(t))ᵀ] ≥ 0   ∀t ∈ [t_0, t_f].
Then the matrix Riccati equation (2.69) admits a unique positive-semidefinite solution [59].
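As a sketch, the scalar version of (2.69) can be integrated backward to obtain the saddle-point gains; the coefficients below are hypothetical.

```python
# Backward Euler integration of the scalar version of the Riccati ODE
# (2.69):  p' + 2 a p + (b1^2 - b2^2/ru) p^2 + q = 0,  p(tf) = qf,
# with hypothetical coefficients; u*(t) = -(b2/ru) p(t) x(t) is then the
# minimizer's feedback of (2.70) and w*(t) = b1 p(t) x(t) the maximizer's.
a, b1, b2, ru = 0.0, 0.5, 1.0, 1.0
q, qf = 1.0, 0.0
tf, n = 1.0, 100000
dt = tf / n

p = qf                                    # boundary condition p(tf) = qf
for _ in range(n):
    pdot = -(2 * a * p + (b1 * b1 - b2 * b2 / ru) * p * p + q)
    p -= dt * pdot                        # step backward toward t0 = 0
# For these values p' = 0.75 p^2 - 1, whose exact solution is
# p(t) = sqrt(4/3) tanh(sqrt(3)/2 (tf - t)), so p stays bounded on [t0, tf].
```

Boundedness of p over the horizon is precisely the hypothesis of Corollary 2.3.2 in this scalar setting.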
The first part of the chapter, on the dynamic programming principle, is based on the books [47, 64, 164, 175], while the material on dynamic games is based for the most part on the books by Basar and Bernhard [57] and by Basar and Olsder [59]. The remaining material is based on the papers [250, 251]. On the other hand, the discussion on the existence, definiteness and boundedness of the solutions to the coupled matrix Riccati-ODEs is based on Reference [108]. Moreover, the results on the stochastic case can also be found in [2].
We have not discussed the Minimum principle of Pontryagin [47] because all the problems discussed here pertain to unconstrained problems. However, for such problems, the minimum principle provides a set of necessary conditions for optimality. Details of these necessary conditions can be found in [59]. Furthermore, a treatment of constrained problems using the minimum principle can also be found in [73].