In the variational approach to optimal control problems presented previously, the problems usually emerge as nonlinear two-point boundary-value problems, which typically cannot be solved analytically to obtain an optimal control law or an optimal open-loop control. In such cases, iterative numerical techniques may be used to determine open-loop optimal controls. In this subsection, we briefly present some candidate numerical approaches for computing optimal control laws, or optimal open-loop controls, whenever this is not analytically possible. We present the general concepts of these numerical approaches and point toward suitable references with detailed presentations for the more interested reader.
\[
\begin{aligned}
\dot{x}^*(t) &= \frac{\partial H}{\partial p} = a(x^*(t), u^*(t), t),\\
\dot{p}^*(t) &= -\frac{\partial H}{\partial x} = -\left[\frac{\partial a}{\partial x}(x^*(t), u^*(t), t)\right]^T p^*(t) - \frac{\partial g}{\partial x}(x^*(t), u^*(t), t),\\
0 &= \frac{\partial H}{\partial u} = \left[\frac{\partial a}{\partial u}(x^*(t), u^*(t), t)\right]^T p^*(t) + \frac{\partial g}{\partial u}(x^*(t), u^*(t), t),\\
x(t_0) &= x_0, \qquad p^*(t_f) = \frac{\partial h}{\partial x}(x^*(t_f)).
\end{aligned}
\tag{C.47}
\]
The conditions (C.47) consider unbounded states/controls, with $t_f$ fixed and $x(t_f)$ free. From the system of state/costate equations and boundary conditions, it is often possible to obtain an explicit relationship $u^*(t) = f(x^*(t), p^*(t), t)$, $t \in [t_0, t_f]$. Substituting this relationship into the state/costate equations yields a set of $2n$ first-order ordinary differential equations (the reduced differential equations), to be considered together with the boundary conditions $x(t_0) = x_0$ and $p^*(t_f) = \frac{\partial h}{\partial x}(x^*(t_f))$. If the boundary conditions were all known at $t_0$, or all at $t_f$, the extremal trajectories $x^*(t), p^*(t)$, $t \in [t_0, t_f]$, could be obtained by numerical integration, and the optimal control history could then be found by substitution into the relationship above.
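As a minimal illustration of this direct integration, consider an assumed scalar LQ problem (not an example from the text): for $\dot{x} = -x + u$ with cost $\tfrac12\int_0^1 (x^2 + u^2)\,dt$, the condition $\partial H/\partial u = u + p = 0$ gives $u^*(t) = -p(t)$, and the reduced equations $\dot{x} = -x - p$, $\dot{p} = -x + p$ can be integrated forward whenever both $x(t_0)$ and $p(t_0)$ are known:

```python
import numpy as np

# Illustrative scalar LQ problem (an assumption, not from the text):
# minimize J = 0.5*int(x^2 + u^2) dt subject to xdot = -x + u, x(0) = x0.
# H = 0.5*(x^2 + u^2) + p*(-x + u); dH/du = u + p = 0 gives u* = -p,
# so the reduced state/costate equations are xdot = -x - p, pdot = -x + p.

def integrate_hamiltonian_system(x0, p0, tf=1.0, n=1000):
    """Forward-Euler integration of the reduced state/costate ODEs,
    assuming both x(t0) and p(t0) are known."""
    dt = tf / n
    x, p = x0, p0
    xs, ps = [x], [p]
    for _ in range(n):
        # simultaneous update so both right-hand sides use the old (x, p)
        x, p = x + dt * (-x - p), p + dt * (-x + p)
        xs.append(x)
        ps.append(p)
    return np.array(xs), np.array(ps)

xs, ps = integrate_hamiltonian_system(x0=1.0, p0=0.3)
us = -ps  # control history recovered by substitution u*(t) = -p(t)
```

In the problems considered here, however, the boundary conditions are split between $t_0$ and $t_f$, so $p(t_0)$ is not known in advance; the iterative methods of this subsection address exactly this difficulty.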
The basic strategy in each of the three numerical approaches considered below (steepest descent, variation of extremals, and quasilinearization) is to guess an initial solution such that one of the three sets of equations (the state equations, the costate equations, or the equation $\partial H/\partial u = 0$) is not satisfied; to use this trial solution to adjust the guess, i.e., to make the next solution come closer to satisfying all of the necessary conditions; and to repeat these steps until the iterative procedure converges.
These approaches address problems whose boundary values are split between $t_0$ and $t_f$ (for nonlinear equations), so that direct numerical integration cannot be applied. When the problem additionally involves constraints on the states or controls, gradient projection is a more suitable approach.
C.5.1. Steepest Descent
For a function of several variables to have a relative minimum at a point, it is necessary that the gradient of the function be zero at that point; for the optimal control problem, the corresponding necessary conditions form, in general, a system of nonlinear differential equations. Suppose the control history $u^{(i)}(t)$, $t \in [t_0, t_f]$, is known and is used to solve the state and costate equations
\[
\dot{x}^{(i)}(t) = a(x^{(i)}(t), u^{(i)}(t), t), \qquad x^{(i)}(t_0) = x_0,
\tag{C.48}
\]
\[
\dot{p}^{(i)}(t) = -\frac{\partial H}{\partial x}(x^{(i)}(t), u^{(i)}(t), p^{(i)}(t), t), \qquad p^{(i)}(t_f) = \frac{\partial h}{\partial x}(x^{(i)}(t_f)).
\tag{C.49}
\]
If $u^{(i)}(t)$ also satisfies $\frac{\partial H}{\partial u}(x^{(i)}(t), u^{(i)}(t), p^{(i)}(t), t) = 0$ for all $t \in [t_0, t_f]$, then $x^{(i)}(t)$, $u^{(i)}(t)$, $p^{(i)}(t)$ are extremal.
If this is not the case, consider the variation of the augmented functional,
\[
\begin{aligned}
\delta J_a = {} & \left[\frac{\partial h}{\partial x}(x^{(i)}(t_f)) - p^{(i)}(t_f)\right]^T \delta x(t_f)\\
& + \int_{t_0}^{t_f} \Bigl\{ \left[\dot{p}^{(i)}(t) + \frac{\partial H}{\partial x}(x^{(i)}(t), u^{(i)}(t), p^{(i)}(t), t)\right]^T \delta x(t)\\
& \qquad + \left[\frac{\partial H}{\partial u}(x^{(i)}(t), u^{(i)}(t), p^{(i)}(t), t)\right]^T \delta u(t)\\
& \qquad + \left[a(x^{(i)}(t), u^{(i)}(t), t) - \dot{x}^{(i)}(t)\right]^T \delta p(t) \Bigr\}\, dt,
\end{aligned}
\]
where $\delta x(t) = x^{(i+1)}(t) - x^{(i)}(t)$, $\delta u(t) = u^{(i+1)}(t) - u^{(i)}(t)$, and $\delta p(t) = p^{(i+1)}(t) - p^{(i)}(t)$.
Since the state/costate equations (C.48)-(C.49) and their boundary conditions are satisfied by construction, this reduces to
\[
\delta J_a = \int_{t_0}^{t_f} \left[\frac{\partial H}{\partial u}(x^{(i)}(t), u^{(i)}(t), p^{(i)}(t), t)\right]^T \delta u(t)\, dt.
\]
If the norm of $\delta u$, $\lVert u^{(i+1)} - u^{(i)} \rVert$, is small, the sign of $\Delta J_a = J_a(u^{(i+1)}) - J_a(u^{(i)})$ is determined by the sign of $\delta J_a$. Then, choosing
\[
\delta u(t) = u^{(i+1)}(t) - u^{(i)}(t) = -\tau \frac{\partial H^{(i)}}{\partial u}(t), \qquad t \in [t_0, t_f],
\]
with $\tau > 0$, gives
\[
\delta J_a = -\tau \int_{t_0}^{t_f} \left[\frac{\partial H^{(i)}}{\partial u}(t)\right]^T \left[\frac{\partial H^{(i)}}{\partial u}(t)\right] dt \le 0,
\]
and in this case the equality holds if and only if $\frac{\partial H^{(i)}}{\partial u}(t) = 0$ for all $t \in [t_0, t_f]$.
The value of $\tau$ is usually selected ad hoc, possibly as a value that effects a specified change in $\Delta J_a$, or by a single-variable search. The steepest descent algorithm can be found in detail in [137] and references therein.
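The steps above can be sketched numerically. The following is a minimal sketch on an assumed scalar problem (minimize $\tfrac12\int_0^1 (x^2+u^2)\,dt$ for $\dot{x} = -x + u$, $x(0)=1$, an illustrative choice, not an example from the text), where each iteration integrates the state equation forward, the costate equation backward, and updates the control opposite the gradient $\partial H/\partial u$:

```python
import numpy as np

# Illustrative scalar problem (an assumption): H = 0.5*(x^2+u^2) + p*(-x+u),
# so dH/du = u + p, the costate equation is pdot = -dH/dx = -x + p,
# and p(tf) = 0 since h = 0 here.

def steepest_descent(x0=1.0, tf=1.0, n=200, tau=0.2, iters=200):
    dt = tf / n
    u = np.zeros(n)          # initial guess u^(0)(t)
    costs = []
    dHdu = u.copy()
    for _ in range(iters):
        # 1) integrate the state equation forward from x(t0) = x0
        x = np.empty(n + 1)
        x[0] = x0
        for k in range(n):
            x[k + 1] = x[k] + dt * (-x[k] + u[k])
        # 2) integrate the costate equation backward from p(tf) = 0
        p = np.empty(n + 1)
        p[-1] = 0.0
        for k in range(n - 1, -1, -1):
            p[k] = p[k + 1] + dt * (x[k + 1] - p[k + 1])
        # 3) steepest-descent step: u <- u - tau * dH/du
        dHdu = u + p[:-1]
        costs.append(0.5 * dt * np.sum(x[:-1] ** 2 + u ** 2))
        u = u - tau * dHdu
    return u, np.array(costs), np.max(np.abs(dHdu))

u_star, costs, grad_inf = steepest_descent()
```

At convergence $\max_t |\partial H/\partial u|$ is driven toward zero, consistent with $\delta J_a \le 0$ above; the step size `tau` plays the role of $\tau$ and is chosen ad hoc here.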
C.5.2. Variation of Extremals
In the optimal control problem defined in the previous sections, the condition $\frac{\partial H}{\partial u}(x^*(t), u^*(t), p^*(t), t) = 0$ was used to obtain a set of reduced differential equations of the form $\dot{x}(t) = a(x(t), p(t), t)$ and $\dot{p}(t) = d(x(t), p(t), t)$, where the admissible state/control values are not bounded. To proceed with a numerical computation of the optimal trajectory, the value of the initial costate $p^{(0)}(t_0)$ needs to be guessed and used for integrating and obtaining every succeeding trajectory. For this purpose, Newton's method [11] may be employed, which for the case of $2n$ differential equations yields the matrix equation
\[
p^{(i+1)}(t_0) = p^{(i)}(t_0) - \left[P_p(p^{(i)}(t_0), t_f)\right]^{-1} p^{(i)}(t_f),
\tag{C.50}
\]
where the costate influence function matrix $P_p$ is given by
\[
P_p(p^{(i)}(t_0), t) =
\begin{bmatrix}
\dfrac{\partial p_1(t)}{\partial p_1(t_0)} & \dfrac{\partial p_1(t)}{\partial p_2(t_0)} & \cdots & \dfrac{\partial p_1(t)}{\partial p_n(t_0)}\\
\vdots & \vdots & \ddots & \vdots\\
\dfrac{\partial p_n(t)}{\partial p_1(t_0)} & \dfrac{\partial p_n(t)}{\partial p_2(t_0)} & \cdots & \dfrac{\partial p_n(t)}{\partial p_n(t_0)}
\end{bmatrix}_{p^{(i)}(t_0)},
\tag{C.51}
\]
which is appropriate only if the desired value of the final costate is zero, as occurs when the term $h(x(t_f))$ is absent from the performance measure (C.3). If, however, $h(x(t_f))$ is present, then
\[
p^{(i+1)}(t_0) = p^{(i)}(t_0) - \left\{\left[\frac{\partial^2 h}{\partial x^2}(x(t_f))\right] P_x(p^{(i)}(t_0), t_f) - P_p(p^{(i)}(t_0), t_f)\right\}_i^{-1} \left[p^{(i)}(t_f) - \frac{\partial h}{\partial x}(x(t_f))\right]_i,
\tag{C.52}
\]
where now the state influence function matrix is given by
\[
P_x(p^{(i)}(t_0), t) =
\begin{bmatrix}
\dfrac{\partial x_1(t)}{\partial p_1(t_0)} & \dfrac{\partial x_1(t)}{\partial p_2(t_0)} & \cdots & \dfrac{\partial x_1(t)}{\partial p_n(t_0)}\\
\vdots & \vdots & \ddots & \vdots\\
\dfrac{\partial x_n(t)}{\partial p_1(t_0)} & \dfrac{\partial x_n(t)}{\partial p_2(t_0)} & \cdots & \dfrac{\partial x_n(t)}{\partial p_n(t_0)}
\end{bmatrix}_{p^{(i)}(t_0)},
\tag{C.53}
\]
and the influence function matrices can be determined as in [137] (pp. 348-349). The appropriate initial conditions for the influence matrices are
\[
P_x(p^{(i)}(t_0), t_0) = \left.\frac{\partial x(t_0)}{\partial p(t_0)}\right|_{p^{(i)}(t_0)} = 0, \qquad
P_p(p^{(i)}(t_0), t_0) = \left.\frac{\partial p(t_0)}{\partial p(t_0)}\right|_{p^{(i)}(t_0)} = I.
\]
A detailed description of the steps of the variation of extremals algorithm is also provided in [137] and references therein.
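For a scalar problem the Newton correction (C.50) can be sketched directly. The sketch below reuses the assumed scalar LQ reduced equations $\dot{x} = -x - p$, $\dot{p} = -x + p$ (an illustrative choice) with $x(0)=1$ and terminal condition $p(t_f)=0$, and approximates the scalar influence function $P_p = \partial p(t_f)/\partial p(t_0)$ by a forward difference:

```python
import numpy as np

# Variation-of-extremals (shooting) sketch on an assumed scalar LQ problem:
# reduced equations xdot = -x - p, pdot = -x + p, x(0) = 1, p(tf) = 0
# (h = 0 here, so the desired final costate is zero, as in (C.50)).

def propagate(p0, x0=1.0, tf=1.0, n=1000):
    """Integrate the reduced state/costate equations forward; return p(tf)."""
    dt = tf / n
    x, p = x0, p0
    for _ in range(n):
        x, p = x + dt * (-x - p), p + dt * (-x + p)
    return p

def variation_of_extremals(p0_guess=0.0, iters=5, eps=1e-6):
    p0 = p0_guess
    for _ in range(iters):
        F = propagate(p0)                      # p(tf) for the current guess
        Pp = (propagate(p0 + eps) - F) / eps   # influence function dp(tf)/dp(t0)
        p0 = p0 - F / Pp                       # Newton correction, cf. (C.50)
    return p0, propagate(p0)

p0_star, p_tf = variation_of_extremals()
```

Since this illustrative system is linear, one Newton step already makes the terminal error essentially zero; for nonlinear reduced equations several iterations (and a reasonable initial guess) are generally required.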
C.5.3. Quasilinearization
Consider a linear two-point boundary-value problem to be solved,
\[
\dot{x}(t) = a_{11}(t)x(t) + a_{12}(t)p(t) + e_1(t), \qquad x(t_0) = x_0,
\tag{C.54}
\]
\[
\dot{p}(t) = a_{21}(t)x(t) + a_{22}(t)p(t) + e_2(t), \qquad p(t_f) = p_f,
\tag{C.55}
\]
where $a_{11}, a_{12}, a_{21}, a_{22}, e_1, e_2$ are known functions and $t_0, t_f, x_0, p_f$ are known constants.
For this system, a homogeneous solution (with $e_1 = e_2 = 0$) can be obtained by numerical integration with initial conditions $x_H(t_0) = 0$, $p_H(t_0) = 1$, while a particular solution of the nonhomogeneous system can likewise be obtained by numerical integration with initial conditions $x_P(t_0) = x_0$, $p_P(t_0) = 0$. Superposing these two solutions, the solution of the linear two-point boundary-value problem is obtained by determining the superposition constant from a linear algebraic equation that enforces $p(t_f) = p_f$.
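This superposition can be sketched numerically for a scalar system; the coefficient values below are chosen arbitrarily for illustration:

```python
import numpy as np

# Superposition sketch for a scalar linear TPBVP (illustrative coefficients):
# xdot = a11*x + a12*p + e1, pdot = a21*x + a22*p + e2,
# x(0) = x0, p(tf) = pf.

a11, a12, a21, a22 = -1.0, -1.0, -1.0, 1.0
e1, e2 = 0.5, -0.2
x0, pf, tf, n = 1.0, 0.0, 1.0, 2000
dt = tf / n

def integrate(x_init, p_init, forced):
    """Forward-Euler integration; forced=False drops e1, e2 (homogeneous)."""
    x, p = x_init, p_init
    for _ in range(n):
        f1 = e1 if forced else 0.0
        f2 = e2 if forced else 0.0
        x, p = (x + dt * (a11 * x + a12 * p + f1),
                p + dt * (a21 * x + a22 * p + f2))
    return x, p

xp_tf, pp_tf = integrate(x0, 0.0, forced=True)    # particular solution
xh_tf, ph_tf = integrate(0.0, 1.0, forced=False)  # homogeneous solution

c = (pf - pp_tf) / ph_tf     # linear algebraic equation for the constant
p_tf = pp_tf + c * ph_tf     # combined solution satisfies p(tf) = pf
```

For $n$ states the same idea requires $n$ homogeneous solutions (one per unit initial costate direction) and a small linear system for the $n$ superposition constants.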
The quasilinearization approach is based on linearization of the reduced state-costate equations, which in turn is based on a Taylor series expansion about the current state-costate trajectories. To demonstrate this process, we assume one state and one costate variable and solve $\frac{\partial H}{\partial u} = 0$ for $u(t)$. Substituting into the state-costate equations gives $\dot{x}(t) = a(x(t), p(t), t)$ and $\dot{p}(t) = d(x(t), p(t), t)$, where $a, d$ are nonlinear functions and $x^{(0)}(t), p^{(0)}(t)$, $t \in [t_0, t_f]$, are initially guessed trajectories. The linearization using a Taylor series expansion about $x^{(0)}(t), p^{(0)}(t)$ and some basic calculus yields
\[
\dot{x}^{(1)}(t) = a_{11}(t)x^{(1)}(t) + a_{12}(t)p^{(1)}(t) + e_1(t),
\tag{C.56}
\]
\[
\dot{p}^{(1)}(t) = a_{21}(t)x^{(1)}(t) + a_{22}(t)p^{(1)}(t) + e_2(t),
\tag{C.57}
\]
where
\[
\begin{aligned}
a_{11}(t) &= \frac{\partial a}{\partial x}(x^{(0)}(t), p^{(0)}(t), t), & a_{12}(t) &= \frac{\partial a}{\partial p}(x^{(0)}(t), p^{(0)}(t), t),\\
a_{21}(t) &= \frac{\partial d}{\partial x}(x^{(0)}(t), p^{(0)}(t), t), & a_{22}(t) &= \frac{\partial d}{\partial p}(x^{(0)}(t), p^{(0)}(t), t),\\
e_1(t) &= a(x^{(0)}(t), p^{(0)}(t), t) - a_{11}(t)\,x^{(0)}(t) - a_{12}(t)\,p^{(0)}(t),\\
e_2(t) &= d(x^{(0)}(t), p^{(0)}(t), t) - a_{21}(t)\,x^{(0)}(t) - a_{22}(t)\,p^{(0)}(t)
\end{aligned}
\]
are all known functions.
Detailed implementation steps (in the form of pseudocode) of the quasilinearization approach can be found in [137]. Additional properties and aspects of its operation and behavior can also be found in [137] and references therein.
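Putting the pieces together, one possible sketch of the full quasilinearization iteration follows, on an assumed scalar problem with $a(x,p) = -x^2 - p$ and $d(x,p) = -x + 2xp$ (arising, for instance, from minimizing $\tfrac12\int (x^2+u^2)\,dt$ for $\dot{x} = -x^2 + u$, so that $u = -p$); each iteration linearizes about the current trajectories as in (C.56)-(C.57) and solves the resulting linear TPBVP by superposition:

```python
import numpy as np

# Quasilinearization sketch (illustrative scalar problem, an assumption):
# xdot = a(x,p) = -x^2 - p, pdot = d(x,p) = -x + 2*x*p, x(0) = 0.5, p(tf) = 0.
# Partials: da/dx = -2x, da/dp = -1, dd/dx = -1 + 2p, dd/dp = 2x.

x0, tf, n = 0.5, 1.0, 1000
dt = tf / n

def quasilinearize(iters=15):
    xt = np.full(n + 1, x0)   # guessed trajectory x^(0)(t)
    pt = np.zeros(n + 1)      # guessed trajectory p^(0)(t)
    for _ in range(iters):
        # time-varying coefficients of the linearized equations (C.56)-(C.57)
        a11, a12 = -2.0 * xt, -1.0
        a21, a22 = -1.0 + 2.0 * pt, 2.0 * xt
        e1 = xt ** 2          # a - a11*x - a12*p evaluated on x^(i), p^(i)
        e2 = -2.0 * xt * pt   # d - a21*x - a22*p evaluated on x^(i), p^(i)

        def integrate(xi, pi, forced):
            xs = np.empty(n + 1); ps = np.empty(n + 1)
            xs[0], ps[0] = xi, pi
            for k in range(n):
                f1 = e1[k] if forced else 0.0
                f2 = e2[k] if forced else 0.0
                xs[k + 1] = xs[k] + dt * (a11[k] * xs[k] + a12 * ps[k] + f1)
                ps[k + 1] = ps[k] + dt * (a21[k] * xs[k] + a22[k] * ps[k] + f2)
            return xs, ps

        xpart, ppart = integrate(x0, 0.0, True)    # particular solution
        xhom, phom = integrate(0.0, 1.0, False)    # homogeneous solution
        c = (0.0 - ppart[-1]) / phom[-1]           # enforce p(tf) = 0
        xt, pt = xpart + c * xhom, ppart + c * phom
    return xt, pt

xt, pt = quasilinearize()
```

At convergence the linearized right-hand sides evaluated on the current trajectories coincide with the nonlinear ones, so the iterates satisfy the original reduced equations.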
The three preceding iterative methods address, in general, unconstrained nonlinear two-point boundary-value problems. They solve a sequence of sub-problems that should converge to the analyzed problem, in which one or more necessary conditions are initially violated but eventually satisfied if the iterative process converges. If the iterative procedure converges to the same control and trajectory for a variety of initial guesses, there is some assurance that a global minimum has been determined. A tabular summary of the features of these three approaches, such as the required initial guess, storage requirements, and required computations, is provided in [137].
C.5.4. Gradient Projection
Gradient projection methods, on the other hand, are iterative numerical procedures for finding extrema of functions of several variables that are required to satisfy various constraining relations.
Assume a function $f(y)$ that is convex in a region $R$, meaning that it satisfies
\[
(1-\theta) f(y^{(0)}) + \theta f(y^{(1)}) \ge f\bigl((1-\theta) y^{(0)} + \theta y^{(1)}\bigr)
\]
for all $0 \le \theta \le 1$ and $y^{(0)}, y^{(1)} \in R$. The variables $y_1, y_2, \ldots, y_K$ are constrained by $L$ linear inequalities of the form
\[
\sum_{j=1}^{K} n_{ji} y_j - v_i \ge 0, \qquad i = 1, 2, \ldots, L,
\]
where $n_i = [n_{1i}\; n_{2i}\; \ldots\; n_{Ki}]^T$. Defining $N_L = [n_1\; n_2\; \ldots\; n_L]$ with normalized columns, $v_L = [v_1\; v_2\; \ldots\; v_L]^T$, and $\lambda(y) = [\lambda_1(y)\; \lambda_2(y)\; \ldots\; \lambda_L(y)]^T$ with $\lambda_i(y) \triangleq n_i^T y - v_i$, the set of linear constraints can be expressed as $\lambda_i(y) \ge 0$ for $i = 1, 2, \ldots, L$ (i.e., $N_L^T y - v_L = \lambda(y) \ge 0$), and the points $y$ for which $\lambda_i(y) = 0$ lie in a hyperplane $H_i$ in the $K$-dimensional space.
Suppose $y$ is a point satisfying $N_q^T y - v_q = 0$ (the intersection of $q$ linearly independent hyperplanes), and let $w$ denote points such that $N_q^T w = 0$ (the intersection of $q$ linearly independent hyperplanes, each of which contains the origin). Let $Q$ denote the latter intersection, a $(K-q)$-dimensional subspace of $E^K$ ($E^K$ being the $K$-dimensional Euclidean space), and let $Q'$ denote the intersection of the hyperplanes $H_1, H_2, \ldots, H_q$; $Q$ and $Q'$ are parallel, differing only by the vector $v_q$. Two $K \times K$ symmetric matrices can then be defined by
\[
P_q = I - N_q \left[N_q^T N_q\right]^{-1} N_q^T = I - \tilde{P}_q,
\]
where $P_q$ is the projection matrix that takes any vector in $E^K$ into $Q$, and $\tilde{P}_q = N_q [N_q^T N_q]^{-1} N_q^T$ is the projection matrix that takes any vector in $E^K$ into $\tilde{Q}$. Here $\tilde{Q}$ is the $q$-dimensional subspace of $E^K$ spanned by $n_1, \ldots, n_q$, i.e., the column space of $N_q$.
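The defining properties of these projection matrices (idempotence, complementarity, and $N_q^T P_q w = 0$) can be checked numerically; the data below are random and purely illustrative:

```python
import numpy as np

# Numerical check of the projection matrices with illustrative random data:
# Nq holds q linearly independent normal vectors as columns;
# Pq = I - Nq (Nq^T Nq)^{-1} Nq^T projects onto Q (where Nq^T w = 0),
# and Pq_tilde projects onto Q~ = span{n1, ..., nq}.

K, q = 4, 2
rng = np.random.default_rng(0)
Nq = rng.standard_normal((K, q))   # generic columns are linearly independent

Pq_tilde = Nq @ np.linalg.inv(Nq.T @ Nq) @ Nq.T
Pq = np.eye(K) - Pq_tilde

v = rng.standard_normal(K)
w = Pq @ v                         # component of v lying in Q
```

The check `Nq.T @ w ≈ 0` confirms that the projected vector satisfies the homogeneous constraints, exactly the property used in the optimality condition that follows.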
Assuming that $f$ is convex with continuous second partial derivatives in a closed and bounded convex region $R$ of $E^K$, and letting $y^*$ be a boundary point of $R$ that lies on exactly $q$ hyperplanes, $1 \le q \le K$, assumed linearly independent, with $Q'$ denoting the intersection of these hyperplanes, the point $y^*$ is a constrained global minimum if and only if
\[
P_q\left[-\frac{\partial f}{\partial y}(y^*)\right] = 0 \quad \text{and} \quad \left[N_q^T N_q\right]^{-1} N_q^T \left[-\frac{\partial f}{\partial y}(y^*)\right] \le 0.
\tag{C.58}
\]
If $y^*$ is an interior point of $R$, a necessary and sufficient condition is $\frac{\partial f}{\partial y}(y^*) = 0$.
Exploiting the above, the optimal trajectories of a constrained optimal control problem can be determined using gradient projection. More specifically, it is desired to find an admissible control history $u^*$ causing $\dot{x}(t) = a(x(t), u(t))$, $x(t_0) = x_0$, to follow an admissible trajectory $x^*$ that minimizes the performance measure
\[
J = h(x(t_f)) + \int_{t_0}^{t_f} g(x(t), u(t))\, dt,
\]
with $t_0 = 0$ and $t_f$ specified, subject to the constraints
\[
M_{i-} \le u_i(t) \le M_{i+}, \qquad t \in [0, t_f], \quad i = 1, 2, \ldots, m,
\tag{C.59}
\]
\[
S_{i-} \le x_i(t) \le S_{i+}, \qquad t \in [0, t_f], \quad i = 1, 2, \ldots, n,
\tag{C.60}
\]
\[
x_i(t_j) = T_{ij}, \qquad t_j \text{ specified}, \quad i = 1, 2, \ldots, n.
\tag{C.61}
\]
Approximating the state differential equations by (possibly nonlinear) difference equations, we obtain
\[
x(t + \Delta t) \approx x(t) + a(x(t), u(t))\, \Delta t, \qquad t = 0, \Delta t, 2\Delta t, \ldots, (N-1)\Delta t.
\]
The latter can be written, together with the discrete performance measure, in the simpler form
\[
x(k+1) = x(k) + a(x(k), u(k))\, \Delta t = a_D(x(k), u(k)),
\tag{C.62}
\]
\[
J_D = h(x(N)) + \Delta t \sum_{k=0}^{N-1} g(x(k), u(k)).
\tag{C.63}
\]
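The discretization (C.62)-(C.63) can be sketched on an assumed scalar example (the dynamics $\dot{x} = -x + u$ and the quadratic $h$, $g$ below are illustrative choices, not from the text):

```python
import numpy as np

# Euler discretization sketch of (C.62)-(C.63) for an assumed scalar problem:
# xdot = -x + u with h(x) = 0.5*x^2 and g(x, u) = 0.5*(x^2 + u^2) becomes
# x(k+1) = x(k) + (-x(k) + u(k))*dt and
# JD = 0.5*x(N)^2 + dt * sum_k 0.5*(x(k)^2 + u(k)^2).

def discretize_and_evaluate(u, x0=1.0, tf=1.0):
    N = len(u)
    dt = tf / N
    x = np.empty(N + 1)
    x[0] = x0
    for k in range(N):
        x[k + 1] = x[k] + (-x[k] + u[k]) * dt      # a_D(x(k), u(k))
    JD = 0.5 * x[-1] ** 2 + dt * np.sum(0.5 * (x[:-1] ** 2 + u ** 2))
    return x, JD

u = np.zeros(100)      # a trial control history with N = 100
x, JD = discretize_and_evaluate(u)
```

The unknowns of the discretized problem are exactly the $Nn$ state values and $Nm$ control values referred to in the text.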
Thus, it is required to find the $(Nn + Nm)$ variables that minimize $J_D$ and satisfy the approximate state equations and constraints. If the state equations are nonlinear, they can be linearized about a nominal state-control history, and a sequence of linearized problems is then solved.
A nominal state-control history
\[
\bigl(x^{(i)}(0), x^{(i)}(1), \ldots, x^{(i)}(N);\; u^{(i)}(0), u^{(i)}(1), \ldots, u^{(i)}(N-1)\bigr)
\]
is considered known (an initial history can be guessed), and linearization about it gives equations of the form
\[
x^{(i+1)}(k+1) = A(k) x^{(i+1)}(k) + B(k) u^{(i+1)}(k) + c(k),
\tag{C.64}
\]
where $A, B, c$ are known time-varying matrices. Since $x(0) = x_0$ is specified, $x^{(i)}(0) = x_0$ for all $i$, and thus in the general case
\[
x^{(i+1)}(k+1) = x_H(k+1) + D_0^{k+1} u^{(i+1)}(0) + \cdots + D_k^{k+1} u^{(i+1)}(k) = x_H(k+1) + \sum_{\ell=0}^{N-1} D_\ell^{k+1} u^{(i+1)}(\ell),
\tag{C.65}
\]
where
\[
x_H(k+1) = A(k) x_H(k) + c(k), \qquad x_H(0) = x_0,
\]
and
\[
D_\ell^{k+1} =
\begin{cases}
A(k) A(k-1) \cdots A(\ell+1) B(\ell), & k > \ell,\\
B(\ell), & k = \ell,\\
0, & k < \ell.
\end{cases}
\]
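The structure of $x_H$ and the $D_\ell^{k}$ matrices can be verified numerically against a direct simulation of the linearized recursion; the matrices below are random and purely illustrative:

```python
import numpy as np

# Numerical check of (C.65) with illustrative random data: the discrete
# state equals the forced homogeneous part x_H plus the superposition
# of the control influence matrices D_l^{k+1}.

rng = np.random.default_rng(1)
n_x, n_u, N = 2, 1, 5
A = rng.standard_normal((N, n_x, n_x)) * 0.3
B = rng.standard_normal((N, n_x, n_u))
c = rng.standard_normal((N, n_x)) * 0.1
x0 = np.array([1.0, -1.0])
u = rng.standard_normal((N, n_u))

def D(l, k):
    """D_l^{k+1}: A(k)...A(l+1)B(l) for k > l, B(l) for k = l, 0 for k < l."""
    if k < l:
        return np.zeros((n_x, n_u))
    M = B[l]
    for j in range(l + 1, k + 1):
        M = A[j] @ M
    return M

# direct simulation of x(k+1) = A(k) x(k) + B(k) u(k) + c(k)
x = x0.copy()
for k in range(N):
    x = A[k] @ x + B[k] @ u[k] + c[k]

# homogeneous/forced part x_H(k+1) = A(k) x_H(k) + c(k), x_H(0) = x0
xH = x0.copy()
for k in range(N):
    xH = A[k] @ xH + c[k]

x_via_D = xH + sum(D(l, N - 1) @ u[l] for l in range(N))
```

Because the recursion is linear in the controls, the superposition reproduces the simulated state exactly, which is what allows the state constraints to be rewritten in terms of the controls alone.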
In this fashion, the entire discrete state history can be written in terms of the discrete control history, as $X^{(i+1)} = D_0 U^{(i+1)} + X_0$. Inequality constraints involving the states,
\[
\begin{bmatrix} S_{1-} \\ S_{2-} \\ \vdots \\ S_{n-} \end{bmatrix} \le x(k) \le \begin{bmatrix} S_{1+} \\ S_{2+} \\ \vdots \\ S_{n+} \end{bmatrix}, \qquad k = 0, 1, \ldots, N,
\]
can thus be expressed as linear constraints involving only control values. Consequently, the problem to be solved is to find the control values that satisfy the constraints
\[
\begin{bmatrix} M_{1-} \\ M_{2-} \\ \vdots \\ M_{m-} \end{bmatrix} \le u^{(i+1)}(k) \le \begin{bmatrix} M_{1+} \\ M_{2+} \\ \vdots \\ M_{m+} \end{bmatrix}, \qquad k = 0, 1, \ldots, N-1,
\]
\[
\begin{bmatrix} S_{1-} \\ S_{2-} \\ \vdots \\ S_{n-} \end{bmatrix} \le D_0^{k} u^{(i+1)}(0) + \cdots + D_{N-1}^{k} u^{(i+1)}(N-1) + x_H(k) \le \begin{bmatrix} S_{1+} \\ S_{2+} \\ \vdots \\ S_{n+} \end{bmatrix}, \qquad k = 0, 1, \ldots, N,
\]
\[
\begin{bmatrix} T_{1j} \\ T_{2j} \\ \vdots \\ T_{nj} \end{bmatrix} = D_0^{j} u^{(i+1)}(0) + \cdots + D_{N-1}^{j} u^{(i+1)}(N-1) + x_H(j), \qquad j \text{ specified},
\]
and minimize the function of $Nm$ variables given by
\[
J_D = h(U^{(i+1)}) + \Delta t \sum_{k=0}^{N-1} g(U^{(i+1)}).
\]
A summary of the gradient projection algorithm for determining optimal trajectories can be found in
[137] and references therein.