Smoothing Solution for Discrete-Time Nonlinear Stochastic Optimal Control Problem with Model-Reality Differences

Sie Long Kek; Kok Lay Teo; Mohd Ismail Abd Aziz

doi:10.5772/64564

Abstract

In this chapter, the performance of the integrated optimal control and parameter estimation (IOCPE) algorithm is improved by using a modified fixed-interval smoothing scheme to solve the discrete-time nonlinear stochastic optimal control problem. In our approach, a linear model-based optimal control problem, with adjustable parameters added to the model, is solved iteratively. The aim is to obtain the optimal solution of the original optimal control problem. In the presence of random noise sequences in the process plant and the measurement channel, the state dynamics, which are estimated using Kalman filtering theory, are smoothed over a fixed interval. The feedback optimal control law is then designed from this smoothed state estimate sequence, which reduces the output residual. During the computation, the optimal solution of the modified model-based optimal control problem is updated at each iteration. When convergence is achieved, the iterative solution approaches the correct optimal solution of the original optimal control problem, in spite of model-reality differences. A convergence result for the resulting algorithm is also given. For illustration, optimal control of a continuous stirred-tank reactor is studied, and the results show the efficiency of the proposed approach.

Keywords

fixed-interval smoothing
Kalman filtering theory
model-reality differences
adjustable parameters
iterative solution

Author Information

Sie Long Kek*
- Center for Research in Computational Mathematics, Universiti Tun Hussein Onn Malaysia, Parit Raja, Malaysia
Kok Lay Teo
- Department of Mathematics and Statistics, Curtin University of Technology, Perth, WA, Australia
Mohd Ismail Abd Aziz
- Department of Mathematical Sciences, Universiti Teknologi Malaysia, UTM, Skudai, Malaysia

*Address all correspondence to: slkek@uthm.edu.my

1. Introduction

The optimal control approach provides solutions to dynamic real-world practical problems. In particular, linear problems disturbed by random noise sequences are well understood: the optimal state estimate is applied in designing the optimal feedback control law. In this situation, the optimal state estimator and the optimal controller are designed separately to optimize and control the dynamical system. This is called the separation principle [1–4]. By virtue of this principle, research on stochastic optimal control and its applications has grown widely; see, for example, linear systems [5, 6], the fleet composition problem [7], optimal parameter selection problems [8], Markov jump processes [9], power management [10], multiagent systems [11], the portfolio selection model [12], the 2-DOF vehicle model [13], the sensorimotor system [14], and the advertising model [15].

In general, the exact solution of a stochastic optimal control problem cannot be obtained, especially when the system dynamics are nonlinear. To obtain an optimal solution of the discrete-time nonlinear stochastic optimal control problem, the integrated optimal control and parameter estimation (IOCPE) algorithm has been proposed to solve this kind of problem iteratively [16–18]. In this algorithm, the linear quadratic Gaussian (LQG) model is applied as a model-based optimal control problem, where the state estimation is done using Kalman filtering theory. Adjusted parameters are added into this model so that system optimization and parameter estimation are integrated interactively. On this basis, the differences between the real plant and the model used are measured repeatedly in order to update the optimal solution of the model used. In addition, the output measured from the real plant is fed back into the model used for the state estimator design. When convergence is achieved, the iterative solution approaches the true optimal solution of the original optimal control problem despite model-reality differences. This optimal solution is the optimal filtering solution obtained using the IOCPE algorithm. The efficiency of the IOCPE algorithm has been demonstrated in Refs. [16–18].

However, the output trajectory of the model obtained from the IOCPE algorithm is less accurate in estimating the exact output measurement of the original optimal control problem. In this chapter, our aim is to improve the IOCPE algorithm using the fixed-interval smoothing approach, where the output residual is reduced to within an appropriate tolerance to generate a better output trajectory. In our model, the state dynamics, which are disturbed by Gaussian noise sequences, are estimated using Kalman filtering theory and then smoothed over a fixed interval. We modify the estimation procedure so that a smoothed state estimate is computed backward in time and is used in designing the feedback optimal control law. The output residual of this smoothed state estimate is smaller than the output residual obtained by using Kalman filtering theory alone; see [17]. The solution procedure discussed in this chapter is almost the same as that presented by Kek et al. [17], but the modified fixed-interval smoothing is expected to improve the accuracy of the optimal solution.

The chapter is organized as follows. In Section 2, a general discrete-time nonlinear stochastic optimal control problem and its simplified model-based optimal control problem are described. In Section 3, an expanded optimal control model is introduced, in which system optimization and parameter estimation are integrated. The feedback control law, which incorporates Kalman filtering theory and fixed-interval smoothing, is designed. Then, the iterative algorithm based on the principle of model-reality differences is derived so that the discrete-time nonlinear stochastic optimal control problem can be solved. In Section 4, a convergence result for the proposed algorithm is provided. In Section 5, an example of optimal control of a continuous stirred-tank reactor is presented. Finally, some concluding remarks are made.

2. Problem description

Consider a general class of the dynamical system given below:

$$x(k+1) = f(x(k), u(k), k) + G\omega(k) \tag{1a}$$

$$y(k) = h(x(k), k) + \eta(k) \tag{1b}$$

where $u(k) \in \Re^m$, $k = 0, 1, \ldots, N-1$, $x(k) \in \Re^n$, $k = 0, 1, \ldots, N$, and $y(k) \in \Re^p$, $k = 0, 1, \ldots, N$, are the control sequence, the state sequence, and the output sequence, respectively. The process noise sequence $\omega(k) \in \Re^q$, $k = 0, 1, \ldots, N-1$, and the measurement noise sequence $\eta(k) \in \Re^p$, $k = 0, 1, \ldots, N$, are stationary Gaussian white noise sequences with zero mean, and their covariance matrices are given by $Q_\omega \in \Re^{q \times q}$ and $R_\eta \in \Re^{p \times p}$, respectively. Both covariance matrices are positive definite. In addition, $f: \Re^n \times \Re^m \times \Re \to \Re^n$ represents the real plant and $h: \Re^n \times \Re \to \Re^p$ is the real output measurement; both are assumed to be continuously differentiable with respect to their respective arguments, whereas $G \in \Re^{n \times q}$ is a process coefficient matrix.

The initial state is

$$x(0) = x_0$$

where $x_0 \in \Re^n$ is a random vector with mean and covariance given, respectively, by

$$E[x(0)] = \bar{x}_0 \quad \text{and} \quad E[(x_0 - \bar{x}_0)(x_0 - \bar{x}_0)^{T}] = M_0.$$

Here, $M_0 \in \Re^{n \times n}$ is a positive definite matrix and $E[\cdot]$ is the expectation operator. It is assumed that the initial state, the process noise, and the measurement noise are statistically independent.

Our aim is to find an admissible control sequence $u(k) \in \Re^m$, $k = 0, 1, \ldots, N-1$, subject to the dynamical system given in Eq. (1), such that the scalar cost function

$$J_0(u) = E\left[\varphi(x(N), N) + \sum_{k=0}^{N-1} L(x(k), u(k), k)\right] \tag{2}$$

is minimized, where $\varphi: \Re^n \times \Re \to \Re$ is the terminal cost and $L: \Re^n \times \Re^m \times \Re \to \Re$ is the cost under summation. It is assumed that these functions are continuously differentiable with respect to their respective arguments.

This problem is regarded as the discrete-time nonlinear stochastic optimal control problem and is referred to as Problem (P).

Notice that, in general, the exact solution of Problem (P) cannot be obtained, and estimating the state of the real plant by nonlinear filtering theory is computationally demanding. For these reasons, a smoothing model-based optimal control problem, which is referred to as Problem (M), is proposed:

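$$\min_{u(k)} J_m(u) = \tfrac{1}{2}\hat{x}_s(N)^{T} S(N)\hat{x}_s(N) + \gamma(N) + \sum_{k=0}^{N-1}\left(\tfrac{1}{2}\left(\hat{x}_s(k)^{T} Q \hat{x}_s(k) + u(k)^{T} R u(k)\right) + \gamma(k)\right) \tag{3}$$

subject to

$$\hat{x}_s(k) = \hat{x}(k) + K_s(k)\left(\hat{x}_s(k+1) - \bar{x}(k+1)\right)$$

$$\hat{y}_s(k) = C\hat{x}_s(k)$$

with the following state estimation procedure

$$\bar{x}(k+1) = A\hat{x}(k) + Bu(k) + \alpha_1(k) \tag{4a}$$

$$\hat{x}(k) = \bar{x}(k) + K_f(k)\left(y(k) - \bar{y}(k)\right) \tag{4b}$$

$$\bar{y}(k) = C\bar{x}(k) + \alpha_2(k) \tag{4c}$$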

where $\hat{x}_s(k) \in \Re^n$, $k = 0, 1, \ldots, N$, and $\hat{y}_s(k) \in \Re^p$, $k = 0, 1, \ldots, N$, are, respectively, the smoothed state sequence and the smoothed output sequence. The matrices involved are as follows: $A$ is an $n \times n$ state transition matrix, $B$ is an $n \times m$ control coefficient matrix, $C$ is a $p \times n$ output coefficient matrix, $S(N)$ and $Q$ are $n \times n$ positive semidefinite matrices, and $R$ is an $m \times m$ positive definite matrix. The extra parameters $\alpha_1(k)$, $k = 0, 1, \ldots, N-1$, $\alpha_2(k)$, $k = 0, 1, \ldots, N$, and $\gamma(k)$, $k = 0, 1, \ldots, N$, are introduced as adjustable parameters.

The state estimation procedure given in Eqs. (4a), (4b), and (4c) follows from Kalman filtering theory, where $\hat{x}(k) \in \Re^n$, $k = 0, 1, \ldots, N-1$, and $\bar{x}(k) \in \Re^n$, $k = 0, 1, \ldots, N$, are, respectively, the filtered state sequence and the predicted state sequence, whereas $\bar{y}(k) \in \Re^p$, $k = 0, 1, \ldots, N$, is the expected output sequence. The filter and smoother gains, $K_f(k) \in \Re^{n \times p}$ and $K_s(k) \in \Re^{n \times n}$, are, respectively, given by

$$K_f(k) = M_x(k) C^{T} M_y(k)^{-1} \tag{5a}$$

$$K_s(k) = P(k) A^{T} M_x(k+1)^{-1} \tag{5b}$$

whereas the state error covariance matrices are

$$P(k) = M_x(k) - M_x(k) C^{T} M_y(k)^{-1} C M_x(k) \tag{6a}$$

$$M_x(k+1) = A P(k) A^{T} + G Q_\omega G^{T} \tag{6b}$$

$$P_s(k) = P(k) + K_s(k)\left(P_s(k+1) - M_x(k+1)\right) K_s(k)^{T} \tag{6c}$$

and the output error covariance matrix is

$$M_y(k) = C M_x(k) C^{T} + R_\eta \tag{6d}$$

with the boundary conditions $M_x(0) = M_0$ and $P_s(N) = M_x(N)$. The filtered state error covariance $P(k) \in \Re^{n \times n}$, the predicted state error covariance $M_x(k) \in \Re^{n \times n}$, the smoothed state error covariance $P_s(k) \in \Re^{n \times n}$, and the output error covariance $M_y(k) \in \Re^{p \times p}$ are positive definite matrices.
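The recursions in Eqs. (5) and (6) do not depend on the measurements or the control, so they can be computed in one forward pass followed by one backward pass. The following is a minimal NumPy sketch of that computation, not the authors' implementation. The function name `covariance_pass` and the scalar test data are assumptions made for illustration; the boundary condition $P_s(N) = M_x(N)$ is taken as stated in the text.

```python
import numpy as np

def covariance_pass(A, C, G, Q_w, R_eta, M0, N):
    """Forward filter pass (5a), (6a), (6b), (6d), then backward smoother pass (5b), (6c)."""
    Mx, My, Kf, P = [M0], [], [], []
    for k in range(N + 1):
        My.append(C @ Mx[k] @ C.T + R_eta)                 # Eq. (6d)
        Kf.append(Mx[k] @ C.T @ np.linalg.inv(My[k]))      # Eq. (5a)
        P.append(Mx[k] - Kf[k] @ C @ Mx[k])                # Eq. (6a)
        if k < N:
            Mx.append(A @ P[k] @ A.T + G @ Q_w @ G.T)      # Eq. (6b)
    Ps = [None] * (N + 1)
    Ks = [None] * N
    Ps[N] = Mx[N]                                          # boundary condition from the text
    for k in range(N - 1, -1, -1):
        Ks[k] = P[k] @ A.T @ np.linalg.inv(Mx[k + 1])      # Eq. (5b)
        Ps[k] = P[k] + Ks[k] @ (Ps[k + 1] - Mx[k + 1]) @ Ks[k].T  # Eq. (6c)
    return Kf, Ks, P, Mx, Ps, My

# Illustrative scalar data (assumed values, not taken from the chapter's example)
A = C = G = np.array([[1.0]])
Kf, Ks, P, Mx, Ps, My = covariance_pass(
    A, C, G, np.array([[1e-3]]), np.array([[1e-3]]), np.array([[1e-2]]), N=5
)
```

Because of the stated boundary condition, the backward pass gives $P_s(N-1) = P(N-1)$ in this sketch, and the smoothed covariance only drops below the filtered one from $k = N-2$ downward.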

Here, the cost function given in Eq. (3) is evaluated from the expectation of the quadratic forms [2], for both random and deterministic terms, using the matrix trace $\mathrm{tr}(\cdot)$:

$$E[x(N)^{T} S(N) x(N)] = \mathrm{tr}(S(N) M_x(N)) + \bar{x}(N)^{T} S(N) \bar{x}(N)$$

$$E[x(k)^{T} Q x(k)] = \mathrm{tr}(Q M_x(k)) + \bar{x}(k)^{T} Q \bar{x}(k)$$

$$E[u(k)^{T} R u(k)] = u(k)^{T} R u(k)$$

$$E[\gamma(k)] = \gamma(k), \quad E[\alpha_1(k)] = \alpha_1(k), \quad \text{and} \quad E[\alpha_2(k)] = \alpha_2(k).$$

Following this simplification, the trace terms, which depend on the state error covariance matrix, are ignored in the model used since they are constant values. In this way, the cost function of the linear model-based optimal control model can be evaluated.
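For completeness, the quadratic-form identity used above can be obtained in one line from the cyclic property of the trace. This is a standard derivation, written here under the assumption that $x(k)$ has mean $\bar{x}(k)$ and covariance $M_x(k)$:

$$E[x^{T} Q x] = E[\mathrm{tr}(Q x x^{T})] = \mathrm{tr}\left(Q\, E[x x^{T}]\right) = \mathrm{tr}\left(Q\left(M_x + \bar{x}\bar{x}^{T}\right)\right) = \mathrm{tr}(Q M_x) + \bar{x}^{T} Q \bar{x}.$$

The term $\mathrm{tr}(Q M_x)$ depends only on the covariance recursion and not on the control, which is why it can be treated as a constant when the model-based problem is minimized.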

Notice that the separation principle [1–4] is applied in solving Problem (M), where the optimal feedback control law and the optimal state estimate are designed separately, as discussed in [16–18]. Beyond this, the accuracy of the optimal state estimate is increased by smoothing the state estimate over the fixed interval [2, 4]. Based on this smoothed state estimate, the smoothing optimal control law is then designed. In addition, the output measured from the real plant is fed back into the model used to improve the state estimation procedure and to update the solution of the model used. However, solving Problem (M) without adding the adjusted parameters into the model used would not approximate the optimal solution of Problem (P). Hence, by adding the adjusted parameters into the model used and solving Problem (M) iteratively, the correct optimal solution of the original optimal control problem can be obtained, in spite of model-reality differences.

3. Modified smoothing with model-reality differences

Now, let us introduce an expanded optimal control problem with smoothing state estimate, which is referred to as Problem (E), given below:

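$$\begin{aligned}\min_{u(k)} J_e(u) ={}& \tfrac{1}{2}\hat{x}_s(N)^{T} S(N)\hat{x}_s(N) + \gamma(N) + \sum_{k=0}^{N-1}\Big(\tfrac{1}{2}\left(\hat{x}_s(k)^{T} Q \hat{x}_s(k) + u(k)^{T} R u(k)\right) + \gamma(k) \\ &+ \tfrac{1}{2} r_1 \|v(k) - u(k)\|^2 + \tfrac{1}{2} r_2 \|z(k) - \hat{x}_s(k)\|^2\Big)\end{aligned} \tag{7}$$

subject to

$$\hat{x}_s(k) = \hat{x}(k) + K_s(k)\left(\hat{x}_s(k+1) - \bar{x}(k+1)\right)$$

$$\hat{y}_s(k) = C\hat{x}_s(k)$$

$$\tfrac{1}{2} z(N)^{T} S(N) z(N) + \gamma(N) = \varphi(z(N), N)$$

$$\tfrac{1}{2}\left(z(k)^{T} Q z(k) + v(k)^{T} R v(k)\right) + \gamma(k) = L(z(k), v(k), k)$$

$$A z(k) + B v(k) + \alpha_1(k) = f(z(k), v(k), k)$$

$$C z(k) + \alpha_2(k) = h(z(k), k)$$

$$v(k) = u(k)$$

$$z(k) = \hat{x}_s(k)$$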

where $v(k) \in \Re^m$, $k = 0, 1, \ldots, N-1$, and $z(k) \in \Re^n$, $k = 0, 1, \ldots, N$, are introduced to separate the control and the smoothed state from the respective signals in the parameter estimation problem, and $\|\cdot\|$ denotes the usual Euclidean norm. The terms $\tfrac{1}{2} r_1 \|u(k) - v(k)\|^2$ and $\tfrac{1}{2} r_2 \|\hat{x}_s(k) - z(k)\|^2$ are introduced to improve convexity and to enhance the convergence of the iterative algorithm. The algorithm is designed in this way to ensure that the constraints $v(k) = u(k)$ and $z(k) = \hat{x}_s(k)$ are satisfied at the end of the iterations. More specifically, the state estimate $z(k)$ and the control $v(k)$ are used for the computation in the parameter estimation and the matching schemes, which increases the practical usability of the algorithm, while the smoothed state $\hat{x}_s(k)$ and the control $u(k)$ are reserved for optimizing the model-based optimal control problem. This separation leads the iterative solution toward the true optimal solution of the original optimal control problem.

Figure 1 shows the block diagram of the approach proposed. The methodology of the approach proposed is further discussed in the following sections.

From the block diagram in Figure 1, the definition of the principle of model-reality differences could be given.

Definition 3.1: The principle of model-reality differences is a unified framework that integrates system optimization and parameter estimation interactively to define an expanded optimal control problem. It aims to give the correct optimal solution of the original optimal control problem by solving the model-based optimal control problem iteratively.

Figure 1.
Block diagram of the approach proposed.

3.1. Optimality conditions

Define the Hamiltonian function for Problem (E) as follows:

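$$\begin{aligned}H_e(k) ={}& \tfrac{1}{2}\left(\hat{x}_s(k)^{T} Q \hat{x}_s(k) + u(k)^{T} R u(k)\right) + \gamma(k) + \tfrac{1}{2} r_1 \|v(k) - u(k)\|^2 + \tfrac{1}{2} r_2 \|z(k) - \hat{x}_s(k)\|^2 \\ &- \lambda(k)^{T} u(k) - \beta(k)^{T} \hat{x}_s(k) + q(k)^{T}\left(C\hat{x}_s(k) - \hat{y}_s(k)\right) \\ &+ p(k+1)^{T}\left(\hat{x}_s(k) - \hat{x}(k) - K_s(k)\left(\hat{x}_s(k+1) - \bar{x}(k+1)\right)\right)\end{aligned} \tag{8}$$

Then, the augmented cost function becomes

$$\begin{aligned}J'_e(k) ={}& \tfrac{1}{2}\hat{x}_s(N)^{T} S(N)\hat{x}_s(N) + \gamma(N) + \Gamma^{T}\left(\hat{x}_s(N) - z(N)\right) + \xi(N)\left(\varphi(z(N), N) - \tfrac{1}{2} z(N)^{T} S(N) z(N) - \gamma(N)\right) \\ &+ \sum_{k=0}^{N-1}\Big[H_e(k) + \lambda(k)^{T} v(k) + \beta(k)^{T} z(k) + \xi(k)\left(L(z(k), v(k), k) - \tfrac{1}{2}\left(z(k)^{T} Q z(k) + v(k)^{T} R v(k)\right) - \gamma(k)\right) \\ &+ \mu(k)^{T}\left(f(z(k), v(k), k) - A z(k) - B v(k) - \alpha_1(k)\right) + \pi(k)^{T}\left(h(z(k), k) - C z(k) - \alpha_2(k)\right)\Big]\end{aligned} \tag{9}$$

where $p(k)$, $q(k)$, $\mu(k)$, $\xi(k)$, $\pi(k)$, $\Gamma$, $\beta(k)$, and $\lambda(k)$ are the appropriate multipliers, whose values are to be determined later.

The following necessary conditions for optimality result from applying the calculus of variations [2, 4, 17] to the augmented cost function given in Eq. (9):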

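(a) Stationary condition:
$$R u(k) + B^{T} K_s(k)\, p(k+1) - \lambda(k) - r_1\left(v(k) - u(k)\right) = 0 \tag{10a}$$
(b) Smoothed costate equation:
$$p(k) = Q\hat{x}_s(k) + p(k+1) - \beta(k) - r_2\left(z(k) - \hat{x}_s(k)\right) \tag{10b}$$
(c) Smoothed state equation:
$$\hat{x}_s(k) = \hat{x}(k) + K_s(k)\left(\hat{x}_s(k+1) - \bar{x}(k+1)\right) \tag{10c}$$
with the boundary conditions $\hat{x}_s(N) = \bar{x}(N)$ and $p(N) = \Gamma$.
(d) Adjustable parameter equations:
$$\varphi(z(N), N) = \tfrac{1}{2} z(N)^{T} S(N) z(N) + \gamma(N) \tag{11a}$$

$$L(z(k), v(k), k) = \tfrac{1}{2}\left(z(k)^{T} Q z(k) + v(k)^{T} R v(k)\right) + \gamma(k) \tag{11b}$$

$$f(z(k), v(k), k) = A z(k) + B v(k) + \alpha_1(k) \tag{11c}$$

$$h(z(k), k) = C z(k) + \alpha_2(k) \tag{11d}$$
(e) Multiplier equations:
$$\Gamma - \nabla_{z(k)}\varphi + S(N) z(N) = 0 \tag{12a}$$

$$\lambda(k) + \left(\nabla_{v(k)} L - R v(k)\right) + \left(\frac{\partial f}{\partial v(k)} - B\right)^{T} \hat{p}(k+1) = 0 \tag{12b}$$

$$\beta(k) + \left(\nabla_{z(k)} L - Q z(k)\right) + \left(\frac{\partial f}{\partial z(k)} - A\right)^{T} \hat{p}(k+1) = 0 \tag{12c}$$
with $\xi(k) = 1$, $\mu(k) = \hat{p}(k+1)$, and $\pi(k) = q(k) = 0$.
(f) Separable variables:
$$v(k) = u(k), \quad z(k) = \hat{x}_s(k), \quad \hat{p}(k) = p(k) \tag{13}$$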

In view of these necessary optimality conditions, the conditions (10a), (10b), and (10c) define the modified model-based optimal control problem, the conditions (11a), (11b), (11c), and (11d) define the parameter estimation problem and the conditions (12a), (12b), and (12c) are used to compute the multipliers. They are further discussed as follows.

3.2. Modified model-based optimal control problem

The modified model-based optimal control problem, which is referred to as Problem (MM), is given below:

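$$\begin{aligned}\min_{u(k)} J_{mm}(u) ={}& \tfrac{1}{2}\hat{x}_s(N)^{T} S(N)\hat{x}_s(N) + \gamma(N) + \Gamma^{T}\hat{x}_s(N) + \sum_{k=0}^{N-1}\Big(\tfrac{1}{2}\left(\hat{x}_s(k)^{T} Q \hat{x}_s(k) + u(k)^{T} R u(k)\right) + \gamma(k) \\ &+ \tfrac{1}{2} r_1 \|v(k) - u(k)\|^2 + \tfrac{1}{2} r_2 \|z(k) - \hat{x}_s(k)\|^2 - \lambda(k)^{T} u(k) - \beta(k)^{T} \hat{x}_s(k)\Big)\end{aligned} \tag{14}$$

subject to

$$\hat{x}_s(k) = \hat{x}(k) + K_s(k)\left(\hat{x}_s(k+1) - \bar{x}(k+1)\right)$$

$$\hat{y}_s(k) = C\hat{x}_s(k)$$

Based on Problem (E) and Problem (MM), the smoothed optimal control law used to solve Problem (MM) is stated in the following theorem.

Theorem 3.1: Suppose the expanded optimal control law for Problem (E) exists. Then, this control law is the smoothed feedback control law for Problem (MM) given by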

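$$u(k) = -K(k)\hat{x}_s(k) + u_{ff}(k) \tag{15}$$

where

$$u_{ff}(k) = -\left(R_a + B^{T} K_s(k) S(k+1) B\right)^{-1}\Big(B^{T} K_s(k)\, s(k+1) - \lambda_a(k) + B^{T} K_s(k) S(k+1)\big(\left(A - K_s(k)^{-1}\right)\hat{x}(k) + \alpha_1(k)\big)\Big) \tag{16a}$$

$$K(k) = \left(R_a + B^{T} K_s(k) S(k+1) B\right)^{-1} B^{T} K_s(k) S(k+1) K_s(k)^{-1} \tag{16b}$$

$$S(k) = Q_a + S(k+1)\left(K_s(k)^{-1} - B K(k)\right) \tag{16c}$$

$$s(k) = S(k+1)\big(\left(A - K_s(k)^{-1}\right)\hat{x}(k) + B u_{ff}(k) + \alpha_1(k)\big) + s(k+1) - \beta_a(k) \tag{16d}$$

with the boundary conditions $S(N)$ given and $s(N) = 0$, and

$$R_a = R + r_1 I_m; \quad Q_a = Q + r_2 I_n;$$

$$\lambda_a(k) = \lambda(k) + r_1 v(k); \quad \beta_a(k) = \beta(k) + r_2 z(k).$$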

Proof: From the necessary optimality condition (10a), we have

$$R_a u(k) = -B^{T} K_s(k)\, p(k+1) + \lambda_a(k) \tag{17}$$

Applying the sweep method [2, 4],

$$p(k) = S(k)\hat{x}_s(k) + s(k) \tag{18}$$

we substitute Eq. (18), written for $k+1$, into Eq. (17), which yields

$$R_a u(k) = -B^{T} K_s(k) S(k+1)\hat{x}_s(k+1) - B^{T} K_s(k)\, s(k+1) + \lambda_a(k). \tag{19}$$

Rewrite the smoothed state equation from Eq. (10c) as

$$\hat{x}_s(k+1) = \bar{x}(k+1) + K_s(k)^{-1}\left(\hat{x}_s(k) - \hat{x}(k)\right). \tag{20}$$

Then, substitute Eq. (20), together with the prediction $\bar{x}(k+1) = A\hat{x}(k) + Bu(k) + \alpha_1(k)$ from Eq. (4a), into Eq. (19). After collecting the terms in $u(k)$ and $\hat{x}_s(k)$, the smoothed control law (15) is obtained, where Eqs. (16a) and (16b) are satisfied.

From the smoothed costate equation (10b), we substitute Eq. (18), written for $k+1$, to give

$$p(k) = Q_a\hat{x}_s(k) + S(k+1)\hat{x}_s(k+1) + s(k+1) - \beta_a(k) \tag{21}$$

Using Eq. (20) in Eq. (21), we obtain

$$p(k) = Q_a\hat{x}_s(k) + S(k+1)\left(\bar{x}(k+1) + K_s(k)^{-1}\left(\hat{x}_s(k) - \hat{x}(k)\right)\right) + s(k+1) - \beta_a(k) \tag{22}$$

After substituting Eqs. (4a) and (15) into Eq. (22) and comparing the result with Eq. (18), it is found that Eqs. (16c) and (16d) are satisfied. This completes the proof.
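The gain recursion in Eqs. (16b) and (16c) can be sketched in NumPy as a backward sweep from the given $S(N)$. This is only an illustrative sketch under the assumption that each smoother gain $K_s(k)$ is invertible, as the theorem requires; the function name `control_gains` and the scalar example data are hypothetical.

```python
import numpy as np

def control_gains(B, Ks, S_N, Q, R, r1, r2):
    """Backward sweep for K(k) and S(k) following Eqs. (16b) and (16c)."""
    N = len(Ks)
    n, m = B.shape
    Ra = R + r1 * np.eye(m)                    # R_a = R + r1 * I_m
    Qa = Q + r2 * np.eye(n)                    # Q_a = Q + r2 * I_n
    S = [None] * (N + 1)
    K = [None] * N
    S[N] = S_N                                 # boundary condition: S(N) given
    for k in range(N - 1, -1, -1):
        Ks_inv = np.linalg.inv(Ks[k])          # assumes K_s(k) is invertible
        W = B.T @ Ks[k] @ S[k + 1]
        K[k] = np.linalg.solve(Ra + W @ B, W @ Ks_inv)   # Eq. (16b)
        S[k] = Qa + S[k + 1] @ (Ks_inv - B @ K[k])       # Eq. (16c)
    return K, S

# Scalar illustration with assumed values: one step, K_s(0) = 0.5, S(1) = 2
K, S = control_gains(
    B=np.array([[1.0]]), Ks=[np.array([[0.5]])], S_N=np.array([[2.0]]),
    Q=np.array([[1.0]]), R=np.array([[1.0]]), r1=0.0, r2=0.0,
)
```

For these values the hand calculation gives $K(0) = (1 + 1)^{-1} \cdot 1 \cdot 2 = 1$ and $S(0) = 1 + 2(2 - 1) = 3$, which matches what the sketch returns.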

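From Eqs. (4a), (10c), and (15), the smoothed state equation becomes

$$\hat{x}_s(k) = \left(I_n - K_s(k) B K(k)\right)^{-1}\Big(\left(I_n - K_s(k) A\right)\hat{x}(k) + K_s(k)\left(\hat{x}_s(k+1) - B u_{ff}(k) - \alpha_1(k)\right)\Big) \tag{23}$$

and the smoothed output is measured from

$$\hat{y}_s(k) = C\hat{x}_s(k) \tag{24}$$

with the boundary condition $\hat{x}_s(N) = \bar{x}(N)$.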

3.3. Parameter estimation

After solving Problem (MM), the separable variables defined in Eq. (13) are used for the further computations. In particular, in the parameter estimation problem, the differences between the real plant and the model used are taken into account by establishing the matching schemes. The adjusted parameters, which result from the parameter estimation problem defined by Eq. (11), are calculated from

$$\alpha_1(k) = f(z(k), v(k), k) - A z(k) - B v(k) \tag{25a}$$

$$\alpha_2(k) = h(z(k), k) - C z(k) \tag{25b}$$

$$\gamma(N) = \varphi(z(N), N) - \tfrac{1}{2} z(N)^{T} S(N) z(N) \tag{25c}$$

$$\gamma(k) = L(z(k), v(k), k) - \tfrac{1}{2}\left(z(k)^{T} Q z(k) + v(k)^{T} R v(k)\right) \tag{25d}$$
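The matching scheme above amounts to evaluating the plant and the model at the same point $(z(k), v(k))$ and storing the gap. A minimal sketch for a single time step follows; the plant `f`, output map `h`, cost `L`, and all numerical values are hypothetical stand-ins chosen for illustration, and the terminal parameter of Eq. (25c) is omitted because it follows the same pattern.

```python
import numpy as np

def adjust_parameters(f, h, L, z, v, k, A, B, C, Q, R):
    """Matching scheme of Eqs. (25a), (25b), and (25d) at one time step k."""
    alpha1 = f(z, v, k) - A @ z - B @ v                   # Eq. (25a)
    alpha2 = h(z, k) - C @ z                              # Eq. (25b)
    gamma = L(z, v, k) - 0.5 * (z @ Q @ z + v @ R @ v)    # Eq. (25d)
    return alpha1, alpha2, gamma

# Hypothetical nonlinear plant, output map, and cost (illustration only)
f = lambda z, v, k: np.array([z[0] + 0.1 * np.sin(z[1]) + v[0], 0.9 * z[1]])
h = lambda z, k: np.array([z[0] ** 3])
L = lambda z, v, k: 0.5 * (z @ z + v @ v) + 0.25 * z[0] ** 4

A, B, C = np.eye(2), np.array([[1.0], [0.0]]), np.array([[1.0, 0.0]])
Q, R = np.eye(2), np.eye(1)
z, v = np.array([0.5, -0.2]), np.array([0.3])
alpha1, alpha2, gamma = adjust_parameters(f, h, L, z, v, 0, A, B, C, Q, R)
```

With these parameters the linear model reproduces the plant exactly at $(z, v)$, that is, `A @ z + B @ v + alpha1` equals `f(z, v, 0)`, which is the sense in which the model-reality difference is absorbed at each iteration.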

3.4. Computation of multipliers

The multipliers, which are related to the Jacobian matrix of the functions f and L with respect to v(k) and z(k), are computed from

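$$\Gamma = \nabla_{z(k)}\varphi - S(N) z(N) \tag{26a}$$

$$\lambda(k) = -\left(\nabla_{v(k)} L - R v(k)\right) - \left(\frac{\partial f}{\partial v(k)} - B\right)^{T} \hat{p}(k+1) \tag{26b}$$

$$\beta(k) = -\left(\nabla_{z(k)} L - Q z(k)\right) - \left(\frac{\partial f}{\partial z(k)} - A\right)^{T} \hat{p}(k+1) \tag{26c}$$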
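Equations (26b) and (26c) measure how far the gradients of the real cost and plant are from those of the quadratic cost and linear model. The sketch below evaluates them for one time step, using central finite differences as an assumed substitute for analytic derivatives; the helper names `jacobian` and `multipliers` are hypothetical, and the terminal multiplier of Eq. (26a) is left out for brevity.

```python
import numpy as np

def jacobian(fun, x, eps=1e-6):
    """Central-difference Jacobian of fun at x (rows: outputs, columns: inputs)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        cols.append((np.atleast_1d(fun(x + dx)) - np.atleast_1d(fun(x - dx))) / (2 * eps))
    return np.stack(cols, axis=1)

def multipliers(f, L, z, v, k, p_next, A, B, Q, R):
    """Multipliers lambda(k) and beta(k) following Eqs. (26b) and (26c)."""
    fz = jacobian(lambda s: f(s, v, k), z)        # df/dz, shape (n, n)
    fv = jacobian(lambda c: f(z, c, k), v)        # df/dv, shape (n, m)
    Lz = jacobian(lambda s: L(s, v, k), z)[0]     # gradient of L w.r.t. z
    Lv = jacobian(lambda c: L(z, c, k), v)[0]     # gradient of L w.r.t. v
    lam = -(Lv - R @ v) - (fv - B).T @ p_next     # Eq. (26b)
    beta = -(Lz - Q @ z) - (fz - A).T @ p_next    # Eq. (26c)
    return lam, beta

# Sanity check on a plant that already is linear-quadratic (assumed data)
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
f_lin = lambda z, v, k: A @ z + B @ v
L_quad = lambda z, v, k: 0.5 * (z @ Q @ z + v @ R @ v)
lam, beta = multipliers(f_lin, L_quad, np.array([0.5, -0.2]), np.array([0.3]),
                        0, np.array([1.0, 2.0]), A, B, Q, R)
```

When the real problem coincides with the model, as in this check, both multipliers vanish up to finite-difference rounding, so Problem (MM) reduces to Problem (M).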

3.5. Iterative algorithm

In the previous sections, the equations were derived and the resulting algorithm was formulated. The iterative algorithm is summarized as follows:

Data $Q, R, S(N), A, B, C, G, Q_\omega, R_\eta, M_0, \bar{x}_0, N, r_1, r_2, k_v, k_z, k_p, f, L, h, \varphi$. Note that $A$ and $B$ may be chosen through the linearization of $f$, and $C$ is obtained from the linearization of $h$.
Step 0: Compute a nominal solution. Assume $\alpha_1(k) = 0$, $k = 0, 1, \ldots, N-1$, $\alpha_2(k) = 0$, $k = 0, 1, \ldots, N$, and $r_1 = r_2 = 0$. Calculate $K_f(k)$ and $K_s(k)$ from Eqs. (5a) and (5b), and $P(k)$, $M_x(k)$, $P_s(k)$, and $M_y(k)$ from Eqs. (6a), (6b), (6c), and (6d) for the state estimation, and solve Problem (M) defined by Eq. (3) to obtain $u(k)^0$, $k = 0, 1, \ldots, N-1$, and $\hat{x}_s(k)^0$, $\hat{y}_s(k)^0$, $p(k)^0$, $k = 0, 1, \ldots, N$. Then, with $\alpha_1(k) = 0$, $k = 0, 1, \ldots, N-1$, $\alpha_2(k) = 0$, $k = 0, 1, \ldots, N$, and $r_1, r_2$ from the data, calculate $K(k)$ and $S(k)$ from Eqs. (16b) and (16c), respectively. Set $i = 0$, $z(k)^0 = \hat{x}_s(k)^0$, $v(k)^0 = u(k)^0$, and $\hat{p}(k)^0 = p(k)^0$.
Step 1: Calculate the adjustable parameters $\alpha_1(k)^i$, $k = 0, 1, \ldots, N-1$, $\alpha_2(k)^i$, $k = 0, 1, \ldots, N$, and $\gamma(k)^i$, $k = 0, 1, \ldots, N$, from Eq. (25). This is called the parameter estimation step.
Step 2: Compute the modifiers $\Gamma^i$, $\lambda(k)^i$, and $\beta(k)^i$, $k = 0, 1, \ldots, N-1$, from Eq. (26). This requires the partial derivatives of $f$, $h$, and $L$ with respect to $v(k)^i$ and $z(k)^i$.
Step 3: With the determined $\alpha_1(k)^i$, $\alpha_2(k)^i$, $\gamma(k)^i$, $\Gamma^i$, $\lambda(k)^i$, $\beta(k)^i$, $v(k)^i$, and $z(k)^i$, solve Problem (MM) defined by Eq. (14) using the result in Theorem 3.1. This is called the system optimization step.
1. Obtain $s(k)^i$, $k = 0, 1, \ldots, N$, by solving Eq. (16d) backward, and obtain $u_{ff}(k)^i$, $k = 0, 1, \ldots, N-1$, by solving Eq. (16a), either backward or forward.
2. Calculate the new control $u(k)^i$, $k = 0, 1, \ldots, N-1$, using Eq. (15).
3. Calculate the new state $\hat{x}_s(k)^i$, $k = 0, 1, \ldots, N$, using Eq. (23).
4. Calculate the new costate $p(k)^i$, $k = 0, 1, \ldots, N$, using Eq. (18).
5. Calculate the new output $\hat{y}_s(k)^i$, $k = 0, 1, \ldots, N$, using Eq. (24).
Step 4: Update the optimal smoothing solution of Problem (P) and test the convergence of the algorithm. To regulate convergence, a simple relaxation method is used:
$$z(k)^{i+1} = z(k)^i + k_z\left(\hat{x}_s(k)^i - z(k)^i\right) \tag{27a}$$

$$v(k)^{i+1} = v(k)^i + k_v\left(u(k)^i - v(k)^i\right) \tag{27b}$$

$$\hat{p}(k)^{i+1} = \hat{p}(k)^i + k_p\left(p(k)^i - \hat{p}(k)^i\right) \tag{27c}$$
where $k_v, k_z, k_p \in (0, 1]$ are scalar gains. If $z(k)^{i+1} = z(k)^i$, $k = 0, 1, \ldots, N$, and $v(k)^{i+1} = v(k)^i$, $k = 0, 1, \ldots, N-1$, within a given tolerance, stop; otherwise set $i = i + 1$ and repeat from Step 1.
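The update and stopping test of Step 4 can be sketched as two small helpers. This is an illustration rather than the authors' code: the names `relaxed_update` and `averaged_change` are hypothetical, sequences are assumed to be stored as arrays with one row per time step, and the averaged measure follows Eq. (28) in the Remarks as printed (sum of Euclidean norms divided by the number of terms minus one, then a square root).

```python
import numpy as np

def relaxed_update(old, new, gain):
    """Relaxation step of Eq. (27): move from old toward new by a factor gain in (0, 1]."""
    return old + gain * (new - old)

def averaged_change(seq_next, seq_prev):
    """Averaged change between two iterates, following Eq. (28) as printed."""
    diffs = np.linalg.norm(seq_next - seq_prev, axis=1)   # one norm per time step
    return np.sqrt(diffs.sum() / (len(diffs) - 1))        # N-1 for v, N for z

# Assumed toy data: three time steps of a scalar control
v_prev = np.zeros((3, 1))
u_new = np.ones((3, 1))
v_next = relaxed_update(v_prev, u_new, gain=0.5)
change = averaged_change(v_next, v_prev)
converged = change < 1e-6      # the tolerance is an assumed value
```

With `gain=1.0` the update simply replaces the old iterate with the new one; smaller gains damp the step, which is the convergence-regulating role described in Step 4.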

Remarks:

The off-line computation mentioned in Step 0 is done for the state estimator design, where $K_f(k)$, $K_s(k)$, $k = 0, 1, \ldots, N-1$, $M_x(k)$, $M_y(k)$, $k = 0, 1, \ldots, N$, and $P(k)$, $P_s(k)$, $k = 0, 1, \ldots, N-1$, are computed, and for the control law design, where $K(k)$, $k = 0, 1, \ldots, N-1$, and $S(k)$, $k = 0, 1, \ldots, N$, are calculated. These parameters are used for solving Problem (M) in Step 0 and for solving Problem (MM) in Step 3, respectively.
The variables $\gamma(k)^i$, $\alpha_1(k)^i$, $\alpha_2(k)^i$, $\Gamma^i$, $\lambda(k)^i$, $\beta(k)^i$, and $s(k)^i$ are initially zero in Step 0. Their computed values change from iteration to iteration: $\gamma(k)^i$, $\alpha_1(k)^i$, and $\alpha_2(k)^i$ in Step 1, $\Gamma^i$, $\lambda(k)^i$, and $\beta(k)^i$ in Step 2, and $s(k)^i$ in Step 3.
The driving input $u_{ff}(k)$ in Eq. (16a) corrects the differences between the real plant and the model used, and it also drives the controller given in Eq. (15).
The state estimation without the control is done forward using Kalman filtering, followed by fixed-interval smoothing backward, in order to design the feedback control law.
Problem (P) is not required to have a quadratic cost function or to be a linear problem.
The equations $z(k)^{i+1} = z(k)^i$ and $v(k)^{i+1} = v(k)^i$ are required to hold for the converged state estimate sequence and the converged optimal control sequence. From this point of view, the following averaged 2-norms are computed and then compared with a given tolerance to verify the convergence of $v(k)$ and $z(k)$:
$$\|v^{i+1} - v^i\|_2 = \left(\frac{1}{N-1}\sum_{k=0}^{N-1}\|v(k)^{i+1} - v(k)^i\|\right)^{1/2} \tag{28a}$$

$$\|z^{i+1} - z^i\|_2 = \left(\frac{1}{N}\sum_{k=0}^{N}\|z(k)^{i+1} - z(k)^i\|\right)^{1/2} \tag{28b}$$
The relaxation scalars $k_v$, $k_z$, and $k_p$ are the step sizes that regulate the convergence mechanism. These scalars can be chosen as any value in the range $(0, 1]$, but an arbitrary choice may not give the smallest number of iterations, and the best choice of $k_v, k_z, k_p \in (0, 1]$ is problem dependent. In practice, the algorithm (from Step 1 to Step 4) is run several times. For the first run, these scalars are set to $k_v = k_z = k_p = 1$; the algorithm is then run again with different values chosen from 0.1 to 0.9, and the value that gives the smallest number of iterations is selected.
The parameters $r_1$ and $r_2$ are applied to enhance convexity so that the convergence of the algorithm can be improved.
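The gain-selection procedure described in the remarks (run once with gain 1, then rerun with smaller gains and compare iteration counts) can be illustrated on a toy fixed-point problem. In the sketch below the scalar map `math.cos` is only an assumed stand-in for one pass of Steps 1 to 3; it is not related to the chapter's example, and the function name `iterate` and the tolerance are hypothetical.

```python
import math

def iterate(T, z0, gain, tol=1e-10, max_iter=500):
    """Relaxed fixed-point iteration z <- z + gain * (T(z) - z), as in Eq. (27)."""
    z = z0
    for i in range(1, max_iter + 1):
        z_next = z + gain * (T(z) - z)
        if abs(z_next - z) < tol:        # stop when successive iterates agree
            return z_next, i
        z = z_next
    return z, max_iter

# Sweep a few gains and record (solution, iteration count) for each
results = {g: iterate(math.cos, 1.0, g) for g in (1.0, 0.8, 0.6, 0.4, 0.2)}
best_gain = min(results, key=lambda g: results[g][1])
```

In this toy case every gain reaches the same fixed point, but the number of iterations differs noticeably between gains, which is the behaviour the remark refers to when it says the best gain is problem dependent.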

4. Convergence analysis

In this section, the convergence of the algorithm is discussed. The following assumptions are needed:

The derivatives of $f$, $L$, and $h$ exist.
The solution $(u^*, x^*, y^*)$ is the optimal solution to Problem (P), that is, the optimal smoothing solution.

The convergence result is presented in Theorem 4.1, while the accuracy of the smoothed state in terms of the state error covariance is shown in Corollary 4.1.

Theorem 4.1: The converged solution of Problem (M) is the correct optimal smoothing solution of Problem (P).

Proof: Consider the real plant and the output measurement of Problem (P) with the exact optimal smoothing solution $(u^*, x^*, y^*)$ as given below:

$$x^*(k+1) = f(x^*(k), u^*(k), k) \quad \text{and} \quad y^*(k) = h(x^*(k), k) \tag{29}$$

In Problem (M), the model used consists of

$$\hat{x}^c(k) = \bar{x}^c(k) + K_f(k)\left(y(k) - \bar{y}^c(k)\right) \tag{30a}$$

$$\bar{x}^c(k+1) = A\hat{x}^c(k) + B u^c(k) + \alpha_1(k) \tag{30b}$$

$$\bar{y}^c(k) = C\bar{x}^c(k) + \alpha_2(k) \tag{30c}$$

$$\hat{x}_s^c(k) = \hat{x}^c(k) + K_s(k)\left(\hat{x}_s^c(k+1) - \bar{x}^c(k+1)\right) \tag{30d}$$

$$\hat{y}_s^c(k) = C\hat{x}_s^c(k) \tag{30e}$$

where $u^c(k)$, $\hat{x}_s^c(k)$, $\hat{x}^c(k)$, $\bar{x}^c(k)$, $\hat{y}_s^c(k)$, and $\bar{y}^c(k)$ are, respectively, the converged sequences for the control law, smoothed state estimate, filtered state estimate, predicted state estimate, smoothed output, and expected output. Here, $y(k)$ is the output measured from the real plant.

By applying the adjusted parameters $\alpha_1(k)$ and $\alpha_2(k)$, which are given by

$$\alpha_1(k) = f(z(k), v(k), k) - A z(k) - B v(k) \quad \text{and}$$

$$\alpha_2(k) = h(z(k), k) - C z(k),$$

to the model used given by Eqs. (30b) and (30c), the differences between the real plant and the model used can be measured at each iteration. Moreover, at the end of the iterations, Eqs. (29) and (30a)–(30e) yield

$$\hat{x}_s^c(k+1) = f(z(k), v(k), k) \quad \text{and} \quad \hat{y}_s^c(k) = h(z(k), k)$$

where $v(k) = u^c(k)$ and $z(k) = \hat{x}_s^c(k) = \hat{x}^c(k)$ are satisfied. Hence, this implies that

$$u^c(k) = u^*(k), \quad \hat{x}_s^c(k) = x^*(k), \quad \hat{y}_s^c(k) = y^*(k)$$

This completes the proof.

Corollary 4.1: The smoothed state error covariance is the smallest among the values of state error covariance.

Proof: From Eq. (6), it is clear that the filtered state error covariance $P(k)$ is less than the predicted state error covariance $M_x(k)$, that is, $P(k) < M_x(k)$. Now, to prove $P_s(k) < P(k)$, we shall show that $P_s(k+1) < M_x(k+1)$. Considering the boundary condition $P_s(N) = M_x(N)$ and taking $k = N-1$, we have

$$P_s(N-1) = P(N-1) < M_x(N-1).$$

For $k = N-2$, it follows that

$$P_s(N-2) < P(N-2) < M_x(N-2).$$

From this, it can be deduced that

$$P_s(k+1) - M_x(k+1) < 0 \quad \text{for } k = 0, 1, \ldots, N-2.$$

Thus, we conclude that

$$P_s(k) < P(k) < M_x(k), \quad k = 0, 1, \ldots, N-2,$$

which shows the accuracy of the smoothed state estimate. This completes the proof.
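The induction behind this argument can be written out explicitly from Eq. (6c). The following is a sketch in which $\preceq$ denotes the positive semidefinite ordering; the inequalities are strict only under the additional assumption that $K_s(k)$ is nonsingular:

$$P_s(k) - P(k) = K_s(k)\left(P_s(k+1) - M_x(k+1)\right)K_s(k)^{T}.$$

For $k = N-1$, the boundary condition $P_s(N) = M_x(N)$ makes the right-hand side zero, so $P_s(N-1) = P(N-1)$. For $k \le N-2$, if $P_s(k+1) \preceq P(k+1)$ and $P(k+1) \preceq M_x(k+1)$ from Eq. (6a), then $P_s(k+1) - M_x(k+1) \preceq 0$, and therefore

$$P_s(k) - P(k) \preceq 0, \quad \text{that is,} \quad P_s(k) \preceq P(k) \preceq M_x(k).$$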

5. Illustrative example

Consider a continuous stirred-tank reactor problem [19], which consists of the state difference equations

$$x_1(k+1) = x_1(k) - 0.02\left(x_1(k) + 0.25\right) + 0.01\left(x_2(k) + 0.5\right)\exp\left[\frac{25 x_1(k)}{x_1(k) + 2}\right] - 0.01\left(x_1(k) + 0.25\right)u(k)$$

$$x_2(k+1) = 0.99 x_2(k) - 0.005 - 0.01\left(x_2(k) + 0.5\right)\exp\left[\frac{25 x_1(k)}{x_1(k) + 2}\right] + \omega_2(k)$$

for $k = 0, \ldots, 77$, and the output measurement $y(k) = x_1(k) + \eta(k)$. The initial state $x(0) = x_0$ is a random vector with mean and covariance given, respectively, by $\bar{x}_1(0) = 0.05$, $\bar{x}_2(0) = 0$, and $M_0 = 10^{-2} I_2$.

Here, $\omega(k) = [\omega_1(k)\ \omega_2(k)]^{T}$ and $\eta(k)$ are Gaussian white noise sequences with their respective covariances given by $Q_\omega = 10^{-3} I_2$ and $R_\eta = 10^{-3}$. The expected cost function

$$J_0(u) = 0.5\sum_{k=0}^{N-1} E\left[(x_1(k))^2 + (x_2(k))^2 + 0.1(u(k))^2\right]$$

is to be minimized over the state difference equations and the output measurement.

This problem is referred to as Problem (P).

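To obtain the optimal smoothing solution of Problem (P), we simplify the plant dynamics of Problem (P) and refer to the simplified problem as Problem (M), given by

$$\min_{u(k)} J_m(u) = \frac{1}{2}\sum_{k=0}^{N-1}\left[(\hat{x}_s(k))^2 + 0.1(u(k))^2 + 2\gamma(k)\right]$$

subject to

$$\hat{x}_s(k) = \hat{x}(k) + K_s(k)\left(\hat{x}_s(k+1) - \bar{x}(k+1)\right)$$

$$\hat{y}_s(k) = C\hat{x}_s(k)$$

with

$$\hat{x}(k) = \bar{x}(k) + K_f(k)\left(y(k) - \bar{y}(k)\right)$$

$$\begin{bmatrix}\bar{x}_1(k+1)\\ \bar{x}_2(k+1)\end{bmatrix} = \begin{bmatrix}1.0895 & 0.0184\\ -0.1095 & 0.9716\end{bmatrix}\begin{bmatrix}\hat{x}_1(k)\\ \hat{x}_2(k)\end{bmatrix} + \begin{bmatrix}-0.003\\ 0.000\end{bmatrix}u(k) + \begin{bmatrix}\alpha_{11}(k)\\ \alpha_{12}(k)\end{bmatrix}$$

$$\bar{y}(k) = \bar{x}_1(k) + \alpha_2(k)$$

with the initial condition $\bar{x}(0) = \bar{x}_0$ and the boundary value $\hat{x}_s(N) = \bar{x}(N)$. Here, $\gamma(k)$, $\alpha_2(k)$, and $\alpha_1(k) = [\alpha_{11}(k)\ \alpha_{12}(k)]^{T}$ are the adjusted parameters.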
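The algorithm data note that $A$ and $B$ may be chosen through the linearization of $f$. The sketch below checks this numerically for the reactor model by central finite differences. The linearization point, the mean initial state $(0.05, 0)$ with zero control, is an assumption made here because the text does not state it; the function names `cstr` and `linearize` are hypothetical, and only the deterministic part of the plant is used.

```python
import numpy as np

def cstr(x, u):
    """Deterministic part of the reactor difference equations (noise terms omitted)."""
    x1, x2 = x
    e = np.exp(25.0 * x1 / (x1 + 2.0))
    return np.array([
        x1 - 0.02 * (x1 + 0.25) + 0.01 * (x2 + 0.5) * e - 0.01 * (x1 + 0.25) * u,
        0.99 * x2 - 0.005 - 0.01 * (x2 + 0.5) * e,
    ])

def linearize(x0, u0, eps=1e-6):
    """Central-difference Jacobians of cstr with respect to the state and the control."""
    A = np.zeros((2, 2))
    for j in range(2):
        d = np.zeros(2)
        d[j] = eps
        A[:, j] = (cstr(x0 + d, u0) - cstr(x0 - d, u0)) / (2 * eps)
    B = (cstr(x0, u0 + eps) - cstr(x0, u0 - eps)) / (2 * eps)
    return A, B

# Assumed linearization point: mean initial state and zero control
A_num, B_num = linearize(np.array([0.05, 0.0]), 0.0)
```

Under this assumption the Jacobians agree with the model matrices of Problem (M) to the four decimal places printed in the text, which suggests that the model was obtained by linearizing the plant about the mean initial state.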

Model	Iteration number	Elapsed time	Initial cost	Final cost	Output residual
Filtering	6	0.782772	3.7910	0.021271	0.034731
Smoothing	8	1.026919	3.5095	0.000734	0.018294

Table 1.
Iteration result.

The iteration results for both the filtering and smoothing models are shown in Table 1. The final cost of the smoothing model is lower than that of the filtering model. When the trace terms are included in the cost function, the total final cost is 0.019188 unit for the smoothing model and 0.039725 unit for the filtering model; the value of the trace terms is 0.0185 unit. The output residual of the smoothing model, 0.018294, is about 52.7% of the filtering output residual, 0.034731, which corresponds to a reduction of roughly 47% with the approach proposed in this chapter.

Figure 2.
Filtering trajectory for final control.

Figure 3.
Filtering trajectory for final state.

To assess the accuracy of the resulting algorithm, the norms of the differences between the real plant and the model used at the end of the iterations are calculated: 0.0128 unit for the filtering model and 0.0099 unit for the smoothing model. These values show that the smoothing model approximates the correct optimal solution of the original optimal control problem more closely than the filtering model does, which supports the accuracy of the smoothing model.

Figure 4.
Filtering trajectory for final output and real output.

Figure 5.
Smoothing trajectory for final control.

Figure 6.
Smoothing trajectory for final state.

Figure 7.
Smoothing trajectory for final output and real output.

The trajectories of the final control, final state, and final output for the filtering and smoothing models are shown in Figures 2–7. With its smaller output residual, the output associated with the smoothed state estimate is better suited to approximating the real output trajectory.

6. Concluding remarks

A fixed-interval smoothing scheme was modified in this chapter for solving the discrete-time nonlinear stochastic optimal control problem. The state estimation procedure, which uses Kalman filtering theory followed by fixed-interval smoothing, is applied to estimate the system dynamics. The smoothed state estimate is then used in designing the feedback optimal control law. By employing this smoothed state estimate, system optimization and parameter estimation are integrated. During the computation, the differences between the real plant and the model used are calculated iteratively. In addition, the output measured from the real plant is fed back into the model used, which in turn updates the iterative solution. Once convergence is achieved, the iterative solution approaches the correct optimal solution of the original optimal control problem, in spite of model-reality differences. An illustrative example on the optimal control of a continuous stirred-tank reactor was studied. The results obtained demonstrate the applicability and efficiency of the proposed approach.

Acknowledgments

The authors would like to thank Universiti Tun Hussein Onn Malaysia (UTHM) for financially supporting this study under the Incentive Grant Scheme for Publication (IGSP) VOT. U417.

References

1. Kalman R. E. A new approach to linear filtering and prediction problems. Journal of Basic Engineering. 1960; 82(1):35–45.
2. Bryson A. E. and Ho Y. C. Applied Optimal Control. Washington: Hemisphere; 1975.
3. Bertsekas D. P. Dynamic Programming and Optimal Control (Vol. 1, No. 2). Belmont: Athena Scientific; 1995.
4. Lewis F. L. and Syrmos V. L. Optimal Control. 2nd ed. USA: John Wiley & Sons; 1995.
5. Feng Z. G. and Teo K. L. Optimal feedback control for stochastic impulsive linear systems subject to Poisson processes. In: Optimization and Optimal Control. New York: Springer; 2010. p. 241–258.
6. Misiran M., Wu C., Lu Z. and Teo K. L. Optimal filtering of linear system driven by fractional Brownian motion. Dynamic Systems and Applications. 2010; 19(3):495–514.
7. Loxton R., Lin Q. and Teo K. L. A stochastic fleet composition problem. Computers & Operations Research. 2012; 39(12):3177–3183. DOI: 10.1016/j.cor.2012.04.004.
8. Liu C. M., Feng Z. G. and Teo K. L. On a class of stochastic impulsive optimal parameter selection problems. International Journal of Innovation, Computer and Information Control. 2009; 5:1043–1054.
9. Yin Y., Shi P., Liu F. and Teo K. L. Robust L₂ – L_∞ filtering for a class of dynamical systems with nonhomogeneous Markov jump process. International Journal of Systems Science. 2015; 46(4):599–608. DOI: 10.1080/00207721.2013.792976.
10. Moura S. J., Fathy H. K., Callaway D. S. and Stein J. L. A stochastic optimal control approach for power management in plug-in hybrid electric vehicles. IEEE Transactions on Control Systems Technology. 2011; 19(3):545–555. DOI: 10.1109/TCST.2010.2043736.
11. Wiegerinck W., Broek B. V. D. and Kappen H. Stochastic optimal control in continuous space-time multi-agent systems. Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI’06), Arlington, Virginia. 2006; 528–535.
12. Zhu Y. Uncertain optimal control with application to portfolio selection model. International Journal of Cybernetics and Systems. 2010; 41(7):535–547. DOI: 10.1080/01969722.2010.511552.
13. Hać A. Suspension optimization of a 2-DOF vehicle model using a stochastic optimal control technique. Journal of Sound and Vibration. 1985; 100(3):343–357. DOI: 10.1016/0022-460X(85)90291-3.
14. Todorov E. Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural Computation. 2005; 17(5):1084–1108.
15. Sethi S. P. Deterministic and stochastic optimization of a dynamic advertising model. Optimal Control Applications and Methods. 1983; 4(2):179–184. DOI: 10.1002/oca.4660040207.
16. Kek S. L., Teo K. L. and Mohd Ismail A. A. An integrated optimal control algorithm for discrete-time nonlinear stochastic system. International Journal of Control. 2010; 83:2536–2545. DOI: 10.1080/00207179.2010.531766.
17. Kek S. L., Teo K. L. and Mohd Ismail A. A. Filtering solution of nonlinear stochastic optimal control problem in discrete-time with model-reality differences. Numerical Algebra, Control and Optimization. 2012; 2(1):207–222. DOI: 10.3934/naco.2012.2.207.
18. Kek S. L., Mohd Ismail A. A., Teo K. L. and Ahmad R. An iterative algorithm based on model-reality differences for discrete-time nonlinear stochastic optimal control problems. Numerical Algebra, Control and Optimization. 2013; 3(1):109–125. DOI: 10.3934/naco.2013.3.109.
19. Kirk D. E. Optimal Control Theory: An Introduction. Mineola, New York: Dover Publications; 2004.

[1] 1. Kalman R. E. A new approach to linear filtering and prediction problems. Journal of Basic Engineering. 1960; 82(1):35–45.

[2] 2. Bryson A. E. and Ho Y. C. Applied Optimal Control. Washington: Hemisphere; 1975.

[3] 3. Bertsekas D. P. Dynamic Programming and Optimal Control (Vol. 1, No. 2). Belmont: Athena Scientific; 1995.

[4] 4. Lewis F. L. and Syrmos V. L. Optimal Control. 2nd ed. USA: John Wiley & Sons; 1995.

[5] 5. Feng Z.G. and Teo K. L. Optimal feedback control for stochastic impulsive linear systems subject to Poisson processes. In: Optimization and Optimal Control. New York: Springer; 2010. p. 241–258.

[6] 6. Misiran M., Wu C., Lu Z. and Teo K.L. Optimal filtering of linear system driven by fractional Brownian motion. Dynamic Systems and Applications. 2010; 19(3):495–514.

[7] 7. Loxton R., Lin Q. and Teo K. L. A stochastic fleet composition problem. Computers & Operations Research. 2012; 39(12):3177–3183. DOI: 10.1016/j.cor.2012.04.004.

[8] 8. Liu C. M., Feng Z. G. and Teo K. L. On a class of stochastic impulsive optimal parameter selection problems. International Journal of Innovation, Computer and Information Control. 2009; 5:1043–1054.

[9] 9. Yin Y., Shi P., Liu F. and Teo K. L. Robust L₂ – L_∞ filtering for a class of dynamical systems with nonhomogeneous Markov jump process. International Journal of Systems Science. 2015; 46(4):599–608. DOI: 10.1080/00207721.2013.792976.

[10] 10. Moura S. J., Fathy H. K., Callaway D. S. and Stein J. L. A stochastic optimal control approach for power management in plug-in hybrid electric vehicles. IEEE Transactions on Control Systems Technology. 2011; 19(3):545–555. DOI: 10.1109/TCST.2010.2043736.

[11] 11. Wiegerinck W. Broek B. V. D. and Kappen H. Stochastic optimal control in continuous space-time multi-agent systems. Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI’06), Arlington, Virginia. 2006; 528–535.

[12] 12. Zhu Y. Uncertain optimal control with application to portfolio selection model. International Journal of Cybernetics and Systems. 2010; 41(7):535–547. DOI: 10.1080/01969722.2010.511552.

[13] 13. Hać A. Suspension optimization of a 2-DOF vehicle model using a stochastic optimal control technique. Journal of Sound and Vibration. 1985; 100(3):343–357. DOI: 10.1016/0022-460X(85)90291-3.

[14] 14. Todorov E. Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural Computation. 2005; 17(5):1084–1108.

[15] 15. Sethi S. P. Deterministic and stochastic optimization of a dynamic advertising model. Optimal Control Applications and Methods. 1983; 4(2):179–184. DOI: 10.1002/oca.4660040207.

[16] 16. Kek S. L., Teo K. L. and Mohd Ismail A. A. An integrated optimal control algorithm for discrete-time nonlinear stochastic system. International Journal of Control. 2010; 83:2536–2545. DOI: 10.1080/00207179.2010.531766.

[17] 17. Kek S. L., Teo K. L. and Mohd Ismail A. A. Filtering solution of nonlinear stochastic optimal control problem in discrete-time with model-reality differences. Numerical Algebra, Control and Optimization. 2012; 2(1):207–222. DOI: 10.3934/naco.2012.2.207.

[18] 18. Kek S. L., Mohd Ismail A. A., Teo K. L. and Ahmad R. An iterative algorithm based on model-reality differences for discrete-time nonlinear stochastic optimal control problems. Numerical Algebra, Control and Optimization. 2013; 3(1):109–125. DOI: 10.3934/naco.2013.3.109.

[19] 19. Kirk D. E. Optimal Control Theory: An Introduction. Mineola, New York: Dover Publications; 2004.
