Open access peer-reviewed chapter

# Smoothing Solution for Discrete-Time Nonlinear Stochastic Optimal Control Problem with Model-Reality Differences

By Sie Long Kek, Kok Lay Teo and Mohd Ismail Abd Aziz

Submitted: October 17th 2015; Reviewed: June 9th 2016; Published: October 19th 2016

DOI: 10.5772/64564

## Abstract

In this chapter, the performance of the integrated optimal control and parameter estimation (IOCPE) algorithm is improved using a modified fixed-interval smoothing scheme in order to solve the discrete-time nonlinear stochastic optimal control problem. In our approach, a linear model-based optimal control problem, with adjustable parameters added to the model used, is solved iteratively. The aim is to obtain the optimal solution of the original optimal control problem. In the presence of random noise sequences in the process plant and the measurement channel, the state dynamics, which are estimated using Kalman filtering theory, are smoothed over a fixed interval. With this smoothed state estimate sequence, which reduces the output residual, the feedback optimal control law is then designed. During the computation procedure, the optimal solution of the modified model-based optimal control problem is updated at each iteration step. When convergence is achieved, the iterative solution approaches the correct optimal solution of the original optimal control problem, in spite of model-reality differences. Moreover, a convergence result for the resulting algorithm is given. For illustration, optimal control of a continuous stirred-tank reactor problem is studied, and the results obtained show the efficiency of the proposed approach.

### Keywords

• fixed-interval smoothing
• Kalman filtering theory
• model-reality differences
• iterative solution

## 1. Introduction

The optimal control approach provides solutions to dynamic real-world practical problems. In particular, linear problems disturbed by random noise sequences are well understood: the optimal state estimate is applied in designing the optimal feedback control law. In such a situation, the optimal state estimator and the optimal controller are designed separately to optimize and control the dynamical system. This is called the separation principle [1–4]. By virtue of this principle, research on stochastic optimal control and its applications has grown widely; see, for example, linear systems [5, 6], the fleet composition problem [7], optimal parameter selection problems [8], Markov jump processes [9], power management [10], multiagent systems [11], a portfolio selection model [12], a 2-DOF vehicle model [13], the sensorimotor system [14], and an advertising model [15].

In fact, the exact solution of stochastic optimal control problems is generally impossible to obtain, especially for problems involving nonlinear system dynamics. To obtain an optimal solution of the discrete-time nonlinear stochastic optimal control problem, the integrated optimal control and parameter estimation (IOCPE) algorithm has been proposed to solve this kind of problem iteratively [16–18]. In this algorithm, the linear quadratic Gaussian (LQG) model is applied as a model-based optimal control problem, where the state estimation is done using Kalman filtering theory. Adjusted parameters are added into this model so that system optimization and parameter estimation are integrated interactively. On this basis, the differences between the real plant and the model used are measured repeatedly in order to update the optimal solution of the model used. In addition, the output measured from the real plant is fed back into the model used for the state estimator design. When convergence is achieved, the iterative solution approaches the true optimal solution of the original optimal control problem despite model-reality differences. This optimal solution is the optimal filtering solution obtained using the IOCPE algorithm. The efficiency of the IOCPE algorithm has been demonstrated in Refs. [16–18].

However, the output trajectory of the model obtained from the IOCPE algorithm is less accurate in estimating the exact output measurement of the original optimal control problem. In this chapter, our aim is to improve the IOCPE algorithm using the fixed-interval smoothing approach, where the output residual is reduced to within an appropriate tolerance to generate a better output trajectory. In our model, the state dynamics, which are disturbed by Gaussian noise sequences, are estimated using Kalman filtering theory and then smoothed by fixed-interval estimation. We modify the estimation procedure so that a smoothed state estimate is computed backward in time and is used in designing the feedback optimal control law. The output residual of this smoothed state estimate is smaller than the output residual obtained using Kalman filtering theory alone; see [17]. The solution procedure discussed in this chapter is almost the same as that presented by Kek et al. [17], but the accuracy of the optimal solution is increased by the modified fixed-interval smoothing.

The chapter is organized as follows. In Section 2, a general discrete-time nonlinear stochastic optimal control problem and its simplified model-based optimal control problem are described. In Section 3, an expanded optimal control model is introduced, where system optimization and parameter estimation are mutually integrated. The feedback control law, which incorporates Kalman filtering theory and fixed-interval smoothing, is designed. Then, the iterative algorithm based on the principle of model-reality differences is derived so that the discrete-time nonlinear stochastic optimal control problem can be solved. In Section 4, a convergence result for the proposed algorithm is provided. In Section 5, an example of optimal control of a continuous stirred-tank reactor problem is illustrated. Finally, some concluding remarks are made.

## 2. Problem description

Consider a general class of dynamical systems given by

$$x(k+1) = f(x(k), u(k), k) + G\omega(k) \tag{1a}$$

$$y(k) = h(x(k), k) + \eta(k) \tag{1b}$$

where $u(k) \in \Re^m$, $k = 0, 1, \ldots, N-1$, $x(k) \in \Re^n$, $k = 0, 1, \ldots, N$, and $y(k) \in \Re^p$, $k = 0, 1, \ldots, N$, are the control sequence, the state sequence, and the output sequence, respectively. The process noise sequence $\omega(k) \in \Re^q$, $k = 0, 1, \ldots, N-1$, and the measurement noise sequence $\eta(k) \in \Re^p$, $k = 0, 1, \ldots, N$, are stationary Gaussian white noise sequences with zero mean, and their covariance matrices are given by $Q_\omega \in \Re^{q \times q}$ and $R_\eta \in \Re^{p \times p}$, respectively. Both covariance matrices are positive definite. In addition, $f: \Re^n \times \Re^m \times \Re \to \Re^n$ represents the real plant and $h: \Re^n \times \Re \to \Re^p$ is the real output measurement, both of which are assumed to be continuously differentiable with respect to their respective arguments, whereas $G \in \Re^{n \times q}$ is a process coefficient matrix.

The initial state is

$$x(0) = x_0$$

where $x_0 \in \Re^n$ is a random vector with mean and covariance given, respectively, by

$$E[x(0)] = \bar{x}_0 \quad \text{and} \quad E\left[(x_0 - \bar{x}_0)(x_0 - \bar{x}_0)^{\mathrm T}\right] = M_0.$$

Here, $M_0 \in \Re^{n \times n}$ is a positive definite matrix and $E[\cdot]$ is the expectation operator. It is assumed that the initial state, the process noise, and the measurement noise are statistically independent.

Therefore, our aim is to find an admissible control sequence $u(k) \in \Re^m$, $k = 0, 1, \ldots, N-1$, subject to the dynamical system given in Eq. (1) such that the scalar cost function

$$J_0(u) = E\left[\varphi(x(N), N) + \sum_{k=0}^{N-1} L(x(k), u(k), k)\right] \tag{2}$$

is minimized, where $\varphi: \Re^n \times \Re \to \Re$ is the terminal cost and $L: \Re^n \times \Re^m \times \Re \to \Re$ is the cost under summation. It is assumed that these functions are continuously differentiable with respect to their respective arguments.

This problem is regarded as the discrete-time nonlinear stochastic optimal control problem and is referred to as Problem (P).

Notice that, in general, the exact solution of Problem (P) cannot be obtained, and estimating the state of the real plant by applying nonlinear filtering theory is computationally demanding. For these reasons, a smoothing model-based optimal control problem, which is referred to as Problem (M), is proposed:

$$\min_{u(k)} J_m(u) = \frac{1}{2}\hat{x}_s(N)^{\mathrm T} S(N)\hat{x}_s(N) + \gamma(N) + \sum_{k=0}^{N-1}\left(\frac{1}{2}\left(\hat{x}_s(k)^{\mathrm T} Q \hat{x}_s(k) + u(k)^{\mathrm T} R u(k)\right) + \gamma(k)\right) \tag{3}$$

subject to

$$\begin{aligned}
\hat{x}_s(k) &= \hat{x}(k) + K_s(k)\left(\hat{x}_s(k+1) - \bar{x}(k+1)\right) \\
\hat{y}_s(k) &= C\hat{x}_s(k)
\end{aligned}$$

with the following state estimation procedure

$$\bar{x}(k+1) = A\hat{x}(k) + Bu(k) + \alpha_1(k) \tag{4a}$$

$$\hat{x}(k) = \bar{x}(k) + K_f(k)\left(y(k) - \bar{y}(k)\right) \tag{4b}$$

$$\bar{y}(k) = C\bar{x}(k) + \alpha_2(k) \tag{4c}$$

where $\hat{x}_s(k) \in \Re^n$, $k = 0, 1, \ldots, N$, and $\hat{y}_s(k) \in \Re^p$, $k = 0, 1, \ldots, N$, are, respectively, the smoothed state sequence and the smoothed output sequence. The matrices involved are as follows: $A$ is an $n \times n$ state transition matrix, $B$ is an $n \times m$ control coefficient matrix, $C$ is a $p \times n$ output coefficient matrix, $S(N)$ and $Q$ are $n \times n$ positive semidefinite matrices, and $R$ is an $m \times m$ positive definite matrix. The extra parameters $\alpha_1(k)$, $k = 0, 1, \ldots, N-1$, $\alpha_2(k)$, $k = 0, 1, \ldots, N$, and $\gamma(k)$, $k = 0, 1, \ldots, N$, are introduced as adjustable parameters.

The state estimation procedure given in (4a), (4b), and (4c) follows from Kalman filtering theory, where $\hat{x}(k) \in \Re^n$, $k = 0, 1, \ldots, N-1$, and $\bar{x}(k) \in \Re^n$, $k = 0, 1, \ldots, N$, are, respectively, the filtered state sequence and the predicted state sequence, whereas $\bar{y}(k) \in \Re^p$, $k = 0, 1, \ldots, N$, is the expected output sequence. The filter and smoother gains, $K_f(k) \in \Re^{n \times p}$ and $K_s(k) \in \Re^{n \times n}$, are, respectively, given by

$$K_f(k) = M_x(k) C^{\mathrm T} M_y(k)^{-1} \tag{5a}$$

$$K_s(k) = P(k) A^{\mathrm T} M_x(k+1)^{-1} \tag{5b}$$

whereas the state error covariance matrices are

$$P(k) = M_x(k) - M_x(k) C^{\mathrm T} M_y(k)^{-1} C M_x(k) \tag{6a}$$

$$M_x(k+1) = A P(k) A^{\mathrm T} + G Q_\omega G^{\mathrm T} \tag{6b}$$

$$P_s(k) = P(k) + K_s(k)\left(P_s(k+1) - M_x(k+1)\right) K_s(k)^{\mathrm T} \tag{6c}$$

and the output error covariance matrix is

$$M_y(k) = C M_x(k) C^{\mathrm T} + R_\eta \tag{6d}$$

with the boundary conditions $M_x(0) = M_0$ and $P_s(N) = M_x(N)$. The filtered state error covariance $P(k) \in \Re^{n \times n}$, the predicted state error covariance $M_x(k) \in \Re^{n \times n}$, the smoothed state error covariance $P_s(k) \in \Re^{n \times n}$, and the output error covariance $M_y(k) \in \Re^{p \times p}$ are positive definite matrices.
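The offline recursions (5a) to (6d) can be sketched in code as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function name `estimator_gains` is hypothetical, the measurement update is assumed to run for k = 0, …, N and the smoothing pass for k = N−1, …, 0, and the numerical values at the bottom (including G = I₂, C = [1 0], and N = 78, inferred from the index range k = 0, …, 77) are assumptions taken from or suggested by the reactor example in Section 5.

```python
import numpy as np

def estimator_gains(A, C, G, Q_w, R_eta, M0, N):
    """Forward Kalman pass and backward smoothing pass for Eqs. (5a)-(6d).

    Returns lists indexed by k: Kf, P, Mx, My for k = 0..N,
    Ks for k = 0..N-1, and Ps for k = 0..N.
    """
    Mx, My, Kf, P = [M0], [], [], []
    for k in range(N + 1):
        My.append(C @ Mx[k] @ C.T + R_eta)                # (6d)
        Kf.append(Mx[k] @ C.T @ np.linalg.inv(My[k]))     # (5a)
        P.append(Mx[k] - Kf[k] @ C @ Mx[k])               # (6a)
        if k < N:
            Mx.append(A @ P[k] @ A.T + G @ Q_w @ G.T)     # (6b)

    Ks = [None] * N
    Ps = [None] * (N + 1)
    Ps[N] = Mx[N]                                         # boundary condition stated in the text
    for k in range(N - 1, -1, -1):
        Ks[k] = P[k] @ A.T @ np.linalg.inv(Mx[k + 1])     # (5b)
        Ps[k] = P[k] + Ks[k] @ (Ps[k + 1] - Mx[k + 1]) @ Ks[k].T   # (6c)
    return Kf, Ks, P, Mx, Ps, My

# Assumed illustrative values (see the lead-in paragraph).
A = np.array([[1.0895, 0.0184], [-0.1095, 0.9716]])
C = np.array([[1.0, 0.0]])
G = np.eye(2)
Kf, Ks, P, Mx, Ps, My = estimator_gains(
    A, C, G, Q_w=1e-3 * np.eye(2), R_eta=np.array([[1e-3]]),
    M0=1e-2 * np.eye(2), N=78)
print(Kf[0].ravel(), np.trace(P[0]), np.trace(Mx[0]))
```

All quantities here are independent of the control and of the measured data, which is why they can be computed once before the iterations start.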

Here, the cost function given in Eq. (3) is evaluated from the expectation of the quadratic forms [2], for both random and deterministic terms, using the matrix trace $\mathrm{tr}(\cdot)$, which simplifies to

1. $E\left[x(N)^{\mathrm T} S(N) x(N)\right] = \mathrm{tr}\left(S(N) M_x(N)\right) + \bar{x}(N)^{\mathrm T} S(N) \bar{x}(N)$

2. $E\left[x(k)^{\mathrm T} Q x(k)\right] = \mathrm{tr}\left(Q M_x(k)\right) + \bar{x}(k)^{\mathrm T} Q \bar{x}(k)$

3. $E\left[u(k)^{\mathrm T} R u(k)\right] = u(k)^{\mathrm T} R u(k)$

4. $E[\gamma(k)] = \gamma(k)$, $E[\alpha_1(k)] = \alpha_1(k)$, and $E[\alpha_2(k)] = \alpha_2(k)$.

Following this simplification, the trace terms, which depend on the state error covariance matrix, are ignored in the model used since they are constant values. In this way, the cost function of the linear model-based optimal control problem can be evaluated.
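For readers who want the intermediate step behind items 1 and 2, a short derivation is sketched here. The deviation symbol $e(k) = x(k) - \bar{x}(k)$ is introduced only for this illustration, with $E[e(k)] = 0$ and $E[e(k)e(k)^{\mathrm T}] = M_x(k)$ assumed:

$$\begin{aligned}
E\left[x(k)^{\mathrm T} Q x(k)\right] &= E\left[(\bar{x}(k) + e(k))^{\mathrm T} Q (\bar{x}(k) + e(k))\right] \\
&= \bar{x}(k)^{\mathrm T} Q \bar{x}(k) + \bar{x}(k)^{\mathrm T} Q\, E[e(k)] + E[e(k)]^{\mathrm T} Q \bar{x}(k) + E\left[e(k)^{\mathrm T} Q e(k)\right] \\
&= \bar{x}(k)^{\mathrm T} Q \bar{x}(k) + \mathrm{tr}\left(Q\, E\left[e(k) e(k)^{\mathrm T}\right]\right) \\
&= \bar{x}(k)^{\mathrm T} Q \bar{x}(k) + \mathrm{tr}\left(Q M_x(k)\right).
\end{aligned}$$

The third line uses the fact that a scalar equals its own trace together with the cyclic property $\mathrm{tr}(e^{\mathrm T} Q e) = \mathrm{tr}(Q e e^{\mathrm T})$. The trace term does not depend on the control, which is why dropping it leaves the minimizer unchanged.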

Notice that the separation principle [1–4] is applied in solving Problem (M), where the optimal feedback control law and the optimal state estimate are designed separately, as discussed in [16–18]. Furthermore, the accuracy of the optimal state estimate is increased by smoothing the state estimate over the fixed interval [2, 4]. Then, based on this smoothed state estimate, the smoothing optimal control law is designed. In turn, the output measured from the real plant is fed back into the model used to improve the state estimation procedure and to update the solution of the model used. Moreover, solving Problem (M) without adding the adjusted parameters into the model used would not yield the optimal solution of Problem (P). Hence, by including the adjusted parameters in the model used and solving Problem (M) iteratively, the correct optimal solution of the original optimal control problem can be obtained, in spite of model-reality differences.

## 3. Modified smoothing with model-reality differences

Now, let us introduce an expanded optimal control problem with the smoothed state estimate, which is referred to as Problem (E), given below:

$$\begin{aligned}
\min_{u(k)} J_e(u) ={}& \frac{1}{2}\hat{x}_s(N)^{\mathrm T} S(N)\hat{x}_s(N) + \gamma(N) + \sum_{k=0}^{N-1}\Big(\frac{1}{2}\left(\hat{x}_s(k)^{\mathrm T} Q \hat{x}_s(k) + u(k)^{\mathrm T} R u(k)\right) + \gamma(k) \\
& + \frac{1}{2} r_1 \|v(k) - u(k)\|^2 + \frac{1}{2} r_2 \|z(k) - \hat{x}_s(k)\|^2\Big)
\end{aligned} \tag{7}$$

subject to

$$\begin{aligned}
\hat{x}_s(k) &= \hat{x}(k) + K_s(k)\left(\hat{x}_s(k+1) - \bar{x}(k+1)\right) \\
\hat{y}_s(k) &= C\hat{x}_s(k) \\
\frac{1}{2} z(N)^{\mathrm T} S(N) z(N) + \gamma(N) &= \varphi(z(N), N) \\
\frac{1}{2}\left(z(k)^{\mathrm T} Q z(k) + v(k)^{\mathrm T} R v(k)\right) + \gamma(k) &= L(z(k), v(k), k) \\
A z(k) + B v(k) + \alpha_1(k) &= f(z(k), v(k), k) \\
C z(k) + \alpha_2(k) &= h(z(k), k) \\
v(k) &= u(k) \\
z(k) &= \hat{x}_s(k)
\end{aligned}$$

where $v(k) \in \Re^m$, $k = 0, 1, \ldots, N-1$, and $z(k) \in \Re^n$, $k = 0, 1, \ldots, N$, are introduced to separate the control and the smoothed state from the respective signals in the parameter estimation problem, and $\|\cdot\|$ denotes the usual Euclidean norm. The terms $\frac{1}{2} r_1 \|u(k) - v(k)\|^2$ and $\frac{1}{2} r_2 \|\hat{x}_s(k) - z(k)\|^2$ are introduced to improve convexity and to enhance the convergence of the iterative algorithm. The algorithm is designed in this way to ensure that the constraints $v(k) = u(k)$ and $z(k) = \hat{x}_s(k)$ are satisfied at the end of the iterations. More specifically, the state estimate $z(k)$ and the control $v(k)$ are used for the computations in the parameter estimation and matching schemes, which makes the algorithm more practical to use. The corresponding smoothed state $\hat{x}_s(k)$ and control $u(k)$ are reserved for optimizing the model-based optimal control problem, which leads the iterative solution toward the true optimal solution of the original optimal control problem.

Figure 1 shows the block diagram of the approach proposed. The methodology of the approach proposed is further discussed in the following sections.

From the block diagram in Figure 1, the definition of the principle of model-reality differences could be given.

Definition 3.1: The principle of model-reality differences is a unified framework that integrates system optimization and parameter estimation interactively to define an expanded optimal control problem. It aims to give the correct optimal solution of the original optimal control problem by solving the model-based optimal control problem iteratively.

### 3.1. Optimality conditions

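Define the Hamiltonian function for Problem (E) as follows:

$$\begin{aligned}
H_e(k) ={}& \frac{1}{2}\left(\hat{x}_s(k)^{\mathrm T} Q \hat{x}_s(k) + u(k)^{\mathrm T} R u(k)\right) + \gamma(k) + \frac{1}{2} r_1 \|v(k) - u(k)\|^2 + \frac{1}{2} r_2 \|z(k) - \hat{x}_s(k)\|^2 \\
& - \lambda(k)^{\mathrm T} u(k) - \beta(k)^{\mathrm T} \hat{x}_s(k) + q(k)^{\mathrm T}\left(C\hat{x}_s(k) - \hat{y}_s(k)\right) \\
& + p(k+1)^{\mathrm T}\left(\hat{x}_s(k) - \hat{x}(k) - K_s(k)\left(\hat{x}_s(k+1) - \bar{x}(k+1)\right)\right)
\end{aligned} \tag{8}$$

Then, the augmented cost function becomes

$$\begin{aligned}
J_e' ={}& \frac{1}{2}\hat{x}_s(N)^{\mathrm T} S(N)\hat{x}_s(N) + \gamma(N) + \Gamma^{\mathrm T}\left(\hat{x}_s(N) - z(N)\right) + \xi(N)\left(\varphi(z(N), N) - \frac{1}{2} z(N)^{\mathrm T} S(N) z(N) - \gamma(N)\right) \\
& + \sum_{k=0}^{N-1}\Big[H_e(k) + \lambda(k)^{\mathrm T} v(k) + \beta(k)^{\mathrm T} z(k) + \xi(k)\left(L(z(k), v(k), k) - \frac{1}{2}\left(z(k)^{\mathrm T} Q z(k) + v(k)^{\mathrm T} R v(k)\right) - \gamma(k)\right) \\
& + \mu(k)^{\mathrm T}\left(f(z(k), v(k), k) - A z(k) - B v(k) - \alpha_1(k)\right) + \pi(k)^{\mathrm T}\left(h(z(k), k) - C z(k) - \alpha_2(k)\right)\Big]
\end{aligned} \tag{9}$$

where $p(k)$, $q(k)$, $\mu(k)$, $\xi(k)$, $\pi(k)$, $\Gamma$, $\beta(k)$, and $\lambda(k)$ are the appropriate multipliers, whose values are determined later.

Applying the calculus of variations [2, 4, 17] to the augmented cost function given in Eq. (9) yields the following necessary conditions for optimality: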

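(a) Stationary condition:

$$R u(k) + B^{\mathrm T} K_s(k) p(k+1) - \lambda(k) - r_1\left(v(k) - u(k)\right) = 0 \tag{10a}$$

(b) Smoothed costate equation:

$$p(k) = Q\hat{x}_s(k) + p(k+1) - \beta(k) - r_2\left(z(k) - \hat{x}_s(k)\right) \tag{10b}$$

(c) Smoothed state equation:

$$\hat{x}_s(k) = \hat{x}(k) + K_s(k)\left(\hat{x}_s(k+1) - \bar{x}(k+1)\right) \tag{10c}$$

with the boundary conditions $\hat{x}_s(N) = \bar{x}(N)$ and $p(N) = \Gamma$.

(d) Adjustable parameter equations:

$$\varphi(z(N), N) = \frac{1}{2} z(N)^{\mathrm T} S(N) z(N) + \gamma(N) \tag{11a}$$

$$L(z(k), v(k), k) = \frac{1}{2}\left(z(k)^{\mathrm T} Q z(k) + v(k)^{\mathrm T} R v(k)\right) + \gamma(k) \tag{11b}$$

$$f(z(k), v(k), k) = A z(k) + B v(k) + \alpha_1(k) \tag{11c}$$

$$h(z(k), k) = C z(k) + \alpha_2(k) \tag{11d}$$

(e) Multiplier equations:

$$\Gamma - \nabla_{z(N)}\varphi + S(N) z(N) = 0 \tag{12a}$$

$$\lambda(k) + \left(\nabla_{v(k)} L - R v(k)\right) + \left(\frac{\partial f}{\partial v(k)} - B\right)^{\mathrm T} \hat{p}(k+1) = 0 \tag{12b}$$

$$\beta(k) + \left(\nabla_{z(k)} L - Q z(k)\right) + \left(\frac{\partial f}{\partial z(k)} - A\right)^{\mathrm T} \hat{p}(k+1) = 0 \tag{12c}$$

with $\xi(k) = 1$, $\mu(k) = \hat{p}(k+1)$ and $\pi(k) = q(k) = 0$.

(f) Separable variables:

$$v(k) = u(k), \quad z(k) = \hat{x}_s(k), \quad \hat{p}(k) = p(k) \tag{13}$$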

In view of these necessary optimality conditions, the conditions (10a), (10b), and (10c) define the modified model-based optimal control problem, the conditions (11a), (11b), (11c), and (11d) define the parameter estimation problem and the conditions (12a), (12b), and (12c) are used to compute the multipliers. They are further discussed as follows.

### 3.2. Modified model-based optimal control problem

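The modified model-based optimal control problem, which is referred to as Problem (MM), is given below:

$$\begin{aligned}
\min_{u(k)} J_{mm}(u) ={}& \frac{1}{2}\hat{x}_s(N)^{\mathrm T} S(N)\hat{x}_s(N) + \gamma(N) + \Gamma^{\mathrm T}\hat{x}_s(N) + \sum_{k=0}^{N-1}\Big(\frac{1}{2}\left(\hat{x}_s(k)^{\mathrm T} Q \hat{x}_s(k) + u(k)^{\mathrm T} R u(k)\right) + \gamma(k) \\
& + \frac{1}{2} r_1 \|v(k) - u(k)\|^2 + \frac{1}{2} r_2 \|z(k) - \hat{x}_s(k)\|^2 - \lambda(k)^{\mathrm T} u(k) - \beta(k)^{\mathrm T} \hat{x}_s(k)\Big)
\end{aligned} \tag{14}$$

subject to

$$\begin{aligned}
\hat{x}_s(k) &= \hat{x}(k) + K_s(k)\left(\hat{x}_s(k+1) - \bar{x}(k+1)\right) \\
\hat{y}_s(k) &= C\hat{x}_s(k)
\end{aligned}$$

Based on Problem (E) and Problem (MM), the following theorem gives the smoothed optimal control law that is applied to solve Problem (MM).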

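Theorem 3.1: Suppose the expanded optimal control law for Problem (E) exists. Then, this control law is the smoothed feedback control law for Problem (MM) given by

$$u(k) = -K(k)\hat{x}_s(k) + u_{ff}(k) \tag{15}$$

where

$$u_{ff}(k) = -\left(R_a + B^{\mathrm T} K_s(k) S(k+1) B\right)^{-1}\left(B^{\mathrm T} K_s(k) s(k+1) - \lambda_a(k) + B^{\mathrm T} K_s(k) S(k+1)\left(\left(A - K_s(k)^{-1}\right)\hat{x}(k) + \alpha_1(k)\right)\right) \tag{16a}$$

$$K(k) = \left(R_a + B^{\mathrm T} K_s(k) S(k+1) B\right)^{-1} B^{\mathrm T} K_s(k) S(k+1) K_s(k)^{-1} \tag{16b}$$

$$S(k) = Q_a + S(k+1)\left(K_s(k)^{-1} - B K(k)\right) \tag{16c}$$

$$s(k) = S(k+1)\left(\left(A - K_s(k)^{-1}\right)\hat{x}(k) + B u_{ff}(k) + \alpha_1(k)\right) + s(k+1) - \beta_a(k) \tag{16d}$$

with the boundary conditions $S(N)$ given and $s(N) = 0$, and

$$R_a = R + r_1 I_m; \quad Q_a = Q + r_2 I_n;$$

$$\lambda_a(k) = \lambda(k) + r_1 v(k); \quad \beta_a(k) = \beta(k) + r_2 z(k).$$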

Proof: From the necessary optimality condition (10a), we have

$$R_a u(k) = -B^{\mathrm T} K_s(k) p(k+1) + \lambda_a(k) \tag{17}$$

Applying the sweep method [2, 4],

$$p(k) = S(k)\hat{x}_s(k) + s(k) \tag{18}$$

we substitute Eq. (18), with $k$ replaced by $k+1$, into Eq. (17), which yields

$$R_a u(k) = -B^{\mathrm T} K_s(k) S(k+1)\hat{x}_s(k+1) - B^{\mathrm T} K_s(k) s(k+1) + \lambda_a(k). \tag{19}$$

Rewrite the smoothed state equation from Eq. (10c) as

$$\hat{x}_s(k+1) = \bar{x}(k+1) + \left(K_s(k)\right)^{-1}\left(\hat{x}_s(k) - \hat{x}(k)\right). \tag{20}$$

Then, substitute Eq. (20) into Eq. (19). After some algebraic manipulations, the smoothed control law (15) is obtained, where Eqs. (16a) and (16b) are satisfied.

From the smoothed costate equation (10b), we substitute Eq. (18), with $k$ replaced by $k+1$, to give

$$p(k) = Q_a\hat{x}_s(k) + S(k+1)\hat{x}_s(k+1) + s(k+1) - \beta_a(k) \tag{21}$$

Using Eq. (20) in Eq. (21), we obtain

$$p(k) = Q_a\hat{x}_s(k) + S(k+1)\left(\bar{x}(k+1) + \left(K_s(k)\right)^{-1}\left(\hat{x}_s(k) - \hat{x}(k)\right)\right) + s(k+1) - \beta_a(k) \tag{22}$$

After some algebraic manipulations and a comparison with Eq. (18), it is found that Eqs. (16c) and (16d) are satisfied. This completes the proof.

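From Eqs. (4a), (10c), and (15), the smoothed state equation becomes

$$\hat{x}_s(k) = \left(I_n - K_s(k) B K(k)\right)^{-1}\left(\left(I_n - K_s(k) A\right)\hat{x}(k) + K_s(k)\left(\hat{x}_s(k+1) - B u_{ff}(k) - \alpha_1(k)\right)\right) \tag{23}$$

and the smoothed output is measured from

$$\hat{y}_s(k) = C\hat{x}_s(k) \tag{24}$$

with the boundary condition $\hat{x}_s(N) = \bar{x}(N)$.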
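The backward recursion for the feedback gain (16b) and the matrix S(k) in (16c), together with the control law (15), can be sketched as follows. This is an illustrative NumPy sketch rather than the authors' code: the function names are hypothetical, the smoother gains are assumed to be invertible (as Eq. (20) already requires), and the toy data at the bottom, including the constant gain K_s(k) = 0.5 I₂, are assumptions chosen only to make the example run.

```python
import numpy as np

def smoothed_feedback_gains(B, Q, R, S_N, Ks, r1=0.0, r2=0.0):
    """Backward recursion for K(k), k = 0..N-1, and S(k), k = 0..N."""
    n, m = B.shape
    N = len(Ks)
    Ra = R + r1 * np.eye(m)          # R_a = R + r1 * I_m
    Qa = Q + r2 * np.eye(n)          # Q_a = Q + r2 * I_n
    K = [None] * N
    S = [None] * (N + 1)
    S[N] = S_N                       # boundary condition: S(N) given
    for k in range(N - 1, -1, -1):
        Ks_inv = np.linalg.inv(Ks[k])
        BtKsS = B.T @ Ks[k] @ S[k + 1]
        K[k] = np.linalg.solve(Ra + BtKsS @ B, BtKsS @ Ks_inv)   # (16b)
        S[k] = Qa + S[k + 1] @ (Ks_inv - B @ K[k])                # (16c)
    return K, S

def smoothed_control(K_k, xs_k, uff_k):
    """Control law (15): u(k) = -K(k) x_s(k) + u_ff(k)."""
    return -K_k @ xs_k + uff_k

# Toy data (assumed): constant invertible smoother gain, no terminal cost.
B = np.array([[-0.003], [0.0]])
Q = np.eye(2)
R = np.array([[0.1]])
Ks = [0.5 * np.eye(2) for _ in range(5)]
K, S = smoothed_feedback_gains(B, Q, R, S_N=np.zeros((2, 2)), Ks=Ks)
u0 = smoothed_control(K[0], np.array([0.05, 0.0]), np.zeros(1))
```

Using `np.linalg.solve` instead of forming the inverse in (16b) explicitly is a numerical convenience; it evaluates the same expression.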

### 3.3. Parameter estimation

After solving Problem (MM), the separable variables defined in Eq. (13) are used for further computations. In particular, in the parameter estimation problem, the differences between the real plant and the model used are taken into account by establishing the matching schemes. The adjusted parameters, which result from the parameter estimation problem defined by Eq. (11), are calculated from

$$\alpha_1(k) = f(z(k), v(k), k) - A z(k) - B v(k) \tag{25a}$$

$$\alpha_2(k) = h(z(k), k) - C z(k) \tag{25b}$$

$$\gamma(N) = \varphi(z(N), N) - \frac{1}{2} z(N)^{\mathrm T} S(N) z(N) \tag{25c}$$

$$\gamma(k) = L(z(k), v(k), k) - \frac{1}{2}\left(z(k)^{\mathrm T} Q z(k) + v(k)^{\mathrm T} R v(k)\right) \tag{25d}$$
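The matching schemes (25a) to (25d) translate almost line by line into code. The sketch below is only an illustration: the function name `estimate_parameters`, the callable signatures, and the toy plant (a linear model plus a small quadratic term) are assumptions made for the example, not part of the chapter.

```python
import numpy as np

def estimate_parameters(f, h, L, phi, A, B, C, Q, R, S_N, z, v):
    """Adjusted parameters from Eqs. (25a)-(25d).

    z holds z(0..N), v holds v(0..N-1); f, h, L, phi are the plant,
    output, stage-cost and terminal-cost functions of Problem (P).
    """
    N = len(v)
    alpha1 = [f(z[k], v[k], k) - A @ z[k] - B @ v[k] for k in range(N)]   # (25a)
    alpha2 = [h(z[k], k) - C @ z[k] for k in range(N + 1)]                 # (25b)
    gamma = [L(z[k], v[k], k) - 0.5 * (z[k] @ Q @ z[k] + v[k] @ R @ v[k])
             for k in range(N)]                                            # (25d)
    gamma.append(phi(z[N], N) - 0.5 * z[N] @ S_N @ z[N])                   # (25c)
    return alpha1, alpha2, gamma

# Toy problem (assumed): the "real plant" is the model plus a quadratic term.
A = 0.9 * np.eye(2)
B = np.array([[1.0], [0.0]])
C = np.array([[1.0, 0.0]])
Q, R, S_N = np.eye(2), np.array([[0.1]]), np.zeros((2, 2))
f = lambda z, v, k: A @ z + B @ v + np.array([0.1 * z[0] ** 2, 0.0])
h = lambda z, k: C @ z
L = lambda z, v, k: 0.5 * (z @ Q @ z + v @ R @ v) + z[0] ** 4
phi = lambda z, k: 0.0
z = [np.array([1.0, 0.0])] * 3
v = [np.array([0.5])] * 2
alpha1, alpha2, gamma = estimate_parameters(f, h, L, phi, A, B, C, Q, R, S_N, z, v)
```

In this toy case the adjusted parameters recover exactly the terms by which the plant and cost differ from the linear quadratic model, which is the role they play in the algorithm.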

### 3.4. Computation of multipliers

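The multipliers, which are related to the Jacobian matrices of the functions $f$ and $L$ with respect to $v(k)$ and $z(k)$, are computed from

$$\Gamma = \nabla_{z(N)}\varphi - S(N) z(N) \tag{26a}$$

$$\lambda(k) = -\left(\nabla_{v(k)} L - R v(k)\right) - \left(\frac{\partial f}{\partial v(k)} - B\right)^{\mathrm T} \hat{p}(k+1) \tag{26b}$$

$$\beta(k) = -\left(\nabla_{z(k)} L - Q z(k)\right) - \left(\frac{\partial f}{\partial z(k)} - A\right)^{\mathrm T} \hat{p}(k+1) \tag{26c}$$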
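A sketch of the multiplier computation (26a) to (26c) is given below, assuming the gradients and Jacobians are supplied as callables. The function name `compute_multipliers`, the argument layout, and the toy derivatives are hypothetical; in practice the derivatives could come from analytic expressions or finite differences.

```python
import numpy as np

def compute_multipliers(grad_phi, grad_L_z, grad_L_v, jac_f_z, jac_f_v,
                        A, B, Q, R, S_N, z, v, p_hat):
    """Multipliers Gamma, lambda(k), beta(k) from Eqs. (26a)-(26c)."""
    N = len(v)
    Gamma = grad_phi(z[N], N) - S_N @ z[N]                               # (26a)
    lam = [-(grad_L_v(z[k], v[k], k) - R @ v[k])
           - (jac_f_v(z[k], v[k], k) - B).T @ p_hat[k + 1]
           for k in range(N)]                                            # (26b)
    beta = [-(grad_L_z(z[k], v[k], k) - Q @ z[k])
            - (jac_f_z(z[k], v[k], k) - A).T @ p_hat[k + 1]
            for k in range(N)]                                           # (26c)
    return Gamma, lam, beta

# Toy check (assumed data): only the control Jacobian differs from the model.
A = 0.9 * np.eye(2)
B = np.array([[1.0], [0.0]])
Q, R, S_N = np.eye(2), np.array([[0.1]]), np.zeros((2, 2))
z = [np.array([1.0, 0.0])] * 3
v = [np.array([0.5])] * 2
p_hat = [np.array([2.0, 1.0])] * 3
Gamma, lam, beta = compute_multipliers(
    grad_phi=lambda zN, N: S_N @ zN,
    grad_L_z=lambda zk, vk, k: Q @ zk,
    grad_L_v=lambda zk, vk, k: R @ vk,
    jac_f_z=lambda zk, vk, k: A,
    jac_f_v=lambda zk, vk, k: B + np.array([[0.1], [0.0]]),
    A=A, B=B, Q=Q, R=R, S_N=S_N, z=z, v=v, p_hat=p_hat)
```

When the model derivatives agree with those of the real problem, the corresponding multiplier vanishes; here only λ(k) is nonzero because only the control Jacobian was perturbed.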

### 3.5. Iterative algorithm

The previous sections derived the equations and formulated the resulting algorithm. The iterative algorithm is summarized as follows:

Data: $Q, R, S(N), A, B, C, G, Q_\omega, R_\eta, M_0, \bar{x}_0, N, r_1, r_2, k_v, k_z, k_p, f, L, h, \varphi$. Note that $A$ and $B$ may be chosen through the linearization of $f$, and $C$ is obtained from the linearization of $h$.

Step 0: Compute a nominal solution. Assume $\alpha_1(k) = 0$, $k = 0, 1, \ldots, N-1$, $\alpha_2(k) = 0$, $k = 0, 1, \ldots, N$, and $r_1 = r_2 = 0$. Calculate $K_f(k)$ and $K_s(k)$ from Eqs. (5a) and (5b), and $P(k)$, $M_x(k)$, $P_s(k)$, and $M_y(k)$ from Eqs. (6a), (6b), (6c), and (6d) for the state estimation, and solve Problem (M) defined by Eq. (3) to obtain $u(k)^0$, $k = 0, 1, \ldots, N-1$, and $\hat{x}_s(k)^0$, $\hat{y}_s(k)^0$, $p(k)^0$, $k = 0, 1, \ldots, N$. Then, with $\alpha_1(k) = 0$, $k = 0, 1, \ldots, N-1$, $\alpha_2(k) = 0$, $k = 0, 1, \ldots, N$, and $r_1, r_2$ from the data, calculate $K(k)$ and $S(k)$ from Eqs. (16b) and (16c), respectively. Set $i = 0$, $z(k)^0 = \hat{x}_s(k)^0$, $v(k)^0 = u(k)^0$ and $\hat{p}(k)^0 = p(k)^0$.

Step 1: Calculate the adjustable parameters $\alpha_1(k)^i$, $k = 0, 1, \ldots, N-1$, $\alpha_2(k)^i$, $k = 0, 1, \ldots, N$, and $\gamma(k)^i$, $k = 0, 1, \ldots, N$, from Eq. (25). This is called the parameter estimation step.

Step 2: Compute the modifiers $\Gamma^i$, $\lambda(k)^i$ and $\beta(k)^i$, $k = 0, 1, \ldots, N-1$, from Eq. (26). This requires the partial derivatives of $f$, $h$ and $L$ with respect to $v(k)^i$ and $z(k)^i$.

Step 3: With the determined $\alpha_1(k)^i$, $\alpha_2(k)^i$, $\gamma(k)^i$, $\Gamma^i$, $\lambda(k)^i$, $\beta(k)^i$, $v(k)^i$, and $z(k)^i$, solve Problem (MM) defined by Eq. (14) using the result in Theorem 3.1. This is called the system optimization step.

1. Obtain $s(k)^i$, $k = 0, 1, \ldots, N$, by solving Eq. (16d) backward, and obtain $u_{ff}(k)^i$, $k = 0, 1, \ldots, N-1$, by solving Eq. (16a), either backward or forward.

2. Calculate the new control $u(k)^i$, $k = 0, 1, \ldots, N-1$, using Eq. (15).

3. Calculate the new state $\hat{x}_s(k)^i$, $k = 0, 1, \ldots, N$, using Eq. (23).

4. Calculate the new costate $p(k)^i$, $k = 0, 1, \ldots, N$, using Eq. (18).

5. Calculate the new output $\hat{y}_s(k)^i$, $k = 0, 1, \ldots, N$, using Eq. (24).

Step 4: Update the optimal smoothing solution of Problem (P) and test the convergence of the algorithm. To regulate convergence, a simple relaxation method is used:

$$z(k)^{i+1} = z(k)^i + k_z\left(\hat{x}_s(k)^i - z(k)^i\right) \tag{27a}$$

$$v(k)^{i+1} = v(k)^i + k_v\left(u(k)^i - v(k)^i\right) \tag{27b}$$

$$\hat{p}(k)^{i+1} = \hat{p}(k)^i + k_p\left(p(k)^i - \hat{p}(k)^i\right) \tag{27c}$$

where $k_v, k_z, k_p \in (0, 1]$ are scalar gains. If $z(k)^{i+1} = z(k)^i$, $k = 0, 1, \ldots, N$, and $v(k)^{i+1} = v(k)^i$, $k = 0, 1, \ldots, N-1$, within a given tolerance, stop; otherwise set $i = i + 1$ and repeat from Step 1.
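The relaxation update in Step 4 is the same operation for z, v, and p̂, so it can be sketched as one small helper. The function name `relax` and the sample sequences are hypothetical and serve only to illustrate Eqs. (27a) to (27c).

```python
import numpy as np

def relax(current, target, gain):
    """One relaxation step, e.g. z^{i+1} = z^i + k_z * (x_s^i - z^i).

    current and target are sequences of vectors indexed by k;
    gain is the scalar k_z, k_v or k_p and must lie in (0, 1].
    """
    if not 0.0 < gain <= 1.0:
        raise ValueError("relaxation gain must lie in (0, 1]")
    return [c + gain * (t - c) for c, t in zip(current, target)]

# Assumed sample data: two time steps of a two-dimensional state.
z_i = [np.array([0.0, 2.0]), np.array([1.0, 1.0])]
xs_i = [np.array([1.0, 4.0]), np.array([1.0, 3.0])]
z_half = relax(z_i, xs_i, gain=0.5)   # (27a) with k_z = 0.5
z_full = relax(z_i, xs_i, gain=1.0)   # k_z = 1 simply adopts the new estimate
```

A gain of 1 takes the full step to the newly computed sequence, while smaller gains damp the update, which is the trade-off discussed in the remarks on choosing the relaxation scalars.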

Remarks:

1. The off-line computation mentioned in Step 0 is done for the state estimator design, where $K_f(k)$, $K_s(k)$, $k = 0, 1, \ldots, N-1$, $M_x(k)$, $M_y(k)$, $k = 0, 1, \ldots, N$, and $P(k)$, $P_s(k)$, $k = 0, 1, \ldots, N-1$, are computed, and for the control law design, where $K(k)$, $k = 0, 1, \ldots, N-1$, and $S(k)$, $k = 0, 1, \ldots, N$, are calculated. These parameters are used for solving Problem (M) in Step 0 and for solving Problem (MM) in Step 3, respectively.

2. The variables $\gamma(k)^i$, $\alpha_1(k)^i$, $\alpha_2(k)^i$, $\Gamma^i$, $\lambda(k)^i$, $\beta(k)^i$, and $s(k)^i$ are initially zero in Step 0. Their computed values, namely $\gamma(k)^i$, $\alpha_1(k)^i$, $\alpha_2(k)^i$ in Step 1, $\Gamma^i$, $\lambda(k)^i$, $\beta(k)^i$ in Step 2, and $s(k)^i$ in Step 3, change from iteration to iteration.

3. The driving input $u_{ff}(k)$ in Eq. (16a) corrects the differences between the real plant and the model used, and it also drives the controller given in Eq. (15).

4. The state estimation without the control is done forward using Kalman filtering, and it is then followed by the fixed-interval smoothing backward in order to design the feedback control law.

5. Problem (P) need not be a linear problem or have a quadratic cost function.

6. The conditions $z(k)^{i+1} = z(k)^i$ and $v(k)^{i+1} = v(k)^i$ must be satisfied by the converged state estimate sequence and the converged optimal control sequence. For this purpose, the following averaged 2-norms are computed and then compared with a given tolerance to verify the convergence of $v(k)$ and $z(k)$:

$$\|v^{i+1} - v^i\|_2 = \left(\frac{1}{N-1}\sum_{k=0}^{N-1}\left\|v(k)^{i+1} - v(k)^i\right\|\right)^{1/2} \tag{28a}$$

$$\|z^{i+1} - z^i\|_2 = \left(\frac{1}{N}\sum_{k=0}^{N}\left\|z(k)^{i+1} - z(k)^i\right\|\right)^{1/2} \tag{28b}$$

7. The relaxation scalars ($k_v$, $k_z$, $k_p$) are the step sizes that regulate the convergence mechanism. These scalars can normally be chosen as any value in the range (0, 1], but an arbitrary choice may not give the smallest number of iterations; the best choice of $k_v, k_z, k_p \in (0, 1]$ is problem dependent. In practice, the algorithm (from Step 1 to Step 4) needs to be run a few times. For the first run, these scalars are set to $k_v = k_z = k_p = 1$, and the algorithm is then run again with different values chosen from 0.1 to 0.9. The value that gives the smallest number of iterations can then be determined. The parameters $r_1$ and $r_2$ are applied to enhance convexity so that the convergence of the algorithm is improved.

## 4. Convergence analysis

In this section, the convergence of the algorithm is discussed. The following assumptions are needed:

1. The derivatives of $f$, $L$ and $h$ exist.

2. The solution $(u^*, x^*, y^*)$ is the optimal solution to Problem (P), that is, the optimal smoothing solution.

The convergence result is presented in Theorem 4.1, while the accuracy of the smoothed state in terms of the state error covariance is proven in Corollary 4.1.

Theorem 4.1: The converged solution of Problem (M) is the correct optimal smoothing solution of Problem (P).

Proof: Consider the real plant and the output measurement of Problem (P) with the exact optimal smoothing solution $(u^*, x^*, y^*)$ as given below:

$$x^*(k+1) = f(x^*(k), u^*(k), k) \quad \text{and} \quad y^*(k) = h(x^*(k), k) \tag{29}$$

In Problem (M), the model used consists of

$$\hat{x}^c(k) = \bar{x}^c(k) + K_f(k)\left(y(k) - \bar{y}^c(k)\right) \tag{30a}$$

$$\bar{x}^c(k+1) = A\hat{x}^c(k) + B u^c(k) + \alpha_1(k) \tag{30b}$$

$$\bar{y}^c(k) = C\bar{x}^c(k) + \alpha_2(k) \tag{30c}$$

$$\hat{x}_s^c(k) = \hat{x}^c(k) + K_s(k)\left(\hat{x}_s^c(k+1) - \bar{x}^c(k+1)\right) \tag{30d}$$

$$\hat{y}_s^c(k) = C\hat{x}_s^c(k) \tag{30e}$$

where $u^c(k)$, $\hat{x}_s^c(k)$, $\hat{x}^c(k)$, $\bar{x}^c(k)$, $\hat{y}_s^c(k)$, and $\bar{y}^c(k)$ are, respectively, the converged sequences for the control law, the smoothed state estimate, the filtered state estimate, the predicted state estimate, the smoothed output, and the expected output. Here, $y(k)$ is the output measured from the real plant.

Applying the adjusted parameters $\alpha_1(k)$ and $\alpha_2(k)$, which are given by

$$\alpha_1(k) = f(z(k), v(k), k) - A z(k) - B v(k) \quad \text{and}$$

$$\alpha_2(k) = h(z(k), k) - C z(k),$$

to the model used given by Eqs. (30b) and (30c), the differences between the real plant and the model used can be measured at each iteration. Moreover, at the end of the iterations, Eqs. (29) and (30a)–(30e) yield

$$\hat{x}_s^c(k+1) = f(z(k), v(k), k) \quad \text{and} \quad \hat{y}_s^c(k) = h(z(k), k)$$

where $v(k) = u^c(k)$ and $z(k) = \hat{x}_s^c(k) = \hat{x}^c(k)$ are satisfied. Hence, this implies that

$$u^c(k) = u^*(k), \quad \hat{x}_s^c(k) = x^*(k), \quad \hat{y}_s^c(k) = y^*(k)$$

This completes the proof.

Corollary 4.1: The smoothed state error covariance is the smallest among the values of state error covariance.

Proof: From Eq. (6), it is clear that the filtered state error covariance $P(k)$ is less than the predicted state error covariance $M_x(k)$, that is, $P(k) < M_x(k)$. Now, to prove $P_s(k) < P(k)$, we shall show that $P_s(k+1) < M_x(k+1)$. Considering the boundary condition $P_s(N) = M_x(N)$ and taking $k = N-1$, we have

$$P_s(N-1) = P(N-1) < M_x(N-1).$$

For $k = N-2$, it follows that

$$P_s(N-2) < P(N-2) < M_x(N-2).$$

From this, it can be deduced that

$$P_s(k+1) - M_x(k+1) < 0 \quad \text{for } k = 0, 1, \ldots, N-2.$$

Thus, we conclude that

$$P_s(k) < P(k) < M_x(k), \quad k = 0, 1, \ldots, N-2,$$

which shows the accuracy of the smoothed state estimate. This completes the proof.
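The ordering claimed in Corollary 4.1 can be checked numerically in the scalar case. The sketch below specializes Eqs. (5b) and (6a) to (6d) to scalars; the function name `scalar_covariances` and all numerical values are assumptions chosen for illustration, so this is a sanity check on one example rather than a proof.

```python
def scalar_covariances(a, c, g, q_w, r_eta, m0, N):
    """Scalar versions of Eqs. (5b) and (6a)-(6d); returns lists P, Mx, Ps."""
    Mx, P = [m0], []
    for k in range(N + 1):
        my = c * Mx[k] * c + r_eta                        # (6d)
        P.append(Mx[k] - Mx[k] * c * c * Mx[k] / my)      # (6a)
        if k < N:
            Mx.append(a * P[k] * a + g * q_w * g)         # (6b)
    Ps = [0.0] * (N + 1)
    Ps[N] = Mx[N]                                         # boundary condition
    for k in range(N - 1, -1, -1):
        ks = P[k] * a / Mx[k + 1]                         # (5b)
        Ps[k] = P[k] + ks * (Ps[k + 1] - Mx[k + 1]) * ks  # (6c)
    return P, Mx, Ps

N = 10
P, Mx, Ps = scalar_covariances(a=1.0, c=1.0, g=1.0, q_w=1e-3,
                               r_eta=1e-3, m0=1e-2, N=N)
# Strict ordering is expected for k = 0..N-2; at k = N-1 the boundary
# condition Ps(N) = Mx(N) makes Ps(N-1) equal to P(N-1).
ordering_holds = all(Ps[k] < P[k] < Mx[k] for k in range(N - 1))
print(ordering_holds)
```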

## 5. Illustrative example

Consider a continuous stirred-tank reactor problem [19], which consists of the state difference equations

$$x_1(k+1) = x_1(k) - 0.02\left(x_1(k) + 0.25\right) + 0.01\left(x_2(k) + 0.5\right)\exp\left[\frac{25 x_1(k)}{x_1(k) + 2}\right] - 0.01\left(x_1(k) + 0.25\right)u(k)$$

$$x_2(k+1) = 0.99 x_2(k) - 0.005 - 0.01\left(x_2(k) + 0.5\right)\exp\left[\frac{25 x_1(k)}{x_1(k) + 2}\right] + \omega_2(k)$$

for $k = 0, \ldots, 77$, and the output measurement $y(k) = x_1(k) + \eta(k)$. The initial state $x(0) = x_0$ is a random vector with mean and covariance given, respectively, by $\bar{x}_1(0) = 0.05$, $\bar{x}_2(0) = 0$, and $M_0 = 10^{-2} I_2$.

Here, $\omega(k) = [\omega_1(k) \;\; \omega_2(k)]^{\mathrm T}$ and $\eta(k)$ are Gaussian white noise sequences with their respective covariances given by $Q_\omega = 10^{-3} I_2$ and $R_\eta = 10^{-3}$. The expected cost function

$$J_0(u) = 0.5\sum_{k=0}^{N-1} E\left[(x_1(k))^2 + (x_2(k))^2 + 0.1(u(k))^2\right]$$

is to be minimized over the state difference equations and the output measurement.

This problem is referred to as Problem (P).

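To obtain the optimal smoothing solution of Problem (P), we simplify the plant dynamics of Problem (P) and refer to the result as Problem (M), given by

$$\min_{u(k)} J_m(u) = \frac{1}{2}\sum_{k=0}^{N-1}\left[(\hat{x}_s(k))^2 + 0.1(u(k))^2 + 2\gamma(k)\right]$$

subject to

$$\begin{aligned}
\hat{x}_s(k) &= \hat{x}(k) + K_s(k)\left(\hat{x}_s(k+1) - \bar{x}(k+1)\right) \\
\hat{y}_s(k) &= C\hat{x}_s(k)
\end{aligned}$$

with

$$\hat{x}(k) = \bar{x}(k) + K_f(k)\left(y(k) - \bar{y}(k)\right)$$

$$\begin{bmatrix}\bar{x}_1(k+1) \\ \bar{x}_2(k+1)\end{bmatrix} = \begin{bmatrix}1.0895 & 0.0184 \\ -0.1095 & 0.9716\end{bmatrix}\begin{bmatrix}\hat{x}_1(k) \\ \hat{x}_2(k)\end{bmatrix} + \begin{bmatrix}-0.003 \\ 0.000\end{bmatrix}u(k) + \begin{bmatrix}\alpha_{11}(k) \\ \alpha_{12}(k)\end{bmatrix}$$

$$\bar{y}(k) = \bar{x}_1(k) + \alpha_2(k)$$

with the initial condition $\bar{x}(0) = \bar{x}_0$ and the boundary value $\hat{x}_s(N) = \bar{x}(N)$. Here, $\gamma(k)$, $\alpha_2(k)$ and $\alpha_1(k) = [\alpha_{11}(k) \;\; \alpha_{12}(k)]^{\mathrm T}$ are the adjusted parameters.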
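The model matrices above are consistent with linearizing the noise-free reactor equations at the initial mean state. The following sketch checks this numerically with central differences. The linearization point (x₁, x₂, u) = (0.05, 0, 0) is an assumption, since the chapter only states that A and B may be chosen through linearization, and the function names `cstr_step` and `jacobians` are hypothetical.

```python
import math

def cstr_step(x1, x2, u):
    """Noise-free part of the reactor difference equations."""
    e = math.exp(25.0 * x1 / (x1 + 2.0))
    x1_next = x1 - 0.02 * (x1 + 0.25) + 0.01 * (x2 + 0.5) * e - 0.01 * (x1 + 0.25) * u
    x2_next = 0.99 * x2 - 0.005 - 0.01 * (x2 + 0.5) * e
    return x1_next, x2_next

def jacobians(x1, x2, u, eps=1e-6):
    """Central-difference Jacobians of cstr_step w.r.t. state (A) and control (B)."""
    cols = []
    for d in ((eps, 0.0, 0.0), (0.0, eps, 0.0), (0.0, 0.0, eps)):
        fp = cstr_step(x1 + d[0], x2 + d[1], u + d[2])
        fm = cstr_step(x1 - d[0], x2 - d[1], u - d[2])
        cols.append([(p - m) / (2.0 * eps) for p, m in zip(fp, fm)])
    A = [[cols[0][0], cols[1][0]],
         [cols[0][1], cols[1][1]]]
    B = [cols[2][0], cols[2][1]]
    return A, B

# Assumed linearization point: initial mean state with zero control.
A, B = jacobians(0.05, 0.0, 0.0)
print([[round(a, 4) for a in row] for row in A], [round(b, 4) for b in B])
```

Under this assumption the computed entries agree with the state transition and control coefficient matrices of the model to four decimal places, including the negative signs of the (2, 1) entry of A and the first entry of B.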

| Model | Iteration number | Elapsed time | Initial cost | Final cost | Output residual |
|---|---|---|---|---|---|
| Filtering | 6 | 0.782772 | 3.7910 | 0.021271 | 0.034731 |
| Smoothing | 8 | 1.026919 | 3.5095 | 0.000734 | 0.018294 |

### Table 1.

Iteration result.

The iteration results for both the filtering and smoothing models are shown in Table 1. The final cost of the smoothing model is lower than the final cost of the filtering model. When the trace terms are included in the cost function, the total final cost of the smoothing model is 0.019188 unit, while the total final cost of the filtering model is 0.039725 unit. The value of the trace terms is 0.0185 unit. The output residual of the smoothing model is about 53% of the output residual of the filtering model (0.018294 versus 0.034731), so the approach proposed in this chapter reduces the output residual by nearly half.
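The figures quoted in this paragraph can be cross-checked against Table 1 with a few lines of arithmetic. The dictionary keys below are hypothetical labels; the numbers are those reported in the table and the text.

```python
# Values reported in Table 1 and in the accompanying text.
filtering = {"final_cost": 0.021271, "total_cost": 0.039725, "residual": 0.034731}
smoothing = {"final_cost": 0.000734, "total_cost": 0.019188, "residual": 0.018294}

# Smoothing output residual as a fraction of the filtering output residual.
residual_ratio = smoothing["residual"] / filtering["residual"]

# Trace contribution implied by each model: total final cost minus final cost.
trace_filtering = filtering["total_cost"] - filtering["final_cost"]
trace_smoothing = smoothing["total_cost"] - smoothing["final_cost"]

print(round(residual_ratio, 3), round(trace_filtering, 4), round(trace_smoothing, 4))
```

Both models imply the same trace contribution, which rounds to the 0.0185 unit quoted in the text, and the residual ratio is close to 0.53.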

To assess the accuracy of the resulting algorithm, the norms of the differences between the real plant and the model used at the end of the iterations are calculated; they are 0.0128 unit for the filtering model and 0.0099 unit for the smoothing model. These values show that the smoothing model approximates the correct optimal solution of the original optimal control problem more closely than the filtering model. Hence, the accuracy of the smoothing model is confirmed for this example.

The trajectories of the final control, final state, and final output for the filtering and smoothing models are shown in Figures 2–7. With the smallest output residual, the output associated with the smoothed state estimate is well suited to approximating the real output trajectory.

## 6. Concluding remarks

A fixed-interval smoothing scheme was modified in this chapter for solving the discrete-time nonlinear stochastic optimal control problem. The state estimation procedure, which uses Kalman filtering theory followed by fixed-interval smoothing, is applied to estimate the system dynamics. Then, the smoothed state estimate is used in designing the feedback optimal control law. By employing this smoothed state estimate, system optimization and parameter estimation are integrated. During the computation procedure, the differences between the real plant and the model used are calculated iteratively. In turn, the output measured from the real plant is fed back into the model used to update the iterative solution. Once convergence is achieved, the iterative solution approaches the correct optimal solution of the original optimal control problem, in spite of model-reality differences. An illustrative example on the optimal control of a continuous stirred-tank reactor problem was studied. The results obtained demonstrate the applicability and the efficiency of the proposed approach.

## Acknowledgments

The authors would like to thank the Universiti Tun Hussein Onn Malaysia (UTHM) for financially supporting this study under the Incentive Grant Scheme for Publication (IGSP) VOT. U417.

## How to cite and reference

### Cite this chapter

Sie Long Kek, Kok Lay Teo and Mohd Ismail Abd Aziz (October 19th 2016). Smoothing Solution for Discrete-Time Nonlinear Stochastic Optimal Control Problem with Model-Reality Differences, Nonlinear Systems - Design, Analysis, Estimation and Control, Dongbin Lee, Tim Burg and Christos Volos, IntechOpen, DOI: 10.5772/64564. Available from:
