In this chapter, the performance of the integrated optimal control and parameter estimation (IOCPE) algorithm is improved using a modified fixed-interval smoothing scheme for solving the discrete-time nonlinear stochastic optimal control problem. In our approach, a linear model-based optimal control problem, in which adjustable parameters are added into the model used, is solved iteratively. The aim is to obtain the optimal solution of the original optimal control problem. In the presence of random noise sequences in the process plant and the measurement channel, the state dynamics, estimated using the Kalman filtering theory, are smoothed over a fixed interval. With such a smoothed state estimate sequence, which reduces the output residual, the feedback optimal control law is then designed. During the computation procedure, the optimal solution of the modified model-based optimal control problem is updated at each iteration step. When convergence is achieved, the iterative solution approaches the true optimal solution of the original optimal control problem, in spite of model-reality differences. Moreover, the convergence of the resulting algorithm is given. For illustration, the optimal control of a continuous stirred-tank reactor problem is studied, and the results obtained show the efficiency of the approach proposed.
- fixed-interval smoothing
- Kalman filtering theory
- model-reality differences
- adjustable parameters
- iterative solution
The optimal control approach provides solutions to dynamic real-world practical problems. In particular, linear problems that are disturbed by random noise sequences have been well studied, with the optimal state estimate applied in designing the optimal feedback control law. In such a situation, the optimal state estimator and the optimal controller are designed separately to optimize and control the dynamical system. This is called the separation principle [1–4]. By virtue of this principle, research works on stochastic optimal control and its applications have grown widely; see, for example, linear systems [5, 6], the fleet composition problem, optimal parameter selection problems, the Markov jump process, power management, multiagent systems, the portfolio selection model, the 2-DOF vehicle model, the sensorimotor system, and the advertising model.
In fact, the exact solution of stochastic optimal control problems is, in general, impossible to obtain, especially for problems involving nonlinear system dynamics. To obtain an optimal solution of the discrete-time nonlinear stochastic optimal control problem, the integrated optimal control and parameter estimation (IOCPE) algorithm has been proposed to solve this kind of problem iteratively [16–18]. In this algorithm, the linear quadratic Gaussian (LQG) model is applied as a model-based optimal control problem, where the state estimation procedure is carried out using the Kalman filtering theory. Based on this model, adjustable parameters are added into the model so that system optimization and parameter estimation are integrated interactively. On this basis, the differences between the real plant and the model used are measured repeatedly in order to update the optimal solution of the model used. On the other hand, the output measured from the real plant is fed back into the model used for the state estimator design. When convergence is achieved, the iterative solution approaches the true optimal solution of the original optimal control problem despite model-reality differences. This optimal solution is the optimal filtering solution obtained using the IOCPE algorithm. The efficiency of the IOCPE algorithm has been demonstrated in Refs. [16–18].
However, the output trajectory of the model obtained from the IOCPE algorithm is less accurate in estimating the exact output measurement of the original optimal control problem. In this chapter, our aim is to improve the IOCPE algorithm using the fixed-interval smoothing approach, where the output residual is reduced to within an appropriate tolerance so as to generate a better output trajectory. In our model, the state dynamics, disturbed by Gaussian noise sequences, are estimated using the Kalman filtering theory and then smoothed over a fixed interval. With this state estimation procedure, we modify the estimation so that a smoothed state estimate is predicted backward in time and is used in designing the feedback optimal control law. It is noticed that the output residual of this smoothed state estimate is smaller than the output residual obtained using the Kalman filtering theory alone. The solution method discussed in this chapter is almost the same as that presented in the study of Kek et al., but the accuracy of the optimal solution with the modified fixed-interval smoothing is definitely increased.
The structure of the chapter is outlined as follows. In Section 2, a general discrete-time nonlinear stochastic optimal control problem and its simplified model-based optimal control problem are described. In Section 3, an expanded optimal control model is introduced, where system optimization and parameter estimation are integrated mutually. The feedback control law, which incorporates the Kalman filtering theory and the fixed-interval smoothing, is designed. Then, the iterative algorithm based on the principle of model-reality differences is derived so that the discrete-time nonlinear stochastic optimal control problem can be solved. In Section 4, a convergence result for the algorithm proposed is provided. In Section 5, an example of the optimal control of a continuous stirred-tank reactor problem is illustrated. Finally, some concluding remarks are made.
2. Problem description
Consider a general class of the dynamical system given below:

x(k + 1) = f(x(k), u(k), k) + Gω(k), y(k) = h(x(k), k) + η(k), k = 0, 1, …, N − 1,

where u(k) ∈ ℝ^m, x(k) ∈ ℝ^n, and y(k) ∈ ℝ^p are the control sequence, the state sequence, and the output sequence, respectively. The process noise sequence ω(k) and the measurement noise sequence η(k) are stationary Gaussian white noise sequences with zero mean, and their covariance matrices are given by Qω and Rη, respectively. Here, both of these covariance matrices are positive definite matrices. In addition, f represents the real plant and h is the real output measurement, both of which are assumed to be continuously differentiable with respect to their respective arguments, whereas G is a process coefficient matrix.
The initial state is

x(0) = x₀,

where x₀ is a random vector with mean and covariance given, respectively, by

E[x₀] = x̄₀ and E[(x₀ − x̄₀)(x₀ − x̄₀)ᵀ] = M₀.

Here, M₀ is a positive definite matrix and E[·] is the expectation operator. It is assumed that the initial state, the process noise, and the measurement noise are statistically independent.
Therefore, our aim is to find an admissible control sequence u(k), k = 0, 1, …, N − 1, subject to the dynamical system given in Eq. (1) such that the scalar cost function

J₀(u) = E[ φ(x(N), N) + Σ_{k=0}^{N−1} L(x(k), u(k), k) ]

is minimized, where φ is the terminal cost and L is the cost under summation. It is assumed that these functions are continuously differentiable with respect to their respective arguments.
This problem is regarded as the discrete-time nonlinear stochastic optimal control problem and is referred to as Problem (P).
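Since the expectation in the cost function cannot be evaluated in closed form for a nonlinear plant, a Monte Carlo average over simulated noise realizations gives a feel for Problem (P). The sketch below uses a hypothetical scalar plant f and quadratic terminal and stage costs standing in for the chapter's unspecified functions; none of these particular choices come from the chapter itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar nonlinear plant standing in for f of Problem (P);
# the chapter's actual f, h, and cost functions are not reproduced here.
def f(x, u, k):
    return 0.9 * x + 0.1 * np.sin(x) + 0.2 * u

def cost(xs, us):
    # terminal cost phi plus summed stage cost L (quadratic, for illustration)
    return xs[-1] ** 2 + sum(x ** 2 + 0.1 * u ** 2 for x, u in zip(xs[:-1], us))

def expected_cost(policy, N=20, runs=200, q=0.01):
    """Monte Carlo estimate of the expected cost J = E[phi + sum L]."""
    total = 0.0
    for _ in range(runs):
        x, xs, us = 0.5, [], []
        for k in range(N):
            u = policy(x, k)
            xs.append(x)
            us.append(u)
            x = f(x, u, k) + np.sqrt(q) * rng.standard_normal()  # process noise
        xs.append(x)
        total += cost(xs, us)
    return total / runs

# Evaluate a simple linear feedback policy under the assumed plant.
J = expected_cost(lambda x, k: -0.5 * x)
```

Averaging over many runs is only a sanity check here; the chapter's point is precisely that such direct evaluation does not by itself yield the optimal control law.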
Notice that, in general, the exact solution of Problem (P) cannot be obtained, and estimating the state of the real plant by applying the nonlinear filtering theory is computationally demanding. For these reasons, a smoothing model-based optimal control problem, which is referred to as Problem (M), is proposed as
with the following state estimation procedure
where x̂(k|N) and ŷ(k|N) are, respectively, the smoothed state sequence and the smoothed output sequence. The matrices involved are given as follows: A is an n × n state transition matrix, B is an n × m control coefficient matrix, and C is a p × n output coefficient matrix.
The state estimation procedure, which is given in (4a), (4b), and (4c), follows from the Kalman filtering theory, where x̂(k|k) and x̂(k|k − 1) are, respectively, the filtered state sequence and the predicted state sequence, whereas ŷ(k|k − 1) is the expected output sequence. The filter and smoother gains, K_f(k) and K_s(k), are, respectively, given by
whereas the state error covariance matrices are
and the output error covariance matrix is
with the boundary conditions x̂(0|−1) = x̄₀ and P(0|−1) = M₀. The filtered state error covariance P(k|k), the predicted state error covariance P(k|k − 1), the smoothed state error covariance P(k|N), and the output error covariance are positive definite matrices.
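The forward Kalman filter and backward fixed-interval smoother described by Eqs. (4)–(6) can be sketched under the standard Rauch-Tung-Striebel form to which the gain and covariance recursions above correspond; the function and variable names below are our own, not the chapter's.

```python
import numpy as np

def kalman_rts(A, C, Q, R, M0, x0, ys):
    """Forward Kalman filter followed by fixed-interval (RTS) smoothing."""
    n, N = A.shape[0], len(ys)
    xp = np.zeros((N, n)); Pp = np.zeros((N, n, n))  # predicted x(k|k-1), P(k|k-1)
    xf = np.zeros((N, n)); Pf = np.zeros((N, n, n))  # filtered  x(k|k),   P(k|k)
    x, P = x0, M0
    for k in range(N):
        xp[k], Pp[k] = x, P
        S = C @ P @ C.T + R                          # output error covariance
        Kf = P @ C.T @ np.linalg.inv(S)              # filter gain
        x = x + Kf @ (ys[k] - C @ x)                 # measurement update
        P = (np.eye(n) - Kf @ C) @ P
        xf[k], Pf[k] = x, P
        x, P = A @ x, A @ P @ A.T + Q                # time update
    xs, Ps = xf.copy(), Pf.copy()                    # smoothed x(k|N), P(k|N)
    for k in range(N - 2, -1, -1):
        Ks = Pf[k] @ A.T @ np.linalg.inv(Pp[k + 1])  # smoother gain
        xs[k] = xf[k] + Ks @ (xs[k + 1] - xp[k + 1])
        Ps[k] = Pf[k] + Ks @ (Ps[k + 1] - Pp[k + 1]) @ Ks.T
    return xf, Pf, xs, Ps

# Usage: scalar random walk observed in noise (illustrative values).
rng = np.random.default_rng(1)
x_true, ys = 0.0, []
for _ in range(50):
    x_true = x_true + 0.1 * rng.standard_normal()
    ys.append(np.array([x_true + 0.3 * rng.standard_normal()]))
A = np.array([[1.0]]); C = np.array([[1.0]])
Q = np.array([[0.01]]); R = np.array([[0.09]])
xf, Pf, xs, Ps = kalman_rts(A, C, Q, R, np.array([[1.0]]), np.zeros(1), ys)
```

Running the forward pass first and then sweeping backward is exactly the structure exploited in the chapter: the smoothed covariance never exceeds the filtered one at any interior time step.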
Following this simplification, the trace matrix terms that depend on the state error covariance matrices are ignored in the model used, since they are constant values. In this way, the cost function of the linear model-based optimal control model can be evaluated.
Notice that the separation principle [1–4] is applied to solve Problem (M), where the optimal feedback control law and the optimal state estimate are designed separately, as discussed in [16–18]. Furthermore, the accuracy of the optimal state estimate is increased by smoothing the state estimate over the fixed interval [2, 4]. Then, based on this smoothed state estimate, the smoothing optimal control law is designed. On the other hand, the output measured from the real plant is fed back into the model used, in turn, to improve the state estimation procedure and to update the solution of the model used. Moreover, solving Problem (M) without adding the adjustable parameters into the model used would not approximate the optimal solution of Problem (P). Hence, by taking the adjustable parameters into the model used and solving Problem (M) iteratively, the true optimal solution of the original optimal control problem can be obtained, in spite of model-reality differences.
3. Modified smoothing with model-reality differences
Now, let us introduce an expanded optimal control problem with smoothing state estimate, which is referred to as Problem (E), given below:
where the additional variables are introduced to separate the control and the smoothed state from the respective signals in the parameter estimation problem, and ‖·‖ denotes the usual Euclidean norm. The terms weighted by r₁ and r₂ are introduced so that the convexity is improved and the convergence of the iterative algorithm is enhanced. The main purpose of designing the algorithm in this way is to ensure that the separation constraints are satisfied at the end of the iterations. More specifically, applying the state estimate and the control for the computation in the parameter estimation and matching schemes increases the practical usage of the algorithm. Moreover, implementing the relevant smoothed state and control, which are reserved for optimizing the model-based optimal control problem, leads the iterative solution toward the true optimal solution of the original optimal control problem.
Figure 1 shows the block diagram of the approach proposed. The methodology of the approach proposed is further discussed in the following sections.
From the block diagram in Figure 1, the definition of the principle of model-reality differences could be given.
Definition 3.1: The principle of model-reality differences is a unified framework that integrates system optimization and parameter estimation interactively to define an expanded optimal control problem, and that aims to give the true optimal solution of the original optimal control problem by solving the model-based optimal control problem iteratively.
3.1. Optimality conditions
Define the Hamiltonian function for Problem (E) as follows:
Then, the augmented cost function becomes
where the terms introduced are the appropriate multipliers, whose values are to be determined later.
(a) Stationary condition:
(b) Smoothed costate equation:
(c) Smoothed state equation:
with the boundary conditions and
(d) Adjustable parameter equations:
(e) Multiplier equations:
with and .
(f) Separable variables:
In view of these necessary optimality conditions, the conditions (10a), (10b), and (10c) define the modified model-based optimal control problem, the conditions (11a), (11b), (11c), and (11d) define the parameter estimation problem and the conditions (12a), (12b), and (12c) are used to compute the multipliers. They are further discussed as follows.
3.2. Modified model-based optimal control problem
The modified model-based optimal control problem, which is referred to as Problem (MM), is given below:
From the formulation of Problem (E) and Problem (MM), the theorem for the smoothed optimal control law that is applied to solve Problem (MM) is now stated.
Theorem 3.1: Suppose the expanded optimal control law for Problem (E) exists. Then, this control law is the smoothed feedback control law for Problem (MM) given by
with the given boundary conditions, and
Proof: From the necessary optimality condition (10a), we have
Rewrite the smoothed state equation from Eq. (10c),
and the smoothed output is measured from
with the boundary condition .
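Theorem 3.1 builds the smoothed feedback law on a finite-horizon Riccati-type recursion. As a minimal, hedged sketch of that ingredient, the standard discrete-time LQR backward recursion for a quadratic stage cost xᵀQx + uᵀRu is shown below; the chapter's actual law additionally involves the smoothed state estimate and the adjustable parameters, which are omitted here.

```python
import numpy as np

def lqr_gains(A, B, Q, R, S_N, N):
    """Backward Riccati recursion for the finite-horizon discrete-time LQR.
    Returns feedback gains K(k) such that u(k) = -K(k) x(k), k = 0..N-1."""
    S, Ks = S_N, []
    for _ in range(N):
        # K = (R + B' S B)^{-1} B' S A
        K = np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)
        # Riccati update: S = Q + A' S (A - B K)
        S = Q + A.T @ S @ (A - B @ K)
        Ks.append(K)
    return Ks[::-1]  # reorder so that Ks[k] is the gain at time k

# Usage: scalar system with A = B = Q = R = 1; the steady-state gain
# for this case is 1/golden-ratio, approximately 0.618.
A = np.array([[1.0]]); B = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[1.0]])
Ks = lqr_gains(A, B, Q, R, Q, 50)
```

In the chapter's scheme, a recursion of this shape is solved off-line once; the gains are then reused at every iteration of the algorithm while only the adjustable parameters change.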
3.3. Parameter estimation
After solving Problem (MM), the separable variables defined in Eq. (13) are used for the further computations. In particular, in the parameter estimation problem, the differences between the real plant and the model used are taken into account, through which the matching schemes are established. In view of this, the adjustable parameters, which result from the parameter estimation problem defined by Eq. (11), are calculated from
3.4. Computation of multipliers
The multipliers, which are related to the Jacobian matrices of the functions f and L with respect to the control and the smoothed state, are computed from
3.5. Iterative algorithm
In the previous sections, the derivation of the equations and the formulation of the resulting algorithm were discussed in detail. Following these discussions, the iterative algorithm is summarized as follows:
Data: Note that A and B may be chosen through the linearization of f, and C is obtained from the linearization of h.
Step 0: Compute a nominal solution. Assume and Calculate and from Eqs. (5a) and (5b), and from Eqs. (6a), (6b), (6c), and (6d) for the state estimation, and solve Problem (M) defined by Eq. (3) to obtain and Then, with and from data, calculate and , respectively, from Eqs. (16b) and (16c). Set and
Step 1: Calculate the adjustable parameters from Eq. (25). This is called the parameter estimation step.
Step 2: Compute the modifiers from Eq. (26). This requires the partial derivatives of f and L with respect to the control and the smoothed state.
Step 3: With the determined , and solve Problem (MM) defined by Eq. (14) using the result in Theorem 3.1. This is called the system optimization step.
Step 4: Update the optimal smoothing solution of Problem (P) and test the convergence of the algorithm. To regulate convergence, a mechanism based on a simple relaxation method is provided, given by:
where kv, kz, and kp, ranging in the interval (0, 1], are scalar gains. If the convergence conditions are satisfied within a given tolerance, stop; else repeat from Step 1 with the iteration index incremented.
The off-line computation mentioned in Step 0 is done for the state estimator design, where the filter and smoother gains are computed, and for the control law design. In fact, these parameters are used for solving Problem (M) in Step 0 and for solving Problem (MM) in Step 3, respectively.
The variables computed in Step 1, Step 2, and Step 3 are initially zero in Step 0; their computed values change from iteration to iteration.
The state estimation, without the control applied, is done forward in time using the Kalman filter; it is then followed by the fixed-interval smoothing backward in time in order to design the feedback control law.
Problem (P) is not required to have a cost function with a quadratic criterion or to be a linear problem.
The separation constraints are required to be satisfied by the converged state estimate sequence and the converged optimal control sequence. From this point of view, the following averaged 2-norms are computed and then compared with a given tolerance to verify the convergence of the state estimate and the control:
The relaxation scalars (kv, kz, kp) are the step sizes regulating the convergence mechanism. These scalars can normally be chosen as a certain value in the range (0, 1], but this choice may not provide the optimal number of iterations. Hence, it is important to note that the optimal choice of these scalars kv, kz, kp ∈ (0, 1] is problem dependent. As a rule of thumb, the algorithm (from Step 1 to Step 4) is required to run a few times. Initially, for the first run of the algorithm (from Step 1 to Step 4), these scalars are set at kv = kz = kp = 1; then, with different values chosen from 0.1 to 0.9, the algorithm is run again. The value giving the optimal number of iterations can be determined after that. The parameters r1 and r2 are applied to enhance the convexity so that the convergence of the algorithm can be improved.
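The relaxation update of Eq. (27) and the averaged 2-norm convergence test can be sketched as follows; the helper names `relax` and `averaged_2norm` are hypothetical, chosen for illustration.

```python
import numpy as np

def relax(old, new, gain):
    """One relaxation step, old + gain * (new - old), with gain in (0, 1]."""
    return old + gain * (new - old)

def averaged_2norm(seq_a, seq_b):
    """Averaged 2-norm between two sequences of vectors."""
    return np.mean([np.linalg.norm(a - b) for a, b in zip(seq_a, seq_b)])

# Usage: drive a control iterate v toward a fixed target sequence with
# gain kv = 0.5 (illustrative), then test convergence against a tolerance.
target = [np.array([1.0, -1.0])] * 5
v = [np.zeros(2)] * 5
for _ in range(20):
    v = [relax(vk, tk, 0.5) for vk, tk in zip(v, target)]
tol = 1e-3
converged = averaged_2norm(v, target) < tol
```

With a gain of 1 the update jumps directly to the newly computed iterate; smaller gains damp the update, which is exactly the trade-off between stability and iteration count discussed above.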
4. Convergence analysis
In this section, the convergence of the algorithm is discussed. The following assumptions are needed:
The derivatives of f and h exist.
The solution is the optimal solution to Problem (P). That is, the optimal smoothing solution.
The convergence result is presented in Theorem 4.1, while the accuracy of the smoothed state in terms of the state error covariance is proven in Corollary 4.1.
Theorem 4.1: The converged solution of Problem (M) is the correct optimal smoothing solution of Problem (P).
Proof: Consider the real plant and the output measurement of Problem (P) with the exact optimal smoothing solution as given below:
In Problem (M), the model used consists of
where , and are, respectively, the converged sequences for control law, smoothed state estimate, filtered state estimate, expected state estimate, smoothed output, and expected output. Here, is the output measured from the real plant.
Applying the adjusted parameters and , which are given by
into the model used given by Eqs. (30b) and (30c), the differences between the real plant and the model used can be measured at each iteration. Moreover, at the end of the iterations, Eqs. (29) and (30a)–(30e) yield
expressions in which the separation constraints are satisfied. Hence, this implies that
This completes the proof.
Corollary 4.1: The smoothed state error covariance is the smallest among the values of state error covariance.
Proof: From Eq. (6), it is clear that the filtered state error covariance P(k|k) is less than the predicted state error covariance P(k|k − 1); that is, P(k|k) ≤ P(k|k − 1). Now, to prove that the smoothed state error covariance is the smallest, we shall show that P(k|N) ≤ P(k|k). Considering the boundary condition P(N|N) and taking k = N − 1, we have
For k = N − 2, …, 0, it shows that
From this statement, it can be deduced that
Thus, we conclude that P(k|N) ≤ P(k|k) ≤ P(k|k − 1),
which shows the accuracy of the smoothed state estimate. This completes the proof.
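The ordering P(k|N) ≤ P(k|k) ≤ P(k|k − 1) established in Corollary 4.1 can be observed numerically with the scalar Kalman variance recursions; the coefficients below are illustrative assumptions, not taken from the chapter.

```python
# Scalar illustration of the covariance ordering in Corollary 4.1, using
# assumed values a = 1, c = 1, q = 0.01, r = 0.1 for the model coefficients
# and noise variances.
a, c, q, r = 1.0, 1.0, 0.01, 0.1
N = 30
Pp, Pf = [1.0], []  # predicted / filtered variances; P(0|-1) = 1 assumed
for k in range(N):
    pf = Pp[k] - Pp[k] * c * c * Pp[k] / (c * c * Pp[k] + r)  # measurement update
    Pf.append(pf)
    Pp.append(a * a * pf + q)                                 # time update
Ps = [0.0] * N
Ps[-1] = Pf[-1]                                               # boundary P(N|N)
for k in range(N - 2, -1, -1):
    Ks = Pf[k] * a / Pp[k + 1]                                # smoother gain
    Ps[k] = Pf[k] + Ks * (Ps[k + 1] - Pp[k + 1]) * Ks         # backward sweep
```

Since the smoothed update subtracts a nonnegative quantity from the filtered variance at every interior step, the three sequences satisfy the stated inequality chain term by term.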
5. Illustrative example
Consider a continuous stirred-tank reactor problem, which consists of the state difference equations
for and the output measurement . The initial state is a random vector with mean and covariance given, respectively, by and
Here, and are Gaussian white noise sequences with their respective covariance given by and . The expected cost function
is to be minimized over the state difference equations and the output measurement.
This problem is referred to as Problem (P).
To obtain the optimal smoothing solution of Problem (P), we simplify the plant dynamics of Problem (P) and refer to it as Problem (M), given by
with the initial condition and the boundary value given. Here, the added terms are the adjustable parameters.
| Model | Iteration number | Elapsed time | Initial cost | Final cost | Output residual |
The iteration results for both the filtering and smoothing models are shown in Table 1. The final cost of the smoothing model is lower than the final cost of the filtering model. When the trace matrix terms are considered in the cost function, the total final cost of the smoothing model is 0.019188 unit, while the total final cost of the filtering model is 0.039725 unit; the value of the trace matrix terms is 0.0185 unit. It is noticed that the output residual is reduced to almost 52% of the filtering output residual when the approach proposed in this chapter is used. This statement is valid since the output residual of the smoothing model is less than the output residual of the filtering model.
To identify the accuracy of the resulting algorithm, the norms of the differences between the real plant and the model used at the end of the iterations, which are 0.0128 unit for the filtering model and 0.0099 unit for the smoothing model, are calculated. These values show that the smoothing model approximates the true optimal solution of the original optimal control problem more closely than the filtering model does. Hence, the accuracy of the smoothing model is verified.
The trajectories of the final control, final state, and final output for the filtering and smoothing models are shown in Figures 2–7. With the smallest output residual, the output associated with the smoothed state estimate is well suited to track the real output trajectory.
6. Concluding remarks
A fixed-interval smoothing scheme was modified in this chapter for solving the discrete-time nonlinear stochastic optimal control problem. The state estimation procedure, which uses the Kalman filtering theory followed by the fixed-interval smoothing, is applied to estimate the system dynamics. Then, the smoothed state estimate is used in designing the feedback optimal control law. By employing this smoothed state estimate, system optimization and parameter estimation are integrated. During the computation procedure, the differences between the real plant and the model used are calculated iteratively. On the other hand, the output measured from the real plant is fed back into the model used, which in turn updates the iterative solution. Once convergence is achieved, the iterative solution approaches the true optimal solution of the original optimal control problem, in spite of model-reality differences. The illustrative example of the optimal control of the continuous stirred-tank reactor problem was studied, and the results obtained demonstrate the applicability and efficiency of the approach proposed.
The authors would like to thank the Universiti Tun Hussein Onn Malaysia (UTHM) for financially supporting this study under the Incentive Grant Scheme for Publication (IGSP) VOT. U417.