Dynamic Process Model Parameter Estimation by Global System Analysis Dynamic Process Model Parameter Estimation by Global System Analysis

Global system analysis (GSA) was applied to parameter estimation of dynamic process models. First, the posterior distribution of the model parameters was estimated by quasi-Monte Carlo (QMC) simulations or uncertainty analysis. The expected variance of the estimated parameters by GSA was in general smaller than those were obtained by local search for the maximum likelihood. Second, sensitivity analysis was performed as an alternative application of GSA for the same mathematical models and testing data. The total effect index should serve as a quantitative measure of the robustness of each esti- mated parameter. Two process models were studied to demonstrate effectiveness of the proposed methodology based on GSA: a bio-reactor and a catalytic reactor. Parallelised computation allowed for sampling as many as 500,000 combinations of the model parameters in reasonable amount of time.


Introduction
We try to interpret, understand, and extract knowledge by analysing the data obtained from measurements, experiments and recording the behaviour of various processes that we encounter in the industries and real life.
Parameter estimation of dynamic process models is investigated in this study. To validate a mathematical process model, we need to match the model prediction with the measurement or the experiment data.
We assume that we can construct a certain mathematical or a first principle model to represent a chemical or biochemical process by a set of algebraic (either linear or nonlinear) and differential (either ordinary or partial) equations, that is, differential algebraic equations (DAEs).
The Bayesian statistics provides an ideal framework for the estimation of the mathematical parameters of predictive models, which is inherently stochastic in that all models make imperfect predictions because of the process variance, observation errors, model selection uncertainties, and so on.
The uncertainty of the estimated parameters must be evaluated in terms of accuracy or posterior distribution in the Bayesian statistics context.
Although maximum likelihood estimation (MLE) has been the standard tool for solving the parameter estimation problems, some shortcomings have been recognised in terms of its properties and the conventional solution algorithms for the MLE (see [1]): 1. Gradient-based methods (GBM) may miss the global optimum or true MLE because of the nature of such solvers that are based on local search mechanisms.
2. The confidence region for the parameters can be very large when the curvature of the likelihood function at an optimal point is extrapolated without knowing the global distribution of the likelihood function around the point estimate.
3. If the model equations contain singularities or structural redundancy, correlations among the estimated parameter may be observed.
Global system analysis (GSA) is designed to provide advanced features compared with the conventional analytical methods such as GBM and MLE. GSA is not precisely the same theoretical framework as Bayesian method; however, we can apply GSA as a version of computational Bayesian methodology to estimate model parameters.
First, GSA is a global search procedure as opposed to GBM such as Newton's method where we must start the search with some initial guess so that the solution can be trapped by a local optimal point that is not necessarily the global optimum.
Second, the Bayesian Cramer-Rao (BCR) bounds offer tighter lower bounds than the conventional Cramer-Rao lower bound that is based on Fisher information. GSA can calculate BCR with assumption of a prior that can be uniform (non-informative) or any probability distribution such as Gaussian.
Third, the posterior distribution of the estimated model parameters obtained by GSA correctly reflects the non-linearity of the process. Some process kinetics equations such as Arrhenius equation present structural correlation, for example, between pre-exponent factors and activation energies. Even in such a case, GSA can correctly capture the interdependency among the parameters so that the posterior distribution of the parameters accuracy is evaluated properly.
Since GSA algorithm is deterministic, one can decide the required number of samples prior to numerical simulation depending on the required accuracy or convergence. The computation can also be parallelised so that the actual CPU time can be significantly accelerated. This is a clear advantage over Markov Chain Monte Carlo (MCMC) sampling where we need to monitor the convergence to make sure the sample distribution approached the intended convergence criterion.
GSA is applied to parameter estimation taking the likelihood function as the response to the factors, which are in fact model parameters that are considered stochastic variables in the Bayesian statistics framework. The parameter search space may be either non-informative or informative in GSA: low discrepancy sequence (LDS) such as Sobol sequence is considered non-informative; or normal (Gaussian) distribution may also be assumed to perform sampling in the multidimensional parameter space (more distribution functions such as gamma or inverse-gamma will be available in the GSA tool that was used for this study).
Posterior of the model parameters can be calculated simply by integrating the sampled likelihood (response) over the entire parameter space (factors). We can then calculate posterior mean and variance as well as maximum a posterior (MAP) estimate.
Sensitivity analysis can also be performed by sampling in the hyper-dimensional space just like in the case of uncertainty analysis. The factor sensitivity table gives information about the influence of each parameter on the response, which is the likelihood function of the model postulated for the investigation.
If the process model contains some singularity or correlations among the parameters, the parameter estimation work gets very difficult, if not theoretically impossible. Although a number of different procedures have been proposed for re-parameterisation of the process models, since such manipulations are not essential in the physical significance of the processes of interest, the estimated parameters that are reformulated would not be very useful to understand the fundamental principles of the actual processes.
Since GSA is computational resource intensive, parallelisation should be considered to make the analysis tractable. Recent advance in the software technology made it possible to run relatively large GSA with as many as 500,000 samples executed even on a common laptop PC with a multi-core CPU 2. Parameter estimation by global system analysis GSA helps to investigate the global behaviour of any system models. In contrast to simulation, in which specific values are assigned to system inputs and the values of key outputs are reported, in GSA, it is possible to specify a range of input values of interest and therefore obtain a range of outputs.
GSA can perform three types of analysis: • Scenario or parametric • Uncertainty • Sensitivity The purpose of each type of analysis differs widely as follows: • Scenario or parametric sensitivity analysis is a series of simulations where the values of one or more factors are varied over a grid to investigate the resulting changes in one or more responses. It allows an assessment of the impact of different factors on each response.
• Uncertainty analysis propagates the uncertainty in the factors to the responses. The values returned represent the actual uncertainty associated with the responses.
• Sensitivity analysis provides metrics which indicate how factors and their uncertainty influence responses. However, the values obtained do not represent actual sensitivities.
When the likelihood function is taken as the objective or the only output from the system whereas the model parameters are given as inputs, GSA can be utilised as a global search and uncertainty evaluation for the model parameter estimation. The uncertainty analysis was used to identify the parameter point estimates and confidence intervals. In addition, the sensitivity analysis was performed to quantify the influence of each parameter on the output or prediction of the process models that are investigated.

Parameter estimation
Bayesian approach to the model parameter estimation can be presented as Eq. 1.
where p θ ð Þ designates a prior distribution of the model parameters, L yjθ ð Þ is the likelihood with which one observes response y b y ijk , and π θjy ð Þ is the posterior.
When certain a priori knowledge is available about the process, it can be incorporated in the Bayesian framework as a prior. In this study, however, we assumed a flat or homogeneous prior over the parameter search space.
The MAP estimate is defined by the parameter that maximises the posterior probability given as Eq. 2.
The system responses are parameterised by a set of parameters in the following equation, where θ * denotes true parameters that characterise the system. We assume additive errors that are i.i.d, that is, independent and identically distributed following a normal distribution.
The probability f with which one observes measurement b y ijk is formulated as Eq. 4. ijk , variance of the k-th measurement of variable j in experiment i; θ ¼ θ θ 1 ; ⋯; θ q À Á , model parameters to be estimated; and σ 2 ijk , variance of the k-th measurement of variable j in experiment i.
The MLE is defined by the parameter that minimises the log-likelihood function:

Global system analysis
GSA is based on drawing samples from the system under study; that is, taking different values for each factor and calculating the corresponding response values. One of the features and basic principles underlying GSA is that extracting random samples from a system permits the estimation of its mean value Considering the atypical surface shown in Figure 1 below on the left, Figure 2 on the right indicates what its average value will be for a number of random sample points. The dispersion of the values is due to the fact that this process is repeated 100 times for each number of samples, each time generating a different random set of sample points. This is indicative of how accurate the results will be for different numbers of samples It is clear that by increasing the number of points, values gradually become closer to the true mean value of the surface. Increasing the number of sample points will generally lead to more accurate results.

Uncertainty analysis
An uncertainty analysis can be used to determine what effect the uncertainty in the factors has on the uncertainty in the responses. This type of analysis can be accomplished using the Monte  Pseudo-random sampling generates a more realistic random sample, that is, resembling a sample drawn from a distribution. This method should be used to represent realistic sampling, for example, a clinical trial in which 200 people are sampled from the general population.
Quasi-random (Sobol) sampling ( Figure 3) allows for better coverage of space compared to pseudo-random sampling ( Figure 4). Sobol sampling is recommended unless a more realistic random behaviour is required.

Sensitivity analysis
Sensitivity analysis can be used to determine which factor contributes the most to the uncertainty in a response. In GSA, the calculated sensitivity measures are sensitivity indices rather than mathematical sensitivities (derivatives). For example, consider the following equation where all variables are algebraic and both x 1 and x 2 are assumed to have values between 0 and 10. Their mathematical sensitivities are given by: where y, x i , algebraic variables; and ∂y ∂xi , partial derivative of y with respect to x i .
These mathematical sensitivities are informative at the point where they are computed, but are limited when the factors are uncertain, and the model is of unknown linearity, as they do not provide for an exploration of the rest of the input factor space. In contrast, GSA calculates sensitivity measures that are averages over the entire space of interest. Two different indices are available: elementary effects and variance-based indices, which provide relative metrics.
In both cases, indices for the first order and total effect are calculated. According to Sobol [2,3], any function expressed either as follows: where f , function; n f , number of factors; u i , factor i; and y, response.
or as a combination of different functions: where f i , first-order function; f ij , second-order function; and f 1, 2, …, n f , n f th order function.
The first-order effect of factor u i is f i . In contrast, the total effect is the sum of all functions of all orders where u i appears as an argument Elementary effects are an efficient method for larger, more complex models with a large number of factors. This method requires a limited number of samples. However, it is not as accurate as the variance-based method and should only be used to identify the most important factors.
For elementary effects, the sensitivity indices are calculated based on the following approximation.
where EE i, j , elementary effect for factor i for sample j; u i , factor i; y j , response of sample j; and Δ, difference.
The mean values are calculated as follows: where n, number of samples; and μ i , mean elementary effect for factor i.
Although this appears to be an approximation of the mean partial derivative, it is typically not an accurate estimate of the actual mean sensitivity. Among other differences, the elementary effect also considers the variance in each factor. For example, considering the following equation y = x 1 + x 2 , the value of the mathematical sensitivity for both x 1 and x 2 is one, but if the variance in x 1 is larger than x 2 , then μ 1 > μ 2 .
The elementary effects method also calculates the standard deviation as follows.
where σ i is the standard deviation of elementary effect for factor i.
The advantage of elementary effects analysis is that it is a numerically efficient way to screen many factors at the expense of accuracy. It has been shown in practice that the relative value of the sensitivity is fairly accurate, so values sorted from high to low do typically follow the order of the most influential to the least influential factor.
Variance-based methods are more accurate but more computationally demanding and require (many) more samples than the elementary effects method. The variance-based method was applied to the following numerical examples.
The metrics used are based on variances and are indicative of what part of the variance, in the response, is attributed to the variance in each factor. Note that the relative contribution of the factors to the variance in the response can be sorted differently for each response.
This method distinguishes between first order and total effect indices. Generally, one can split any function.
Variance-based sensitivity indices measure the influence of individual factors on responses.
The variance-based sensitivity analysis method is based on the Saltelli's method [4], and the formulae used to estimate the sensitivity indices are those proposed in [5]. There are two types of variance-based sensitivity indices (see [2,3]) • The first-order effect, that is, the direct effect of factor i on response y: where S i , first-order effect index for factor i; σ 2 i, y , variance attributed to factor i on response y; and σ 2 y , variance observed for response y.
• The total effect, that is, the total effect of factor i on response, y: where S i, T , total effect index for factor i; and σ 2 i, T, y , variance attributed to factor i on response y through all factors.
In this case, the influence of factor i through other variables is taken into consideration.
In variance-based sensitivity analysis, the sensitivity indices are based on variances. Note that the relationships below can be inferred from the definitions of the factors: where S j, i , first-order effect index for factor i in response j; and S j, i, T , total effect index for factor i in response j.

It is clear that mathematical sensitivities and sensitivity indices have very different meanings.
For simplicity, consider the sensitivity index as a normalised metric which indicates what part of the variance in the response can be attributed to which factor.

Numerical case studies
gPROMS ® , a general-purpose DAE solver for process modelling, was used to compute the likelihood function values based on the measurement data with some artificial errors for the demonstration purposes. It was also used to compute the MLE (by GBM) to set a reference for comparison with the results by the proposed method, GSA-based inference.

Case study 'A': bioreactor
A benchmark problem in genome research in [6] is reproduced. They applied bootstrapping to minimise the expected variance of the estimated parameters.

Problem statement
In silico experimental data for an organism in a chemostat as shown in Figure 5 is presented. A mathematical model is set up based on a fictive network structure.
After reaching a steady state, the flow rates q in , q out as well as the concentration of the substrate in the feed c in are changed. Measurements are available for three metabolites, M 1 , M 2 , and M 3 that function as an enzyme E, representing a small biochemical network of the organism, and for biomass B and substrate S ( Table 1).
Those parameters as listed in Table 2 are assumed known prior to the experiment.
Since number of such experiments that can be performed is rather limited in reality, GSA or a Bayesian approach is anticipated to perform better than MLE or minimum mean square error (MMSE) methods in that the latter tends to provide overfitting of the parameters in terms of  the observation or measured data, whereas the former should calculate more robust estimate of parameters that are integrated over the posterior distribution.
The feed flow rate and the substrate concentration in feed are changed as shown in Table 3 and Figure 6.

Numerical study
GSA-based parameter estimation was attempted as well as the gradient-based method or local search as the benchmark. In Figures 7-11, each dot in blue shows an individual measurement       in dynamic experiments. The error bar on each dot signifies standard deviation of the measurement data (linear variance model). The solid lines in red are MLE estimation by the GBM solver (gPROMS ® ).
A constant linear variance model is assumed for each point of the measurement data where ω = 0.02.
GSA sampling was performed by Sobol sequence involving four parameters Y X/S , k 2 , k synmax , K IB . A total of 500,000 sample points were generated in terms of combinations of these four parameters. Marginal posterior for each parameter is plotted in Figures 12-15. Both Y X/S and k 2 are seen to follow rather smooth distributions as shown in Figures 12 and 13 respectively. k synmax also seems to follow a nice and smooth distribution, but with a few outliers scattering here and there as shown in Figure 14.
Note that those variance or confidence intervals for Y X/S , k 2 , k synmax by GSA are smaller than what was obtained by local search or the conventional GBM.
K IB on the other hand presents totally different picture (Figure 15), that is, the model is almost insensitive to the variation of this parameter. We can actually confirm this by running sensitivity analysis separately as summarised in Table 4.
An important finding here is that the local search was trapped by the local minimum, but it is not necessarily a robust estimate because of the spread as shown in Figure 15.

Case study 'B': catalytic synthesis of methanol from CO and H 2
A laboratory integral reactor with fixed bed catalyst was studied. Model parameter estimation was attempted by the conventional maximum likelihood approach as well as the newly proposed GSA methodology. Pseudo experiment data were prepared by introducing artificial measurement errors.

Problem statement
The following main reaction can be accomplished by several side and consecutive reactions (see [7]). A large number of kinetic expressions have been proposed in the literature [8]. We assumed the rate determining step (29) as the dominant reaction mechanism.
where * denotes component as adsorbed state; S is an adsorbed site.
A Langmuir-Hinshelwood type kinetics as described by Eq. 13 was assumed for the study.    Dynamic Process Model Parameter Estimation by Global System Analysis http://dx.doi.org/10.5772/intechopen.74635 where C i , molar concentration of component i, (mol/m 3 ); E j , activation energy of jth reaction, (J/mol); k j , reaction pre-exponent factor of jth reaction, (1/sec); r j , reaction rate of jth reaction, where, z is the measured data and ω (=0.02) is the standard deviation of the linear variance model.
The reactor with a fixed catalyst bed was operated under three different temperatures (475, 500, 525, 550 K). Two different initial conditions were assumed, either being filled with the feed  gas compositions or an empty/inert state. There were eight sets of experiment data for the study. Feed rates were controlled as shown in Figure 22.
We can observe that the GSA or Bayesian approach correctly captures the process kinetics as described by Arrhenius equations that contains some interdependency among the activation energies and the pre-exponent factors. Figures 23 and 24 show measurement data in a dynamic experiment (designated by the blue dots in the plots). Notice that the vertical bar attached to each dot denotes the measurement error that we assumed. The red dots are the simulation results of the final iteration for the MLE estimation by the GBM.  The model parameters were estimated firstly by gPROMS ® , a conventional solution engine or GBM to maximise the likelihood function (minimising the log-likelihood). Confidence regions or intervals for the estimated parameters are evaluated based on Cramer-Rao lower bounds.

Numerical study
Second, posterior likelihood of the model parameters is computed by GSA. Figures 25-30 are 3D plots of likelihood for a few pairs of parameters to be identified. The projection of those 3D points to 2D planes in the former plots are 2D marginal distributions as shown in Figures 31-36, respectively.
The parameters are correlated in one way or another. Nonetheless, the pair of parameters such as E CO and E r are still identifiable based on the posterior distribution of the likelihood function  ( Figure 31). The pair of E H2 and E r presents a little different profile, but they can also be estimated by taking MAP or averaging over the entire sampling space (Figure 32).
One noticeable finding was that the posterior variances of the GSA in this particular case were much smaller than those were estimated by local search or GBM ( Table 5). Since the preexponent factors and the activation energies in the Arrhenius-type kinetics are highly correlated as can be seen in Figures 34-36, parameter estimation is difficult in its nature of the mathematical formulation.

Conclusions
We applied GSA to parameter estimation for dynamic process models. Numerical case studies were performed for a couple of different processes with a set of simulated or artificial data. We demonstrated the advantage of the proposed methodology that is based on global search, over the conventional ones that are based on local search.
GSA calculates posterior distribution of the model parameters based on any measurement data available for the study. The estimated confidence region or intervals are in general narrower than those were calculated by the local search.
In case where some parameter is not affecting the system response significantly, GSA would still correctly evaluate the distribution or the probable spread of the model parameters. On the contrary, local search or GBM might find an optimal point for the same problem that can however be elusive in that the estimated variance around the solution is erroneously small due to lack of global information for the point estimate.
When a set of parameters are structurally correlated as in the case for the activation energy and pre-exponential coefficient, confidence region or confidence interval by the conventional local search methods can be very large thus smearing out the point estimate. Even under such circumstances, GSA can provide legitimate information about the accuracy of the estimated parameters. The posterior variance by GSA can be significantly smaller than Cramer-Rao lower bounds.
Therefore, we can conclude that GSA can provide more robust location parameters compared to the local search methods. But to be fair, because GBM can still find the optimal solution more precisely, it is advised to use both the methods complementarily: local search or GBM can initially be used to search for MLE. GSA can then provide a global view of the posterior probability of the point estimation to make sure the solution is stable and robust. GSA can also μ σ μ σ 0 < x i < 10 0 < x i < 1 be used for sensitivity study to make sure the estimated parameters or the variances of those are reasonable.
Since GSA is computation-intensive, parallel computation must be considered to practice such a method for a study of real-life problems. We made use of parallel computation on a single PC with multi-cores (employed four workers). GSA with 500,000 points sampling took CPU time of 9530 s (2.6 h) and 128,752 s (35.8 h) for the abovementioned example cases 'A' and 'B' respectively (performed on PC with Intel i7-5600 U @2.6GHz).

A.1.3. Shortcomings of MCMC
Although MCMC has become a standard tool for Bayesian analysis, several problems have also become obvious particularly in the numerical implementation of the method [9]. While MCMC provides an almost automatic way of sampling the posterior distribution, it often converges too slowly or get stuck within one mode of a multi-modal parametric space. Moreover, it is oftentimes difficult to determine if a chain has reached stationarity or still hovering in a certain region of the parameter space.
The computation of the likelihood function can be complicated as in the physical process systems that are described by a set of mathematical equations, which can also be dynamic or time-variant rather than stationary or time-invariant. Therefore, the sampling must be efficient for the numerical implementation of MCMC to be applied to analysis of chemical, biological or physical systems that we encounter in real life.

A.2. Global sensitivity analysis: example calculations
A couple of example calculations of GSA sensitivity analysis are shown below (refer to Section 2.4).

A.2.1. Elementary effects sensitivity analysis
Applying elementary effects sensitivity analysis on equation y = x 1 2 + 3x 2 returns the results in Table 5.

A.2.2. Variance-based sensitivity analysis
Applying variance-based sensitivity analysis on equation y = x 1 2 + 3x 2 returns the results in Table 6. While variance-based values do not have an exact meaning, the most influential factors are correctly identified.