Surrogate Model Applied for Analysis of Uncertain Parameters in Turbulent Mixing Flows Surrogate Model Applied for Analysis of Uncertain Parameters in Turbulent Mixing Flows

The water mixing experiment in the Generic Mixing Experiment (GEMIX) facility performed at the Paul Scherrer Institute is used as a benchmark case to investigate the influence of the main uncertain parameters on the turbulent mixing under isokinetic flow conditions. The benchmark experiment features two horizontal water streams with the same inlet velocity that merge together to form a mixing flow inside the larger horizontal square channel. The turbulence intensity and the velocity profile at the inlet were used as the main uncertain input parameters. The selected set of computational fluid dynamics (CFD) simulations based on different combinations of values for uncertain parameters has been performed with the code NEPTUNE_CFD that solves the Reynolds Averaged Navier Stokes (RANS) equations with the k- ε turbulence model. To investigate the influence of the uncertain parameters over a wide range of values, the surrogate model called optimal statistical estimator (OSE) was used to generate the response surface of the results. It has been demonstrated that the OSE method can be successfully applied to build the response surface from a limited set of simulation points. For the two-parameter problem of the current study, only a few CFD simulation points are found sufficient to construct the quality response surface.


Introduction
Computational fluid dynamics (CFD) simulations are affected by uncertainties, due in large part to the increasing complexity of physical models and the inherent introduction of random model data. The identification of the main uncertain parameters and the evaluation of their influence on the simulation results are crucial steps in estimating the simulation accuracy. For turbulent flows, not only that the uncertainty in the choice of the turbulent model parameters has to be considered, but also the physical properties of the fluid and the experimentally provided boundary conditions can be largely affected by uncertainties. The choice of single values for these "input" parameters often leads to considerable differences between the simulation and experimental results. The sensitivity of the simulation results to the range of different combinations of input parameters can be evaluated by hundreds of CFD simulations. However, even a simple steadystate simulation of a turbulent flow can take several hours to complete. One way to alleviate this burden is the construction of a surrogate model that mimics the behaviour of the CFD simulation as closely as possible while being computationally much cheaper. Such models are based on a complex mathematical response function that is constructed on a limited number of intelligently selected data (CFD) points. Several methods that deal with the sensitivity or uncertainty of CFD computations can be found in the literature [1][2][3][4].
The uncertain parameters affecting the turbulent flow presented in this study are investigated on a typical CFD case. The benchmark experiment featuring the mixing of two turbulent flows performed at the Paul Scherrer Institute (PSI) [4] has been used in the analysis. To analyse the effect of the sensitivity parameters, a series of CFD simulations is performed with the code NEPTUNE_CFD that solves the Reynolds Averaged Navier Stokes (RANS) equations with the k-eps turbulence model [5]. The optimal statistical estimator (OSE) method [6] is used for the response surface generation.

Turbulent mixing benchmark
To investigate the main turbulent mixing mechanisms, the experimental data of the water mixing experiment Generic Mixing Experiment (GEMIX) were used [4]. The GEMIX facility, located at the Paul Scherrer Institute (PSI), features the horizontal test section where the two horizontal water streams with different temperatures or densities can be mixed.
The selected benchmark case considers the mixing of two isothermal flows with the same density and mass flow rate. The schematic representation of the test section is shown in Figure 1. The inflow section consists of two slightly inclined rectangular channels separated by the splitter plate that merge into the larger square channel with the cross section of 50 mm by 50 mm. The water with a mass flow rate of 1 kg/s flows through the upper and lower channel into the mixing section, where the two water streams mix together to form a turbulent mixing layer. The profiles of velocity and turbulent quantities were measured at two axial locations, 70 mm and 450 mm behind the splitter plate, as shown in Figure 1. A detailed description of the experiment and the measurement techniques are given in [7].

Uncertain parameters at the inlet
The complete information about the flow conditions in the experiment requires the measurement of velocity and turbulent quantities at several locations along the flow. In the considered experiment, these data are provided in the mixing section, but the velocity and turbulence profiles have not been measured in the region of the two inflow channels. Trying to impose a uniform flow with the low level of turbulence, a flow straightener followed by several grids of different densities was placed inside each of the inflow channels [4]. Nevertheless, at the known mass flow rate, the velocity profile and turbulence at the inlet can be treated as uncertain parameters with an estimated range of values.
One may assume that the profile inlet velocity can take the values between the uniform and the fully developed turbulent profile shape. This can be written in a parametrised form using the parameter α as where U d (y,z) stands for the fully developed velocity profile and U u for the uniform velocity. The fully developed velocity profile was obtained from a separate simulation in a single long rectangular channel of the same cross section and was imposed for both inlet flows. For the purpose of the implementation into the CFD code, the velocity profiles between the uniform and fully developed profile shape were approximated by trigonometric mathematical functions that are exactly zero at the wall. The product of two series of cosine functions (Fourier series) is used: where a n and b n are coefficients obtained with the least-squares method and L y and L z represent the dimensions of the channel. The parameter α in Eq. (1) is varied between 0 and 1 (0 represents uniform and 1 fully developed axial velocity profile) in steps of 0.2. The second uncertain parameter is the turbulence intensity β at the inlet, which describes the ratio between the turbulent velocity fluctuations and the mean velocity. In the present work, it is assumed that the turbulence intensity β may vary between 0% and the high turbulence level of 25%.

CFD simulations
The simulations were carried out with the NEPTUNE_CFD 2.0.1 code [8] developed by Electricite de France (EDF) and Commissariat a l'Energie Atomique (CEA, France). NEPTUNE_CFD 2.0.1 is a dedicated code for modelling of transient flows in nuclear reactor systems. Phenomena that involve mixing of turbulent flows are of high interest in nuclear industry, especially from the perspective of hypothetical accident conditions [5].
The momentum and turbulence transport equations are solved using the Reynolds Averaged Navier Stokes (RANS) approach. The standard k-ε model with the logarithmic wall function near the walls was used for the turbulence modelling. The calculations were performed on a computational mesh with 475,000 hexahedral elements with mesh refinement near the channel walls and around the splitter plate. The simulation and mesh details are provided in [9].
The mass flow rate is the same at both inlets (1 kg/s), which corresponds to the average liquid velocity of 0.8 m/s. The shape of the inlet velocity profile is set by Eq. (1) and depends on the parameter α. The density of the water is 998 kg/m 3 at the temperature of 23 C. Besides the velocity, also the turbulence intensity β has to be prescribed at the inlet. No-slip boundary conditions are used on the wall, and a constant pressure is set at the outlet of the domain. All simulations were performed at constant turbulent Schmidt number Sc = 0.7. The constant time step of 0.02 s is used for the calculations. About 100 time steps are sufficient to reach the converged solution.

Matrix of CFD simulation points
The relevant output results are the profiles of the turbulence kinetic energy k and the downstream velocity component U (in x direction, see Figure 1) at two locations: 0.07 and 0.45 m downstream from the edge of the splitter plates. To evaluate the effect of random values of uncertain parameters on the results, a series of simulations was performed, independently combining different values of α and β. The matrix of CFD simulation points is presented in Figure 2. It can be seen that the calculation points are distributed much densely and in equidistant steps of 1% at lower values of β (between 0 and 4%), whereas their distribution is uneven and rather scarce at higher values. Overall 40 simulations have been carried out. But this is not nearly enough to cover the complete space of results for the range of α (0-1) and β (0-25%) values. In between, the surrogate (or response surface) model can be used.

Optimal statistical estimator (OSE) method
Hundreds of time-consuming CFD simulations would be needed to cover the entire space of results for different combinations and values of input parameters. Instead, the surrogate model can be used to replace the numerous code simulations. Surrogate modelling is based on the construction of the mathematical response function from the limited number of selected data points aiming to estimate the system response or, in other words, to define the relationship between specific system inputs and outputs. In this way, the constructed response surface can be used for the sensitivity analysis of the impact of uncertain parameters.
In the work of [6], the response surface was generated by the optimal statistical estimator (OSE) method, which can be easily applied for multidimensional space. Basic equations for the response surface generation with OSE are provided in this section; details are described in [6]. The OSE method uses a linear combination of code-calculated output values (or their corrected values as explained later) and coefficients C n , which represent the measure of similarity between a given vector of input data G and the vector G n for the nth calculation. The optimal statistical estimator b H o G ð Þ and the coefficients C n are defined as where G = (x 1 , x 2 ,…, x M ) is a given input data vector, G n = (x n1 , x n2 , …, x nM ) and H n = (x n(M+1) , x n (M+2) ,…, x nI ) are the input and output data vectors for the nth calculation, respectively; M is the number of input parameters; I the number of input and output parameters; and N is the number of calculated (or measured) values. Since the Dirac delta function δ appears in the denominator of Eq. (4) in the original derivation for the one-dimensional case, an approximation for δ was used to make the formula applicable [10]. Among various possible approximations, the Gaussian function was the most appropriate, and for the M-dimensional case, it was adapted to where σ i is the width of the Gaussian curve determined by the user. The selection of the adequate width σ i needs special attention. Widening of the Gaussian curve by selecting a larger width σ i extends the influence area of the particular input data point to its surroundings, i.e. also to the neighbouring data points. To uniformly cover the volume with samples, the width σ i for dimension i is defined as follows: where S i is the distance between the minimum and maximum value of the data points x ni (n = 1, 2,…, N) and N i is the number of intervals between the data points.
The corrective factor f c should be selected by the user based on the desired and previously tested performance of OSE. The contribution of each data point to the estimation of the final result can then be adjusted by f c . If f c is equal to 1.0, the width σ i is equal to the distance between two adjacent points, and the influence is thus extended approximately to its neighbour points. On the other hand, if f c is close to 0, in the one-dimensional case, this would result in a stepwise function. With increasing f c the OSE function becomes smoother between the data points but at the cost of less accurate predictions at the data points, which can be eliminated by appropriately shifting the data point values as performed in the present analysis. OSE is applied correctly if the OSE function between the calculated points is smooth and the calculated points are correctly predicted. More details about how the contribution of each calculated data point to the final output parameter estimation can be adjusted by the corrective factor f c can be found in [6].
Using the derived optimal statistical estimator b H o , the complete estimated vector can be defined as a composition The function Y(X) is modelled by OSE. The vector Y is influenced by the input parameters, which are directly transferred to the output, while the optimal statistical estimator determines the complementary components b H o G ð Þ of the above composition representing the output values. One of the most important characteristics of this estimator is that it involves the highly nonlinear coefficients C n ; therefore the response surface for highly nonlinear functions can be efficiently generated.
For assessing the adequacy and predictive capability of OSE, the root mean square error and coefficient of determination for the mth input parameter are used: for m = (M + 1), (M + 2),…, I, where M is the number of input uncertain parameters and I is the number of the input and output parameters. Here x nm is the nth code-calculated value of the mth output parameter, x est,m is nth estimated value with the optimal statistical estimator (see Eq. (3)) and x avg,m is the mean of the N code-calculated values of the mth output parameter. The assessment of the predictive capability of the optimal statistical estimator with the two proposed statistics is perfect when RMS m = 0 and R 2 m ¼ 1.
To produce the output results, the values of the input parameters (x 1 , x 2 ,…, x M ) are randomly sampled (e.g. by Monte Carlo method) each time, and then the corresponding unknown output values are estimated by the optimal statistical estimator using Eqs. (3)- (9). Each time the new coefficients C n are calculated, while the values of H n are the data points, determined in the phase of the response surface generation by OSE.
The values are based on computer code calculation values to achieve the desired predictive capability of OSE for the code-calculated values. If the statistics RMS m and R 2 m do not show the desired predictive capability of OSE, this means that the influence of the neighbour computer code calculation values is too large. This influence may be eliminated by appropriately shifting the computer code calculation values. Their shifting is performed in an iterative way till the results predicted by OSE do not agree with the originally non-shifted calculation values. Using this approach, a perfect agreement of the OSE results with the calculated data points can be obtained. If the estimated value x est,m initially exceeds the code-calculated value x nm , the new value is decreased for the difference between the estimated and code-calculated value and vice versa. This is done in iterative steps until the desired accuracy of the estimated value is obtained in the points (i.e. values of input parameters) used in the N code calculations. Due to the highly non-linear functions, only a few iterations are normally needed to achieve a reasonable agreement. In this way the code-calculated values can always be matched.

CFD simulation results
The results of the NEPTUNE_CFD sensitivity calculations are presented in Figures 3-6. In Figure 3 the axial velocity profiles at two locations 0.07 and 0.45 m downstream the splitter plate are shown. The sensitivity to the inlet turbulence intensity β is presented for three different values of inlet velocity profiles (α = 0, 0.4 and 1). As shown, high values of β tend to smooth the velocity gradients. The smoothing effect at higher β values is more pronounced for the profiles with α = 0.4 and 1. It can be seen that the highest turbulent intensities β = 16 and 25  The influence of β on the turbulent kinetic energy k is additionally discussed in Figures 5 and 6, where the dependence on α is shown for several different β values. Figure 5 presents the results just behind the splitter plate (at 0.07 m) and Figure 6 the results farther in the mixing region (at 0.45 m). At 0.07 m, the best agreement between the simulation and experiment is obtained for β = 8% ( Figure 5), whereas further downstream, at 0.45 m (Figure 6), the simulations with β = 8% underpredict the measured data. At this axial location, the profiles with β values of 16 and 25% match the experimental data. This result indicates that the dissipation of turbulence kinetic energy k in the existing k-ε turbulence model is higher than in the experiment.

Response surface generated by OSE method
For the reference OSE calculation, all 40 simulation points (see Figure 2) are used to generate the response surface (Case C0). It should be noted that for creating the 3D figures of the NEPTUNE_CFD-calculated data, the missing data in the calculational matrix at given α were obtained by interpolation in α direction as missing data would be otherwise set to 0 on the graph. Also, for the 3D graph to be proportional in β direction, the NEPTUNE_CFD-calculated data were linearly interpolated so that the distance between two points was set equal to 1% in β direction for comparison purposes with the response surface constructed by OSE.  and β), which has to be generated by OSE. Considering two output parameters at two axial locations, this means that four times 99 response surfaces were automatically generated by OSE for each case and only some are shown. Figures 7 and 8 show the comparison between the 3D plots of the NEPTUNE_CFD simulation points and the response surfaces generated by the OSE reference case C0. The dependency of the downstream velocity U and turbulence kinetic energy k on the parameters α and β, at different locations in the mixing region, is presented.
Although the construction points in the calculation matrix are scarcely distributed (see Figure 2) at higher β values, a good agreement between the NEPTUNE_CFD calculation and the OSE response surface can be observed. At the middle of the channel (y = 0 m), the velocity U increases with increasing β, while the dependence on α is smaller (see Figure 7(a1) and (b1)). The response surface is smooth and in good agreement with the surface plotted from the calculated data points as the relations are rather linear. On the other hand, at location y = 0.08 m, the velocity U increases with α and decreases with β as shown in Figure 7(a2).
More data in β direction would further improve the agreement for this more nonlinear relation. Similar is true for the velocity U at y = 0.16 m. Finally, from Figure 7(b4) it can be seen that at α = 0 the code-calculated value has not been matched for the highly nonlinear surface. Figure 8 showing the turbulence kinetic energy shows that the highest values are obtained at the maximum α and β value. The code-calculated values were matched by OSE (R 2 always above 0.99). It should be noted that the graph plotted from the calculated data has many interpolated data (in one direction only); therefore visually the agreement does not seem to be so perfect.
As already mentioned, the selected number of points (i.e. 40 points) is relatively scarcely distributed across the calculation matrix. Nevertheless, the main strength of the OSE method is its capability to efficiently generate the response surface even with much fewer number of construction points [11], in particular for the two-parameter problems. To demonstrate this, the following cases with reduced number of calculation points are considered for the response surface generation: •  10 show the influence of the number of calculated points on the response surface generation for the downstream velocity U and the turbulence kinetic energy k, respectively. In the case with 40 calculational points (C0 case) for the response surface generation by OSE, the value of the corrective factor f c is set to f c = 1.5 and the number of iterations to 90. The distances S1 and S2 are 1 and 25, respectively, and the number of the intervals N1 and N2 is 5. Finally, in the cases with less than 40 calculation points, f c is changed to f c = 2.0 to widen the influence of the neighbouring points. Generally, R 2 was above 0.97. The only exceptions are the points describing U at y values near the walls (less than À0.21 m and higher than 0.21 m), where R 2 was around 0.82 (see Figure 7(b4)). Closely observing Figure 3, the velocity U near the walls drops significantly. Figure 9 clearly shows that the agreement with the calculated 3D graph decreases with lowering the number of calculated points used for the response surface generation. Nevertheless, the applied simulation points are perfectly predicted in all cases (R 2 = 1 even for case C4 with the lowest number of points). However, if the number of used points is low, the agreement in the region between these points is not perfect.
The comparison of k values in Figure 10 shows that the response surfaces for the cases C0 and C1 are in good agreement with the calculated 3D graph, while some deviation can be observed at the surfaces with a lower number of points (cases C2, C3 and C4).
In addition to the qualitative comparison of the surfaces in Figure 9, in Table 1 a quantitative comparison is shown between the CFD code simulations and the four OSE predictions generated either from all 40 code calculation values (C0 case; see also Figure 9(b)), from 19 code simulation values (C1 case; see also Figure 9(c)), from 11 code simulation values (C2 case; see also Figure 9(d)) and from 5 code simulation values (C4 case; see also Figure 9(f)). It can be seen that in the base case, the difference in the downstream velocity is less than 0.4% (0.19% in average). For the C1, C2 and C4 cases, the maximum difference in the downstream velocity is 1.20% (0.23% in average), 1.44% (0.45% in average) and 2.73% (0.76% in average), respectively. Case C3 is not shown in Table 1, but the maximum difference is 1.94% (0.61% in average).
The obtained results suggest that in the case when thousands of complex computer code runs are needed for the Monte Carlo statistical analysis, the surrogate model can be used to replace the numerous CFD code simulations. In the case of a lower accuracy of response surface at a certain discrete value, a few additional calculations can be added in the vicinity of this point to better describe the space until a satisfactory agreement is obtained.

Conclusions
The mixing of two horizontal turbulent flows was investigated on the GEMIX experimental case. Two sensitivity parameters were considered as uncertainty variables: inlet velocity profile and inlet turbulence intensity. Both variables were treated as the main uncertain parameters with a certain range of randomly distributed values. The NEPTUNE_CFD code was used to perform a series of parametric CFD calculations, combining different values of turbulence intensity and velocity profile at the inlet. The sensitivity analysis of the NEPTUNE_CFD results shows that the implemented turbulence model predicts a higher dissipation of turbulence kinetic energy than observed in the experiment.
A surrogate method called optimal statistical estimator (OSE) was used to generate the full response surface of the output results from the limited number of input CFD predictions. The output results featured downstream velocity and turbulence kinetic energy profiles at few locations in the mixing region. The study demonstrates the efficiency of the response surface generation by the OSE method. The method is especially convenient when few parameters are varied. In the case of a two-dimensional problem, OSE clearly shows its ability to generate a quality response surface even if only a few calculation points are available. This suggests that in the case when thousands of complex computer code runs are needed for the Monte Carlo statistical analysis, the surrogate model can be used to replace the numerous CFD code simulations.