## 1. Introduction

Amongst the countless research domains and the fields of research revolutionized by the merits of computer simulation, the field of engineering is a prime example as it continues to benefit greatly from the introduction of computer simulations. During product design, engineers encounter several complex input/output systems, which need to be designed and optimized. Traditionally, several prototypes were required to assure quality criteria that were met or to obtain optimal solutions for design choices and to evaluate the behaviour of products and components under varying conditions. Typically, a single prototype is not sufficient, and lessons learnt are used to improve the design, back at the drawing table. Therefore, the development process used involves building several prototypes in order to gain more confidence in the solutions. A direct consequence of this approach is that the development process is both slow and not cost effective.

The introduction of computer simulations caused revolution: by bundling implementations of material, mechanical and physical properties into a software package, simulating the desired aspects of a system and performing the tests and experiments virtually; the number of required prototypes can be drastically reduced to only a few at the very end of the design process. These prototypes can be regarded purely as a validation of the simulations. The simulation itself can be interpreted as a *model* and serves as an abstract layer between the engineer and the real world. Performing a virtual experiment is faster and is very inexpensive. A direct consequence was an acceleration of development and system design, contributing to a shorter time‐to‐market and a more effective process in general. In addition, it was also possible to perform more virtual experiments, providing a way to achieve better products and design optimality.

However, as simulation software became more precise and gained accuracy over the years, its computational cost grew tremendously. In fact, the growth of computational cost was so fast it has beaten the growth in computational power resulting in very lengthy simulations on state‐of‐the‐art machines and high performance computing environments, mainly due to the never ending drive for finer time scales, more detail and general algorithmic complexity. For instance, a computational fluid dynamics (CFD) simulation of a cooling system can take several days to complete [1], or a simulation of a single crash test was reported to take up to 36 hours to complete [2]. This introduces a new problem: large‐scale parameter sweeping and direct use of this type of *computationally expensive* simulations for evaluation intensive tasks such as optimization and sensitivity analysis are impractical and should be avoided.

To counter this enormous growth in computational cost however, an additional layer of abstraction between the complex system in the real world and the engineer was proposed, more specifically between the simulation and the engineer. Rather than interacting directly with the simulator, a cheaper approximation is constructed. Roughly three approaches to obtain this approximation exist:

**Model driven**(known as model order reduction) takes a top‐down approach by applying mathematic techniques to derive approximations directly from the original simulator. This, however, exploits information about the application domain and is therefore problem specific.**Data driven**assumes absolutely nothing is known about the inner workings of the simulator. It is assumed to be a*black‐box*and information about the response is collected from evaluations. From these data, an approximation is derived. Because this approach is very general, it is not bound to a specific domain.**Hybrid**is the overlap zone between both model driven and data driven. Attempts to incorporate domain‐specific knowledge into a data‐driven process to obtain better traceability and reach accuracy with less evaluations.

In this chapter, the focus is on a data‐driven approach: the response of a simulator as a function of its inputs is mimicked by *surrogate models* (metamodels, emulators or response surface models are often encountered as synonyms). These cheap‐to‐evaluate mathematical expressions can be evaluated efficiently and can replace the simulator. A more extensive overview of usage scenarios and a formal description of surrogate modelling are given in Section 2. Before the surrogate model can be used, it must first be constructed and trained during the surrogate modelling process on a number of well‐chosen evaluations (*samples*) to be evaluated by the simulator in order to satisfy the requirements specified upfront. The problem of selecting an appropriate set of samples is further explored in Section 3. Section 4 briefly introduces an integrated platform for surrogate modelling with sequential design. Finally, these techniques are demonstrated on three use‐cases in Section 5.

## 2. Surrogate modelling

The introduction of this chapter highlighted the global idea of approximation and the benefits of introducing an additional layer of abstraction when simulations are expensive. Now, the data‐driven approach (surrogate modelling) is discussed more in depth, both from a usability point of view and a more formal description of the technique.

### 2.1. Goals and usage scenarios

The most direct implementation of surrogate modelling is training a globally accurate model over the entire design space. This approximation can then replace the expensive simulation evaluations for a variety of engineering tasks such as design space exploration, parameterization of simulations or visualization.

A different use‐case of surrogate models is sensitivity analysis of the complex system. Especially when many input parameters are present, it is very difficult to achieve global accuracy due to the exponential growth of the input space (known as the *curse of dimensionality*). Fortunately, not all input parameters contribute equally to the output variability, in fact some might not have any impact at all [3]. The surrogate models can be used directly for evaluation‐based sensitivity analysis methods such as Sobol indices [4], Interaction indices [5] or gradient‐based methods. For some kernel‐based modelling methods, analytical computation of sensitivity measures is possibly resulting in faster and more reliable estimation schemes, even before global accuracy is achieved [6].

Another task at which surrogates excel is the optimization of expensive objective functions. This discipline is often referred to as surrogate‐based optimization (SBO). A globally accurate surrogate model can be built and optimized using traditional optimization methods such as gradient descent, or metaheuristics such as particle swarm optimization [7]. Although this approach is correct and works faster than simulating each call of the objective function, it is not necessarily the most efficient methodology: when seeking a minimum, less samples can be devoted to the regions that are clearly shown to be the opposite. This results in specific methodologies, which explore the search space for optima and exploit the available knowledge to refine optima. This is a difficult trade‐off and will be discussed in detail in Section 3.2.

All tasks described so far were forward tasks, mapping samples from a design space to the output or objective space. It is also possible to do the opposite: this is referred to as inverse surrogate modelling. This can be interpreted as identifying the areas of the design space corresponding to a certain desired or feasible output range. Typical approaches involve training a (forward) surrogate model first, then optimizing the model using an error function between the output and the desired output as objective function. This optimization is often preferred to be a robust optimization to account for the error of the forward surrogate model [8]. Specific sampling schemes to identify these regions directly were also proposed [9]. Finally, it is also possible to translate the inverse problem into a forward problem involving discretizing the output (feasible/infeasible point) and learning the class boundaries.

Because the concept of surrogate modelling is both flexible as well as generic, allowing several modifications tailored for the task at hand, it has been applied in wide range of fields including metallurgy, economics, operations research, robotics, electronics, physics, automotive, biology, geology, etc.

### 2.2. Formalism

Formally, the surrogate modelling process can be described as follows. Given an expensive function

It is clear that the selection of the approximation function is a complex interaction of several aspects, more specifically:

The choice of

For instance, a straightforward approach is minimizing the error between the surrogate model response and the true responses for the samples used to train the surrogate. This is often referred to as *training error* or *sample error* and pushes the hyperparameter optimization to favour complex models interpolating the data points perfectly. Although this solution might be considered satisfactory at first sight, in reality this rarely provides a good model as the optimization problems do not consider model quality at other, unobserved samples in the design space. This approach inevitably leads to very unreliable responses when these unobserved samples are to be predicted, hence the model is said to be *overfitting* or to have poor *generalization performance*. Popular quality estimators accounting for generalization performance include crossvalidation and validation sets.

## 3. Experimental design

A typical requirement for the surrogate model is to be *accurate*, hence the objective of the modelling process is the ability to obtain accuracy with only a small number of (expensive) simulator evaluations. A different kind of requirement is optimizing the response of the simulator. This requires fast discovery of promising regions and fast exploitation thereof to identify the (global) optimum. Hence, a different strategy for selecting the samples is required as the accuracy of the surrogate in non‐optimal is of lesser importance. A refined set of model requirements and the goal of the process are required, as they greatly affect the choice of samples to be evaluated. The choice of samples is referred to as the experimental design.

### 3.1. One‐shot design

The traditional approaches to generate an experimental design are the *one‐shot* designs. Prior to any evaluation, all samples are selected in a space‐filling manner: at this point, no further information is available due to the black‐box assumption on the simulator itself (as part of the data‐driven approach). Therefore, the information density should be approximately equal over the entire design space and the samples are to be distributed uniformly. To this end, several approaches related to *Design of Experiments* (DoEs) have been developed. However, only the space‐filling aspect has an impact within in the context of computer experiments, as other criteria such as blocking and replication lose their relevance [11]. This led to the transition and extension of these existing statistical methods to computer experiments [11, 12]. Widely applied are the factorial designs (grid‐based) [13] and optimal (maximin) Latin Hypercube designs (LHDs) [14]. Both are illustrated in **Figures 1** and **2**, respectively, for a two‐dimensional input space and 16 samples. An LHD avoids collapsing points should the input space be projected into a lower dimensional space. Other approaches include maximin and minimax designs, Box and Behken [15], central composite designs [16] and (quasi‐)Monte Carlo methods [17–19].

Despite their widespread usage, these standard approaches to generate experimental designs come with a number of disadvantages. First and foremost: the most qualitative designs (with the best space‐filling properties) can be extremely complex to generate (especially for problems with a high‐dimensional input space) due to their geometric properties. For instance, generating an LHD with optimal maximin distance is very time‐consuming process. In fact, the generation of an optimal LHD is almost a field of its own, with several different methods for faster and reliable generation [14, 20]. Fortunately, once a design is generated, it can be reused. For some other design methodologies, it is not possible to generate them for an arbitrary size. Given the expensive nature of each evaluation, this can result in an unacceptable growth of required simulation time. Factorial designs for instance always have size

Another disadvantage of one‐shot methodologies is the arbitrary choice of size of the design. The choice should depend entirely on the nature of the problem (i.e. larger design spaces with more complex behaviour require more evaluations). However, this information is unavailable at the time the design is generated. Hence, one‐shot approaches risk selecting too few data points resulting in an underfitted model, or selecting too much data points causing loss of time and computational resources.

### 3.2. Sequential experimental design

As a solution, sequential design was adopted [21]. This methodology starts from a very small one‐shot design to initiate the process. After evaluation of these samples a model is built, and a loop is initiated which is only exited when either one of the specified stopping criteria is met. Within the loop, an *adaptive sampling* algorithm is run to select additional data points for evaluation which are used to update the model. **Figure 3** displays this process graphically.

This approach has a number of advantages. First of all, constraints on the surrogate modelling process can be explicitly imposed through the stopping criteria. Typical criteria include how well the model satisfied the model requirements, a maximum number of allowed evaluations, or a maximum runtime. Second, the *adaptive sampling* method can be designed to select new data points specifically in terms of the requirements. Sampling to obtain a globally accurate model will differ from sampling to discover class boundaries or sampling to obtain optima. These choices can also be guided by all information available about the input‐output behaviour: when

Roughly, all methods for adaptive sampling are based on any of the following criteria (discussed more in detail below):

Distance to neighbouring points (space‐filling designs)

Identification of optima

Model uncertainty

Non‐linearity of the response

Feasibility of the candidate point w.r.t. constraints

Depending on the goal and model requirements, a strategy can be designed involving a complex combination of these criteria. Fundamentally, each approach will involve two competing objectives:

**Exploration**: sampling regions of the design space where proportionally only little information has been acquired.**Exploitation**: sampling promising (w.r.t to the goal) regions of the design space.

It is clear, however, that a good strategy strikes a balance between these goals as both are required to obtain satisfactory results.

Exploration‐based algorithms are typically less involved with the goal of the process. They are crucial to assure that no relevant parts of the response surface are completely missed. Roughly the space‐filling and model uncertainty criteria focus mostly on exploration. Space‐filling sequential experimental design usually involves distance to neighbouring points, e.g. the maximin/minimax criteria, potentially complemented with projective properties [22]. Model uncertainty is either explicitly available, or must be derived somehow. Bayesian model types represent the former type of models, e.g. the prediction variance of Kriging and Gaussian Process models, which can be applied directly for maximum variance sampling [23, 24] or maximum entropy designs [25]. For these kinds of models, a better way is expressing the uncertainty on the model hyperparameters resulting in approaches to reduce this uncertainty and hence, enhancing the overall model confidence [26]. Model uncertainty can also be derived by training several models and comparing their responses. Areas with most disagreements are then marked for additional samples. This can be a very effective approach in combination with ensemble modelling.

On the other hand, exploitation methods clearly pursue the goal of the process. In case a global accurate model is required, a very effective approach is raising the information density in regions with non‐linear response behaviour (e.g. LOLA‐Voronoi [27], FLOLA‐Voronoi [28]). The latter approach does not even require intermediate models as it operates on local linear interpolations. For optimization purposes, specific adaptive sampling methodologies can be applied, depending on the specific tasks. For single‐objective optimization examples of such sampling methods include CORS [29] and Bayesian optimization acquisition functions such as Expected Improvement [30] (combined with Kriging models this corresponds to the well‐known Efficient Global Optimization approach [31]), the Knowledge Gradient [32] and Predictive Entropy Search [33]. Many of these methods for optimization can also be used in combination with a method, which learns about the feasibility of input regions of the design space: during the iterative process an additional model learns the feasibility from the samples (as reflected by the simulation thereof). This information is then used during the selection of new samples with specific criteria such as the Probability of Feasibility (PoF) [34].

Surrogate‐based optimization has also been extended to problems with two or more (potentially conflicting) objectives. The goal of this type of multi‐objective optimization is the identification of a Pareto front of solutions, which presents the trade‐off between these objectives. Existing approaches include hypervolume‐based methods such as the Hypervolume Probability of Improvement (HvPoI) [35, 36], the Hypervolume Expected Improvement [37] or multi‐objective Predictive Entropy Search [38].

## 4. SUMO Toolbox

Designed as a research platform for sequential sampling and adaptive modelling using MATLAB, the SUMO Toolbox [39, 40] has grown into a mature design tool for surrogate modelling, offering a large variety of algorithms for approximation of simulators with continuous and discrete output. The software design is fully object oriented allowing high‐extensibility of its capabilities. By default, the platform follows the integrated modelling flow with sequential design but can also be configured to approximate data sets, use a one‐shot design, etc.

The design goals of the SUMO Toolbox support approximation of expensive computer simulations of complex black‐box systems with several design parameters by cheap‐to‐evaluate models, both in a regression and a classification context. To obtain these goals, the SUMO Toolbox offers sequential sampling and adaptive modelling in a highly configurable environment, which is easy to extend due to the microkernel design. Distributed computing support for evaluations of data points is also available, as well as multi‐threading to support the usage of multi‐core architectures for regression modelling and classification. Many different plugins are available for each of the different sub‐problems, including many of the algorithms and methods mentioned in this chapter.

The behaviour of each software component is configurable through a central XML file, and components can easily be added, removed or replaced by custom implementations. The SUMO Toolbox is free for academic use and is available for download at http://sumo.intec.ugent.be. It can be installed on any platform supported by MATLAB. In addition, a link can be found to the available documentation and tutorials to install and configure the toolbox including some of its more advanced features.

## 5. Illustrations

To illustrate the flexibility of the surrogate modelling framework with sequential design, some example cases are considered. The SUMO Toolbox was used for each case.

### 5.1. Low‐noise amplifier

This test case consists of a real world problem from electronics. A low‐noise amplifier (LNA), a simple radio frequency circuit, is the typical first stage of a receiver, providing the gain to suppress noise of subsequent stages. The performance of an LNA can be determined by means of computer simulations where the underlying physical behaviour is taken into account. For this experiment, we chose to model the input noise‐current, in function of two (normalized) parameters: the inductance and the MOSFET width. The response to the inputs for this test case is smooth with a steep ridge in the middle. This type of strong non‐linear behaviour is difficult to approximate.

The model type for this problem is an ANN, trained with Levenberg‐Marquard backpropagation with Bayesian regularization (300 epochs). The network topology and initial weights are optimized by a genetic algorithm, with a maximum of two layers. The process initiates by combining a 10‐point LHD with a two‐level factorial design (the corner points). As adaptive sampling methodology, the FLOLA‐Voronoi algorithm was chosen to select a single‐point iteration. Once the steep ridge has been discovered, the information density in this area will be increased. As model quality estimator, crossvalidation was used, with the root relative square error (RRSE) function:

The stopping criterion was set to an RRSE score below 0.05. This was achieved after evaluating a total of 51 samples. A plot of the model is shown in **Figure 4**. Additionally, the distribution of the samples in the two‐dimensional input space is shown in **Figure 5**. The focus on the ridge can be clearly observed. In comparison, repeating the experiment in a one‐shot setting with an LHD of 51 points and keeping all other settings results in an RRSE score of only 0.11. This is mostly caused by an inadequate detection of the non‐linearity (only a few samples are near the non‐linearity), whereas a lot of samples are on the smoothly varying surfaces.

### 5.2. Optimization: gas cyclone

The next illustration is more involved and is a joint modelling process aiming both at design optimality as well as feasibility. The goal is to optimize the seven‐dimensional geometry of a gas cyclone. These components are widely used in air pollution control, gas‐solid separation for aerosol sampling and industrial applications aiming to catch large particles such as vacuum cleaners. An illustration is given in **Figure 6**. In cyclone separators, a strongly swirling turbulent flow is used to separate phases with different densities. A tangential inlet generates a complex swirling motion of the gas stream, which forces particles toward the outer wall where they spiral in the downward direction. Eventually, the particles are collected in the dustbin (or flow out through a dipleg) located at the bottom of the conical section of the cyclone body. The cleaned gas leaves through the exit pipe at the top. The cyclone geometry [41] is described by seven geometrical parameters: the inlet height

Design optimality for the gas cyclone, however, is not represented by a unique and optimal number. In fact, it is represented by two different aspects: the pressure loss (expressed by the Euler number) and the cut‐off diameter, which is expressed by the Stokes number. Both aspects represent a trade‐off, and the proper scaling to sum both into a single objective is unknown. Hence, the correct way to proceed is to identify a set of Pareto optimal solutions representing the trade‐off inherent to this problem, rather than a single solution. Presented with this trade‐off, the designer has to make the final decision on what the optimal design should be. The shape of the Pareto front is informative and of great value for the designer (w.r.t. robustness of the solution for example). For the optimization, the two outputs of a simulation corresponding to these objectives are approximated with the built‐in Kriging models [22].

In addition, geometry optimization usually involves constraints as some configurations are not feasible, or result in gas cyclones, which do not work according to specifications. In addition to the Euler and Stokes objectives, the simulation of a sample also emits four binary values indicating if their corresponding constraint was satisfied or not (denoted as *c*_{1}, *c*_{2}, *c*_{3} and *c*_{4}). As each evaluation is computationally demanding, this additional knowledge should be included in order to maximize the probability of selecting feasible solutions. Each of the constraint outputs will therefore be approximated by a probabilistic SVM. For selecting new samples, both the Pareto optimality and the feasibility need to be considered. To this end, the HvPoI and PoF criteria are used, respectively. The PoF score is not computed explicitly (as it would be for a Gaussian Process) but interpreted as the SVM probability for the class representing feasible samples. This results in the following joint criterion:

For each iteration of the sequential design, this criterion is optimized resulting in a new sample maximizing the probability of a more feasible and more optimal solution. To start, an LHD is constructed in seven dimensions. The kernel bandwidth and regularization constant hyperparameters for the SVMs are optimized with the DIRECT optimization algorithm [42], using crossvalidation with the popular

**Figure 7** shows the scores for all evaluated samples on both objectives. The bullets and squares represent the samples forming the Pareto front. The black‐box constraints were learned as the optimization was proceeding, hence many evaluated samples do not satisfy the constraints (as this was unknown at that time): 8% of the evaluated samples however satisfy the constraints. Fortunately, four of them are Pareto optimal and represent valid optimal configurations. The exact optimal Pareto front was unknown upfront: in order to provide a comparison and verify the integrity of the identified solutions the traditional NSGA‐II [43] multi‐objective optimization algorithm was applied directly on the CFD simulations for a total of 10,000 evaluations. Clearly, the Pareto optimal solutions found by the surrogate‐based approach form a similar front. Our approach was able to identify these solutions with significantly fewer evaluations and hence significantly faster. Therefore, the identified Pareto front is a very good approximation given the budget constraint of 120 evaluations.

### 5.3. Satellite braking system

Finally, we demonstrate the use of surrogate models for performing analysis into the relevance of input parameters. A simulation of a braking system of the Aalto‐1 student satellite [44, 45] was modelled with sequential design. The brake consists of a small mass **Figure 8** illustrates the setup graphically.

Although the displacement angle effectively causes the braking effect, it must remain within an acceptable range to prevent a range of undesired effects and issues. To this end, a simulation for

To approach this problem, the process starts from a LHD of 20 points. Iteratively, a space‐filling sequential design selects 10 additional samples for evaluation on the simulator. The space‐filling approach incorporates both the maximin distance and projective properties as described in Ref. [46]. The model type selected is a GP with the Matérn **Figure 9**.

From the results, it can be clearly observed that the mass of the tip has no impact at all on the displacement angle. It can therefore be disregarded from any further design decisions. All other parameters do have some impact, as expected. The deployment speed clearly has the highest impact. This is intuitive, as faster deployment results in more significant differences between the angular velocities of the satellite and the tip.

## 6. Conclusion

The benefits of surrogate modelling techniques have proven to be successful to work with expensive simulations and expensive objectives in general. Within this flexible methodology, and complemented with intelligent sequential sampling (sequential design) several tasks ranging from design space exploration, sensitivity analysis to (multi‐objective) optimization with constraints can be accomplished efficiently with only a small number of evaluations. This greatly enhances the capabilities to virtually design complex systems, reducing the time and costs of product development cycles resulting in a shorter time‐to‐market. The strengths and possibilities were demonstrated on a few real world examples from different domains.