A Simulated Annealing Based Optimization Algorithm A Simulated Annealing Based Optimization Algorithm

In modern engineering finding an optimal design is formulated as an optimization problem which involves evaluating a computationally expensive black-box function. To alleviate these difficulties, such problems are often solved by using a metamodel, which approximates the computer simulation and provides predicted values at a much lower computational cost. While metamodels can significantly improve the efficiency of the design process, they also introduce several challenges, such as a high evaluation cost, the need to effectively search the metamodel landscape and to locate good solutions, and the selection of which metamodel is most suitable to the problem being solved. To address these challenges, this chapter proposes an algorithm that uses a hybrid simulated annealing and SQP search to effectively search the metamodel. It also uses ensembles that combine prediction of several metamodels to improve the overall prediction accuracy. To further improve the ensemble accuracy, it adapts the ensemble topology during the search. Finally, to ensure convergence to a valid optimum in the presence of metamodel inaccuracies, the proposed algorithm operates within a trust-region framework. An extensive performance analysis based on both mathematical test functions and an engineering application shows the effectiveness of the proposed algorithm.


Introduction
With the advent of high performance computing, intricate computer simulations can now model real-world physics with high accuracy. This, in turn, transformed the engineering design process into a simulation-driven process in which candidate designs are evaluated by a computer simulation instead of a laboratory experiment. In this set-up, the design process is formulated as an optimization problem with several unique features [1]: • The computer simulation acts as the objective function, since it assigns objective values to candidate designs. However, the simulation is often a legacy code or commercial software whose inner workings are inaccessible to the user, and so there is no analytic expression that defines how candidate designs are mapped to objective values. Such a black-box function precludes the use of optimization algorithms that require an analytic function.
• Each simulation run is computationally expensive, that is, it requires a lengthy run time, and this severely restricts the number of candidate designs that can be evaluated.
• Both the real-world physics being modelled and the numerical simulation process may result in an objective function, which has a complicated landscape, and this further complicates the optimization process.
An optimization strategy that has proven effective in such problems is that of metamodelassisted optimization, namely, where a metamodel approximates the true expensive function and provides predicted objective values at a lower computational cost [1]. However, the integration of metamodels and ensembles into the optimization search introduces several challenges: • Locating a good solution requires effectively searching the metamodel, which can have a complicated landscape with multiple local solutions, and hence can be a difficult task.
• Since function evaluations are expensive, only a small number of evaluated vectors will be available and hence the metamodel will be inaccurate. In severe cases, the optimization search can converge to a false optimum, namely, which was artificially created by the metamodel's inaccurate approximation of the true expensive function.
• A variety of metamodels have been proposed, for example, artificial neural networks (ANNs), Kriging and radial basis functions (RBFs) [2,3], but the optimal variant is problem-dependant, and is typically not known a priori. In an attempt to address this issue, ensembles employ several metamodels concurrently and aggregate their individual predictions into a single one [4,5]. However, the effectiveness of ensembles depends on their topology, namely, which metamodels they incorporate, but the optimal topology is again typically unknown a priori, and may be impractical to identify by numerical experiments due to the high cost of each simulation run.
To address this issue, this chapter proposes an optimization algorithm that uses a hybridsimulated annealing (SA) search followed by a local refinement of solutions based on an SQP search. In this manner, this set-up achieves both an effective global and local search, which assists in locating good solutions. To address the issue of inaccurate metamodel predictions, the proposed algorithm operates within a trust region (TR) approach that manages the metamodel and ensures convergence to a valid optimum. Finally, to further improve the prediction accuracy the proposed algorithm uses ensembles and selects the most suitable topology during the search.
Accordingly, this chapter presents several contributions: (a) a hybrid SA-SQP metamodelassisted search, (b) integration within a TR framework, and (c) continuous selection of the ensemble topology during the search. An extensive performance analysis based on both mathematical test functions and an engineering problem of airfoil shape optimization shows the effectiveness of the proposed algorithm.
The remainder of this chapter is organized as follows: Section 2 provides the background information, Section 3 describes the proposed algorithm, then Section 4 provides an extensive performance evaluation and discussion, and finally Section 5 concludes this chapter.

Optimization techniques
As mentioned in Section 1, simulation-driven problems often include a challenging objective function, such as having multiple local optima or lacking an analytic expression. In such settings, classical gradient-based optimization algorithms can perform poorly, and therefore researchers have explored various alternative approaches [1]. One class of such algorithms is the nature-inspired metaheuristics, which include evolutionary algorithms (EAs), particle swarm optimization (PSO), and SA. The latter was inspired by the physics of the annealing process in metals: initially a metal has a high temperature and so particles have a high probability of moving to a higher energy state. As the metal cools in the annealing process, particles are more likely to move to a lower energy level than to a higher one. The process is completed once the system has reached the lowest possible energy level, typically its temperature of equilibrium with the environment. In the realm of global optimization, these mechanics have been translated into a heuristic search, which starts with an initial vector, namely, a candidate solution. During the search, the current vector is perturbed so that new vectors in its vicinity are obtained. These vectors are evaluated and replace the original vector if: (a) they are better, or (b) they are worse and with probability p, which is analogous to the energy state changes. As p decreases, the search is transformed from being explorative to being more localized. Two main parameters of the SA algorithm are the annealing schedule, namely, the duration of the search process, which is determined by the manner that the temperature is decreased, and the selection probability function, which defines the dynamic threshold for accepting a worse solution. Algorithm 1 gives a pseudocode of a baseline SA algorithm, while Section 3 gives the specific parameter settings of the SA implementation used in this study.
The underlying mechanism of the SA algorithm was originally proposed by Metropolis et al. [6], while the more common annealing-inspired formulation was later proposed by Černy [7] and Kirkpatrick et al. [8]. Since then the algorithm has been widely used in the literature, and some recent examples include [9] in finance, [10] in machine learning, [11] in chemical engineering and [12] in production line machine scheduling.

Expensive black-box problems
Computationally expensive optimization problems are common across engineering and science. Typically in such problems, candidate designs are parameterized as vectors of design variables and sent to the simulation for evaluation, as shown in Figure 1.
As mentioned in Section 1, metamodels are often used in such settings to alleviate the high computational cost of the simulation runs [2,3]. However, the integration of metamodels into the search introduces two main challenges: • Prediction uncertainty: Due to the high cost of the simulation runs only a small number of designs can be evaluated, which in turn degrades the prediction accuracy of the metamodel and leads to optimization with uncertainty regarding the validity of the predicted objective values [13]. To address this, the metamodel needs to be updated during the search to ensure that its accuracy is sufficient to drive the search to a correct final result. To accomplish this, the proposed algorithm is structured based on the TR approach [14]. In this way, the algorithm performs a sequence of trial steps that are constrained to the TR, namely, the region where the metamodel is assumed to be accurate. Based on the success of the trial step, namely, if a new optimum has been found, the TR and the set of vectors are updated. Section 3 presents a detailed description of the TR approach implemented in this study.
• Metamodel suitability: Various metamodel variants have been proposed, but the optimal variant is problem dependant and is typically unknown a priori [15]. To address this, ensembles employ multiple metamodels concurrently and combine their predictions into a single one to obtain a more accurate prediction [4,5,16]. Figure 2 shows an example based on the Rosenbrock function.
The ensemble topology, namely, which metamodel variants are incorporated, is typically determined a priori and is unchanged during the search. However, the topology directly affects the prediction accuracy, and hence an unsuitable topology can degrade the search effectiveness. As an example, RBFs, RBFN and Kriging metamodels (described in Appendix 1) were used to generate several ensemble topologies. The same testing and training samples were used with all the topologies (sized 30 and 20 vectors, respectively), such that each ensemble was trained with training sample and its prediction accuracy was tested on the testing sample. The prediction accuracy was measured both with the root mean squared error (RMSE), and the R 2 indicator,  where f ðxÞ, andf ðxÞ are the true and the ensemble predicted values, respectively, and x i , i ¼ 1…n are the testing vectors. Table 1 presents the test results, from which it follows that the prediction accuracy varied with the topology and that no single topology was the overall best.

R+RN R+K RN+K
Addressing this issue, the algorithm proposed in this study selects the most suitable ensemble topology during the search, as explained in the following section.

Proposed algorithm
This section describes the algorithm proposed in this study, which is designed to address the issues described in Sections 1 and 2. The proposed algorithm operates in five main steps, as follows: Step 1. Initialization: The algorithm begins by generating an initial sample of vectors based on the optimal Latin hypercube design (OLHD) method [17]. The method ensures that the resultant sample is space-filling, namely, adequately covers the search space, which in turn improves the prediction accuracy of the metamodels. The OLHD method exploits patterns of point locations for optimal Latin hypercube designs based on a variation of the maximum distance criterion to produce near-optimal designs efficiently. After generating the sample, the vectors are evaluated with the true expensive function.
Step 2. Ensemble selection: Step 2.1. Following the cross-validation (CV) procedure [18], the vectors that have been evaluated so far are split into a training set and a testing set. In turn, each candidate metamodel variant is trained with the training set and is tested on the testing set. The prediction accuracy is measured with the root mean squared error (RMSE), which is calculated as where x i , i ¼ 1…l, are the vectors in the testing set, f ðx i Þ is the exact and already known function value at x i , and m j ðxÞ is the prediction of the jth metamodel variant that has been trained with the training set.
Step 2.2. The evaluated vectors are again split into a training and a testing set. For each candidate ensemble topology, the following steps are performed: • Each metamodel variant that is active in the ensemble is assigned an ensemble weight, which is calculated as and the overall ensemble prediction is then given by where n m is the number of participating metamodels in the ensemble and m i ðxÞ is a metamodel that has been trained with the new training sample.
• The prediction accuracy of the ensemble is estimated based on its RMSE on the testing set where x i , i ¼ 1…k, are the vectors in the testing set.
Step 2.3. After repeating the process with all the candidate topologies, the one having the lowest RMSE, as defined in Eq. (6), is selected for the current optimization iteration. A new ensemble is then trained based on the selected topology and on all the evaluated vectors. This ensemble will be used in the following step.
It is emphasized that during the above process, the true expensive function is not used, and hence the process requires negligible computational resources.
For the specific implementation in this chapter, three well-established metamodels were considered for the ensembles: RBF, RBFN and Kriging that are all described in Appendix 1, while Table 2 presents the candidate ensemble topologies. Also, the split ratio between the training and testing vectors was calibrated by numerical experiments, as described in Appendix 2.
Step 3. Optimization trial step: The proposed algorithm now seeks an optimum based on the ensemble prediction in the bounded TR around the current best solution (x b ), namely where Δ is the TR radius. The search is performed by using a hybrid approach [19], which is composed of an initial search performed by a SA algorithm, and followed by a localized refinement of the solution with a sequential quadratic programming (SQP) algorithm. The main settings of the SA algorithm were based on existing studies [20][21][22], as follows: • Initial temperature: This was done to increase the annealing schedule for higher dimensional problems, which require a more extensive search.
• Temperature decrease function: A 5% decrease, namely where t is the current time-step counter. This way the temperature initially decreases at a slow rate, which assists the exploration search, but the reduction is much faster in later stages when the search is localized.
• Acceptance probability function: A decaying exponent, namely where x n is the new vector being examined and x c is the current vector.
During the entire process, objective values are obtained only from the ensemble at a negligible computational cost, and hence the SA and SQP were able to evaluate a large number of candidate vectors.
Step 4. TR updates: The best vector found in the previous step (x ⋆ ) is evaluated with the true expensive function, and the following updates are performed [14]: The trial step was successful since the vector found was indeed better than the current best one. This indicates that the ensemble prediction is accurate, and accordingly the TR radius is doubled.
• If f ðx ⋆ Þ ≥ f ðx b Þ and there are sufficient vectors in the TR: The search was unsuccessful since the solution found is not better than the current best one. However, since there are enough vectors in the TR to train the metamodels the failure is attributed to the TR being too large, and accordingly the TR radius is halved.
• If f ðx * Þ ≥ f ðx c Þ and there are insufficient vectors in the TR: As above but now the failure is attributed to having too few vectors in the TR to train the metamodels with. Therefore, a new vector is sampled in the TR, as explained below.
The implementation described above differs from the classical TR framework in two main aspects: • The TR is contracted only if the number of vectors in the TR is above a threshold n, which is done to avoid premature convergence. The threshold parameter was calibrated through numerical experiments, as described in Appendix 2.

Metamodels participating in the ensemble
• In Step 4, a new vector may be sampled in the TR to improve the prediction accuracy. This vector should be located in a region that is sparse with vectors, namely away from existing TR vectors. To accomplish this efficiently, the proposed algorithm generates a Latin hypercube design (LHD) sample of vectors in the TR and selects the one having the largest minimal distance (max-min criterion) from the existing TR vectors.
To conclude the description, Algorithm 2 gives the pseudocode of the proposed algorithm.

Benchmark tests based on mathematical test functions
To evaluate its effectiveness, the proposed algorithm was applied to the established set of mathematical functions [23]: Ackley, Griewank, Rastrigin, Rosenbrock, Schwefel 2.13 and Weierstrass in dimensions 5-40, as listed in Table 3. These test functions represent challenging features such as high multimodality, deceptive landscapes and noise, and are therefore adequate for the testing purpose.
For a comprehensive evaluation, the proposed algorithm was benchmarked against the following four reference algorithms: • V1: A variant of the proposed algorithm, which is identical to it in operation, except that it used a single metamodel (RBF), but no ensembles.
• V2: A variant of the proposed algorithm, which is identical to it in operation, except that it used a fixed ensemble, which consisted of the RBF, RBFN and Kriging metamodels, but without any topology selection.  • Evolutionary algorithm with periodic sampling (EA-PS): The algorithm leverages on the concepts in references [24,25]. It uses a Kriging metamodel and a real-coded evolutionary algorithm (EA). The accuracy of the metamodel is maintained by periodically evaluating a small subset of the EA population with the true objective function, and using these sampled vectors to update the metamodel.

Function Definition, f ðxÞ¼ Domain
• Expected improvement with covariance matrix adaptation evolutionary strategy (CMA-ES) (EI-CMA-ES) [26]: The algorithm combines a covariance matrix adaptation evolutionary strategy (CMA-ES) algorithm with a Kriging metamodel, and uses the expected improvement (EI) framework to select new vectors for evaluation based both on the response and uncertainty in the metamodel prediction. This way the algorithm balances between a local search around the current best solution, and an explorative search in less-visited regions of the search space.
This testing set-up was used for several reasons: (i) any gains brought by using ensembles are highlighted through the comparisons to the V1 and V2 algorithms and (ii) the performance of the proposed algorithm is benchmarked against representative metamodel-assisted algorithms from the literature, namely, the EA-PS and EI-CMA-ES algorithms. Each algorithm-objective function combination was tested over 30 runs to support a valid statistical analysis, and the number of evaluations of the true objective function was limited to 200, to represent the tight optimization budget in expensive problems.

Tables 4 and 5
give the resultant test statistics of mean, standard deviation (SD), median, minimum (best) and maximum (worst) objective value for each algorithm-objective function combination. It also gives the statistic α that indicates the significance level at which the results of the proposed algorithm were better than those of the competing algorithms, measured at either the 0.01 or 0.05 levels, while an empty entry indicates no such statistically significant advantage. The α statistic was obtained by using the Mann-Whitney non-parametric test [27].
Test results show that the proposed algorithm performed well, as it obtained the best mean and median statistics in 8 out of 12 cases. Its results also had a statistically significant advantage over the other algorithms in 30 out of 48 cases. The performance advantage of the proposed algorithm was particularly pronounced in the high-dimensional cases, where it obtained the best mean and median in five out of six test functions. In terms of repeatability of performance, its SD was often slightly higher but was competitive with the best SD in each test case.
Overall, the proposed algorithm consistently outperformed the V1 and V2 variants, which shows that selecting the ensemble topology during the search improved the search effectiveness, both when compared to using a fixed metamodel or a fixed ensemble topology. Also, the proposed algorithm consistently outperformed the two reference algorithms from the literature, which shows that it compares well with existing approaches.
The analysis of the experiments also examined how the ensemble topology was updated, to examine if either a single or multiple topologies were predominantly selected. Accordingly, Figure 3 shows representative plots of the ensemble topologies selected during a run with the Ackley-10D function and another with the Rosenbrock-20D function. It follows that in both Computational Optimization in Engineering -Paradigms and Applications cases, different topologies were selected, which indicate that no single topology was the overall optimal, and further justifies the proposed approach.

Engineering test problem
The proposed algorithm was also applied to a representative simulation-driven engineering problem, where the goal is to find an airfoil shape, which maximizes the lift L and minimizes the drag (aerodynamic friction) D at some prescribed flight conditions. In practise, the design requirements for airfoils are specified in terms of the non-dimensional lift and drag coefficients, c l and c d , respectively, defined as where L and D are the lift and drag forces, respectively, ρ is the air density, V is the aircraft speed, and S is the reference area, such as the wing area. The relevant flight conditions are the aircraft altitude, speed and angle of attack (AOA) that is the angle between the aircraft velocity and the airfoil chord line. Figure 4 gives a schematic layout of the airfoil problem.
Candidate airfoils were represented with the Hicks-Henne method [28], such that an airfoil profile was defined as where y b is a baseline profile taken here to be the NACA0012 symmetric profile, and b i are the shape basis functions [29].
where α i ∈½−0:01; 0:01 are the variables, as shown in Figure 4. Ten basis functions were used for the upper and lower airfoil profiles, respectively, which resulted in a total of 20 variables per airfoil. Also, for structural integrity the thickness of an airfoil between 0.2 and 0.8 of its chord line was required to be greater than a critical thickness t ⋆ ¼ 0:1. The lift and drag coefficients of candidate airfoils were obtained by using XFoil, an aerodynamics simulation code for subsonic isolated airfoils [30]. Each airfoil evaluation required up to 30 s on a desktop computer, so evaluations were not prohibitively expensive and the tests could be completed within a reasonable time.
Based on the above discussion, the objective function used was where p is the penalty for violation of the thickness constraint. The flight conditions were an altitude of 30,000 ft, a speed of Mach 0.7, namely 70% of the speed of sound, and an AOA of 2°.
Tests were performed along the set-up of Section 4.1, and Table 6 gives the resultant test statistics. The trends are consistent with those of the test functions, and the proposed algorithm
outperformed the other candidate algorithms also here. It obtained the best mean and median statistics, and had a competitively low SD.

Conclusion
While computer simulations can improve the efficiency of the engineering design process, they also introduce new optimization challenges. Metamodels aim to alleviate the challenge of a high evaluation cost by providing computationally cheaper approximations of the true expensive function.
While metamodels can significantly improve the search effectiveness, they also introduce various challenges, such as identifying an optimal combination of metamodel variants, and effectively searching the metamodel landscape. To address these issues, this chapter has proposed a hybrid algorithm that uses SA to perform a global search, and it then refines the solutions with an SQP local search. To further enhance its effectiveness, the proposed algorithm uses ensembles of metamodels and selects the most suitable ensemble topology during the search. Lastly, to ensure convergence to an optimum of the true expensive function in the light of the inherent metamodel inaccuracies, the proposed algorithm operates within a TR approach such that the optimization is performed through a series of trial steps.
In an extensive performance analysis, the proposed algorithm was benchmarked against two implementations without selection of the ensemble topology, and two reference algorithms from the literature, which also do not use topology adaption. Analysis of the results shows that the proposed algorithm consistently outperformed the other algorithms: it achieved better results in 30 out of 48 cases with mathematical test functions, and also performed well with a simulation-driven problem. Its performance advantage was evident from the superior mean and median statistics that was obtained across the tests and was particularly pronounced in the high-dimensional problems.
The analysis also showed that during the optimization search the optimal topology continuously varied during the search, and that no single topology was the overall optimal. This observation further supports the approach proposed of selecting the ensemble topology during the search.
Overall, the solid performance of the proposed algorithm shows the merit of the hybrid SA +SQP algorithm proposed. It also suggests that the proposed algorithm could be applied to problems from a variety of academic domains, such as scheduling, systems engineering and model calibration, to name a few.
observed responses precisely (Lagrangian interpolation). For a constant drift function, the metamodel becomes where κðxÞ is the local correction. The latter is defined by a stationary Gaussian process with mean zero and covariance Cov½κðxÞκðyÞ ¼ σ 2 cðθ;x;yÞ, where cðθ;x;yÞ is a user-prescribed correlation function. With the Gaussian correlation function [3] cðθ;x;yÞ ¼ ∏ the above metamodel becomes mðxÞ ¼β þ rðxÞ T R −1 ðf −1βÞ (17) whereβ is the estimated drift coefficient, R is the symmetric matrix of correlations between all interpolation vectors, f is the vector of objective values and 1 is a vector with all elements equal to 1. r T is the correlation vector between a new vector x and the sample vectors, namely, r T ¼ ½cðθ; x;x 1 Þ, …;cðθ;x;x n Þ The estimated drift coefficientβ and varianceσ 2 are calculated aŝ For an isotropic (single correlation parameter) Kriging metamodel the optimal value of the parameter is obtained by maximizing the metamodel likelihood [31] θ ⋆ : maxfjRj 1=nσ2 g (20) • Radial basis functions (RBF): The metamodel approximates the true objective function with a set of basis functions, namely where φ i ðxÞ are basis functions of the form where x i is a sampled vector and c is a constant. The coefficients α i and c are determined from the interpolation conditions In this study, the widely used Gaussian basis function [32] was used: where τ controls the width of the Gaussians, and is determined by cross-validation [33,34].
• Radial basis function network (RBFN): A variant of the RBF approach but in which the number of basis functions is smaller than the sample size, which can improve the prediction accuracy in certain scenarios. The metamodel is given by where the coefficients α i are determined from the least-squares interpolation conditions and x i , i ¼ 1…n, are the sample vectors, f is the vector of corresponding objective function values, and φ j ðxÞ, j ¼ 1…n are the basis functions, which in this study were taken as the Gaussian functions described above. The basic function centres x j are obtained by clustering the sampled vectors and using the resultant cluster centres.
• n: Performance was similar across the different settings and accordingly the intermediate setting of n ¼ 0:5d was selected.
• s: The best performing split ratio was 40-60 between the training and testing subsets.
The above settings were then used during the numerical experiments described in Section 4.

Yoel Tenne
Address all correspondence to: y.tenne@ariel.ac.il Department of Mechanical and Mechatronic Engineering, Ariel University, Ariel, Israel