Pareto-Based Multiobjective Particle Swarm Optimization: Examples in Geophysical Modeling

It has been recently revealed that particle swarm optimization (PSO) is a modern global optimization method and it has been used in many real world engineering problems to estimate model parameters. PSO has also led as tremendous alternative method to conventional geophysical modeling techniques which suffer from dependence to initial model, linearization problems and being trapped at a local minimum. An area neglected in using PSO is joint modeling of geophysical data sets having different sensivities, whereas this kind of modeling with multiobjective optimization techniques has become an important issue to increase the uniqueness of the model parameters. However, using of subjective and unpredictable weighting to objective functions may cause a misleading solution in multiobjective optimization. Multiobjective PSO (MOPSO) with Pareto approach allows obtaining set of solutions including a joint optimal solution without weighting requirements. This chapter begins with an overview of PSO and Pareto-based MOPSO presented their mathematical formulation, algorithms and alternate approaches used in these methods. The chapter goes on to present a series synthetic modeled of seismological data that is one kind of geophysical data by using of Pareto-based multiobjective PSO. According to results matched perfectly, we believe that multiobjective PSO is an innovative approach to joint modeling of such data.


Introduction
As a conventional approach, least squares and linear programming optimization methods have been used modeling of geophysical data with a general form without requiring any special case. However, due to some disadvantages of these methods such as computational time and linearization problems, it has become inevitable to tend to new approaches for the researchers. Unlike traditional optimization methods, optimization of nonlinear models has been improved in two ways which are derivative based and non-derivative search methods. Unfortunately, one of the major disadvantages of derivative based methods is that solutions potentially trap at a local minimum because of depending on initial model. On the other hand, non-derivative search methods partly provide a global solution, however, significantly increase computing time.
In recent years, two approaches that are artificial intelligence and meta-heuristic optimization algorithms have been effectively put forward in geophysical modeling studies having complex and nonlinear models. Meta-heuristic optimization algorithms called as modern global optimization methods are based on systematic characteristics of biological, molecular, neurobiological and animal swarms. [1] PSO as one of the modern global optimization algorithms, has increased its popularity with rapid convergence compared to various optimization algorithms [2,3], especially when real model parameters are used. [4] PSO for multiobjective optimization has also been used in many studies in order to solve real world engineering problems having conflicted solutions between objective functions. [5] Despite this interest, very few researchers have studied MOPSO for joint modeling of geophysical data such as electromagnetic and gravity. [6,7] In fact, simultaneous optimization of multiobjective functions is also favored to increase uniqueness of model parameters in joint modeling of geophysical data that are generally sensitive to different physical phenomena. Multiobjective functions can be transferred into single-objective by combination of objective functions by using weighted-sum approach. However, it is very difficult to find reasonable and optimum weigting coefficients. [5] In engineering problems, subjective and unpredictable weightings used to objective functions are the primary cause of a misleading solution, because different sensitivities and unpredictable noise of different data sets lead to uncertainty in weighting. Pareto optimality approach is a good way to obtain set of possible solutions including an optimum solution in objective function space overcoming the use of weighting and combining.
The purpose of this chapter is to review the literature on Pareto-based MOPSO. This chapter first gives a brief overview of the methods and approaches used in PSO and Pareto-based MOPSO and to look at how mathematical formulations and general algorithms of these optimization techniques work. In order to show the superiority of Pareto-based MOPSO over weighting-sum approaches, the chapter proceeds as joint modeled of two synthetic seismological data using Pareto-based MOPSO and analyses the obtained results. The results demonstrate that Paretobased MOPSO is a useful approach to joint modeling of seismological data as explain in detail in our previous paper [8], of which TÜBİTAK is the publisher. Findings validate the usefulness of MOPSO as a technique to optimize objective functions simultaneously without weighting requirements. Finally, conclusion section gives a brief summary of MOPSO and critique of findings in modeling.

Particle swarm optimization
PSO method, inspired by social behavior of the bird or fish flock to reach to target in a shortest route was originally introduced by. [9] It was noticed that members of flocks suddenly change their movements as scattering and regrouping, when trajectory of swarms was observed. A striking feature of that was an effort of members to reduce their distance from both flock and surrounding members. It was found that knowledge within a flock was continuously shared by all members. PSO method was developed by defining each member in a flock as a particle. According to PSO, particles bearing an information of decision variable or model parameters take a position in an objective function space. Each particle is in communication and learning with other particles as schematically illustrated in Figure 1a. If a minimization problem is considered, each particle changes its position with a velocity vector and converges a global minimum as shown in Figure 1b.
The basic algorithm of PSO is outlined in Figure 2. According to this scheme, particles which velocities are initially assigned as zero are initiated by randomly selection between the minimum and maximum value of decision variables. After  each particle is evaluated with an objective function, particle with best fitness value is assigned as a global best solution. Particles move to the next position with a velocity vector where, subscript i is the number of particles and k is the number of iterations.
Position and velocity vector of a particle i at iteration k are shown as x ! k i and V ! k i , respectively. ω is the inertia weight term forced on velocity vector. c 1 and c 2 are local and global learning constants, γ 1 and γ 2 are uniformly random numbers in the range [0,1]. x ! pbest,i is the best position of particle i in the past. x ! gbest called as leader of a swarm is the position of the best particle indicating the best fitness value. If one particle position updated by a new velocity vector has a fitness value better than its previous best, new position is assigned as x ! pbest . If one of the particles has a best fitness value than the others, it is assigned as x ! gbest . These processes are repeated in a balance between exploration and exploitation until maximum iterations or minimum error criteria is not satisfied.

Selection of the PSO parameters
Velocity vector in Eq. (1) is controlled in the following factors: velocity limitation, learning coefficients and inertia weight. These factors are significantly contribute to prevent explosion in a swarm and ensure convergence.
Velocity limitation: In the PSO method, each particle changes its velocity vector by stochastic process. However, this leads a tendency exploding of velocity vector and exceeding limit of one particle constrained in a search space, especially if the particle is far from the personal and global best position. Therefore velocity limit approach, introduced by [10], has been exploited to avoid such problems. By this limitation of movement, it is ensured that particle does not exceed a search space. Fan and Shi [11] suggested that maximum and minimum velocity limits as follows: x max and x min are the upper and lower limit of a particle. N is an interval number defined by user considering the x max and x min .
Learning coefficients: Local and global learning coefficients illustrated in Eq. (1) control acceleration of a particle. While local learning coefficient (c 1 Þ enables particle to approach its individual best position, global learning coefficient c 2 ð Þ enables to be pulled towards the global best solution. These parameters sometimes called as acceleration coefficients are important to control the convergence. Higher learning coefficients can provide a rapid convergence, but also probably prevents exploration in an objective function space. In [12], author suggested that dynamic learning coefficients for optimization with a large number of unknown parameters. In addition, Ozcan and Mohan [13] concluded that if c 1 þ c 2 > 4, trajectory of a swarm goes to infinity. Inertia weight or constriction factor: Inertia weight term (ωÞ introduced by Shi and Eberhart [14] maintains a balance between exploitation and exploration by acting upon the current velocity vector. They proposed that linearly decreasing inertia weight from 0.9 to 0.4 in each iteration to increase performance of the algorithm. If ω > 1 is selected, particles tend to move away through boundary of a search space as velocity vectors increase continuously. [5] In [15], the author observed that particles indicate a stable and reasonable searching in case of c 1 þ c 2 ≤ 4. Nevertheless, it has even suggested a velocity limit by [16] to prevent drastically growing of velocity vector. In a major advance, Clerc [17] proposed a constriction factor (χ) to the PSO velocity vector equation. PSO algorithm with a constriction factor allowed us to rapid and reliable convergence without velocity limit requirements in case that ϕ ¼ ϕ 1 þ ϕ 2 > 4, where ϕ 1 and ϕ 2 are the learning coefficients substituted with c 1 and c 2 . The velocity vector is changed in the following way: Eq.
(3) reveals that χ influences not only current velocity, but also new velocity vector. Theoretical studies in [18] show that if ϕ 1 and ϕ 2 used as 2.05, therefore ϕ = 4.1, and χ = 0.7298, rapidly convergence is observed without using velocity limits. On the other hand, in [2] the authors analyzed that if ϕ 1 and ϕ 2 are used as randomly, swarm trajectory behaves as a combination of divergence and convergence.

Swarm topologies
Network of particles in a swarm is provided by kind of neighborhood topologies that regulate sharing of information between particles. Small-scale topologies are selected to use in solving complex problems, whereas large-scale topologies are selected for simpler problem. [19] Empty, local best, fully connected, star network and tree network are the list of the neighborhood network topologies which are generally used in PSO [20]. Empty graph: Each particle is connected to itself and compared with personal best position. In this topology, it is considered that c 2 as equal to zero.

Local best:
Local best x ! lbest indicates that the best particle position between one particle and its nearest neighbors. In this topology, particles are affected by both their personal best and local best which is also defined as equivalent to a leader. Each particle is connected to k nearest neighbors and if k = 2, network structure of particles is called ring topology as shown in Figure 3a. Fully connected graph: All particles in a swarm are connected to one another as illustrated in Figure 3b. In fully connected topology also called as star topology [21], each particle is affected by both its personal best and a global best defined as equivalent to leader. Kennedy [22] analyzed that this structure tends to ensure that a rapidly, but also a premature convergence. Star network: In this topology, one particle called as focal particle [23] is in connection with others as shown in Figure 3c. Focal particle defined as equivalent to a leader adjusts trajectory of a swarm by comparing positions of others.
Tree network: In this network structure, all particles are taken shape of a tree and, each of nodes of tree consists of exactly one particle as illustrated in Figure 3d. [24] One particle at the node is connected to both child nodes bottom in a tree and a parent node directly above in a tree. If one of the child node particle has a solution better than its parent solution, position of both particles are replaced. In this topology, parent particle is defined as equivalent to a leader.

Pareto multiobjective optimization
In multiobjective optimization problems (MOOP) indicated as an optimization of more than one objective function, "trade-off" solutions that are conflicted to each other are obtained rather than single solution. MOOP is generally defined to obtain decision variables in p dimensional of model space S, while simultaneously optimizing of a vector including all objective functions. [25] In MOOP, all objective functions in f x ð Þ vector may not be optimized simultaneously. Therefore Pareto optimality sometimes called as Pareto dominance approach is used to find set of possible solutions. According to Pareto approach, if x is better solution than y in minimization prob- If there is no other x ∈ S satisfies a condition such that f x ð Þ < f x ð Þ without deteriorating any other objective function, non-dominated solutions called Pareto optimal set P * ∈ S or Pareto front PF * ¼ f x ð Þ ∈  f jx ∈ P * g exists in an objective function space as schematically illustrated in Figure 4.
All particles of Pareto front generally spreads in two ways which are concave and L-shaped curves. Concave shaped spreading indicates that one objective function does not be ameliorated without deteriorating any other objective function(s). On the other hand, L-shaped spreading indicates that complete optimal solution which means that all objective functions can be optimized simultaneously. [26] Other remarkable distribution of Pareto front is its deviation from symmetry referenced by utopia point [0,0]. Deviation is an indication that one objective function has many local minimums relatively the others (minimum one or more) and/or modeling has not properly accomplished with defined parameter search space and used methods [27].

Pareto-based multiobjective particle swarm optimization
Several considerations that are increasing the diversity, maintaining the nondominated particles and selecting a leader should be taken into account when using MOPSO. [20] A general MOPSO algorithm modified from PSO algorithm by these considerations outlined below is shown in Figure 5.
Increasing of diversity: In MOPSO, it is expected to increase of diversity of all particles, while dominated particles are required to converge towards to Pareto-front. Mutation, niching and clustering techniques are generally used to increase diversity. Mutation operator is originally a stage of the genetic algorithm and it prevents to trap of particles a local minima by altering decision variables within the bounds of possibility. [28,29] suggested that polynomial and dynamic mutation rate which should be changed from higher to lower degree to keep increasing diversity in an entire search space. This feature allows particles to be explored, even in far field on this space. [30] Niching method, which was also developed for evolutionary algorithms [31], prevents a premature convergence that is a leading cause of a movement of particles towards a solution. [32] In NichePSO presented by [33], alteration of each particle is kept track after each iteration. If some particles do not change after several iterations, niche formed as a sub-swarm including both these particles and their surroundings is generated with a radius called σ share as illustrated in Figure 6. By comparing of two particles in niches, a control mechanism is provided to be adding particles to a subswarm or to be merging of sub-swarms. [34] In clustering technique introduced by [35], main swarm is divided into k sub-swarms and determined leader in each subswarm is moved to centroid of its sub-swarm. Diversity is provided by a migration of leaders to different sub-swarms. [36] Maintaining non-dominated particles: Maintaining non-dominated particles is key component of MOPSO. External archive technique allowing a storage of nondominated particles is generally used to spread and maintain Pareto-front. Global best, personal best or local best can be stored to an archive when extending PSO to MOPSO. However, as dominated particles tending to converge and join to Paretofront, a rapid increase number of particles in an archive is a disadvantage. [20] A restriction is applied to prevent rapidly growing number of non-dominated particles. In [37], they proposed ϵ-dominance value defined as set of boxes in objective function space as shown in Figure 7 controls adding of dominated particles to Pareto-front by sizing of an archive. In [38], their attempts to update of external archive is composed niching method used to add or remove particles from archive when it is full. In [30], geographically-based global repository approach is used to update of an archive based on removing of non-dominated solutions from cells which have fewer particles. Li [39] proposed non-dominated sorting method to determine Pareto-front in each iteration. In his study, rather than comparing of a new position with a personal best, temporary swarms have been generated by combining both of them. In fact, the fundamental condition maintaining nondominated particles is that a new archive should be dominant over previous archives. ϵ-dominance approach is one of the most feasible method to provide this condition. [20] Leader selection: Unlike one leader that guides a swarm in the PSO, nondominated particles of Pareto front set indicate presence of multiple leaders in MOPSO. Neighbor density and kernel density estimator methods based on density measurement techniques are suggested to determine particles that are most likely to be selected as a leader. In [40], they studied and further supported the concept of crowding factor according to nearest neighbors of a given particle. The value of  ϵ-dominance concept for minimizing two objective functions. Particle A, B and C are incommensurable particles, however, particle a is considered to be dominant to B and C due to more closer the lower left hand corner. estimators are determined by perimeter of a rectangular formed between one particle and neighbors as illustrated in Figure 8. Particles which have larger value of estimator are preferred to select as a leader.
The kernel density estimator is based on the niching method. A niche which has radius as σ share is generated for each of non-dominated particles. It is preferable selected a leader from the least crowded niche. [41] As for leader selection from preferred area, different ways are used. For instance, neighbor density estimator has been used by Raquel and Naval [42] to select a leader randomly from top a list ranking in descending order according to estimator values. If an external archive is full, particle has been removed by randomly selecting from bottom of the list. To select a leader, Coello Coello et al. [30] suggested to fitness values based on the number of particles in each hypercube formed by division of an objective function space. A leader is randomly selected from a hypercube determined by roulette-wheel selection scheme which its lengths of divided segments proportional to fitness values.

Selecting the methods and parameters for Pareto-based multiobjective particle swarm optimization
We joint modeled two synthetic seismological data obtained by response of fivelayered models by using MOPSO. In one synthetic model, shear wave velocities increase smoothly with depth (SM-1), while the other has a noticeable velocity contrast at third layer interface (SM-2). In the modeling stage, we simultaneously optimized two objective functions related with Rayleigh wave dispersion (RWD) and horizontal to vertical spectral ratio (HVSR) methods, which have different sensivities to physical phenomena. The estimated parameters were shear-wave velocities and depth obtained by cumulative sum of layer thicknesses in each layer. Parameter search space and more technical details were be given in [8].
Minimization was carried out between observed data obtained by response of synthetic models (y obs i Þ and calculated data (y cal i Þ obtained by response of models which their parameters changing with particle movements. Fitness value α ð Þ minimized at each of the objective function was calculated using, where n is the number of observations. Number of iterations and particles was selected as 200 and 100, respectively. Number of particles was selected as 100 referenced by 10-fold the number of model parameters (thicknesses and velocities for 5 layers). We used PSO parameters defined in [43] with velocity equation in Eq. (3) which provides an optimization without a velocity limitation. A mutation operator was used to increase diversity and mutation rate was selected as two percent. ϵ-dominance approach was selected to maintain of non-dominated particles and the value of ϵ was used as 0.01 for each of the objective functions. Number of hypercubes was selected as 30 for 100 particles recommended by [30]. Hypercubes having more particles were preferred for leader selection. After a fitness hypercube was determined by using Roulette-wheel approach, a leader was selected randomly from this hypercube.

Results and discussions
The results of synthetic models (SM-1 and SM-2) are shown in Figure 9 and Figure 10, respectively. As can be seen in Figure 9a and Figure 10a, MOPSO was successful in proving to obtain the real models with defined methods and parameters. These tests showing matched perfectly between synthetic data and model responses as seen in Figure 9b, Figure 9c, Figure 10b and Figure 10c, further strengthened our confidence in MOPSO modeling such seismological data. The most conspicuous observation to emerge from the result of both models was the distribution of dominated particles and Pareto front. Dominated particles in Figure 9d show a clear and balanced distribution for SM-1, however, as in Figure 10d dominated particles tend to towards the α RWD axis for SM-2. This results showed that if only HVSR optimization was done, many local minima and nonunique solutions could be obtained in modeling of data obtained by response of such as SM-2 model. In contrast to earlier findings in [27], deviation from symmetry and divergence from an objective function axis were not only related to a improperly accomplished modeling, but also to the characteristic of HVSR optimization in case of model such as SM-2 that its optimization indicated non-uniqueness solutions. In addition, as shown in Figure 9b, spreading of Pareto front in SM-1 modeling showed a concave shaped curve that means an objective function could not further minimized without maximized to other objective function. However, for SM-2 model, Pareto front showed a similar distribution to L-shape, indicating that both objective functions were minimized independently and simultaneously.

Conclusion
This chapter considered is an overview of methods and parameters generally used in PSO and Pareto-based MOPSO, in the first step. The parameters and methods used in the literature are reliable but do not have an obvious superiority to each other. In spite of that MOPSO has been widely applied in many real world engineering problems, a few attempts have been made in order to modeling geophysical data. Until our previous study, this methodology have not been applied modeling seismological data. A set of solutions demonstrated in this chapter support the idea that MOPSO provides a powerful methodology for joint modeling of data having different sensivity. The present findings have important implications in order to solve weighting problem encountered in joint modeling approach. A clear benefit of MOPSO in the prevention of weighting-sum approaches could be clearly identified in this analysis. A further important implication is that divergence of particles from an objective function axis is not only related to properly defined parametrization and accomplished modeling, but also to non-uniqueness solutions. Our investigations into this point are still ongoing and seem likely to confirm our hypothesis. The evidence from this study implies that MOPSO is considered to be very attractive for joint modeling geophysical data in the future. However, further work needs to be performed to confirm whether MOPSO is beneficial to joint modeling of different types of geophysical data.