Study for Application of Artificial Neural Networks in Geotechnical Problems

The geotechnical engineering properties of soil exhibit varied and uncertain behaviour because of the complex and imprecise physical processes associated with the formation of these materials (Jaksa, 1995). This is in contrast to most other civil engineering materials, such as steel, concrete and timber, which exhibit far greater homogeneity and isotropy. To cope with the complexity of geotechnical behaviour and the spatial variability of these materials, traditional engineering design models are justifiably simplified. Moreover, geotechnical engineers face many sources of uncertainty, including inherent soil variability, loading effects, time effects, construction effects, human error, and errors in soil boring, sampling, in-situ and laboratory testing, and in the characterization of the shear strength and stiffness of soils. Although developing an analytical or empirical model is feasible in some simplified situations, most processes are complex, and therefore models that are less general, more practical, and less expensive than analytical models are of interest. An important advantage of an Artificial Neural Network (ANN) over regression in process modeling is its capacity to deal with multiple outputs or responses, whereas each regression model can deal with only one response. Another major advantage of NN process models is that they do not depend on simplifying assumptions such as linear behavior or production heuristics. Neural networks possess a number of attractive properties for modeling a complex mechanical behavior or system: universal function approximation capability, resistance to noisy or missing data, accommodation of multiple nonlinear variables with unknown interactions, and good generalization capability.
Since the early 1990s, ANN has been increasingly employed as an effective tool in geotechnical engineering, including: constitutive modelling (Agrawal et al., 1994; Gribb & Gribb, 1994; Penumadu et al., 1994; Ellis et al., 1995; Millar & Calderbank, 1995; Ghaboussi & Sidarta, 1998; Zhu et al., 1998; Sidarta & Ghaboussi, 1998; Najjar & Ali, 1999; Penumadu & Zhao, 1999); geo-material properties (Goh, 1995; Ellis et al., 1995; Najjar et al., 1996; Najjar and Basheer, 1996; Romero & Pamukcu, 1996; Ozer et al., 2008; Park et al., 2009; Park & Kim, 2010; Park & Lee, 2010); bearing capacity of piles (Chan et al., 1995; Goh, 1996; Bea et al., 1999; Goh et al., 2005; Teh et al., 1997; Lee & Lee, 1996; Abu-Kiefa, 1998; Nawari et al., 1999; Das & Basudhar, 2006; Park & Cho, 2010); slope stability (Ni et al., 1995; Neaupane and Achet, 2004; Ferentinou & Sakellariou, 2007; Zhao, 2007; Cho, 2009); liquefaction (Agrawal


Mathematical modeling of artificial neuron
A neuron is an information-processing unit that is fundamental to the operation of a neural network. As shown in Fig. 2, we may identify three basic elements of the neuron model: (1) A set of synapses, each of which is characterized by a weight or strength of its own. Specifically, a signal $x_j$ at the input of synapse $j$ connected to neuron $k$ is multiplied by the synaptic weight $w_{kj}$. Note the manner in which the subscripts of the synaptic weight $w_{kj}$ are written: the first subscript refers to the neuron in question and the second subscript refers to the input end of the synapse to which the weight refers. The weight $w_{kj}$ is positive if the associated synapse is excitatory; it is negative if the synapse is inhibitory. (2) An adder for summing the input signals, weighted by the respective synapses of the neuron. (3) An activation function for limiting the amplitude of the output of a neuron. The activation function is also referred to in the literature as a squashing function in that it squashes (limits) the permissible amplitude range of the output signal to some finite value. Typically, the normalized amplitude range of the output of a neuron is written as the closed unit interval [0, 1] or alternatively [-1, 1]. The model of a neuron also includes an externally applied bias (threshold) $w_{k0} = b_k$ that has the effect of lowering or increasing the net input of the activation function. In mathematical terms, we may describe a neuron $k$ by writing the following equations:
$$u_k = \sum_{j=1}^{m} w_{kj} x_j, \qquad y_k = \varphi(u_k + b_k) \tag{1}$$
where $m$ is the number of inputs, $u_k$ is the output of the adder, $\varphi(\cdot)$ is the activation function, and $y_k$ is the output signal of the neuron.
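The three elements of the neuron model (synaptic weights, adder, activation function) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `neuron_output` and `logistic`, and the numerical values of the inputs, weights and bias, are assumptions made for the example and do not come from the chapter.

```python
import math


def neuron_output(inputs, weights, bias, activation):
    """Return activation(sum_j w_kj * x_j + b_k) for a single neuron k."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Adder: weighted sum of the input signals plus the externally applied bias.
    net_input = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Activation ("squashing") function limits the amplitude of the output.
    return activation(net_input)


def logistic(v):
    """Logistic sigmoid: maps any real v into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


# Hypothetical values, chosen only for illustration.
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.2]   # positive weight = excitatory, negative = inhibitory
b = 0.1
y = neuron_output(x, w, b, logistic)
print(round(y, 4))
```

Passing the activation as a function argument keeps the adder independent of the choice of squashing function, so the same sketch covers the linear and sigmoidal cases discussed in the next section.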

Activation function
In this section, three of the most common activation functions are presented. An activation function performs a mathematical operation on the signal output of the adder. More sophisticated activation functions can also be used, depending on the type of problem to be solved by the network. As is known, a linear function satisfies the superposition principle. The function is shown in Fig. 3(a), and its mathematical equation can be written as
$$y = f(u) = c\,u \tag{2}$$
where $c$ is the slope of the linear function. If the slope is 1, the linear activation function is called the identity function: the output ($y$) of the identity function is equal to its input ($u$). Although this function might appear to be a trivial case, it is very useful in some cases, such as the last stage of a multilayer neural network.
As shown in Fig. 3(b), the sigmoidal (S-shaped) function is the most common nonlinear activation function used to construct neural networks. It is a mathematically well-behaved, continuous, differentiable and strictly increasing function. A sigmoidal transfer function can be written in the following form:
$$y = f(u) = \frac{1}{1 + e^{-\lambda u}} \tag{3}$$
where $\lambda$ is the shape parameter of the sigmoid function. By varying this parameter, different shapes of the function can be obtained, as illustrated in Fig. 3(b). The tangent sigmoidal function is described by the following mathematical form:
$$y = f(u) = \tanh(u) = \frac{e^{u} - e^{-u}}{e^{u} + e^{-u}} \tag{4}$$
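The linear, sigmoidal and tangent sigmoidal functions described above can be written as a minimal Python sketch. The parameter names `slope` and `shape` are assumptions standing in for the slope and shape parameters mentioned in the text, and the functions are not hardened against overflow for very large negative arguments.

```python
import math


def linear(u, slope=1.0):
    """Linear activation; with slope 1 it is the identity function."""
    return slope * u


def log_sigmoid(u, shape=1.0):
    """Sigmoidal (S-shaped) activation with output in (0, 1).

    A larger shape parameter gives a steeper transition around u = 0.
    """
    return 1.0 / (1.0 + math.exp(-shape * u))


def tan_sigmoid(u):
    """Tangent sigmoidal activation with output in (-1, 1)."""
    return math.tanh(u)


# Compare the three functions at a few sample inputs.
for u in (-2.0, 0.0, 2.0):
    print(u, linear(u), round(log_sigmoid(u), 4), round(tan_sigmoid(u), 4))
```

The two sigmoidal forms are closely related: the tangent sigmoid is a rescaled and shifted log-sigmoid, tanh(u) = 2 log_sigmoid(2u) - 1, which is why either can be used as a hidden-layer transfer function once the data are normalized to the matching range.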

Multilayered Neural Network
The source nodes in the input layer of the network supply respective elements of the activation pattern (input vector), which constitute the input signals applied to the neurons (computation nodes) in the second layer (i.e. the first hidden layer). The output signals of the second layer are used as inputs to the third layer, and so on for the rest of the network. Typically, the neurons in each layer of the network have as their inputs the output signals of the preceding layer only. The set of output signals of the neurons in the output layer of the network constitutes the overall response of the network to the activation pattern supplied by the source nodes in the input layer. The commonest type of artificial neural network consists of three groups, or layers, of units: a layer of "input" units is connected to a layer of "hidden" units, which is connected to a layer of "output" units (see Fig. 4). The activity of the input units represents the raw information that is fed into the network. The activity of each hidden unit is determined by the activities of the input units and the weights on the connections between the input and the hidden units. The behaviour of the output units depends on the activity of the hidden units and the weights between the hidden and output units.
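The layer-by-layer signal flow described above can be illustrated with a short Python sketch in which the output of each layer becomes the input of the next. The helper names (`layer_forward`, `network_forward`) and the 2-3-1 network with its weights are hypothetical and serve only as an example.

```python
import math


def logsig(v):
    return 1.0 / (1.0 + math.exp(-v))


def identity(v):
    return v


def layer_forward(inputs, weights, biases, activation):
    """Outputs of one layer; weights[k][j] connects input j to neuron k."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def network_forward(inputs, layers):
    """Feed the input vector through each (weights, biases, activation) layer in turn."""
    signal = inputs
    for weights, biases, activation in layers:
        # The output signals of one layer are the inputs of the next layer only.
        signal = layer_forward(signal, weights, biases, activation)
    return signal


# Hypothetical 2-3-1 network: two inputs, three hidden units, one output unit.
hidden = ([[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]], [0.0, 0.1, -0.1], logsig)
output = ([[0.3, -0.6, 0.9]], [0.05], identity)
y = network_forward([1.0, 2.0], [hidden, output])
print(y)
```

Representing the network as a list of layers means that a second hidden layer can be added by inserting one more tuple, without changing the forward-pass code.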

Back-propagation
The backpropagation algorithm (BP) is the most widely used search technique for training neural networks. Information in an ANN is stored in the connection weights, which can be thought of as the memory of the system. The purpose of BP training is to iteratively change the weights between the neurons in a direction that minimizes the error E, defined as the squared difference between the desired and the actual outcomes of the output nodes, summed over the training patterns (training dataset) and the output neurons. The algorithm uses a sample-by-sample updating rule for adjusting the connection weights in the network. In one iteration of the algorithm, a training sample is presented to the network. The signal is then fed forward through the network until the network output is obtained. The error between the actual and desired network outputs is calculated and used to adjust the connection weights. The adjustment procedure, derived from a gradient descent method, is used to reduce the error magnitude. The procedure is first applied to the connection weights in the output layer, followed by the connection weights in the hidden layer next to the output layer. This adjustment is continued backward through the network until the connection weights in the first hidden layer are reached. The iteration is completed after all connection weights in the network have been adjusted. Rumelhart, Hinton, and Williams (1986) popularized the use of BP for learning internal representations in neural networks. Despite its popularity, BP has the drawback of converging slowly to an optimal solution when the gradient search technique is applied. That is, BP using the gradient search technique has two serious disadvantages: for some applications it converges to an optimal solution with inconsistent and unpredictable performance, and when trapped in a local region of the error surface it performs poorly in reaching a globally optimal solution.
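One iteration of the sample-by-sample update described above can be sketched in Python for a network with one log-sigmoid hidden layer and a linear output layer. The architecture, the learning rate `lr`, the random initialization and all function names are assumptions made for illustration; this is a plain gradient-descent sketch, not the specific implementation used in the studies cited in this chapter.

```python
import math
import random


def logsig(v):
    """Log-sigmoid activation, 1 / (1 + e^-v)."""
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, W1, b1, W2, b2):
    """Forward pass: log-sigmoid hidden layer, linear output layer."""
    hidden = [logsig(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    output = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    return hidden, output


def backprop_step(x, target, W1, b1, W2, b2, lr=0.1):
    """Present one sample, update all weights in place, return E before the update."""
    hidden, output = forward(x, W1, b1, W2, b2)
    # Error signal at the linear output layer for E = 0.5 * sum((y - t)^2).
    delta_out = [y - t for y, t in zip(output, target)]
    # Propagate the error signal backward through the not-yet-updated W2;
    # h * (1 - h) is the derivative of the log-sigmoid.
    delta_hid = [h * (1.0 - h) * sum(d * row[j] for d, row in zip(delta_out, W2))
                 for j, h in enumerate(hidden)]
    # Adjust the output-layer weights first ...
    for k, d in enumerate(delta_out):
        for j, h in enumerate(hidden):
            W2[k][j] -= lr * d * h
        b2[k] -= lr * d
    # ... then the weights of the hidden layer next to the output layer.
    for j, d in enumerate(delta_hid):
        for i, xi in enumerate(x):
            W1[j][i] -= lr * d * xi
        b1[j] -= lr * d
    return 0.5 * sum(d * d for d in delta_out)


def init_network(n_in, n_hidden, n_out, seed=0):
    """Small random initial weights and zero biases (an arbitrary choice)."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    return W1, [0.0] * n_hidden, W2, [0.0] * n_out
```

Calling `backprop_step` repeatedly over the training samples gives the iterative training loop; the returned error can be logged per sample to monitor whether the gradient search is converging or has stalled.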
A major problem during the training of a neural network is the possible overfitting of the training data. A related difficulty is that, after a certain training period, the network may no longer improve its ability to solve the problem. In this case, training has stalled in a local minimum, leading to ineffective results and a poor fit of the model. To mitigate these disadvantages, researchers have modified the basic algorithm to try to escape local optima and find the global solution.
Numerous modifications have been implemented to overcome this problem. Over-fitting, or poor generalization capability, occurs when a neural network over-learns during the training period. As a result, such an over-trained model may not perform well on an unseen data set. Several approaches have been suggested in the literature to overcome this problem. The first is an early stopping mechanism, in which the training process is concluded as soon as the overtraining signal appears. The signal is observed when the prediction accuracy of the trained network on a test set starts to worsen at that stage of the training period. The second approach is Bayesian regularization, which minimizes the over-fitting problem by taking into account the goodness-of-fit as well as the network architecture. The early stopping approach requires the data set to be divided into three subsets: training, test, and verification sets. The training and verification sets are the norm in all model training processes. The test set is used to monitor the trend of the prediction accuracy of the model at successive stages of the training process. At later stages of training, the prediction accuracy of the model may start worsening for the test set; this is the stage at which training should cease in order to avoid over-fitting. The Bayesian regularization approach involves modifying the commonly used objective function, such as the mean sum of squared network errors (MSE):
$$F = E_D = \frac{1}{Num}\sum_{i=1}^{Num} (T_i - t_i)^2 \tag{5}$$
where $Num$ is the number of target data, $T_i$ is the $i$-th target output and $t_i$ is the $i$-th calculated output. The modification aims to improve the model's generalization capability. The objective function in Eq. (5) is expanded with the addition of a term, $E_W$, which is the sum of squares of the network weights:
$$F = \beta E_D + \alpha E_W \tag{6}$$
where $\alpha$ and $\beta$ are parameters which are to be optimized in the Bayesian framework of MacKay (1992a; 1992b).
It is assumed that the weights and biases of the network are random variables following Gaussian distributions and the parameters are related to the unknown variances associated with these distributions.
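The two remedies for over-fitting discussed above can be sketched in Python. The symbols `alpha` and `beta` follow the common notation of MacKay's framework and are an assumption here, as are the function names and the `patience` parameter (with `patience=1`, training stops as soon as the test error fails to improve). The sketch only evaluates the regularized objective for given parameters; it does not perform the Bayesian optimization of `alpha` and `beta`.

```python
def regularized_objective(targets, outputs, weights, alpha, beta):
    """Return beta * E_D + alpha * E_W.

    E_D is the mean of the squared network errors and E_W is the
    sum of squares of the network weights.
    """
    e_d = sum((t - y) ** 2 for t, y in zip(targets, outputs)) / len(targets)
    e_w = sum(w * w for w in weights)
    return beta * e_d + alpha * e_w


def early_stopping_epoch(test_errors, patience=1):
    """Return the epoch with the lowest test-set error seen before stopping.

    Monitoring stops once the test error has not improved for
    `patience` consecutive epochs.
    """
    best_epoch = 0
    best_error = float("inf")
    for epoch, err in enumerate(test_errors):
        if err < best_error:
            best_error, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch


# Hypothetical test-set error history: improves, then starts to worsen.
history = [0.90, 0.50, 0.30, 0.35, 0.20]
print(early_stopping_epoch(history))              # stops at the first worsening
print(early_stopping_epoch(history, patience=2))  # tolerates one bad epoch
```

A larger `patience` trades longer training for robustness against noisy test-error curves, while a larger `alpha` relative to `beta` penalizes large weights more strongly and so favours smoother network responses.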

Designing the structure of Artificial Neural Network
Structural design of an NN involves determining the number of layers and the number of neurons in each layer, and selecting the training algorithm. Selecting only the effective input parameters for the NN is one of the most difficult steps since: (1) there may be interdependencies and redundancies between parameters, (2) sometimes it is better to omit some parameters to reduce the total number of input parameters, and therefore the computational complexity of the problem and the topology of the network, and (3) NN is usually applied to problems where there is no strong knowledge about the relations between input and output, and therefore it is not clear which of the input parameters are most useful. Moreover, other design parameters of the NN architecture, such as the number of neurons in the input layer, the number of hidden layers, the number of neurons in the hidden layers and the number of neurons in the output layer, are found by repeated runs of the system based on trial and error. There is no clear framework for selecting the optimum NN architecture and its parameters (Chung and Kusiak, 1994; Kusiak and Lee, 1996). Nevertheless, some research has contributed to determining the number of hidden layers, the number of neurons in each layer, the learning rate parameter, and others.

Determining the number of hidden layers
Determining the number of hidden layers and the number of neurons in each hidden layer is a considerable task. The number of hidden layers is usually determined first and is a critical step. The number of hidden layers required depends on the complexity of the relationship between the input parameters and the output value. Most problems require only one hidden layer, and if the relationship between the inputs and the output is linear, the network does not need a hidden layer at all. It is unlikely that any practical problem will require more than two hidden layers (THL). Cybenko (1989) and Bounds et al. (1988) suggested that one hidden layer (OHL) is enough to classify input patterns into different groups. Chester (1990) argued that a THL network should perform better than an OHL network. More than one hidden layer can be useful in certain architectures, such as cascade correlation (Fahlman & Lebiere, 1990) and others. A simple explanation for why larger networks can sometimes provide improved training and lower generalization error is that the extra degrees of freedom can aid convergence; that is, the addition of extra parameters can decrease the chance of becoming stuck in local minima or on "plateaus". The most commonly used training methods for back-propagation networks are based on gradient descent; that is, the error is reduced until a minimum is reached, whether it be a global or local minimum. However, there is no clear theory to tell how many hidden units are needed to approximate any given function. If only one input is available, there is no advantage in using more than one hidden layer, but things get much more complicated when two or more inputs are given. The rule of thumb in deciding the number of hidden layers is normally to start with OHL (Lawrence, 1994). If an OHL network does not train well, then try increasing the number of neurons. Adding more hidden layers should be the last option.

Determining the number of hidden neurons
The choice of hidden neuron size is problem-dependent. For example, any network that requires data compression must have a hidden layer smaller than the input layer (Swingler, 1996). A conservative approach is to select a number between the number of input neurons and the number of output neurons. It can be seen that the general wisdom concerning selection of the initial number of hidden neurons is somewhat conflicting. A good rule of thumb is to start with the number of hidden neurons equal to half of the number of input neurons and then either add neurons if the training error remains above the training error tolerance, or reduce neurons if the training error quickly drops to the training error tolerance.

Rules for the number of hidden neurons ($h$ = number of hidden neurons, $i$ = number of inputs, $o$ = number of outputs):
Formula | Comments
$h = 2i + 1$ | Hecht-Nielsen (1987) used Kolmogorov's theorem, which states that any function of $i$ variables may be represented by the superposition of a set of $2i + 1$ univariate functions, to derive the upper bound for the required number of hidden neurons.
$h = (i + o)/2$ | Lawrence and Fredrickson (1988) suggested that a best estimate for the number of hidden neurons is half the sum of inputs and outputs. Moreover, they proposed a range for the number of hidden neurons.
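The starting-point heuristics mentioned above are simple enough to compare side by side in Python. The function names are hypothetical, and the rounding choices (rounding the half-sum up, rounding half the inputs down, and never returning fewer than one neuron) are assumptions, since the text does not say how fractional values should be handled.

```python
import math


def hidden_upper_bound(n_inputs):
    """Upper bound 2i + 1 derived from Kolmogorov's theorem."""
    return 2 * n_inputs + 1


def hidden_half_sum(n_inputs, n_outputs):
    """Half the sum of inputs and outputs, rounded up (rounding is an assumption)."""
    return max(1, math.ceil((n_inputs + n_outputs) / 2))


def hidden_half_inputs(n_inputs):
    """Starting point of half the number of input neurons, rounded down."""
    return max(1, n_inputs // 2)


# Illustration for a network with seven inputs and three outputs.
i, o = 7, 3
print(hidden_upper_bound(i), hidden_half_sum(i, o), hidden_half_inputs(i))
```

These values are only starting points: as the text notes, neurons are then added or removed depending on how the training error behaves relative to its tolerance.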

Determining the number of training data
To train the neural network well, the number of training data must be chosen carefully. An over-fitted model approximates the training data well but generalizes poorly to the validation data set. On the other hand, an under-fitted model approximates the training data poorly, even though its validation error may be close to its training error. One way to avoid both over-fitting and under-fitting is to determine the best number of training observations. No general guidelines are available to achieve this. However, Lawrence and Fredrickson (1988) suggested the following rule of thumb.

ANN applications in geotechnical engineering

Constitutive modelling of geo-materials
During the past decades, increasing interest has been shown in the development of a satisfactory formulation for the stress-strain relationships of geo-materials that incorporates a concise statement of nonlinearity, inelasticity and stress dependency based on a set of assumptions and proposed failure criteria. In spite of the considerable complexity of these constitutive models, and because of an inadequate understanding of the mechanisms and all the factors involved, it is not possible to capture the complete material response along all complex stress paths and densities. Furthermore, the degree of complexity of these constitutive models (in many cases) inhibits their incorporation into general purpose numerical codes, thus restricting their usefulness in engineering practice (Shin and Pande, 2000). On the other hand, for convenience in engineering practice, a model needs to be simple enough. In establishing such a model, the conventional approach oversimplifies the mechanical behavior of soil. When the model is simplified, the number of parameters is artificially reduced and only a few of them are used in setting up the soil constitutive model, while the remaining large amount of test data is neglected. As a result, the model performs poorly. Unlike conventional constitutive models, an ANN model needs no prior knowledge, constants or assumptions about the deformation characteristics of the geo-materials. Other powerful attributes of ANN models are their flexibility and adaptivity, which play an important role in material modeling. When a new set of experimental results cannot be reproduced by conventional models, a new constitutive model or a set of new constitutive equations needs to be developed. However, trained ANN models can be further trained with the new data set to gain the additional information needed to reproduce the new experimental results.
These features make the ANN model an objective model that can truly represent natural neural connections among variables, rather than a subjective model, which assumes that variables obey a set of predefined relations (Zhu et al., 1998). So far, ANNs have been applied to the constitutive modeling of rocks, clays, sands, gravels and other geo-materials (Zhu et al., 1998; Millar & Calderbank, 1995; Penumadu et al., 1994; Ellis et al., 1995; Penumadu & Zhao, 1999; Najjar & Ali, 1999). Ghaboussi and co-workers originally proposed an NN-based framework for constitutive modeling in geomechanics. They introduced the concept of nested adaptive NNs, which considers the nested structure of the material test data, e.g. dimensionality, stress path dependency or drainage conditions. By means of the finite element (FE) method and the autoprogressive training algorithm, they trained NNs with experimental nonuniform triaxial test data in order to capture and reproduce the non-linear response of the soil without the conventional concepts of the theory of plasticity. In addition, further research showed that NN-constitutive models can be successfully embedded within FE codes to compute the consistent tangent stiffness matrix (Shin and Pande, 2000; Hashash et al., 2004). Hashash et al. (2004) demonstrated that a tangent stiffness matrix can be derived from NN-based material models, using an explicit formulation expressed in terms of the network parameters. However, the main drawback of NN-constitutive models is that they are valid only for a specific material, so that a new NN has to be adopted each time. Moreover, such a material model loses the 'flexibility' that is inherent in conventional models and that is controlled by parameters explicitly describing concepts of plasticity, such as the yield surface, flow rule and hardening law.

Properties of geo-materials
In geotechnical engineering, empirical relationships are often used to estimate certain engineering properties of soils. These correlations are usually derived with the aid of statistical methods, using data from extensive laboratory or field testing. The relationships between soil parameters are clearly complex, but the degree of interaction allows a degree of statistical correlation to be established, which suggests a potential for estimation. Developing engineering correlations between various soil parameters is an issue discussed by Goh (1995). Goh used neural networks to model the correlation between the relative density and the cone resistance from the cone penetration test (CPT), for both normally consolidated and over-consolidated sands. Laboratory data, based on calibration chamber tests, were used to successfully train and test the neural network model. Neural network models have also been developed that use soil parameters as inputs and the compression index as a single output (Ozer et al., 2008; Park & Lee, 2010). These ANN models were found to give higher coefficients of correlation than empirical equations for both the training and testing data, which indicated that the neural network was successful in modelling the complex relationship between the compression index and the other soil parameters. Many other studies have successfully used ANNs for modelling soil properties. Ellis et al. (1995) developed an ANN model for sands based on grain size distribution and stress history. It has also been shown that neural network-based models can be used to accurately assess soil swelling, and that neural network models can provide significant improvements in prediction accuracy over statistical models. Romero and Pamukcu (1996) showed that neural networks are able to effectively characterise and estimate the shear modulus of granular materials. Agrawal et al. (1994) and Gribb and Gribb (1994), among others, used neural network approaches for estimating the permeability of clay liners.
Park et al. (2010) used ANN models to develop an empirical model for the resilient modulus of subgrade soils and subbase materials from basic material properties and in-situ conditions related to stresses. Park and Kim (2010a) proposed an ANN model to predict the unconfined compressive strength of reinforced lightweight soil (RLS). RLS, consisting of dredged soil, cement, air foam, and waste fishing net, is considered to be an eco-friendly backfilling material in construction because it provides a means to recycle both dredged soil and waste fishing net.
Several series of laboratory tests were performed to investigate the unconfined compressive strength of RLS at various mixing ratios. It may be difficult to find an optimum mixing ratio of RLS that accounts for the design criteria and the construction conditions using the limited test results, because the unconfined compressive strength is influenced in a complicated way by the mixing ratios of the admixtures. As a result, an appropriate prediction method is needed in order to expedite the field application of reinforced lightweight soil. However, since the strength of RLS is strongly influenced by the mixing ratio of each admixture (i.e., cement, water, air foam, and waste fishing net), it is difficult to formulate empirically a mathematical relationship between the strength and the admixture content of the composite material. An ANN model that predicts the strength of RLS at a given mixing ratio was therefore developed using experimental test results obtained for various admixture contents.

Fig. 6. Architecture for the developed artificial neural network (Park & Kim, 2010). Node labels recoverable from the figure: air foam, dredged soil, cement, waste fishing net.
www.intechopen.com Artificial Neural Networks - Application
[...] training and testing data. As shown in Fig. 7, the developed ANN model is able to capture the complex relationship between the compressive strength of RLS and the mixing ratios of the admixtures. It has been shown that NN is well suited to modeling the complex behavior of most geo-materials which, by their very nature, exhibit extreme variability.
Fig. 7. The unconfined compressive strength with variation of input parameters (Park & Kim, 2010)

Pile capacity
The design of axially loaded piles can be done by solving equations of static equilibrium, whereas the design of laterally loaded piles requires the solution of nonlinear differential equations (Poulos & Davis, 1980). Other semi-empirical methods used for the lateral load capacity of piles are due to Hansen (1961), Broms (1964) and Meyerhof (1976). Although numerous investigations have been performed over the years to predict the behavior and capacity of piles, the mechanisms are not yet entirely understood. Predicting pile capacity is a difficult task because a large number of parameters affect the capacity and have complex relationships with each other. It is extremely difficult to develop appropriate relationships between the various essential parameters, including the soil condition, pile type, driving condition, time effect, and others. Baik (2002) illustrated that these factors include the soil condition (type of soil, density, shear strength, etc.), information related to the pile shape (diameter, penetration depth, whether the tip of the pile is open-ended or closed-ended, etc.), and other information (driving method, driving energy, set-up effect, etc.). Although many methods for predicting pile resistance have been presented, they do not appropriately consider the various parameters that affect pile resistance. The main criticism of these methods is that they oversimplify the complicated mechanism of pile resistance, and that the soil characteristics, type of pile, and information on driving conditions are not properly taken into account. Hence, ANN models could be an alternative approach. Goh (1995) used a back propagation neural network (BPNN) to predict the skin friction of piles in clay. Goh also observed that ANN predictions of the ultimate load capacity of driven timber, pre-cast concrete and steel piles in cohesionless soils outperformed methods such as the Engineering News formula, the Hiley formula and the Janbu formula.
Chan et al. (1995) and Teh et al. (1997) found that the static pile capacity predicted by a neural network was in excellent agreement with that obtained using the commercially available computer code CAPWAP (GRL, 1972). Lee and Lee (1996) used neural networks to predict the ultimate bearing capacity of piles based on model and in situ pile load test results. Abu-Kiefa (1998) used a generalized regression neural network (GRNN), which is a type of probabilistic neural network, to predict the pile load capacity, considering separately the tip, shaft and total load capacity of piles driven in cohesionless soils. Nawari et al. (1999) used neural networks, both BPNN and GRNN, to predict the axial load capacity of steel H-piles, steel piles, and pre-stressed and reinforced concrete piles. They also predicted the top settlement of drilled shafts due to lateral load based on in situ testing. Park and Cho (2010) applied an artificial neural network (ANN) to predict the resistance of driven piles in dynamic load tests. They collected 165 data sets for driven piles at various construction sites in Korea. Predictions of the tip, shaft, and total pile resistance were made for piles with corresponding measurements of these values. The results indicate that the ANN model serves as a reliable and simple predictive tool that appropriately considers the various essential parameters for predicting the resistance of driven piles. The proposed neural network model has seven nodes in the input layer, eight nodes in the hidden layer, and three nodes in the output layer (Fig. 8). In order to find an appropriate combination of transfer functions providing good correlation in the training and testing stages, various combinations of log-sigmoid, tan-sigmoid and linear functions were applied to the hidden layer and the output layer.

Slope stability
Slope stability is important because slope failures or landslides can lead to the loss of life and property. Slope failures are complex natural phenomena that constitute a serious natural hazard in many countries. Limited data and unclearly defined problems often complicate the study of landslides (Nieuwenhuis 1991). To prevent or mitigate landslide damage, slope-stability analyses and stabilization require an understanding and evaluation of the processes that govern the behavior of the slopes. The factor of safety, based on an appropriate geotechnical model, is required as an index of stability in order to evaluate slope stability. Black-box models based on Artificial Neural Networks (ANNs) currently attract many researchers studying slope instability, owing to their successful performance in modeling non-linear multivariate problems (Ni et al., 1995; Neaupane & Achet, 2004; Sakellariou & Ferentinou, 2005; Cho, 2009; Wang et al., 2005). Many variables are involved in slope stability evaluation, and the calculation of the factor of safety requires geometrical data, physical data on the geologic materials and their shear-strength parameters (cohesion and angle of internal friction), information on pore-water pressures, etc. Because of the complexity of the slope system, evaluating slope instability requires new methods that are efficient in predicting the nonlinear characteristics of natural landslides.

Mathematical formulation
Training a neural network is conducted by presenting a series of example patterns for associated input and output values. Initially, when a network is created, the connection weights and biases are set to random values. The performance of an ANN model is measured in terms of an error criterion between the target output and the calculated output.
The output calculated at the end of each feed-forward computation is compared with the target output to estimate the mean-squared error (MSE), as shown in the following equation:

$$\mathrm{MSE} = \frac{1}{Num} \sum_{i=1}^{Num} \left( T_i - t_i \right)^2$$

where $Num$ = number of target data, $T_i$ = $i$-th target output, and $t_i$ = $i$-th calculated output. An algorithm called back-propagation is then used to adjust the weights and biases until the mean-squared error is minimized. The network is trained by repeating this process several times. Once the ANN is trained, the prediction mode simply consists of propagating the data through the network, giving immediate results.

In this study, the training data sets (inputs and target outputs) were normalized according to Eq. (8), so that the processed data were in the range of -1 to +1. The network was therefore trained to produce outputs in the range of -1 to +1, and these outputs were converted back into the units used for the original targets.

$$pn = \frac{2\,(p - \min p)}{\max p - \min p} - 1, \qquad tn = \frac{2\,(t - \min t)}{\max t - \min t} - 1 \tag{8}$$

where $p$ = a matrix of input vectors; $t$ = a matrix of target output vectors; $pn$ = a matrix of normalized input vectors; $tn$ = a matrix of normalized target output vectors; $\max p$ = a vector containing the maximum values of the original input; $\min p$ = a vector containing the minimum values of the original input; $\max t$ = a vector containing the maximum values of the target output; and $\min t$ = a vector containing the minimum values of the target output. The normalized data were then used to train the neural network to obtain the final connection weights. The data from the output neuron have to be post-processed to convert them back into non-normalized units, as shown in Eq. (9).

$$t = 0.5\,(tn + 1)\,(\max t - \min t) + \min t \tag{9}$$

The normalized output is obtained by propagating the normalized input vector through the network as follows:

$$tn = W_2 \cdot \mathrm{logsig}\left( W_1 \cdot pn + B_1 \right) + B_2 \tag{10}$$

where $W_1$ = a weight matrix representing connection weights between the input layer neurons and the hidden layer; $W_2$ = a weight matrix representing connection weights between the hidden layer neurons and the output neuron; $B_1$ = a bias vector for the hidden layer neurons; and $B_2$ = a bias for the output neuron. The log-sigmoid function logsig is defined in Eq. (3). The output $t$ is then obtained using Eqs. (9) and (10):

$$t = 0.5\,\left( W_2 \cdot \mathrm{logsig}\left( W_1 \cdot pn + B_1 \right) + B_2 + 1 \right)(\max t - \min t) + \min t$$

where the transfer function in the hidden layer is the log-sigmoid activation function $a = 1/(1 + e^{-n})$, and the transfer function in the output layer is the linear function $a = n$.
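The chain of Eq. (8), Eq. (10) and Eq. (9) can be illustrated with a minimal Python sketch. It assumes that W1 and B1 hold the hidden-layer weights and biases and that W2 and B2 hold the output-layer weights and bias; the function names and the tiny two-input network with made-up numbers are purely illustrative and are not the values of Table 3 or Table 4.

```python
import math

def logsig(n):
    """Log-sigmoid transfer function a = 1 / (1 + exp(-n))."""
    return 1.0 / (1.0 + math.exp(-n))

def normalize(v, vmin, vmax):
    """Eq. (8): map each component from [vmin, vmax] to [-1, +1]."""
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x, lo, hi in zip(v, vmin, vmax)]

def denormalize(vn, vmin, vmax):
    """Eq. (9): map each component from [-1, +1] back to original units."""
    return [0.5 * (x + 1.0) * (hi - lo) + lo for x, lo, hi in zip(vn, vmin, vmax)]

def affine(W, x, b):
    """Return W x + b for a matrix W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def predict(p, W1, B1, W2, B2, pmin, pmax, tmin, tmax):
    """Normalize the input, propagate it through the network, de-normalize the output."""
    pn = normalize(p, pmin, pmax)
    hidden = [logsig(n) for n in affine(W1, pn, B1)]  # log-sigmoid hidden layer
    tn = affine(W2, hidden, B2)                       # linear output layer, Eq. (10)
    return denormalize(tn, tmin, tmax)

# Hypothetical network: 2 inputs, 2 hidden nodes, 1 output (made-up numbers).
W1 = [[0.5, -0.3], [0.1, 0.8]]
B1 = [0.0, 0.0]
W2 = [[1.0, -1.0]]
B2 = [0.0]
t = predict([5.0, 20.0], W1, B1, W2, B2,
            pmin=[0.0, 10.0], pmax=[10.0, 30.0], tmin=[100.0], tmax=[300.0])
print(t)
```

In this made-up case the input lies at the midpoint of each range, so the normalized input is zero, both hidden nodes output 0.5, the normalized output is zero, and the de-normalized prediction falls at the midpoint of the target range.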

Example of calculating pile resistance using an ANN model (Park and Cho, 2010)
The proposed neural network model has seven nodes in the input layer, eight nodes in the hidden layer, and three nodes in the output layer (Fig. 8). In this study, the soil types near the tip and shaft of the pile were classified as shown in Table 2. The weight matrices and bias vectors used in the ANN model are summarized in Table 3.
The normalized input vector $pn$ can be calculated using Eq. (8); the $\min p$ and $\max p$ vectors are given in Table 4.

An artificial neural network (ANN), also called an artificial neural system, is an information processing technique developed to simulate the functions of the human brain. Although ANN is an effective algorithm for solving complex engineering problems, only a few approaches are available for designing the network, and most of them rely on iterative procedures. The design of the network architecture mainly consists of choosing the network layers, the number of neurons in each layer, the transfer functions between layers, and an appropriate training algorithm. In particular, some input variables may not carry important information for defining the relationship between the input and the output. These can be ignored for the sake of solution convergence and efficiency, even if this sometimes comes at the cost of losing some input information. The result is a smaller network model, which may be more desirable in terms of computational resource requirements and generalization capability. Therefore, the present study applies GA to select only the effective inputs of the network, in order to decrease the time required to design a smaller network and to reduce the computational complexity of the problem. GA is used to find the best combination of effective input parameters and so provide a solution with less computational effort.
To make an ANN more efficient, its computational complexity should be reduced. The computational complexity of a network is generally governed by the number of neurons in each layer, and the network performs poorly as the model becomes larger and more complex. Although the design methodology for the structure of an ANN was described in chapter three, the structure still has to be designed by a trial-and-error approach that is run repeatedly to find the network architecture. There is no general framework for the selection of the optimum ANN architecture and its parameters.

Genetic Algorithm (GA) is a very effective approach for solving problems, from a wide range of applications, that are difficult to solve with traditional techniques. GA works by repeatedly modifying a population of artificial structures through the application of genetic operators (Goldberg, 1989). There have been a large number of applications of GA to NN, especially as a search engine for evaluating the weights and the architecture and for improving the convergence speed of the network. Yu and Liang (2001) presented a hybrid approach involving ANN and GA to solve the job-shop scheduling problem. The hybrid approach, which combines ANN's computability with GA's searching efficiency, is strong enough to deal with complex scheduling problems.

Park & Kim (2011) proposed a hybrid design method based on ANN and GA. In their approach, a trained NN was employed to model the complex relationships among the parameters related to geotechnical problems, whereas GA was applied to determine the optimal architecture of the NN, including the input parameters, the number of hidden layers, the number of neurons in each layer, and the combination of transfer functions between layers. The hybrid approach consists of two units: an NN prediction unit and a GA optimization unit. As shown in Fig. 10, their procedure can be summarized as follows:

1. An initial population, which contains a number of sets including information about the structure of the ANN, is randomly generated. The individuals stored in it are then fed into the NN-based prediction unit.
2. The predicted quality measures, which are related to the objective function, are used to indicate the fitness of the individuals. The fitness of each individual is evaluated according to rank-based fitness.
3. Based on the fitness, individuals are selected and placed in the mating pool according to rank-based fitness assignment and stochastic universal sampling.
4. Crossover and mutation are applied to the current population to create new individuals.
5. A number of new random individuals are inserted, randomly replacing old individuals in the current population, while making sure that the inserted individuals do not replace the best individual in the population.
6. The fitness of each individual is evaluated.
7. Steps 3-6 are called a generation, and they are repeated until a certain stop criterion is met. Typical stop criteria in a genetic algorithm run include a predefined maximum number of generations or an error smaller than a predefined value. In our genetic algorithm, the maximum number of generations is used.
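The selection part of the procedure above (rank-based fitness assignment followed by stochastic universal sampling) can be sketched in Python as follows. This is only an illustration under stated assumptions: linear ranking with a selective pressure of 2 is one common choice and is not confirmed by the source, and the function names and objective values are hypothetical. A lower objective value is treated as better.

```python
import random

def rank_based_fitness(obj_values, pressure=2.0):
    """Linear ranking (assumed form): the worst individual gets 2 - pressure,
    the best gets `pressure`, independent of the raw objective values."""
    n = len(obj_values)
    worst_first = sorted(range(n), key=lambda i: obj_values[i], reverse=True)
    fitness = [0.0] * n
    for rank, idx in enumerate(worst_first):  # rank 0 is the worst individual
        fitness[idx] = 2.0 - pressure + 2.0 * (pressure - 1.0) * rank / (n - 1)
    return fitness

def stochastic_universal_sampling(fitness, n_select, rng):
    """Select n_select indices with equally spaced pointers and one random offset."""
    step = sum(fitness) / n_select
    start = rng.uniform(0.0, step)
    selected, i, cumulative = [], 0, fitness[0]
    for k in range(n_select):
        pointer = start + k * step
        # Advance to the individual whose fitness interval contains the pointer.
        while cumulative <= pointer and i < len(fitness) - 1:
            i += 1
            cumulative += fitness[i]
        selected.append(i)
    return selected

# Hypothetical objective values for three individuals (lower is better).
obj_values = [0.3, 0.1, 0.2]
fitness = rank_based_fitness(obj_values)
mating_pool = stochastic_universal_sampling(fitness, 3, random.Random(0))
print(fitness, mating_pool)
```

Because the pointers are equally spaced, each individual is selected a number of times close to its expected share, which gives less sampling noise than repeated roulette-wheel draws.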

Creation of initial population
The hybrid ANN-GA approach starts with the generation of an initial population, which contains a predefined number of chromosomes (strings). Each chromosome is a binary string that encodes the design information of the ANN's structure. For example, for the design conditions given in Table 5, a chromosome is presented in Fig. 11.

Parameters                                              Values
Total number of input variables, $N_{ini}$              7
Maximum number of hidden layers, $N_{HL}$               2
Maximum node number in hidden layer, $N_{HN}$           15
Transfer functions which can be used between layers     linear function, sigmoid function, tangent-sigmoid function
Table 5. An example of design information to determine the structure of ANN

Chromosome: 1 1 1 1 1 1 0 0 0 1 0 0 1 0 1 0 1
Segments: Input layer, Hidden layer, Transfer function
Node number of input layer: $N_{in} = 6$
Number of hidden layers: $N_{hl} = 1$ (a code of 0 gives $N_{hl} = 1$ and a code of 1 gives $N_{hl} = 2$)
Number of nodes of hidden layer: $N_{hn} = 2^3 \times 0 + 2^2 \times 1 + 2^1 \times 0 + 2^0 \times 1 = 5$
Information of transfer function: the combination of transfer functions is determined using five binary digits
Fig. 11. Design information about the structure of ANN included in chromosome (Park & Kim, 2011)

This chromosome is composed of seventeen binary digits. The first seven digits encode the selection of input parameters, with a code of 0 indicating that a variable is not used and a code of 1 indicating that it is used. Since there are seven input variables, the seven digits in this chromosome indicate that the first six inputs are kept and the last input is removed. One hidden layer was selected and five nodes were applied to the hidden layer. The information about the transfer functions is included in the remaining five binary digits. A population of q individuals is created as a set of q such chromosomes.
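The decoding described for Fig. 11 can be sketched as follows. The segment layout (seven input digits, one hidden-layer digit, four node digits, then the transfer-function digits) is an assumption inferred from the description above, the example chromosome is a hypothetical one chosen to match the worked values ($N_{in} = 6$, $N_{hl} = 1$, $N_{hn} = 5$), and the transfer-function digits are returned undecoded because their mapping is not given here.

```python
import random

N_INPUTS = 7   # total number of candidate input variables (Table 5)
NODE_BITS = 4  # four binary digits cover up to 15 hidden nodes

def decode(chromosome):
    """Split a binary chromosome into ANN design information (assumed layout)."""
    input_mask = chromosome[:N_INPUTS]
    selected_inputs = [i for i, bit in enumerate(input_mask) if bit == 1]
    n_hidden_layers = chromosome[N_INPUTS] + 1  # code 0 -> 1 layer, code 1 -> 2 layers
    node_bits = chromosome[N_INPUTS + 1:N_INPUTS + 1 + NODE_BITS]
    n_hidden_nodes = int("".join(str(b) for b in node_bits), 2)
    transfer_bits = chromosome[N_INPUTS + 1 + NODE_BITS:]  # left undecoded
    return selected_inputs, n_hidden_layers, n_hidden_nodes, transfer_bits

def random_population(q, length, rng):
    """Create q random binary chromosomes of the given length."""
    return [[rng.randint(0, 1) for _ in range(length)] for _ in range(q)]

# Hypothetical chromosome consistent with the worked values in the text.
example = [1, 1, 1, 1, 1, 1, 0,   # inputs 1-6 kept, input 7 removed
           0,                     # one hidden layer
           0, 1, 0, 1,            # 0*8 + 1*4 + 0*2 + 1*1 = 5 hidden nodes
           1, 0, 1, 0, 1]         # transfer-function digits
inputs, n_layers, n_nodes, transfer = decode(example)
population = random_population(q=10, length=len(example), rng=random.Random(0))
print(inputs, n_layers, n_nodes, transfer)
```

A practical implementation would also need a rule for invalid decodings, such as a node count of zero or no selected inputs; that rule is not described in the text and is left out of this sketch.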

Genetic operation
GA is an optimization procedure that operates on sets of design variables. Each set is called a string, and it defines a potential solution. Each string consists of a series of characters representing the values of the discrete design variables for a particular solution. The fitness of each string is a measure of the performance of the design variables as defined by the objective function.
In its simplest form, a genetic algorithm consists of three operations: (1) reproduction, (2) crossover, and (3) mutation (Goldberg, 1989). Each of these operations is described below. The reproduction operation is the basic engine of Darwinian natural selection by survival of the fittest. The reproduction process allows the information stored in strings with good fitness values to survive into the next generation. The next generation of offspring strings is developed from selected pairs of parent strings, to which explorative operators such as crossover and mutation are applied. Crossover is a procedure in which a selected parent string is broken into segments, some of which are exchanged with the corresponding segments of another parent string. In this manner, the crossover operation creates variation in the solution population by producing new solution strings that consist of parts taken from the selected parent strings. The mutation operation is introduced as an insurance policy to enforce diversity in a population. It introduces random changes into the solution population, exploring the possibility of creating features that are absent from both parent strings and passing them to the offspring. Without an operator of this type, some possibly important regions of the search space may never be explored.
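The crossover and mutation operations described above can be illustrated with a short Python sketch on binary strings. Single-point crossover and bit-flip mutation are used here as the simplest common variants; the source does not state which variants were used, so this choice and the function names are assumptions.

```python
import random

def single_point_crossover(parent_a, parent_b, rng):
    """Cut both parents at one random point and exchange the tails."""
    point = rng.randint(1, len(parent_a) - 1)  # cut strictly inside the string
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def mutate(chromosome, p_mut, rng):
    """Flip each binary digit independently with probability p_mut."""
    return [1 - bit if rng.random() < p_mut else bit for bit in chromosome]

rng = random.Random(0)
parent_a = [0] * 8
parent_b = [1] * 8
child_a, child_b = single_point_crossover(parent_a, parent_b, rng)
mutant = mutate(child_a, p_mut=0.005, rng=rng)
print(child_a, child_b, mutant)
```

With an all-zero and an all-one parent, every position of the two children carries one digit from each parent, which makes the exchange of segments easy to see.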

Definition of objective function
The objective function for each individual is computed by Eq. 12. The objective function of the $i$-th individual, $ObjV(i)$, is composed of the error function, $E_i$, calculated as the difference between measured values and predicted values, and the penalty function, $P_i$, calculated on the basis of the complexity of the structure of the ANN. A more complex ANN structure increases the probability that the value of the error function will decrease, but generality is more likely to decrease due to overfitting. Therefore, the penalty function, $P_i$, is included in the objective function to control the decrease of generality.
In Eq. 12, $\alpha$ = 0.01; $N_{mea}$ = the total number of measured data; $T_{max}$ = the maximum value among the measured values; $T_k$ = $k$-th measured value; $t_k$ = $k$-th predicted value; $N_i^n$ = total number of nodes used in the $i$-th chromosome; $N_{max}$ = the maximum number of nodes that can be applied to the structure of the ANN in this study; $CW_i$ = total number of connections used in the $i$-th chromosome; and $CW_{max}$ = the maximum number of connections that can be applied to the structure of the ANN in this study.
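To show how an error term and a complexity penalty can be combined, the sketch below uses one plausible form of such an objective. The exact expression of Eq. 12 is not reproduced in this text, so the specific form used here (a root-mean-square error normalized by the maximum measured value, plus a penalty weighted by α on the node and connection ratios) is an assumption, as are the function name and the sample numbers.

```python
import math

ALPHA = 0.01  # penalty weight stated in the text

def objective(measured, predicted, n_nodes, n_nodes_max, n_conn, n_conn_max,
              alpha=ALPHA):
    """Assumed form: normalized RMS error plus an alpha-weighted complexity penalty."""
    t_max = max(measured)
    mse = sum((T - t) ** 2 for T, t in zip(measured, predicted)) / len(measured)
    error = math.sqrt(mse) / t_max                                   # E_i (assumed form)
    penalty = alpha * (n_nodes / n_nodes_max + n_conn / n_conn_max)  # P_i (assumed form)
    return error + penalty

# Hypothetical comparison: same predictions, small network versus large network.
measured = [10.0, 20.0, 40.0]
predicted = [10.0, 20.0, 40.0]
small = objective(measured, predicted, n_nodes=8, n_nodes_max=16, n_conn=20, n_conn_max=80)
large = objective(measured, predicted, n_nodes=16, n_nodes_max=16, n_conn=80, n_conn_max=80)
print(small, large)
```

With identical prediction errors, the smaller network receives the lower objective value, which is the behaviour the penalty function is meant to produce.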

Example analysis
The developed methodology was evaluated through its application to a geotechnical problem to which ANN had previously been applied. The optimal ANN model obtained through the optimization process based on the developed GA-NN method was compared with an ANN model obtained on the basis of the researchers' experience. Rahman et al. (2001) developed an ANN model to predict the uplift capacity of suction caissons, which are frequently used for the anchorage of large compliant offshore structures. The uplift capacity of the suction caissons is a critical issue in these applications. The developed neural network model has five nodes in the input layer, ten nodes in the hidden layer, and one node in the output layer. The five input parameters to the neural network model are the aspect ratio of the caisson ($L/d$), the undrained shear strength of the clay soil in which the caisson is installed ($s_u$), the relative depth of the lug to which the caisson force is applied ($D/L$), the angle that the chain force makes with the horizontal ($\theta$), and the loading rate defined with respect to the soil permeability ($T_k$). The transfer functions applied to the hidden layer and output layer neurons are tan-sigmoid and log-sigmoid functions, respectively. Design information for the application of the GA-NN method is given in Table 6. The optimal structure of the ANN model obtained through the optimization process using the developed method is given in Table 7. Three input variables, $D/L$, $T_k$, and $\theta$, were removed through the optimization based on the GA-NN method. The optimized number of hidden nodes was smaller than that of Rahman et al. (2001)'s model. The transfer functions of the hidden layer and output layer were obtained as tan-sigmoid and linear functions, respectively.

Group            Parameters                                                            Values
GA parameters    Number of initial population, $N_{ind}$                               400
GA parameters    Number of maximum generation, MAXGEN                                  40
GA parameters    Number of selected individuals for genetic process, $N_{sel}$         400×0.9 = 360
GA parameters    Probability of mutation, $P_{mut}$                                    0.005
NN parameters    Maximum number of input nodes, $IL_{max}$                             11
NN parameters    Maximum number of hidden layers, $HL_{max}$                           2
NN parameters    Maximum node number in each hidden layer, $NH_{max}$                  16
Table 6. Design condition for application of the developed GA-NN method

* I-H means the transfer function connecting the input layer to the hidden layer, and H-O means the transfer function connecting the hidden layer to the output layer. Tansig and logsig mean the tangent-sigmoid and log-sigmoid functions, respectively.

In Fig. 14, the predicted uplift capacity of the ANN model obtained by the GA-NN method is compared with that of Rahman et al. (2001)'s ANN model. Even though three input variables were omitted from the prediction and the number of hidden nodes was also decreased, the model gave almost the same correlation in the training and testing stages. This means that the three input variables omitted from the input layer did not affect the output value, the uplift capacity, in the data sets given by Rahman et al. (2001).

In Fig. 15, the values of the correlation coefficient, $R^2$, are shown for varying numbers of hidden nodes and transfer functions in the ANN model obtained by the GA-NN method. $R^2$ increased with the number of hidden nodes and then converged once the number exceeded about seven nodes. In the objective function (Eq. 12), even though the value of the error function no longer decreases, the value of the complexity function continues to increase as hidden nodes are added beyond seven. This implies that seven hidden nodes give the minimum value of the objective function compared with other numbers of hidden nodes.

Park & Kim (2011) suggested a hybrid NN/GA approach that is able to design the optimal structure of an ANN. The proposed approach combines the characteristics of GA and NN to overcome the shortcomings of NN structure design.
The results of the proposed approach show that GA may enable researchers to use NN more effectively, as an efficient tool for the solution of complex problems, and that it reduces the risk of over-designing the network architecture. The results of the example showed that the performance of NN can be secured with GA by selecting the optimal combination of input variables, number of hidden layers, number of nodes in each hidden layer, and transfer functions between layers. GA reduces the complexity and over-design of the network structure, as it helps to design a smaller network architecture. The processing time of the hybrid NN/GA for grouping parts can be decreased to nearly half that of the preliminary NN-based approach. In summary, GA makes NN an effective and efficient technique for computationally complex problems, since it simultaneously reduces the computational complexity and enhances the prediction performance.

Fig. 15. The values of correlation coefficient with varying design parameters of the ANN model obtained by the GA-NN method (Park & Kim, 2011)

Generalization of Neural Network using committee methodology

Generalizability of Neural Network
Over-training is the most serious problem in neural network training. A network can quickly become over-trained, which means that the network error is driven to a small value for the training samples but becomes large when new input is presented. This indicates that the network has memorized the training samples but is not able to generalize and give reasonable answers for unseen combinations of input parameters. In this section, we focus on this problem, which is typical for neural networks: their generalization capability. Generalization is the ability to train with one data set and then successfully classify independent test sets.
Although continued training will increase the training set accuracy, the danger exists that the test set accuracy decreases after a certain point. Approaches considered for overcoming the over-fitting problem include early stopping, Bayesian regularization, and others (Hirschen & Schäfer, 2006).

One approach is to use early stopping, in which the algorithm that minimizes the error function is prevented from running to convergence by stopping it at some point. In early stopping, the available data is divided into a training, a validation and a test subset. The training set is used for training the network and updating the network weights. The validation subset is not used for training, yet the performance function indicates how the trained network responds to these samples. The validation error will normally decrease during the initial phase of training, as does the training set error. When the network begins to overfit the data, the error on the validation set will typically begin to increase. The test set is not used during training, but is utilized to compare different networks. If the response on the test set is too weak, one may decide to restart the network training with a different division of the data sets.

The second approach is Bayesian regularization (MacKay, 1992a). This approach minimizes the over-fitting problem by taking into account the goodness-of-fit as well as the network architecture. Typically, training aims to reduce the sum of squared errors, $F = E_D$. Regularization adds an additional term, so that the objective function becomes $F = \alpha \cdot E_D + \beta \cdot E_W$, where $E_W$ is the sum of squares of the network weights, and $\alpha$ and $\beta$ are objective function parameters. The relative size of the objective function parameters dictates the emphasis for training. If $\alpha \gg \beta$, the training algorithm will drive the errors smaller. If $\alpha \ll \beta$, training will emphasize weight size reduction at the expense of network errors, thus producing a smoother network response (Foresee & Hagan, 1997).

Single multilayer perceptrons (MLPs), consisting of an input layer, a hidden layer and an output layer, trained by a back-propagation algorithm (e.g. Levenberg-Marquardt, see Hagan, Demuth & Beale 1996, pp. 12-19), have been the conventional method of choice for most practical applications over the last decade. However, a single MLP, when repeatedly trained on the same patterns, tends to reach a different minimum of the objective function each time and hence gives a different set of neuron weights, because the solution is not unique for noisy data, as in most geotechnical problems. Therefore, a common approach is to train many nets and then select the one that yields the best generalization performance. Nevertheless, selecting the single best neural network is likely to result in a loss of information. While one network reproduces the main patterns, the others may provide the details lost by the first. The aim should be to exploit, rather than lose, the information contained in a set of imperfect generalizers. This is the motivation for the committee neural network approach, in which a number of individually trained networks are combined to improve accuracy and increase robustness. Reddy & Buch (2003), Das et al. (2001), Gopinath & Reddy (2000), and Reddy et al. (1995) developed the concept of committee neural networks, in which a large number of networks are trained. Based on initial testing with data obtained from subjects not used in training, a few networks are recruited into a committee. A final evaluation of the committee is conducted with data obtained from subjects not used in training or in initial testing.
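Returning to the two over-fitting remedies described in this section, the following Python sketch illustrates the regularized objective $F = \alpha \cdot E_D + \beta \cdot E_W$ and a simple early-stopping rule applied to a recorded sequence of validation errors. The `patience` parameter (the number of epochs without improvement tolerated before stopping), the function names and the sample numbers are assumptions introduced for illustration and are not taken from the source.

```python
def regularized_objective(errors, weights, alpha, beta):
    """F = alpha * E_D + beta * E_W, with E_D and E_W computed as sums of squares."""
    e_d = sum(e * e for e in errors)   # sum of squared network errors
    e_w = sum(w * w for w in weights)  # sum of squared network weights
    return alpha * e_d + beta * e_w

def early_stopping_epoch(validation_errors, patience=3):
    """Return the epoch with the lowest validation error seen before training
    is stopped after `patience` consecutive epochs without improvement."""
    best_epoch, best_error, waited = 0, validation_errors[0], 0
    for epoch, err in enumerate(validation_errors[1:], start=1):
        if err < best_error:
            best_epoch, best_error, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation error keeps rising: stop training
    return best_epoch

# Hypothetical validation-error history: falls at first, then rises as over-fitting sets in.
history = [1.0, 0.6, 0.4, 0.45, 0.5, 0.55, 0.3]
stop_at = early_stopping_epoch(history, patience=3)
f_value = regularized_objective(errors=[1.0, 2.0], weights=[3.0], alpha=1.0, beta=0.5)
print(stop_at, f_value)
```

In practice the weights saved at the returned epoch would be restored. Note that the rule stops before the last entry of this made-up history is ever seen, which shows that early stopping trades a possible later improvement for protection against over-fitting.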

Overview of Committee Neural Network (CNN)
The committee technique for neural networks has been used for engineering problems (Reddy & Buch, 2003; Das et al., 2001; Gopinath & Reddy, 2000; Reddy et al., 1995). It was observed that the committee provided good estimates by averaging the results of the individual networks in the committee when the individual errors are uncorrelated. In the committee technique, multiple neural networks (Fig. 16) are constructed, and each individual neural network is trained independently with different initial synaptic weights using the training patterns $TP_i$, where $TP_i$ denotes the training patterns for the $i$-th network, and $x_i$ and $t_i$ are the input vector and target vector for the $i$-th network, respectively.

Fig. 16. Illustration of committee of networks (Kim & Park, 2011)

In Fig. 16, $y_i$ is the output vector calculated from the $i$-th network. A mapping function $f_i(x_i)$ is determined from the $i$-th network based on the training patterns $TP_i$, and the error of this function, $e_i$, is calculated with respect to the desired function for the $i$-th network, which is represented as $d_i(x_i) = E[t_i \mid x_i]$. The desired function for the committee of networks is determined in terms of $X = \{(x_1, x_2, \ldots, x_N)\}$ and $T = \{(t_1, t_2, \ldots, t_N)\}$. The committee mapping function is represented as a combination of the individual mapping functions in which $\alpha_i$ is a weighting factor for the $i$-th network and $\sum \alpha_i = 1$. Therefore, the committee output can be calculated as in Eq. (17), where $C_{ij}$ is a correlation matrix defined as $C_{ij} = E[e_i e_j]$.

The local minima encountered in determining the synaptic weights of a single MLP, and the non-uniqueness of the solution due to noise and a limited number of measurements, may be resolved by employing the committee technique, which is a statistical approach that averages the outputs in the functional space.
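A short worked case may help show why averaging helps when the individual errors are uncorrelated. The following is a standard textbook result rather than a derivation taken from the source. It assumes that the committee error, written here as $E_c$ (a symbol introduced only for this illustration), is the expected squared value of the weighted sum of the individual errors $e_i$:

$$E_c = E\left[ \left( \sum_{i=1}^{N} \alpha_i e_i \right)^2 \right] = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j C_{ij}, \qquad C_{ij} = E[e_i e_j]$$

If the errors are uncorrelated, so that $C_{ij} = 0$ for $i \neq j$, and equal weights $\alpha_i = 1/N$ are used, this reduces to

$$E_c = \frac{1}{N^2} \sum_{i=1}^{N} C_{ii} = \frac{1}{N} \left( \frac{1}{N} \sum_{i=1}^{N} E[e_i^2] \right)$$

Under these assumptions the committee error is the average error of the individual networks divided by $N$. Real networks trained on the same patterns usually have correlated errors, so the reduction obtained in practice is smaller than this idealized factor.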
Kim and Park (2010) examined the feasibility of committee neural network theory for improving the accuracy and consistency of a neural network model for the estimation of preconsolidation pressure from field piezocone measurements. The validity of the committee technique was also examined through comparison with a single NN model, an empirical model and a theoretical model. The case records from Chen (1994) were evaluated using the neural network. A total of 119 case records were used for the training phase and 28 (randomly selected) for the testing phase. The proposed neural network model has four nodes in the input layer, seven nodes in the hidden layer, and one node in the output layer. In the input layer, the total and effective overburden pressures $\sigma_{vo}$ and $\sigma'_{vo}$, the cone tip resistance $q_T$, and the pore pressure measured behind the cone tip $u_2$ were selected as input variables.

Case study for CNN
In their study, twenty single neural networks were trained from different initial weights and biases but with the same training patterns. Fig. 17(a) and (b) show the coefficients of determination between the measured and predicted preconsolidation pressure using the piezocone test results from each of the 20 single NNs for the training data and testing data, respectively. As shown in Fig. 17(a), the coefficients of determination for the training data from each NN model show very similar accuracy, i.e., the coefficients of determination $R^2$ are approximately 0.93. However, the prediction results for the testing data from each NN model are not as accurate as those for the training data. They fluctuate significantly, ranging from 0.84 to 0.94, even though the models have the same structural characteristics. Therefore, if a single NN is to be used, the best model must be selected, namely the one that gives the highest coefficient of determination among the various models, e.g., the second NN among the 20 neural networks, which gives coefficients of determination of 0.93 and 0.94 in the training and testing phases, respectively. However, in reality, it is quite difficult to choose the best model among a number of candidate NNs.

(a) training stage (b) testing stage
Fig. 17. Prediction performance of 20 MLPs which are optimized with different initial weights and biases by trial-and-error method (Kim & Park, 2010)

Several committees were constructed from the 20 NNs by changing the accumulated number $n$ of NNs in the committee, with equal weighting factors ($\alpha_i = 1/n$). The prediction results of each committee are plotted in Fig. 18(a) and 18(b) with respect to the accumulated number of NNs for the training data and testing data, respectively. As can be seen in Fig. 18(a), the coefficients of determination of the committee neural network still increase with the number of accumulated NNs in the committee for the training data. Furthermore, as shown in Fig. 18(b) for the testing data, even though the $R^2$ value of each single NN model shows severe variation, the $R^2$ values of the CNNs do not show such dramatic variation once two NN models have been accumulated in the committee. From these figures, it can be concluded that any single NN model still cannot avoid variation in the prediction due to its dependency on the initial weights and biases. However, such variation can be eliminated by connecting those NNs with an appropriate weighting factor $\alpha_i$ as a committee neural network. Besides, by introducing the committee methodology, the conventional trial-and-error method for optimizing the structure of a neural network can be used without any consideration of initial weight dependency and structural optimization.

The authors observed that a committee neural network system is able to provide improved performance compared with a single optimal neural network. The committee technique has been found to be a very effective technique for improving the accuracy of the estimation of the preconsolidation pressure $\sigma'_p$. The performance of NN has suffered from variation in the prediction of the target value due to the localization of the weights and biases during the optimization process on the structure. To overcome such problems of the single NN, structural optimization was carefully carried out in this study by the trial-and-error method. Nevertheless, a single MLP, even with a successfully optimized structure, still cannot avoid large variation in the prediction of preconsolidation pressure due to its initial weight dependency. Therefore, CNN is introduced to overcome the initial weight dependency of the single neural network model. Various committees of the single MLP were tested. It was found that if 8 single NNs, which have the same structure but have been trained with different initial weights and biases, are accumulated in the committee with the same weighting factor $\alpha_i$, the variation in the prediction of the preconsolidation pressure from the piezocone test results can be simply and successfully eliminated. A comparison of the prediction results of CNN with the theoretical and empirical methods shows that CNN is significantly more precise and consistent than conventional statistical and theoretical methods.

(a) training stage (b) testing stage
Fig. 18. Improvement of estimation accuracy by accumulating the optimized single NNs in the committee (Kim & Park, 2010)
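The accumulation experiment described above, in which committees of increasing size $n$ are formed with equal weights $\alpha_i = 1/n$ and evaluated by $R^2$, can be sketched as follows. The member predictions are made-up numbers, not results from Kim & Park (2010), and $R^2$ is computed here as one minus the ratio of the residual to the total sum of squares, which is an assumption about the definition used in that study.

```python
def r_squared(measured, predicted):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

def committee_prediction(member_predictions):
    """Equal-weight committee output: average over members (alpha_i = 1/n)."""
    n = len(member_predictions)
    return [sum(column) / n for column in zip(*member_predictions)]

def cumulative_committee_r2(measured, member_predictions):
    """R^2 of the committee formed by the first n members, for n = 1, 2, ..."""
    return [r_squared(measured, committee_prediction(member_predictions[:n]))
            for n in range(1, len(member_predictions) + 1)]

# Hypothetical data: two networks whose errors have opposite signs.
measured = [1.0, 2.0, 3.0]
members = [[1.5, 2.5, 3.5],   # network 1 over-predicts
           [0.5, 1.5, 2.5]]   # network 2 under-predicts
scores = cumulative_committee_r2(measured, members)
print(scores)
```

In this deliberately idealized case the two members' errors cancel exactly, so the two-member committee fits the measured values perfectly. With real networks the cancellation is only partial, which is consistent with the gradual stabilization of $R^2$ reported for Fig. 18.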

Conclusions
Artificial neural networks (ANNs) have been applied to various problems in geotechnical engineering. These include dams, earth retaining structures, environmental geotechnics, ground anchors, liquefaction, pile foundations, shallow foundations, slope stability, soil properties and behavior, site characterization, tunnels, underground openings, and other areas. In mathematical modeling of problems in these areas, the lack of understanding of complicated physical behavior is usually compensated for by either over-simplifying the problem or incorporating several assumptions into the model. Consequently, many mathematical models are apt to fail to simulate the complex behavior of geotechnical problems. In contrast, the ANN methodology is based on the data alone: the model is trained on data sets to find the relationship between input and output values. There is no need to simplify the problem or to incorporate any assumptions. As geotechnical materials exhibit extreme variability, ANNs are particularly amenable to modelling the complex behaviour of these materials and have generally demonstrated superior predictive performance when compared with traditional methods. In science and engineering problems, there is still no clear procedure for designing the NN architecture. This often leads to over-designed or inefficient network structures, especially in the case of complex problems. Although considerable research has been conducted on NN and GA applications, their use in optimal NN design is quite recent. Nevertheless, GA makes NN an effective and efficient technique for computationally complex problems, since it reduces the computational complexity and enhances the search performance.
In the training of an ANN model, over-fitting or poor generalization capability occurs frequently when a neural network over-learns during the training period. As a result, such an over-trained model may not perform well on unseen data sets due to its lack of generalization capability. Several approaches have been suggested in the literature to overcome this problem. The authors introduced the feasibility of committee neural network theory for improving the accuracy and consistency of the neural network model on geotechnical problems.