Inter-Comparison of an Evolutionary Programming Model of Suspended Sediment Time-Series with Other Local Models

interaction, and a time series of a single variable can carry the information about the dynamics of the entire multi-variable system. The trajectories of the phase-space diagram describe the evolution of the system from some initial state, and hence represent the history of the system. This paper applies chaos theory to analyse the suspended sediment load of the Mississippi River data in a similar fashion to the other modelling strategies described above. It uses the local prediction method for training and testing, as outlined below, but it is a traditional practice to apply several methods to build evidence for the existence of chaotic signals in a particular data. These techniques employ the delay-embedding parameters of τ and m , which are unknown a-priori. The following methods are used in this chapter:


Introduction
The experience of applying evolutionary computing to time series describing local physical problems has benefited the modelling culture by showing that many different mathematical formulae can be produced to describe the same problem. This experience brings into the focus the roles of pluralism in the modelling culture as opposed to searching for the best model, where physical problems provide relevance and context to the choice of modelling techniques. Both of these roles are often overlooked and do not directly influence research agenda. Although the focus of this paper is on evolutionary computing, it also promotes a pluralistic modelling culture by studying other modelling techniques, as well as by keeping the role of physical problems in the foreground.
Estimating suspended sediment loads is a problem of practical importance and includes such problems as changing courses in rivers, loss of fertile soil, filling reservoirs and impacts on water quality. The study of these problems in the short-run are referred to as sediment transport and erosion for those in the long-run. Past empirical capabilities remain invaluable but are not sufficient on their own as management and engineering solutions often require an insight into the problem. Empirical knowledge has been incorporated into the body of distributed modelling techniques giving rise to sophisticated modelling software tools but their applications require a great deal of resources. There remains a category of problems, often referred to as time series analysis, which uses the sequences of time variations and predicts the future values. This category of models provides useful information to management of local problems. For instance, such models may be used to schedule dredging requirements or other maintenance activities. Time series analysis is developing into local management tools and it is a focus of this chapter.
The aim of this chapter is to predict suspended sediment load of a river into the future. Besides the traditional empirical Sediment Rating Curve (SRC), there are several strategies for analysing such time series and evolutionary computing is one of Artificial Intelligence (AI) approaches, which broadly include capabilities for searching and recognising patterns among others. This chapter also employs Artificial Neural Network (ANN), which is another AI approach. Yet another strategy is to regard time series as outcomes of many random drivers and this assumption is supported by a whole body of probabilistic approaches, where this chapter uses Multi-Linear Regression (MLR) analysis to model the same data. Over the past few decades, research has increasingly focused on the application of deterministic chaos (or chaos theory or dynamic systems) showing that many of apparently randomly varying system behaviours can be explained by deterministic chaos. The concept behind this modelling strategy is that the particular data can largely be explained by deterministic behaviour, where in time the system evolves asymptotically towards an attractor. Its random-looking variations are assumed to be an internal feature of the system and depending on its initial conditions, its state under a certain range may become highly erratic but with a predictable behaviour. Evidently, none of these strategies are identical and different models rarely produce identical results. This chapter therefore compares the performance of these modelling strategies for solving an engineering problem.
The study employs 26 years of the Mississippi River data recorded at Tarbert + RR Landings and involve both flows and suspended sediment load. The river discharges about 200 million metric tons of suspended sediment per year to the Gulf of Mexico, where it ranks about sixth in the world today.

Literature review
Sediment Rating Curve (SRC) is an empirical approach used by practitioners in the engineering studies of sediment and erosion problems. The log linear rating curve method has been used widely and Sivakumar and Wallender (2005) outline the many flaws associated with this technique, including the lack of fit due to missing variables (e.g. Miller, 1951), retransformation bias (e.g. Ferguson, 1986), and non-normality of the error distribution (e.g. Thomas, 1988). According to Sivakumar and Wallender (2005), the technique has been modified including, among others, use of separate curves for different seasons (Miller, 1951), stratifying the data according to the magnitude of flow and applying a separate curve for each stratum (Glysson, 1987), and use of a single multivariate model instead of multiple rating curves (Cohn et al., 1992). Sivakumar and Wallender (2005) argue that there is not a simple (and universal) 'water discharge-suspended sediment concentration-suspended sediment load' relationship. A brief overview of past studies is as follows. Kisi, et al (2008) review the application of ANN and neuro-fuzzy techniques to time series analysis of sediment loads at various timescales, uncertainty in the data. Variations of these techniques have also been reported by Jain (2001), Tayfur (2002), Cigizoglu (2004), Kisi (2004), Raghuwanshi et al. (2006), Cigizoglu & Kisi (2006). Other studies on the application of ANN to suspended sediment include that by Wang et al (2008), who applied ANN to derive the coefficients of regression analysis for their SRC model. Aytek and Kishi (2008) used the GP approach to model suspended sediment for two stations on the Tongue River in Montana, USA, and indicate that the GP formulation performs quite well compared to sediment rating curves and multi linear regression models.
Chaotic signals have also been identified in time series of suspended sediment loads by Jayawardena (2002, 2003), Farmer and Sidorowich, 1987). The outcomes revealed the usefulness of these methods towards an effective prediction capability.
Overall, a general understanding of the analysis of suspended sediment load is yet to emerge and one way to gain an insight into the problem is to carry out inter-comparison studies of the performance of a host of models applied to diversity of rivers of different shapes and sizes.

Understanding the problem
Sediment transport is concerned with entrained soil materials carried in water by erosion on the catchment and within channels. Sediment particles are categorised as follows (i) the saltation load (not discussed here); (ii) bedload (not discussed here) and (iii) suspended load including clay (< 62μm in particle diameter), silt and sand. Suspended load (both as "finegrained sediment" and "wash load") is directly a result of the turbulence in water and forms a large proportion of the transported load, where the turbulence is a measure of the energy in the water to carry the load.
Sediment discharge is a measure of the mass rate of sediment transport at any point in space and time and determines whether the load is being transported or deposited. The whole process comprises soil erosion, sediment transport and sediment yield, where the deposited load delivered to a point in the catchment is referred to as sediment yield and is expressed as tons per unit area of the basin per year, measured at a point. Estimation of sediment yield (and soil erosion) is essential for management but these and mathematical models are used to gain an insight into the underlying processes. Sediment yield is estimated by (i) direct measurement, (ii) using local time series models to predict future states; (iii) using mathematical models to study jointly both erosion and sediment processes.
Suspended sediment forms most of the transported load and can be affected by many parameters including rainfall, land use pattern, slope, soil characteristics, e.g. soil moisture content but their considerations lead to distributed models, which are complex. Recorded suspended sediment derives distributed models by serving them as boundary conditions or input sources but their inherent information is not tapped on. There is a case for local models to study the information contained in recorded sediment loads alone in terms of flow and sediment hydrographs. This chapter is concerned with the study of the suspended load of a river, as discussed below.

Study area
The flow-sediment time series data of a Mississippi river Station is used in the study, the location of which is shown in Figure 1. The gauge is situated at Tarbert + RR Landings, LA (USGS Station no. 07373291, latitude 30°57′40″, longitude 91°39′52″) and it is operated by the US Geological Survey (USGS) -the location map is shown in Figure 1. The Mississippi River discharges an average of about 200 million metric tons of suspended sediment per year to the Gulf of Mexico and to the ocean.

Review of data records
Daily suspended sediment measurements for the above station have been made available by the USGS from April 1949. The data used herein span over a period of about 26 years (amounting to 9496 datapoints) starting on October 1, 1949. Figures 2 show the variation of the daily suspended sediment and stream flow series observed at the above station.
Of the 26 water-years of the data sample of daily records of flow and suspended sediment (9496 datapoints), the first 25 water years of data (9131 datapoints) were used to train the models and the remaining 365 datapoints of daily records were used for testing. The statistical parameters of stream flow and sediment concentration data are shown in Table 1. These results show that the overall contribution of the datapoints in the test period is average; its individual characteristics in terms of kurtosis show the annual hydrographs to be less peaked and more flat but at the same time, the suspended sediment load during the year is significantly high. Thus, the minimum values during this year were significantly above the average but persistent and though less dynamic.

Overview of the models
The sediment rating curve method is the traditional method for converting measured flows to predict suspended sediment load and this paper aims to test the performance of evolutionary computing models but uses a host of other techniques for the inter-comparison purpose. These models are outlined in this section but evolutionary computing is explained in more detail. Their underlying notion is that past values contain a sufficient amount of information to predict the future values and a systematic way of representing this notion is purported in Table 2 in terms of a selection of models. These models, in essence, are reminiscent of regression analysis but GEP, ANN and MLR models approach the problem in their own individual ways to unearth the structure of the information inherent in time series. Notably, the SRC model is expressed by Model 1 and the deterministic chaos model is expressed by Model 0. These models will all be evaluated by using coefficient of Correlation (CC), Relative Absolute Errors (RAE) and Root Mean Square Errors (RMSE).

Input variables Output variables The Structure
Model

Sediment rating curve
Sediment rating-curve is a flux-averaged relationship between suspended sediment, S, and water discharge, Q, expressed as a power law in the form of: = b S aQ , where a and b are coefficients. Values of a and b for a particular stream are determined from data via a linear regression between (log S) and (log Q). The SRC model is represented in terms of Model 1 in Table 2. For more critical views on this model, references may be made Kisi (2005) and Walling (1977), among others.

Evolutionary computing
Evolutionary computing techniques apply optimisation algorithms as a tool to facilitate the mimicking of natural selection. A building block approach to generalised evolution driven by natural selection is yet to be presented, although Khatibi (2011) has outlined a rationale for it. Traditional understanding of natural selection for biological species is well developed, according to which the process takes place at the gene level of all individuals of all species carrying hereditary material for reproduction by inheriting from their parents and by passing on a range of their characteristics to their offspring. The process of reproduction is never a perfect copying process, as mutation may occur from time to time in biological reproductions involving the random process of reshuffling the genes during sexual reproduction. The paper assumes preliminary knowledge on genes, chromosomes, gene pool, DNA and RNA, where the environment also has a role to play. The environment for the production of proteins and sexual reproduction is different than the outer environment for the performance of the individual entities supported by the proteins or produced by sexual reproduction. The outer environment is characterised by (i) being limited in resources, (ii) having no foresight, (iii) organisms tend to produce more offspring than can be supported, a process that is driven by positive feedback loops, and (iv) there is a process of competition and selection. Some of these details are normally overlooked or simplified in evolutionary computing and therefore the paper stresses the point that natural selection takes place at the gene level and this is not directly applicable to that at the social level.
Facts on natural selection are overwhelming but there are myths as well, e.g. the myth of "the survival of the fittest" and this term is widely used in evolutionary computing. Although the fittest has a selective advantage to survive, this is not a guarantee for the survival in the natural world. An overview of the dynamics of natural selection in an environment is that (i) the environment can only support a maximum population of certain size, but there is also a lower size at the critical mass below which a population is at risk of losing its viability; (ii) there is a process of reproduction, during which natural selection operates at the gene level, although there are further processes operating at the individual levels beyond the direct reach of natural selection (e.g. interactions among the individuals catered for by other mechanisms or each individual is under selection pressure by the environment); (iii) the process of reproduction is associated with mutation, which gives rise to the production of gene pools.
A great deal of the above overview has been adopted in evolutionary computing, the history of which goes back to the 1960s when Rechenberg (1973) introduced evolution strategies. The variants of this approach include genetic algorithm (Holland, 1975), evolutionary programming (Fogel et al, 1966), genetic programming (Koza, 1992) and Gene Expression Programming (GEP), Ferreira (2001a). This paper uses the latter approach, which in a simple term is a variation of GP but each of these techniques have differences with one another. These techniques have the capability for deriving a set of mathematical expressions to describe the relationship between the independent and dependent variables using such functions as mutation, recombination (or crossover) and evolution.
This chapter is concerned with GEP and one of the important preliminary decisions in its implementations is to establish the models represented in Table 2 (Models 2-6). There is no prior knowledge of the appropriateness of any of these models and therefore this is normally fixed in a preliminary modelling task through a trial-and-error procedure. Whichever the model choice (Model 2 -Model 6 or similar other ones), each implementation of GEP builds up the model in terms of the values of the coefficients (referred to as terminals) and the operations (functions) through the procedure broadly outlined in Figure 3. The working of a gene expression program depicted in Figure 3 is outlined as follows. A chromosome in GEP is composed of genes and each gene is composed of (i) terminals and (ii) functions. The gene structures and chromosomes in GEP are illustrated for the solution that is obtained for the dataset used in this study (see Section 4.2). The terminals as their names suggest are composed of constants and variables and the functions comprise mathematical operations, as shown by (4.a)-(4.f). As the term terminal suggests, it comprises a set of values at the tail-ends of the genes of the chromosomes and these are made meaningful by the functions making up the other component of the genes of the chromosomes. In GEP, these are represented by a bilingual notation called Karva language of (i) genetic codes, which are not deemed necessary for a description here and (ii) expression trees (or parse trees), as illustrated by Figure 4.a and the recommended solution is shown in Figure 4.b, which is transcribed by Equation (4) in Section 4.2. The initial chromosomes of the initial population are no different than the solution shown in Figure 4.b but their difference is that the composition of each of the initial chromosomes is selected often in random and then GEP is expected to improve them through evolution by the strategy of selections, replication and mutation but there are other facilities that not mentioned facilitating a more robust solution and these include inversion, transposition and recombination. The improvements are carried out through selection from one generation to another and this is why this modelling strategy is called evolutionary computation. The main strength of this approach is that it does not set up any system of equation to predict the future but it evaluates the fitness of each chromosome and selects from those a new population with better performance traits.
The GEP employed in this study is based on evolving computer programs of different sizes and shapes encoded in linear chromosomes of fixed lengths, Ferreira, 2001a;Ferreira, (2001b). The chromosomes are composed of multiple genes, each gene encoding a smaller subprogram. Furthermore, the structural and functional organisation of the linear chromosomes allows the unconstrained operation of important genetic functions, such as mutation, transposition and recombination. It has been reported that GEP is 100-10,000 times more efficient than GP systems (Ferreira, 2001a;Ferreira, 2001b) for a number of reasons, including: (i) the chromosomes are simple entities: linear, compact, relatively small, easy to manipulate genetically (replicate, mutate, recombine, etc); (ii) the parse trees or expression trees are exclusively the expression of their respective chromosomes; they are entities upon which selection acts, and according to fitness, they are selected to reproduce with modification.

Artificial Neural Networks (ANNs)
Whilst evolutionary programming emulates the working of Nature, ANNs emulate the workings of neurons in the brain. Both the brain and ANNs are parallel information processing systems consisting of a set of neurons or nodes arranged in layers but this is where the parallel ends. The actual process of information processing in the brain is a topical research issue but the drivers of ANNs are polynomial algebra and there is no evidence that the brains of humans, monkeys or any other animals employ algebraic computations such as optimisation methods. Although there is a great incentive to understand the working of the brain, it is not imperative to be constrained by it and the use of algebra in ANNs is not criticised here but awareness is raised as these two processes are not identical.
The ANN theory has been described in many books, including the text by Rumelhart et al. (1986). The application of ANNs has been the subject of a large number of papers that have appeared in the recent literature. There are various implementations of ANNs but the type used in this study is a Multi-Layer feedforward Perceptron (MLP) trained with the use of back propagation learning algorithm with the following functions: (i) the input layer accepts the data, (ii) intermediate layer processes them, and (iii) the output layer displays the resultant outputs. The number of hidden layers is decided in a preliminary modelling process by finding the most efficient number of layers through a trial-and-error procedure. Each neuron in a layer is connected to all the neurons of the next layer, and the neurons in one layer are not connected among themselves. All the nodes within a layer act synchronously.
This study implements the ANN models in terms of Models 1-6 of Table 2 and Figure 5 shows one of the implementation selected. For each of these models, the data passing through the connections from one neuron to another are multiplied by weights that control the strength of a passing signal. When these weights are modified, the data transferred through the network changes; consequently, the network output also changes. The signal emanating from the output node(s) is the network's solution to the input problem.
In the back-propagation algorithm, a set of inputs and outputs is selected from the training set and the network calculates the output based on the inputs. This output is subtracted from the actual output to find the output-layer error. The error is back propagated through the network, and the weights are suitably adjusted. This process continues for the number of prescribed sweeps or until a prescribed error tolerance is reached. The mean square error over the training samples is the typical objective function to be minimized. After training is complete, the ANN performance is validated. Depending on the outcome, either the ANN has to be retrained or it can be implemented for its intended use.

Multi Linear Regression (MLR)
An overview of the data presented in Figure 2 invokes the thought that other than the annual trend within the data, the underlying process is probably random and a more rational way of explaining the data would be through probabilistic approaches. One such method applied to the selected data is the Multi-Linear Regression (MLR) model. It fits a linear combination of the components of a multiple signals x (e.g. recorded flows and suspended sediment timeseries as defined by the Models 2-6 in Table 2) to a single output signal y, as defined by (1.a) (e.g. predicted suspended sediment load) and returns the residual, r, i.e. the difference signal, as defined by (1.b): Where i x is defined in Table 2 in terms of various models and i a values are called regression coefficients, which are estimated by using the least square or any other similar method. In this study, the coefficients of the regressions were determined using the least square method.

Chaos theory
A cursory view of the suspended sediment record of the Mississippi River in Figure 2 provides no clue to a strategy for its underlying patterns, if any, although annual trend superimposed on random variation may be an immediate reaction of a hydrologist. Another strategy to explore such possible patterns is through the application of chaos theory, more specifically through the "phase-space diagram" as shown in Figure 6 for this river data. A point in the phase-space represents the state of the system at a given time. The narrow dark band in the figure signifies strong determinism but its scattered band signifies the presence of noise and therefore there is a possibility to explain this set of data by chaos theory. The dark band signifies convergence of the trajectories of the phase-space with a fractal dimension towards the attractor of the data, where the dynamics of the system can be reduced to a set of deterministic laws to enable the prediction of its future states. Chaos theory is a method of nonlinear time series analysis and involves a host of methods, essentially based on the phase-space reconstruction of a process, from scalar or multivariate measurements of physical observables. This method is implemented in terms of Model 0 of Table 2. It is largely based on the representation of the underlying dynamics through reconstruction of phase-space, originally given by Takens, 1981. It is implemented in terms of two parameters of delay time and embedding dimension, according to which given a set of physical variables and an analytical model describing their interactions, the dynamics of the system can be represented geometrically by a single point moving on a trajectory, where each of its points corresponds to a state of the system. The phase-space diagram is essentially a co-ordinate system, whose coordinates represent the variables necessary to completely describe the state of the system at any moment.
One difficulty in its construction is that in most practical situations, information on every variable influencing the system may not be available. However, a time series of a single variable may be available, which may allow the construction of a (pseudo) phase-space. The idea behind such a reconstruction is that a non-linear system is characterized by self-interaction, and a time series of a single variable can carry the information about the dynamics of the entire multi-variable system. The trajectories of the phase-space diagram describe the evolution of the system from some initial state, and hence represent the history of the system. This paper applies chaos theory to analyse the suspended sediment load of the Mississippi River data in a similar fashion to the other modelling strategies described above. It uses the local prediction method for training and testing, as outlined below, but it is a traditional practice to apply several methods to build evidence for the existence of chaotic signals in a particular data. These techniques employ the delay-embedding parameters of τ and m, which are unknown a-priori. The following methods are used in this chapter: 1. Average Mutual Information (AMI) is used to estimateτ ; and the minimization of the False Nearest Neighbours to do that of the optimal values for the embedding dimension, m. AMI (Fraser and Swinney, 1986) defines how the measurements ( ) X t at time t are related, from an information theoretic point of view, to measurements τ . The average mutual information is defined as: where i is total number of samples. ( ( )) P X i and τ + ( ( )) P X i are marginal probabilities for measurements ( ) X i and is their joint probability. The optimal delay time τ minimises the function τ adds the maximum information on ( ) X i . The False Nearest Neighbours procedure (Kennel et al., 1992) is a method to obtain the optimum embedding dimension for phase-space reconstruction. By checking the neighbourhood of points embedded in projection manifolds of increasing dimension, the algorithm eliminates 'false Neighbours': This means that points apparently lying close together due to projection are separated in higher embedding dimensions. when the ratio between the number of false neighbours at the dimension m +1 and m is below a given threshold, generally smaller than 5%, each > + ' 1 m m is an optimal embedding. A poor reconstruction of few embedding states with several components is obtained if ' m is too large and the next analyses should not be performed.

Correlation Dimension (CD) method: is a nonlinear measure of the correlation
between pairs lying on the attractor. For time series whose underlying dynamics is chaotic, the correlation dimension gets a finite fractional value, whereas for stochastic systems it is infinite. For an m -dimensional phase-space, the correlation function ( ) m C r is defined as the fraction of states closer than r (Grassberger and Procaccia, 1983;Theiler, 1986): where H is the Heaviside step function,  i Y is the th i state vector, and N is the number of points on the reconstructed attractor. The number w is called Theiler window and it is the correction needed to avoid spurious results due to temporal correlations instead of dynamical ones. For stochastic time series ∝ ( ) m m C r r holds, whereas for chaotic time series the correlation function scales with r as: where D2, correlation exponent, quantifies the degrees of freedom of the process, and defined by: and can be reliably estimated as the slope in the ln ( ) m C r vs. ln( ) r plot.

Local Prediction Model:
The author's implementation of the local prediction method for deterministic chaos is details in Khatibi et al (2011) but the overview is that a correct phase-space reconstruction in a dimension m facilitates an interpretation of the underlying dynamics in the form of an m-dimensional map, T f ,according to where j Y and + j T Y are vectors of dimension m, describing the state of the system at times j (i.e. the current state) and + j T (i.e. the future state), respectively. The problem then is to find an appropriate expression for T f (i.e. T F ). Local approximation entails the subdivision of the T f domain into many subsets (neighbourhoods), each of which identifies some approximations T F ,valid only in that same subset. In other words, the dynamics of the system is described step-by-step locally in the phase-space. In this mdimensional space, prediction is performed by estimating the change of i X with time, which are observed values of discrete scalar timeseries, with delay coordinates in the mdimensional phase space. The relation between the points t X and (the behaviour at a future time p on the attractor) is approximated by function F as: In this prediction method, the change of t X with time on the attractor is assumed the same as those of nearby points, is determined by the d th order polynomial ( ) t F X as follows (Itoh, 1995): and A is the × + ( )! ! ! n m d m d Jacobian matrix which in its explicit form is: In order to obtain a stable solution, the number of rows in the Jacobian matrix A must satisfy: As stated by Porporato and Ridolfi (1997), even though F-values are first degree polynomials, the prediction is nonlinear, because during the prediction procedure every point ( ) x t belongs to a different neighbourhood and is therefore defined by different expressions for f (Koçak,1997).

Performance of sediment rating curve
The SRC model was implemented by using a simple least squares method leading to S =13.2Q 1.14 (3) The performance of this model is summarised in Table 3 and shown in Figure 7. Evidently, its performance is poor and the concern raised in the literature on this model is confirmed. This is a sufficient justification to search for reliable models.

Implementation of GEP
The preliminary investigation for the construction of a relationship between flows and suspended sediment in GEP requires: (i) the setting of the functions, as discussed below; (ii) the fitness function; and (iii) a range of other parameters, but the default values, given in Table 11, were sufficient in this study. The following functions were investigated: The performance of each function was investigated in terms of CC, MAE, and RMSE and the results are shown in Table 4.a for the training periods. The results show that (i) the model performances are more sensitive to the choice of independent variables than the function choices; (ii) the models not including suspended sediment time series perform poorly; and (iii) the model performance is not overly sensitive to the choice of the function. Appendix I, The performance of the GEP model is presented in Table 4.b, according to which there is not much to differences between performances of a number of the alternative models but (4.e) is selected in this study for the prediction purposes (its expression tree is given in Figure 4) and given below.

Implementation of ANN
ANN implements another AI approach to the data represented in Figure 2 by another strategy, as described in Section 3.4.3. A preliminary investigation was carried out to make decisions on the choice of the models given in Table 2 (Models 1-6) and the ANN structure in terms of the neuron structure of the various layers. Table 5 presents model structures investigated. The preliminary modelling task also included a normalisation function for the data. In this study, MATLAB was employed to develop the ANN model and its mapstd function was selected for the normalisation (further defaults values are given in Table 11). The investigated ANN model structures are defined in Table 5 and their results for both the training and testing periods are presented in Table 6.

Model6
Qt,Qt-1,Qt-2,St-1 5-12-1 5-12-1 The performances of Models 1-6 are shown in Table 6 in terms of the values of three statistical indices of CC, MAE and RMSE. The performance of different models in terms of CC is remarkably high but Model 4 (Qt , Qt-1 , St-1) produce less deviations, which is selected for the final run.

Implementation of the MLR model
The MLR modelling strategy was implemented using Mathematica to derive regression coefficients for both periods of model fitting (training in the AI terminology) and testing using different statistical indices (CC, MAE and RMSE) given in Table 7, which shows that Model 2 (Qt , St-1) performs relatively better than the others. The regression equation suggested by this technique is given by:

Implementation of the deterministic chaos model
A visual assessment for the existence of chaotic behaviour in the suspended sediment time series was presented in Figure 9, although it was not conclusive evidence but just invoked the possibility of the existence of a low-dimensional chaos. Traditionally, several techniques are employed to show the existence of low-dimensional chaos and below the results of the determination of the dimensions of the phase-state diagram are given: 1. Using the AMI method, the delay time, is estimated for the data as the intercept with the x-axis of the curves by plotting the values of the AMI evaluated by the TISEAN package (Hegger et al., 1999) against delay times progressively increased from 1 to 100. The value of delay time is calculated as the first (local) minimum in the variation of AMI against varying delay time from 1 to 100 day. The results are shown in Figure (9.a), signifying a well-defined first minimum at delay time of 94 day. The delay time is then used in the determination of the sufficient embedding dimension using the percentage of false nearest neighbours for the time series. Figure (9.b) shows the results of the false nearest neighbours method for embedding dimension m, by allowing it to vary from 1 to 40 and hence its value is 28. 2. The presence of chaotic signals in the data is further confirmed by the correlation dimension method. Figure (10.a) shows the relationship between correlation function C(r) and radius r (i.e. lnC(r) versus ln(r)) for increasing m, whereas Figure ( 3. Local prediction algorithm is used to predict suspended sediment time series. The procedure involves varying the value of the embedding dimension in a range, say 3-8, and estimating the CC and RMSE. The embedding function with the highest coefficient of correlation is selected as the solution. These are given in Table 8 for Mississippi River basin for the dataset with daily time interval, as well as a selection of other time steps. It shows that the best predictions are achieved when the embedding dimension is m=3 produce the best results.    Table 9 summarises the performance and main features of each and all of the modelling strategies. The results presented so far confirms the experience that the traditional SRC model performs poorly and may only be used for rough-and-ready assessments. However, the results by the GEP model show that considerable improvements are likely by using it. This section also analyses the relative performance of the various modelling strategies. An overall visual comparison of all the results is presented in Figure 11, according to which GEP, ANN, MLR and local prediction models perform remarkably well and similar to one another.    The relative performances of GEP, ANN, MLR and local prediction models are not still visible from Figure 12 and therefore attention is focused on the differences between the GEP and ANN models with respect to their corresponding observed values. Figures 13 shows the respective results for both the GEP and ANN models and that of ANN is remarkable, as the differences are nearly zero. It may be reported that those of local prediction model and MLR are very close to that of GEP.  The chapter presents the performance of the GEP model, as a variation of evolutionary programming, to forecast suspended sediment load of the Mississippi River, the USA. GEP is just a modelling strategy, where any other relevant strategy is just as valid if its performance is satisfactory. The overall results show that the information contained in the observed data can be treated by the following modelling strategies:

Model Performance Model Structure Outcome Comments
1. Evolutionary computing: this produced a formula to forecast the future values in terms of recorded values of flows and suspended sediment. The results show that the strategy can be successful in identifying a number of different formulae. 2. Emulation of the working of the brain: this successfully fitted an inbuilt polynomial to the data. It performs better than the other tested models but is not readily transferrable as it resides in particular software applications. 3. Regression analysis: this produced a regression equation, according to which the future values would regress towards average recorded values, in spite of the presence of noise. 4. Deterministic chaos: this produced future values of suspended sediment load by identifying an attractor towards which the system performance would converge even when the internal system behaves erratically.
The only common feature in the above modelling strategies is their use of optimisation techniques. Otherwise, they are greatly different from one another but remarkably, they produce models fit for purpose and can explain the data. Undoubtedly, the data can be explained by many more sets of equations or by other possible strategies. This emphasises that models are just tools and the modelling task is to test the performance of the various models to add confidence to the results. Yet the poor performance of the traditional SRC underlines the fact that a good performance cannot be taken for granted.
A review of the data (in Section 3) shows that the overall contribution of the datapoints in the test period is average; its individual characteristics in terms of kurtosis shows that the annual hydrographs are less peaked and more flat but at the same time the suspended sediment load during the year was significantly high. Thus, the minimum values during this year were significantly above the average but persistent and though less dynamic. However, all the four modelling strategies coped well with these data peculiarities. If the data during the test period have a more pronounced feature not very common during the training period, the various local modelling strategies are likely to perform poorly in their own unique way and one of the greatest tasks of research in modelling should be investigations to understand these unique features and not to sweep them under the carpet.
A general view projected by the investigation in this chapter is that the performance of modelling techniques must not be the only basis of practical applications. Equal attention must also be paid to the quality of the data used. If the data suffers from inherent uncertainties, no good model will compensate for the inherent shortfalls.

Conclusion
This chapter presents an investigation of the performance of the Gene Expression Programming (GEP) models of suspended sediment load of the Mississippi River, the USA. The study employs the Mississippi River data spanning 26 years involving both flows and suspended sediment load, of which the first 25 years of the data is used for training and the remaining for the prediction of one year into the future. This investigation concurs with the past findings that the performance of sediment rating curve, an empirical technique used widely in practice, can lead to gross errors. This alone underlines the value of other modelling techniques capable of producing reliable results with less than 1% of errors.
The chapter promotes a pluralist culture of modelling and although presents the GEP model as the focus, it also presents the application of other techniques to model the same data. The other models comprise: artificial neural networks, multi-linear regression analysis and deterministic chaos. The chapter outlines the modelling strategy underlying each of these techniques and the results show in spite of their differences they produce similar results inflicting less 1% of errors. The lowest errors are associated with the artificial neural networks for this set of data but each of these techniques should be considered as reliable. The volume of sediment load is an important management parameter and the error associated with each model was estimated for each model. The results show that the traditional SRC model suffers from gross errors by as much as 50% but the other tested models perform well, among which the error in the performance of ANN is the lowest. ANN is noted for its good performance but with some drawback that these models are not transferrable, like the GEP models. It is noted that the implementation of ANN requires an ANN-platform for further modelling and deterministic chaos models require considerable expertise.