Open access peer-reviewed chapter

# Morphological-Rank-Linear Models for Financial Time Series Forecasting

By Ricardo de A. Araújo, Gláucio G. de M. Melo, Adriano L. I. de Oliveira and Sergio C. B. Soares

Published: February 1st 2010

DOI: 10.5772/8048

## 1. Introduction

Financial time series forecasting is considered a rather difficult problem, due to the many complex features frequently present in such series (irregularities, volatility, trends and noise). Several approaches have been studied for the development of predictive models able to forecast a time series based on its past and present data.

In the attempt to solve the time series prediction problem, a wide number of linear statistical models were proposed. Among them, the popular linear statistical approach based on Auto Regressive Integrated Moving Average (ARIMA) models [1] is one of the most common choices. However, since the ARIMA models are linear and most real world applications involve nonlinear problems, this can introduce an accuracy limitation of the generated forecasting models.

In the attempt to overcome linear statistical models limitations, other nonlinear statistical approaches have been developed, such as the bilinear models [2], the threshold autoregressive models [3], the exponential autoregressive models [4], the general state dependent models [5], amongst others. The drawbacks of those nonlinear statistical models are the high mathematical complexity associated with them (resulting in many situations in similar performances to the linear models) and the need, most of the time, of a problem dependent specialist to validate the predictions generated by the model, limiting the development of an automatic forecast system [6].

Alternatively, approaches based on Artificial Neural Networks (ANNs) have been applied to the nonlinear modeling of time series in the last two decades [7-14]. However, in order to define a solution to a given problem, ANNs require the setting up of a series of system parameters, some of which are not always easy to determine. The ANN topology, the number of processing units and the algorithm for ANN training (and its corresponding variables) are just some of the parameters that require definition. In addition to those, in the particular case of time series forecasting, another crucial element to determine is the set of relevant time lags used to represent the series [15]. In this context, evolutionary approaches for the definition of neural network parameters have produced interesting results [16-20]. Some of these works have focused on the evolution of the network weights, whereas others aimed at evolving the network architecture.

In this context, a relevant work was presented by Ferreira [15], consisting of the Time-delay Added Evolutionary Forecasting (TAEF) method definition, which performs a search for the minimum number of necessary dimensions (the past values of the series) to determine the characteristic phase space of the time series. The TAEF method [15] finds the most fitted predictor model for representing a time series, and then performs a behavioral statistical test in order to adjust time phase distortions that may appear in the representation of some series.

Nonlinear filters, on the other hand, have been widely applied to signal processing. An important class of nonlinear systems is based on the framework of Mathematical Morphology (MM) [21, 22]. Many works have focused on the design of morphological systems [21, 23-28]. An interesting work was presented by Salembier [29, 30], which designed Morphological/Rank (MR) filters via gradient-based adaptive optimization. Also, Pessoa and Maragos [31] proposed a new hybrid filter, referred to as Morphological/Rank/Linear (MRL) filter, which consists of a linear combination of an MR filter [29, 30] and a linear Finite Impulse Response (FIR) filter [31]. In the branch of the filters and Artificial Intelligence integration, Pessoa and Maragos [32] also proposed a neural network architecture involving MRL operators at every processing node.

In the morphological systems context, another work was presented by Araújo et al. [33, 34]. It consists of an evolutionary morphological approach for time series prediction, which provides a mechanism to design a predictive model based on increasing and non-increasing translation invariant morphological operators and according to Matheron decomposition [35] and Banon and Barrera decomposition [36].

This work proposes the Morphological-Rank-Linear Time-lag Added Evolutionary Forecasting (MRLTAEF) method in order to overcome the random walk dilemma for financial time series prediction. The method performs an evolutionary search for the minimum dimension needed to determine the characteristic phase space that generates the financial time series phenomenon. The proposed MRLTAEF method is inspired by Takens' Theorem [37] and consists of an intelligent hybrid model composed of an MRL filter [31] combined with a Modified Genetic Algorithm (MGA) [16], which searches for the particular time lags capable of a fine tuned characterization of the time series and estimates the initial (sub-optimal) parameters of the MRL filter. Each individual of the MGA population is trained by the averaged Least Mean Squares (LMS) algorithm to further improve the MRL filter parameters supplied by the MGA. After training the model, the MRLTAEF method chooses the most tuned predictive model for the time series representation, and performs a behavioral statistical test [15] and a phase fix procedure [15] to adjust time phase distortions observed in financial time series.

Furthermore, an experimental analysis is conducted with the proposed MRLTAEF method using six real world financial time series. Five well-known performance metrics are used to assess the performance of the proposed method and the obtained results are compared with the previously presented methods in literature.

This work is organized as follows. In Section 2, the fundamentals and theoretical concepts necessary for the comprehension of the proposed method are presented, such as the time series prediction problem, the random walk dilemma for financial time series prediction, linear and nonlinear statistical models, neural network models, genetic algorithms (standard and modified), intelligent hybrid models (in particular the TAEF method). Section 3 shows the fundamentals and theoretical concepts of mathematical morphology and the MRL filter definition and its training algorithm. Section 4 describes the proposed MRLTAEF method. Section 5 presents the performance metrics which are used to assess the performance of the proposed method. Section 6 shows the simulations and the experimental results attained by the MRLTAEF method using six real world financial time series, as well as a comparison between the results achieved here and those given by standard MLP networks, MRL filters and the TAEF method [15]. Section 7 presents, to conclude, the final remarks of this work.

## 2. Fundamentals

In this section, the fundamentals and theoretical concepts necessary for the comprehension of the proposed method will be presented.

### 2.1. Time series forecasting problem

A time series is a sequence of observations of a given phenomenon, taken in discrete or continuous time. In this work, all time series are considered discrete in time and equally spaced.

Usually, a time series can be defined by

$$X_t = \{x_t \mid t = 1, 2, \ldots, N\} \tag{1}$$

where $t$ is the temporal index and $N$ is the number of observations. The term $X_t$ will be seen as a set of temporal observations of a given phenomenon, orderly sequenced and equally spaced.

The aim of prediction techniques applied to a given time series ($X_t$) is to provide a mechanism that allows, with a certain accuracy, the prediction of the future values of $X_t$, given by $X_{t+k}$, $k = 1, 2, \ldots$, where $k$ represents the prediction horizon. These techniques try to identify regular patterns present in the data set, creating a model capable of generating the next temporal patterns. In this context, a highly relevant factor for accurate prediction is the correct choice of the past window, or time lags, used to represent a given time series.

Box & Jenkins [1] showed that when there is a clear linear relationship among the historical data of a given time series, the auto-correlation and partial auto-correlation functions are capable of identifying the relevant time lags to represent the series, and this procedure is usually applied in linear models. However, for a real world time series, or more specifically a complex time series with all its dependencies on exogenous and uncontrollable variables, the relationship among the historical data is generally nonlinear, which makes the Box & Jenkins analysis of the time lags only a crude estimate.

In a mathematical sense, such a relationship involving time series historical data defines a $d$-dimensional phase space, where $d$ is the minimum dimension capable of representing such relationship. Therefore, a $d$-dimensional phase space can be built so that it is possible to unfold its corresponding time series. Takens [37] proved that if $d$ is sufficiently large, such phase space is homeomorphic to the phase space that generated the time series. Takens' Theorem [37] is the theoretical justification that it is possible to build a state space using the correct time lags, and if this space is correctly rebuilt, Takens' Theorem [37] also guarantees that the dynamics of this space is topologically identical to the dynamics of the real system state space.

The main problem in reconstructing the original state space is naturally the correct choice of the variable d, or more specifically, the correct choice of the important time lags necessary for the characterization of the system dynamics. Many proposed methods can be found in the literature for the definition of the lags [38-40]. Such methods are based on measures of conditional probabilities, which consider,

$$X_t = f(x_{t-1}, x_{t-2}, \ldots, x_{t-d}) + r_t \tag{2}$$

where $f(x_{t-1}, x_{t-2}, \ldots, x_{t-d})$ is a possible mapping of the past values to the facts of the future (where $x_{t-1}$ is the lag 1, $x_{t-2}$ is the lag 2, …, $x_{t-d}$ is the lag $d$) and $r_t$ is a noise term.

However, in general, these tests found in the literature are based on the primary dependence among the variables and do not consider any possible induced dependencies. For example, if

$$f(x_{t-1}) = f(f(x_{t-2})) \tag{3}$$

it is said that $x_{t-1}$ is the primary dependence, and the dependence induced on $x_{t-2}$ is not considered (any variable that is not a primary dependence is denoted as irrelevant).

The method proposed in this work, conversely, does not make any prior assumption about the dependencies between the variables. In other words, it does not discard any possible correlation that can exist among the time series parameters, even higher order correlations, since it carries out an iterative automatic search for solving the problem of finding the relevant time lags.
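To make the role of the time lags concrete, the following minimal sketch builds input/target patterns from a series for a chosen set of lags, in the spirit of Equation 2. The function name `build_lagged_patterns` and the example lags are illustrative assumptions, not part of the original method.

```python
def build_lagged_patterns(series, lags):
    """Pair each target x_t with the past values x_{t-lag}, one per chosen lag.

    `lags` is a list of positive integers (an illustrative choice here,
    not a set of lags selected by any search procedure).
    """
    max_lag = max(lags)
    inputs, targets = [], []
    for t in range(max_lag, len(series)):
        inputs.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return inputs, targets


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Use lag 1 and lag 3 only; lag 2 is deliberately left out.
inputs, targets = build_lagged_patterns(series, lags=[1, 3])
```

Each row of `inputs` is one point of the reconstructed phase space, so choosing a different set of lags changes both the dimension and the shape of that space.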

### 2.2. The random walk dilemma

A naive prediction strategy is to define the last observation of a time series as the best prediction of its next future value ($X_{t+1} = X_t$). This kind of model is known as the Random Walk (RW) model [41], which is defined by

$$X_t = X_{t-1} + r_t \tag{4}$$

or

$$\Delta X_t = X_t - X_{t-1} = r_t \tag{5}$$

where $X_t$ is the current observation, $X_{t-1}$ is the observation immediately before $X_t$, and $r_t$ is a noise term with a Gaussian distribution of zero mean and standard deviation $\sigma$ ($r_t \approx N(0, \sigma)$). In other words, the rate of time series change ($\Delta X_t$) is white noise.

The model above clearly implies that, as the information set consists of past time series data, the future data is unpredictable. On average, the value $X_{t-1}$ is indeed the best prediction of the value $X_t$. This behavior is common in the finance market and in economic theory, and it is the so-called random walk dilemma or random walk hypothesis [41].

The computational cost of time series forecasting using the random walk model is extremely low. Therefore, any prediction method more costly than a random walk model should perform substantially better than it; otherwise its use is not interesting in practice.

However, if the time series phenomenon is driven by a law with strong similarity to a random walk model, any model applied to this phenomenon will tend to have the same performance as a random walk model.
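As an illustration of why the naive predictor is hard to beat on such series, the sketch below simulates Equation 4 and compares the mean squared error of the naive forecast with that of a running-mean forecast. The function names, the seed, the series length and the choice of the running mean as a comparison baseline are assumptions made only for this example.

```python
import random


def simulate_random_walk(n, sigma=1.0, start=0.0, seed=0):
    """Generate X_t = X_{t-1} + r_t with Gaussian noise r_t of std sigma."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(n - 1):
        walk.append(walk[-1] + rng.gauss(0.0, sigma))
    return walk


def naive_forecast_mse(series):
    """MSE of predicting X_t by the previous observation X_{t-1}."""
    errors = [(series[t] - series[t - 1]) ** 2 for t in range(1, len(series))]
    return sum(errors) / len(errors)


def mean_forecast_mse(series):
    """MSE of predicting X_t by the mean of all earlier observations."""
    total, errors = series[0], []
    for t in range(1, len(series)):
        errors.append((series[t] - total / t) ** 2)
        total += series[t]
    return sum(errors) / len(errors)


walk = simulate_random_walk(2000, sigma=1.0)
naive_mse = naive_forecast_mse(walk)   # close to sigma ** 2
mean_mse = mean_forecast_mse(walk)     # grows with the length of the walk
```

For a pure random walk, the naive error is just the noise variance, which no model based only on past values can reduce on average.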

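Assuming that an accurate prediction model is used to build an estimated value of $X_t$, denoted by $\hat{X}_t$, the expected value ($E[\cdot]$) of the difference between $\hat{X}_t$ and $X_t$ must tend to zero,

$$E[\hat{X}_t - X_t] \to 0 \tag{6}$$

If the time series generator phenomenon is supposed to have a strong random walk linear component and a very weak nonlinear component (denoted by $g(t)$), and assuming that $E[r_t] = 0$ and $E[r_t r_k] = 0$ ($\forall\, k \neq t$), the expected value of the difference between $\hat{X}_t$ and $X_t$ will be

$$
\begin{aligned}
E[\hat{X}_t - (X_{t-1} + g(t) + r_t)] &\to 0 \\
E[\hat{X}_t] - E[X_{t-1}] - E[g(t)] - E[r_t] &\to 0 \\
E[\hat{X}_t] - E[X_{t-1}] - E[g(t)] &\to 0 \\
E[\hat{X}_t] &\to E[X_{t-1}] + E[g(t)]
\end{aligned} \tag{7}
$$

But $E[X_{t-1}] \gg E[g(t)]$, then $E[X_{t-1}] + E[g(t)] \cong E[X_{t-1}]$ and

$$E[\hat{X}_t] \cong E[X_{t-1}] \tag{8}$$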

Therefore, under these conditions, escaping the random walk dilemma is a hard task. Indications of this behavior (a strong linear random walk component and a weak nonlinear component) can be observed in time series lagplot graphics, for example in lagplots where strong linear structures are dominant with respect to nonlinear structures [42], as generally observed in financial and economic time series.

### 2.3. Linear statistical models

The time series prediction process consists of representing the time series features through a model able to extend such features to the future. According to Box & Jenkins [15], classical statistical models were developed to represent the following kinds of information patterns: constants, trends and seasonalities. However, some variations occur in such patterns, such as irregularities, volatility and noise, among others.

In this way, it is possible to verify that the statistical models are based on transcendental or algebraic time functions, which can be represented by

$$X_t = b_1 f_1(t) + b_2 f_2(t) + \ldots + b_k f_k(t) + r_t \tag{9}$$

where $b_i$ and $f_i(t)$ ($i = 1, 2, \ldots, k$) denote, respectively, the constant parameters and mathematical functions of $t$. The term $r_t$ represents a random component or noise.

However, there are other ways of modeling a time series, in which the series is modeled as a function of a temporally ordered random component, from the present to the past ($r_t, r_{t-1}, r_{t-2}, \ldots$). This kind of representation is known as "linear filter models", and it is widely applied when the time series observations are highly correlated [1]. In this way, the series (denoted here by $Z_t$) can be defined by

$$Z_t = m + y_0 r_t + y_1 r_{t-1} + y_2 r_{t-2} + \ldots + y_k r_{t-k} \tag{10}$$

where $m$ and $y_i$ ($i = 0, 1, \ldots, k$) are constants.

Therefore, the time series prediction process consists of an accurate parameters estimation of the prediction models to build the future behavior of a given phenomenon.

Box & Jenkins Models. In the literature, several models have been proposed to solve the time series prediction problem. Among these models, a wide number are linear: Simple Moving Averages (SMA) [43, 44], Simple Exponential Smoothing (SES) [43, 44], Brown's Linear Exponential Smoothing (BLES) [43, 44], Holt's Bi-parametric Exponential Smoothing (HBES) [43, 44] and Adaptive Filtering (AF) [43, 44] are some examples. Among these linear models, however, the Box & Jenkins [1] models deserve special mention, given that, in practice, they are the most popular and commonly used to solve the time series prediction problem.

Box and Jenkins [1] proposed a set of algebraic models, referred to as Auto-Regressive Integrated Moving Average (ARIMA) models, for building accurate predictions of a given time series. The ARIMA model is based on two main models:

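Auto-Regressive (AR), which is defined by

$$\tilde{Z}_t = \phi_1 \tilde{Z}_{t-1} + \phi_2 \tilde{Z}_{t-2} + \ldots + \phi_p \tilde{Z}_{t-p} + r_t \tag{11}$$

where $\tilde{Z}_k = Z_k - \mu$, with $\mu$ defined as the mean of the time series. The terms $\phi_i$ ($i = 1, 2, \ldots, p$) denote the auto-regressive coefficients.

Moving Average (MA), which is defined by

$$Z_t = \mu + r_t - \theta_1 r_{t-1} - \theta_2 r_{t-2} - \ldots - \theta_q r_{t-q} \tag{12}$$

Assuming that $\tilde{Z}_k = Z_k - \mu$, this gives

$$\tilde{Z}_t = (1 - \theta_1 B - \theta_2 B^2 - \ldots - \theta_q B^q)\, r_t = \theta(B)\, r_t \tag{13}$$

where $\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \ldots - \theta_q B^q$ represents the moving average operator and $B$ is the backshift operator ($B r_t = r_{t-1}$).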

The union of the AR and MA models yields a model known as Auto-Regressive Moving Average (ARMA) of order (p, q), which was proposed in an attempt to build the most parsimonious model possible, given that including both auto-regressive and moving average terms in a single model can reduce the number of model parameters. In this way, the ARMA model is defined by

$$\tilde{Z}_t = \phi_1 \tilde{Z}_{t-1} + \ldots + \phi_p \tilde{Z}_{t-p} + r_t - \theta_1 r_{t-1} - \ldots - \theta_q r_{t-q} \tag{14}$$
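The AR, MA and ARMA recursions can be sketched in a few lines. The example below simulates a mean-removed ARMA(p, q) series with the sign convention of the MA definition above (moving average terms subtracted) and recovers an AR(1) coefficient by ordinary least squares. The function names, the seed and the least-squares estimator are assumptions chosen for illustration; they are not the estimation procedure of Box & Jenkins.

```python
import random


def simulate_arma(phi, theta, n, sigma=1.0, seed=0):
    """Simulate z_t = sum_i phi[i] * z_{t-1-i} + r_t - sum_j theta[j] * r_{t-1-j}.

    Values before the start of the series are treated as zero.
    """
    rng = random.Random(seed)
    z, r = [], []
    for t in range(n):
        noise = rng.gauss(0.0, sigma)
        ar_part = sum(phi[i] * z[t - 1 - i]
                      for i in range(len(phi)) if t - 1 - i >= 0)
        ma_part = sum(theta[j] * r[t - 1 - j]
                      for j in range(len(theta)) if t - 1 - j >= 0)
        z.append(ar_part + noise - ma_part)
        r.append(noise)
    return z


def estimate_ar1(z):
    """Least-squares estimate of phi in z_t = phi * z_{t-1} + r_t."""
    numerator = sum(z[t] * z[t - 1] for t in range(1, len(z)))
    denominator = sum(z[t - 1] ** 2 for t in range(1, len(z)))
    return numerator / denominator


ar_series = simulate_arma(phi=[0.6], theta=[], n=5000)
phi_hat = estimate_ar1(ar_series)  # should land near the true value 0.6
```

Passing a non-empty `theta` adds the moving average terms, so the same function covers pure AR, pure MA and mixed ARMA series.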

The ARIMA model basically consists of applying a high-pass filter, which is sensitive only to the high frequencies of the series, to the data before the ARMA model is applied. This filter is represented by the letter "I" (integrated) in the ARIMA notation and corresponds to taking the differences between data points separated by a constant distance. This procedure, known as differencing, is performed to remove trends from the data, making the time series a stationary process. That is, an ARIMA(p,d,q) model is an algebraic statement that shows how a time series variable ($X_t$) is related to its past values ($X_{t-1}, X_{t-2}, \ldots, X_{t-p}$) and to the past noise terms ($r_{t-1}, r_{t-2}, \ldots, r_{t-q}$), after being differenced $d$ times [15].
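The differencing step can be illustrated with a short sketch. The helper `difference` below is a hypothetical name used only for this example; it applies first differences $d$ times, and the quadratic trend shows how two passes remove the trend entirely.

```python
def difference(series, d=1):
    """Apply first differences d times: each pass maps x_t to x_t - x_{t-1}."""
    result = list(series)
    for _ in range(d):
        result = [result[t] - result[t - 1] for t in range(1, len(result))]
    return result


trend = [t ** 2 for t in range(6)]   # quadratic trend: 0, 1, 4, 9, 16, 25
once = difference(trend, d=1)        # still trending: 1, 3, 5, 7, 9
twice = difference(trend, d=2)       # constant: 2, 2, 2, 2
```

Each pass shortens the series by one observation, which is why the differenced series handed to the ARMA stage has $N - d$ points.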

In this way, Box & Jenkins [1] proposed a procedure able to find an adequate prediction model to solve the time series prediction problem, as can be seen in Figure 1.

According to Figure 1, the first step (identification) uses two graphical devices to measure the correlation among the observations of the time series data set. Such devices are the Auto-Correlation Function (ACF) [1] and the Partial Auto-Correlation Function (PACF) [1]. In the second step (estimation), the model coefficients are estimated, and finally, in the third step (diagnosis), Box & Jenkins [1] proposed some checking procedures to determine the statistical suitability of the model chosen in the previous steps. A model that fails these procedures is rejected.

### 2.4. Nonlinear statistical models

As mentioned in the previous section, the ARIMA model [1] is one of the most common choices for time series prediction. However, since the ARIMA models are linear and most real world applications involve nonlinear problems, this can limit the accuracy of the generated predictions; that is, the model assumes that the time series is stationary or generated by a linear process. The linearity assumption cannot be generalized, however, given the nonlinear nature of many real-world phenomena, where nonlinear structures are found in the historical time series data.

In this way, a time series can be modeled by

$$X_t = h(X_{t-1}, \ldots, X_{t-p}, r_{t-1}, \ldots, r_{t-q}) + r_t \tag{15}$$

where $r_t$ represents a random component, or noise term. The terms $p$ and $q$ are integer indexes that define the time windows of past terms of the time series and of the noise, respectively. The term $h(\cdot)$ denotes a nonlinear transfer function, which builds the mapping between the past and future values of the time series.

Therefore, to overcome the limitations of linear statistical models, several nonlinear statistical models have been proposed in the literature, such as the bilinear models [2], the exponential auto-regressive models [4], the threshold autoregressive models [3] and the general state dependent models [5], among others.

The general state dependent (GSD) models [5] of order (p, q) are defined as an expansion and a local linearization of Equation 15 in a Taylor series around a fixed time point, which can be written as

$$X_t + \sum_{i=1}^{p} \phi_i(y_{t-1}) X_{t-i} = \mu(y_{t-1}) + r_t + \sum_{j=1}^{q} \theta_j(y_{t-1}) r_{t-j} \tag{16}$$

where $y_t = (r_{t-q+1}, \ldots, r_t, X_{t-p+1}, \ldots, X_t)'$ is defined as a state vector, and the symbol "$'$" denotes the transposition operator.

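A special class of such models, known as bilinear models [2], may be seen as a natural nonlinear extension of the ARMA model, obtained by making $\mu(x)$ and $\phi_i(x)$ constants and $\theta_j(y_{t-1}) = \theta_j + \sum_{v=1}^{Q} c_{jv} X_{t-v}$ ($j = 1, 2, \ldots$, with $c_{jv}$ the coefficients to be adjusted). The general form of bilinear models of order (p, q, P, Q) is defined by

$$X_t + \sum_{i=1}^{p} \phi_i X_{t-i} = \mu + r_t + \sum_{j=1}^{q} \theta_j r_{t-j} + \sum_{u=1}^{P} \sum_{v=1}^{Q} c_{uv} X_{t-v} r_{t-u} \tag{17}$$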

According to Ferreira [15], Equation 17 is linear in the terms $X_t$ and $r_t$ and nonlinear in the cross term of $X$ and $r$. Thus, a first-order bilinear model, built from Equation 16, is given by

$$X_t = \alpha X_{t-1} + \beta r_t + \gamma X_{t-1} r_{t-1} \tag{18}$$

where $\alpha$, $\beta$ and $\gamma$ are the constant parameters to be determined.
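The first-order bilinear recursion can be sketched directly from Equation 18. The function names, parameter values and seed below are assumptions for illustration only; the point is that setting $\gamma = 0$ removes the cross term and leaves a linear AR(1) recursion.

```python
import random


def bilinear_step(x_prev, r_now, r_prev, alpha, beta, gamma):
    """One step of X_t = alpha*X_{t-1} + beta*r_t + gamma*X_{t-1}*r_{t-1}."""
    return alpha * x_prev + beta * r_now + gamma * x_prev * r_prev


def simulate_bilinear(n, alpha, beta, gamma, sigma=1.0, seed=0):
    """Simulate n observations starting from X_0 = 0 and r_0 = 0."""
    rng = random.Random(seed)
    x_prev, r_prev, series = 0.0, 0.0, []
    for _ in range(n):
        r_now = rng.gauss(0.0, sigma)
        x_prev = bilinear_step(x_prev, r_now, r_prev, alpha, beta, gamma)
        r_prev = r_now
        series.append(x_prev)
    return series


# Hypothetical parameter values; gamma = 0 gives a linear AR(1) series.
nonlinear_series = simulate_bilinear(200, alpha=0.5, beta=1.0, gamma=0.1)
linear_series = simulate_bilinear(200, alpha=0.5, beta=1.0, gamma=0.0)
```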

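Another particular class of such models, known as Exponential Auto-Regressive (EAR) models [4] of order p, is obtained by using a constant term $\mu(x)$,

$$\theta_j(x) = 0 \ (\forall x) \quad \text{and} \quad \phi_i(y_{t-1}) = \phi_i + \pi_i \exp(-\gamma X_{t-1}^2) \tag{19}$$

in Equation 16, and is formally defined by

$$X_t + \sum_{i=1}^{p} \left\{ \phi_i + \pi_i \exp(-\gamma X_{t-1}^2) \right\} X_{t-i} = \mu + r_t \tag{20}$$

where $\gamma$ denotes the time series scale factor.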
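The amplitude-dependent coefficient of the EAR model can be sketched as a small function. This assumes the exponent carries a negative sign, $\exp(-\gamma X_{t-1}^2)$, as in the usual formulation of exponential auto-regressive models; the function name and the numeric values are illustrative.

```python
import math


def ear_coefficient(phi, pi_coef, gamma, x_prev):
    """Return phi + pi_coef * exp(-gamma * x_prev**2).

    Assumes a negative exponent, so the coefficient moves from
    phi + pi_coef (small |x_prev|) towards phi (large |x_prev|).
    """
    return phi + pi_coef * math.exp(-gamma * x_prev ** 2)


near_zero = ear_coefficient(0.3, 0.5, 1.0, 0.0)    # phi + pi_coef
far_away = ear_coefficient(0.3, 0.5, 1.0, 10.0)    # approximately phi
```

The scale factor `gamma` controls how quickly the coefficient switches between these two limits as the previous observation grows in magnitude.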

Another class of nonlinear models used in time series prediction is the Threshold Auto-Regressive (TAR) models [3], whose parameters depend only on past values of the process itself and which represent a finite set of possible AR models that a particular process could obey at any time point [15]. When the switching between these AR models is determined by the location of the data values relative to thresholds, the TAR model is known as a Self-Excited Threshold Auto-Regressive (SETAR) model [45].

A first-order TAR model is defined by

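$$X_t = \begin{cases} \alpha_1 X_{t-1} + r_t, & \text{if } X_{t-1} < d \\ \alpha_2 X_{t-1} + r_t, & \text{if } X_{t-1} \geq d \end{cases} \tag{21}$$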

where the constant d is defined as the threshold.
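A first-order TAR step can be sketched as follows. The sketch assumes that the first regime ($\alpha_1$) applies when $X_{t-1}$ is strictly below the threshold and the second ($\alpha_2$) otherwise; the function name and numeric values are illustrative.

```python
def tar_step(x_prev, noise, alpha1, alpha2, threshold):
    """One step of a first-order TAR model.

    Assumption: alpha1 is used when x_prev < threshold, alpha2 otherwise.
    """
    coefficient = alpha1 if x_prev < threshold else alpha2
    return coefficient * x_prev + noise


below = tar_step(-1.0, 0.0, alpha1=0.5, alpha2=-0.5, threshold=0.0)  # -0.5
above = tar_step(2.0, 0.0, alpha1=0.5, alpha2=-0.5, threshold=0.0)   # -1.0
```

Because the regime is selected by the previous value of the series itself, this step is also a minimal instance of the self-excited (SETAR) case.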

The SETAR model can be defined by setting $\mu(x) = \phi_0^{(j)}$, $\theta_j(x) = 0 \ (\forall x)$ and $\phi_i(y_{t-1}) = \phi_i^{(j)}$ if $X_{t-d} \in R^{(j)}$ ($i = 1, 2, \ldots, p$; $j = 1, 2, \ldots, l$), where $d$ is a positive integer and $R^{(j)}$ is a subset of the real numbers (the threshold). Thus, Equation 15 can be rewritten in these terms, defining a SETAR model by

$$X_t + \phi_0^{(j)} + \sum_{i=1}^{p} \phi_i^{(j)} X_{t-i} = r_t^{(j)}, \quad \text{if } X_{t-d} \in R^{(j)}, \quad j = 1, 2, \ldots, l \tag{22}$$

where this equation represents a SETAR model of kind (l, p, …, p). The term $r_t^{(j)}$ denotes white noise, with $r_t^{(j)}$ independent of $r_t^{(j')}$ for $j \neq j'$.

There are several other nonlinear models for time series prediction in the literature, such as auto-regressive smooth models [46], auto-regressive models with time-dependent coefficients [46] and auto-regressive conditional heteroscedastic (ARCH) models [47], among others. However, even with the wide number of nonlinear models proposed in the literature, De Gooijer and Kumar [46] did not find clear evidence that nonlinear models outperform linear models in terms of prediction performance. Clements et al. [45, 48] also argue that the prediction performance of such nonlinear models is worse than expected, and this problem still remains open.

According to Ferreira [15], a generally accepted concept is that the environment of a given temporal phenomenon is nonlinear, and the fact that nonlinear models do not achieve the expected results is due to the inability of such models to describe the time series phenomenon more accurately than simple linear approximations. In particular, it is verified that the models applied in real world stock markets and finance are highly nonlinear [48]. However, financial time series prediction is still considered a very difficult problem, due to several complex characteristics that are often present in these time series (irregularities, volatility, trend and noise).

Due to the complexity of the structures of relationships among time series data, nonlinear models have several limitations when applied in real situations. One of these limitations is their high mathematical complexity, a factor that limits the nonlinear models to a performance similar to that of linear models, as well as the need, in most cases, for a problem specialist to validate the predictions generated by the model [6]. These factors suggest that new approaches must be developed in order to improve prediction performance. Consequently, the great interest in developing nonlinear models for time series prediction using new approaches and paradigms is not surprising.

### 2.5. Neural network models

Artificial Neural Networks (ANNs) are models that simulate the behavior of biological neural systems, particularly the human brain. ANNs represent a parallel and distributed system composed of simple processing units (neurons), which calculate nonlinear mathematical functions.

The neurons are contained in a spatial arrangement generally composed of one or more layers interconnected by a wide number of connections. Generally, in most models, such connections are associated with weights, which are responsible for the storage of knowledge represented in the model, used as weights for the signals to be processed by neurons in the network.

Each ANN unit is conditioned to receive a signal, weighted by their respective input unit processing connections (ANN weights), which is processed by a mathematical function, known as activation function or transfer function, and producing a new output signal which is propagated over the network.

In this way, making an analogy to the human brain, an ANN has the ability to learn through examples, as well as to perform interpolations and extrapolations of the learned information. In the ANN learning process, the main task is to determine the intensity of the connections among neurons, which are adjusted and adapted by learning algorithms that aim to fine-tune the connection weights in order to better generalize the information contained in the pattern examples.

A wide number of ANNs have been proposed in the literature, among which it is worth mentioning:

1. MultiLayer Perceptron (MLP) Neural Networks [49];

2. Recursive Networks [49];

3. Kohonen Networks [50, 51];

4. Hopfield Networks [52].

Among the several kinds of ANNs, the MLPs are undoubtedly the most popular due to convenience, flexibility and efficiency, and can be applied to a wide range of problems [9, 49, 53].

### 2.6. MultiLayer perceptron neural networks

The MLP neural networks are typically composed of several neuron layers. The first layer is known as the input layer, where information is passed to the network. The last layer is called the output layer, where the model response to the given information is produced. Between the input and output layers, there are one or more layers, which are referred to as intermediate layers.

Each layer is interconnected with the adjacent layer. If each neuron of a layer is connected to all neurons of the next layer, then the network is a fully connected MLP network (illustrated in Figure 2).

An MLP is able to map past observations (network input) to their future values (network output). However, before it can perform a given task, the network must pass through a training or learning process. The MLP is typically trained by a supervised process, in which an external supervisor presents the input patterns and adjusts the weights of the network according to the ANN's degree of success. For each input-output pair, the network is adjusted to produce the mapping between the input pattern and the desired output (output pattern).

The ANN training is usually a very complex process, and according to a given problem, it requires a large number of input patterns. Each pattern of these vectors is presented to a neuron of the network input layer. In the time series prediction problem, the number of processing units in the ANN input layer is determined by the number of time lags of a given time series.

The set of patterns (or historical data) is usually divided into three sets according to Prechelt [54]: training set (50% of the points), validation set (25% of the points) and test set (25% of the points). Thus, the ANN training uses these three sets. Initially, the training examples are presented to the network; the information is passed through the input, hidden and output layers, and the response is calculated. A training algorithm is then performed to minimize the global error achieved by the network, calculating new values for the network weights from the difference between the desired output and the output obtained by the network, as in the Sum of Squared Errors (SSE), given by

$$SSE = \frac{1}{2} \sum_{i=1}^{N} (target_i - output_i)^2 \tag{23}$$

where $target_i$ is the real value of the i-th pattern, $output_i$ is the response obtained by the network for the i-th pattern, and the factor $\frac{1}{2}$ is just a term that simplifies the expressions derived from Equation 23, often calculated in training algorithms such as BackPropagation [49].
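The 50/25/25 division and the SSE measure can be sketched as below. The function names are hypothetical, and splitting in chronological order (rather than shuffling) is an assumption that is common for time series but is not stated in the text.

```python
def split_patterns(patterns, train_frac=0.5, valid_frac=0.25):
    """Split patterns into training, validation and test sets.

    Assumption: the split is chronological, so the test set holds
    the most recent patterns.
    """
    n_train = int(len(patterns) * train_frac)
    n_valid = int(len(patterns) * valid_frac)
    train = patterns[:n_train]
    valid = patterns[n_train:n_train + n_valid]
    test = patterns[n_train + n_valid:]
    return train, valid, test


def sum_squared_errors(targets, outputs):
    """Half the sum of squared differences between targets and outputs."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))


train, valid, test = split_patterns(list(range(8)))
sse = sum_squared_errors([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])  # 0.5 * (0 + 1 + 4)
```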

### 2.7. Genetic algorithms

Evolutionary Algorithms (EAs) are a powerful class of stochastic optimization algorithms and have been widely used to solve complex problems which cannot be solved analytically. The most popular EA is the Genetic Algorithm (GA) [55, 56]. GAs were developed by Holland [55], motivated by Charles Darwin's Evolution Theory [57], with the main goal of finding ways in which the mechanisms of natural adaptation might be integrated into computer systems. GAs are used successfully in several kinds of real-world applications due to their high search power in state spaces, being widely applied to optimization and machine learning problems. GAs work with a set of candidate solutions (initial states) for the problem. This set, referred to as the population, is evolved towards a sub-optimal or optimal solution to a given problem by searching along multiple trajectories simultaneously.

Standard Genetic Algorithm. In this section, a brief description of the Standard Genetic Algorithm (SGA) procedure is presented, which is illustrated in Figure 3. For further details, see [56, 58-60].

According to Figure 3, the SGA procedure starts with the creation of a population of individuals, or more specifically, a set of solutions. Then, each individual is evaluated by a fitness function (or cost function), which is a heuristic function that guides the search for an optimal solution in the state space. After evaluating the SGA population, it is necessary to select the parent pairs that will be used by the genetic operators (crossover and mutation). There are several procedures to perform this selection, among which it is worth mentioning rank-based selection, elitist strategies, steady-state selection and tournament selection [16]. The next step is the crossover genetic operator. Usually, the crossover operator mixes the genes of the parents, exchanging genetic information from both to obtain their offspring. There are several procedures to perform crossover, such as one-point, two-point or multi-point crossover, arithmetic crossover and heuristic crossover [16], among others. After the crossover operator, the offspring individuals form the new population, which contains relevant characteristics of the parent pairs obtained in the selection process. The next step is to mutate the new population. The mutation operator is responsible for the random modification of individual genes, allowing population diversification and enabling the SGA to escape from local minima (or maxima) of the cost function (fitness) surface. After that, the new mutated population is evaluated. This procedure is repeated until a stop condition is reached.
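The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of Figure 3: it uses real-valued genes, tournament selection of size two, one-point crossover, uniform-reset mutation and a fixed number of generations as the stop condition, and all names and default values are hypothetical.

```python
import random


def tournament(population, fitness, rng):
    """Pick two distinct individuals at random and return the fitter one."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def standard_ga(fitness, n_genes, pop_size=30, generations=60,
                crossover_rate=0.9, mutation_rate=0.05,
                low=-5.0, high=5.0, seed=0):
    """Maximize `fitness` over real-valued chromosomes in [low, high]."""
    rng = random.Random(seed)
    # Create and evaluate the initial population.
    population = [[rng.uniform(low, high) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            # Selection of a parent pair.
            p1 = tournament(population, fitness, rng)
            p2 = tournament(population, fitness, rng)
            # One-point crossover (skipped for single-gene chromosomes).
            if n_genes > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_genes - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = list(p1)
            # Mutation: occasionally reset a gene to a random value.
            child = [rng.uniform(low, high)
                     if rng.random() < mutation_rate else gene
                     for gene in child]
            offspring.append(child)
        population = offspring
        # Evaluate the new population and remember the best individual seen.
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best


def neg_sphere(x):
    """Toy fitness: larger is better, with the optimum at the origin."""
    return -sum(gene * gene for gene in x)


best = standard_ga(neg_sphere, n_genes=3)
```

Keeping track of the best individual seen so far is a design choice of this sketch; without it, a purely generational replacement can lose the best solution between generations.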

Modified Genetic Algorithm. The Modified Genetic Algorithm (MGA) used here is based on the work of Leung et al. [16]. The MGA is a second version of the Standard Genetic Algorithm (SGA) [56, 58, 59] that was modified to improve search convergence. The SGA was first studied, and, then, was modified to accelerate its convergence through the use of modified crossover and mutation operators (described later). The algorithm is described in Figure 4.

According to Figure 4, the MGA procedure consists of selecting a parent pair of chromosomes and then performing crossover and mutation operators (generating the offspring chromosomes – the new population) until the termination condition is reached; then the best individual in the population is selected as a solution to the problem.

The crossover operator is used for exchanging information from two parents (vectors $\bar{p}_1$ and $\bar{p}_2$) obtained in the selection process by a roulette wheel approach [16]. The recombination process to generate the offspring (vectors $\bar{C}_1$, $\bar{C}_2$, $\bar{C}_3$ and $\bar{C}_4$) is done by four crossover operators, which are defined by the following equations [16]:

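$$\bar{C}_1 = \frac{\bar{p}_1 + \bar{p}_2}{2} \tag{24}$$
$$\bar{C}_2 = \bar{p}_{max}(1 - w) + \max(\bar{p}_1, \bar{p}_2)\,w \tag{25}$$
$$\bar{C}_3 = \bar{p}_{min}(1 - w) + \min(\bar{p}_1, \bar{p}_2)\,w \tag{26}$$
$$\bar{C}_4 = \frac{(\bar{p}_{max} + \bar{p}_{min})(1 - w) + (\bar{p}_1 + \bar{p}_2)\,w}{2} \tag{27}$$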

where $w \in [0, 1]$ denotes the crossover weight (the closer $w$ is to 1, the greater the direct contribution from the parents), and $\max(\bar{p}_1, \bar{p}_2)$ and $\min(\bar{p}_1, \bar{p}_2)$ denote the vectors whose elements are the maximum and the minimum, respectively, of the corresponding gene values of $\bar{p}_1$ and $\bar{p}_2$. The terms $\bar{p}_{max}$ and $\bar{p}_{min}$ denote the vectors with the maximum and minimum possible gene values, respectively. After the four offspring are generated by the crossover operators, the one with the best evaluation (greatest fitness value) is chosen as the offspring of the crossover process and is denoted by $C^{best}$.

After the crossover operator, $C^{best}$ undergoes a mutation process, in which three new mutated offspring are generated, defined by the following equation [17]:

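$$\bar{M}_j = C_i^{best} + \gamma_i \Delta M_i, \quad j = 1, 2, 3 \ \text{and} \ i = 1, 2, \ldots, NG \tag{28}$$

where $\gamma_i$ can only take the values 0 or 1, $\Delta M_i$ are randomly generated numbers such that $p_{min} \le C_i^{best} + \Delta M_i \le p_{max}$, and NG denotes the number of genes in the chromosome.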

The first mutated offspring ($\bar{M}_1$) is obtained according to (28) using only one term $\gamma_i$ set to 1 (i is randomly selected within the range [1, NG]) and the remaining terms $\gamma_i$ set to 0. The second mutated offspring ($\bar{M}_2$) is obtained according to (28) using some randomly chosen $\gamma_i$ set to 1 and the remaining terms $\gamma_i$ set to 0. The third mutated offspring ($\bar{M}_3$) is obtained according to (28) using all $\gamma_i$ set to 1.
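The crossover and mutation operators described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `mga_crossover` and `mga_mutation` are hypothetical, the fourth offspring is assumed to use the sum of the two parents as in Leung et al. [16], $\Delta M_i$ is assumed to be drawn uniformly from the admissible interval, and "some" genes in the second mutation is assumed to mean each gene with probability 0.5.

```python
import numpy as np

def mga_crossover(p1, p2, p_min, p_max, w=0.9):
    """Return the four candidate offspring C1..C4 for parents p1 and p2."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    c1 = (p1 + p2) / 2.0
    c2 = p_max * (1.0 - w) + np.maximum(p1, p2) * w
    c3 = p_min * (1.0 - w) + np.minimum(p1, p2) * w
    # Assumption: parents enter C4 as a sum, following Leung et al. [16].
    c4 = ((p_max + p_min) * (1.0 - w) + (p1 + p2) * w) / 2.0
    return [c1, c2, c3, c4]

def mga_mutation(c_best, p_min, p_max, rng):
    """Return the three mutated offspring M1, M2, M3 of c_best."""
    c_best = np.asarray(c_best, float)
    ng = c_best.size
    # Assumption: delta is uniform on the interval that keeps genes in range,
    # so that p_min <= c_best + delta <= p_max.
    delta = rng.uniform(p_min - c_best, p_max - c_best)
    gamma1 = np.zeros(ng)
    gamma1[rng.integers(ng)] = 1.0                   # exactly one gene mutated
    gamma2 = (rng.random(ng) < 0.5).astype(float)    # some genes (assumed p = 0.5)
    gamma3 = np.ones(ng)                             # all genes mutated
    return [c_best + g * delta for g in (gamma1, gamma2, gamma3)]
```

In a full MGA round, the offspring with the greatest fitness among the four crossover candidates would be passed to `mga_mutation`, for example with `rng = np.random.default_rng()`.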

It is worth mentioning that the GA is not directly used for modeling and predicting time series, but it is applied to support other methods and techniques in the search for the optimal or sub-optimal parameters of the predictive model [15].

### 2.8. Intelligent hybrid models

Humans can be considered a good example of machines that process hybrid information. Their attitudes and actions are governed by a combination of genetic information and information acquired through learning. The genetic information, known as the genotype, comes with the individual in the form of genetic coding and comprises the features inherited from the parents. The phenotype is the combination of the features given by the genotype with environmental influences. The information in our genes, proven and tested over millions of years of evolution, supports our survival. Human learning consists of a variety of complex processes that use information acquired from interactions with the environment. It is the combination of these different types of information processing that enables humans to survive in dynamic, constantly changing environments.

This kind of hybrid information processing has been replicated in a generation of adaptive machines whose main processing units are intelligent computing systems and mechanisms inspired by nature. Examples include neural networks [49, 61], genetic algorithms [56, 58, 59], fuzzy systems [62], artificial immune systems [63], expert systems [64] and induction rules [65]. These AI techniques have produced encouraging results in some particular tasks, but some complex problems, such as time series prediction, cannot be successfully solved by a single intelligent technique. Each of these techniques has strengths and weaknesses, which make it suitable for some problems and not for others. These limitations have been the main motivation for the study of Hybrid Intelligent Systems (HIS), where two or more AI techniques are combined in order to overcome the particular limitations of an individual technique. Hybrid intelligent systems are also important when considering a wide range of real world applications: many application areas comprise several different sub-problems, each of which may require a different type of processing. Moreover, a HIS can be combined with different techniques, including conventional computing systems. The reasons for building a HIS are numerous, but can be summarized in three [66]:

1. Intensification Techniques: the integration of at least two different techniques, with the purpose of offsetting the weakness of one technique with the strength of the other;

2. Multiplicity of Applications in Tasks: a HIS is built because no single technique is applicable to all the sub-problems that an application might have;

3. Implementation of Multiple Feature: the HIS exhibits the capacity for multiple kinds of information processing within one architecture. Functionally, these systems emulate or mimic different processing techniques.

There are many possible combinations of artificial intelligence techniques for building hybrid intelligent systems; however, the discussion here is limited to the combination of artificial neural networks and genetic algorithms.

TAEF Model. The Time-delay Added Evolutionary Forecasting (TAEF) method [15] tries to reconstruct the phase space of a given time series by carrying out a search for the minimum dimensionality necessary to reproduce the generator phenomenon of the time series. The TAEF method is an intelligent hybrid system based on Artificial Neural Networks (ANNs) architectures trained and adjusted by a Modified Genetic Algorithm (MGA) which not only searches for the ANN parameters but also for the adequate embedded dimension represented in the time lags.

The scheme describing the TAEF algorithm is based on the iterative definition of the four main elements: (i) the underlying information necessary to predict the series (the minimum number of lags), (ii) the structure of the model capable of representing such underlying information for the purpose of prediction (the number of units in the ANN structure), (iii) the appropriate algorithm for training the model, and (iv) a behavior test to adjust time phase distortions that appear in some time series.

Following this principle, the important parameters defined by the algorithm are:

1. The number of time lags to represent the series;

2. The number of units in the ANN hidden layer;

3. The training algorithm for the ANN.

The TAEF method starts with the user defining a minimum initial fitness value (MinFit) which should be reached by at least one individual of the population in a given MGA round. The fitness function is defined as

$$\text{Fitness Function} = \frac{1}{1 + MSE} \tag{29}$$

where MSE is the Mean Squared Error of the ANN and will be formally defined in Section 5.

In each MGA round, a population of M individuals is generated, each of them represented by a chromosome (in Ferreira's works [15] M = 10 was used). Each individual is in fact a three-layer ANN where the first layer is defined by the number of time lags, the second layer is composed of a number of hidden processing units (sigmoidal units) and the third layer is composed of one processing unit (prediction horizon of one step ahead).

The stopping criteria for each one of the individuals are the number of epochs (NEpochs), the increase in the validation error (Gl) and the decrease in the training error (Pt).

The best repetition (the smallest validation error) is chosen to represent the best individual. Following this procedure, the MGA evolves towards an optimal or close to optimal fitness solution (which may not be the best solution yet), according to the stopping criteria: number of generations created (NGen) and fitness evolution of the best individual (BestFit).

After this point, when the MGA reaches a solution, the algorithm checks if the fitness of the best individual matched or exceeded the initial value specified for the variable MinFit (minimum fitness requirement). If this is not the case, the value of MaxLags (maximum number of lags) is increased by one and the MGA procedure is repeated to search for a better solution.

However, if the fitness reached was satisfactory, then the algorithm checks the number of lags chosen for the best individual, places this value as MaxLags, sets MinFit with the fitness value reached by this individual, and repeats the whole MGA procedure. In this case, the fitness achieved by the best individual was better than the fitness previously set and, therefore, the model can possibly generate a solution of higher accuracy with the lags of the best individual (and with the MinFit reached by the best individual as the new target). If, however, the new value of MinFit is, again, not reached in the next round, MaxLags gets the same value defined for it, just before the round that found the best individual, increased by one (the maximum number of lags is increased by one). The state space for the lag search is then increased by one to allow a wider search for the definition of the lag set. This procedure goes on until the stop condition is reached. After that, the TAEF method chooses the best model found among all the candidates.

After the best model is chosen, when the training process is finished, a statistical test (t-test) is employed to check if the network representation has reached an “in-phase” matching (without a one step shift: the shape of the time series and the shape of the generated prediction match in time) or an “out-of-phase” matching (with a one step shift: the shape of the time series and the shape of the generated prediction do not match in time). If this test accepts the “in-phase” matching hypothesis, the elected model is ready for practical use. Otherwise, the method carries out a new procedure to adjust the relative phase between the prediction and the actual time series. The validation patterns are presented to the ANN and the outputs for these patterns are re-arranged to create new inputs that are both presented to the same ANN and set as the output (prediction) target.

It is worth mentioning that the variable cont just represents the current iteration of the TAEF method. The maximum of ten iterations of the TAEF method (given by expression not cont > 10), was chosen empirically according to previous experiments in order to generate an optimal prediction model.

## 3. Mathematical morphology

Mathematical Morphology (MM) is based on two basic operations, the Minkowski sum and subtraction [67], which are respectively given by [68]

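$$X + B = \bigcup_{b \in B} X_b \tag{30}$$
$$X - B = \bigcap_{b \in B^r} X_b \tag{31}$$

where $X_b = \{x + b : x \in X\}$ is the translation of the input signal $X$ by $b$ and $B^r = \{-b : b \in B\}$ is the reflected structuring element B.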

All of MM transformations are based on combinations of four basic operations, which are defined by [68]

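$$\text{Dilation:} \quad \delta_B(X) = X + B \tag{32}$$
$$\text{Erosion:} \quad \varepsilon_B(X) = X - B \tag{33}$$
$$\text{Anti-Dilation:} \quad \delta_B^a(X) = (X + B^{rc})^{rc} \tag{34}$$
$$\text{Anti-Erosion:} \quad \varepsilon_B^a(X) = (X - B^{rc})^{rc} \tag{35}$$

where $B^{rc} = \{-b : b \notin B\}$ represents the reflected complement of structuring element B.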

According to Sousa [68], an operator of kind $\psi : P(E) \to P(E)$, where P(E) represents all subsets of $E = \mathbb{R}^N$, may be translation invariant (Equation 36), increasing (Equation 37), decreasing (Equation 38) or window (Equation 39).

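$$\psi(X_h) = (\psi(X))_h \tag{36}$$

where $X_h = \{x + h : x \in X\}$ represents the translation of $X \in P(E)$ by vector $h \in E$.

$$X \subseteq Y \Rightarrow \psi(X) \subseteq \psi(Y), \quad \forall X, Y \in P(E) \tag{37}$$
$$X \subseteq Y \Rightarrow \psi(Y) \subseteq \psi(X), \quad \forall X, Y \in P(E) \tag{38}$$
$$\forall x \in E, \quad x \in \psi(X) \Leftrightarrow x \in \psi(X \cap L_x) \tag{39}$$

where $L_x$ is the translation by x of a finite set $L \subset E$.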

### 3.1. Morphological-Rank-Linear (MRL) filter preliminaries

Definition 1 – Rank Function: the r-th rank function of the vector $\bar{t} = (t_1, t_2, \ldots, t_n) \in \mathbb{R}^n$ is the r-th element of the vector $\bar{t}$ sorted in decreasing order ($t_{(1)} \ge t_{(2)} \ge \ldots \ge t_{(n)}$). It is denoted by [31]

$$R_r(\bar{t}) = t_{(r)}, \quad r = 1, 2, \ldots, n \tag{40}$$

For example, given the vector $\bar{t}$ = (3, 0, 5, 7, 2, 1, 3), its 4-th rank function is $R_4(\bar{t}) = 3$.

Definition 2 – Unit Sample Function: the unit sample function is given by [31]

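$$q(v) = \begin{cases} 1, & \text{if } v = 0 \\ 0, & \text{otherwise} \end{cases} \tag{41}$$

where $v \in \mathbb{R}$.

Applying the unit sample function to a vector $\bar{v} = (v_1, v_2, \ldots, v_n) \in \mathbb{R}^n$ yields a vector unit sample function ($Q(\bar{v})$), given by [31]

$$Q(\bar{v}) = [q(v_1), q(v_2), \ldots, q(v_n)] \tag{42}$$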

Definition 3 – Rank Indicator Vector: the r-th rank indicator vector c of t is given by [31]

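$$\bar{c}(\bar{t}, r) = \frac{Q(z \cdot \bar{1} - \bar{t})}{Q(z \cdot \bar{1} - \bar{t}) \cdot \bar{1}^T} \tag{43}$$

where $z = R_r(\bar{t})$, $\bar{1} = (1, 1, \ldots, 1)$, “·” represents the scalar product and the symbol T denotes transposition.

For example, given the vector $\bar{t}$ = (3, 0, 5, 7, 2, 1, 3), its 4-th rank indicator vector is $\bar{c}(\bar{t}, 4) = \frac{1}{2}(1, 0, 0, 0, 0, 0, 1)$.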
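Definitions 1 to 3 translate directly into code. The following sketch (function names are illustrative, not taken from [31]) reproduces the two worked examples for the vector (3, 0, 5, 7, 2, 1, 3).

```python
import numpy as np

def rank_function(t, r):
    """r-th element of t sorted in decreasing order (r is 1-based)."""
    return np.sort(np.asarray(t, float))[::-1][r - 1]

def unit_sample(v):
    """Vector unit sample function Q: 1 where the entry is 0, else 0."""
    return (np.asarray(v, float) == 0).astype(float)

def rank_indicator(t, r):
    """r-th rank indicator vector: marks the entries equal to the r-th rank."""
    t = np.asarray(t, float)
    z = rank_function(t, r)
    q = unit_sample(z - t)
    return q / q.sum()   # normalise so that the entries sum to one

t = [3, 0, 5, 7, 2, 1, 3]
print(rank_function(t, 4))    # the value 3
print(rank_indicator(t, 4))   # 0.5 at the first and last positions, 0 elsewhere
```

Note that the scalar product of the rank indicator vector with the original vector returns the rank value itself, which is the property the smoothed version below relies on.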

Definition 4 – Smoothed Rank Function: the smoothed r-th rank function is given by [31]

$$R_{r,\sigma}(\bar{t}) = \bar{c}_\sigma(\bar{t}, r) \cdot \bar{t}^T \tag{44}$$
with
$$\bar{c}_\sigma(\bar{t}, r) = \frac{Q_\sigma(z \cdot \bar{1} - \bar{t})}{Q_\sigma(z \cdot \bar{1} - \bar{t}) \cdot \bar{1}^T} \tag{45}$$

where $\bar{c}_\sigma$ is an approximation for the rank indicator vector $\bar{c}$ and $Q_\sigma(\bar{v}) = [q_\sigma(v_1), q_\sigma(v_2), \ldots, q_\sigma(v_n)]$ is a smoothed impulse function (where $q_\sigma(v)$ is of the form $\mathrm{sech}^2(v/\sigma)$, with sech the hyperbolic secant), σ ≥ 0 is a scale parameter and “·” represents the scalar product.

Using ideas from fuzzy set theory, $\bar{c}_\sigma$ can also be interpreted as a membership function vector [31]. For example, if $\bar{t}$ = (3, 0, 5, 7, 2, 1, 3), $q_\sigma(v) = \mathrm{sech}^2(v/\sigma)$ and σ = 0.5, then its smoothed 4-th rank indicator vector is

$$\bar{c}_\sigma(\bar{t}, 4) = \frac{1}{2}(0.9646, 0, 0.0013, 0, 0.0682, 0.0013, 0.9646) \tag{46}$$

which approximates the rank indicator vector

$$\bar{c}(\bar{t}, 4) = \frac{1}{2}(1, 0, 0, 0, 0, 0, 1) \tag{47}$$
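The smoothed rank indicator vector can be checked numerically. The sketch below assumes $q_\sigma(v) = \mathrm{sech}^2(v/\sigma)$, as in the example above; the function name is illustrative.

```python
import numpy as np

def smoothed_rank_indicator(t, r, sigma):
    """Smoothed r-th rank indicator vector using sech^2 impulses."""
    t = np.asarray(t, float)
    z = np.sort(t)[::-1][r - 1]                  # r-th rank of t
    q = 1.0 / np.cosh((z - t) / sigma) ** 2      # sech^2((z - t_i) / sigma)
    return q / q.sum()

t = [3, 0, 5, 7, 2, 1, 3]
c_sigma = smoothed_rank_indicator(t, 4, sigma=0.5)
print(np.round(2 * c_sigma, 4))       # compare with the vector inside (1/2)(...)
print(c_sigma @ np.asarray(t, float)) # smoothed 4-th rank value, close to 3
```

As σ decreases, the weights concentrate on the entries equal to the rank value and the smoothed vector approaches the exact rank indicator vector.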

### 3.2. MRL filter definition

The MRL filter [31] is a linear combination between a Morphological-Rank (MR) filter [29, 30] and a linear Finite Impulse Response (FIR) filter [31].

Definition 5 – MRL Filter [31]: Let $\bar{x} = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ represent the input signal inside an n-point moving window and let y be the output from the filter. Then, the MRL filter is defined as the shift-invariant system whose local signal transformation rule $\bar{x} \to y$ is given by [31]

$$y = \lambda \alpha + (1 - \lambda) \beta \tag{48}$$
with
$$\alpha = R_r(\bar{x} + \bar{a}) = R_r(x_1 + a_1, x_2 + a_2, \ldots, x_n + a_n) \tag{49}$$
and
$$\beta = \bar{x} \cdot \bar{b}' = x_1 b_1 + x_2 b_2 + \ldots + x_n b_n \tag{50}$$

where $\lambda \in \mathbb{R}$, and $\bar{a}$ and $\bar{b} \in \mathbb{R}^n$. Terms $\bar{a} = (a_1, a_2, \ldots, a_n)$ and $\bar{b} = (b_1, b_2, \ldots, b_n)$ represent the coefficients of the MR filter and the coefficients of the linear FIR filter, respectively. Term $\bar{a}$ is usually referred to as the “structuring element” because for r = 1 or r = n the rank filter becomes the morphological dilation and erosion by a structuring function equal to $\bar{a}$ within its support [31]. The structure of the MRL filter is illustrated in Figure 5.
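The transformation rule of Definition 5 can be written as a short function. This is a minimal sketch with an illustrative name (`mrl_filter`); it takes the integer rank r directly.

```python
import numpy as np

def mrl_filter(x, a, b, r, lam):
    """Output of the MRL filter for one n-point window x."""
    x, a, b = (np.asarray(v, float) for v in (x, a, b))
    alpha = np.sort(x + a)[::-1][r - 1]   # morphological-rank part
    beta = float(x @ b)                   # linear FIR part
    return lam * alpha + (1.0 - lam) * beta

x = [1.0, 2.0, 3.0]
zeros = [0.0, 0.0, 0.0]
mean_taps = [1 / 3, 1 / 3, 1 / 3]
print(mrl_filter(x, zeros, mean_taps, r=1, lam=1.0))  # pure rank filter, r = 1: the maximum
print(mrl_filter(x, zeros, mean_taps, r=3, lam=1.0))  # pure rank filter, r = n: the minimum
print(mrl_filter(x, zeros, mean_taps, r=2, lam=0.0))  # pure linear filter: the window mean
```

With a zero structuring element, r = 1 and r = n reduce to the flat dilation (maximum) and erosion (minimum) of the window, which illustrates why the term "structuring element" is used for the MR coefficients.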

### 3.3. MRL filter training algorithm

Pessoa and Maragos [31] presented an adaptive design of MRL filters based on the LMS algorithm [29, 30], the “rank indicator vector” [31] and “smoothed impulses” [31] for overcoming the problem of nondifferentiability of rank operations.

Pessoa and Maragos [31] have shown that the main goal of the MRL filter design is to specify a set of parameters ($\bar{a}$, $\bar{b}$, r, λ) according to some design requirements. However, instead of using the integer rank parameter r directly in the MRL filter definition (Equations 48-50), they argued that it is possible to work with a real variable ρ implicitly defined through the following rescaling [31]

$$r = \mathrm{round}\left(n - \frac{n - 1}{\exp(\rho)}\right) \tag{51}$$

where $\rho \in \mathbb{R}$, n is the dimension of the input signal vector $\bar{x}$ inside the moving window and round(·) denotes the usual symmetrical rounding operation. In this way, the weight vector to be used in the filter design task is defined by [31]

$$\bar{w} \equiv (\bar{a}, \bar{b}, \rho, \lambda) \tag{52}$$

The framework of the MRL filter adaptive design is viewed as a learning process where the filter parameters are iteratively adjusted. The usual approach to adaptively adjust the vector $\bar{w}$, and therefore design the filter, is to define a cost function $J(\bar{w})$, estimate its gradient $\nabla J(\bar{w})$, and update the vector $\bar{w}$ by the iterative formula

$$\bar{w}(i + 1) = \bar{w}(i) - \mu_0 \nabla J(\bar{w})\big|_{\bar{w} = \bar{w}(i)} \tag{53}$$

where $\mu_0 > 0$ (usually called step size) and $i \in \{1, 2, \ldots\}$. The term $\mu_0$ is responsible for regulating the tradeoff between stability and speed of convergence of the iterative procedure. The iteration of Equation 53 starts with an initial guess $\bar{w}(0)$ and stops when some desired condition is reached. This approach is known as the method of steepest descent [31].

The cost function J must reflect the solution quality achieved by the parameters configuration of the system. A cost function J, for example, can be any error function, such as

$$J[\bar{w}(i)] = \frac{1}{M} \sum_{k = i - M + 1}^{i} e^2(k) \tag{54}$$

where $M \in \{1, 2, \ldots\}$ is a memory parameter and e(k) is the instantaneous error, given by

$$e(k) = d(k) - y(k) \tag{55}$$

where d(k) and y(k) are the desired output signal and the actual filter output for the training sample k, respectively. The memory parameter M controls the smoothness of the updating process. If we are processing noiseless signals, M = 1 is recommended [31]. However, when we use M > 1, the updating process tends to reduce the noise influence of noisy signals during the training [31].

Hence, the resulting adaptation algorithm is given by [31]

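$$\bar{w}(i + 1) = \bar{w}(i) + \frac{\mu}{M} \sum_{k = i - M + 1}^{i} e(k) \frac{\partial y(k)}{\partial \bar{w}} \tag{56}$$

where $\mu = 2\mu_0$ and $i \in \{1, 2, \ldots\}$; the factor e(k) follows from differentiating $e^2(k)$ with $\partial e(k)/\partial \bar{w} = -\partial y(k)/\partial \bar{w}$. From Equations (48), (49), (50) and (51), the term $\partial y(k)/\partial \bar{w}$ [31] may be calculated as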

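$$\frac{\partial y}{\partial \bar{w}} = \left(\frac{\partial y}{\partial \bar{a}}, \frac{\partial y}{\partial \bar{b}}, \frac{\partial y}{\partial \rho}, \frac{\partial y}{\partial \lambda}\right) \tag{57}$$
with
$$\frac{\partial y}{\partial \bar{a}} = \lambda \frac{\partial \alpha}{\partial \bar{a}} \tag{58}$$
$$\frac{\partial y}{\partial \bar{b}} = (1 - \lambda)\,\bar{x} \tag{59}$$
$$\frac{\partial y}{\partial \rho} = \lambda \frac{\partial \alpha}{\partial \rho} \tag{60}$$
$$\frac{\partial y}{\partial \lambda} = \alpha - \beta \tag{61}$$

where

$$\frac{\partial \alpha}{\partial \bar{a}} = \bar{c} = \frac{Q(\alpha \cdot \bar{1} - \bar{x} - \bar{a})}{Q(\alpha \cdot \bar{1} - \bar{x} - \bar{a}) \cdot \bar{1}'} \tag{62}$$
$$\frac{\partial \alpha}{\partial \rho} = 1 - \frac{1}{n}\, Q(\alpha \cdot \bar{1} - \bar{x} - \bar{a}) \cdot \bar{1}' \tag{63}$$

where n is the dimension of $\bar{x}$ and $\alpha = R_r(\bar{x} + \bar{a})$.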

It is important to mention that the unit sample function Q is frequently replaced by smoothed impulses $Q_\sigma$, in which case an appropriate smoothing parameter σ should be selected (which will affect only the gradient estimation step in the design procedure [31]).
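To make the adaptation rule concrete, the sketch below performs one LMS update with memory M = 1, using the smoothed impulses $\mathrm{sech}^2(v/\sigma)$ in the gradient. It is a simplified illustration under stated assumptions, not the procedure of [31] in full: the names `rank_from_rho` and `lms_step` are hypothetical, the rank is clipped to [1, n], and the default values of `mu` and `sigma` are arbitrary.

```python
import numpy as np

def rank_from_rho(rho, n):
    """Integer rank from the real variable rho (symmetric rounding)."""
    v = n - (n - 1) / np.exp(rho)
    # Assumption: clip to the valid range so that negative rho is usable.
    return int(np.clip(np.floor(v + 0.5), 1, n))

def lms_step(w, x, d, mu=0.01, sigma=0.05):
    """One LMS update (M = 1). w = (a, b, rho, lam); returns (new_w, y)."""
    a, b, rho, lam = w
    n = x.size
    r = rank_from_rho(rho, n)
    s = x + a
    alpha = np.sort(s)[::-1][r - 1]            # rank part
    beta = x @ b                               # linear part
    y = lam * alpha + (1.0 - lam) * beta       # filter output
    e = d - y                                  # instantaneous error
    q = 1.0 / np.cosh((alpha - s) / sigma) ** 2   # smoothed impulses
    c = q / q.sum()                            # d(alpha)/d(a)
    dalpha_drho = 1.0 - q.sum() / n            # d(alpha)/d(rho)
    grad = (lam * c, (1.0 - lam) * x, lam * dalpha_drho, alpha - beta)
    new_w = tuple(p + mu * e * g for p, g in zip(w, grad))
    return new_w, y

x = np.array([0.2, 0.5, 0.9])
w0 = (np.zeros(3), np.full(3, 1 / 3), 0.5, 0.5)
w1, y0 = lms_step(w0, x, d=0.9)
_, y1 = lms_step(w1, x, d=0.9)
print(abs(0.9 - y0), abs(0.9 - y1))   # the error on this sample shrinks after one step
```

In a training loop, `lms_step` would be applied to successive training patterns until one of the stopping criteria is met; a memory M > 1 would average the error-weighted gradients over the last M samples before updating.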

## 4. The proposed morphological-rank-linear time-lag added forecasting (MRLTAEF) model

The model proposed in this work, referred to as the Morphological-Rank-Linear Time-lag Added Evolutionary Forecasting (MRLTAEF) model, uses an evolutionary search mechanism in order to train and adjust the Morphological-Rank-Linear (MRL) filter applied to financial time series prediction. It is based on the definition of the four main elements necessary for building an accurate forecasting system [15]:

1. The underlying information necessary to predict the time series;

2. The structure of the model capable of representing such underlying information for the purpose of prediction;

3. The appropriate algorithm for training the model;

4. The behavior statistical test to adjust time phase distortions.

It is important to consider the minimum possible number of time lags in the representation of the series because the model must be as parsimonious as possible, avoiding the overfitting problem and decreasing the computational cost.

Based on that definition, the proposed method consists of a hybrid intelligent morphological-rank-linear model composed of a MRL filter [31] with a MGA [16], which searches for:

1. The minimum number of time lags to represent the series: initially, a maximum number of time lags (MaxLags) is pre-defined and then the MGA will search for the number of time lags in the range [1,MaxLags] for each individual of the population;

2. The initial (sub-optimal) parameters of the MRL filter: the mixing parameter (λ), the rank (r), the linear Finite Impulse Response (FIR) filter coefficients (b) and the Morphological-Rank (MR) filter coefficients (a).

Then, each element of the MGA population is trained via the LMS algorithm [31] to further improve the parameters supplied by the MGA; that is, the LMS is used, for each individual candidate, to perform a local search around the initial parameters supplied by the MGA. The main idea used here is to combine a local search method (LMS) with a global search method (MGA). While the MGA makes it possible to test varied solutions in different areas of the solution space, the LMS acts on the initial solution to produce a fine-tuned forecasting model. The proposed method is described in Figure 6.

Such a process is able to seek the most compact MRL filter, reducing computational cost and probability of model overfitting. Each MGA individual represents a MRL filter, where its input is defined by the number of time lags and its output represents the prediction horizon of one step ahead.

Most works found in the literature have the fitness function (or objective function) based on just one performance measure, such as the Mean Square Error (MSE). However, Clements et al. [69] showed, as early as 1993, that the MSE measure has limitations for evaluating and comparing the performance of prediction models. Information about the prediction, such as the absolute percentage error, the accuracy in predicting the future direction and the relative gain over naive prediction models (like random walk models and mean prediction), is not captured by the MSE measure.

In order to provide a more robust forecasting model, a multi-objective evaluation function is defined, which is a combination of five well-known performance measures: Prediction Of Change In Direction (POCID), Mean Square Error (MSE), Mean Absolute Percentage Error (MAPE), Normalized Mean Square Error (NMSE) or U of Theil Statistic (THEIL) and Average Relative Variance (ARV), where all these measures will be formally defined in Section 5. The multi-objective evaluation function used here is given by

$$\text{Fitness Function} = \frac{POCID}{1 + MSE + MAPE + THEIL + ARV} \tag{64}$$

Since this evaluation function combines linear and nonlinear metrics, each of which can contribute in a different way to the evolution process, Equation 64 was built empirically to gather the information necessary to describe the generator phenomenon of the time series.

After MRL filter adjusting and training, the proposed method uses the phase fix procedure presented by Ferreira [15], where a two step procedure is introduced to adjust time phase distortions observed (“out-of-phase” matching) in financial time series. Ferreira [15] has shown that the representations of some time series (natural phenomena) were developed by the model with a very close approximation between the actual and the predicted time series (referred to as “in-phase” matching), whereas the predictions of other time series (mostly financial time series) were always presented with a one step delay regarding the original data (referred to as “out-of-phase” matching).

The proposed method uses the statistical test (t-test) to check if the MRL filter model representation has reached an in-phase or out-of-phase matching (in the same way of TAEF method [15]). This is conducted by comparing the outputs of the prediction model with the actual series, making use only of the validation data set. This comparison is a simple hypothesis test, where the null hypothesis is that the prediction corresponds to in-phase matching and the alternative hypothesis is that the prediction does not correspond to in-phase matching (or corresponds to out-of-phase matching).

If this test accepts the in-phase matching hypothesis, the elected model is ready for practical use. Otherwise, the proposed method performs a new procedure to adjust the relative phase between the prediction and the actual time series. The phase fix procedure has two steps (described in Figure 7): (i) the validation patterns are presented to the MRL filter and the outputs for these patterns are re-arranged to create new input patterns (reconstructed patterns), and (ii) these reconstructed patterns are presented to the same MRL filter and the output is set as the prediction target. This procedure of phase adjustment assumes that the MRL filter is not a random walk model; it just shows a behavior characteristic of a random walk model: the t + 1 prediction is taken as the t value (Random Walk Dilemma).

If the MRL filter were a random walk model, the phase adjustment procedure would not work. Such a phase fix was originally proposed by Ferreira [15], who observed that when an Artificial Neural Network (ANN, of the Multilayer Perceptron kind) is correctly adjusted (TAEF method), the one step shift distortion in the prediction can be softened.

The termination conditions for the MGA are:

1. Minimum value of fitness function: fitness ≥ 40, a value that corresponds to an accuracy in predicting direction of around 80% (POCID ≈ 80%) with the sum of the other errors around one (MSE + MAPE + THEIL + ARV ≈ 1), since 80/(1 + 1) = 40;

2. The increase in the validation error or generalization loss (Gl) [54]: Gl > 5%;

3. The decrease in the training error or training progress (Pt) [54]: Pt ≤ $10^{-6}$.

Each individual of the MGA population is a MRL filter represented by the data structure with the following components (MRL filter parameters):

1. a: MR filter coefficients;

2. b: linear FIR filter coefficients;

3. ρ: variable used to determine the rank r;

4. λ: mixing parameter;

5. NLags: a vector, where each position has a real-valued codification, which is used to determine if a specific time lag will be used ($NLags_i$ > 0) or not ($NLags_i$ ≤ 0).
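The real-valued lag encoding can be decoded into input patterns as in the sketch below. It assumes that position i of NLags corresponds to time lag i (counting from 1), which the text does not state explicitly; the function names are illustrative.

```python
import numpy as np

def selected_lags(nlags):
    """Time lags whose gene is positive (assumption: position i <-> lag i)."""
    return [i + 1 for i, g in enumerate(nlags) if g > 0]

def lagged_patterns(series, lags):
    """Input patterns and one-step-ahead targets for the chosen lags.

    Requires at least one selected lag.
    """
    series = np.asarray(series, float)
    start = max(lags)
    X = np.array([[series[t - lag] for lag in lags]
                  for t in range(start, len(series))])
    y = series[start:]
    return X, y

lags = selected_lags([-0.3, 0.7, 0.0, 0.2])    # genes 2 and 4 are positive
X, y = lagged_patterns([1, 2, 3, 4, 5, 6], lags)
print(lags)   # [2, 4]
print(X)      # rows hold (x[t-2], x[t-4])
print(y)      # targets x[t]
```

Each row of `X` is the window presented to the MRL filter, so the number of selected lags fixes the dimension n of the filter coefficients a and b.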

## 5. Performance metrics

Many performance evaluation criteria are found in the literature. However, most of the existing literature on time series prediction employs only one performance criterion for prediction evaluation. The most widely used performance criterion is the Mean Squared Error (MSE), given by

$$MSE = \frac{1}{N} \sum_{j=1}^{N} (target_j - output_j)^2 \tag{65}$$

where N is the number of patterns, $target_j$ is the desired output for pattern j and $output_j$ is the predicted value for pattern j.

The MSE measure may be used to drive the prediction model in the training process, but it cannot be considered alone as a conclusive measure for comparison of different prediction models [69]. For this reason, other performance criteria should be considered for allowing a more robust performance evaluation.

A measure that accurately identifies model deviations is the Mean Absolute Percentage Error (MAPE), given by

$$MAPE = \frac{1}{N} \sum_{j=1}^{N} \left| \frac{target_j - output_j}{x_j} \right| \tag{66}$$

where $x_j$ is the time series value at point j.

The random walk dilemma can be used as a naive predictor ($X_{t+1} = X_t$), commonly applied to financial time series prediction. Thus, a way to evaluate the model against a random walk model is to use the Normalized Mean Squared Error (NMSE) or U of Theil Statistic (THEIL) [70], which relates the model performance to that of a random walk model and is given by

$$THEIL = \frac{\sum_{j=1}^{N} (target_j - output_j)^2}{\sum_{j=1}^{N} (target_j - output_{j-1})^2} \tag{68}$$

where, if THEIL is equal to 1, the predictor has the same performance as a random walk model. If THEIL is greater than 1, then the predictor performs worse than a random walk model, and if THEIL is less than 1, the predictor is better than a random walk model. In the perfect model, THEIL tends to zero.

Another interesting measure maps the accuracy in the future direction prediction of the time series or, more specifically, the ability of the method to predict if the future series value (prediction target) will increase or decrease with respect to the previous value. This metric is known as the Prediction Of Change In Direction (POCID) [15], and is given by

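$$POCID = \frac{100}{N} \sum_{j=1}^{N} D_j \tag{69}$$

where

$$D_j = \begin{cases} 1, & \text{if } (target_j - target_{j-1})(output_j - output_{j-1}) > 0 \\ 0, & \text{otherwise} \end{cases} \tag{70}$$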

The last measure used relates the model performance to the mean of the time series. This measure is the Average Relative Variance (ARV), given by

$$ARV = \frac{\sum_{j=1}^{N} (target_j - output_j)^2}{\sum_{j=1}^{N} (output_j - \overline{target})^2} \tag{71}$$

where $\overline{target}$ is the mean of the time series. If the ARV is equal to 1, the predictor has the same performance as predicting the time series average. If the ARV is greater than 1, then the predictor performs worse than the time series average prediction, and if the ARV is less than 1, the predictor is better than the time series average prediction. In the ideal model, ARV tends to zero.
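The five measures and the multi-objective fitness function can be computed together, as in the following sketch. Several choices here are assumptions rather than facts from the text: $x_j$ in MAPE is taken to be the target value, POCID is averaged over the N − 1 points that have a predecessor, and the THEIL denominator is computed from the naive random-walk forecast (the previous target value), which is the reading under which THEIL = 1 for a random walk model. The names `forecast_metrics` and `fitness` are illustrative.

```python
import numpy as np

def forecast_metrics(target, output):
    """MSE, MAPE, THEIL, POCID and ARV for one-step-ahead predictions."""
    target = np.asarray(target, float)
    output = np.asarray(output, float)
    err = target - output
    mse = np.mean(err ** 2)
    mape = np.mean(np.abs(err / target))          # assumes non-zero targets
    naive_err = target[1:] - target[:-1]          # random-walk forecast error
    theil = np.sum(err[1:] ** 2) / np.sum(naive_err ** 2)
    same_dir = (target[1:] - target[:-1]) * (output[1:] - output[:-1]) > 0
    pocid = 100.0 * np.mean(same_dir)             # averaged over N - 1 points
    arv = np.sum(err ** 2) / np.sum((output - target.mean()) ** 2)
    return {"MSE": mse, "MAPE": mape, "THEIL": theil,
            "POCID": pocid, "ARV": arv}

def fitness(m):
    """Multi-objective fitness: POCID over one plus the summed errors."""
    return m["POCID"] / (1.0 + m["MSE"] + m["MAPE"] + m["THEIL"] + m["ARV"])

series = [1.0, 2.0, 3.0, 4.0, 5.0]
lagged = [1.0, 1.0, 2.0, 3.0, 4.0]   # prediction that simply repeats the previous value
m = forecast_metrics(series, lagged)
print(m["THEIL"], m["POCID"])
print(fitness(forecast_metrics(series, series)))   # perfect prediction
```

A prediction that only repeats the previous value scores THEIL = 1 under this reading, while a perfect prediction drives all four error terms to zero and the fitness to its maximum of 100.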

## 6. Simulations and experimental results

A set of six real world financial time series (Dow Jones Industrial Average (DJIA) Index, National Association of Securities Dealers Automated Quotation (NASDAQ) Index, Standard & Poor 500 Stock (S&P500) Index, Petrobras Stock Prices, General Motors Corporation Stock Prices and Google Inc Stock Prices) was used as a test bed for the evaluation of the proposed method. All time series investigated were normalized to lie within the range [0, 1] and divided into three sets according to Prechelt [54]: training set (50% of the points), validation set (25% of the points) and test set (25% of the points).
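The data preparation step can be sketched as follows. The text does not say how the normalization was done or how the points were assigned to each set, so min-max scaling over the whole series and a chronological split are assumptions of this sketch; the function name is illustrative.

```python
import numpy as np

def normalize_and_split(series):
    """Scale a series to [0, 1] and split it 50% / 25% / 25%."""
    s = np.asarray(series, float)
    s = (s - s.min()) / (s.max() - s.min())    # assumed min-max normalization
    n = len(s)
    i, j = int(0.5 * n), int(0.75 * n)
    return s[:i], s[i:j], s[j:]                # assumed chronological order

train, valid, test = normalize_and_split(np.arange(1420))
print(len(train), len(valid), len(test))       # 710 355 355
```

For a series of 1,420 points, such as the DJIA series used below, this yields 710 training points and 355 points each for validation and test.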

For all the experiments, the following initialization system parameters were used: cont = 1, MinFit = 40 and MaxLags = 4. The MGA parameters used in the proposed MRLTAEF method are a maximum number of MGA generations of $10^4$, a crossover weight w = 0.9 (used in the crossover operator) and a mutation probability equal to 0.1. The MR filter coefficients and the linear FIR filter coefficients (a and b, respectively) were normalized in the range [–0.5, 0.5]. The MRL filter parameters λ and ρ were in the range [0, 1] and [–MaxLags, MaxLags], respectively.

Next, the simulation results involving the proposed model are presented. For each time series, ten experiments were done, and the experiment with the best validation fitness function was chosen to represent the prediction model.

In order to establish a performance study, results previously published in the literature with the TAEF Method [15] on the same series and under the same conditions are employed for comparison of results. In addition, experiments with MultiLayer Perceptron (MLP) networks and Morphological-Rank-Linear (MRL) filters were used for comparison with the MRLTAEF method. The Levenberg-Marquardt Algorithm [71] and the LMS algorithm [31] were employed for training the MLP network and the MRL filter, respectively. In all of the experiments, ten random initializations for each architecture were carried out, where the experiment with the best validation fitness function is chosen to represent the prediction model. The statistical behavioral test, for phase fix procedure, was also applied to all the MLP, MRL and TAEF models in order to guarantee a fair comparison among the models.

It is worth mentioning that the results with ARIMA models were not presented in our comparative analysis since Ferreira [15] has shown that MLP networks obtained results better than ARIMA models, for all financial time series used in this work. Therefore, only MLP networks were used in our comparative analysis.

Furthermore, in order to analyze time lag relations in the studied time series, the graphical methodology proposed by [42, 72], referred to as lagplot [72] or phase portrait [42], was employed. This consists of constructing dispersion graphs relating the different time lags of the time series ($X_t$ vs $X_{t-1}$, $X_t$ vs $X_{t-2}$, $X_t$ vs $X_{t-3}$, …), and allows the observation of possible strong relationships between any pair of time lags (when a structured appearance is shown in the graph). Such a technique is very limited, since it depends on human interpretation of the graphs; however, its simplicity is a strong argument for its utilization [15].

### 6.1. Dow Jones Industrial Average (DJIA) index series

The Dow Jones Industrial Average (DJIA) Index series corresponds to daily records from January 1st 1998 to August 26th 2003, constituting a database of 1,420 points. Figure 8 shows the DJIA Index lagplot.

According to Figure 8, it is seen that for all the time lags of DJIA Index series there is a clear linear relationship among the lags. However, with the increase in the time lag degree, the appearance of the structure towards the graph center indicates a nonlinear relationship among the lags.

For the DJIA Index series prediction (with one step ahead of prediction horizon), the proposed method automatically chose the lag 2 as the relevant time lag (n = 1), defined the parameters ρ = 1.6374 and λ = 0.0038, and classified the best model as the “out-of-phase” model. Table 1 shows the results (with respect to the test set) for all the performance measures for the MLP, MRL, TAEF and MRLTAEF models.

Figure 9 shows the actual DJIA Index values (solid line) and the predicted values generated by the MRLTAEF out-of-phase model (dashed line) for the last 100 points of the test set.

Another relevant aspect to notice is that the MRLTAEF model chose the mixing parameter λ = 0.0038, which indicates that it used 99.62% of the linear component of the MRL filter and 0.38% of the nonlinear component of the MRL filter, supporting the assumption (through lagplot analysis) that the DJIA Index series has a strong linear component mixed with a nonlinear component.
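To make the role of the mixing parameter concrete, the following is a minimal sketch of a single MRL filter output. It assumes the usual form from the MRL literature, y = λα + (1 − λ)β, where α is the r-th largest value of the input shifted by the coefficients a (the morphological-rank part) and β is the inner product of the input with the FIR coefficients b. The function name `mrl_output`, the rank convention (r = 1 selects the maximum) and all numeric inputs other than λ are assumptions made for illustration only.

```python
def mrl_output(x, a, b, r, lam):
    """One output of an MRL-style filter: lam * alpha + (1 - lam) * beta.

    alpha: r-th largest value of x[i] + a[i]  (nonlinear morphological-rank part)
    beta:  sum of x[i] * b[i]                 (linear FIR part)
    """
    shifted = sorted((xi + ai for xi, ai in zip(x, a)), reverse=True)
    alpha = shifted[r - 1]  # assumed convention: r = 1 picks the maximum
    beta = sum(xi * bi for xi, bi in zip(x, b))
    return lam * alpha + (1.0 - lam) * beta

# Hypothetical input window and coefficients; only lam = 0.0038 comes from the text.
x = [2.0, 4.0, 3.0]
a = [0.5, 0.0, -0.5]
b = [0.2, 0.5, 0.3]

y = mrl_output(x, a, b, r=2, lam=0.0038)
print(y)
```

With λ this close to zero the output stays very near the linear term β, which is the sense in which the model is said to use about 99.62% of the linear component.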

### 6.2. National Association of Securities Dealers Automated Quotation (NASDAQ) index series

The National Association of Securities Dealers Automated Quotation (NASDAQ) Index series corresponds to daily observations from February 2nd 1971 to June 18th 2004, constituting a database of 8,428 points. Figure 10 shows the NASDAQ Index lagplot.

According to Figure 10, it is seen that the time lags of NASDAQ Index series present a clear linear relationship among them, which, in theory, can contribute to a better forecasting result.

For the NASDAQ Index series prediction (with a prediction horizon of one step ahead), the proposed method automatically chose lag 2 as the relevant time lag (n = 1), defined the parameters ρ = 1.5581 and λ = 0.0005, and classified the best model as “out-of-phase” matching. Table 2 shows the results (for the test set) for all performance measures for the MLP, MRL, TAEF and MRLTAEF models.

Figure 11 shows the actual NASDAQ Index values (solid line) and the predicted values generated by the MRLTAEF out-of-phase model (dashed line) for the last 100 points of the test set.

It is worth mentioning that, as the MRLTAEF model chose λ = 0.0005, it used 99.95% of the linear component of the MRL filter and 0.05% of the nonlinear component of the MRL filter. This result may indicate that there is a weak nonlinear relationship among the time lags, a fact which could not be detected by the lagplot analysis.

### 6.3. Standard & Poor 500 (S&P500) index series

The Standard & Poor 500 (S&P500) Index is a weighted index of market values of the most traded stocks in the New York Stock Exchange (NYSE), American Stock Exchange (AMEX) and Nasdaq National Market System. The S&P500 series used corresponds to the monthly records from January 1970 to August 2003, constituting a database of 369 points. Figure 12 shows the S&P500 Index lagplot.

According to Figure 12, it is also seen that for all the time lags of S&P500 Index series there is a clear linear relationship among the lags. However, with the increase in the time lag degree, the appearance of the structure towards the upper corner on the right hand side of the graph indicates a nonlinear relationship among the lags.

For the S&P500 Index series prediction (with a prediction horizon of one step ahead), the proposed method automatically chose lags 2, 3 and 10 as the relevant time lags (n = 3), defined the parameters ρ = 1.2508 and λ = 0.0091, and classified the best model as “out-of-phase” matching. Table 3 shows the results (for the test set) for all the performance measures for the MLP, MRL, TAEF and MRLTAEF models.

Figure 13 shows the actual S&P500 Index values (solid line) and the predicted values generated by the MRLTAEF out-of-phase model (dashed line) for the 90 points of the test set.

The proposed MRLTAEF chose λ = 0.0091, implying that it used 99.09% of the linear component of the MRL filter and 0.91% of the nonlinear component of the MRL filter, confirming the assumption (through lagplot analysis) that the S&P500 Index series has a strong linear component mixed with a nonlinear component.

### 6.4. Petrobras stock prices series

The Petrobras Stock Prices series corresponds to the daily records of Brazilian Petroleum Company from January 1st 1995 to July 3rd 2003, constituting a database of 2,060 points. Figure 14 shows the Petrobras Stock Prices lagplot.

According to Figure 14, it is seen that for all the time lags of the Petrobras Stock Prices series there is a clear linear relationship among the lags. However, with the increase in the time lag degree, the appearance of the structure towards the graph center indicates a nonlinear relationship among the lags.

For the Petrobras Stock Prices series prediction (with a prediction horizon of one step ahead), the proposed method chose lag 3 as the relevant time lag (n = 1), defined the parameters ρ = 1.9010 and λ = 0.0070, and classified the best model as “out-of-phase” matching. Table 4 shows the results (for the test set) of all the performance measures for the MLP, MRL, TAEF and MRLTAEF models.

Figure 15 shows the actual Petrobras Stock Prices (solid line) and the predicted values generated by the MRLTAEF out-of-phase model (dashed line) for the 100 points of the test set.

For this series the proposed MRLTAEF chose λ = 0.0070, which means that it used 99.30% of the linear component of the MRL filter and 0.70% of the nonlinear component of the MRL filter, confirming the assumption (through lagplot analysis) that the Petrobras Stock Prices series has a strong linear component mixed with a nonlinear component.

### 6.5. General Motors Corporation stock prices series

The General Motors Corporation Stock Prices series corresponds to the daily records of General Motors Corporation from June 23rd 2000 to June 22nd 2007, constituting a database of 1,758 points. Figure 16 shows the General Motors Corporation Stock Prices lagplot.

According to Figure 16, it is verified that for all the time lags of the General Motors Corporation Stock Prices series there is a clear linear relationship among the lags. However, with the increase in the time lag degree, the appearance of the structure towards the upper corner on the right hand side of the graph indicates a nonlinear relationship among the lags.

For the General Motors Corporation Stock Prices series prediction (with a prediction horizon of one step ahead), the proposed method chose lags 2, 4, 5 and 8 as the relevant time lags (n = 4), defined the parameters ρ = 0.0617 and λ = 0.0011, and classified the best model as “out-of-phase” matching. Table 5 shows the results (for the test set) of all the performance measures for the MLP, MRL, TAEF and MRLTAEF models.

Figure 17 shows the actual General Motors Corporation Stock Prices (solid line) and the predicted values generated by the MRLTAEF out-of-phase model (dashed line) for the 100 points of the test set.

For this series the proposed MRLTAEF chose λ = 0.0011, which means that it used 99.89% of the linear component of the MRL filter and 0.11% of the nonlinear component of the MRL filter, confirming the assumption (through lagplot analysis) that the General Motors Corporation Stock Prices series has a strong linear component mixed with a nonlinear component.

### 6.6. Google Inc stock prices series

The Google Inc Stock Prices series corresponds to the daily records of Google Inc from August 19th 2004 to June 21st 2007, constituting a database of 715 points. Figure 18 shows the Google Inc Stock Prices lagplot.

According to Figure 18, it is seen that for all the time lags of the Google Inc Stock Prices series there is a clear linear relationship among the lags. However, with the increase in the time lag degree, the appearance of the structure towards the graph center indicates a nonlinear relationship among the lags.

For the Google Inc Stock Prices series prediction (with a prediction horizon of one step ahead), the proposed method chose lags 2, 3 and 10 as the relevant time lags (n = 3), defined the parameters ρ = –1.5108 and λ = 0.0192, and classified the best model as “out-of-phase” matching. Table 6 shows the results (for the test set) of all the performance measures for the MLP, MRL, TAEF and MRLTAEF models.

Figure 19 shows the actual Google Inc Stock Prices (solid line) and the predicted values generated by the MRLTAEF out-of-phase model (dashed line) for the 100 points of the test set.

For this series the proposed MRLTAEF chose λ = 0.0192, which means that it used 98.08% of the linear component of the MRL filter and 1.92% of the nonlinear component of the MRL filter, confirming the assumption (through lagplot analysis) that the Google Inc Stock Prices series has a strong linear component mixed with a nonlinear component.
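The linear and nonlinear shares quoted in Sections 6.1 to 6.6 follow directly from the mixing parameter: the nonlinear share is the mixing parameter expressed as a percentage, and the linear share is its complement. The short sketch below recomputes them from the six reported mixing values; the dictionary layout and the helper name `component_shares` are assumptions made for illustration.

```python
# Mixing parameter values reported for the six series in Sections 6.1 to 6.6.
mixing = {
    "DJIA": 0.0038,
    "NASDAQ": 0.0005,
    "S&P500": 0.0091,
    "Petrobras": 0.0070,
    "General Motors": 0.0011,
    "Google": 0.0192,
}

def component_shares(lam):
    """Return (linear %, nonlinear %) for a mixing parameter lam in [0, 1]."""
    return round((1.0 - lam) * 100.0, 2), round(lam * 100.0, 2)

for name, lam in mixing.items():
    linear, nonlinear = component_shares(lam)
    print(f"{name}: linear {linear:.2f}%, nonlinear {nonlinear:.2f}%")
```

In every case the linear share is above 98%, which matches the observation that the nonlinear component is present but weak.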

In general, all prediction models generated using the phase fix procedure to adjust time phase distortions showed forecasting performance much better than the MLP and MRL models, and slightly better than the TAEF model. The proposed method was able to adjust the time phase distortions of all analyzed time series (the prediction generated by the out-of-phase matching hypothesis is not delayed with respect to the original data), while the MLP and MRL models were not able to adjust the time phase. This corroborates the assumption made by Ferreira [15], who argues that the success of the phase fix procedure is strongly dependent on an accurate adjustment of the prediction model parameters and on the model itself used for prediction.

## 7. Conclusions

This work presented a new approach, referred to as the Morphological-Rank-Linear Time-lag Added Evolutionary Forecasting (MRLTAEF) model, to overcome the random walk (RW) dilemma for financial time series forecasting. It performs an evolutionary search for the minimum dimension needed to determine the characteristic phase space that generates the financial time series phenomenon. It is inspired by Takens' Theorem and consists of an intelligent hybrid model composed of a Morphological-Rank-Linear (MRL) filter combined with a Modified Genetic Algorithm (MGA), which searches for the minimum number of time lags for a correct time series representation and estimates the initial (sub-optimal) parameters of the MRL filter (mixing parameter (λ), rank (r), linear Finite Impulse Response (FIR) filter (b) and the Morphological-Rank (MR) filter (a) coefficients). Each individual of the MGA population is trained by the averaged Least Mean Squares (LMS) algorithm to further improve the MRL filter parameters supplied by the MGA. After adjusting the model, it performs a behavioral statistical test and a phase fix procedure to adjust time phase distortions observed in financial time series.

Five different metrics were used to measure the performance of the proposed MRLTAEF method for financial time series forecasting. A fitness function was designed with these five well-known statistical error measures in order to improve the description of the time series phenomenon as much as possible. The five evaluation measures used to compose this fitness function can have different contributions to the final prediction, and a more sophisticated analysis is needed to determine the optimal combination of such metrics.

An experimental validation of the method was carried out on six real world financial time series, showing the robustness of the MRLTAEF method through a comparison, according to five performance measures, with previous results found in the literature (MLP, MRL and TAEF models). This experimental investigation indicates a better, more consistent global performance of the proposed MRLTAEF method.

In general, all predictive models generated with the MRLTAEF method using the phase fix procedure (to adjust time phase distortions) showed forecasting performance much better than the MLP and MRL models, and slightly better than the TAEF model. The MRLTAEF method was able to adjust the time phase distortions of all analyzed time series, while the MLP and MRL models were not able to adjust the time phase. This fact shows that the success of the phase fix procedure is strongly dependent on the accurate adjustment of the parameters of the predictive model and on the model itself used for forecasting. It was also observed that the MRLTAEF model reached a much better performance when compared with a random walk like model, overcoming the random walk dilemma for the analyzed financial time series.

The models generated by the MRLTAEF method are not random walk models. This claim is supported by the phase fix procedure. If the MRL filter models were random walk models, the phase fix procedure would generate the same result as the original prediction, since in the random walk model the t+1 value is always the t value.
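The one step delay that characterizes a random walk like prediction can be illustrated with a small sketch. It does not implement the phase fix procedure itself; it only shows that a forecast built as "the t+1 value is the t value" matches the actual series exactly when shifted back by one step. The helper names `random_walk_forecast` and `best_delay` and the toy series are assumptions introduced here for illustration.

```python
def random_walk_forecast(series):
    """Random walk prediction: the forecast for t+1 is the value observed at t."""
    return series[:-1]  # forecasts aligned with the targets series[1:]

def best_delay(actual, predicted, max_delay=3):
    """Delay k in 0..max_delay minimizing the MSE between predicted[t] and actual[t-k]."""
    def mse(k):
        pairs = [(predicted[t], actual[t - k]) for t in range(k, len(actual))]
        return sum((p - a) ** 2 for p, a in pairs) / len(pairs)
    return min(range(max_delay + 1), key=mse)

# Toy series with made-up values (hypothetical, not one of the chapter's data sets).
series = [10.0, 10.4, 10.1, 10.9, 11.3, 11.0, 11.8, 12.1, 11.9, 12.6]
actual = series[1:]
predicted = random_walk_forecast(series)

print(best_delay(actual, predicted))  # 1: the forecast is the actual series delayed by one step
```

A prediction with no time phase distortion would instead give a best delay of 0 under this check.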

It is worth mentioning that the first time lag is never selected to predict any time series used in this work. However, a random walk structure is necessary for the phase fix procedure to work, since the key of this procedure is the two step prediction (described by the phase fix procedure) in order to adjust the one step time phase.

Also, one of the main advantages of the MRLTAEF models (apart from their predictive performance when compared to all analyzed models) is that they not only have linear and nonlinear components, but are also attractive due to their lower computational complexity when compared to other approaches such as [33, 34], other MLP-GA models [15] and other statistical models [2-5].

Furthermore, another assumption made by Ferreira [15] was confirmed through the analysis of the MRL filter mixing parameter (λ). It was argued that through lagplot analysis it is possible to notice in financial time series structures indicative of some nonlinear relationship among the time lags, even though they are superimposed by a dominant linear component. In all the experiments, the MRLTAEF model set a strong linear component mixed with a weak nonlinear component (it uses ~99% of the linear component of the MRL filter and ~1% of the nonlinear component of the MRL filter). Since the MRLTAEF method defines an MRL filter like model, which has the ability to select the percentage of use of the linear and nonlinear components, it is believed that it improves the prediction performance through a balanced estimation of the linear and nonlinear relationships.

Future work will consider further studies in order to formalize properties of the proposed model using the phase fix procedure. Also, other financial time series with components such as trends, seasonalities, impulses, steps and other nonlinearities can be used to confirm the efficiency of the proposed method. In addition, further studies in terms of risk and financial return can be developed in order to determine the additional economic benefits, for an investor, of using the proposed method.

## Acknowledgments

The authors are thankful to Mr. Chè Donavon David Davis for English support.

## How to cite and reference

### Cite this chapter

Ricardo de A. Araújo, Gláucio G. de M. Melo, Adriano L. I. de Oliveira and Sergio C. B. Soares (February 1st 2010). Morphological-Rank-Linear Models for Financial Time Series Forecasting, New Achievements in Evolutionary Computation, Peter Korosec, IntechOpen, DOI: 10.5772/8048. Available from:
