16 A Multi Adaptive Neuro Fuzzy Inference System for Short Term Load Forecasting by Using Previous Day Features



Introduction
Load forecasting has an important role in power system design, planning, and development, and it is the basis of economic studies of energy distribution and the power market. The forecasting period can be one year or one month (long-term or medium-term) or one day or one hour (short-term) [1, 2, 3, 4].
For short-term load forecasting, several factors should be considered, such as time factors, weather data, and possible customer classes. Medium- and long-term forecasts take into account the historical load and weather data, the number of customers in different categories, the appliances in the area and their characteristics including age, the economic and demographic data and their forecasts, the appliance sales data, and other factors [17].
The time factors include the time of the year, the day of the week, and the hour of the day. There are important differences in load between weekdays and weekends. The load on different weekdays can also behave differently. For example, in Iran, Fridays, which are the weekend, may have structurally different loads than Saturdays through Thursdays. This is particularly true during the summer. Holidays are more difficult to forecast than non-holidays because of their relatively infrequent occurrence.
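The time factors described above can be encoded as simple calendar features. The following is a minimal sketch, assuming Friday is the weekly rest day as stated for Iran; the `HOLIDAYS` set, the function names, and the returned field names are hypothetical and are not taken from the original study.

```python
from datetime import date

# Assumption: a hand-maintained set of special days; the single entry below
# is only an illustrative placeholder, not data from the study.
HOLIDAYS = {date(2010, 3, 21)}


def day_type(d: date) -> str:
    """Classify a calendar date as 'holiday', 'friday', or 'working'."""
    if d in HOLIDAYS:
        return "holiday"
    # date.weekday() returns Monday == 0 ... Sunday == 6, so Friday == 4.
    if d.weekday() == 4:
        return "friday"
    return "working"


def time_factors(d: date, hour: int) -> dict:
    """Collect the time-related inputs: time of year, day of week, hour of day."""
    return {
        "month": d.month,
        "weekday": d.weekday(),
        "hour": hour,
        "day_type": day_type(d),
    }
```

Keeping the day type as a separate field makes it easy to train or evaluate a forecaster per class of day.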
Several techniques have been used for load forecasting. Common methods include linear regression models, ARMA, Box-Jenkins [5], Kalman filter models, expert systems [6], and ANN [1, 2, 3, 4, 7]. Because of the complex nature of load forecasting, however, linear techniques cannot deliver the required accuracy and robustness. Adaptive neuro-fuzzy systems can learn and build any nonlinear and complex mapping from input-output training data.
Regression methods. Regression methods are usually used to model the relationship between load consumption and other factors such as weather, day type, and customer class. Engle et al. [18] presented several regression models for next-day peak forecasting. Their models incorporate deterministic influences such as holidays, stochastic influences such as average loads, and exogenous influences such as weather. References [19], [20], [21], [22] describe other applications of regression models to load forecasting.
Time series. Time series methods are based on the assumption that the data have an internal structure, such as autocorrelation, trend, or seasonal variation. Time series forecasting methods detect and explore such a structure. Time series have been used for decades in such fields as economics, digital signal processing, and electric load forecasting. In particular, ARMA (autoregressive moving average), ARIMA (autoregressive integrated moving average), ARMAX (autoregressive moving average with exogenous variables), and ARIMAX (autoregressive integrated moving average with exogenous variables) are the most often used classical time series methods. ARMA models are usually used for stationary processes, while ARIMA is an extension of ARMA to non-stationary processes. ARMA and ARIMA use the time and load as the only input parameters. Since load generally depends on the weather and time of the day, ARIMAX is the most natural tool for load forecasting among the classical time series models. Fan and McDonald [23] and Cho et al. [24] describe implementations of ARIMAX models for load forecasting. Yang et al. [25] used an evolutionary programming (EP) approach to identify the ARMAX model parameters for one-day to one-week ahead hourly load demand forecasts. Evolutionary programming [26] is a method for simulating evolution and constitutes a stochastic optimization algorithm. Yang and Huang [27] proposed a fuzzy autoregressive moving average with exogenous input variables (FARMAX) for one-day-ahead hourly load forecasts.
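To illustrate why an exogenous input helps, the sketch below fits a deliberately reduced model, load[t] = a·load[t-1] + b·temp[t], by ordinary least squares. It is not ARIMAX and not the method of any cited reference: there is no moving-average or differencing term, and the function name, the synthetic temperature series, and the coefficients 0.8 and 2.0 are assumptions chosen only for the demonstration.

```python
import math


def fit_arx1(load, temp):
    """Least-squares fit of load[t] = a * load[t-1] + b * temp[t].

    Solves the 2x2 normal equations with Cramer's rule.
    """
    s_yy = s_yu = s_uu = r_y = r_u = 0.0
    for t in range(1, len(load)):
        y_prev, u, y = load[t - 1], temp[t], load[t]
        s_yy += y_prev * y_prev
        s_yu += y_prev * u
        s_uu += u * u
        r_y += y * y_prev
        r_u += y * u
    det = s_yy * s_uu - s_yu * s_yu
    if det == 0.0:
        raise ValueError("regressors are collinear; the fit is not unique")
    a = (r_y * s_uu - s_yu * r_u) / det
    b = (s_yy * r_u - s_yu * r_y) / det
    return a, b


# Synthetic, noise-free demonstration data (purely illustrative).
demo_temp = [20.0 + 5.0 * math.sin(0.3 * t) for t in range(50)]
demo_load = [100.0]
for t in range(1, 50):
    demo_load.append(0.8 * demo_load[-1] + 2.0 * demo_temp[t])

a_hat, b_hat = fit_arx1(demo_load, demo_temp)
```

On noise-free data the fit should recover the generating coefficients; with real load data the residual would carry the moving-average structure that ARMAX and ARIMAX model explicitly.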
Neural networks. The use of artificial neural networks (ANN or simply NN) has been a widely studied electric load forecasting technique since 1990 [28]. Neural networks are essentially non-linear circuits that have the demonstrated capability to do non-linear curve fitting. The outputs of an artificial neural network are some linear or nonlinear mathematical function of its inputs. The inputs may be the outputs of other network elements as well as actual network inputs. In practice, network elements are arranged in a relatively small number of connected layers of elements between network inputs and outputs. Feedback paths are sometimes used. In applying a neural network to electric load forecasting, one must select one of a number of architectures (e.g., Hopfield, back propagation, Boltzmann machine), the number and connectivity of layers and elements, the use of bi-directional or unidirectional links, and the number format (e.g., binary or continuous) to be used by inputs and outputs, and internally.
The most popular artificial neural network architecture for electric load forecasting is back propagation. Back propagation neural networks use continuously valued functions and supervised learning. That is, under supervised learning, the actual numerical weights assigned to element inputs are determined by matching historical data (such as time and weather) to desired outputs (such as historical electric loads) in a pre-operational "training session". Artificial neural networks with unsupervised learning do not require pre-operational training. Bakirtzis et al. [29] developed an ANN-based short-term load forecasting model for the Energy Control Center of the Greek Public Power Corporation. In the development they used a fully connected three-layer feed forward ANN, and the back propagation algorithm was used for training.
Input variables include historical hourly load data, temperature, and the day of the week. The model can forecast load profiles from one to seven days ahead. Papalexopoulos et al. [30] also developed and implemented a multi-layered feed forward ANN for short-term system load forecasting. In the model, three types of variables are used as inputs to the neural network: season-related inputs, weather-related inputs, and historical loads. Khotanzad et al. [31] described a load forecasting system known as ANNSTLF. ANNSTLF is based on multiple ANN strategies that capture various trends in the data. In the development they used a multilayer perceptron trained with the error back propagation algorithm. ANNSTLF can consider the effect of temperature and relative humidity on the load. It also contains forecasters that can generate the hourly temperature and relative humidity forecasts needed by the system. An improvement of the above system was described in [32]. In the new generation, ANNSTLF includes two ANN forecasters: one predicts the base load and the other forecasts the change in load. The final forecast is computed by an adaptive combination of these forecasts. The effects of humidity and wind speed are considered through a linear transformation of temperature. As reported in [32], ANNSTLF was being used by 35 utilities across the USA and Canada. Chen et al. [4] developed a three-layer fully connected feed forward neural network, and the back propagation algorithm was used as the training method. Their ANN, however, considers the electricity price as one of the main characteristics of the system load. Many published studies use artificial neural networks in conjunction with other forecasting techniques (such as regression trees [26], time series [33], or fuzzy logic [34]).
Expert systems. Rule-based forecasting makes use of rules, which are often heuristic in nature, to do accurate forecasting. Expert systems incorporate rules and procedures used by human experts in the field of interest into software that is then able to make forecasts automatically without human assistance.
Expert system use began in the 1960s for such applications as geological prospecting and computer design. Expert systems work best when a human expert is available to work with software developers for a considerable amount of time in imparting the expert's knowledge to the expert system software. Also, an expert's knowledge must be appropriate for codification into software rules (i.e., the expert must be able to explain his/her decision process to programmers). An expert system may codify up to hundreds or thousands of production rules. Ho et al. [35] proposed a knowledge-based expert system for the short-term load forecasting of the Taiwan power system. The operators' knowledge and the hourly observations of system load over the past five years were employed to establish eleven day types. Weather parameters were also considered. The developed algorithm performed better than the conventional Box-Jenkins method. Rahman and Hazim [36] developed a site-independent technique for short-term load forecasting. Knowledge about the load and the factors affecting it is extracted and represented in a parameterized rule base. This rule base is complemented by a parameter database that varies from site to site. The technique was tested at several sites in the United States with low forecasting errors.
The load model, the rules, and the parameters presented in the paper have been designed using no specific knowledge about any particular site. The results can be improved if operators at a particular site are consulted.
Fuzzy logic. Fuzzy logic is a generalization of the usual Boolean logic used for digital circuit design. An input under Boolean logic takes on a truth value of "0" or "1". Under fuzzy logic, an input has certain qualitative ranges associated with it. For instance, a transformer load may be "low", "medium", or "high". Fuzzy logic allows one to (logically) deduce outputs from fuzzy inputs. In this sense, fuzzy logic is one of a number of techniques for mapping inputs to outputs (i.e., curve fitting).
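The transformer-load example can be made concrete with a few membership functions. This is a minimal sketch: the piecewise-linear shapes and the breakpoints at 20, 50, and 80 percent of rated load are hypothetical choices for illustration, not values from the cited literature.

```python
def falling(x, a, b):
    """Shoulder that equals 1 at or below a and drops linearly to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def rising(x, a, b):
    """Mirror image of falling(): 0 at or below a, 1 at or above b."""
    return 1.0 - falling(x, a, b)


def triangle(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    return min(rising(x, a, b), falling(x, b, c))


def fuzzify_load(percent_of_rating):
    """Membership grades of a transformer load in 'low', 'medium', 'high'."""
    return {
        "low": falling(percent_of_rating, 20.0, 50.0),
        "medium": triangle(percent_of_rating, 20.0, 50.0, 80.0),
        "high": rising(percent_of_rating, 50.0, 80.0),
    }
```

A load of 35 percent, for example, belongs partly to "low" and partly to "medium" rather than to exactly one class, which is the difference from Boolean logic that the paragraph describes.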
Among the advantages of fuzzy logic are the absence of a need for a mathematical model mapping inputs to outputs and the absence of a need for precise (or even noise-free) inputs. With such generic conditioning rules, properly designed fuzzy logic systems can be very robust when used for forecasting. Of course, in many situations an exact output (e.g., the precise 12 PM load) is needed. After the logical processing of fuzzy inputs, a "defuzzification" process can be used to produce such precise outputs. References [37], [38], [39] describe applications of fuzzy logic to electric load forecasting.
Support vector machines. Support vector machines (SVMs) are a more recent, powerful technique for solving classification and regression problems. This approach originated from Vapnik's [40] statistical learning theory. Unlike neural networks, which try to define complex functions of the input feature space, support vector machines perform a nonlinear mapping (by using so-called kernel functions) of the data into a high-dimensional (feature) space. Support vector machines then use simple linear functions to create linear decision boundaries in the new space. The problem of choosing an architecture for a neural network is replaced here by the problem of choosing a suitable kernel for the support vector machine [41]. Mohandes [42] applied the method of support vector machines to short-term electrical load forecasting. The author compares its performance with the autoregressive method. The results indicate that SVMs compare favorably against the autoregressive method. Chen et al. [43] proposed an SVM model to predict the daily load demand of a month. Their program was the winning entry of the competition organized by the EUNITE network. Li and Fang [44] also used an SVM model for short-term load forecasting.
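The role of the kernel function can be checked on a small case. The sketch below uses a homogeneous degree-2 polynomial kernel on two-dimensional inputs, chosen only because its feature map can be written out by hand; the cited SVM studies may use different kernels, and the function names are illustrative.

```python
import math


def poly2_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2 for 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """Explicit nonlinear mapping into the 3-D feature space of poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    """Plain inner product of two equal-length tuples."""
    return sum(p * q for p, q in zip(a, b))


x, y = (1.0, 2.0), (3.0, -1.0)
kernel_value = poly2_kernel(x, y)
feature_space_value = dot(feature_map(x), feature_map(y))
```

The two values agree, so a linear function in the feature space can be evaluated through the kernel alone without ever constructing the mapped vectors.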

Consumed load model
The art of load forecasting lies in selecting, among the available methods and models, the one that is most appropriate and closest to the actual behavior of the network. This is done by studying and analyzing the past load trend, identifying the influential factors and making full use of each of them, and then forecasting the time periods required for the network with acceptable accuracy. Some error is always present in load forecasting because of the random behavior of the load, but this error should never exceed the acceptable and tolerable limit. Relative accuracy is particularly important in the power industry, especially when load forecasting is the basis of network development planning and power plant capacity planning. An overly generous forecast causes excess investment and leaves installed capacity unused, whereas a forecast below real needs leaves the network short of generation and damages equipment through overloading.
The consumed load model is influenced by different parameters such as weather, vacations or holidays, and the working days of the week. To build a short-term load forecasting system, we should account for the influence of these parameters, which can be achieved by a correct selection of the system inputs. The selection of these parameters depends on experimental observations, is influenced by environmental conditions, and is determined by trial and error.

Reviewing the predictability of time series with the help of the Lyapunov exponent
Chaos is a phenomenon that occurs in many nonlinear deterministic systems, which show high sensitivity to initial conditions and quasi-random behavior. These systems remain in the chaotic regime as long as they satisfy the Lyapunov exponent conditions.

Background
Detecting the presence of chaos in a dynamical system is an important problem that is solved by measuring the largest Lyapunov exponent. Lyapunov exponents quantify the exponential divergence of initially close state-space trajectories and estimate the amount of chaos in a system [50]. Over the past decade, distinguishing deterministic chaos from noise has become an important problem in many diverse fields, e.g., physiology [51] and economics [52]. This is due, in part, to the availability of numerical algorithms for quantifying chaos using experimental time series. In particular, methods exist for calculating the correlation dimension ($D_2$) [53], the Kolmogorov entropy [54], and the Lyapunov characteristic exponents. Dimension gives an estimate of the system complexity; entropy and characteristic exponents give an estimate of the level of chaos in the dynamical system. The Grassberger-Procaccia algorithm (GPA) [53] appears to be the most popular method used to quantify chaos. This is probably due to the simplicity of the algorithm [55] and the fact that the same intermediate calculations are used to estimate both dimension and entropy.
However, the GPA is sensitive to variations in its parameters, e.g., number of data points [56], embedding dimension [56], and reconstruction delay [57], and it is usually unreliable except for long, noise-free time series. Hence, the practical significance of the GPA is questionable, and the Lyapunov exponents may provide a more useful characterization of chaotic systems.
For time series produced by dynamical systems, the presence of a positive characteristic exponent indicates chaos. Furthermore, in many applications it is sufficient to calculate only the largest Lyapunov exponent ($\lambda_1$). However, the existing methods for estimating $\lambda_1$ suffer from at least one of the following drawbacks: (1) unreliable for small data sets, (2) computationally intensive, (3) relatively difficult to implement. For this reason, we have developed a new method for calculating the largest Lyapunov exponent. The method is reliable for small data sets, fast, and easy to implement. "Easy to implement" is largely a subjective quality, although we believe it has had a notable positive effect on the popularity of dimension estimates. For a dynamical system, sensitivity to initial conditions is quantified by the Lyapunov exponents. For example, consider two trajectories with nearby initial conditions on an attracting manifold. When the attractor is chaotic, the trajectories diverge, on average, at an exponential rate characterized by the largest Lyapunov exponent [58]. This concept is also generalized for the spectrum of Lyapunov exponents, $\lambda_i$ ($i = 1, 2, \dots, n$), by considering a small $n$-dimensional sphere of initial conditions, where $n$ is the number of equations (or, equivalently, the number of state variables) used to describe the system. As time ($t$) progresses, the sphere evolves into an ellipsoid whose principal axes expand (or contract) at rates given by the Lyapunov exponents.
The presence of a positive exponent is sufficient for diagnosing chaos and represents local instability in a particular direction. Note that for the existence of an attractor, the overall dynamics must be dissipative, i.e., globally stable, and the total rate of contraction must outweigh the total rate of expansion. Thus, even when there are several positive Lyapunov exponents, the sum across the entire spectrum is negative.
Wolf et al. [59] explain the Lyapunov spectrum by providing the following geometrical interpretation. First, arrange the $n$ principal axes of the ellipsoid in the order of most rapidly expanding to most rapidly contracting. It follows that the associated Lyapunov exponents will be arranged such that $\lambda_1 > \lambda_2 > \dots > \lambda_n$, where $\lambda_1$ and $\lambda_n$ correspond to the most rapidly expanding and contracting principal axes, respectively. Next, recognize that the length of the first principal axis is proportional to $e^{\lambda_1 t}$; the area determined by the first two principal axes is proportional to $e^{(\lambda_1 + \lambda_2)t}$; and the volume determined by the first $k$ principal axes is proportional to $e^{(\lambda_1 + \lambda_2 + \cdots + \lambda_k)t}$. Thus, the Lyapunov spectrum can be defined such that the exponential growth of a $k$-volume element is given by the sum of the $k$ largest Lyapunov exponents. Note that information created by the system is represented as a change in the volume defined by the expanding principal axes. The sum of the corresponding exponents, i.e., the positive exponents, equals the Kolmogorov entropy ($K$) or mean rate of information gain [58]:
$$K = \sum_{\lambda_i > 0} \lambda_i \qquad (1)$$
When the equations describing the dynamical system are available, one can calculate the entire Lyapunov spectrum. The approach involves numerically solving the system's $n$ equations for $n+1$ nearby initial conditions. The growth of a corresponding set of vectors is measured, and as the system evolves, the vectors are repeatedly reorthonormalized using the Gram-Schmidt procedure. This guarantees that only one vector has a component in the direction of most rapid expansion, i.e., the vectors maintain a proper phase space orientation. In experimental settings, however, the equations of motion are usually unknown and this approach is not applicable. Furthermore, experimental data often consist of time series from a single observable, and one must employ a technique for attractor reconstruction, e.g., the method of delays [60] or singular value decomposition.
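The two statements about the spectrum, that the Kolmogorov entropy is the sum of the positive exponents and that a dissipative system has a negative total sum, can be checked on a given spectrum with a short sketch. The example spectrum is an assumption for illustration only: its values are roughly those commonly quoted for the Lorenz system with standard parameters and are not taken from this paper.

```python
def kolmogorov_entropy(spectrum):
    """Sum of the positive Lyapunov exponents (mean rate of information gain)."""
    return sum(lam for lam in spectrum if lam > 0.0)


def is_dissipative(spectrum):
    """True when total contraction outweighs total expansion."""
    return sum(spectrum) < 0.0


def is_chaotic(spectrum):
    """A single positive exponent is sufficient to diagnose chaos."""
    return max(spectrum) > 0.0


# Illustrative spectrum, ordered from most expanding to most contracting.
example_spectrum = [0.906, 0.0, -14.572]
entropy = kolmogorov_entropy(example_spectrum)
```

For this spectrum only the first exponent contributes to the entropy, while the large negative exponent makes the overall sum negative, as required for an attractor to exist.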
As suggested above, one cannot calculate the entire Lyapunov spectrum by choosing arbitrary directions for measuring the separation of nearby initial conditions. One must measure the separation along the Lyapunov directions, which correspond to the principal axes of the ellipsoid previously considered. These Lyapunov directions are dependent upon the system flow and are defined using the Jacobian matrix, i.e., the tangent map, at each point of interest along the flow [58]. Hence, one must preserve the proper phase space orientation by using a suitable approximation of the tangent map. This requirement, however, becomes unnecessary when calculating only the largest Lyapunov exponent.
If we assume that there exists an ergodic measure of the system, then the multiplicative ergodic theorem of Oseledec [61] justifies the use of arbitrary phase space directions when calculating the largest Lyapunov exponent with smooth dynamical systems. We can expect that two randomly chosen initial conditions will diverge exponentially at a rate given by the largest Lyapunov exponent [62]. In other words, we can expect that a random vector of initial conditions will converge to the most unstable manifold, since exponential growth in this direction quickly dominates growth (or contraction) along the other Lyapunov directions. Thus, the largest Lyapunov exponent can be defined using the following equation, where $d(t)$ is the average divergence at time $t$ and $C$ is a constant that normalizes the initial separation:
$$d(t) = C e^{\lambda_1 t} \qquad (2)$$
For experimental applications, a number of researchers have proposed algorithms that estimate the largest Lyapunov exponent [55, 59], the positive Lyapunov spectrum, i.e., only positive exponents [59], or the complete Lyapunov spectrum [58]. Each method can be considered as a variation of one of several earlier approaches [59] and as suffering from at least one of the following drawbacks: (1) unreliable for small data sets, (2) computationally intensive, (3) relatively difficult to implement. These drawbacks motivated our search for an improved method of estimating the largest Lyapunov exponent.

Calculation of the Lyapunov exponent for time series
To calculate the Lyapunov exponent for systems whose governing equations are unknown and for which only a time series is available, several algorithms have been suggested [45][46][47][48][49].
The algorithm proposed by Wolf [48] searches the time series for points that are close to each other in phase space. Such points either travel around the phase space together or diverge rapidly. Close points lying in the same direction are selected.
The local rate of divergence is measured in the direction of maximum expansion, and the average of its logarithm along the phase-space trajectory yields the largest Lyapunov exponent. Suppose that the series $x_0, x_1, x_2, \dots, x_N$ is available, sampled at times $t_n = n\tau$, where $\tau$ is the interval between two successive measurements. If the system behaves chaotically, the divergence of adjacent trajectories can be described by the distance between them, as follows:

$$d_n = |x_{i+n} - x_{j+n}| \qquad (3)$$

where $x_i$ and $x_j$ are two initially close points, so that $d_0 = |x_i - x_j|$. It is assumed that $d_n$ increases exponentially with $n$:

$$d_n = d_0 e^{\lambda n} \qquad (4)$$

Taking the logarithm gives:

$$\lambda = \frac{1}{n} \ln \frac{d_n}{d_0} \qquad (5)$$

At least one Lyapunov exponent must be greater than zero for chaos to exist; a positive value of $\lambda$ indicates chaotic behavior of the system. Therefore, according to Table 1, we can expect the load series to be forecastable.
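The idea of following initially close points and averaging the logarithm of their separation growth can be sketched on a scalar series. This is a simplified nearest-neighbour estimate, not Wolf's algorithm and not the procedure used to produce Table 1. It assumes the per-step growth rate is estimated as ln(d_n / d_0) / n, that a one-dimensional embedding suffices, and that the logistic map with parameter 4 is an acceptable test signal; all names are illustrative.

```python
import math


def logistic_series(n_points, x0=0.3):
    """Test signal: iterates of the logistic map x -> 4 x (1 - x)."""
    xs = [x0]
    for _ in range(n_points - 1):
        xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))
    return xs


def largest_lyapunov(series, n_steps=1, min_time_gap=10):
    """Average of ln(d_n / d_0) / n over pairs of points that start close.

    For a scalar series, nearest neighbours in value are adjacent after
    sorting, so consecutive entries of the sorted index list are used as pairs.
    """
    usable = len(series) - n_steps
    order = sorted(range(usable), key=lambda k: series[k])
    rates = []
    for i, j in zip(order, order[1:]):
        if abs(i - j) < min_time_gap:
            continue  # skip pairs that are close only because they are consecutive in time
        d0 = abs(series[i] - series[j])
        dn = abs(series[i + n_steps] - series[j + n_steps])
        if d0 > 0.0 and dn > 0.0:
            rates.append(math.log(dn / d0) / n_steps)
    if not rates:
        raise ValueError("no usable neighbour pairs found")
    return sum(rates) / len(rates)


lam = largest_lyapunov(logistic_series(2000))
```

For this test map the estimate should be positive and in the vicinity of ln 2, the known exponent of the logistic map at this parameter; a measured load series would additionally need a delay embedding and a check of how the estimate changes with `n_steps`.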

Preparing the input data
The first step in electricity load forecasting is to gather the historical load data of the system being studied. After the input data matrix is prepared, the data are classified. The reason for this classification is that different day types follow clearly distinct load patterns, as noted in many references. Among the days of the week, Saturday to Thursday, which are working days in Iran, share the same load pattern. Fridays have their own pattern with a lower load level. Special days have a completely different pattern as well. It therefore seems necessary, at first glance, to analyze each of these classes separately. We consider two groups of features that refer to previous days: 2, 7, and 14 days ago, and 2, 3, and 4 days ago.
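The two feature groups can be built with one helper that pairs each target day with the loads observed a given number of days earlier. This is a minimal sketch: it assumes one load value per day, whereas the study may use hourly profiles or other day-level features, and the function and constant names are hypothetical.

```python
WEEKLY_LAGS = (2, 7, 14)   # 2, 7, and 14 days ago
RECENT_LAGS = (2, 3, 4)    # 2, 3, and 4 days ago


def lagged_features(daily_load, lags):
    """Return (features, target) pairs for every day that has all lags available."""
    first_day = max(lags)
    samples = []
    for day in range(first_day, len(daily_load)):
        features = [daily_load[day - lag] for lag in lags]
        samples.append((features, daily_load[day]))
    return samples


# Illustrative data only: twenty consecutive daily load values.
example_load = list(range(100, 120))
weekly_samples = lagged_features(example_load, WEEKLY_LAGS)
recent_samples = lagged_features(example_load, RECENT_LAGS)
```

The weekly group loses the first fourteen days of the record because its longest lag must be available, which matters when the training set for a single season is short.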

Adaptive neuro-fuzzy inference system
ANFIS, proposed by Jang [14, 15], is an architecture that functionally integrates the interpretability of a fuzzy inference system with the adaptability of a neural network. Loosely speaking, ANFIS is a method for tuning an existing rule base of a fuzzy system with a learning algorithm, of the kind found in artificial neural networks, based on a collection of training data. Because the fuzzy system has fewer tunable parameters than a conventional artificial neural network, ANFIS is trained faster and more accurately than a conventional artificial neural network. An ANFIS that corresponds to a Sugeno-type fuzzy model with two inputs and a single output is shown in Fig. 1. A rule set of a first-order Sugeno fuzzy system has the following form: Rule $i$: If $x$ is $A_i$ and $y$ is $B_i$ then $f_i = p_i x + q_i y + r_i$.
As shown in Figure 1, the ANFIS structure is a weightless multi-layer array of five different elements [15].

The proposed method for power consumed load forecasting
Since fuzzy methods and systems were first applied in different domains, researchers have noticed that building a powerful fuzzy system is not simple. The reason is that finding suitable fuzzy rules and membership functions is not a systematic task and mainly requires much trial and error to reach the best possible performance. Therefore, the idea of applying learning algorithms to fuzzy systems was proposed. The learning capability of neural networks made them a prime candidate for combination with fuzzy methods, in order to automate the development and use of fuzzy systems for different applications. Function approximation by means of learning methods has been proposed for neural networks and neuro-fuzzy networks.
In the proposed method, we forecast load consumption and improve the forecast as follows. One of the best-known neuro-fuzzy systems for function approximation is the ANFIS model. We also use this system for load forecasting in this paper, with the difference that we use a separate adaptive neuro-fuzzy system for each season of the year. When training these systems, overlapping data are used, because the data of each season are not completely independent: the load consumption in the first days of a season is similar to that of the previous season. Figure 2 shows the diagram of the multi adaptive neuro-fuzzy system (multi ANFIS).
As also shown in Figure 2, we use a switch so that each seasonal subsystem is trained only for its own season. Therefore, the training and testing time of the system decreases, and the entry of unnecessary data is prevented.
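The switching structure can be sketched as a dispatcher that holds one model per season and routes both training samples and queries by date. Several points are assumptions made for illustration: seasons are taken as three-month Gregorian blocks starting in March, the overlap is implemented as the last `overlap_days` of the preceding season, and `MeanModel` is only a hypothetical stand-in for a trained ANFIS subsystem.

```python
from datetime import date, timedelta

SEASONS = ("spring", "summer", "fall", "winter")


def season_of(d):
    """Assumed season boundaries: Mar-May, Jun-Aug, Sep-Nov, Dec-Feb."""
    return SEASONS[((d.month - 3) % 12) // 3]


def in_training_window(d, season, overlap_days=15):
    """True for days in the season or in the last overlap_days before it starts."""
    return season_of(d) == season or season_of(d + timedelta(days=overlap_days)) == season


class MeanModel:
    """Placeholder subsystem that predicts the mean of its training targets."""

    def fit(self, samples):
        targets = [target for _, target in samples]
        self.mean = sum(targets) / len(targets)

    def predict(self, features):
        return self.mean


class MultiSeasonForecaster:
    """One subsystem per season, selected by a date-driven switch."""

    def __init__(self, make_model=MeanModel, overlap_days=15):
        self.models = {season: make_model() for season in SEASONS}
        self.overlap_days = overlap_days

    def fit(self, dated_samples):
        # dated_samples: iterable of (date, features, target) triples.
        for season, model in self.models.items():
            subset = [(f, t) for d, f, t in dated_samples
                      if in_training_window(d, season, self.overlap_days)]
            if subset:
                model.fit(subset)

    def predict(self, d, features):
        return self.models[season_of(d)].predict(features)
```

Because each subsystem sees only its own season plus a short overlap, the training set per model is smaller, which is consistent with the reduced training and testing time noted above.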

Result
In the proposed method, we classified days into two categories: the days of each season are divided into working days (Saturday to Thursday) and holidays, whose load consumption differs from that of other days.
We also calculated the output of the multi ANFIS based on the features of previous days, once with 2, 7, and 14 days ago and once with 2, 3, and 4 days ago. The results are shown in Tables 2 and 3.
The accuracy of any load forecasting method is determined by comparing the values produced by the model with the real data. As the tables above show, separating working days from holidays and using the features of previous days (2, 7, and 14 days ago) yields a better result in load consumption forecasting.
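The comparison between forecast and real data is reported in this chapter as the Mean Absolute Percentage Error (MAPE). A minimal sketch of that measure follows, assuming the usual definition (mean of the absolute errors relative to the actual values, in percent); the function name and the sample numbers are illustrative and are not results from the study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is non-zero, which holds for system load.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Illustrative values only.
example_error = mape([100.0, 200.0], [110.0, 190.0])
```

Because each error is divided by the actual load, the same absolute error weighs more heavily on low-load days such as Fridays and holidays.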

Conclusion and suggestion
Comparing the methods above shows that separating working days from holidays gives a better result in load consumption forecasting. As shown in Figure 5, using the features of 2, 7, and 14 days ago is better than using those of 2, 3, and 4 days ago. The cyan and yellow lines refer to 3 and 4 days ago. We can see that these features do not have a good effect on load forecasting.
Since most proposed methods use only load consumption time series data, it seems that better results can be obtained by using, alongside the load time series, time series data of one or more parameters that affect load consumption [16]. Accurate load forecasting is very important for electric utilities in a competitive environment created by electric industry deregulation.
$\mu_A(x) = \dfrac{1}{1 + \left| \dfrac{x - c_i}{a_i} \right|^{2 b_i}}$, where $\{a_i, b_i, c_i\}$ is the parameter set. These parameters are referred to as premise parameters.

Fig. 1. ANFIS architecture

Layer 2: The activation of fuzzy rules is calculated via differentiable T-norms (usually the soft-min or product).
• Every node in this layer is a fixed node labeled Prod. The output is the product of all the incoming signals: $O_{2,i} = w_i = \mu_{A_i}(x) \cdot \mu_{B_i}(y)$, $i = 1, 2$.
• Each node represents the firing strength of the rule.
• Any other T-norm operator that performs the AND operation can be used.
Layer 3: A normalization (arithmetic division) operation is carried out over the rule matching values.
• Every node in this layer is a fixed node labeled Norm. The $i$th node calculates the ratio of the $i$th rule's firing strength to the sum of all rules' firing strengths: $O_{3,i} = \bar{w}_i = \dfrac{w_i}{w_1 + w_2}$, $i = 1, 2$.
• Outputs are called normalized firing strengths.
Layer 4: The consequent part is obtained via linear regression or multiplication between the normalized activation level and the output of the respective rule.
• Every node $i$ in this layer is an adaptive node with a node function:

Fig. 3. Power load forecasting for working days (Saturday to Thursday) of fall with features of 2, 3, and 4 days ago

Fig. 4. Power load forecasting for working days (Saturday to Thursday) of fall with features of 2, 7, and 14 days ago

Fig. 5. Comparison of the features of 2, 7, and 14 days ago with those of 2, 3, and 4 days ago

Table 1. Lyapunov exponent for seasons of one year

Layer 1: $x$ (or $y$) is the input to node $i$, and $A_i$ (or $B_{i-2}$) is a linguistic label associated with this node. Therefore $O_{1,i}$ is the membership grade of a fuzzy set ($A_1$, $A_2$, $B_1$, $B_2$).
Layer 4 node function: $O_{4,i} = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i)$, where $\bar{w}_i$ is the normalized firing strength from layer 3 and $\{p_i, q_i, r_i\}$ is the parameter set of this node. The main objective of the ANFIS design is to optimize the ANFIS parameters. There are two steps in the ANFIS design: the first is the design of the premise parameters and the second is the training of the consequent parameters. Several methods have been proposed for designing the premise parameters, such as grid partitioning, fuzzy C-means clustering, and subtractive clustering. Once the premise parameters are fixed, the consequent parameters are obtained from the input-output training data. A hybrid learning algorithm is a popular choice for training the ANFIS for this purpose.
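The layer computations for the two-rule, two-input first-order Sugeno model can be traced with a short forward pass. This is a sketch under assumptions: the membership function is the generalized bell shape with premise parameters (a, b, c), the product is used as the T-norm, and the final summation layer follows the standard ANFIS formulation rather than a description in this text. All parameter values in the usage lines are arbitrary illustrations.

```python
def bell(x, a, b, c):
    """Generalized bell membership function with premise parameters (a, b, c)."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2.0 * b))


def anfis_forward(x, y, premise_a, premise_b, consequents):
    """Forward pass of a first-order Sugeno ANFIS with one rule per (A_i, B_i) pair."""
    # Layer 1: membership grades of the inputs in A_i and B_i.
    mu_a = [bell(x, *params) for params in premise_a]
    mu_b = [bell(y, *params) for params in premise_b]
    # Layer 2: firing strength of each rule (product T-norm).
    w = [ma * mb for ma, mb in zip(mu_a, mu_b)]
    # Layer 3: normalized firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: normalized strength times the linear consequent of each rule.
    o4 = [wb * (p * x + q * y + r) for wb, (p, q, r) in zip(w_bar, consequents)]
    # Layer 5 (standard ANFIS, assumed here): sum of all incoming signals.
    return sum(o4)


# Arbitrary illustrative parameters: two rules, inputs x = y = 2.
premise = [(2.0, 2.0, 0.0), (2.0, 2.0, 4.0)]
rules = [(1.0, 1.0, 0.0), (2.0, 2.0, 1.0)]
output = anfis_forward(2.0, 2.0, premise, premise, rules)
```

With the premise parameters held fixed, the output is linear in the consequent parameters, which is why the hybrid learning algorithm can estimate them by least squares from the training data.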

Mean Absolute Percentage Error (MAPE) is used to assess the performance of each method on the corresponding test data. MAPE is determined by the following relation:
$$\mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{L_i^{\mathrm{actual}} - L_i^{\mathrm{forecast}}}{L_i^{\mathrm{actual}}} \right|$$
where $L_i^{\mathrm{actual}}$ is the actual load, $L_i^{\mathrm{forecast}}$ is the forecast load, and $N$ is the number of test samples.

Table 2. Power load consumption forecasting for the working days (Saturday to Thursday) with features of 2, 3, and 4 days ago

Table 3. Power load consumption forecasting for the working days (Saturday to Thursday) with features of 2, 7, and 14 days ago