Data Mining-Based Identification of Nonlinear Systems

This chapter presents identification methods based on associative search of analogs and wavelet analysis. It investigates the properties of data mining-based identification algorithms that make it possible to predict (i) the approach of process variables to critical values and (ii) the transition of a process to chaotic dynamics. The proposed methods are based on modeling the decision-making of a human operator. Their effectiveness is illustrated with an example of product quality prediction in oil refining. The development of fuzzy analogs of associative identification models is then discussed; the fuzzy approach expands the application area of associative techniques. Finally, state prediction techniques for manufacturing resources are developed on the basis of binary models and a machine learning procedure known as association rules search.


Introduction
Reducing the uncertainty of an object description in terms of an adjustable model has long been a key conceptual direction in identification theory and its applications. In the statistical description of uncertainty, consistent estimates of the plant's characteristics can be obtained by analyzing the convergence of the empirical distribution functional to the corresponding "theoretical" values, but this requires a corresponding increase in the sample size. The difficulties in implementing this approach, especially for nonlinear and nonstationary objects, along with the increased possibilities of plant history analysis, resulted in the advent of identification methods based on data mining [1].
The use of additional a priori information on the system for its training is considered by some authors today to be one of the key trends in the theory and practice of identification [2,3].
One method that implements this approach to identification is the associative search method, which is based on the design of predictive models [2]. These models rely on inductive learning, that is, on associative search of analogs by means of intelligent analysis of the process history and knowledge base development. The development of a predictive model for a dynamic object by the associative search technique (i.e., by building a new model at every time step) is based on the generated and updated knowledge about the system. This approach makes it possible to use any available a priori information about the plant [3].
The stability of a model built using the associative search techniques is investigated in terms of the spectrum analysis of a multi-scale wavelet expansion [4]. Methods based on wavelet analysis open up a unique possibility to select "frequency-domain windows," in contrast to the well-known windowed Fourier transform.
The development of intelligent identification algorithms for nonlinear and nonstationary objects is important for various applications, in particular, in the chemical, oil refining, and power (smart grid) industries; transportation and logistics systems; and trading processes (Bakhtadze et al. [1, 2, 4-7]).

Control system identification
Consider a traditional problem of dynamic object identification. For input vectors meeting Gauss-Markov assumptions, the least squares parameter estimates are consistent, unbiased, and efficient. However, the development of a closed-loop control system (for identification-based control system synthesis) faces considerable challenges. In a closed loop, the system state depends on control values at earlier time instants, which results in a degeneration problem.
To develop an informational model of control system's dynamics in a degenerate case, the Moore-Penrose method [8,9] can be used for getting pseudo-solutions to a linear system by means of least squares techniques.
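The pseudo-solution idea can be illustrated with a minimal sketch. It assumes NumPy and a small made-up regressor matrix whose columns are linearly dependent (a stand-in for the degenerate closed-loop case); the function name `pseudo_solution` is hypothetical and not taken from the chapter.

```python
import numpy as np

def pseudo_solution(A, b):
    """Minimum-norm least squares solution of A @ theta = b (Moore-Penrose)."""
    # rcond is set explicitly so that numerically zero singular values are dropped.
    return np.linalg.pinv(A, rcond=1e-10) @ b

# Illustrative degenerate case: the second column is twice the first,
# so A.T @ A is singular and the normal equations have no unique solution.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
b = np.array([5.0, 10.0, 15.0])

theta = pseudo_solution(A, b)
residual = np.linalg.norm(A @ theta - b)
print(theta, residual)
```

Among all coefficient vectors that fit the data equally well, the pseudoinverse returns the one with the smallest Euclidean norm, which is what makes the estimate well defined in the degenerate case.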
For a wide class of objects and, in particular, processes, control based on a linear model identification is not satisfactory. At the same time, models constructed by the method of associative search frequently are highly accurate even for nonlinear objects. However, some processes can be characterized by certain "irregularities" in certain time intervals, which affect the accuracy and adequacy of associative models.
Examples of such irregularities (which are often oscillatory in engineering systems) can be:
• seasonal and daily load oscillations in power networks that affect directly the optimization of power transmission control modes;
• ups and downs of stock market caused by various economic reasons;
• feed source changes in industrial process, and so on.

Associative search as intelligent modeling method
The difference between the associative search method based on data mining and traditional identification techniques is as follows. The method does not approximate process dynamics in time; it rather builds a new predictive model of the dynamic object (a "virtual model") at each time step using historical data sets ("associations") generated at the training phase.
As a result, at any time step, process control decision-making by a human individual (process operator, supervisor, plant or enterprise manager, trading operator, etc.) is modeled on the basis of his or her knowledge and emerging associations.
Clustering (self-organizing learning) is an effective way to form associations.
Knowledge in intelligent systems is of two types [10]. The first type, declarative knowledge, describes various facts, events, and observations by means of appropriate ontologies. The second type, procedural knowledge, is a formal description of skills. Depending on the level of this knowledge, users can be referred to as beginners or experts [11]. These two groups have different structures and ways of thinking. Beginners use so-called inverse reasoning in decision-making: they make decisions based on the analysis of the information obtained in the previous step. In contrast, experts form so-called direct reasoning at an intuitive, subconscious level. Cognitive psychology defines knowledge as a collection of symbols stored in the memory of a particular person [12]. The symbols, in turn, can be determined by their structure and the nature of neuron links [13].
Knowledge processing in an intelligent system consists in the recovery (associative search) of knowledge by its fragment [14]. Knowledge can be defined as an associative link between images (Figure 1). As images, we will use "feature sets," that is, components of input vectors or input variables. The set of all associations over the set of images forms the memory of the intelligent system's knowledge base.
The associative search process can be either an image reconstruction procedure by a feature set (this set may not be complete; this approach is often used in models of a human associative memory), or the search procedure of other images in the archive, similar to the image under study by a certain criterion.
In Ref. [14], a model of decision-making search by the human operator is proposed that represents the process of associative thinking as a sequence of sets of associations. An association is a pair of images (the image-source and the image-output), where each image is described by a set of features. This approach is intermediate between neural networks and logical models in the classical theory of artificial intelligence.
The criterion for the similarity of two images can in the general case be represented as a logical function, that is, a predicate. In the particular case where the features have a numerical expression, the feature sets that form the image are vectors in n-dimensional space, and a metric in this space can serve as the criterion of image similarity.

Associative search technique
The associative search method consists in constructing virtual predictive models. The term "virtual" should be understood as "ad hoc" [2]. A traditional identification algorithm approximates the real process in time. In contrast, our method builds a new model at each time step t based on the analysis of the historical data set ("associations") formed at the learning stage and further adaptively corrected in accordance with certain criteria.
Within the present context, the linear dynamic model is of the form:
$$y_N = \sum_{i=1}^{m} a_i y_{N-i} + \sum_{s=1}^{S} \sum_{j=1}^{r_s} b_{j,s} x_{N-j,s}, \qquad (1)$$
where $y_N$ is the prediction of the object's output at the time instant $N$, $x_N$ is the input vector, $m$ is the memory depth in the output, $r_s$ is the memory depth in the input, $S$ is the dimension of the input vectors, and $a_i$ and $b_{j,s}$ are tuning coefficients of the model. Model (1) is a regression whose structure is determined by a criterion of similarity of the images forming the association.
In general, a new structure is formed for each time instant; it is in this sense that the associative model is virtual. For each current input vector, similar input vectors and their corresponding outputs are selected from the archive. Then a system of linear equations with respect to the adjustable coefficients is formed. Its least squares solution determines the point linear model of the nonlinear object, as well as the output forecast.
Thus, each point of the global nonlinear regression surface is formed as a result of using linear "local" models at each new time step.
The set of values of inputs at each fixed point and the corresponding output replenish the procedural knowledge base.
Unlike classical regression models, for each fixed time instant the input vectors are selected from the process history by their closeness to the current input vector in the sense of a certain criterion (rather than in chronological order, as in regression models). Thus, in Eq. (1), $r_s$ is the number of vectors from the archive (from time instant 1 to time instant $N$) selected in accordance with the associative search criterion. A certain set of vectors $r_s$, $1 \le r_s \le N$, is selected at each time segment $[N-1, N]$. The criterion for selecting the input vectors from the archive is described below (Figure 2).
As a distance (a norm in $R^S$) between points of the $S$-dimensional space of inputs, we introduce the value:
$$d_{N,N-j} = \sum_{s=1}^{S} \left| x_{N,s} - x_{N-j,s} \right|, \quad j = 1, \ldots, N,$$
where $x_{N,s}$ are the components of the input vector at the current time instant $N$. By virtue of a property of the norm ("the triangle inequality"), we have:
$$d_{N,N-j} \le \sum_{s=1}^{S} \left| x_{N,s} \right| + \sum_{s=1}^{S} \left| x_{N-j,s} \right|.$$
Let for the current input vector $x_N$:
$$\sum_{s=1}^{S} \left| x_{N,s} \right| = d_N.$$
To derive an approximating hypersurface for the vector $x_N$, we select from the archive of the input data such vectors $x_{N-j}$, $j = 1, \ldots, N$, that for a given $D_N$ the condition
$$d_{N,N-j} \le d_N + \sum_{s=1}^{S} \left| x_{N-j,s} \right| \le D_N \qquad (4)$$
holds, where $D_N$ may be selected, for instance, from the condition (Figure 3):
$$D_N \ge 2 d_N^{\max} = 2 \max_j \sum_{s=1}^{S} \left| x_{N-j,s} \right|.$$
Under the assumption that the inputs meet the Gauss-Markov conditions, the estimates obtained via the LS method are unbiased and statistically efficient.
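The overall procedure (select analogs from the archive, fit a local linear model, forecast) can be sketched as follows. This is a simplified illustration rather than the authors' algorithm. It assumes NumPy, uses the sum of absolute coordinate differences as the distance, replaces the selection threshold with a fixed number `k` of nearest archive vectors, and fits a static local model with an intercept instead of the full dynamic regression. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def associative_forecast(X_hist, y_hist, x_now, k=10):
    """Forecast the output for x_now from a local ("virtual") linear model.

    X_hist: (N, S) archive of input vectors; y_hist: (N,) matching outputs.
    k: number of nearest archive vectors to use (assumed stand-in for the
       selection threshold of the associative search criterion).
    """
    X_hist = np.asarray(X_hist, dtype=float)
    y_hist = np.asarray(y_hist, dtype=float)
    x_now = np.asarray(x_now, dtype=float)

    # Sum of absolute differences between x_now and every archived input.
    dist = np.abs(X_hist - x_now).sum(axis=1)
    nearest = np.argsort(dist)[:k]

    # Local regression with an intercept column; lstsq returns the
    # minimum-norm solution if the selected rows are rank deficient.
    A = np.column_stack([np.ones(len(nearest)), X_hist[nearest]])
    coef, *_ = np.linalg.lstsq(A, y_hist[nearest], rcond=None)
    return float(coef[0] + coef[1:] @ x_now)

# Synthetic nonlinear plant, used only for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(500, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2
x_new = np.array([0.5, -1.0])

print("local forecast:", associative_forecast(X, y, x_new, k=15))
print("true value:    ", np.sin(0.5) + 1.0)
```

Because a new model is fitted for every query point, the cost per forecast is dominated by the archive search, which is the motivation for the clustering techniques discussed later in the chapter.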

Fuzzy virtual models
It is advisable to apply fuzzy models under uncertainty in decision-making systems in the following cases [3]:
• the dynamics of the investigated quality index is described by a complex nonlinear dependence; and
• one or more factors of this dynamics are weakly formalized or not formalized at all.
In fuzzy systems, production rules are the most commonly used technique. A production rule consists of an antecedent (one or several premises) and a consequent. In the general case, the premises are connected by the logical operators AND and OR.
Fuzzy systems are based on production-type rules with linguistic variables used as premise and conclusion in the rule.
By renaming the variables, the linear dynamic plant's model can be represented as follows: The fuzzy system based on the production rules has the form: Crisp values of the fuzzy variables $X_i$ and $Y$ are denoted by $x_i$ and $y$, respectively; $l$ is the number of fuzzy values; and $LY_j$ is the name of the output linguistic term. The rule base in the fuzzy Mamdani system is a set of fuzzy rules such as: The j-th fuzzy rule in the singleton-type system looks as follows: where $r_j$ is a real number to estimate the output $y$.
The j-th rule in the Takagi-Sugeno model [15] looks as follows: where the output $y$ is estimated by a linear function. Thus, the fuzzy system performs the mapping $L: R^{n+m} \to R$.
The grade of membership of the crisp variable $x_i$ in the fuzzy notion $LX_{ij}$ is determined by the membership functions $\mu_{LX_{ij}}(x_i)$. The rule base is formed by the criterion of minimum output error, which can be defined by the following expressions: where $K$ is the number of samples.
Depending on the features of the object and the purpose of identification, various fuzzy models can be formed. The Takagi-Sugeno model is most suitable for objects with complex nonlinear dynamics, such as moving objects, for which accuracy requirements prevail in control.
A fuzzy model of the Mamdani type is suitable for problems where it is important to form knowledge based on data analysis.
The singleton-type system may be used in both identification and knowledge-formation tasks.
A singleton-type fuzzy model performs the mapping $L: R^{n+m} \to R$, where the fuzzy conjunction operator is replaced by a product and the operator of fuzzy rule aggregation is replaced by summation. The mapping $L$ is defined by the following expression: where $q$ is the number of rules in the fuzzy model; $n + m$ is the number of input variables in the model; and $\mu_{LX_{ij}}(x_{ij})$ is the membership function.
The expression for the mapping $L$ in the Takagi-Sugeno model looks as follows: In Mamdani fuzzy systems, fuzzy logic techniques are used to describe the mapping of the input vector $x$ into the output value $y$, for example, Mamdani approximation or a method based on a formal logical proof.
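The singleton-type and Takagi-Sugeno mappings can be sketched in a few lines. The code below uses the textbook weighted-average form, in which the product of membership grades gives the firing strength of each rule and the rule outputs are aggregated by a normalized sum. The Gaussian membership functions, the parameter layout, and all function names are assumptions made for illustration only.

```python
import numpy as np

def gaussian_mf(x, center, width):
    """Assumed Gaussian membership function (the chapter does not fix its shape)."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def rule_weights(x, centers, widths):
    """Firing strength of each rule: product of membership grades over inputs."""
    # centers, widths: (q, n_inputs); the product plays the role of fuzzy AND.
    return gaussian_mf(x, centers, widths).prod(axis=1)

def singleton_output(x, centers, widths, r):
    """Singleton model: each rule j contributes a constant r[j]."""
    w = rule_weights(x, centers, widths)
    return float(w @ r / w.sum())

def takagi_sugeno_output(x, centers, widths, R):
    """Takagi-Sugeno model: rule j contributes R[j, 0] + R[j, 1:] @ x."""
    w = rule_weights(x, centers, widths)
    local = R[:, 0] + R[:, 1:] @ x
    return float(w @ local / w.sum())

# Two rules, two inputs; x lies midway between the rule centers,
# so both rules fire with the same strength.
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
widths = np.ones((2, 2))
x = np.array([0.5, 0.5])

print(singleton_output(x, centers, widths, r=np.array([2.0, 4.0])))  # 3.0
print(takagi_sugeno_output(x, centers, widths,
                           R=np.array([[0.0, 1.0, 1.0],
                                       [2.0, 0.0, 0.0]])))           # 1.5
```

The only difference between the two models in this sketch is the rule consequent: a constant for the singleton type and a linear function of the inputs for Takagi-Sugeno.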
Let the variables in (1) be fuzzy. In this case, (1) can be represented as a fuzzy model of Takagi-Sugeno (TS) [15].
To form the model, production rules with linear finite-difference equations on the right-hand side are defined (for simplicity, we consider the one-input case, i.e., $P = 1$): where: , and membership functions: where $m = r = s + 1$, one obtains the analytic form of the fuzzy model, intended for calculating the output $\hat{y}(t)$: where $c = (c_0^1, \ldots, c_0^n, \ldots, c_m^1, \ldots, c_m^n)^T$ is the vector of the adjustable parameters; is a fuzzy function where $\otimes$ denotes the minimization operation of fuzzy product.
If, for $t = 0$, the vector $c(0) = 0$, the correcting $mn \times nm$ matrix $Q(0)$ ($m$ is the number of input vectors, $n$ is the number of production rules), and the values of $u(t)$, $t = 1, \ldots, N$, are specified, the parameter vector $c(t)$ is calculated using the known multi-step LSM: $Q(0) = \gamma I$, $\gamma \gg 1$, where $I$ is the unit matrix. The above equations show that even in the case of a one-dimensional input and few production rules, many observations are needed to apply the LSM, which makes the fuzzy model too unwieldy. Therefore, only a part of the whole set of rules ($r < n$) should be chosen according to a certain criterion.
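A multi-step (recursive) least squares update with the initialization $c(0) = 0$, $Q(0) = \gamma I$ can be sketched as below. This is the standard recursive least squares recursion, shown as an assumption about what the "known multi-step LSM" refers to. The regressor matrix `Phi` stands in for the fuzzy regressors, and all names and data are illustrative.

```python
import numpy as np

def rls_fit(Phi, y, gamma=1e6):
    """Recursive least squares with c(0) = 0 and Q(0) = gamma * I."""
    n_params = Phi.shape[1]
    c = np.zeros(n_params)
    Q = gamma * np.eye(n_params)
    for phi, target in zip(Phi, y):
        # Gain vector for the current regressor.
        gain = Q @ phi / (1.0 + phi @ Q @ phi)
        # Correct the estimate by the one-step prediction error.
        c = c + gain * (target - phi @ c)
        # Update the correcting matrix: Q <- Q - gain * (phi^T Q).
        Q = Q - np.outer(gain, phi @ Q)
    return c

# Synthetic, noise-free data used only to exercise the recursion.
rng = np.random.default_rng(1)
Phi = rng.normal(size=(200, 3))
c_true = np.array([1.0, -2.0, 0.5])
c_hat = rls_fit(Phi, Phi @ c_true)
print(c_hat)
```

The size of `Q` grows with the square of the number of adjustable parameters, which illustrates why reducing the number of production rules matters for speed.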
The application of the associative search techniques where one or more model parameters are fuzzy reduces to determining the predicate $\Xi$ so that the number of production rules in the TS model is significantly reduced according to some criterion.
For example, the following matrix: can be defined for $P$-dimensional input vectors at time steps $t-j$, $j = 1, \ldots, s$. Let us rank the rows of this matrix, for example, in descending order of the values $\sum_{p=1}^{P} \beta_p^{\Theta_i}$, and select a certain number of rows. Such a selection, combined with condition (4), defines the predicate $\Xi = \{\Xi_i(R_0^a, R^a; T^a)\}$ and, respectively, the criterion for selecting images (sets of input vectors) from the archive.

Fuzzy associative search
Notwithstanding all the benefits delivered by fuzzy techniques, their application significantly reduces the calculation speed, which is critical for predicting the dynamics of some plants. This consideration, coupled with the fundamental impossibility of formalizing some factors, necessitated the development of algorithms that combine the advantages of the fuzzy approach and associative search algorithms.
Assume the associative search procedure is determined by the predicate $\Xi(P^a, R^a)$, which interprets the limits of the input variables (specified, say, by process specifications) as a fuzzy conjunction of input variables: Then the production rules in which the fuzzy variables take values such that $\Xi(P^a, R^a)$ is FALSE will be discarded automatically. This drastically reduces the number of production rules employed in the fuzzy model and thus significantly increases the speed of the algorithms.

Solving the associative search problem by means of clusterization techniques
The associative search problem is solved by clustering techniques (both crisp and fuzzy) in the following way.
The current vector under investigation is attributed to a certain cluster per the criterion of minimum distance to the center: where $x_N \in X$ is the current input vector of the control plant under investigation. Within this cluster, the vectors that satisfy the assigned associative criterion are sought. It may turn out that the cluster does not contain the number of vectors necessary to solve the forecasting problem by the method of least squares. In this case, one of the known methods of combining the two clusters with the minimum distance between any two of their members can be applied. This approach provides significant savings in computing resources compared to an exhaustive search. However, such a combination of clusters does not yet guarantee the solution of the problem. The approach described below looks the most reasonable.

Virtual clustering ("impostor" method)
The current input vector at any particular time can be assigned to a specific cluster. This can, for example, be done by the criterion of the minimum distance to the center. Let this criterion be satisfied for $k = r$. Let $x_N$ denote the center of the cluster $A_r$. If additional selection of input vectors from the archive is required (to form a system with a sufficient number of equations to identify the system using the associative search method), clusters with the minimum distance between their centers and $x_N$ are selected for the join. This approach makes it possible not only to discard a significant number of vectors remote from $x_N$ but also to select from the archive the maximum possible number of vectors satisfying the criterion of associative search.
After the completion of this procedure, the assignment of $x_N$ as the center of the cluster $A_r$ is canceled, and the formation of virtual models (relevant to the given time instant) continues using conventional clustering algorithms.
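The candidate selection step of this procedure might look as follows in code. The sketch assumes that a conventional clustering algorithm has already produced cluster labels for the archive and a list of cluster centers, that cluster ids are indices into `centers`, and that Euclidean distance is used. The function name and the toy data are hypothetical.

```python
import numpy as np

def select_candidates(labels, centers, x_now, n_required):
    """Collect archive indices from the clusters nearest to x_now.

    x_now temporarily acts as a cluster center: clusters are joined in
    order of the distance between their centers and x_now until at least
    n_required archive vectors have been gathered.
    """
    labels = np.asarray(labels)
    centers = np.asarray(centers, dtype=float)
    x_now = np.asarray(x_now, dtype=float)

    order = np.argsort(np.linalg.norm(centers - x_now, axis=1))
    selected = []
    for cluster_id in order:
        selected.extend(np.flatnonzero(labels == cluster_id).tolist())
        if len(selected) >= n_required:
            break
    return selected

# Toy archive of six vectors already split into three clusters.
labels = [0, 0, 1, 1, 2, 2]
centers = [[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]]
candidates = select_candidates(labels, centers, x_now=[4.0, 4.0], n_required=3)
print(candidates)  # [2, 3, 0, 1]
```

The returned indices are only candidates; the associative search criterion would then be applied to them before the least squares step.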

Case study: oil refining product quality modeling
Key process equipment of an atmospheric distillation unit comprises cold and hot crude oil preheat trains, a desalter, a flash drum or, instead, a pre-flash column with an overhead reflux drum, atmospheric heaters, and an atmospheric distillation column with a reflux drum and three side stripping columns for middle distillates (typically, kerosene, light diesel, and heavy diesel, also known as atmospheric gas oil). The naphtha streams from both reflux drums are recombined and sent to downstream stabilization and rerun facilities. The atmospheric residuum from the bottom of the main atmospheric column is typically streamed to a vacuum distillation section.
To obtain a soft sensor model for the 10% distillation point of a kerosene stream, the lab data for this quality were collected along with process data from the atmospheric column. The predictive model is formed by means of the associative search method. The process data were analyzed, and process variables measured by plant instruments were selected for modeling along with the distillation point sampled at the plant and measured in the refinery's laboratory. Based on the preliminary data analysis, the following linear predictive model was developed: where $T(t)$ is the desired estimate; $F_i(t-j)$ are various process parameters, such as flows, temperatures, and pressures, measured directly at the plant; and $b_1, \ldots, b_{12}$ are the model's coefficients.
The forecast was calculated using the linear and associative models for 10,525 time steps (1 step = 10 min). Figure 4 shows simulation results for the steps $t = 102, \ldots, 301$.

Application of wavelet approach to the analysis of nonstationary processes
Over the last two decades, the wavelet transform (WT) has been widely applied to the analysis of nonstationary processes. The wavelet transform of signals is a generalization of spectral analysis, for instance, with regard to the Fourier transform.
The first papers on the wavelet analysis of time (spatial) series with pronounced heterogeneity appeared at the end of the 1980s [16,17]. The method was positioned as an alternative to the Fourier transform, which localizes frequencies but does not provide the time localization of the process under study. Subsequently, the theory of wavelets emerged and has been developed further, along with its numerous applications.
The scope of wavelet analysis today is very wide: it includes the synthesis and processing of nonstationary signals, compression and coding of information, image recognition and image analysis, and the study of functions, time-dependent signals, and spatial inhomogeneities. The approach is effective for tasks where the results of the analysis should contain not only the frequency characteristics of the signal (signal power distribution by frequency components) but also information about the local coordinates at which certain groups of frequency components manifest themselves or at which rapid changes in the frequency components of the signal occur. A significant number of practical applications have been created, including in health care, the study of geophysical fields, meteorological time series, and the prediction of earthquakes [18].
The wavelet analysis method consists in applying a special linear transformation to signals. In particular, it makes it possible to study in depth the physical properties or dynamics of real objects and processes, for example, manufacturing processes. The wavelet transform (WT) of a one-dimensional signal is its representation in the form of a generalized Fourier series (or Fourier integral) over a system of basis functions called wavelets. A wavelet is characterized by the fact that the function that forms it (a wavelet-forming function, or mother wavelet) is distinguished by a certain scale (frequency) and localization in time, obtained through a time shift and a change in the time scale.
The time scale is analogous to the oscillation period, that is, it is inversely related to the frequency, and the shift represents the displacement of the signal along the time axis.
The wavelet transform performs the projection of a one-dimensional process into a two-dimensional surface in three-dimensional space. The frequency and time are treated as independent variables.
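For reference, the continuous wavelet transform is commonly written in the following standard form. The symbols $a$ (time scale), $b$ (time shift), $\psi$ (wavelet-forming function), and $W_x$ are notation assumed here and are not taken from the chapter.

```latex
W_x(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\,
            \psi^{*}\!\left(\frac{t - b}{a}\right) dt,
\qquad a \neq 0,\; b \in \mathbb{R}
```

Here the one-dimensional signal $x(t)$ is mapped to a function of two variables, and the values $|W_x(a, b)|$ over the $(b, a)$ plane form the surface mentioned above.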
As a result, it becomes possible to study the properties of the process simultaneously in the time domain and in the frequency domain, and to investigate the frequency dynamics of the process and its local features. This allows us to identify the coordinates at which certain frequencies manifest themselves most significantly.
The results of the wavelet analysis can be displayed graphically as isolines, which illustrate the change in the intensities of the wavelet transform coefficients at different time scales and reveal local extrema of the surface.
Whereas the Fourier transform uses a function that generates an orthonormal basis of the space by means of a scale transformation, the wavelet transform is formed using a basis function that is localized in a bounded domain, although defined on the whole real axis.
The wavelet transform, as a mathematical tool, provides the ability to analyze data in the time and frequency domains simultaneously. It can provide time-frequency information about a function that in many practical situations is more relevant than the information obtained through standard Fourier analysis.
There are examples of the use of wavelet analysis in identification problems [5]. In the literature, it is noted that wavelets are used mainly to identify nonlinear systems with a certain structure, where unknown time-varying coefficients can be represented as a linear combination of basis wavelet functions [6,7]. It was stated that, along with the usual ("direct") wavelet analysis, biorthogonal wavelets [18], wavelet frames [19], or wavelet networks [20] can be used to identify the system.
There exist many different ways of applying wavelets to linear system identification. In Ref. [21], the identification of systems with a specific input/output structure was studied, in which the parameters are identified via spline wavelets and their derivatives. In Ref. [22], an extended use of an orthonormal transformation least squares method is presented in order to reveal useful information from data.

Conditions of the associative model stability in terms of the spectrum analysis of the multi-scale wavelet expansion
Let (1) be an associative search model. We represent the multi-scale wavelet decomposition for the current input vector $x(t)$ for a fixed level of detail $L$ [7]:
$$x(t) = \sum_{k} c_{L,k}\, \varphi_{L,k}(t) + \sum_{l=1}^{L} \sum_{k} d_{l,k}\, \psi_{l,k}(t),$$
where $L$ is the depth of the multi-scale expansion; $\varphi_{L,k}(t)$ are the scaling functions; $\psi_{l,k}(t)$ are the wavelet functions obtained from the mother wavelets by dilation/compression and shift, $\psi_{l,k}(t) = 2^{l/2} \psi_{\mathrm{mother}}(2^{l} t - k)$ (as the mother wavelets, in the present case, we consider the Haar wavelets); $l$ is the level of data detailing; $c_{L,k}$ are the scaling coefficients; and $d_{l,k}$ are the detailing coefficients. The coefficients are calculated by means of the Mallat algorithm [17].
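A multi-level Haar decomposition in the spirit of the Mallat pyramid algorithm can be sketched in plain NumPy. The orthonormal normalization by $\sqrt{2}$, the sign convention for the detail coefficients, the ordering of the returned levels (finest first), and the function names are assumptions of this sketch.

```python
import numpy as np

def haar_decompose(signal, levels):
    """Return (approximation, details), with details[0] the finest level."""
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        even, odd = approx[0::2], approx[1::2]
        # High-pass branch: scaled pairwise differences (detail coefficients).
        details.append((even - odd) / np.sqrt(2.0))
        # Low-pass branch: scaled pairwise sums (scaling coefficients).
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details

def haar_reconstruct(approx, details):
    """Invert haar_decompose by undoing the levels from coarsest to finest."""
    approx = np.asarray(approx, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx

signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
approx, details = haar_decompose(signal, levels=2)
print(approx)      # [16. 12.]
print(details[1])  # [-6.  2.]
print(np.allclose(haar_reconstruct(approx, details), signal))  # True
```

Tracking how the detail and approximation coefficients evolve from one time window to the next is the kind of analysis the stability conditions in this section rely on.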
Let us expand Eq. (1) over wavelets: Let us consider individually the detailing and approximating parts correspondingly: In [7], it was shown that a sufficient condition for the stability of plant (1) is as follows: for all $k = 1, \ldots, N$, the following inequalities must hold:
1. if $m > R$, $R = \max_{s=1,\ldots,S} r_s$, then the condition for the detailing coefficients: for the approximating coefficients:
2. if $m < R$, $R = \max_{s=1,\ldots,S} r_s$, then the condition for the detailing coefficients: for the approximating coefficients:
3. if $m = R \neq 1$, $R = \max_{s=1,\ldots,S} r_s$, then the condition of the stability for the detailing coefficients: for the approximating coefficients:

Prediction of the transfer to chaos
Chaotic system dynamics is characterized by a strong dependence on initial conditions: trajectories that are arbitrarily close at the initial time instant diverge by a finite distance within a certain time. The main characteristic of chaotic behavior is the speed of divergence of the trajectories. This speed is determined by the leading (largest) Lyapunov exponent, whose value represents the degree of instability, or the degree of sensitivity to the initial data. For a linear system with a constant matrix, the leading Lyapunov exponent is $\chi_1 = \max_i \operatorname{Re} \lambda_i$, where $\lambda_i$ are the eigenvalues of the system matrix. In other words, $|\chi_1|$ coincides with the conventional degree of the system stability [23].
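For the linear case mentioned above, the leading exponent can be computed directly from the eigenvalues of the system matrix. The sketch below assumes a continuous-time system $\dot{x} = Ax$ with a constant matrix. The function name and the two example matrices are illustrative.

```python
import numpy as np

def leading_exponent_linear(A):
    """Largest real part among the eigenvalues of a constant system matrix."""
    return float(np.max(np.linalg.eigvals(np.asarray(A, dtype=float)).real))

# Upper triangular matrix: eigenvalues are the diagonal entries -1 and -3.
A_stable = np.array([[-1.0, 2.0],
                     [0.0, -3.0]])
# Characteristic polynomial s^2 - s - 2 = 0: eigenvalues 2 and -1.
A_unstable = np.array([[0.0, 1.0],
                       [2.0, 1.0]])

chi_stable = leading_exponent_linear(A_stable)      # -1.0
chi_unstable = leading_exponent_linear(A_unstable)  # 2.0
print(chi_stable, chi_unstable)
```

A negative value means that nearby trajectories converge, and a positive value means that they diverge exponentially. For nonlinear systems the exponent has to be estimated from trajectories, which this sketch does not attempt.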

Prediction of manufacturing situations
Optimal routine enterprise resource planning and scheduling are currently based on detailed mathematical models of production processes [24]. Rescheduling requires model update subject to the current production information.
Present-day industrial sites feature interrelated multi-variable production processes and sophisticated material flow networks; scheduling at such sites poses nonlinear NP-hard optimization problems.
The state of manufacturing resources should be nevertheless assessed and predicted both to improve control agility and to foresee the situations where schedule execution becomes problematic or impossible. Such situations will be further referred to as incidents.
It may make sense to develop intelligent predictive models describing the overall current state of resources employed to execute all production operations of a specific production process.
The term "production resources" will hereafter mean the following:
• input flows characterized by formal properties dependent on production specificity; and
• production equipment.
For the resources from categories 2 and 3, the respective codes will have the same value in all positions (either 1 or 0). $\langle C_3 \rangle$ is the code of the time before the maintenance end. If a resource is available and operated, the respective code consists of 1s. $\langle C_4 \rangle$ is the code of the time before the equipment piece fails with a probability close to 1 (remaining life).
In scheduling practice, this time is not less than the operating time. However, replacing a resource during the operation may sometimes be more cost-effective. Moreover, the equipment piece may fail unexpectedly. For resource types from categories 1 and 3, $\langle C_4 \rangle$ has 1s in all positions.
$\langle C_5 \rangle$ is the time before the scheduled end of the operation. In real-life manufacturing situations, time may be wasted (creating the need for a schedule update) for reasons that are neither stipulated in the production model nor caused by equipment failures.
Generally, it is hardly possible to formalize all such causes of schedule disruption. Therefore, their consolidation as the "remaining plan execution time" is a way to allow for these hidden factors in the production state model.
For the developed binary chain, a forecast may be obtained using data mining techniques. It makes sense to apply the methods known as association rules search [25].
A forecast of a state described by a binary chain with an identifier can be obtained by revealing the most probable combination of two binary sets of values at a fixed time instant and at the next instant (a one-step forecast). A more distant prediction horizon is also possible.
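A one-step forecast of this kind can be sketched by counting how often each binary state is followed by each other state in the history and taking the most frequent successor. The ratio of the pair count to the count of the current state plays the role of rule confidence in association rule terms. The state strings, the history, and the function names below are made up for illustration.

```python
from collections import Counter, defaultdict

def fit_transitions(states):
    """Count how often each state is followed by each next state."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return counts

def forecast_next(counts, current):
    """Return the most frequent successor of `current` and its confidence."""
    if current not in counts:
        return None, 0.0  # state never observed with a successor
    nxt, hits = counts[current].most_common(1)[0]
    confidence = hits / sum(counts[current].values())
    return nxt, confidence

# Hypothetical history of binary resource-state codes, one per time instant.
history = ["1111", "1110", "1100", "1111", "1110",
           "1100", "1111", "1110", "1000"]
counts = fit_transitions(history)

print(forecast_next(counts, "1111"))  # ('1110', 1.0)
print(forecast_next(counts, "1110"))
```

A longer prediction horizon could be obtained by applying the same lookup repeatedly to the forecast state, at the cost of confidence decreasing with each step.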

Conclusion
Modern information technologies offer new possibilities for solving identification problems for control and decision-making systems. Data mining methods make it possible to solve problems that in the general case could not be solved by classical methods or required heuristic approaches.
In this chapter, associative search techniques are presented. The techniques allow the identification of nonlinear systems without the need to build a set of Wiener-Hammerstein models and the like. The alternative is to analyze the current state of the system using the knowledge base and the training system. This approach allows the best use of a priori information on the object.
The algorithms may be successfully applied in the identification of nonlinear nonstationary processes. For these purposes, the multi-scale wavelet expansion is used. By investigating the dynamics of the coefficients of this expansion, one can predict the approach of process parameters to stability limits. Sufficient conditions of stability are also derived.
The high forecasting accuracy of the associative search technique makes it relevant for studying the dynamics of processes and predicting the transition to chaos. It also becomes possible to predict contingencies in production processes; for this, the association rules search method is applied.