Ubiquitous Filtering for Nonlinear Problems

This chapter develops and extends the general theoretical results previously published in the chapter “Nonlinear filtering of weak chaotic signals” and presents detailed implementations of a computationally simple, robust (filtering fidelity almost insensitive to changes in the properties of the desired input signal), and rather precise approach for filtering weak signals of different physical nature (biological, seismic, voice, etc.) in the presence of white Gaussian noise. The implementations rely on nonlinear filtering techniques that can in general be considered either one-moment or multi-moment, in the sense that they operate on a single sample (instantaneous fashion) or on several adjacent samples (non-instantaneous fashion). Chaotic modeling of the real input signals allows an almost ubiquitous filtering approach with a computationally simple implementation. Applying the linearization strategies (for both one-moment and two-moment filtering) additionally provides “invariance” of the processing algorithms to variations in the nature and statistics of the input signals.


Introduction
Signal filtering plays a fundamental role in the design of signal processing algorithms for many problems: the first step is to remove (filter) the background noise from the input (incoming) signal, and the second step is to perform the corresponding signal processing [1,2]. In this sense, the filtering approach based on the theory of dynamic systems [3][4][5] arises immediately as one of the possible ways to address this issue. Dynamic filtering, such as classic linear Kalman filtering, was applied to many problems long ago [6] and has been applied recently as well [7]. In the following, however, dynamic filtering is approached from a different (nonlinear) angle [8,9], namely, using signals from nonlinear chaotic attractors as a model for the desired signals arriving at the filtering structure. Chaos has been used to model real phenomena for more than 50 years, and there is a wide range of scientific and practical applications, such as seismology [10][11][12], statistical theory of communication [13,14], control, geophysics, biomedical telemetry [15,16], underwater signal processing [17], and many other areas related to applied physics [18].
When the signals of interest are weak (smaller than the background additive white Gaussian noise, AWGN), the problem is far from trivial. The following material shows the effectiveness of a dynamic nonlinear strategy (introduced in [4,5]) for filtering signals that belong to different types of real phenomena and are modeled through components of chaotic attractors, all in the presence of strong AWGN, which concretely means a signal-to-noise ratio (SNR) < 1 (< 0 dB). Note that even though weak signals are treated in the literature [10,12], their processing is not addressed from the dynamic filtering point of view, and therefore (in our opinion) optimum-fidelity solutions are still required.
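To make the weak-signal condition concrete, the following minimal sketch adds AWGN to a signal at a prescribed SNR. It is an illustration only: the helper names (`variance`, `noise_variance_for_snr`, `add_awgn`) and the convention SNR = signal variance / noise variance are assumptions made here, not definitions taken from the cited references.

```python
import math
import random


def variance(samples):
    """Population variance of a sequence of floats."""
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)


def noise_variance_for_snr(signal_variance, snr_db):
    """Noise variance that yields the requested SNR (assumed: var_signal / var_noise)."""
    return signal_variance / (10.0 ** (snr_db / 10.0))


def add_awgn(signal, snr_db, rng=None):
    """Return the signal plus zero-mean white Gaussian noise at the given SNR in dB."""
    rng = rng or random.Random()
    sigma = math.sqrt(noise_variance_for_snr(variance(signal), snr_db))
    return [s + rng.gauss(0.0, sigma) for s in signal]


if __name__ == "__main__":
    # Hypothetical test signal; any desired signal could be used instead.
    clean = [math.sin(0.05 * k) for k in range(5000)]
    noisy = add_awgn(clean, snr_db=-3.0, rng=random.Random(1))
    noise = [n - c for n, c in zip(noisy, clean)]
    measured = 10.0 * math.log10(variance(clean) / variance(noise))
    print("measured SNR (dB):", measured)
```

With a negative `snr_db`, the noise variance exceeds the signal variance, which is the regime called "weak" in this chapter.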
In the following, the term "effective filtering" indicates high precision, evaluated as a normalized mean square error (NMSE) < 1%; the MSE is normalized by the variance of the desired input signal. In regular practice, there might be several precision measures corresponding to each specific filtering scenario. The use of the NMSE, or of the RMSE (root mean square error, for nonstationary scenarios), as a measure of filtering precision (fidelity) is well established in statistical theory [19], and so it can be considered "universal" because its formal definition is the same irrespective of the specific filtering scenario [1,2]. Note also that small values of the NMSE might adequately correspond to concrete practical criteria of fidelity [3,19].
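The fidelity measure described above can be sketched in a few lines. The function names are hypothetical; the normalization by the variance of the desired signal and the 1% threshold follow the text.

```python
def nmse(desired, estimate):
    """MSE between desired and estimate, normalized by the variance of desired."""
    if len(desired) != len(estimate):
        raise ValueError("sequences must have equal length")
    n = len(desired)
    mean = sum(desired) / n
    var = sum((d - mean) ** 2 for d in desired) / n
    mse = sum((d - e) ** 2 for d, e in zip(desired, estimate)) / n
    return mse / var


def is_effective(desired, estimate, threshold=0.01):
    """True when the NMSE is below the threshold (1% by default)."""
    return nmse(desired, estimate) < threshold
```

An NMSE of 1.0 corresponds to an estimator that does no better than outputting the mean of the desired signal, which gives a convenient reference point when reading the tables.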
The proposed strategy is robust, but not in the sense used in control theory, where the term "robust" means that the filter's structure is invariant to a priori unknown features of the input signals. The proposed chaotic filtering is considered robust in the sense that its fixed structure and fidelity are almost invariant to signals from rather different filtering scenarios, which in the following correspond to seismic signals, heartbeat signals, voice-like signals, and radio frequency interference (RFI) signals. Such invariance is what makes the filtering "ubiquitous." For conditions in which the SNR ≤ 0 dB, the term "weak chaotic signals" will be used, keeping in mind that chaos modeling is applied to the abovementioned filtering scenarios. Chaos modeling might be immensely useful because almost all quasi-optimum filtering algorithms (which formally are nonlinear but are essentially quasi-linear) show rather high precision in the sense of low NMSE (≤ 1%) and exhibit low computational complexity, among other benefits. These properties were broadly discussed in [8,9], where theoretical proofs can be found. The material presented here contains experimental applications, for rather different scenarios, that apply and extend the ideas presented in [9], and so both have to be considered together. For lack of space, it was impossible to include the present material in [9], even partially.

Extraction of some theoretical principles

Chaotic modeling and filtering
Let us assume that a chaotic vector process x(t) is generated by the following ordinary differential equation (ODE) of a certain attractor [18]:

$$\dot{x}(t) = F(x(t)), \qquad x(t_0) = x_0, \tag{1}$$

where x_0 is the initial condition and F(·) is a vector function which, for any real application, is a priori unknown (together with the initial conditions), is usually time varying, and needs to be identified beforehand. It is worth mentioning that the identification problem for F(·) has attracted a lot of interest in recent decades but, being rather complex, has not been solved so far, at least to the author's knowledge. The reason is that the identification of Eq. (1) is an identification of an "inertial vector nonlinear system," which does not have a unique solution and can be formulated only for a previously defined class of nonlinear systems; the complexity of this task has been addressed elsewhere [8,9,18,20] and will not be considered in the following. The examples of F(·) used in the following are the equations of the chaotic attractors of the Rossler, Lorenz, and Chua types [8,9], Eqs. (2) to (4), each taken in continuous-time and discrete-time form; for the Chua attractor, $m_0 = -1/7$ and $m_1 = 2/7$, and $T_S$ is the sampling time.
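As an illustration of a discrete-time attractor model with sampling time T_S, the sketch below iterates a Chua system by forward Euler. Several things here are assumptions rather than the chapter's own equations: the dimensionless form of the Chua system, the piecewise-linear function `h`, the values `ALPHA = 9` and `BETA = 100/7` (the classic double-scroll choice), and the Euler discretization itself. Only the slopes m_0 = -1/7 and m_1 = 2/7 are taken from the text.

```python
# Assumed dimensionless Chua form (not reproduced from the chapter):
#   dx/dt = ALPHA * (y - h(x)),  dy/dt = x - y + z,  dz/dt = -BETA * y
#   h(x)  = M1 * x + 0.5 * (M0 - M1) * (|x + 1| - |x - 1|)
ALPHA = 9.0          # assumed (classic double-scroll value)
BETA = 100.0 / 7.0   # assumed (classic double-scroll value)
M0 = -1.0 / 7.0      # slope parameter quoted in the text
M1 = 2.0 / 7.0       # slope parameter quoted in the text


def h(x):
    """Piecewise-linear nonlinearity of the assumed Chua form."""
    return M1 * x + 0.5 * (M0 - M1) * (abs(x + 1.0) - abs(x - 1.0))


def chua_rhs(state):
    """Right-hand side F(x) of the assumed continuous-time model."""
    x, y, z = state
    return (ALPHA * (y - h(x)), x - y + z, -BETA * y)


def chua_step(state, ts):
    """One forward-Euler step: x[k+1] = x[k] + T_S * F(x[k])."""
    dx, dy, dz = chua_rhs(state)
    x, y, z = state
    return (x + ts * dx, y + ts * dy, z + ts * dz)


def simulate_chua(n_steps, ts=0.001, x0=(0.1, 0.0, 0.0)):
    """Return the list of states visited, including the initial one."""
    state = x0
    trajectory = [state]
    for _ in range(n_steps):
        state = chua_step(state, ts)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    x_component = [s[0] for s in simulate_chua(10000)]
    print("x range:", min(x_component), max(x_component))
```

Changing `ts` stretches or compresses the time scale of the generated component, which is the knob used later to match an attractor component to a real signal.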
In order to neglect the uncertainty effects of the initial conditions, at least for real data filtering, the approach used in [21], based on the fundamental statement of statistical dynamics for deterministic systems related to Kolmogorov and Max Born [18], together with the introduction of the so-called additive "process noise" in Eq. (1), can be applied. The latter transforms the ODE Eq. (1) into a stochastic differential equation (SDE) [20]. The transformation of Eq. (1) into SDE is relevant for the following material.
The strange-attractor equation, Eq. (1), can be transformed into an equivalent stochastic form, an SDE that "generates" an n-dimensional Markov stochastic process [18,20]:

$$\dot{x}(t) = f(x(t)) + \varepsilon\,\xi(t), \tag{5}$$

where f(x(t)) is identical to F(x(t)) from Eq. (1), ξ(t) denotes a weak external source of white noise, and the noise intensities are given in matrix form, $\varepsilon = [\varepsilon_{ij}]_{n \times n}$.
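A discrete-time realization of such an SDE can be sketched with the Euler-Maruyama scheme. This is only an illustration under stated assumptions: the SDE is taken to have the additive form dx/dt = f(x) + ε ξ(t), the scheme and all function names are choices made here, and the linear drift in the demonstration is hypothetical.

```python
import math
import random


def euler_maruyama_step(state, drift, eps, ts, rng):
    """x[k+1] = x[k] + T_S * f(x[k]) + sqrt(T_S) * eps @ w[k], with w[k] ~ N(0, I)."""
    n = len(state)
    f = drift(state)
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    root_ts = math.sqrt(ts)
    return [
        state[i] + ts * f[i] + root_ts * sum(eps[i][j] * w[j] for j in range(n))
        for i in range(n)
    ]


def simulate_sde(x0, drift, eps, ts, n_steps, seed=0):
    """Return the sampled path, including the initial state."""
    rng = random.Random(seed)
    path = [list(x0)]
    for _ in range(n_steps):
        path.append(euler_maruyama_step(path[-1], drift, eps, ts, rng))
    return path


if __name__ == "__main__":
    # Hypothetical linear drift, used only to exercise the integrator.
    def decay(x):
        return [-x[0], -2.0 * x[1]]

    weak_noise = [[0.05, 0.0], [0.0, 0.05]]
    print(simulate_sde([1.0, 1.0], decay, weak_noise, ts=0.01, n_steps=5)[-1])
```

Setting every entry of `eps` to zero recovers the deterministic Euler iteration of the ODE, which makes the role of the weak process noise easy to isolate.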
The solutions proposed hereafter follow from the structural analysis of the quasi-optimum filtering algorithms for weak chaos in the presence of AWGN [8,9] and are summarized in the following for convenience.
When one uses the SDE Eq. (5) as a model for chaos, the first strategy that comes to mind is the nonlinear filtering of chaotic signals, which was rigorously developed in [8,9]. The kernel presented in [8] is the Stratonovich-Kushner equation (SKE) [4,5], which describes the dynamics of the a posteriori probability density function (PDF) of the chaos x(t). For filtering with this generalized approach, additional information from the received aggregate signal has to be incorporated at several sequential time instants; i.e., the information has to be processed in blocks, by aggregating data from several time instants (multi-moment processing). Multi-moment algorithms are obtained by generalizing the SKE to the corresponding multi-moment data. In this way the resulting heuristics are not arbitrary; they are generalizations of the standard one-moment SKE. All this gives hope that one can achieve a rather good MSE for successively lower SNR thresholds using an algorithm of rather low complexity.
Note that the time evolution of the a posteriori PDF of x(t) is completely described by the SKE, but, unfortunately, the SKE does not in general admit exact analytical solutions. There are very few exceptions: the linear case of the SDE Eq. (5), which yields the well-known Kalman filtering algorithm [4,22], and some others [4,5]. For this reason, nonlinear filtering algorithms are practically always simplified to quasi-optimum or even quasi-linear forms [4,5]. In practical applications quasi-linear filters are broadly applied [4,5].
One might wonder what the reason is for applying chaotic modeling to weak signal filtering. The key lies in the "singular" properties of the solution of the SKE (see Eqs. (9) and (10) in reference [9]) for the dynamic ODE for chaos, Eq. (1): the solution of the SKE is almost "tuned" to the deterministic chaos of Eq. (1), without any dependence on the SNR [8]. Of course, this statement has to be interpreted as a qualitative explanation of the properties of the SKE solution, and it is almost true for the behavior of the quasi-linear algorithms as well [8,9].
The following is a list of several quasi-linear filtering algorithms for chaotic signals, based on the so-called "local Gaussian approximation approach for the a posteriori PDF" [4,9], which was found to be rather well suited to real-time implementations:
1. Extended Kalman filter (EKF)
2. Unscented Kalman filter (UKF)
3. Gauss-Hermite filter (GHF)
4. Quadrature Kalman filter (QKF)
5. Conditionally optimum filter
All these algorithms certainly show different filtering precision for a fixed SNR and completely different computational complexity for a fixed filtering fidelity. So, when selecting a concrete filtering algorithm for a concrete scenario, one has to consider as possible selection criteria the MSE (NMSE and RMSE) together with the computational complexity.
Theoretically, the simplest way to get a comparative analysis of the abovementioned algorithms for the case of weak chaos filtering is in the framework of the so-called stochastic equivalent approximation of the observable component of the chaotic attractor, considered as an adequate model of the real process for filtering.

Stochastic differential equation of the first order (SDE-1)
The idea of the stochastically equivalent dynamic system (or SDE) was presented for the first time by Stratonovich and Kushner in [5] and extensively developed for many real scenarios [20]. Let a chaotic attractor with a certain observable component in Eq. (1), together with its stochastic characterization, be a model of the input data. One might consider a random process generated by a stochastic differential equation of the first order (SDE-1) and call it a stochastic equivalent as long as it has the same probability density function (PDF) and the same covariance function as the observable component. So, if one assumes that the stochastic equivalent (through the solution of the scalar SDE-1) is an adequate substitute for the model of the real phenomena (in the form of an observable component of the multidimensional chaotic attractor), then the actual model is the scalar SDE-1 of Eq. (6) [5,20], whose local characteristics $K_1(x)$ and $K_2(x)$ are given in [5,20,23]. If the input signal for filtering is the observation of Eq. (7), in which $n_0(t)$ is AWGN with intensity $N_0$, then applying the standard procedure of the local Gaussian approximation approach for the a posteriori PDF (which for this particular case includes a Taylor series representation of all nonlinearities, including the PDF exponent, limited to quadratic terms in the SKE [4,5]), one gets the quasi-optimum filtering algorithm of Eq. (8), where $\hat{x}(t)$ and $P_{11}(t)$ are the a posteriori mean (estimated value) and variance (error) of filtering, respectively. Applying the well-known standard EKF synthesis procedure [4] to Eqs. (6) and (7), one can also easily obtain the algorithm Eq. (8). It is worth mentioning that the difference between the above-listed algorithms based on the local Gaussian approximation depends only on the way the localization of the instantaneous estimate of x(t) is chosen (as will be commented on in the following).
For the case of high filtering accuracy, all other algorithms that apply the local Gaussian approximation [8] can be successfully approximated by the EKF, because the true value of the filtered process and the reference point for the Gaussian approximation are very "close." The algorithm Eq. (8) corresponds to the so-called one-moment (1MM) regime, which is classical for the EKF. In the 1MM regime, one sample from one instant of time is processed during each processing cycle (instantaneous processing). The 2MM regime was presented exhaustively in [8,9] as a special case of multi-moment filtering and can easily be reviewed by the interested reader. In the 2MM regime, two samples from two instants of time are processed during each cycle (non-instantaneous processing). The main parameter of the 2MM algorithm is ρ, the correlation coefficient between two adjacent samples of the processing algorithm.
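The 1MM recursion can be illustrated with a discrete-time scalar EKF. The sketch below is not the chapter's Eq. (8): it assumes a forward-Euler discretization, a direct observation y = x + noise, and a hypothetical cubic drift p1*x - q1*x**3 chosen only because it produces a bimodal stationary PDF (the values 3.5 and 1.5 are the p_1 and q_1 quoted later for the Chua case; the drift form is an assumption). The demonstration starts the filter in the correct well of that drift and uses very weak process noise, so it exercises the recursion rather than reproducing the reported experiments.

```python
import math
import random

P1, Q1 = 3.5, 1.5  # quoted values; the cubic drift built from them is assumed


def drift(x):
    """Hypothetical scalar drift with two stable points."""
    return P1 * x - Q1 * x ** 3


def drift_derivative(x):
    """Derivative of the drift, i.e., the scalar linearization coefficient."""
    return P1 - 3.0 * Q1 * x ** 2


def ekf_step(x_hat, p, y, ts, q, r):
    """One 1MM cycle: predict with the linearized drift, then correct with sample y."""
    x_pred = x_hat + ts * drift(x_hat)
    a = 1.0 + ts * drift_derivative(x_hat)   # linearization at the current estimate
    p_pred = a * a * p + q * ts
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (y - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


def run_ekf(observations, x0, p0, ts, q, r):
    """Filter a whole record, processing one sample per cycle."""
    x_hat, p = x0, p0
    estimates = []
    for y in observations:
        x_hat, p = ekf_step(x_hat, p, y, ts, q, r)
        estimates.append(x_hat)
    return estimates


def demo(n=2000, ts=0.01, q=0.01, r=4.0, seed=3):
    """Simulate the assumed model, filter it, and return (filtered MSE, raw MSE)."""
    rng = random.Random(seed)
    x = 1.5
    truth, observed = [], []
    for _ in range(n):
        x = x + ts * drift(x) + math.sqrt(q * ts) * rng.gauss(0.0, 1.0)
        truth.append(x)
        observed.append(x + math.sqrt(r) * rng.gauss(0.0, 1.0))
    estimates = run_ekf(observed, x0=1.5, p0=0.1, ts=ts, q=q, r=r)
    mse_filtered = sum((e - t) ** 2 for e, t in zip(estimates, truth)) / n
    mse_raw = sum((y - t) ** 2 for y, t in zip(observed, truth)) / n
    return mse_filtered, mse_raw


if __name__ == "__main__":
    print("filtered MSE, raw MSE:", demo())
```

The only place where the nonlinearity enters the covariance update is the coefficient `a`; replacing it with a constant gives the fixed-linearization variant discussed later in the chapter.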
Let us stress that the concept of stochastic equivalence of the observable component, together with the SDE-1, was used only to present our statements in a simple and "friendly" way and to provide computationally simple algorithms. For the general case of the vector SDE (vector ODE) Eq. (1), when stochastic equivalence in the form presented above cannot be applied because the high-order statistics (HOS) play a significant role [4], all of the above qualitative comments hold as well; the term $\partial S(x,t)/\partial x$ in Eq. (8) has to be replaced by the Jacobian matrix, which is usually considered as a "linearization coefficient" at the point $x = \hat{x}$ [4]. It follows that the synthesis approach for the filtering algorithms (in the framework of the local Gaussian approximation for the a posteriori PDF) can be considered as an instantaneous (miscellaneous) linearization approach.

Computational complexity
The next issue, which has to be analyzed here, is the computational complexity of the quasi-linear algorithms. This subject is crucial for the applications addressed in the next section.
For the general case, when the EKF, UKF, GHF, and QKF algorithms are applied with Chua, Lorenz, and Rössler attractor signals as the desired input signals, the computational complexity of the processing is presented in Table 1, which counts all operations: additions (subtractions), multiplications (divisions), Cholesky decompositions, Jacobian calculations (linearization), and nonlinear propagations.
From Table 1, it can easily be seen that the UKF has the highest complexity, while the EKF provides the simplest implementation. However, the linearization performed by the Jacobian calculation involves partial derivatives; for that reason, and depending on the mathematical model of the attractor, the EKF may not always be the fastest algorithm. Moreover, as will be shown in the following section, the EKF fidelity for weak signal detection is acceptable in all the practical cases considered. Together with its simple theoretical analysis, this makes the EKF a convenient approach for applications (see the next section as well).
But one has to notice that for the robust (ubiquitous) solution and its applications (see above), the EKF has to be additionally modified by the following heuristics. As an alternative to the quasi-linear EKF algorithms, in which the linearization is instantaneously updated, the robust solution might be found by using a "fixed linearization" (with a predefined linearization matrix) instead of an "instantaneous" one. This actually means that instead of the EKF, the standard Kalman filtering (SKF) approach is applied [3][4][5][6][7], and obviously one has to admit some "losses" in filtering accuracy in this case. It has to be taken into account, however, that the local Gaussian approximation of the a posteriori PDF assumes that all the model components are almost linear, and therefore the accuracy losses might be rather moderate. These filtering assumptions seem to be valid for several practical problems such as interference mitigation, seismology, biomedical telemetry, etc. For weak chaotic signals under this condition, it is possible to consider the EKF with a "linear" Jacobian matrix, or even the SKF instead of the EKF, which additionally simplifies the problem. To carry out the linearization, i.e., to operate with a linear matrix A(t) in Eq. (1) that comes from the linear approximation of the attractor's model for chaos, one can use the broadly applied "system identification toolbox" (SIT) [24,25], which provides a solution for A(t) with the spectral properties of the real data. It is worth mentioning that the way the SIT identifies the linear matrix A(t) follows from the four "canonical representations" of linear systems stated in [3].
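The fixed-linearization idea can be sketched as a standard Kalman filter whose state-transition matrix is set once and never updated. Everything specific below is an assumption for illustration: the matrix `a` is a hypothetical two-dimensional rotation rather than a matrix identified by the SIT, only the first state component is observed, and the helper names are invented here. In the demonstration the signal variance is 0.5 and the noise variance is 4, i.e., an SNR of about -9 dB.

```python
import math
import random


def mat_vec(a, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]


def mat_mul(a, b):
    """Matrix-matrix product for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def skf_step(x, p, y, a, q, r):
    """One cycle of a standard Kalman filter observing the first state component."""
    n = len(x)
    # Prediction with the fixed (predefined) linearization matrix.
    x_pred = mat_vec(a, x)
    p_pred = mat_mul(mat_mul(a, p), transpose(a))
    for i in range(n):
        p_pred[i][i] += q
    # Correction; H = [1, 0, ..., 0], so the innovation variance is a scalar.
    s = p_pred[0][0] + r
    gain = [p_pred[i][0] / s for i in range(n)]
    innovation = y - x_pred[0]
    x_new = [x_pred[i] + gain[i] * innovation for i in range(n)]
    p_new = [[p_pred[i][j] - gain[i] * p_pred[0][j] for j in range(n)] for i in range(n)]
    return x_new, p_new


def demo(n=2000, theta=0.05, noise_std=2.0, seed=7):
    """Track a noisy rotating state; return (filtered MSE, raw MSE) of component 0."""
    rng = random.Random(seed)
    a = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
    truth = [1.0, 0.0]
    x, p = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
    err_filtered = err_raw = 0.0
    for _ in range(n):
        truth = mat_vec(a, truth)
        y = truth[0] + noise_std * rng.gauss(0.0, 1.0)
        x, p = skf_step(x, p, y, a, q=1e-6, r=noise_std ** 2)
        err_filtered += (x[0] - truth[0]) ** 2
        err_raw += (y - truth[0]) ** 2
    return err_filtered / n, err_raw / n


if __name__ == "__main__":
    print("filtered MSE, raw MSE:", demo())
```

Because no Jacobian is evaluated inside the loop, each cycle costs only the fixed matrix products, which is the source of the speed advantage attributed to the SKF.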
Once more, this is only an approximation of the instantaneous linearization procedure required by quasi-optimum filtering with the local Gaussian approximation, but it gives hope that for high filtering precision (NMSE of about 1% or less; see the comments above) the precision losses caused by the mentioned identification approach might be moderate and rather acceptable in practice (see also the results of the experimental setup). As a final comment, let us note that the "linearization ideology," as an approach, is rather common (see the references already cited above) for quasi-optimum filtering algorithms with varying input data.

Results and discussion
The aim of this section is twofold, and the two parts are considered separately. On the one hand, the aim is to show that the stochastic equivalent approach (SDE-1) is efficient and has good accuracy for filtering purposes, taking the sufficiently nonlinear Chua attractor Eq. (4) as the most attractive example. On the other hand, the aim is to illustrate the efficiency of the proposed methodology when it is applied to several real-world signals of quite different physical nature, namely, seismic signals, electrocardiogram (ECG) signals, voice-like signals, and RFI signals. These experimental settings have been associated with nonlinear chaotic signals [10-12, 15, 26, 27], and very often the scenario for such signals includes a strong AWGN background, so the desired signals are rather weak.
An experimental real-time test bed was developed, containing block generators for the AWGN, the EKF estimation (with their SDE-1 equivalents), the SKF estimation (with the linearization matrix coefficients evaluated from the SIT block), and the real input signals. The chaos EKF segment is a discrete implementation of the EKF which internally contains the discrete version of the equations for the strange attractors of Rossler, Lorenz, and Chua. It also performs a linearization by calculating the Jacobian in each processing cycle. For each signal setting, one of the attractor components (x, y or z) has to be adapted as a possible signal model.
For this purpose, first, the sampling time of the chaotic discrete equations is varied so as to achieve a "match" between the temporal variations of the selected attractor component and the desired signal (make the time scales as close as possible). Second, the desired signal is normalized in relation to the mean and variance of the attractor component. The material of [8,9] shows that the x-component of the three strange attractors might be suitable for modeling the signals from the experimental settings.
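The second step, normalizing the desired signal to the mean and variance of the attractor component, can be sketched as follows. The function names are hypothetical, and the first step (matching time scales by varying the sampling time) is not covered here.

```python
import math


def mean_and_std(samples):
    """Mean and population standard deviation of a sequence."""
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, math.sqrt(var)


def match_statistics(signal, reference):
    """Shift and scale signal so that its mean and variance equal those of reference."""
    s_mean, s_std = mean_and_std(signal)
    r_mean, r_std = mean_and_std(reference)
    if s_std == 0.0:
        raise ValueError("signal is constant; it cannot be rescaled")
    scale = r_std / s_std
    return [r_mean + (s - s_mean) * scale for s in signal]
```

Here `reference` would be the selected attractor component (for example its x-component) and `signal` the recorded data.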
The SKF segment is a discrete implementation of the standard Kalman filter, which in this case is three-dimensional in order to make a fair comparison with the three-dimensional EKF. In this segment the linearization matrix is obtained from the experimental signal (seismic, ECG, voice-like) using MATLAB's SIT. The matrix evaluation is made offline by calling the MATLAB command "ident." Once the signal is loaded in the workspace, the identification is made by selecting the option "state space models" [3,22] for the three-dimensional case. The program offers three estimation options, and at the end it yields the confidence percentage for the selected option. It was found experimentally that the PEM option (prediction error method) gives the best confidence for the estimated matrix. Note that for a fixed real-life scenario, the matrix should be evaluated offline for each incoming signal before the signal processing is done (using information obtained a priori and/or from experimental data). This experimental strategy is illustrated in Figure 1, where a "virtual" delay "ρ" represents the separation in time between the matrix identification and the filtering procedure; as the signals are stationary, the identification made for a long signal vector will suffice for any short signal vector.
In the following, the experimental results apply the 1MM and 2MM filtering strategies. The 2MM strategy requires for its processing the correlation between two samples which in our case was set to ρ = 0.85. The 2MM shows a bit better NMSE values as it is intuitively expected. For the scenario of seismic signals, it was not possible to calculate the linearization matrix from the SIT, as the signals are not tractable (limited signal durations for the spectral analysis). For all filtering scenarios, a weak process noise value (Q) has been introduced (EKF and SKF) in order to exclude the uncertainty of the initial conditions and is indicated in the corresponding tables.
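The chapter sets ρ directly to 0.85. As a possible sanity check on such a value (an assumption of this sketch, not a procedure described in the chapter), the correlation coefficient between adjacent samples of a record can be estimated as below; the function name is hypothetical.

```python
def lag1_correlation(samples):
    """Sample correlation coefficient between adjacent samples of a record."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    dev = [s - mean for s in samples]
    denom = sum(d * d for d in dev)
    if denom == 0.0:
        raise ValueError("record is constant; correlation is undefined")
    return sum(dev[k] * dev[k + 1] for k in range(n - 1)) / denom
```

A finely sampled, slowly varying signal gives values close to 1, while a rapidly alternating one gives negative values.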

Experiment one
This experiment shows the efficiency of the stochastic equivalent SDE-1 for filtering. For illustration purposes, the intended signal here is the x-component of Chua's chaotic attractor. It is worth noticing that, upon taking x(t) in Eq. (4) as the observable component, the corresponding PDF is bimodal due to the function U(·) [20]. The statistically equivalent SDE-1 for the case of Chua's x-component, Eq. (9), can be obtained in a straightforward way from Eq. (8) [8,9], with $p_1 = 3.5$ and $q_1 = 1.5$. Figure 2 shows the result for the NMSE. The dotted line corresponds to the SDE-1 filtering according to Eq. (9) and the continuous line to the 1MM 3D EKF using Eq. (4) in (A3); in both cases the input signal is Chua's x-component. The reason for this comparison is that filtering the input signal (Chua's x-component) with Chua's own Eq. (4) is almost the best one can do (quasi-optimum solution), and so it gives the most adequate benchmark. From Figure 2 it is possible to see that there are some very moderate losses due to the use of the SDE-1 methodology, as is logically expected; however, the NMSE for the SDE-1 does not differ much from that of the 1MM 3D EKF, and so the SDE-1 approach offers almost the best accuracy.
The following examples are devoted to the filtering of real data, which obviously differ from the theoretical chaos. The NMSE will increase because there is a mismatch between the input signal and the "chaotic signal component" from the filtering algorithm. This "mismatch" as it was mentioned above can be "compensated" by introducing a process noise with intensity Q, in the filtering structure (A3).

Experiment two: fetal electrocardiogram signals (FECG)
The experimental data were obtained from the ATM database at PhysioNet [28]. The signal for this experiment corresponds to a fetal heart at the 36th week of pregnancy. For an SNR = −3 dB, Figure 3 shows the original signal and the filtered signal using the 1MM EKF with the Rossler x-component as a model. Full results for the NMSE are shown in Table 2.

Experiment three: voice sounds
For this experiment sustained vowel sounds were used. These kinds of signals are used for voice synthesis procedures [26]. Figure 4 shows the snapshot (continuous line) of the vowel sound "O" (recorded in a sustained fashion for 5 s at 22050 Hz) and also the filtered signal (broken line) using 2MM SKF with its matrix evaluated with the SIT. Almost identical results as in the previous experiments are shown in Table 3. For this experiment none of the components from the Lorenz attractor were suitable as a model for the voice-like signals.

Experiment four: seismic signals
For this experiment a MATLAB simulator based on the seismic models of [29] was used. For an SNR = −3 dB, Figure 5 shows the seismic signal and its filtered version using 2MM EKF with Rossler x-component as a model. Full results are presented in Table 4. For the seismic signals, it was not possible to obtain an adequate linearization matrix, and so the SKF was not applied for this scenario.

Experiment five: radio frequency interference (RFI) signals
This experiment considers the RFI generated by computing equipment [27,30] that affects the transmission of the desired information signals. For an SNR = −3 dB, Figure 6 shows the RFI signal and its filtered version using 1MM SKF with its matrix evaluated with the SIT. Full results are presented in Table 5.
The simulation results obtained from the linearization approach, applying the SIT, are presented in Tables 2, 3, and 5. Comparative analysis of the data in the tables allows the following conclusions. First, all the filtering approaches presented above are rather effective, as all of them show low values of NMSE. One can notice that for the worst-case scenario (−10 dB), the signals are visually indistinguishable from the noise; however, the NMSE is around 1% for both strategies (SKF and EKF) with either 1MM or 2MM. The tables also show the average time (in seconds) required to process the 5000 samples used for statistical processing, for each filtering scenario. One has to notice that the 2MM approach consumes more time than the 1MM algorithm but (roughly speaking) no more than twice the time required for the 1MM processing, as an upper bound. Second, the SKF is faster (almost 3 times) because no time is consumed by the linearization process. The processing time, together with the filtering complexity and fidelity, might be considered as "criteria" when choosing the appropriate filtering algorithm for concrete implementations.
The SKF with the linearization approach yielded the best results; this once more confirms what was pointed out above: for the quasi-linear filtering algorithms, the influence of the spectral properties of the input data prevails over the influence of the "non-Gaussian" statistics of the data. The values of the NMSE obtained by simulation can be regarded as meeting the requirements of many practical cases, at least according to the corresponding references [1,2] and to the author's knowledge. Moreover, one can see that the NMSE values are rather close for all filtering scenarios, so in practice it is not so important which particular chaotic attractor model or linearization matrix from the SIT is applied. Why does this happen? This issue was briefly mentioned above, but a feasible explanation is presented once more in the following. It is worth stressing that all the chaotic attractors mentioned and applied for modeling the real data "generate" chaos as a quasi-deterministic stochastic process, Eqs. (1) and (5). Therefore all the quasi-optimum filtering algorithms listed before (including the EKF and its modifications) that apply chaotic modeling work in an almost "singular" regime, i.e., the shape of the a posteriori PDF is "concentrated" along the a priori PDF of the desired signal "irrespective" of the value of the SNR [8,9]. That is why it is possible to obtain such low values of NMSE for weak signals (SNR < 0 dB and down to −10 dB). Thus, for high filtering fidelity, the linear term of the Taylor expansion for the quasi-linear algorithm [4,5,22] significantly prevails over the terms related to the "nonlinearities" (Jacobian matrix, etc.), i.e., the linear approximation is "enough." So the influence of the nonlinear character of the ODE of the attractors on the value of the NMSE will be relatively small, which follows from the experimental data in the tables.
Of course, the explanation above is "qualitative," but it corresponds well to the theoretical development of the quasi-optimum algorithms [4,22].

Conclusions
In this material a rather simple and robust structure for weak signal filtering is proposed, based on the EKF algorithm and its 2MM modification. In addition, the linearized filtering approach is considered as well.
Based on this, it is possible to suggest that chaotic modeling of non-Gaussian input data gives a "high degree of freedom" in the design of the filtering block, depending on the fidelity requirements and the computational complexity.
Taking advantage of the quasi-linear character of the effective real-time filtering algorithms for stochastic non-Gaussian real signals, an approach using the well-known "system identification toolbox" was proposed as well and might be selected as a reasonable compromise between computational complexity and filtering accuracy.
The experimental results show that the filtering accuracy losses for the linearization case and even for the application of the simplified SDE-1 equivalent approach are very moderate and almost negligible for practical implementations. This issue might significantly simplify the theoretical study applied for comparative selection of the filtering algorithms.
The interested reader is strongly encouraged to consider the material of the previous chapter, "Nonlinear filtering of weak chaotic signals," together with the material presented above, as the two give a complete "panorama" of the recommended algorithms and their real-life implementations.
All the results presented in the plots and in the tables clearly show that the implementation of the proposed strategy for solving filtering problems might be recommended for the practical scenarios.