Digital Filters for Maintenance ManagementBy Fausto Pedro García Márquez and Diego José Pedregal Tercero

Faults in mechanisms must be detected quickly and reliably in order to avoid important losses. Detection systems should be developed to minimize maintenance costs and are generally based on consistent models, but as simple as possible. Also, the models for detecting faults must adapt to external and internal conditions to the mechanism. The present chapter deals with three particular maintenance algorithms for turnouts in railway infrastructure by means of discrete filters that comply with these general objectives. All of them have the virtue of being developed within a well-known and common framework, namely the State Space with the help of the Kalman Filter (KF) and/or complementary Fixed Interval Smoother (FIS) algorithms. The algorithms are tested on real applications and thorough results are shown. 2. Introduction Faults in any important mechanisms must be detected quickly and reliably if the information is to be useful. Generally such mechanisms may be modeled as discrete dynamic systems, where data must be processed on line. When feasible, the detection system should use a model as simple as possible for detecting faults quickly by analyzing data in real time. The models for detecting faults must adapt to external and internal conditions to the mechanism, since both of them may affect the system as a whole. The present chapter deals with maintenance systems for turnouts in railway infrastructure by means of discrete filters. Turnouts are assembled from switches and a crossing where the moving parts are often described as the â€œpointsâ€ move by the point mechanism. The standard railway point mechanism is a complex electro-mechanical device with many potential failure modes. Several approaches for maintenance of such devices are shown in this chapter and briefly described in this introduction. All of them have the virtue of being developed within a wellknown common framework, namely the State Space (SS) with the help of the Kalman Filter (KF) and/or complementary Fixed Interval Smoother (FIS) algorithms, exposed in general terms in the following section. 2 Digital Filters Based on this common framework, the following subsections in this introduction show the particular applications shown in later sections of the chapter. 2.1. Filtering with Integrated Random Walks (IRW) One possible way to analyze faults on line is to work with a reference dynamic system for their analysis. If the absolute value of the difference between the actual data and the reference data (i.e. the profile without any fault) is analyzed, the majority of faults may be detected by means of a simplified univariate dynamic system, like the one explored in [9]. The dynamic system and the use of the SS framework and the KF in this study allow increasing the reliability of the model presented that is the basic input to a rule-based decision mechanism. When applied to the linear discrete data filtering problem, the KF is a powerful algorithm, because it supports estimations of past, present and, most importantly, future states. It can therefore be used in predictive maintenance applications where data collected from sensors is affected by measurement and transmission line noise [12]. The previous approach may be exploited by setting up a bivariate model composed of two time series, i.e. the reference curve on one hand and each one of the empirical curves obtained on line on the other hand. More specifically (see section 4.2 below) a tentative model consists of a bivariate trend plus noise structure. The correlation between either trends or signals free from noise is considered as an indication of similarity between the curves and therefore the inexistence of failures. As long as the new incoming data is free from fault, the correlation parameter is close to one, but as a failure starts to develop this parameter tends to differ from one. The cut-off value of the correlation coefficient relevant to discern â€˜goodâ€™ and â€˜badâ€™ curves is selected on practical grounds based on past experience with this kind of data, but refined formal statistical criteria may be used as well [19]. Even forecasts of the curve that is being studied may be produced at any point in time, based on the current parameter values and the future data of the reference curve [14]. Therefore the fault may be detected ahead of time. 2.2. Random Walks and smoothing Similar measurement data were collected from sensors mounted on a UK type M63 point machine at the Carillion Rail (formerly GTRM) Training Centre in Stafford (UK). It is difficult to compare the measurements taken during induced failure conditions with those from the fault-free condition because of noise in the measurements. The measurement data needed to be filtered in order to reduce the noise before comparisons may be made. Filtering using a SS model and the KF was an option (like in [9], [19] and [20]). Assuming the noisy data is a signal plus noise model, the KF reduces the power of the 100 and 200 Hz interfering signals. Rather than augmenting the SS models to express the additional knowledge of the interfering signals, a much simpler smoothing seems more convenient because of the relationship between the sample rate and the frequencies of the interfering signals, and provides excellent results for the data collected during this series of experiments [10]. 2.3. Advance Dynamic Harmonic Regression (DHR) A different case study was based on data collected from point mechanisms at Abbotswood Junction (UK). Three electro-mechanical and four electro-hydraulic point machines were Digital Filters for Maintenance Management 3 monitored by a RCM system. Processed information was sent remotely from the trackside data-collection units to a personal computer located in a local relay room. A fault is detected by comparing the forecasts of the model, considered as the expected signal in the case of no faults, with the actual data coming from the point mechanism when a movement is in progress. If the error is too large, measured by its standard deviation, a fault alarm is issued. The limit at which an error is considered too large is a design parameter that is fixed by experimentation. The system adapts to the changes experienced by the point machine. There are internal alterations (like friction, wear, etc.) and external as well (like environmental conditions, impacts, obstacles, etc.). The adaptability of the system is accomplished by continuous estimation of the models as new information becomes available and by discarding the oldest information. Models are always estimated on faultfree data [13]. The key point in this application is that the expected shape is computed as the forecast of a combination of two models that work interactively on historical data coming from signals free from any fault. The first of the models forecasts the time span a movement would take in case of absence of faults (an appropriate model used in this case was of the VARMA class or a local level plus noise but set up in continuous time). The second model is run to forecast the signal itself (due to the nature of the data a pertinent class is a Dynamic Harmonic Regression, DHR, similar to a Fourier analysis, but with advanced features included to incorporate a time varying period observed in the data). The outline of the chapter is as follows. Section 3 reports a brief explanation of the general framework on which all the models in this chapter are set up, namely the State Space systems. Section 4 shows the first of the applications, i.e. in the point mechanisms. Finally section 5 shows how a fault detection algorithm may be implemented on seven point machines at Abbotswood junction (UK). 3. State Space systems The general framework on which all models in this chapter are cast, is the so called State Space systems, that have experienced a remarkable attention during the last decades, as the extended literature about it reveals [3], [7], [13], [15], [16], [17], [21], [24], [26] and [27]. A stochastic discrete-time State Space system (SS) is a model composed of two sets of equations, the Observation Equations, and State Equations. The former relates the output to the states of the system, while the latter reflects the dynamic behavior of the system by relating the current value of the states to their past values. There are a number of different formulations of these equations, but one fairly general representation is given by equations (1) (see [3] and [21]). In general, much simpler models are sufficient, as later case studies show. State Equations : x t ï€« 1 ï€½ ï† t x t ï€« Et w t (i) (ii) Observation Equations : z t ï€½ Ht xt + Ct vt (1) 4 Digital Filters In (1) zt is the m dimensional vector of observed variables for t ï€½ 1,2,ï‹, N ; xt is an n dimensional stochastic state vector; wt is an r dimensional vector of (to be Gaussian) system disturbances, i.e. zero mean white noise inputs with a covariance matrix Q t ; and v t is a s dimensional vector of zero mean white noise variables (measurement noise: again assumed to be Gaussian) with a covariance matrix R t . In general, the vector v t is assumed the initial state vector x0 . ï†t , Et , Ht , Ct , Qt , and R t are, respectively, the n ï‚´ n, n ï‚´ r, m ï‚´ n, to be independent of w t (not necessarily), and these two noise vectors are independent of and m ï‚´ s, r ï‚´ r and s ï‚´ s system matrices, some elements of which are known and others that need to be estimated in some way. Given the general SS form (1), the estimation problem consists of finding the first and second order moments (mean and covariance) of the state vector, conditional on all the data in a sample. Provided that all disturbances in the model are Gaussian, a Kalman Filter (KF) produces the optimal estimates of such moments in the sense of minimizing the Mean Squared Errors (MSE). An algorithm that is used in parallel with the KF and is not so wellknown in certain contexts is the Fixed Interval Smoothing (FIS) algorithm, which allows for an operation similar to that of the KF but with a different set of information. The KF used in this chapter is: Ë† Ft ï€½ Ct R t CT ï€« H t Pt |t ï€1HT t t Kt ï› ï€½ ï›Î¦ ï› t ï€«1 t |t ï€1 Ë† P HT Ftï€1 t ï ï ï ï€«EQ E T t t T t Ë† Ë† xt ï€«1 t ï€½ Î¦t ï€«1 ï€ K t HT xt |t ï€1 ï€« K t z t t Ë† Ë† Ë† Pt ï€«1|t ï€½ Î¦t ï€«1Pt |t ï€1Î¦Tï€«1 ï€ K t Î¦t ï€«1Pt |t ï€1HT t t The backward FIS recursions are: ï ï› Ë† Ë† Ë† xt N ï€½ xt |t ï€1 ï€« Pt |t ï€1st ï€1 Ë† st ï€1 ï€½ HT Ftï€1 ï€¨z t ï€ H t xt |t ï€1 ï€© ï€« Î¦tT st t St ï€1 ï€½ HT Ftï€1H t ï€« Î¦tT St Î¦t t Ë† Î¦t ï€½ Î¦t ï€ Î¦t Pt |t ï€1HT Ftï€1H t t Ë† Ë† Ë† Ë† Pt | N ï€½ Pt |t ï€1 ï€ Pt |t ï€1St ï€1Pt |t ï€1 with s N ï€½ 0 with S N ï€½ 0 This general SS formulation is capable of handling many nonstationary linear dynamical systems; also it can model nonlinear systems but conditionally Gaussian; general heteroscedastic systems; time-varying systems; etc. In addition, many kinds of extensions of model have been proposed in the literature, such as linear approximations of functionally nonlinear dynamic systems; non-Gaussian disturbances; etc. Missing data is not a problem given the recursive nature of the algorithms, because such data are replaced by their Digital Filters for Maintenance Management 5 expectations based on the model and the data. Then, if such data is at the end of the sample the KF produces forecasts of the signal, while if they are in the middle or at the beginning both algorithms produce interpolation or forecasts from the beginning of the series backwards. The application of the recursive KF/FIS algorithms requires values for all the system matrices ï† t , Et , H t , Ct , Q t , and R t . Most of the elements of these matrices must be estimated by efficient methods. The Maximum Likelihood (ML) method in the time domain by means of â€˜prediction error decompositionâ€™ ([24] and [15]) is the most common because of its generality and good theoretical properties. 4. Filtering with Integrated Random Walks (IRW) 4.1. Data Approximately 55 % of railway infrastructure component failures on high speed lines are due to signalling equipment and turnouts. â€œSignalling equipmentâ€ covers signals, track circuits, interlockings, automatic train protection (ATP) or LZB (track loop based ATP), and the traffic control centre. From another point of view, the annual cost of maintaining points is rather high compared to other infrastructure elements, about 3.4 million UKP (United Kingdom Pound) per year for about 1000 km of railway. TC-TCR trade circuits, for example, cost 2.1 million UKP per year for the same area. Of the points expenditure, 1.2 million UKP is for clamp lock type (hydraulic) turnout and 1.4 UPK million for electrically operated turnouts (data provided by a British asset manager). Turnouts can also be used to implement flank protection for a train route allocated to another train. This is achieved by positioning the blades of the turnout in such a way that a train driving through the turnout is not directed into a track segment belonging to the route of another train. Most standard point machines (see Fig. 1) contain a switch actuating and a locking mechanism which includes a hand-throw lever and a selector lever to allow operation by power or hand. The mechanism is normally divided into three major subsystems: (i) the motor unit which may includes a contactor control arrangement and a terminal area; (ii) a gearbox comprising spur-gears and a worm reduction unit with overload clutch; and (iii) the dual control mechanism as well as a controller subsystem with motor cut-off and detection contacts. Generally, there are also mechanical linkages for the detection and locking of the point. The standard railway point is therefore a complex electro-mechanical device with many potential failure modes. The circuit controller includes detection switches and a pair of snap-action switches to stop the machine at the end of its stroke and to brake the motor electrically so that the mechanism is not subject to impacts. The detection switches have high pressure wiping contacts made of silver/cadmium oxide or gold and they are operated by both the lockbox and the detection rod. The detection switches have additional contacts to allow mid-stroke short circuiting of the detection relays to avoid wrong indications in the signal box or electronic interlocking. 6 Digital Filters Belt Hand Crank Slide Chairs Drive bar Motor Crank Close Gap Circuit Controller Stretchers Lock Blade Detector Blade Left Detector Blade Right Stock Rail Left and Right Fig. 1. Point Mechanism 476 experiments (point moves or attempted point moves) were carried out while collecting time, force and operating current data. The data from the point mechanism is initially classified in terms of direction of movement, i.e., either reverse to normal direction or normal to reverse direction. For both directions, faults have been detected with â€œcurrent (A) vs. time (s)â€ curves and â€œforce (N) vs. (s)â€ curves (see some examples in Fig. 2(a) and 2(b)). It was observed that â€œcurrent (A) vs. time (s)â€ curves are not the best choice for detecting faults in point mechanisms. The final classification of faults employs only the magnitude and the moment when they change with respect to the reference curves. (a) Normal to reverse direction 1200 Force(N) 900 600 300 0 0 20 40 Time(s) 60 80 400 200 0 -200 0 20 40 Time(s) 60 80 (b) Reverse to normal direction Fig. 2. Operating force curves for a point mechanism For detecting faults in point mechanisms, a model was employed that can determine the dynamic character of the system. For instance, the reference signals or curves for detecting faults depend on the environmental conditions (temperature, humidity, etc.), and on the in service time of the system, because the friction forces are larger at the beginning than once the system has worn in. The available data consists of 79 curves for the reverse to normal direction, including 4 curves â€œas commissionedâ€, and 72 curves for the normal to reverse Digital Filters for Maintenance Management 7 direction, with 3 curves â€œas commissionedâ€ (some of them may be seen in Fig. 2). A reference dynamic system has to be applied to all of these variables. The data collected refers to force (N) versus time (s). The first conclusion after studying these curves is that we can detect only a few faults by analyzing the signal directly but, if we analyze the differences between the current data xj and the reference data xi in the form of absolute values dj (1), we can detect the majority of faults as they develop. d t j ï€½ xtj ï€ xti , ï€¢t (1) Some of these curves are shown in Fig. 3(a) and 3(b) for reverse to normal direction and normal to reverse direction respectively. The â€˜xâ€™ axis is time [s] and the â€˜yâ€™ axis is the difference between the dynamic mean geometric and the current curve as an absolute value [N]. (a) Normal to reverse direction 100 Abs. Diff. Force(N) 80 60 40 20 0 0 20 40 Time(s) 60 80 (b) Reverse to normal direction 100 80 60 40 20 0 0 20 40 Time(s) 60 80 Fig. 3. Difference between the reference signal for the point and the newly acquired data in absolute values 4.2. The model One feasible model written in SS form (1) for this application is of the type local mean plus noise for two signals simultaneously, where the local means are modeled by the dynamics implied by the state equations, i.e. ïƒ¦I Iïƒ¶ ïƒ¦ 0ïƒ¶ïƒ¦ w1t ïƒ¶ xt ï€«1 ï€½ ïƒ§ ïƒ§ 0 I ïƒ·xt ï€« ïƒ§ I ïƒ·ïƒ§ w ïƒ· ïƒ· ïƒ§ ïƒ·ïƒ§ ïƒ· ïƒ¨ ïƒ¸ ïƒ¨ ïƒ¸ïƒ¨ 2t ïƒ¸ zt ï€½ signalï€« noiseï€½ ï€¨I 0ï€©xt ï€« vt ïƒ¦ ï³2 w1 Qï€½ïƒ§ ïƒ§ï² ï³ 2 ï³ 2 w1 w2 ïƒ¨ 22 ï² ï³w ï³w ïƒ¶ ïƒ· 2 ï³w 2 ïƒ¦ ï³ v2 ï³ v1v2 ïƒ¶ 2 ïƒ· ; Rï€½ïƒ§ 1 ïƒ§ï³ v v ï³ v2 ïƒ· ïƒ· 2ïƒ¸ ïƒ¨ 12 ïƒ¸ (2) In model (2) all the system matrices are time invariant: I is a two dimensional identity matrix; 0 is a two by two matrix of zeros; ï³ ï‚·2 are the variances of the noise signals or disturbances and either in the state or observation equations; ï³ ï‚· ï‚· is the covariance between two disturbances; ï² is the correlation coefficient between the two noise signals in the state equation. 8 Digital Filters By comparing systems (2) and (1) it is easy to see the system matrices values in this particular case, i.e. ïƒ¦w ïƒ¶ ïƒ¦ I Iïƒ¶ ïƒ¦ 0ïƒ¶ ïƒ§ïƒ· ïƒ§ ïƒ·; Et ï€½ ïƒ§ ïƒ·; wt ï€½ ïƒ§ 1t ïƒ·; Ht ï€½ ï€¨I 0ï€©; Ct ï€½ 1 ïƒ§ïƒ· Î¦t ï€½ ïƒ§ 0 Iïƒ· Iïƒ¸ ïƒ¨ ïƒ¸ ïƒ¨ ïƒ¨ w2t ïƒ¸ The unknown hyper-parameters to be estimated by ML in this model are Q and R . It should be noted that Q is parameterized in the way shown above in order to force the appearance of the correlation coefficient between the state disturbances explicitly. The following points must be taken into account when interpreting model (2): ï‚· The observation equation implies that the series are composed of a local mean level or trend with added noise. ï‚· The first two states in the model are the local mean level (or trends) of each series. In other words, they are the signals free from noise; ï‚· Given the structure of the model, it is easy to show that the third and fourth states are the gradients of the trends. The slopes are modelled here as stochastic and therefore changing as a function of time according to the variance of the state disturbances; ï‚· If the correlation coefficient is 1, both trends are proportional to each other, meaning that the dynamic behaviour of both trends is the same. This is an important point that the authors wanted to test later; ï‚· must be positive; ï€ 1 ï‚£ ï² ï‚£ 1 ; and R must be positive definite. Since all these are parameters to be estimated, it may be advantageous constrained search algorithms; The asymptotic distribution of the ML estimates are Gaussian if all the disturbances in model (2) are Gaussian. Then, since ï² is estimated explicitly, By definition, ï³ ï‚·2 ï‚· the confidence intervals and statistical hypothesis tests for this parameter may be easily constructed. In fact, the parameter ï² is proposed here as a way to discriminate between â€œfaultyâ€ and â€œas commissionedâ€ curves (see below), where the â€œfaultyâ€ curve is caused by wear as described above. Strictly speaking, the two curves are behaving in the same way when ï² ï€½ 1 , but previous experience with point mechanisms of a similar kind must be incorporated here, because it is, difficult, in general to find those values in practical situations. Then, a cut-off value of ï² must be considered in order to discriminate between â€˜goodâ€™ and â€˜badâ€™ curves. The modeling strategy outlined above may be applied to both off-line and on-line situations. In this latter case it would be possible to get an estimated time series for ï² (with confidence bands) and the time of wear assessment detected on-line very quickly when parameters start to move away from their initial values. Even forecasts of the current curve may be produced at any point in time, based on the current parameter values and the future data of the reference curve. Very fast algorithms have been developed for ML estimation of SS systems in which all the unknowns are some elements of the covariance matrices Q and R , such as in model (4). The problem of initializing the KF and hence ML needs to be resolved. One of the most important tools is the use of the exact likelihood function [5] and[6]. Digital Filters for Maintenance Management 9 4.3. Experimental Results The model described in the previous subsection was employed in an off-line mode with data collected during laboratory tests (see Fig. 2). The model output (shown in Fig. 4, based in signals from Fig. 3) was then used to classify the curves as either â€œas commissionedâ€ or â€œfaultyâ€. This step may be achieved several ways. The approach compares ï² with the individual points in time with a relating high threshold value. A value of ï² below the threshold is an indication of a lack of correlation with the current reference curve and therefore is classified as â€œfaultyâ€i. A more refined and somewhat more formal criterion is based on such single point estimate and its 95% confidence band. In this case, a curve is considered to be â€œas commissionedâ€ if the upper limit of the confidence band is close to target value or equal to 1. For point operation in both directions, with a value of ï² ï€½ 0.99 the totality of faulty curves could be detected. In the NR direction, since the highest value of ï² for faulty curves was 0.92 and the 95% confidence interval uses (0.77, 0.98). In the RN direction, the highest value of ï² for faulty curves was 0.97 and the 95% confidence interval was (0.93, 0.99). The results achieved with the same reference curve, but different test results are shown in Fig. 4, one â€œas commissionedâ€ curve (top panel) and one faulty curve (bottom). Difference in Abs. Value (N) 80 60 40 20 0 0 1 2 3 4 5 6 7 8 Differences in Abs. Value (N) 80 60 40 20 0 0 1 2 3 4 Time (s ) 5 6 7 8 Fig. 4. Two examples of forecasts based on model (4) at different forecast origins. One â€œas commissionedâ€ curve (top) and one â€œfaultyâ€ curve (bottom). Forecast origins are marked by the vertical line. In both cases the reference curve was available for the whole time span (based on previous curves taken from the system) and the information to test each curves was set up to the Alternatively, the estimated correlation coefficient may be tuned so that the number of curves correctly classified is maximised. i 10 Digital Filters forecast origin (vertical line). The objective of obtaining a forecast for the behavior of the system based on such incomplete information was thus using model (4). In an on-line situation, the parameters and the forecasts are updated each time a new observation is available. Fig. 5 shows the recursive estimate of ï² with its 95% confidence intervals (assuming gaussian noises) for an â€œas commissionedâ€ curve (top) and a â€œfaultyâ€ one (bottom). In both cases the confidence on the estimate tends to increase as more information becomes available. 1 0.8 0.6 Rho 0.4 0.2 0 1 1 0.8 0.6 2 3 4 5 6 7 8 Rho 0.4 0.2 0 1 2 3 4 5 6 7 8 Fig. 5. Recursive estimation of ï² Time (s) (stars) and 95% confidence bands (solid) for one â€œas commissionedâ€ curve (top) and one â€œfaultyâ€ curve (bottom). 5. Random Walks and smoothing 5.1. Device and data Following successful implementation on a level crossing mechanism (Roberts 2002) [23], the authors adapted the methods to detect faults in seven point machines at Abbotswood junction, shown in Fig. 6 as boxes 638, 639, 640, 641A, 641B, 642A and 642B. The configuration deployed at Abbotswood junction was developed in collaboration with Carillion Rail (formerly GTRM), Network Rail (formerly RailTrack) and Computer Controlled Solutions Ltd. The junction consists of four electro-mechanical M63 and three electro-hydraulic point machines, shown in Figure 2. Each M63 machine is fitted with a load pin and Hall-effect current clamps. The electric-hydraulic point machines are instrumented with two hydraulic pressure transducers, namely an oil level transducer and a current transducer. A 1 Mb/sec WorldFIP network, compatible with the Fieldbus standard EN50170 (CENELEC EN50170 2002) [4], connects the trackside data-collection units to a PC located in the local relay room. Data acquisition software was written to collect data with a sampling rate of 200 Hz. Processed results can be observed on the local PC and also remotely. Digital Filters for Maintenance Management 11 Fig. 6. Set of points and the relevant components/sub-units at Abbotswood junction. The supply voltage of the point machine was measured (Fig. 7a), as well as the current drawn by the electric motor (Fig. 7b) and the system as a whole (Fig. 7d). In addition, the force in the drive bar was measured with a load pin introduced into the bolted connection between the drive bar and the drive rod (Fig. 7c). Fig. 7 shows the raw measurement signals taken in the fault-free (control or â€œas commissionedâ€) condition for normal to reverse and reverse to normal operation, respectively. Note that the currents and voltages begin and end at zero for both directions of operation, but a static force remains following the reverse to normal throw and a different force remains after the normal to reverse throw. It is difficult to compare the measurements taken during induced failure conditions with those from the fault-free condition because of noise in the measurements. 12 Digital Filters Voltage (V) 100 (a) 20 10 Currenta (A) (b) 50 0 0 -10 -50 0 2000 Sample Force (kN) 4000 (c) 6000 -20 0 2000 Sample 4000 (d) 6000 Currentb (A) 40 4 3 20 2 1 0 0 0 2000 Sample 4000 6000 -20 0 2000 Sample 4000 6000 Fig. 7. â€˜As commissionedâ€™ measured signals for the normal to reverse throw 5.2. Filtering the signal One possibility to reduce the noise is by using the SS formulation in (1) as a digital filter capable of reducing observation noise when the measured quantity varies slowly, but additive measurement noise covers a broad spectrum [8], [9]. In this particular case the signal being measured is modeled as a random walk, i.e. it tends to change by small amounts in a short time but can change by larger amounts over longer periods of time. The SS model used for each signal is described by equations (3). xt ï€«1 ï€½ xt ï€« wt ïƒ¼ ïƒ½ zt ï€½ xt ï€« vt ïƒ¾ (3) Q ï€½ E wt2 ; R ï€½ E vt2 Comparing with the general SS equations (1) we have: ï‚§ ï‚§ ï‚§ ï‚§ ï‚§ Variables ï€¨ï€© vt ï€¨ï€© xt , zt , Q, R, wt and are all scalars. Î¦t ï€½ 1; Et ï€½ 1; wt ï€½ wt ; Ht ï€½ 1; Ct ï€½ 1. The initial value given to Ë† Ë† x0 is: x0 ï€½ 0 . The initial value of P0 is chosen to reflect uncertainty in the initial estimate. Here The remaining quantities to be specified are Q, the variance of the noise driving the random walk, and R, the variance of the observation noise. P0 is initialised as P0 ï€½ 10 6 . By empirical methods using simulation, the best filtering is achieved with Q = 0.03 and R = 0.5. Note that the ratio Q/R defines the filter behavior. Digital Filters for Maintenance Management 13 The power spectral density of the filtered motor current (computed only while the motor is running) shows significant energy peaks at 100 and 200 Hz (Fig. 8, where the normalized frequency of 1 corresponds to a frequency of 250 Hz). Power Spectral Density Estimate via Welch 30 20 Power/frequency (dB/rad/sample) 10 0 -10 -20 -30 -40 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Normalized Frequency (ï‚´ï° rad/sample) 0.9 Fig. 8. Motor current power spectral density following Kalman filtering The dynamic model used can be augmented to model the observed interfering signals as narrow band disturbances centred at 100 and 200 Hz. The spectrum of the motor current signal is examined next before a decision on the most appropriate filtering is taken. A spectral analysis of the motor current signal against time (or sample) shows that the characteristic of the noise varies with the operating condition of the motor. From the spectrogram one can identify a small 50 Hz interference signal before the motor begins to turn (samples 1 to 1100). In the second stage, where the motor is turning, the interfering signal has strong 100 Hz and 200 Hz components but no 50 Hz component. In the final stage, the motor current does not have identifiable 50, 100, or 200 Hz components, but is affected by general wideband noise. Power spectral densities (psds) were computed for data selected from each of the three distinct operating regions. There is a 50 Hz interference signal during the first region and wideband noise during the last. Fig. 9 shows the psd for the middle phase, which is the noisiest region. It is possible to augment the SS model to describe the observed interfering signals, using different models for each of the three distinct phases. However, a simpler yet effective smoothing scheme exists, as described in the next section. 14 Digital Filters 40 30 Power/frequency (dB/rad/sample) Power Spectral Density Estimate via Welch 20 10 0 -10 -20 -30 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Normalized Frequency (ï‚´ï° rad/sample) 0.9 Fig. 9. Power Spectral Density estimate (samples 1000 to 4000). 5.3. Smoothing Noting that the sampling rate is 500 Hz and the interfering signals appear at 50, 100 and 200 Hz, an alternative filtering method, or, more correctly, smoothing method, is to compute a moving average of the original signal over a suitable number of samples. For example, computing the moving average with 10 samples has zero response to signals at 50 Hz. However, a 100 Hz signal, with only 5 samples per cycle, is not necessarily removed, depending on the relative phase of the 100 Hz signal and the samples. Removal of the 50 Hz, 100 Hz and 200 Hz interfering signals is guaranteed by computing a moving average over 40 samples, i.e. over a time window of 80 ms. This moving average also spreads an instantaneous motor current change over 80 ms, but this is not a problem in practice as the motor current does not change instantaneously. A moving average computed over 40 samples (80 ms) removes information at 12.5 Hz (and integer multiples thereof) and in addition acts as a general first-order low pass filter with a â€“3 dB point at 5.5 Hz. Losing information around 12.5 Hz is not important as long as comparisons are made between identically processed signals. By suitable alignment of the moving average result, filtering becomes smoothing. The smoothed signals are delayed by 40 ms, but this is of no concern for comparison with similarly processed fault-free signals. There is still some residual 100 and 200 Hz interference, but it is much reduced. Identical smoothing has been applied to all measurement channels, even though they are not equally affected by 50 Hz noise and its harmonics. A comparison of the smoothed signals with the corresponding signals obtained in the fault-free condition is now possible. Digital Filters for Maintenance Management 15 80 Voltage ()V 60 40 20 0 0 N-R (a) Currenta (A) 15 10 5 0 N-R (b) 500 1000 Sample N-R (c) Currentb (A) 0 N-R 500 Sample 1000 (d) 4 Force (kN) 3 2 1 20 10 0 2000 4000 Sample 6000 0 0 500 Time 1000 Fig. 10. Average control curves. N-R: Normal to Reverse Direction 5.4. Results The failure modes listed are identified using a pattern recognition method. The signals obtained in the fault-free condition, smoothed as described above and averaged over five throws, are shown in Fig. 10. The smoothed signals obtained under induced failure modes have been compared to the reference (or control) signals. 70 Control Signal Switch Blocked 1 Malleable Blockage Switch Blocked 2 60 50 Voltage (V) 40 30 20 10 0 0 1000 2000 3000 4000 5000 6000 7000 Sample Fig. 11. A Control signal and Switch Blocked and Malleable Blockage failure modes signals 16 Digital Filters Fig. 11 shows the voltage signals for the failure modes Switch Blocked 1, Switch Blocked 2 and Malleable Blockage, in the normal to reverse direction. Every failure can potentially be detected from signals a, b and c for normal to reverse transitions, and using signals b and c for reverse to normal transitions. Therefore, employing only signal b or c it potentially is possible to detect every fault in both operating directions. 6. Advanced Dynamic Harmonic Regression (DHR) The system developed in this section detects faults by means of comparing what can be considered a â€œnormalâ€ or â€œexpectedâ€ shape of a signal with respect to the actual shape observed as new data become available. One important feature of this system is that it adapts gradually to the changes experienced in the state of the point mechanism. The forecasts are always computed by including into the estimation sample the last point movements and discarding the older ones. In this way, time varying properties of the system due to a number of factors, like wear, are included, and hence the forecasts are adaptive. The data is a signal with long periods of inactivity, mixed up with other short periods where a point movement is being produced. Fig. 12 shows one small part of the dataset in the later case study, where the time axis has been truncated in order to show the movements of the signal. The real picture is one in which the inactivity periods are much longer that those shown in the figure, in a way that the movement periods would appear as thin lines. Signal 0 -2 -4 Signal -6 -8 -10 -12 0 500 1000 Truncated time 1500 2000 Fig. 12. Signal used by the fault detection algorithm. A new signal can be composed exclusively of those time intervals where the point mechanism is actually working. Looking at Fig. 12 it can be devised that even movements (normal to reverse move) have a slightly different pattern than uneven movements (reverse to normal). Therefore, two signals may be formed by concatenating the normal to reverse movements of the point mechanism in one hand, and the reverse to normal moves in the other. Fig. 13 shows one portion of the normal to reverse signal. Digital Filters for Maintenance Management 17 Periodic Signal 0 -2 -4 Signal -6 -8 -10 -12 0 2 4 6 Time (seconds) 8 10 Fig. 13. Signal obtained by concatenation of portions of data where the point mechanism is working. As it is clearly shown in Figure 13, the signal to analyse has strong periodicity and can be then modelled and forecast by a statistical model capable of replicating such behaviour. The period of the signal is exactly the time it takes to the point mechanism to produce a complete movement. Two difficulties arise that should be considered by the model: (i) the sampling interval of the data is not constant, it has small variations produced by the measurement equipment that should be taken into account; and (ii) the frequency or period of the waves changes over time. As a matter of fact, the changes of the period may be considered as a measurement of the wear in the system, as illustrated in Figure 14. Time length of movements 2.4 2.3 Time (seconds) 2.2 2.1 2 0 50 100 150 200 250 Movement index 300 350 Fig. 14. Time the point mechanism spend to produce movements in normal to reverse direction (solid) and reverse to normal (dotted). Fig. 14 shows the 380 time varying periods (or time to produce a complete movement of the mechanism) for the \\\"normal to reverse\\\" and \\\"reverse to normal\\\" signals (the first five data points corresponds to the signal shown in Fig. 13) that constitutes the full data set in the 18 Digital Filters later case study. There were several sudden increases of the period at some points in time due to faults that have been removed from the figure, in order to avoid distortions of the vertical axis. The time axis is on an irregular sampling interval, in order to take into account the moment at which each movement has taken place. It is clear that the period is lower at the beginning of the sample with a rapid increase that tends to come down from the middle of the sample. A similar behaviour is devised in the reverse to normal signal. The fault detection algorithm proposed here in essence would be composed of the following steps: 1. Forecasting next period on the basis of the signal in Figure 14. 2. Forecasting the signal in Figure 13 by a Dynamic Harmonic Regression model that uses the period forecast of the previous step. Assessing forecasts by comparing the forecast of step 2 with the actual signal coming from the sensors installed in the point mechanism. If the forecasts generated in step 2 are too bad (measured by the variance of the forecast error), a fault is detected. The way to assess whether a failure has been produced is by checking the variance of the forecast error above a certain level fixed for each specific point mechanism. 6.1. Step 1: Modeling and forecasting the period Two procedures have been considered: i) VARMA models in discrete time with two signals (the periods for normal to reverse and reverse to normal) modeled jointly; ii) once again a univariate local level model plus noise, but in continuous time. 6.1.1. VARMA model The VARMA (Vector Auto-Regressive Moving-Average) models (see e.g. [1], [18] and [25]) are natural extensions of the ARIMA (Auto-Regressive Integrated Moving Average) models to the multivariate case. One of the simplest but general formulations of a VARMA(p, q) model is Pt ï€½ Ï†1Pt ï€1 ï€« ï‹ ï€« Ï† p Pt ï€ p ï€« v t ï€« Î˜1 v t ï€1 ï€« Î˜ q v t ï€ q where Pt ï€½ p1,t and (4) ï› T p2,t ï is a bivariate signal; v t is a bivariate white noise, i.e. purely random signal with no serial correlation and covariance matrix R ; and Ï†i ( i ï€½1,2,ï‹ , p ) Î˜ j ( j ï€½1,2,ï‹, q ) are squared blocks of coefficients of dimension 2 ï‚´ 2 . r ï€½ max ï€¨ p , q ï€© ) VARMA models admit several SS representation according to equation (1). The one prefered here is (with Digital Filters for Maintenance Management 19 ïƒ¦ Ï†1 I 0 ïŒ ïƒ§ ïƒ§ Ï†2 0 I ïŒ x t ï€«1 ï€½ ïƒ§ ï ïïï ïƒ§ ïƒ§ Ï† r ï€1 0 0 ïŒ ïƒ§Ï† ïƒ¨r 00ïŒ 0ïƒ¶ ïƒ¦ Ï†1 ï€« Î˜1 ïƒ¶ ïƒ§ ïƒ· ïƒ· 0ïƒ· ïƒ§ Ï†2 ï€« Î˜2 ïƒ· ïƒ·v ï ïƒ·x t ï€« ïƒ§ ï ïƒ§ ïƒ·t ïƒ· Iïƒ· ïƒ§ Ï† r ï€1 ï€« Î˜ r ï€1 ïƒ· ïƒ§ Ï† ï€«Î˜ ïƒ· 0ïƒ· rïƒ¸ ïƒ¨r ïƒ¸ z t ï€½ ï€¨I 0 0 ïŒ 0 ï€©x t ï€« v t The model orders p and q can be identified using multivariate autocorrelation and multivariate partial autocorrelation functions. The block parameters, as well as the covariance matrix of the noise, are estimated using Maximum Likelihood. Forecasts are then computed on the basis of the actual data and the estimates of the model parameters, once the model passes a validation process. One of the most important validation tests is the absence of serial correlation in the perturbation vector noise v t (see e.g. [1], [18] and [25]). It is vital that the signals Pt on which all the VARMA methodology is applied should have stationary mean and variance. 6.1.2. Local level model in continuous time The model used for forecasting the period of the next movement (in a particular direction) in this case represents the observation, i.e. the period drifts over time, as wear varies simply because of usage (increases) or by preventive maintenance (decreases). Since the point movements are not produced at equally spaced intervals of time, a continuous-time model should be used. Formally, the continuous time SS model is given by d ïƒ© l ï€¨t ï€©ïƒ¹ ïƒ©0 1ïƒ¹ ïƒ© l ï€¨t ï€©ïƒ¹ ïƒ© w1 ï€¨t ï€©ïƒ¹ ï€« ï€½ dt ïƒª sï€¨t ï€©ïƒº ïƒª0 0ïƒº ïƒªsï€¨t ï€©ïƒº ïƒª w2 ï€¨t ï€©ïƒº ïƒ»ïƒ« ïƒ» ïƒ« ïƒ«ïƒ»ïƒ« ïƒ» Pï€¨t ï€© ï€½ l ï€¨t ï€© ï€« vï€¨t ï€© with ïƒ©q Qï€½ïƒª 1 ïƒ«0 0ïƒ¹ q2 ïƒº ïƒ» . (5) where Pï€¨t ï€© stands for the time varying period that is decomposed into the local level l ï€¨t ï€© and a noise term v ï€¨t ï€© assumed to be white Gaussian noise; w1 ï€¨t ï€© and w2 ï€¨t ï€© are independent white noises. One way to treat the continuous system above is by finding a discrete-time SS equivalent to it (see e.g. Harvey 1989) [15], by means of the solution to the differential equation implied by the system. A change in notation is necessary to convert the system to discrete-time: denote the k th observation of the series z (for k ï€½ ,2,ï‹ , N ) and assume that this observation is made at 1 k time tk. Let t0 ï€½ 0 and ï¤ k ï€½ t k ï€ t k ï€1 , i.e. the time interval between two consecutive measurements. System (3) may be represented by the discrete-time SS system in (5). 20 Digital Filters ïƒ© l k ïƒ¹ ïƒ©1 ï¤ k ïƒ¹ ïƒ© l k ï€1 ïƒ¹ ïƒ© w1,k ïƒ¹ ïƒª s ïƒº ï€½ ïƒª0 1 ïƒº ïƒª s ïƒº ï€« ïƒª w ïƒº ïƒ» ïƒ« k ï€1 ïƒ» ïƒ« 2,k ïƒ» ïƒ« kïƒ» ïƒ« Pk ï€½ l k ï€« v k unchanged as becomes (6) In order to make systems (6) and (5) equivalent, the variances of observational noise is R, but the covariance matrix of the process noise in the state equations ïƒ©1 / 3ï¤ k2 q 2 ï€« q1 1 / 2q 2ï¤ k ïƒ¹ Qk ï€½ ï¤k ïƒª ïƒº q2 ïƒ» ïƒ« 1 / 2q 2ï¤ k (see Harvey 1989, page 487) [15]. If all the data are sampled at regular time intervals, then ï¤ k ï€½ ï¤ and the noise variances are all constant; but if the data is irregularly spaced, as it is in our case, ï¤ k would take into account the irregularities of the sampling process. It is worth noting that the continuous-time model (5) involved system matrices that are all constant and the state noises were all independent of each other with constant variances. Beware that system (6) is written in form (1) and is the only case in this chapter that involves a time variable transition matrix Î¦ k and time variable variance noises that are correlated to each other according to the expression of Q k . 6.2. Step 2: Modeling and forecasting the signal Once the period or the time length of the next movement of the point mechanism is forecast by any of the models in section 5.1., it is necessary to produce the forecast of the signal itself for the next occurrence, in order to produce what should be expected in case of no faults. This is done by a Dynamic Harmonic Regression model (DHR) set up as described below. This model is very convenient in the present situation because it can easily handle the timevarying nature of the movement period. Obviously, the model can also be written in the form of a SS system as in (1). The formula of a DHR with the required properties is shown in equation (7). z k ,t ï€½ ïƒ¥ a i ,k sin ï· i ,k ,t t * ï€« bi , k cos ï· i ,k ,t t * ï€« ek ,t i ï€½1 M ï› ï€¨ ï€© ï€¨ ï€©ï (7) Here, z k ,t is the periodic signal in which the subscript k indicates whether the normal to reverse ( k ï€½ 1 ) or the reverse to normal ( k ï€½ 2 ) signals are being considered; M is the number of harmonics that should be included in the regression to achieve an adequate representation of the signal z k ,t ; a i , k and bi , k are 2M parameters to be estimated, representing the amplitudes of the co-sinusoidal waves; ï· i ,k ,t are frequencies at which the Digital Filters for Maintenance Management 21 sinusoids are evaluated, with ï· i ,k ,t ï€½ 2iï° p k ,t for i ï€½1,2, ï‹ , M and M ï‚³ p k ,t 2 and k ï€½ 1,2 ; ek ,t is a pure random white noise with constant variance. Separate Harmonic Regression models are used for the normal to reverse and reverse to normal signals. There are two key points for the model (7) to be an adequate representation of 1. z k ,t : pk ,t and ï·i ,k ,t have time varying period/frequency. The nature of such variation is dependent on the signal itself. For one full movement of the point mechanism p k ,t is maintained constant and is equal to the time it takes to produce the full movement. This value will be different in the next movement and is modified accordingly. 2. The time index t * is a variable linked to p k ,t that varies from 0 to p k ,t in each movement. Therefore, this variable is reset to 0 as soon as a movement finishes (see Fig. 15. Variables involved in the HR model 2 0 z(t), t*, P(t) -2 -4 -6 -8 -10 -12 100 100.5 101 101.5 102 102.5 Time 103 103.5 z(t) t* P(t) 104 104.5 Fig. 15. Two full movements of the point mechanism, with their associated period and time index according to model (7). Model (7) is then a regression of a signal on a set of deterministic functions of time and therefore all the standard regression theory can be applied, in particular estimates and forecasts can be made quickly. Model (7) have been generalized further by allowing parameters a i , k and bi , k to be time varying, producing a more flexible model, known as a Dynamic Harmonic Regression (DHR; see [21] [26]), but such complications are not found necessary in the case study described later. 6.3. The full fault detection algorithm The full algorithm for fault detection comprises the following steps: 1. Determine which historical data to use. In the later case study the previous 50 freefrom-faults movements of the point mechanism are used to estimate models (4) (5) and (7) at each new movement. 22 Digital Filters 2. A point forecast of the time that it would take the next movement is produced by means of model (4) or (5), together with its 95% confidence interval. In this way, a range of lengths or periods of the next movement are considered. Then, a different forecast of the signal z k ,t is produced for each period forecast in the previous step. Following this a full set of forecasts become available for a time horizon long enough to cover a full movement of the point mechanism. The new data points measured by the system are compared to all the forecasts produced in the previous step. The forecast closer to the actual data measured by the minimum of the standard deviation of the error is then considered to be the best forecast of the signal. If the best forecast is systematically bad, a fault has occurred and the system issues a warning. If the best errors are always low, no faults are detected. The boundary is measured in terms of standard deviation of the errors and such a value has to be adjusted for each particular point mechanism. If no fault is detected, then the data of the latest movement is incorporated into the historical data to be used next time, the oldest movement data being dropped. However, if a fault is detected, the historical data used to perform step 1 for the next movement is unchanged for the next movement. 3. 4. 5. The algorithm can be used in on-line or off-line contexts. For on-line use, step 3 can be repeated as each measurement data point becomes available. For off-line use the algorithm is applied to all the data collected for a full movement of the mechanism. The system requires a couple of values to be fixed by experimentation, namely the alarm limit that can be calculated from the standard deviation of signal z k ,t , and also the number of harmonics to include in the Harmonic Regression ( M in model (7)). Experiments carried out on logged data have been performed to set these two design parameters of the algorithm. The final setting for the standard deviation is 0.4 for the standard deviation, found to give the best discrimination between faulty and non-faulty events; and M ï€½ 62 harmonics for model (7) produces accurate fit and forecasts to the signal. 6.4. Results Standard identification techniques on VARMA models suggested a VARMA(0, 1). Estimation of such a model for the full data set was ïƒ© p1 ïƒ¹ ïƒ© p1 ïƒ¹ ïƒ© v1 ïƒ¹ ïƒ©ï€ 0.89 0.13 ïƒ¹ ïƒ© v1 ïƒ¹ ïƒªp ïƒº ï€½ ïƒªp ïƒº ï€« ïƒª 0 ïƒº ïƒªv ïƒº ï€« ïƒª v ïƒº ï€ 0.71ïƒ» ïƒ« 2 ïƒ» t ï€1 ïƒ« 2 ïƒ» t ïƒ« 2 ïƒ» t ïƒ« 2 ïƒ» t ï€1 ïƒ« ïƒ©3.3 0.9ïƒ¹ Rï€½ïƒª ï‚´ 10ï€ 3 0.9 2.5ïƒº ïƒ« ïƒ» The correlation between the components of the noise vector is 0.3. The relation between the output variables can be more easily seen if the model is written in the form of difference equations, Digital Filters for Maintenance Management 23 p1,t ï€½ p1,t ï€1 ï€ 0.89v1,t ï€1 ï€« 0.13v2,t ï€1 ï€« v1,t p 2,t ï€½ p 2,t ï€1 ï€ 0.71v2,t ï€1 ï€« v2,t The correlation of each variable with its own past is more important that the relation to each other, judging by the coefficients relating both variables and the correlation of noises. Nevertheless, the relation between them is significant and should be taken into account in order to forecast the output variables. The model is adequate in the sense that no serial correlation left in the residuals. One example is shown in Fig. 16. The top panels show the forecast of the periods to use in the DHR models, with the 95% confidence intervals. Such period is the expected length of the next movement, that is the value introduced into the DHR model to forecast the signal itself. The forecast of the signal is shown in the bottom panels, where the dotted lines are the actual values and the solid lines are the final forecast of the system. It is clear that the left case is free from any fault, since the forecast matches perfectly the actual data, while the expected behavior in the right panel is very different to the actual data, implying that a fault has occurred. Forecast of Normal to Reverse P(1, t) 2.1 P(1, t) 2.05 P(1, t) 2.2 2.1 2 Forecast of Normal to Reverse P(1, t) 2 1.95 50 2.3 2.25 55 65 70 75 Movement index Forecast of Reverse to Normal P(2, t) 60 80 70 75 80 85 90 95 100 Movement index Forecast of Reverse to Normal P(2, t) 105 2.3 P(2, t) 2.2 2.1 P(2, t) 2.2 2.15 2.1 50 55 60 65 70 Movement index 75 80 70 75 80 85 90 95 Movement index 100 105 Forecast of z(1, t) 0 z(1, t) z(1, t) 0 -2 -4 Forecast of z(1, t) -2 -4 -6 0 0.5 1 1.5 2 2.5 Time (seconds) 3 3.5 4 -6 0 0.5 1.5 2 2.5 Time (seconds) 3 3.5 4 Fig. 16. Left panels shows results for fault free data. Right panels show results for a faulty signal. Panels in the two first rows show the forecast of VARMA model (from the vertical line on); solid lines show the actual periods and the forecast (smoother line). Panels in bottom row show the forecast of the DHR model with the period forecast in the top panels; solid lines are the actual data, dashed lines are the forecast. 24 Digital Filters Similar results are achieved when the local level model set up in continuous time is used instead (see Fig. 17). Forecast of P(t) 2.1 P(t) Forecast of P(t) 2.2 P(t) 2.1 2 2.05 2 1.95 100 120 140 160 180 200 220 Time (seconds/1000) Forecast of z(t) 240 260 280 200 250 300 350 400 450 Time (seconds/1000) Forecast of z(t) 500 550 0 z(t) 0 z(t) -2 -4 0 0.5 1 1.5 Time (seconds) 2 -2 -4 -6 -6 0 0.5

Part of the book: Digital Filters