## 1. Introduction

Event-driven controlled systems based on the Programmable Logic Controller (PLC) are widely used in many industrial processes. The number of such a control system is said to occupy more than eighty percent of the entire existing control systems. Nowadays, the demands for production facilities are shifting from the high speed and highly efficiency to the safety and high reliability. In order to meet these requirements, several strategies for fault diagnosis of systems and the design of recovery procedure have been proposed.

In the case of considering the PLC-based control systems, since they have discrete and event-driven characteristics inherently, system models based on discrete-event-system description give more efficient diagnostic algorithm than those based on continuous-time systems (for surveys cf. (A. Darwiche & G. Provan (1996); D. N. Pandalai& L. E. Holloway (2000); M. Sampath et al. (1995); S.H.Zad et al. (1999))). This aspect will be more emphasized when the number of components would be large. Based on these considerations, Lunze proposed a centralized fault diagnosis framework based on the system model with Timed Markov Model (TMM) (J.Lunze (2000)). This method especially becomes useful when numerous number of input and output data are collected through daily operation since the TMM is based on a stochastic expression of time interval between successive events. This approach also has some robustness against unevenness underlying in the ordinary production facilities. However, this kind of centralized diagnosis strategies will cause an explosion of the computational burden when they are applied to the large scale systems. In this case, the decentralized approach is highly recommended wherein the diagnosis is performed by each diagnose together with the communication with other diagnosers (O.Contant (2006); S.Debouk (2000); R.Su et al. (2002)). These approaches, however, were based on the deterministic model.

Based on these backgrounds, the authors (S.Inagaki et al. (2007)) proposed a decentralized stochastic fault diagnosis strategy based on a combination of TMM and Bayesian Network (BN). The BN represents the causal relationship between the fault and observation in subsystems. Since the decentralized diagnosis architecture distributes the computational burden for the diagnosis to the subsystems, a large scale diagnosis problems in real-world application can be solved. In the decentralized approach, the computational burden and the diagnosis performance strongly depend on the complexity of the graph structure of BN.

This chapter also addresses a design method of the graph structure of the BN in decentralized stochastic fault diagnosis of (S.Inagaki et al. (2007)) based on the control logic implemented on the system. For example, an actuator speed reduction affects on the (timed) event sequences observed by the sensors allocated in the subsystems. The effects of this type of fault on other subsystems depend on the control logic wherein the observed event signal is used as an firing condition of the actuators in other subsystems. Thus, the coupling in the control logic over subsystems must be considered in the design of the graph structure of BN. In order to formally realize this idea, the Sensor Actuator Dependency (SAD) graph and the Dependency Tree (DT) are constructed from the control logic in our strategy. The resulting DT represents the hierarchy of the causal relationship between the components in the system. Therefore, by specifying the level of hierarchy appropriately, the graph structure of BN with different level of complexities can be designed.

The remaining part of this chapter is organized as follows: In section 2, we define the problem statement of decentralized fault diagnosis. In section 3, we overview the entire strategy of the fault diagnosis based on BN with a simple example. In section 4, local diagnosis based on TMM is introduced and, in addition, the calculation results of the local diagnosers are combined based on BN. Section 5 shows the procedure of the proposed decentralized diagnosis. In section 6, estimation strategy of probability distribution functions (PDF) which is used in the local diagnosis is introduced based on maximum entropy principle (M.Saito et al (2006)). In section 7, the usefulness of the stochastic decentralized fault diagnosis is verified through some experimental results of an automatic transfer line which is widely used in the industrial world. Section 8 proposes a design method of the graph structure of BN, and, in section 9, the decentralized fault diagnosis is applied to the automatic transfer line, while the system scale is larger than that in section 7, with trying some BN structures which are constructed based on the proposed design method. Section 10 concludes this chapter.

## 2. Problem statement

First, we assume that the controlled system can be divided into *n* subsystems in consideration of the architecture of the hardware and/or software. Furthermore, the output (event) sequence, which corresponds to the series of the ON/OFF of sensors and actuators, can be observed in each subsystem. Then, the event sequence for the *k*-th subsystem (*t*_{h}) is defined as follows:

where is the *H*-th event and is the occurrence time of the *H*-th event in the *k*-th subsystem. In addition, the *κ*-th fault in the *k*-th subsystem is represented by, and a combination of faults for all subsystems is defined as “r–combination of faults for the entire system.” The set of r–combination of faults for the entire system is defined bellow:

This paper deals with the following diagnosis problem:

### 3. Global diagnosis based on Bayesian network

Bayesian Network (BN) is a probabilistic inference network which expresses qualitative causal relations between some random variables by a graph structure together with the conditional probability assigned to each arc (E.Castillo et al. (1997)).

In this section, the proposed global diagnosis method is explained. First, two types of random variables are defined. The first one is *R*^{k} which takes (*κ* {0, 1, …,*K*}) as a realization. The second one is the *E*^{k} which takes the observed event sequence as a realization. In the BN, the causal relationship between these random variables are defined using a graph structure wherein each node corresponds to each random variable. For the purpose of the fault diagnosis, we restrict the structure of the BN in the bipartite graph. One subset consists of the set of *R*^{k}s, and the other subset consists of the set of *E*^{k}s (Fig.1). We also assume that there are no causal relationship between nodes in the same subset. The development of an appropriate graph structure must be made by considering the physical and logical interactions between subsystems. The fault diagnosis can be realized by calculating the occurrence probability of each fault conditioned by the observed event sequence.

Figure 2 shows the example of the BN for fault diagnosis. The occurrence probability of the fault in the subsystem 1 can be systematically calculated as follows: First, the joint probability distribution (JPD) is uniquely decided based on the graph structure.

Then, the occurrence probability of the fault in the subsystem 1 is calculated by marginalizing the JPD. For example, the fault occurrence probability of the fault in the subsystem 1 is calculated as follows:

where *Z* is normalized term and is represented as (5).

In (4), the term represents the conditional probabilities assigned to the corresponding arc. This conditional probability can be calculated using the local diagnosis results and the Bayesian estimation (see section 4.3 for detail). Also, the prior probabilities (for example *P*(*R*^{1} =) in (4) are supposed to be given in advance. See section 7.4 for another example.

### 4. Local diagnosis based on TMM

### 4.1. Timed Markov model

For the local diagnosis, the relationship between two successive events observed in the corresponding subsystem are represented by means of Timed Markov Model (TMM). The TMM is one of the Markov model wherein the state transition probabilities depend on time. In other words, state transition probabilities vary according to the time interval between two successive events. In the following, representation of the event driven system based on the TMM is briefly described (J.Lunze (2000)).

First of all, the set of fault random variables which are connected to the random variable *E*^{k} is defined and denoted by. Then, a combination of these realizations is defined as “r^{k}– combination of faults for the *k*-th subsystem.” Furthermore, the set of these is denoted by ^{k} = {r^{k} =. Roughly speaking, r^{k} consists of the realization of the faults which affect on the measurement of the *k*-th subsystem *E*^{k}. For example, in Fig.2, , and. Based on definition of the r^{k}, the following two functions are defined to specify the stochastic characteristics in the TMM.

Definition 1 *A probability density function* (PDF) *represents a probability density function for the time interval τ*
^{k}
*under the situation that the fault* r^{k}
*exists. Note that* τ^{k}
*is a time interval between two successive events*
*and*
*in*
*the*
*k*-*th subsystem.*

Definition 2 *A probability distribution function*
*represents a probability distribution function that the event*
*dose not occur within τ*
^{k}
*after event*
*has occurred under the situation that the fault* r^{k}
*exists.*
*is represented by integrating*.

where some symbols are defined as follows:

: H-th event in the k-th subsystem: Occurrence time of event t_{h} : Sampling time indexτ ^{k} : Waiting time from the occurrence of the latest event in the k-th subsystem (τ ^{k} = t_{h} –)^{k} : Set of events that occur in the k-th subsystem

Then, relationship between two successive events observed in the subsystem can be described by specifying the probability distribution functions. This function plays an essential role in the TMM based modeling and diagnosis. Section 6 shows an effective estimation method of the probability distribution functions.

### 4.2. Local diagnosis method

The goal of the local diagnosis is to find the following fault occurrence probability based on the observation only of the k-th subsystem:

Equation (8)represents an occurrence probability of the r^{k} conditioned by the observation in the k-th subsystem (t_{h}). For the calculation of (8), the recursive algorithm has been developed in (J.Lunze (2000)). First, the following two cases must be distinguished:

Case(a): There is no event at time t_{h}

Case(b): The (H + 1)-th event occurs at time t_{h}

Fig. 3 shows relations between time and events in the cases (a) and (b). The diagnosis begins with no information on the existence of the fault, i.e. the initial probabilities are given by

where denotes the number of realizations in ^{k}. Next, an auxiliary function is calculated as follows:

Case(a) : No event is observed at time t_{h}

Case(b) : The (H + 1)-th event occurs at time t_{h}

The fault occurrence probability given by (8) is updated by

### 4.3. Calculation of conditional probability in the BN

In the global diagnosis, the calculation of the conditional probability was the key computation (see (4) as an example). The conditional probabilities assigned to each arc (appearing in the marginalized JPD) in the BN can be calculated using (8) and Bayes theorem as follows:

where the prior probability is given in advance. Note that the probability P(E^{k} = (t_{h})) is not required to be calculated in advance because it is canceled out in (4). This equation implies that the global diagnosis can be executed by integrating results of the local diagnosis.

#### 5. Diagnosis procedure

The procedure of the proposed decentralized diagnosis is depicted in Fig.4. First of all, observe the event sequence in each subsystem. Second, perform the local diagnosis in each subsystem based on the observed event sequence and calculate the conditional probabilities in the BN using (13). Then, calculate the fault occurrence probabilities by means of the BN (global diagnosis). Finally, select the greatest probability among all fault candidates in each subsystem. The diagnosis result for the k-th subsystem is the fault that satisfies the following equation in the case that the fault candidates for the k-th subsystem are.Diagnosis Result for the k-th subsystem

## 6. Estimation of probability density function by maximum entropy principal

As described in the preceding sections, it is required to estimate all probability distribution functions (PDF) in advance for modeling the system based on TMM, where the

superscript k representing subsystem k is omitted for simplicity in this section. One of the most straightforward way to do it is to collect numerous number of output sequences, and generate the histogram of the time interval of all two successive events for various situations such as normal or some kind of faulty. In the real application, it is not necessary to collect data for all situations in advance. When some new fault occur, then the new observed data for the new fault can be simply added to the old database as for the PDF. Thus, the PDF can be updated according to the occurrence of the new fault.

Although, the PDF can be estimated by collecting the observed output sequence, when we consider to use it as the system model, we often face the zero frequency problem which leads to incorrect result in the system diagnosis based on TMM. In order to overcome this problem, the maximum entropy principle (M.Saito et al (2006)) is introduced in this section. It enables us to find the PDF, which maximizes the entropy with keeping the stochastic characteristics of the collected observed data (i.e. the histogram). The remaining part of this section is devoted to describe the estimation procedure for PDF by means of the maximum entropy principle.

First of all, a histogram is created based on observed data. Then, a range of τ, is quantized into n equal intervals under the assumption that all unknown data exists in where μ and σ are mean value of the observed data and standard deviation, respectively.

Second, let {τ_{1},τ_{2}, …,τ_{n}} be the center of each interval, and let be the probabilities corresponding to the points {τ_{1},τ_{2}, …,τ_{n}}. The example of this quantization is illustrated in Fig.5.

Finally, we solve the following entropy maximization problem:

subject to

where a_{j}(= E[(τ ) ^{j}]) is the j-th moment obtained from the observed data. This problem can be solved by applying the Lagrange multiplier method, and the solution has a form given by

where λ_{0} is given by (18) and λ_{1},...,λ_{m} are the Lagrange multipliers corresponding to the m constraints.

The estimated PDFs are applied to the interval. For the outside of the range, probabilities are set to be zero and ( in normal and faulty situations, respectively.

Fig.6 and Fig.7 show PDF examples constructed by observed data in a transfer machine (see section 7 for details). Then, several moment constraints given by (16) were specified by using the histogram. In these examples, 1st and 2nd moments were considered. The problem of entropy maximization (15) was solved by using the Lagrange multiplier method. Estimated PDF are given by (19) and (20), respectively, where ( is 0.01. Thick solid line in Fig.6 and Fig.7 represent the estimated PDF.

### 7. Application to automatic transfer line

### 7.1. Automatic transfer line

In this section, the proposed diagnosis procedure is applied to the automatic transfer line depicted in Fig.8. This type of machine is widely used in industrial world.

Fig.9 shows the diagram of the developed prototype transfer line shown in Fig.8. This system transfers works to the unload station by means of two belt-conveyors (L1, L2: their length are 50cm) and two cranes (C1, C2). Sensors (S1-S6) are installed at the beginning, end and center of the conveyors and the sensor S7 is installed at the unload station. The events are observed when the work crosses the sensors, and are depicted also in Fig.9 superimposing on the automatic transfer line.

The transfer line system is decomposed into the four subsystems (Lane1, Crane1, Lane2, Crane2) as shown in Fig.10. The set of events observed in each subsystem is specified in Table 1.

### 7.2. Candidates of fault

We consider the candidates of fault in each subsystem specified in Table 2. Note that it is unlikely that these faults are diagnosed using deterministic approach.

For the Lane1 and Lane2, the “normal” implies the case that the speed of the belt-conveyor is between 7.8cm/sec and 8.6cm/sec, and the “Speed of the belt-conveyor is reduced” implies the case that the speed of the belt-conveyor goes down between 7.0cm/sec and 7.8cm/sec. Faults and may come from a fatigue of the actuator. The “Sensor does not respond with probability of 50%” may occur by means of a defective wiring and so on. This corresponds to the early stage of the fatal fault wherein the sensor does not respond at all. Thus, 3 ( 1 ( 3 ( 2 = 18 fault cases are investigated for the entire system including cases that some faults occur simultaneously in some subsystems.

### 7.3. Experimental conditions

Experimental conditions are specified as follows:

Works are provided to the line with almost constant intervals (about 5 sec).

Works do not exist in the system at time t_{h} = 0.

The experiment is finished if ten works are transferred to the unload station.

A sampling time for observation of events is 0.1 sec.

Under these experimental conditions, the event sequences are collected. The probability density functions (PDFs) for every combination of two successive events in each subsystem are estimated before fault diagnoses. The PDFs are estimated through eighty trials per each fault case in advance.

### 7.4. Graph structure

As mentioned in section 3, two types of random variables are defined and specified as nodes in the BN. The first one is R^{k} which takes (κ {0, 1, …,K}) as a realization. The second one is the E^{k} which takes the observed event sequence as a realization. In this application, a graph structure depicted in Fig.11 is adopted under the consideration that the faults occurred in the k-th subsystem influence on the event sequences observed in the (k – 1)-th and the k-th subsystem. Generally speaking, the graph structure should be designed from viewpoints of the computational burden for the diagnosis and the hardware / software interactions between subsystems. Development of the formal procedure for the generation of the graph structure is now under investigation.

The JPD is calculated based on Fig.11 as follows:

Then, the probabilistic inference based on the BN becomes possible by marginalizing the JPD. For example, the occurrence probability of the fault in the subsystem 1 is calculated as follows:

where Z_{1} is normalized term, and is represented by

### 7.5. Results of fault diagnosis

#### 7.5.1. Faultless case

Fig.12 shows the profiles of the fault occurrence probability in each subsystem wherein no fault has occurred in the entire controlled system. The result in the subsystem 2 (Crane 1) is eliminated because no fault has considered in the subsystem 2. In Fig.12 (a) to (c), the probability of the “normal ()” becomes almost 1 before 45 sec in all subsystems, and this result lasts until the experiment is completed. This implies that the result of the diagnosis for all subsystems are “normal ()”, and agrees with the actual situation of the system.

### 7.5.2. Multiple faulty case

Fig.13shows the profiles of the fault occurrence probability in each subsystem wherein the faults, and have occurred at a certain time (no fault has occurred in the subsystem 2). In Fig.13 (a), (b) and (c), the vertical lines represent the time instants when the faults, , and occurred, respectively. In the subsystem 1 (Fig.13(a)), the probability of the fault goes up every time when the fault occurs, and shows the greatest probability when the experiment is completed. As the result, the fault can be uniquely identified in the subsystem 1. Furthermore, in the subsystem 3 (Fig.13 (b)) and subsystem 4 (Fig.13 (c)), the faults and can be identified successfully a few seconds after each fault has occurred. These results show that the diagnosis results completely agree with the actual faulty situation.

### 7.5.3. Comparison with centralized method

We have performed the experiments seven times for each fault, i.e., the total number of the trials is 718=126. The statistics of the diagnosis results are listed in Table 3 together with the statistics of the centralized approach (M.Saito et al (2006)) (i.e. the system is not decomposed). In Table 3, the “Success Rate” means the rate that the all diagnosis results coincide with the actual fault situation, the “Wrong Diagnosis Rate” means the rate that at least one of the subsystems had wrong diagnosis result, and the “Undetection Rate” means the rate that the diagnosis result was “normal” in spite of existence of the fault. The success rate of proposed decentralized diagnosis is 81%. This is reduced by 13% compared with the conventional centralized method. This reason is considered that the direct relationships (arcs) between R^{k} and R (k ≠ ), E^{k} and E (k ≠ ) are ignored. However, using the proposed decentralized strategy can distribute the computational burden for the diagnosis to the subsystems with sacrificing the small degradation of the success rate. The appropriate selection of the graph structure in the BN will lead to the increase of the success rate.

## 8. Design of graph structure

In this section, the graph structure of the BN is designed based on the control law applied to the controlled system. The design procedure is explained step by step with an example.

The controlled system is defined by three tuples as follows:

where S is the set of sensors, A is the set of actuators, and C is the set of control laws. The system is divided into subsystems:

where A^{k} and S^{k} are the set of actuators and sensors included in the k-th subsystem, respectively. In addition, C^{k} is the set of control laws relevant to A^{k}. Figure 14 shows the diagram of the developed prototype transfer line. This system transfers works to the unload station by means of six actuators; four lanes (Lane1-Lane4; their length are 50 cm) and two cranes (Crane1, Crane2). Sensors (S1-S12) are installed at the beginning, end and center of the lanes, and the sensor S13 is installed at the unload station. The events depicted in Fig.14 are observed when the work crosses the sensors. The transfer line system is decomposed into six subsystems as shown in Fig.14. The set of events observed in each subsystem is specified in Table 4.

The control lows applied to the system are summarized as follows:

Each lane is interlocked by its terminal sensor, i.e., stops when the terminal sensors (S3, S6, S9 and S12) are fired.

The lane continues to behave in the absence of the interlock stop.

Lane3/Lane4 stop when Crane1 is moving down at the position S7/S10.

Each crane starts to move and to transfer a work when a work reaches at the terminal sensor.

Crane1 transfers a work from Lane1 or Lane2 to Lane3 or Lane4.

Crane2 transfers a work from Lane3 or Lane4 to the unload station.

The crane transfers a work to the nearest lane which is available.

These control lows can be described using the form of a ladder logic ?. For example, Fig.15shows the ladder logic of the C^{1} wherein the operating situation of the Lane1 (L1) is expressed by L1 = (X L1) . In other case, the logic of the C^{4} wherein the operating situation of the Lane3 (L3) is expressed by. This is due to the logic that Lane3/Lane4 stop when Crane1 is moving down at the position S7/S10.

Based on this logical relationship between sensors and actuators, the causal relationships between sensors and actuators are extracted and expressed by a sensor actuator dependency (SAD) graph by using the following algorithm:

An example of the SAD graph constructed from the control logic is shown in Fig.16. In the next step, a dependency tree (DT) is produced from the SAD graph by the following algorithm:

An example of the DT produced from Fig.16 is shown in Fig.17. In the last step, the structure of BN is designed from the DT by the following algorithm:

In this algorithm, the parameter L is a depth of the DT and represents a threshold to take into consider the causal relationship between the subsystems into the graph structure of the BN. Figure 18 is the resultant graph structure when L = 2 for the DT in Fig.17. In Fig.18, for example, there exist arcs from R^{6} to E^{3}, E^{4}, E^{5}, and E^{6} because S^{3}, S^{4}, S^{5}, and S^{6} are included within Level 2 in Fig.17. Note that although the DT in Fig.17 starts from the actuator, a DT which starts from the sensor is simply constructed by straightforward modification of Algorithm 2.

## 9. Experimental verification

In this section, the decentralized diagnosis procedure is applied to the automatic transfer line depicted in Fig.14. The diagnosis procedure is executed by means of three graph structures. Graph structure 1 depicted in Fig.18 is derived in Section 8. Graph structure 2 depicted in Fig.19 considers all causal relationships, i.e., L = in the DT. Graph structure 3 depicted in Fig.20 represents the completely independent diagnosis.

### 9.1. Candidates of fault

We consider the candidates of fault in each subsystem specified in Table 5. For the lane, the “normal” implies the case that the speed is between 7.8 cm/sec and 8.6 cm/sec, and the “Speed of the lane is reduced” implies the case that the speed goes down between 7.0 cm/sec and 7.8 cm/sec. Faults and may come from a fatigue of the actuator. For the crane, the “Speed of the crane is reduced” implies the case that it takes 0.2 more seconds

than the “normal” situation to transfer a work to the destination lane. Thus, 1 2 2 1 2 2 = 16 faulty cases are investigated for the entire system including cases that some faults occur simultaneously among some subsystems.

### 9.2. Experimental conditions

Experimental conditions are specified as follows:

Works are provided to the Lane1 and Lane2 alternately with almost constant intervals (about 5 sec).

Works do not exist in the system at time t_{h} = 0.

The experiment is finished when twenty works are transferred to the unload station.

A sampling time for observation of events is 0.1 sec.

All the prior probabilities (i = 1, 2, …, m) in (13) are set to beThis means that no statistical information about the faults has not been used for the diagnosis. Under these experimental conditions, the event sequences are collected. The probability density functions (PDFs) for every combination of two successive events in each subsystem are estimated before the fault diagnosis. The PDFs are estimated through fifty trials per each faulty case in advance. The calculation of the diagnosis was performed by personal computers (Pentium 4 2.39 GHz).

### 9.3. Results of fault diagnosis

We have performed the experiments ten times for each faulty case, i.e., the total number of the trials is 10 16 = 160. The statistics of the diagnosis results are listed in Table 6. In Table 6, the “Success Rate” means the rate that the all diagnosis results coincide with the actual faulty situation, the “Wrong Diagnosis Rate” means the rate that at least one of the subsystems had wrong diagnosis result, and the “Undetection Rate” means the rate that the diagnosis result was “normal” in spite of existence of the fault.

The success rate of the graph structure 1 and 2 are both increased compared with the structure 3. This is due to the consideration of the causal relationships between subsystems. The structure 2 is better than the structure 1 from viewpoint of the success rate, however, the number of PDFs of the structure 1 is almost half of that of the structure 2. Since the number of the PDFs is related with the computational burden for the real-time inference, the structure 1 can be realized with less computational burden than the structure 2. The computing time shown in Table 6 is the total required time to diagnose the 150.5 [sec] data. These times were obtained from the maximum computing time of each local diagnoser and the computing time of the global diagnoser as shown in Fig.21 (in the case of the structure 1). The computation of the local diagnosers is dominate in the computation of entire

### 10. Conclusions

This paper presented a design method of the graph structure of the Bayesian Network (BN) in the decentralized stochastic fault diagnosis of large-scale event-driven controlled systems. First, in order to estimate the probability density functions of the randomized time intervals, the maximum entropy principle was introduced, which can estimate probability density functions so as to maximize the uniformity with satisfying the constraints caused by observed data.

Second, the controlled plant was decomposed into some subsystems, and the global diagnosis was formulated using the Bayesian Network (BN), which represents the causal relationship between the fault and observation between subsystems.

Third, the local diagnoser was developed using the conventional Timed Markov Model (TMM), and the local diagnosis results were used to specify the conditional probability assigned to each arc in the BN. By exploiting the decentralized diagnosis architecture, the computational burden for the diagnosis can be distributed to the subsystems. As the result, large scale diagnosis problems in the practical situation can be solved.

Forth, the graph structure of the BN is designed based on the control logic applied to the system. In order to realize this, the Sensor Actuator Dependency (SAD) graph and the Dependency Tree (DT) are constructed from the control logic. Since the computational burden and the diagnosis performance mainly depend on the complexity of the graph structure of BN, they are adjusted adequately by specifying the depth of the DT which represents the strength of the causal relationship between components in subsystems.

Finally, the usefulness of the proposed strategy has been verified through some experimental results of an automatic transfer line. Our future work is to verify the decentralized stochastic fault diagnosis strategy in larger scale event-driven controlled systems.