Example FMEA Table
1. Introduction
Risk assessment has been used to analyze a wide range of industries to determine vulnerabilities with the ultimate purpose of eliminating the sources of risk or reducing them to a reasonable level. The purpose of this chapter is to show how risk assessment tools can be used to develop risk models of aviation maintenance tasks. Two tools will be discussed in this chapter, though many other methods exist. The tools discussed in this chapter are:
Failure Mode and Effect Analysis (FMEA)
Event and Fault Tree Analysis
Ostrom and Wilhelmsen (2011) discuss a wide range of risk assessment tools and this book provides many examples of how these tools are used to analyze various industries.
2. Failure mode and effect analysis
An FMEA is a detailed document that identifies ways in which a process or product can fail to meet critical requirements. It is a living document that lists all the possible causes of failure, from which a list of items can be generated to determine types of controls or where changes in the procedures should be made to reduce or mitigate risk. The FMEA also allows procedure developers to prioritize and track procedure changes (Mil Std 882B, C, 1984 and 1993). The process is effective because it provides a systematic way of evaluating a system or, in this instance, a procedure. It provides a means for identifying and documenting:
Potential areas of failure in process, system, component, or procedure.
Potential effects of the process, system, component, or procedure failing.
Potential failure causes.
Methods of reducing the probability of failure.
Methods of improving the means of detecting the causes of failure.
Risk ranking of failures, allowing risk informed decisions by those responsible.
A starting point from which the control plan can be created.
FMEA can be used to analyze:
Process: Documents and addresses failure modes associated with the manufacturing and assembly process.
Procedure: Documents and addresses failure points and modes in procedures.
Software: Documents and addresses failure modes associated with software functions.
Design: Documents and addresses failure modes of products and components long before they are manufactured and should always be completed well in advance of prototype build.
System: Documents and addresses failure modes for system and subsystem level functions early in the product concept stage.
Project: Documents and addresses failures that could happen during a major program.
A procedure analysis will be used to demonstrate how an FMEA can be conducted. An FMEA is conducted on a step-by-step basis. Table 1 shows an example of an FMEA table. The following constitutes the steps of an FMEA. These steps will be illustrated by use of an example.
Item | Potential Failure Mode | Cause of Failure | Possible Effects | Probability | Criticality (Optional) | Prevention |
Step in procedure, part, or component | How it can fail: pump not working; stuck valve; no money in a checking account; broken wire; software error; system down; reactor melting down | What caused the failure: broken part; electrical failure; human error; explosion; bug in software | Outcome of the failures: nothing; system crash; explosion; fire; accident; environmental release | How possible is it: can use numeric values (0.1, 0.01, or 1E-5) or a qualitative measure (negligible, low probability, high probability) | How bad are the results: can use a dollar value ($10, $1,000, or $1,000,000) or a qualitative measure (nil, minimal problems, major problems) | What can be done to prevent either failures or results of the failures? |
The first step is to create a flow diagram of the procedure. This is a relatively simple process in which a table or block diagram is constructed that shows the steps in the procedure. Table 2 shows the steps for checking an engine chip detector. Note that this is a simple example and not an exhaustive analysis. Table 3 lists the major, credible failures associated with each step in the process. Table 4 shows the effect of the potential failures. Table 5 shows the complete FMEA for the task.
FMEA is a relatively simple, but powerful tool and has a wide range of applicability for analyzing aircraft maintenance tasks.
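The FMEA columns described above map naturally onto a simple record type. The following is a minimal sketch, not part of the original analysis: the `FmeaRow` class, the `rank_rows` helper, and the numeric ordering of the qualitative labels are all assumptions introduced for illustration. The three sample rows are taken from the chip detector example, with effects as listed in Table 4.

```python
from dataclasses import dataclass

# Hypothetical ordinal scales. The chapter uses qualitative labels but does
# not define a numeric ranking, so these orderings are assumptions.
PROBABILITY_RANK = {"Very Low": 0, "Low": 1, "Moderate": 2, "High": 3}
CRITICALITY_RANK = {"Not Critical": 0, "Critical": 1}


@dataclass
class FmeaRow:
    """One row of an FMEA table, using the columns of Table 1."""
    item: str
    failure_mode: str
    cause: str
    effects: str
    probability: str
    criticality: str
    prevention: str


def rank_rows(rows):
    """Sort rows so the most critical, most probable failures come first."""
    return sorted(
        rows,
        key=lambda r: (CRITICALITY_RANK[r.criticality],
                       PROBABILITY_RANK[r.probability]),
        reverse=True,
    )


rows = [
    FmeaRow("Drain Oil", "No major failures that affect process outcome",
            "AMT fails to perform task", "Delay in performing task.",
            "Very Low", "Not Critical", "Ensure AMTs follow work schedule"),
    FmeaRow("Replace Oil", "AMT fails to properly replace oil",
            "AMT fails to properly perform task", "Engine could fail.",
            "Low", "Critical",
            "Training, procedures, and inspection oversight"),
    FmeaRow("Clean Chip Detector", "AMT fails to properly clean chip detector",
            "AMT fails to properly perform task",
            "Debris could be placed back into engine.",
            "Moderate", "Critical",
            "Training, procedures, and inspection oversight"),
]

for row in rank_rows(rows):
    print(f"{row.item}: {row.probability} / {row.criticality}")
```

Sorting on criticality first and probability second is only one possible ranking rule; an organization would substitute whatever risk matrix it actually uses.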
3. Event tree and fault tree analysis
An event tree is a graphical representation of a series of possible events in an accident sequence (Vesely et al., 2002). This approach assumes that each event has only two outcomes, failure or success. A success ends the accident sequence, and the postulated outcome is that the sequence either terminated successfully or was mitigated successfully. For instance, a fire starts in an engine. This is the initiating event. The automated system then closes the fuel feed. If the lack of fuel does not extinguish the fire, the next step is that the fire suppression system is challenged. If the fire suppression system actuates, the fire is suppressed and the event sequence ends. If the fire suppression system fails, the fire is not suppressed and the accident sequence progresses. Table 6 shows this postulated accident sequence. Figure 1 shows this accident sequence in an event tree.
As in most risk assessment techniques, probabilities can be assigned to the events and combined using the appropriate Boolean logic to develop an overall probability for each path in the event tree. Using our example from above, we will now add probabilities to the events and show how the probabilities combine for each path. Figure 2 shows the addition of path probability to the event tree.
Inspecting Chip Detector | |
Process Steps | Major Failures |
Cut and Remove Lock Wire from Oil Drain Plug | No major failures that affect process outcome |
Remove Oil Drain Plug | No major failures that affect process outcome |
Drain Oil | No major failures that affect process outcome |
Cut and Remove Lock Wire from Chip Detector | No major failures that affect process outcome |
Remove Chip Detector | Improper removal can remove debris from chip detector and cause false reading. Chip detector can be damaged if improperly removed. |
Examine Chip Detector | Aircraft Maintenance Technician (AMT) fails to notice debris on chip detector. |
Clean Chip Detector | AMT fails to properly clean chip detector |
Replace Chip Detector | AMT fails to properly install chip detector |
Lock Wire Chip Detector | AMT fails to properly lock wire chip detector |
Replace Oil Drain Plug | AMT fails to properly install oil drain plug |
Lock Wire Oil Drain Plug | AMT fails to properly lock oil drain plug |
Replace Oil | AMT fails to properly replace oil |
Inspecting Chip Detector | ||
Process Steps | Potential Failure Modes | Potential Failure Effects |
Remove Chip Detector | Improper removal can remove debris from chip detector and cause false reading. Chip detector can be damaged if improperly removed. | Engine could fail if chips are not properly detected. Added cost to replace damaged chip detector. |
Examine Chip Detector | Aircraft Maintenance Technician (AMT) fails to notice debris on chip detector. | Engine could fail if chips are not properly detected. |
Clean Chip Detector | AMT fails to properly clean chip detector | Debris could be placed back into engine. |
Replace Chip Detector | AMT fails to properly install chip detector | Oil could leak past chip detector. Threads of chip detector could be damaged. |
Lock Wire Chip Detector | AMT fails to properly lock wire chip detector | Chip detector could become loose and fall out, leading to loss of engine oil. |
Replace Oil Drain Plug | AMT fails to properly install oil drain plug | Engine oil could leak out. Oil drain plug could become damaged. |
Lock Wire Oil Drain Plug | AMT fails to properly lock oil drain plug | Oil drain plug could become loose and fall out. Oil drain plug could become damaged. |
Replace Oil | AMT fails to properly replace oil | Engine could fail. |
Procedure Step | Potential Failure Mode | Cause of Failure | Possible Effects | Probability | Criticality | Prevention |
Cut and Remove Lock Wire from Oil Drain Plug | No major failures that affect process outcome | AMT Fails to Perform Task | Delay in performing task. | Very Low | Not Critical | Ensure AMTs follow work schedule |
Remove Oil Drain Plug | No major failures that affect process outcome | AMT Fails to Perform Task | Delay in performing task. | Very Low | Not Critical | Ensure AMTs follow work schedule |
Drain Oil | No major failures that affect process outcome | AMT Fails to Perform Task | Delay in performing task. | Very Low | Not Critical | Ensure AMTs follow work schedule |
Cut and Remove Lock Wire from Chip Detector | No major failures that affect process outcome | AMT Fails to Perform Task | Delay in performing task. | Very Low | Not Critical | Ensure AMTs follow work schedule |
Remove Chip Detector | Improper removal can remove debris from chip detector and cause false reading. Chip detector can be damaged if improperly removed. | | Engine could fail if chips are not properly detected. Added cost to replace damaged chip detector. | | | |
Examine Chip Detector | AMT fails to notice debris on chip detector. | AMT Fails to Properly Perform Task | Engine could fail if chips are not properly detected. | Moderate | Critical | Training, procedures, and inspection oversight |
Clean Chip Detector | AMT fails to properly clean chip detector | AMT Fails to Properly Perform Task | Debris could be placed back into engine. | Moderate | Critical | Training, procedures, and inspection oversight |
Replace Chip Detector | AMT fails to properly install chip detector | AMT Fails to Properly Perform Task | Oil could leak past chip detector. Threads of chip detector could be damaged. | Moderate | Critical | Training, procedures, and inspection oversight |
Lock Wire Chip Detector | AMT fails to properly lock wire chip detector | AMT Fails to Properly Perform Task | Chip detector could become loose and fall out, leading to loss of engine oil. | Moderate | Critical | Training, procedures, and inspection oversight |
Replace Oil Drain Plug | AMT fails to properly install oil drain plug | AMT Fails to Properly Perform Task | Engine oil could leak out. Oil drain plug could become damaged. | Moderate | Critical | Training, procedures, and inspection oversight |
Lock Wire Oil Drain Plug | AMT fails to properly lock oil drain plug | AMT Fails to Properly Perform Task | Oil drain plug could become loose and fall out. Oil drain plug could become damaged. | Moderate | Critical | Training, procedures, and inspection oversight |
Replace Oil | AMT fails to properly replace oil | AMT Fails to Properly Perform Task | Engine could fail. | Low | Critical | Training, procedures, and inspection oversight |
The result of this analysis tells us that the probability of a fire in which the fuel feed system actuates and stops the fuel supply to the engine, with minimal damage as the consequence, is approximately 1/1000, or 1 × 10⁻³. The probability of a fire in which the fuel feed system fails to actuate, but the fire suppression system successfully extinguishes the fire and there is only moderate damage, is 1E-6, or 1 × 10⁻⁶. Finally, the probability that a fire occurs, both the fuel feed system and the fire suppression system fail, and severe damage occurs is 1E-8 or 5 × 10⁻⁸.
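The way branch probabilities multiply along each path can be sketched in a few lines. The branch values below are hypothetical: they are chosen only so that the path products fall near the magnitudes quoted above, and they are not taken from Figure 2. The function and variable names are likewise assumptions made for this illustration.

```python
# Hypothetical branch probabilities, chosen only so the path products fall
# near the magnitudes quoted in the text. They are not taken from Figure 2.
P_FIRE = 1e-3               # initiating event: engine fire
P_FUEL_CUTOFF_FAILS = 1e-3  # automated fuel feed shutoff fails on demand
P_SUPPRESSION_FAILS = 1e-2  # fire suppression system fails on demand


def path_probabilities(p_init, p_fuel_fail, p_supp_fail):
    """Return the probability of each end state of the three-path event tree."""
    return {
        # Fire starts, fuel feed shutoff works.
        "minimal damage": p_init * (1 - p_fuel_fail),
        # Fire starts, shutoff fails, suppression works.
        "moderate damage": p_init * p_fuel_fail * (1 - p_supp_fail),
        # Fire starts, shutoff fails, suppression fails.
        "severe damage": p_init * p_fuel_fail * p_supp_fail,
    }


paths = path_probabilities(P_FIRE, P_FUEL_CUTOFF_FAILS, P_SUPPRESSION_FAILS)
for outcome, p in paths.items():
    print(f"{outcome}: {p:.2e}")

# The end states are mutually exclusive and exhaustive, so the path
# probabilities add back up to the initiating-event probability.
assert abs(sum(paths.values()) - P_FIRE) < 1e-12
```

Because each branch point has exactly two outcomes, every path probability is a product of the initiating-event probability and one success or failure probability per system challenged along that path.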
This approach is considered inductive in nature, meaning that the analysis uses forward logic. A fault tree, discussed below, is considered deductive because the analyst usually starts at the top event and works down to the initiating event. In complex risk analyses, event trees are used to describe the major events in the accident sequence, and each event can then be further analyzed using another technique, most often a fault tree (Modarres, 2006).
As indicated, the fault tree begins at the end, so to speak. This top-down approach starts by supposing that an accident takes place (Vesely et al., 2002). It then considers the possible direct causes that could lead to this accident. Next, it looks for the origins of these causes. Finally, it looks for ways to avoid these origins and causes. The resulting diagram resembles a tree, thus the name.
Fault trees can also be used to model success paths. In this case the success is placed at the top of the tree, and the basic events are the entry-level successes that put the system on the path to success.
The goal of fault tree construction is to model the system conditions that can result in the undesired event. Before construction of a fault tree, the analyst must acquire a thorough understanding of the system. A system description should be part of the analysis. The analysis must be bounded, both spatially and temporally, in order to define a beginning and endpoint for the analysis. The fault tree is a model that graphically and logically represents the various combinations of possible events, both fault and normal, occurring in a system leading to the top event. The term “event” denotes a dynamic change of state that occurs to a system element. System elements include hardware, software, human, and environmental factors (Vesely et al., 2002).
Table 8 shows the most common fault tree symbols. These symbols represent specific types of fault and normal events in fault tree analysis. In many simple trees only the Basic Event, Undeveloped Event and Output Event are used.
Events representing failures of equipment or humans (components) can be divided into failures and faults. A component failure is a malfunction that requires the component to be repaired before it can successfully function again. For example, when a turbine blade in an engine breaks, it is classified as a component failure. A component fault is a malfunction that will “heal” itself once the condition causing the malfunction is corrected. An example of a component fault is a switch whose contacts fail to operate because they are wet. Once they are dried, they will operate properly.
Output events include the top event, or ultimate outcome, and intermediate events, usually groupings of events. Basic events are used at the ends of branches since they are events that cannot be further analyzed. A basic event cannot be broken down without losing its identity. The undeveloped event is also used only at the ends of event branches. The undeveloped event represents an event that is not further analyzed either because there is insufficient data to analyze or because it has no importance to the analysis.
Logic gates are used to connect events. The two fundamental gates are the “AND” and “OR” gates. Table 9 describes the gate functions and also provides insight to their applicability.
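When probabilities are attached to the input events, the two fundamental gates combine them in different ways. The sketch below assumes that the input events are statistically independent, which the chapter does not state; the function names and the two example probabilities are hypothetical.

```python
from math import prod


def and_gate(probabilities):
    """Output occurs only if ALL inputs occur (inputs assumed independent)."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output occurs if AT LEAST ONE input occurs (inputs assumed independent)."""
    return 1 - prod(1 - p for p in probabilities)


# Hypothetical basic-event probabilities, for illustration only.
p_sensor_fails = 0.01
p_valve_sticks = 0.02

print(and_gate([p_sensor_fails, p_valve_sticks]))
print(or_gate([p_sensor_fails, p_valve_sticks]))
```

An AND gate therefore lowers the probability of its output event relative to its inputs, while an OR gate raises it, which is why redundancy appears as AND gates in a fault tree.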
There are four steps to performing a Fault Tree Analysis:
Defining the problem
Constructing the fault tree
Analyzing the fault tree qualitatively
Documenting the results
A top event and boundary conditions must be determined when defining the problem. Boundary conditions include:
System physical boundaries
Level of resolution
Initial Conditions
Not allowed events
Existing Conditions
Other Assumptions
Top events should be precisely defined for the system being evaluated. A poorly defined top event can lead to an inefficient analysis.
Construction begins at the top event and continues, level by level, until all fault events have been broken into their basic events. Several basic rules have been developed to promote consistency and completeness in the fault tree construction process. These rules, as listed in Table 10, are used to ensure systematic fault tree construction (American Institute of Chemical Engineers, 1992).
It is often difficult to identify all of the possible combinations of failures that may lead to an accident by looking directly at the fault tree. One method for determining these failure paths is the development of “minimal cut sets.” Minimal cut sets are the smallest combinations of failures that, occurring together, result in the top event. The cut sets are useful for ranking the ways the accident may occur and for quantifying the events, if the data are available. Large fault trees require computer analysis to derive the minimal cut sets, but some basic steps can be applied for simpler fault trees:
Uniquely identify all gates and events in the fault tree. If a basic event appears more than once, it must be labeled with the same identifier each time.
Resolve all gates into basic events. Gates are resolved by placing them in a matrix with their events.
Remove duplicate events within each set of basic events identified.
Delete all supersets that appear in the sets of basic events.
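The steps above can be sketched for a small tree as follows. This is a minimal illustration under stated assumptions: the tree below is hypothetical (it is not the tree of Figure 3), the gate and event names are invented, and a recursive expansion stands in for the matrix bookkeeping described in the steps.

```python
from itertools import product


def cut_sets(node, gates):
    """Expand a node top-down into a list of cut sets (frozensets of basic events)."""
    if node not in gates:  # anything that is not a gate is a basic event
        return [frozenset([node])]
    kind, children = gates[node]
    child_sets = [cut_sets(child, gates) for child in children]
    if kind == "OR":  # any single child's cut set is sufficient
        return [cs for sets in child_sets for cs in sets]
    if kind == "AND":  # one cut set is needed from every child
        # The frozenset union also removes duplicate events within a set.
        return [frozenset().union(*combo) for combo in product(*child_sets)]
    raise ValueError(f"unknown gate type: {kind}")


def minimal_cut_sets(top, gates):
    """Drop duplicate sets and supersets so only minimal cut sets remain."""
    unique = set(cut_sets(top, gates))
    minimal = [cs for cs in unique if not any(other < cs for other in unique)]
    return sorted(minimal, key=lambda cs: (len(cs), sorted(cs)))


# Hypothetical fault tree: gate name -> (gate type, inputs).
# POWER_LOSS appears twice and carries the same identifier each time.
gates = {
    "SEVERE_FIRE_DAMAGE": ("AND", ["FUEL_NOT_CUT", "FIRE_NOT_SUPPRESSED"]),
    "FUEL_NOT_CUT": ("OR", ["VALVE_STUCK", "POWER_LOSS"]),
    "FIRE_NOT_SUPPRESSED": ("OR", ["AGENT_EMPTY", "POWER_LOSS"]),
}

for cs in minimal_cut_sets("SEVERE_FIRE_DAMAGE", gates):
    print(sorted(cs))
```

In this invented tree the shared event POWER_LOSS forms a cut set on its own, so every larger set containing it is a superset and is deleted. Single-event cut sets of this kind are the ones an analyst would normally rank first.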
By evaluating the minimal cut sets, an analyst may efficiently evaluate areas for improved system safety. The analyst should provide a description of the system analyzed, as well as a discussion of the problem definition, a list of the assumptions, the fault tree model(s), lists of minimal cut sets, and an evaluation of the significance of the minimal cut sets. Any recommendations should also be presented. An example fault tree for the engine fire example is shown in Figure 3.
4. Summary
This chapter discussed how common risk assessment techniques can be used to perform risk assessments of aviation-related activities. As noted at the beginning of this chapter, Ostrom and Wilhelmsen (2011) discuss in depth how to use risk assessment techniques to analyze a wide variety of systems, tasks, and activities.
References
1. American Institute of Chemical Engineers. 1992. New York.
2. Mil Std 882B, C. 1984 and 1993.
3. Modarres, M. 2006. Risk Analysis in Engineering: Techniques, Tools, and Trends. CRC Press, 1st edition. ISBN 1-57444-794-7.
4. Ostrom, L. and Wilhelmsen, C. Summer 2011. Risk Assessment Tools and Techniques and Their Application, in process.
5. Vesely, William, et al. 2002. (pdf). National Aeronautics and Space Administration. http://www.hq.nasa.gov/office/codeq/doctree/fthb.pdf. Retrieved 2010-01-17.