Probabilistic Modeling of Failure

Failure of a system or of a component of a system is, and has been, a major concern to system operators and owners. Failure can be traced back to different causes and may take different forms and shapes. It may result from software malfunction, degraded hardware performance, human error, sabotage, or environmental and other external factors. Various techniques found in the literature can assist in the analysis of failure; they comprise deterministic and probabilistic techniques. Deterministic techniques ignore the variability and uncertainties of the variables in the analysis, which may lead to unsatisfactory and inaccurate results, while probabilistic techniques produce more accurate and all-inclusive results because they incorporate the variabilities and uncertainties into the analysis. The focus of this chapter is to present commonly used probabilistic failure analysis techniques and their mathematical derivations. Examples are also presented to enhance the understanding of the concepts of failure analysis.


Introduction
Traditionally, failure analysis is conducted using deterministic techniques to assess the operability and integrity of industrial systems. These techniques lack the ability to report or predict the probabilistic nature of the systems' behavior. Moreover, they ignore the probabilistic and random nature of the external factors that have a direct impact on the performance of the systems. Implementing these techniques may produce an inadequate assessment and, eventually, may lead to wrong decisions concerning the integrity and reliability of the evaluated systems. To make an informed and reliable decision about the reliability and operability of such systems, probabilistic failure analysis should be adopted as an alternative analysis technique. This technique should be made an integral part of the decision process as well as of the organization's overall risk control.
This chapter presents techniques that assist in the analysis of failure using engineering probabilistic methods. They include simulation as well as analytical methods. Simulation can be conducted using the Monte Carlo technique; two different Monte Carlo simulation approaches are presented in this chapter: the counting approach and the sample statistics approach. The main drawback of simulation is that it takes a great deal of time to perform and may require extensive processing power. However, it is an essential step in the analysis, used to validate the results obtained by the analytical methods.
Some of the analytical methods include the first order reliability method (FORM) and the second order reliability method (SORM). FORM involves two approaches to calculate the probability of failure: the first order second moment (FOSM) approach and the advanced FOSM approach.
The focus of this chapter is on FORM only, with the assumption that all random variables are uncorrelated. Analyses that require the use of FORM for correlated random variables are beyond the scope of this chapter, as are analyses requiring the application of SORM to limit state functions involving second order representations.

Failure modeling
Failure can be defined as the inability of an industrial system or subsystem, either partially or totally, to satisfy the operational requirements set forth by the design specifications. Whether the failure of a system is partial or complete, it may result in adverse consequences: interruption of services, degraded performance, system shutdown, environmental damage and customer dissatisfaction are some examples. Such consequences may lead to financial losses, liabilities and damage to the reputation of the operating company. As an example, if the failure involves a leak detection system that monitors oil and gas leakage from subsea pipelines, the consequences could be severe: pollution of the ocean and damage to the fishery and tourism industries are among the major ones.
The system fails when the imposed demand or load on the system exceeds its capacity or resistance. The strength or capacity of the system is a design parameter that specifies the maximum load the system can endure or the maximum demand it can satisfy. The variability of the system's capacity to satisfy the demand or load imposed on it is mainly attributed to the inherent uncertainties of the operating characteristics of the system's components as well as to external environmental factors. Therefore, the capacity of the system is assumed to be probabilistic in nature, varying from time to time for the reasons mentioned above. Likewise, by the same argument, the load or demand imposed on the system is considered probabilistic in nature due to the effect of varying environmental conditions.
Considering the above, the performance function of the system, sometimes called the limit state function, can be formulated as the difference between the system's capacity and the load or demand imposed on it. The same argument can be used for a production facility, where the performance is the difference between supply and demand, supply being the capacity or strength and demand being the load. If the two parameters are equal, the system is at a limit state; if the system cannot meet the demand, the system is at a failure state; and if the system's capacity exceeds the load imposed on it, the system is at a satisfactory state.
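The three states just described can be summarized compactly. With capacity C and demand D, the performance (limit state) function and the corresponding states are:

```latex
Z = C - D, \qquad
\begin{cases}
Z > 0, & \text{satisfactory (safe) state},\\
Z = 0, & \text{limit state},\\
Z < 0, & \text{failure state}.
\end{cases}
```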

Reliability analysis methods
Knowing in advance when the system is going to fail or degrade in performance is an essential step in failure analysis. In this step, the probability of failure is calculated in terms of the random variables affecting the performance of the system. Several approaches found in the literature can be used to evaluate the probability of failure, either analytically or by simulation. Analytical methods approximate the probability of failure using the first order reliability method (FORM) or the second order reliability method (SORM). FORM uses two approximation techniques to evaluate the probability of failure: the first order second moment (FOSM) and the advanced first order second moment (AFOSM) techniques.
The probability of failure calculated using the methods mentioned above can be used to predict the ability of the system to satisfy operational as well as safety requirements during its life cycle. When this analysis is combined with risk analysis, the consequences of failure can be readily determined. The first order reliability method consists of two techniques, namely:

First order second moment (FOSM)
FOSM makes use of second moment statistics (mean and variance) and ignores higher moments (skewness and kurtosis) of the random variables. It evaluates the performance function using the first order Taylor series expansion of the limit state function (LSF) about the mean values. This method is applicable when the performance function is linear with statistically independent, normally distributed, uncorrelated random variables X_i. The performance function can be defined as [3,5,10]:

Z = C − D    (1)

where the capacity C and the demand D are statistically independent random variables assumed to be normally distributed. Failure occurs when:

Z < 0    (2)

Then the probability of failure (P_f) can be computed as:

P_f = P(Z < 0)    (3)

or

P_f = P(C < D)    (4)

Figure 1 shows the probability density function (PDF) of the random variable Z; as can be seen from the figure, the probability of failure corresponds to the shaded area where Z < 0. The probability of failure is expressed as [3,5,10]:

P_f = ∫_{z<0} f_Z(z) dz    (5)

Alternatively, the performance function Z can be formulated in terms of many random variables designated by a vector X as [3,5,10]:

Z = g(X_1, …, X_n)    (6)

where X_1, …, X_n are the random variables in the performance function. The integration of the performance function as indicated in Eq. (5) is performed over the region where Z < 0. This type of integration is difficult to solve; alternatively, a Taylor series expansion is used. The first order Taylor series approximation about the means of the random variables is shown in Eq. (7). The expansion is truncated at the linear terms to obtain the first order approximation of the performance function [7-9]:

Z ≈ g(μ_X1, …, μ_Xn) + Σ_{i=1}^{n} (∂g/∂X_i)(X_i − μ_Xi)    (7)

where the partial derivatives are evaluated at the mean values.
Then the mean and variance are given as:

μ_Z ≈ g(μ_X1, …, μ_Xn)    (8)

σ_Z² ≈ Σ_i Σ_j (∂g/∂X_i)(∂g/∂X_j) Cov(X_i, X_j)    (9)

which, for uncorrelated random variables, reduces to:

σ_Z² ≈ Σ_{i=1}^{n} (∂g/∂X_i)² σ_Xi²    (10)

The reliability index (β) is taken as the ratio of the mean to the standard deviation of the performance function:

β = μ_Z / σ_Z    (11)
The reliability index is computed for every failure mode, and the probability of failure is expressed as:

P_f = Φ(−β)    (12)

This method is simple to use; all that is needed for the calculation is knowledge of the mean and standard deviation of each random variable, and it is not necessary to know their full distributions. The downside of this method is that it can introduce errors in the final results if the function is nonlinear or if the tail of the distribution cannot be approximated by a normal distribution. Moreover, if the function is nonlinear, equivalent but differently written formulations of the limit state can yield different answers. The advanced FOSM method is used to deal with these limitations of FOSM.
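The FOSM calculation above can be sketched in a few lines of code. The following is a minimal Python illustration for the linear limit state Z = C − D; the capacity and demand statistics, N(100, 10) and N(60, 15), are hypothetical values chosen purely for illustration:

```python
import math

def fosm_linear(mu_c, sigma_c, mu_d, sigma_d):
    """FOSM for the linear limit state Z = C - D with independent,
    normally distributed capacity C and demand D."""
    mu_z = mu_c - mu_d                                # mean of Z
    sigma_z = math.sqrt(sigma_c ** 2 + sigma_d ** 2)  # std. dev. of Z
    beta = mu_z / sigma_z                             # reliability index
    pf = 0.5 * math.erfc(beta / math.sqrt(2.0))       # Pf = Phi(-beta)
    return beta, pf

# Hypothetical data for illustration: C ~ N(100, 10), D ~ N(60, 15)
beta, pf = fosm_linear(100.0, 10.0, 60.0, 15.0)
```

For these numbers, β = 40/√(10² + 15²) ≈ 2.22 and P_f = Φ(−β) ≈ 0.013.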

Advanced first order second moment method (AFOSM)
AFOSM provides a solution for linear and nonlinear performance functions by determining the shortest distance from the origin to the failure surface. This method, developed by Hasofer and Lind in 1974, is also called the Hasofer-Lind (H-L) method. It evaluates the probability of failure for the limit state function or performance function by determining the most probable failure point instead of the mean. As stated above, the main objective of this method is to estimate the failure point at the shortest distance from the origin to the failure surface that separates the failure region from the safe region; this is shown clearly in Figure 2. The failure point is sometimes called the design point or check point in the literature, but in this chapter it will be referred to as the most probable point of failure (MPPF). Let us consider a limit state function/performance function with normally distributed and independent random variables X:

Z = g(X_1, …, X_n)    (13)

This method transforms the random variables into reduced (standard normal) form:

U_i = (X_i − μ_Xi) / σ_Xi,  i = 1, …, n    (14)

The performance function is then formulated in terms of the reduced random variables:

Z = g(U_1, …, U_n)    (15)

Figure 2 shows the plot of the limit state function in the original as well as the transformed coordinates. It shows that the MPPF is the tangent point on the curve Z(U) = 0 and the reliability index β is the shortest distance from the origin to the limit surface.
To find the MPPF u_i* on the limit surface under the condition that Z(X) = 0, a Taylor series expansion is used around the MPPF. Considering the first order terms only, this gives:

Z ≈ g(U*) + Σ_{i=1}^{n} (∂g(U*)/∂u_i)(u_i − u_i*)    (17)

Using the chain rule for the derivative and considering the relationship between U and X:

x_i = μ_Xi + σ_Xi u_i    (18)

and using Eqs. (14) and (15), the partial derivative ∂g(U*)/∂u_i becomes:

∂g(U*)/∂u_i = (∂g(X*)/∂x_i) σ_Xi    (19)

Substituting Eq. (19) into Eq. (17) gives:

Z ≈ g(X*) + Σ_{i=1}^{n} (∂g(X*)/∂x_i) σ_Xi (u_i − u_i*)    (20)

The mean of Z(U) is:

μ_Z = g(U*) − Σ_{i=1}^{n} u_i* (∂g(U*)/∂u_i)

since the reduced variables u_i have zero mean. The variance is expressed as:

σ_Z² = Σ_{i=1}^{n} (∂g(U*)/∂u_i)²

since the reduced variables have unit variance. It must be noted that constants have no variance; their variances equal zero. The first term of the Taylor expansion above is constant; therefore, its variance equals zero. Similarly, the variance of the value at the mean is zero.
Since the MPPF lies on the failure surface, g(U*) = 0, and the reliability index is calculated as:

β = μ_Z / σ_Z = −[Σ_{i=1}^{n} u_i* (∂g(U*)/∂u_i)] / [Σ_{i=1}^{n} (∂g(U*)/∂u_i)²]^{1/2}

The directional cosine α_i along each coordinate axis is computed as:

α_i = (∂g(U*)/∂u_i) / [Σ_{j=1}^{n} (∂g(U*)/∂u_j)²]^{1/2}

It can be shown from Figure 2 that:

u_i* = −α_i β    (36)

Using Eqs. (14), (15) and (36), we can determine the design point in the original coordinates as:

x_i* = μ_Xi − α_i β σ_Xi    (37)

The probability of failure, P_f, can be computed as:

P_f = Φ(−β)    (39)

The steps to estimate the reliability index are:

1. Formulate the performance function in terms of the original random variables, x_i.

2. Assume the initial design point x_i* to be the given mean of each random variable.

3. Compute the initial reliability index β in terms of the mean values of the random variables using Eq. (11).

4. Compute the partial derivatives of the limit state function (LSF)/performance function at the mean values of the random variables.

5. Compute the directional cosines α_i.

6. Compute the new design points using Eq. (37).

7. Compute the LSF in terms of the new design points.

8. Compute the partial derivatives at the new design points.

9. Compute the new directional cosines α_i.

10. Compute a new reliability index β.

    a. An alternative is to substitute the newly determined design points into the limit state equation Z = 0 and solve for β.

11. Repeat steps 6 through 10 until β converges to a pre-established tolerance level.

12. Use Eq. (39) to calculate the probability of failure.
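The iterative procedure above can be sketched in code. The following is a minimal Python illustration of the Hasofer-Lind iteration for independent normal variables, using forward-difference partial derivatives; the limit state Z = C − D with C ~ N(100, 10) and D ~ N(60, 15) is a hypothetical example, not one of the chapter's worked problems:

```python
import math

def hasofer_lind(g, mu, sigma, tol=1e-6, max_iter=100):
    """Hasofer-Lind (AFOSM) iteration for independent normal variables.
    g: performance function of a list x; mu, sigma: means and std. devs."""
    n = len(mu)
    x = list(mu)                          # step 2: start at the mean values
    beta = 0.0
    for _ in range(max_iter):
        # partial derivatives of g at the current design point
        h = 1e-6
        grad = []
        for i in range(n):
            xp = list(x)
            xp[i] += h
            grad.append((g(xp) - g(x)) / h)
        # gradient in reduced coordinates: dg/du_i = (dg/dx_i) * sigma_i
        gu = [grad[i] * sigma[i] for i in range(n)]
        norm = math.sqrt(sum(v * v for v in gu))
        alpha = [v / norm for v in gu]    # directional cosines
        u = [(x[i] - mu[i]) / sigma[i] for i in range(n)]
        # updated reliability index from the linearized limit state
        beta_new = (g(x) - sum(gu[i] * u[i] for i in range(n))) / norm
        # new design point: x_i* = mu_i - alpha_i * beta * sigma_i
        x = [mu[i] - alpha[i] * beta_new * sigma[i] for i in range(n)]
        if abs(beta_new - beta) < tol:
            break
        beta = beta_new
    pf = 0.5 * math.erfc(beta_new / math.sqrt(2.0))  # Pf = Phi(-beta)
    return beta_new, x, pf

# Hypothetical linear limit state Z = C - D, C ~ N(100, 10), D ~ N(60, 15)
beta, xstar, pf = hasofer_lind(lambda v: v[0] - v[1],
                               [100.0, 60.0], [10.0, 15.0])
```

For a linear limit state the iteration converges immediately and reproduces the FOSM result (here β ≈ 2.22), with the design point lying on the failure surface.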
The steps mentioned above assume that the random variables are normally distributed. For non-normally distributed variables, additional steps are needed to determine the mean and standard deviation of the equivalent normal distribution, as listed below. These steps should be carried out after step 2. Assuming the random variables are statistically independent and non-normally distributed:

1. Determine the distribution parameters.
2. Compute the cumulative distribution function (cdf) F(x_i*), the probability density function (pdf) f(x_i*), and the inverse standard normal cdf Φ⁻¹[F(x_i*)] of the original non-normal random variables at the initial design point.

3. Compute the standard deviation σ_Xi^N and the mean μ_Xi^N of the equivalent normal distribution as:

σ_Xi^N = φ(Φ⁻¹[F(x_i*)]) / f(x_i*)

μ_Xi^N = x_i* − Φ⁻¹[F(x_i*)] σ_Xi^N

It must be noted that f(x_i*) refers to the pdf of the original non-normal random variable and φ(·) refers to the pdf of the equivalent standard normal random variable.
4. Compute the standard normal variable u_i of x_i by:

u_i = (x_i* − μ_Xi^N) / σ_Xi^N

For a lognormally distributed random variable, the distribution parameters μ_ln x and σ_ln x are defined using the following equations:

μ_ln x = ln(μ_x) − ½ σ_ln x²    (43)

σ_ln x² = ln[1 + (σ_x/μ_x)²]    (44)

The pdf and cdf are defined as:

f(x) = [1 / (x σ_ln x √(2π))] exp{−½[(ln x − μ_ln x)/σ_ln x]²}    (45)

F(x) = Φ[(ln x − μ_ln x)/σ_ln x]    (46)

For other distribution types, the reader is referred to Refs. [4,7,8].
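The equivalent normal transformation can be sketched numerically. The Python fragment below computes the equivalent normal mean and standard deviation at a design point for a lognormal variable with arithmetic mean 4.7 and standard deviation 1.1 (the same statistics as the pipeline initial corrosion in Example 3), evaluated at its mean; since only the standard library is used, the inverse standard normal cdf is implemented by simple bisection:

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def std_normal_inv(p):
    """Inverse standard normal cdf by bisection (stdlib only)."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if std_normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def equivalent_normal_lognormal(x, mu_x, sigma_x):
    """Equivalent normal mean and std. dev. at design point x for a
    lognormal variable with arithmetic mean mu_x and std. dev. sigma_x."""
    # lognormal distribution parameters (Eqs. (43) and (44))
    s2 = math.log(1.0 + (sigma_x / mu_x) ** 2)
    s = math.sqrt(s2)
    m = math.log(mu_x) - 0.5 * s2
    z = (math.log(x) - m) / s
    F = std_normal_cdf(z)              # cdf of the lognormal at x
    f = std_normal_pdf(z) / (x * s)    # pdf of the lognormal at x
    zeq = std_normal_inv(F)            # Phi^{-1}[F(x)]
    sigma_n = std_normal_pdf(zeq) / f  # equivalent normal std. dev.
    mu_n = x - zeq * sigma_n           # equivalent normal mean
    return mu_n, sigma_n

# Lognormal with mean 4.7 and std. dev. 1.1, at the initial design point x = mean
mu_n, sigma_n = equivalent_normal_lognormal(4.7, 4.7, 1.1)
```

For a lognormal variable evaluated at x, the equivalent normal standard deviation reduces analytically to x·σ_ln x, which this fragment reproduces (≈ 1.085 here).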

Simulation methods
Alternatively, the probability of failure can be computed using the Monte Carlo simulation method. Two methods are considered in this chapter: the counting method and the sample statistics method. The simulation can be conducted using computer programs such as MATLAB, C++ or MINITAB, or any other simulation programming package.

Monte Carlo counting method
The counting method estimates the probability of failure by dividing the number of simulation cycles in which the Z function becomes less than 0 (N_f) by the total number of simulation cycles (N):

P_f = N_f / N
The steps for the Monte Carlo simulation counting method are listed below:

1. Formulate the performance function in terms of the original random variables, x_i: Z(x_1, x_2, …, x_n).

2. Determine the distribution and its parameters for each random variable.

3. Assign N to be the number of simulation cycles.

4. Assign M to be the number of calculation times.

5. Assign N_f to be the number of simulation cycles in which the Z function becomes less than 0; initialize it to zero.

6. Generate random values for each variable from its distribution with the determined parameters:

   a. If the random variable is normally distributed, a MATLAB function such as normrnd(mu, sigma, N, 1) can be used, where N is the number of simulation cycles and mu and sigma are the distribution parameters (the mean and standard deviation of the random variable).

   b. If the random variable is lognormally distributed with mean μ_x and standard deviation σ_x:

      i. Determine the distribution parameters; Eqs. (43) and (44) can be used to calculate these two parameters.

      ii. Generate the random values with a MATLAB function such as lognrnd(mu, sigma, N, 1), where mu and sigma are the lognormal parameters from step i.

7. Compute the value of the Z function for each simulation cycle.

8. Calculate P_f: for each cycle, check whether Z < 0; if yes, increment N_f. Once all N cycles have been evaluated, calculate P_f = N_f / N.

An alternative approach to random number generation is to use the following steps:

• Generate uniformly distributed random numbers u_i between 0 and 1; this can be accomplished using software packages such as MATLAB (e.g., rand(N, 1)), Excel, C++ or other packages.

• Equate the cumulative distribution function (cdf) of the random variable to the generated random number u_i and invert:

x_i = F⁻¹(u_i)

As an example, if x_i follows a normal distribution with mean μ_x and standard deviation σ_x, then its random value becomes:

x_i = μ_x + σ_x Φ⁻¹(u_i)

where Φ⁻¹ can be determined using the cdf of the standard normal distribution; tables of this cdf are included in most probability and statistics books.
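The counting procedure above can be sketched as follows. This minimal Python version draws normal variates directly (equivalent to the inverse-transform approach) for the hypothetical linear limit state Z = C − D with C ~ N(100, 10) and D ~ N(60, 15):

```python
import random

def mc_counting(g, samplers, N=200_000, seed=1):
    """Monte Carlo counting method: Pf = Nf / N.
    samplers: one callable per random variable, returning a single draw."""
    rng = random.Random(seed)
    nf = 0
    for _ in range(N):
        x = [s(rng) for s in samplers]
        if g(x) < 0:                 # count the cycles where Z < 0
            nf += 1
    return nf / N

# Hypothetical linear limit state Z = C - D, C ~ N(100, 10), D ~ N(60, 15)
pf = mc_counting(
    lambda v: v[0] - v[1],
    [lambda r: r.gauss(100.0, 10.0),   # capacity C
     lambda r: r.gauss(60.0, 15.0)],   # demand D
)
```

With these numbers the exact answer is P_f = Φ(−2.22) ≈ 0.013, so the counting estimate should land near that value for a large N.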

Monte Carlo sample statistics method
The Monte Carlo sample statistics method considers the sample mean μ_Z and the sample standard deviation σ_Z of the simulated Z values in computing the reliability index, β = μ_Z / σ_Z, from which P_f = Φ(−β).
The steps for the Monte Carlo simulation sample statistics method are listed below:

1. Formulate the performance function in terms of the original random variables, x_i: Z(x_1, x_2, …, x_n).

2. Determine the distribution and its parameters for each random variable.

3. Assign N to be the number of simulation cycles.

4. Assign M to be the number of calculation times.

5. Initialize:

   a. Set M to zero.

   b. Set N_f to zero.

6. Generate random values from the given distribution with the determined distribution parameters for each variable.

7. Compute the Z function for each simulation cycle; then compute the sample mean μ_Z, the sample standard deviation σ_Z and the reliability index β = μ_Z / σ_Z.

8. Calculate P_f = Φ(−β) and check whether the required number of calculation times M has been reached:

   i. If no, go to step 6.

   ii. If yes, stop.

It must be noted that the Monte Carlo sample statistics method can be used only for linear functions having uncorrelated normal random variables.
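The sample statistics procedure can be sketched in the same style. This Python fragment estimates μ_Z and σ_Z from the simulated Z values and converts them to β and P_f; the limit state and distributions are the same hypothetical example as before (Z = C − D, C ~ N(100, 10), D ~ N(60, 15)):

```python
import math
import random

def mc_sample_statistics(g, samplers, N=100_000, seed=7):
    """Monte Carlo sample statistics method: beta = mu_Z / sigma_Z and
    Pf = Phi(-beta); valid for linear Z with uncorrelated normal variables."""
    rng = random.Random(seed)
    zs = [g([s(rng) for s in samplers]) for _ in range(N)]
    mu_z = sum(zs) / N
    var_z = sum((z - mu_z) ** 2 for z in zs) / (N - 1)   # sample variance
    beta = mu_z / math.sqrt(var_z)                       # reliability index
    pf = 0.5 * math.erfc(beta / math.sqrt(2.0))          # Pf = Phi(-beta)
    return beta, pf

# Hypothetical linear limit state Z = C - D, C ~ N(100, 10), D ~ N(60, 15)
beta, pf = mc_sample_statistics(
    lambda v: v[0] - v[1],
    [lambda r: r.gauss(100.0, 10.0), lambda r: r.gauss(60.0, 15.0)],
)
```

Because the limit state here is linear in normal variables, the estimate should agree closely with the FOSM result (β ≈ 2.22, P_f ≈ 0.013), which is exactly the restriction noted above.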
Example 1
a. Determine the probability of failure using Monte Carlo simulation: the sample statistics and counting methods.
b. Determine the probability of failure using the analytical methods: the FOSM and AFOSM (Hasofer-Lind) methods.

Solution:
A. Monte Carlo simulation was conducted for both methods, the sample statistics and counting methods. The numbers of simulation cycles used in the analysis are 2e5 and 1e6 cycles (Table 1). The results are presented in Table 2.

Example 2
The performance function for a leak detection system has been formulated as in [2].
a. Determine the probability of failure using Monte Carlo simulation.
b. Determine the probability of failure using the AFOSM method.
This example is adapted from references [1,2], with some modifications.

Solution

Part a
Monte Carlo simulation was conducted for both methods, the sample statistics and counting methods, as indicated in Table 2.

Part b
First the Z function is computed in terms of the mean values of the random variables, and the partial derivatives are evaluated.

Iteration 1:

u_1 = β α_x1 = 3.2261 × 0.75947 = 2.45013
u_2 = β α_x2 = 3.2261 × 0.65054 = 2.09872

The probability of failure obtained by the counting method is very close to that obtained by AFOSM (Table 4).
Example 3
A pipeline segment is suffering corrosion that grows annually at a steady rate. The initial corrosion depth has been estimated to be 4.7 mm and is assumed to follow a lognormal distribution with a standard deviation of 1.1 mm.
The annual corrosion growth follows a lognormal distribution with mean and standard deviation values of 0.2 and 0.01 mm/year. The pipeline wall thickness follows a normal distribution with mean and standard deviation values of 14 mm and 4.7 mm, respectively. The critical pipeline wall thickness has been determined to be 80% of the wall thickness [3]. A summary of the relevant information pertaining to the pipeline corrosion is presented in the next table, Table 5.
The owner of the pipeline has decided not to repair the corrosion and wants to know whether the pipeline can survive the next 14 years without causing a leak. To be on the safe side, the maximum acceptable probability of failure has been set to 1e-4 [3].
Solve this problem using analytical method as well as Monte Carlo (MCS) simulation method.
The lognormal distribution parameters are computed from Eqs. (43) and (44):

Pipeline wall thickness, x_1: normally distributed; no transformation of parameters is required (μ = 14 mm, σ = 4.7 mm).

Initial corrosion, x_2: σ_ln x2 = √(ln[1 + (1.1/4.7)²]) = 0.23094, μ_ln x2 = ln(4.7) − ½(0.23094)² = 1.52090.

Corrosion growth rate, x_3: σ_ln x3 = √(ln[1 + (0.01/0.2)²]) = 0.04997, μ_ln x3 = ln(0.2) − ½(0.04997)² = −1.61069.

Monte Carlo simulation produced the results shown in Table 6, to which the probability of failure and β converge. Here only the counting method is used, because the sampling method produces different results; the sampling method produces accurate results only for linear limit state/performance functions with normal random variables.
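The Monte Carlo run for this example can be sketched as below. The limit state used here, Z = 0.8·w − (d0 + g·t) at t = 14 years (failure once the corrosion depth d0 + g·t exceeds the critical depth of 80% of the wall thickness w), is an assumed reading of the problem statement; the exact formulation and the resulting P_f may therefore differ from the chapter's Table 6:

```python
import math
import random

def lognormal_params(mu, sigma):
    """Parameters of ln(x) from the arithmetic mean and std. dev.
    (Eqs. (43) and (44))."""
    s2 = math.log(1.0 + (sigma / mu) ** 2)
    return math.log(mu) - 0.5 * s2, math.sqrt(s2)

def pipeline_pf(N=200_000, t=14.0, seed=3):
    """Counting-method MCS for the pipeline corrosion example (Table 5 data)."""
    rng = random.Random(seed)
    m_d0, s_d0 = lognormal_params(4.7, 1.1)   # initial corrosion depth (mm)
    m_g, s_g = lognormal_params(0.2, 0.01)    # annual growth rate (mm/yr)
    nf = 0
    for _ in range(N):
        w = rng.gauss(14.0, 4.7)              # wall thickness (mm), normal
        d0 = math.exp(rng.gauss(m_d0, s_d0))  # lognormal draw
        g = math.exp(rng.gauss(m_g, s_g))     # lognormal draw
        z = 0.8 * w - (d0 + g * t)            # assumed limit state function
        if z < 0:
            nf += 1
    return nf / N

pf = pipeline_pf()
```

The estimated P_f can then be compared against the acceptance threshold of 1e-4 stated in the problem.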
Analytical solution
Assume the initial value of each random variable to be its mean.

Pipeline wall thickness, x_1: normally distributed, so no equivalent normal transformation is needed.

Initial corrosion, x_2: compute the pdf of the original non-normal variable (lognormal distribution) using Eq. (45).