Open access peer-reviewed chapter

Reliability Testing and Verification

Written By

Jaroslav Menčík

Reviewed: February 3rd, 2016 Published: April 13th, 2016

DOI: 10.5772/62377



This chapter describes various methods for reduction of uncertainties in the determination of characteristic values of random quantities (quantiles of normal and Weibull distribution, tolerance limits, linearly correlated data, interference method, Monte Carlo method, bootstrap method).


  • Random quantity
  • uncertainty
  • normal distribution
  • Weibull distribution
  • tolerance limits
  • correlation
  • interference method
  • Monte Carlo method
  • bootstrap method

Reliability tests are often indispensable. The material properties, needed in design, can only sometimes be found in data sheets. If they are not available, they must be obtained by testing, for example the strength of a new alloy or concrete or the fatigue resistance of a vehicle part. Also, the manufacturers of electrical components must provide the reliability data for catalogs (e.g. the failure rate and the data characterizing the influence of some factors, such as temperature or vibrations). It is also impossible to predict with 100% accuracy the properties of a new bridge, an engine or a complex system consisting of many parts, whose properties vary more or less around the nominal values. In all these cases, tests are often necessary to verify whether the object has the demanded properties or if it conforms to the standards. Also, the information on loads (e.g. wind velocities in an unknown area) must often be obtained by measurement.

The reliability tests can be divided into two groups: those for providing detailed information on properties of new materials or components, and those for the verification of the expected values. The former are more extensive, as they must provide the mean value and statistical parameters characterizing the random variability. The extent of verification tests is smaller.

In this chapter, the reliability tests of mass-produced components will be described first, followed by the tests of large or complex structures or components and the tests of strength and fatigue resistance.

1. Testing of mass-produced electrical and mechanical components

The most important reliability characteristics are the mean failure rate and the mean time to failure (or between failures). The tests can be arranged so that several components are loaded in the usual way (e.g. by electric current) and the times to failure of the individual pieces are measured. As the time to failure of some samples can be very long, the test is sometimes terminated after several pieces have failed, at time $t_t$. The total cumulated time of operation to failure is generally calculated as

$$t_{tot} = \sum_{j=1}^{r} t_{f,j} + m\,t_t, \tag{1}$$

where $t_{f,j}$ is the time to failure of the j-th piece, r is the number of failed specimens, and m is the number of pieces that survived the test, whose duration was $t_t$. The total number of tested samples is n = r + m. If all pieces fail during the test, m = 0 and the term $m\,t_t$ drops out. (Other test arrangements are also possible, for example with failed pieces replaced by good ones; see [1] or the corresponding IEC standards listed in Appendix 2.) The mean time to failure is calculated as

$$\mathrm{MTTF} = \frac{t_{tot}}{r}. \tag{2}$$

The individual times to failure vary, and this variability must also be characterized. If failures occur for various reasons, an exponential distribution of times to failure is often assumed. A simple check of this assumption uses the standard deviation σ. For an exponential distribution, the standard deviation has the same value as the mean μ (in an ideal case; in real tests the two can differ somewhat). If the difference between μ and σ is larger, a statistical test should be made to check whether the exponential distribution is suitable. Goodness-of-fit tests (e.g. Kolmogorov-Smirnov or the χ² test) are commonly used for this purpose; see [2-4]. If the exponential distribution is not suitable, another distribution, e.g. Weibull, may fit better.

If an exponential distribution is acceptable, the estimate of the mean failure rate can be obtained easily as the reciprocal value of the mean time to failure,

$$\bar{\lambda} = \frac{1}{\mathrm{MTTF}}. \tag{3}$$
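The point estimates above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample data (five pieces tested for 400 h, three failures) are hypothetical and do not come from the chapter. The total time is obtained by summing the failure times and adding the test duration for every survivor.

```python
import statistics


def censored_test_estimates(failure_times, n_survivors, test_duration):
    """Point estimates for a life test stopped at test_duration.

    failure_times : times to failure of the r failed pieces
    n_survivors   : number m of pieces still working when the test stopped
    test_duration : duration t_t of the test
    """
    r = len(failure_times)
    if r == 0:
        raise ValueError("at least one failure is needed")
    # Total cumulated time: failed pieces plus survivors running for t_t each.
    t_tot = sum(failure_times) + n_survivors * test_duration
    mttf = t_tot / r                      # mean time to failure
    return {"t_tot": t_tot, "mttf": mttf, "failure_rate": 1.0 / mttf}


def std_to_mean_ratio(times):
    """Sample standard deviation divided by the mean.

    A value near 1 is compatible with an exponential distribution;
    it is a quick check, not a goodness-of-fit test.
    """
    return statistics.stdev(times) / statistics.mean(times)


# Hypothetical data: three failures, two survivors, test stopped at 400 h.
estimates = censored_test_estimates([100.0, 200.0, 300.0],
                                    n_survivors=2, test_duration=400.0)
print(estimates)
```

The ratio returned by `std_to_mean_ratio` is only the informal check mentioned in the text; a formal goodness-of-fit test is still needed when the ratio is far from 1.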

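The two-sided confidence interval for the true mean failure rate λ is [1]:

$$\lambda_L = \bar{\lambda}\,\frac{\chi^2_{1-\alpha/2}(2r)}{2r}, \qquad \lambda_U = \bar{\lambda}\,\frac{\chi^2_{\alpha/2}(2r)}{2r}, \tag{4}$$

if the testing was terminated at the rth failure, and

$$\lambda_L = \bar{\lambda}\,\frac{\chi^2_{1-\alpha/2}(2r)}{2r}, \qquad \lambda_U = \bar{\lambda}\,\frac{\chi^2_{\alpha/2}(2r+2)}{2r}, \tag{5}$$

if the testing continued for some time after the rth failure. In Equations (4) and (5), $\bar{\lambda}$ is the calculated mean value of λ, the subscripts L and U denote the lower and upper confidence limits, $\chi^2_{1-\alpha/2}(2r)$ is the (1 − α/2)-critical value of the chi-square distribution for 2r degrees of freedom, $\chi^2_{\alpha/2}(2r)$ is the α/2-critical value for 2r degrees of freedom, and $\chi^2_{\alpha/2}(2r+2)$ is the α/2-critical value for 2r + 2 degrees of freedom. Here the p-critical value is the value that the chi-square variable exceeds with probability p. The probability that λ will lie within this confidence interval is γ = 1 − α. Often, we are interested only in the maximum expectable failure rate; the pertinent formula for the upper limit of the one-sided interval, for a test terminated at the rth failure, is

$$\lambda_U = \bar{\lambda}\,\frac{\chi^2_{\alpha}(2r)}{2r}; \tag{6}$$

the probability that the actual failure rate will be higher is now α. As the mean time to failure is the reciprocal of the failure rate, the corresponding two-sided confidence interval for the mean time to failure is obtained as

$$t_L = \frac{1}{\lambda_U} = \frac{2\,t_{tot}}{\chi^2_{\alpha/2}(2r)}, \qquad t_U = \frac{1}{\lambda_L} = \frac{2\,t_{tot}}{\chi^2_{1-\alpha/2}(2r)}, \tag{7}$$

if the test was terminated after the rth failure (and analogously for a longer test).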

The determination and importance of confidence limits will be illustrated by the following examples.

Example 1

Ten electrical components were tested to determine the failure rate. The test was terminated at $t_t$ = 500 h. During this time, six components failed (r = 6), at times 65, 75, 90, 120, 250, and 410 h. Four components survived the test. Estimate the mean time to failure and the failure rate, and construct the two-sided confidence intervals for the confidence level γ = 1 − α = 90%.

Solution. The mean and standard deviation of the times to failure of the six failed components are 168.33 h and 136.33 h, respectively. As the two values are similar, an exponential distribution may be assumed.

The cumulated duration of tests, calculated after [1], was:

$$t_{tot} = \sum_{i=1}^{6} t_i + 4\,t_t = 65 + 75 + 90 + 120 + 250 + 410 + 4 \times 500 = 3010\ \mathrm{h}.$$

The mean time to failure is $t_{mean} = t_{tot}/r$ = 3010/6 = 501.67 h, and the mean failure rate is $\bar{\lambda} = 1/t_{mean}$ = 1/501.67 = $1.993 \times 10^{-3}$ h$^{-1}$.

As the test was terminated before all samples had failed, the lower and upper confidence limits for λ were calculated according to Equation (5). With r = 6 and α = 10%, the critical values are $\chi^2_{0.95}(12)$ = 5.226 and $\chi^2_{0.05}(14)$ = 23.685. Inserting them, together with $\bar{\lambda}$ = $1.993 \times 10^{-3}$ h$^{-1}$, into (5) gives $\lambda_L$ = $8.68 \times 10^{-4}$ h$^{-1}$ and $\lambda_U$ = $3.93 \times 10^{-3}$ h$^{-1}$. The confidence limits for the mean time to failure are $t_L = 1/\lambda_U$ = 254.4 h and $t_U = 1/\lambda_L$ = 1152.1 h. The mean time to failure can thus be expected to lie within the interval (254 h; 1152 h).
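The numbers in Example 1 can be checked with a short script. The sketch below uses only the Python standard library; the function names are illustrative. The chi-square critical values are obtained from the closed-form tail probability that holds for an even number of degrees of freedom (which is always the case here, since the degrees of freedom are 2r or 2r + 2), inverted by bisection. The inputs r = 6 and total test time 3010 h are taken from the example.

```python
import math


def chi2_upper_tail(x, dof):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    half = x / 2.0
    term = math.exp(-half)          # j = 0 term of the finite Poisson-type sum
    total = term
    for j in range(1, dof // 2):
        term *= half / j
        total += term
    return total


def chi2_critical(p, dof):
    """Value exceeded with probability p (upper-tail critical value), by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_upper_tail(hi, dof) > p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_upper_tail(mid, dof) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def failure_rate_interval(r, t_tot, alpha, time_terminated=True):
    """Two-sided (1 - alpha) confidence limits for the failure rate.

    time_terminated=True  : test stopped at a fixed time (2r + 2 dof in the upper limit)
    time_terminated=False : test stopped at the r-th failure (2r dof in both limits)
    """
    lam = r / t_tot
    lower = lam * chi2_critical(1.0 - alpha / 2.0, 2 * r) / (2 * r)
    upper_dof = 2 * r + 2 if time_terminated else 2 * r
    upper = lam * chi2_critical(alpha / 2.0, upper_dof) / (2 * r)
    return lower, upper


lam_low, lam_up = failure_rate_interval(r=6, t_tot=3010.0, alpha=0.10)
print(f"failure rate: {lam_low:.3e} .. {lam_up:.3e} per hour")
print(f"MTTF:         {1 / lam_up:.1f} .. {1 / lam_low:.1f} h")
```

A statistics library with a chi-square quantile function could replace the two helper functions; they are written out here only to keep the sketch self-contained.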

As we can see from this example, the confidence interval obtained from only six failures is very wide. To make it narrower (and the estimate more accurate), it is necessary to run a longer test so that more of the tested parts fail, to increase the number of parts tested simultaneously, or both.

Example 2

The above test was continued until the time $t_t$ = 1000 h. During this time, two more pieces failed, at times $t_7$ = 520 h and $t_8$ = 760 h.

Solution. The same procedure as above gives the following results: $t_{tot}$ = 4290 h and r = 8, so that the mean time to failure is now $t_{mean} = t_{tot}/r$ = 4290/8 = 536 h and the mean failure rate is $\bar{\lambda}$ = 1/536 = $1.865 \times 10^{-3}$ h$^{-1}$. The confidence interval also reflects that more pieces have failed. The critical values are now $\chi^2_{0.95}(16)$ = 7.962 and $\chi^2_{0.05}(18)$ = 28.869. With these values, the lower and upper limits of the failure rate are $\lambda_L$ = $9.28 \times 10^{-4}$ h$^{-1}$ and $\lambda_U$ = $3.4 \times 10^{-3}$ h$^{-1}$. The mean time to failure $t_{mean}$ can thus be expected to lie within the interval (297 h; 1078 h).

The whole test lasted twice as long as the previous one, but the new confidence interval is only slightly narrower. Significantly more accurate estimates require much longer tests or a substantially higher number of tested pieces. Thus, when preparing tests for the determination of the failure rate, one should estimate in advance the duration of the test, the number of tested pieces, and the number of pieces that can fail, all for an acceptable probability α that the actual maximum failure rate would be higher than that obtained from the test.

The rearrangement of the expression for the upper limit of confidence interval for λ gives the following relationship between the expected failure rate λ0, the number of tested samples n, test duration tt, and the number of failed components r [1]:

$$n\,t_t = \frac{\chi^2_{\alpha}(2r)}{2\lambda_0}. \tag{8}$$

If the number of failed samples does not exceed r, the actual failure rate is not higher than λ0, the risk of wrong prediction being α.

As it follows from the product n × tt in Equation (8), the number of tested parts n is equivalent to the test duration tt. This means that the same information can be obtained by testing, for example, 10 specimens for 1000 h or 1000 specimens for 10 h. If the tested objects are expensive, one would prefer testing fewer specimens for longer time. However, at least several pieces should always be tested to reduce the risk that the only piece chosen at random for the test was especially good or especially bad.

The following table, based on Equation (8), shows the values of the product n × tt for the various numbers of failed parts during the tests; the probability of a wrong result is α = 10%.

| λ0 (h⁻¹) | n × t_t for r = 3 (pieces × h) | n × t_t for r = 5 (pieces × h) |
| 0.001 | 5,322 | 7,994 |
| 0.0001 | 53,223 | 79,936 |
| 0.00001 | 532,232 | 799,359 |
| 0.000001 | 5,322,320 | 7,993,590 |

Table 1. Extent of tests for various failure rates and numbers of failed pieces.
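The entries of Table 1 can be reproduced from Equation (8). The following sketch is a minimal illustration using only the standard library; the function names are not from the chapter. It again relies on the closed-form chi-square tail probability for an even number of degrees of freedom, which applies here because the degrees of freedom equal 2r.

```python
import math


def chi2_critical(p, dof):
    """Chi-square value exceeded with probability p; dof must be even."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")

    def upper_tail(x):
        # P(X > x) = exp(-x/2) * sum_{j < dof/2} (x/2)^j / j!
        term = total = math.exp(-x / 2.0)
        for j in range(1, dof // 2):
            term *= (x / 2.0) / j
            total += term
        return total

    lo, hi = 0.0, 1.0
    while upper_tail(hi) > p:
        hi *= 2.0
    for _ in range(100):                  # bisection on the decreasing tail function
        mid = 0.5 * (lo + hi)
        if upper_tail(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def required_piece_hours(lambda0, r, alpha=0.10):
    """Product n * t_t needed to demonstrate failure rate lambda0 with r failures."""
    return chi2_critical(alpha, 2 * r) / (2.0 * lambda0)


for lambda0 in (1e-3, 1e-4, 1e-5, 1e-6):
    print(lambda0,
          round(required_piece_hours(lambda0, 3)),
          round(required_piece_hours(lambda0, 5)))
```

Because the product scales with 1/λ0, each tenfold improvement in the failure rate to be demonstrated multiplies the required piece-hours by ten.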

For example, the reliability testing of components with an assumed exponential distribution, failure rate λ = $10^{-4}$ h$^{-1}$, and the test terminated after the fifth failure needs $n \times t_t$ = 79,936 ≈ 80,000 pieces × hours. Thus, for example, 100 components should be tested for 800 h or 800 components for 100 h. If the expected failure rate were λ = $10^{-6}$ h$^{-1}$, then $n \times t_t$ ≈ 8,000,000 pieces × hours, so that 10,000 components must be tested for 800 h or 100 components for 80,000 h. One can see that testing to prove the reliability of very reliable components becomes very difficult or impracticable. Therefore, various accelerated tests are often used. One way, suitable for items that work periodically with pauses between operations, such as switches or valves, eliminates the idle times: the switch is switched on and off continuously.

Another way to obtain the demanded reliability information sooner uses a higher intensity of load (e.g. higher mechanical load, higher electric stress or electric current) or a more severe environment (e.g. higher temperature or vibrations). For this approach to be effective, one must know the mechanism of degradation and the relationship between the load intensity and the rate of degradation. For example, the rate of chemical processes, which are the cause of some failures, often depends on the temperature according to the Arrhenius equation:

$$\text{rate} = C \exp\left(-\frac{\Delta E}{kT}\right); \tag{9}$$

C is a constant, ∆E is the activation energy, k is the Boltzmann constant, and T is the absolute temperature (K). If the times to failure have an exponential distribution, the failure rates or times to failure are related to the absolute temperatures as follows [1]:

$$\frac{\lambda_2}{\lambda_1} = \frac{t_1}{t_2} = \exp\left[\frac{\Delta E}{k}\left(\frac{1}{T_1} - \frac{1}{T_2}\right)\right]. \tag{10}$$

Equation (10) can be used to determine the temperature change from $T_1$ to $T_2$ that is necessary if the test duration should be reduced from $t_1$ to $t_2$.
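As an illustration of how such a temperature change might be computed, the sketch below assumes the usual Arrhenius acceleration factor, t1/t2 = exp[(∆E/k)(1/T1 − 1/T2)]. The function names, the activation energy of 0.7 eV, and the temperatures are hypothetical values chosen for the example, not data from the chapter.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5   # Boltzmann constant in eV/K


def acceleration_factor(activation_energy_ev, temp_use_k, temp_test_k):
    """Ratio t1/t2 of the time to failure at T1 (use) to that at T2 (test)."""
    return math.exp(activation_energy_ev / BOLTZMANN_EV_PER_K
                    * (1.0 / temp_use_k - 1.0 / temp_test_k))


def temperature_for_acceleration(acceleration, activation_energy_ev, temp_use_k):
    """Test temperature T2 (K) that shortens the test by the given factor."""
    inv_t2 = (1.0 / temp_use_k
              - BOLTZMANN_EV_PER_K * math.log(acceleration) / activation_energy_ev)
    if inv_t2 <= 0.0:
        raise ValueError("requested acceleration is not reachable")
    return 1.0 / inv_t2


# Hypothetical case: activation energy 0.7 eV, use temperature 300 K,
# test to be shortened ten times.
t2 = temperature_for_acceleration(10.0, 0.7, 300.0)
print(f"required test temperature: {t2:.1f} K")
print(f"check of acceleration:     {acceleration_factor(0.7, 300.0, t2):.3f}")
```

The result is only meaningful if the same failure mechanism acts at both temperatures, which is the precondition stated in the text.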

Similarly, the number of cycles to fatigue failure of periodically loaded components can be reduced by increasing the characteristic stress or load amplitude P. The basic relationship, based on the Wöhler-like curve [Equation (1) in Chapter 6], is

$$N_f = C\,P^{-n}; \tag{11}$$

C and n are constants for a given material and environment. Similar relationships can be used for finding the increased load for shortened tests of components exposed to creep or static fatigue (stress-enhanced corrosion), with rates depending on some power of the load.
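A small numerical sketch of this idea follows. It assumes that the life follows a power law of the form N_f = C·P^(−n) with n > 0, so that the constant C cancels when two load levels are compared. The function names and the exponent n = 3 are hypothetical.

```python
def load_ratio_for_shorter_test(life_ratio, exponent_n):
    """Ratio P2/P1 that shortens the life N1/N2 times, assuming N_f = C * P**(-n)."""
    if life_ratio <= 0 or exponent_n <= 0:
        raise ValueError("life_ratio and exponent_n must be positive")
    return life_ratio ** (1.0 / exponent_n)


def life_at_increased_load(cycles_at_p1, load_ratio, exponent_n):
    """Number of cycles to failure at load P2 = load_ratio * P1."""
    return cycles_at_p1 * load_ratio ** (-exponent_n)


# Hypothetical exponent n = 3: shortening the test 8 times needs twice the load.
ratio = load_ratio_for_shorter_test(8.0, 3.0)
print(ratio, life_at_increased_load(1.0e6, ratio, 3.0))
```

Such an extrapolation holds only within the load range in which the power law and the failure mechanism remain unchanged.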

Today, mass-produced electronic and electrical components are tested in special chambers and under special conditions enabling acceptably short duration of the tests. More about these tests, denoted HALT (for highly accelerated life testing) or HASS (for highly accelerated stress screening), can be found in the literature, for example [5].

Sorting tests

These tests aim at sorting out “weak” items that could fail shortly after being put into service. However, they must not cause excessive degradation of properties in “good” components (i.e. they should not shorten their life significantly). Sorting tests can be nondestructive or destructive. Nondestructive tests use visual observation, X-ray, ultrasound or magnetic inspection, and special electrical or other measurements. Destructive tests can be arranged in several ways, for example as proof tests that use short-time overloading by mechanical or electrical stress exceeding the nominal value so that the weak parts are destroyed during the test. Other ways of revealing the weak parts are artificial aging at increased temperature; cyclic loading by varying temperatures (this causes additional thermal stresses that can reveal hidden defects or weak joints); a burn-in period with 75% to 100% of the nominal load acting for several tens of hours before the item is put into service; and special kinds of mechanical loading, such as impacts, vibrations of a certain amplitude and frequency, or overloading of rotating parts by centrifugal forces.


2. Acceptance sampling

This operation, common in series production, ensures that only those batches of items that are either perfect or contain only a very small proportion of out-of-tolerance parts are released to the customer or to the next operation. Before this control is introduced, a test plan must be prepared, which contains:

  1. Kinds of monitored indicators,

  2. Number of tested items,

  3. Duration of the tests,

  4. Criterion for the decision on the acceptance.

Generally, three approaches are used:

  1. 100% control. Every component or item must pass the inspection, and those that do not fulfill certain parameters are discarded. This control is most expensive, but it should be the safest. Nevertheless, if the evaluation depends on human senses (visual check, for example) here, a small probability of erroneous decisions also exists. The inspector can overlook a defect or, vice versa, he can denote a good item as defective, especially if the number of tested items is very high, which can lead to his fatigue. An example is the check for internal flaws using X-rays, with the images interpreted via observation by naked eye. Also, 100% control cannot be done if every test ends with the destruction of the tested piece, even if it is good (e.g. the check of the airbag deployment system in cars).

  2. Random inspection. Only several pieces, chosen at random, are tested (e.g. 1% of the batch). The entire lot is accepted or rejected according to the result of the inspection. This kind of acceptance is much less demanding than 100% control, but it has been criticized as rather subjective and not sufficiently reliable. If, for example, a batch of 10,000 pieces contains 1% of defective pieces and it was decided that 1% will be tested, then 100 pieces must be checked. One percent of 100 is one piece. However, it can happen that the checked sample will contain not exactly one defective piece, but two or three or even none. This uncertainty has led to the development of the following method based on probability theory.

  3. Statistical acceptance. Several variants exist. The principle will be explained on the so-called single sampling plan. A sample of n pieces is taken at random from the lot and tested. The number z of out-of-tolerance pieces found in the sample is compared with the so-called decisive number c. If z ≤ c, the whole lot is accepted; if z > c, it is rejected. The values of c for various expected proportions p of unsuitable pieces and sizes n of the tested sample can be found in the standards for statistical acceptance [6] or calculated with consideration of two further important parameters, AQL (acceptable quality level) and LQL (limiting quality level, also called the lot tolerance percent defective, LTPD). LQL gives the maximum fraction of defectives that is acceptable, on average, in the batches denoted as good. The principle of determining the decisive number c is as follows. If the fraction of defectives in the population is p, the number z of defectives that can appear in a random sample of size n has a binomial distribution or, for low probabilities p, a Poisson distribution. It is thus possible to calculate the cumulative probabilities for z = 0, 1, 2, 3, ... The decisive number c is chosen so that only a very low probability β exists that a lot whose test has given z ≤ c contains a higher percentage of defectives than LQL. The probability β is called the customer’s risk and means the risk that an unsatisfactory lot will be accepted as good. On the other hand, a producer’s risk α also exists, namely that a good lot, with fewer defective pieces than AQL, will be rejected. Usually, 5% or 1% is chosen for both α and β.
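The uncertainty of the random inspection described in item 2 can be quantified. The sketch below is an illustration under the assumption that the 100 checked pieces are drawn without replacement from the batch of 10,000 pieces containing 100 defectives, so that the number of defectives in the sample follows a hypergeometric distribution. The function name is illustrative.

```python
from math import comb


def defectives_in_sample_pmf(lot_size, lot_defectives, sample_size, z):
    """Probability of exactly z defectives in a sample drawn without replacement."""
    good_in_lot = lot_size - lot_defectives
    if z < 0 or z > lot_defectives or z > sample_size:
        return 0.0
    if sample_size - z > good_in_lot:
        return 0.0
    return (comb(lot_defectives, z) * comb(good_in_lot, sample_size - z)
            / comb(lot_size, sample_size))


# Batch of 10,000 pieces with 1% defectives, 100 pieces inspected.
for z in range(4):
    probability = defectives_in_sample_pmf(10_000, 100, 100, z)
    print(f"P(z = {z}) = {probability:.3f}")
```

The printed probabilities show that a sample with no defective piece at all is about as likely as a sample with exactly one, which is why the result of such an inspection says little about the batch.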

The curve showing how the probability of accepting the lot decreases with an increasing proportion of defectives in the lot is called the operating characteristic curve (OCC). Figure 1 shows examples of OCCs for two different decisive numbers.
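Points of an operating characteristic curve, and a simple search for a single sampling plan, can be sketched as follows. The sketch assumes a binomial model and the convention that the lot is accepted when the number of defectives in the sample does not exceed c. The function names, the search limit, and the values AQL = 1% and LQL = 5% are illustrative assumptions; standard tables [6] should be used for actual plans.

```python
from math import comb


def acceptance_probability(n, c, p):
    """Probability that a sample of n pieces contains at most c defectives."""
    return sum(comb(n, z) * p**z * (1.0 - p)**(n - z) for z in range(c + 1))


def single_sampling_plan(aql, lql, alpha=0.05, beta=0.05, max_n=2000):
    """Smallest sample size n, with its decisive number c, meeting both risks.

    Producer's risk: lots at AQL are accepted with probability >= 1 - alpha.
    Customer's risk: lots at LQL are accepted with probability <= beta.
    Returns None if no plan is found up to max_n.
    """
    for n in range(1, max_n + 1):
        for c in range(n + 1):
            if acceptance_probability(n, c, lql) > beta:
                break                      # a larger c only raises the customer's risk
            if acceptance_probability(n, c, aql) >= 1.0 - alpha:
                return n, c
    return None


plan = single_sampling_plan(aql=0.01, lql=0.05)
print("plan (n, c):", plan)
if plan is not None:
    n, c = plan
    for p in (0.0, 0.01, 0.02, 0.03, 0.05, 0.08):
        print(f"p = {p:.2f}  P(accept) = {acceptance_probability(n, c, p):.3f}")
```

Plotting the acceptance probability against p for two different values of c gives curves of the kind shown in Figure 1.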

The rejected batch is either discarded or 100% checked. In the latter case, the good pieces are added to other good items. This makes the average quality of the batches composed in this way better, so that the quality demands in the tests may slightly be reduced.

Figure 1.

Operating characteristic curve (OCC). P - probability of acceptance; p - percentage of defectives in the population; α – producer’s risk; β – customer’s risk. Subscripts 1 and 2 denote curves OCC1 and OCC2.

Other schemes exist as well. For example, a double sampling scheme uses two decisive numbers, c1 and c2. If the number z of defectives in the first sample does not exceed c1, the lot is accepted, and if it is higher than c2, it is rejected. If c1 < z ≤ c2, another sample is taken and the total number of defectives in both samples is checked, etc. Further modifications, such as multiple sampling or sequential sampling, exist as well. For more, see [6].

However, doubts are sometimes cast on the cost-effectiveness of statistical control. On the one hand, this control costs money. On the other hand, losses can arise due to possible defective pieces hidden in the batches checked as good. Deming [7] has pointed out that if the cost of inspecting one piece is k1, the average cost of a failure caused by not inspecting is k2, and the average fraction of defectives is p, then, if pk2 < k1, the lowest total costs (control costs plus costs caused by failures) will be achieved without any testing. If pk2 > k1, full (100%) inspection should be used, especially for higher ratios pk2/k1. However, the situation is often not so simple; the fraction p of defectives can vary, and 100% testing can be impracticable because of too high investment costs or because all tests end with destruction, etc.
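Deming's rule compares two expected costs per piece, which can be written down directly. The sketch below is a simplified illustration: it assumes that full inspection finds every defective piece, and the function name and the cost figures are hypothetical.

```python
def cheapest_inspection_policy(p, k1, k2):
    """Compare expected cost per piece without inspection and with 100% inspection.

    p  : average fraction of defectives
    k1 : cost of inspecting one piece
    k2 : average cost of a failure caused by an uninspected defective piece
    """
    cost_without_inspection = p * k2
    cost_full_inspection = k1      # assumes inspection catches every defective
    if cost_without_inspection < cost_full_inspection:
        return "no inspection", cost_without_inspection
    return "100% inspection", cost_full_inspection


# Hypothetical costs: inspection 1 unit per piece, failure 50 units.
print(cheapest_inspection_policy(0.01, 1.0, 50.0))
print(cheapest_inspection_policy(0.05, 1.0, 50.0))
```

With these figures the break-even fraction of defectives is k1/k2 = 2%; below it no inspection is cheaper, above it full inspection is.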

Statistical acceptance was very popular in the second half of the 20th century but is less so today. There are two reasons: the demands on quality and reliability are much higher today than 50 years ago, and the allowable probabilities of failure are often of the order $1:10^6$, much lower than the risk levels common in statistical sampling. Moreover, the controlling devices are much more powerful today. The incorporation of automated test equipment (ATE) into the production line enables 100% control.


3. Testing of large structures and complex components

These tests will be illustrated on two cases: bridges and large components exposed to fatigue, such as parts of heavy vehicles (e.g. locomotives).

The assumed service life of road and railway bridges is many tens of years and sometimes more. During this time, the structure deteriorates and its safety decreases. The loading pattern can also change over a long time (new kinds of vehicles and changed traffic demands). For these reasons, bridges must sometimes be repaired or reconstructed. In such cases, thorough inspections are done at a suitable time, including load tests in important cases. In these tests, the bridge is usually loaded by a group of trucks that are filled with sand or concrete blocks as much as possible, so that the load-carrying capacity of the bridge is attained. During the tests, deformations and stresses at selected points are measured and compared with the values obtained by computer analysis of the structure, to see if the actual response (e.g. deflection of some parts of the bridge) corresponds to the assumed response. In some cases, dynamic properties are also studied (i.e. the response to periodic or dynamic loading). If the actual condition is worse than allowed, measures must be taken for improvement.

Large parts of mechanical structures, such as vehicles or aircraft (and sometimes these objects as a whole), are mechanically loaded in order to find whether the actual response (deformations and stresses at selected points) corresponds to the values assumed in design. The dynamic response is also investigated. Exceptionally, the object is loaded until destruction. In the past, measurements were often the only reliable source of information on the stresses and behavior. Today, the methods of stress analysis are much better and much information can be obtained by computer simulation as early as the design stage. Therefore, today the tests serve rather to confirm that the demanded parameters have been achieved.

The test loads are often imposed by electrohydraulic cylinders attached to the tested object. Often, special test stands are used, consisting of a massive frame with hydraulic cylinders, clamping equipment, and a controlling unit. The work of the stand is controlled by a computer. This enables one to program the demanded loading sequences. Sometimes, the load program is based on a record made during a test vehicle driving on real roads or on a test track containing typical examples of road surfaces. The test vehicle is equipped with sensors (usually strain gauges fixed at certain points of the car body) and the measured data are recorded. These data must be transformed to the data for the control of the load cylinders of the testing stand. The reason is that these cylinders are often attached to the tested structure at different points than were those used in the test vehicle driven on the track. Also, the data recorded with one test vehicle are sometimes used for the testing of other types of vehicles. The test stand can repeat the recorded load sequence again and again, so that also fatigue resistance can be tested in this way.


4. Tests of strength and fatigue resistance

These tests are often arranged according to various standards. In this section, we therefore limit our attention to some probabilistic aspects of these tests.

Strength tests. The individual values vary, so the number of tests should be adjusted to the purpose of the measurement and to the scatter of the individual values. If only approximate information on the average strength is needed, three tests may be sufficient; the standard deviation can serve for the estimation of the confidence interval of the mean strength. However, especially for brittle materials with a high scatter of individual values, knowledge of the “minimum” strength is often demanded. This is determined as a low-probability quantile. For this purpose, more tests must be done, often several tens or more. From these tests, the parameters of the strength distribution are determined. Often a Weibull distribution, but also a log-normal distribution, is assumed. The determination of parameters and quantiles of the Weibull distribution was described in Chapters 11 and 18. The parameters of the log-normal distribution are found in several steps. In the first step, logarithms of the measured values are taken; then the average and standard deviation are calculated from the transformed data; and finally the results are transformed back to the original system of units. The question of which distribution is better can be resolved by means of statistical goodness-of-fit tests [2-4].

Generally, many values are necessary to obtain reliable values of low-probability quantiles of strength. (Remember that 1% quantile corresponds to the minimum of 100 values.)
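The log-normal procedure described above can be sketched as follows, using only the Python standard library. The function names and the strength values are hypothetical and serve only to illustrate the steps: taking logarithms, computing the mean and standard deviation of the transformed data, and transforming a quantile back to the original units.

```python
import math
from statistics import NormalDist, mean, stdev


def lognormal_fit(strengths):
    """Mean and sample standard deviation of the logarithms of the data."""
    logs = [math.log(value) for value in strengths]
    return mean(logs), stdev(logs)


def lognormal_quantile(mu_log, sigma_log, probability):
    """Strength that is undercut with the given probability."""
    z = NormalDist().inv_cdf(probability)      # standard normal quantile
    return math.exp(mu_log + z * sigma_log)    # back to the original units


# Hypothetical strength values in MPa.
data = [312.0, 287.0, 345.0, 301.0, 298.0, 330.0, 275.0, 318.0, 309.0, 291.0]
mu_log, sigma_log = lognormal_fit(data)
print(f"median strength: {lognormal_quantile(mu_log, sigma_log, 0.50):.1f} MPa")
print(f"1% quantile:     {lognormal_quantile(mu_log, sigma_log, 0.01):.1f} MPa")
```

With only ten values, the 1% quantile is an extrapolation far beyond the data, which is exactly why many tests are needed for reliable low-probability quantiles.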

Fatigue tests. The main purpose of fatigue tests is the determination of the fatigue limit (if it exists) and of the relationship between the characteristic stress (S) and the time or number of cycles to failure ($N_f$). As for the fatigue limit, everything from the above paragraph on strength tests remains valid. The S-N relationship is obtained by making tests under various characteristic stress amplitudes and fitting the data by a suitable function, for example [8, 9]:

$$N_f = A\,S^{w}, \tag{12}$$

or a similar expression. Two possibilities now exist, depending on the number of tests that were done or could be done with respect to the available money and time. If only several tests have been performed, all measured $N_f(S)$ values are fitted by the regression function (12). The consequences of the scatter of individual values are depicted in Figure 1 in Chapter 18. The regression function, obtained by the least-squares method, gives such $N_f$ values that there is a 50% probability that the true number of cycles to failure under a chosen stress will be lower (!) than the number obtained from the regression function. The “safe” $N_{f,\alpha}$ values, for which there is an acceptably low probability α that the component or construction fails earlier, may be found as boundary values of the pertinent confidence band for all S-N data; see Chapter 18.

If more values (e.g. tens) are available for each stress level, a more accurate procedure can be used. The data for each stress level are ordered in ascending order. Each value corresponds to some quantile of the time to failure for the given stress level. For example, the shortest time of 10 values obtained for the same stress corresponds approximately to the 10% quantile of the time to failure. Now, only the $N_{f,\alpha}(S)$ values, corresponding to the same quantile α, are fitted by the regression function (12). The “safety” of the prediction of the time to failure with this function equals 1 − α. It is also possible to fit all measured data by a function of type (12) with additional parameters characterizing the probability that the actual number of cycles to failure will be lower than that calculated via the modified Equation (12).
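The quantile-based procedure can be sketched as follows. The sketch assumes a power-law S-N function fitted by least squares in log-log coordinates and the simple rank rule in which the i-th shortest of n lives represents the i/n quantile, as in the example with 10 values. All function names and the synthetic data are hypothetical.

```python
import math


def fit_power_law(stresses, lives):
    """Least-squares fit of log N = log A + w * log S; returns (A, w)."""
    xs = [math.log(s) for s in stresses]
    ys = [math.log(n) for n in lives]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    w = sxy / sxx
    return math.exp(y_mean - w * x_mean), w


def empirical_quantile_life(lives, alpha):
    """Life of rank i in the ordered sample, where i/n is the first ratio >= alpha."""
    ordered = sorted(lives)
    rank = max(1, math.ceil(alpha * len(ordered) - 1e-9))
    return ordered[rank - 1]


def fit_quantile_curve(lives_by_stress, alpha):
    """Fit the power law only to the alpha-quantile life at each stress level."""
    stresses = sorted(lives_by_stress)
    quantile_lives = [empirical_quantile_life(lives_by_stress[s], alpha)
                      for s in stresses]
    return fit_power_law(stresses, quantile_lives)


# Synthetic data: N = 1e12 * S**(-3), multiplied by the same ten scatter
# factors at each of three stress levels (values are purely illustrative).
scatter = [0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.3, 1.5, 1.8, 2.0]
lives_by_stress = {s: [1.0e12 * s ** -3 * f for f in scatter]
                   for s in (100.0, 200.0, 400.0)}
a_safe, w_safe = fit_quantile_curve(lives_by_stress, alpha=0.10)
print(f"10% quantile curve: N_f = {a_safe:.3e} * S^{w_safe:.2f}")
```

Fitting `fit_power_law` to all measured points instead would give the median-like curve, for which the risk of earlier failure is about 50%.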


  1. Bednařík J et al. Reliability techniques in electronic practice (In Czech: Technika spolehlivosti v elektronické praxi). Praha: SNTL; 1990. 336 p.
  2. Freund J E, Perles B E. Modern Elementary Statistics. 12th ed. New Jersey: Prentice-Hall; 2006. 576 p.
  3. Suhir E. Applied Probability for Engineers and Scientists. New York: McGraw-Hill; 1997. 593 p.
  4. Montgomery D C, Runger G C. Applied Statistics and Probability for Engineers. 4th ed. New York: John Wiley; 2006. 784 p.
  5. Levin M A, Kalal T T. Improving Product Reliability. Strategies and Implementation. Chichester, England: John Wiley & Sons; 2003. 313 p.
  6. Schilling E G, Neubauer D V. Acceptance Sampling in Quality Control. Boca Raton: CRC Press (Chapman & Hall); 2009. 700 p.
  7. Deming W E. Out of the Crisis. Reprint Edition. Cambridge MA: The MIT Press; 2000. 485 p.
  8. Fuchs H O, Stephens R I. Metal Fatigue in Engineering. New York: Wiley and Sons; 1980. 336 p.
  9. Stephens R I, Fatemi A, Stephens R R, Fuchs H O. Metal Fatigue in Engineering. 2nd ed. New York: John Wiley & Sons; 2001. 473 p.
