Parallels between materials science test and protracted subsystem disruptions.
Accelerating digitization of critical infrastructures is increasing interconnection and interdependence among high-reliability subsystems. The resulting dependencies create new challenges in preventing underinvestment in high impact, low probability (HILP) events which can have disastrous consequences for society’s critical subsystems. These more impactful events highlight the differences between reliability and resiliency, with the latter applicable to black swans. A number of approaches for quantifying resiliency have been proposed; however, a review of literature identified conceptual gaps when applied to empirical event data. This chapter provides a scenario agnostic method to quantify resiliency by applying concepts from materials science in a generalized form. This new formulation resulted from a mapping of constructs used in tensile testing to characteristics of protracted subsystem disruptions. Based on the mapping and gap analysis, a resiliency index calculation was developed and applied using examples based on empirical data from high impact events.
- critical infrastructures
- high impact, low probability (HILP)
- digital systems
Digitization is occurring in many industries in many different forms; however, regardless of the application, a common set of enablers is employed. As the proliferation of digital transformation continues, decision makers will need to distinguish between reliability and resiliency in the planning, design, and operation of these subsystems. Tightly coupled common hardware and software platforms potentially increase the breadth of accidental failures as well as the impact of intentional sabotage. Beyond end use applications is an overall reliance on electricity, which these digital subsystems require to function. Hardware, software, and electricity form the foundation upon which digitalization rests. The increased interdependence and interconnection can lead to common failure modes among previously isolated subsystems, resulting in increased probability of high impact events. Interconnection results in the establishment of a singular system with all other structures existing as subsystems. Evaluation of subsystems will need to include internally and externally initiated disruptive events. Highly impactful events, sometimes termed black swans, can not only disrupt subsystems but also fundamentally change their structure. Impactful as they are, their rarity can make these events prone to underinvestment due to heuristics and biases, most prominently the availability heuristic. A quantifiable metric can aid in our ability to appropriately allocate resources to study, adapt to, and mitigate these high impact, low probability events before they unexpectedly fracture the established subsystems we rely on. The avoidance of fracture is central to the application of the modulus of resilience in critical subsystems. The chapter will review the differences between reliability and resiliency as well as the importance of distinguishing between the concepts. Additionally, ideals related to resilience are identified and expressed in a concise operational definition.
The research utilized the progression shown in Figure 1 for the investigation.
Borrowing concepts from materials science allows for an isomorphic application where analogous structures are leveraged to represent HILP event scenarios. In this chapter, the isomorphic application is presented to provide a method of quantifying resiliency or its absence based on the intended aim of the subsystem. This concept is consistent with select portions of previous literature, but divergent in others. Following a review of previous research, a gap analysis was completed to identify opportunities for new considerations in quantifying resiliency. Lastly, an example in applying the modulus of resilience for critical subsystems is provided to demonstrate the computational process.
2. The increasing case for resiliency
Reliability and resiliency are sometimes discussed in a similar context with respect to subsystem performance; however, they differ conceptually in both the events they measure and the characteristics they quantify. The measures which define reliability provide insight into the context of the metrics’ use. Many of the most common reliability metrics utilize mean-based calculations from recurring failures over time. These metrics include mean time between failures (MTBF), mean time to failure (MTTF), and mean time to repair (MTTR), and they require successive failures in order to quantify subsystem performance. MTBF is the average operating time between successive failures of a repairable subsystem; its reciprocal, the failure rate, is often quoted as the number of failures per million hours. MTTR is the average time needed to repair a failed subsystem. MTTF measures reliability for a subsystem which cannot be repaired; it is the mean time expected until the first failure of a subsystem. MTTF is a statistical value and represents the mean over a long period of time and a large number of operations. The reliability metrics can effectively represent common cause events which produce recurring failures; however, these calculations are less applicable to low probability special cause events. A special cause is something special, not part of the system of common causes. It is detected by a point that falls outside the control limits. Often, subsystems have an allowable level of tolerance to minor disruption, preventing sustained impairment in accomplishing the aim of the subsystem. Plotting the number of events by type versus percent of subsystem output disrupted graphically displays the relationship between common cause and special cause events. The allocation of events is closely represented by a Pareto distribution (Figure 2).
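The dependence of these metrics on repeated failures can be illustrated with a minimal sketch. The function names and the sample failure log below are hypothetical and serve only to show the arithmetic; with a single rare event, the averages have no sample to rest on.

```python
def mtbf(total_operating_hours, failure_count):
    """Mean time between failures for a repairable subsystem."""
    if failure_count <= 0:
        raise ValueError("MTBF needs at least one observed failure")
    return total_operating_hours / failure_count


def mttr(repair_hours):
    """Mean time to repair: the average of the observed repair durations."""
    if not repair_hours:
        raise ValueError("MTTR needs at least one observed repair")
    return sum(repair_hours) / len(repair_hours)


def failures_per_million_hours(mtbf_hours):
    """Failure rate expressed per million operating hours (reciprocal of MTBF)."""
    return 1_000_000 / mtbf_hours


# Hypothetical log: four failures over 8,000 operating hours.
repairs = [2.0, 4.0, 3.0, 3.0]           # hours spent on each repair
subsystem_mtbf = mtbf(8_000, len(repairs))
subsystem_mttr = mttr(repairs)
subsystem_rate = failures_per_million_hours(subsystem_mtbf)
```

Each value is a mean over several occurrences, which is why these measures suit common cause events better than a one-off special cause event.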
Resiliency events reside at the tail of the distribution as rare events resulting from extraordinary scenarios. Such events have been produced by multiple failures within a single subsystem, as discussed in the book Normal Accidents by Charles Perrow. His work examined failures in highly complex operating environments. The increasing interdependence results in an interconnected ecosystem where a failure in a single subsystem can create failures in multiple subsystems. When interactive complexity is joined with tight coupling, the risk of a system accident is considerably increased. Interconnectedness and complexity among contemporary subsystems are increasing at a rapid pace as technologies develop faster than assessments can be made regarding their risks. As we move away from individual events and account for the larger system, we find the “eco-system accident,” an interaction of systems that were thought to be independent but are not because of the larger ecology. As systems grow in size and in the number of diverse functions they serve, and are built to function in ever more hostile environments, increasing their ties to other subsystems, they experience more and more incomprehensible or unexpected interactions. Common mode failures, first included in analytical models in 1967, can contribute to unexpected actions from complex systems. In addition to common mode failures, proximity and indirect information sources are two additional indications of interconnectedness. Ultimately, the probability of a subsystem being subjected to significant disruption is dependent on the cumulative probability of both internal and external risks. Inevitably, the probability of significant disruption will increase as interdependence increases. While increases in events causing significant disruption are expected, their count is not expected to be large enough for the application of mean-based reliability metrics.
Therefore, resiliency-based metrics are needed which match the periodicity and scale of high impact, low probability events.
3. Quantifying high impact, low probability events
HILP events require a subsystem to bounce back to normalcy following major disruption. The goal is to regain pre-disruption levels of output as quickly as possible; however, recovery time is not the only metric of importance. The shape of the recovery curve is also of significance. Resiliency aids in defining a disaster response paradigm which differs from previous approaches such as resistance and sustainability by emphasizing return to normal. Nonetheless, the literature frequently uses the concept of resilience to imply the ability to recover or bounce back to normalcy after a disaster occurs. Review of scholarly work related to the resiliency concept identified three main ideals: no assumption that disaster prevention is always possible, recognition of the need to include social variables, and the necessity to include disciplines outside the physical sciences and engineering. The term resiliency has increased in usage over the past decades. A multitude of definitions have been proposed whose interpretations can align with either resistance or sustainability. Although the resilience construct offered advantages in many areas relative to competing paradigms, the ambiguity associated with its meaning and scope hindered consensus. The multiplicity of definitions is a reflection of the philosophical and methodological diversities that have emerged from disaster scholarship and research.
Resilience first came to prominence in the English language in the early 19th century when Tredgold used the term to describe a property of timber. In his essay “On the transverse strength and resilience of timber,” Tredgold tested the properties of timber to be used in ship making. Tredgold cites resilience as the power of resisting a body in motion. The statement is foundational in establishing the concept of resilience as more than recovery: it is first an ability to withstand an applied force. Furthermore, Tredgold varied the weight and height of objects dropped on the test samples and recorded the effects of different forces on various wood pieces. These effects ranged from no effect, to curved, to broken. A second reference to the consideration of force can be found in the 1858 work, “On the Physical Conditions Involved in the Construction of Artillery, and on Some Hitherto Unexplained Causes of the Destruction of Cannon in Service,” by Robert Mallet. He states the modulus of resilience of other writers, referred to hereafter, depends, is much greater for gunmetal, and hence a given force produces a greater proportional distortion of form. The modulus of resilience was further formalized by materials science using stress/strain testing.
4. Methods in quantifying resilience
The range of methods for defining resilience includes qualitative, quantitative, and probabilistic approaches. A quantitative method can be used to compare outcomes using data from different actual events. A number of researchers have explored quantifying resilience to move beyond qualitative representations. Henry and Ramirez-Marquez proposed a quantitative approach for system resilience as a function of time. The formulation was a ratio of the recovery and losses using a figure-of-merit function. A disruptive event (ej) at time te impacts the system until time td.
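The ratio of recovery to loss can be sketched as follows. This is a minimal illustration of the commonly cited form of the measure, in which the figure-of-merit value before the event, at the end of the disruption (td), and at the evaluation time t are supplied directly; the function and argument names are hypothetical.

```python
def resilience_ratio(merit_before, merit_disrupted, merit_now):
    """Recovery achieved so far divided by the total loss from the disruption.

    merit_before    -- figure-of-merit value before the disruptive event
    merit_disrupted -- figure-of-merit value at the end of the disruption (td)
    merit_now       -- figure-of-merit value at the evaluation time t
    """
    loss = merit_before - merit_disrupted
    if loss <= 0:
        raise ValueError("the disruption must reduce the figure of merit")
    recovery = merit_now - merit_disrupted
    return recovery / loss


# Hypothetical values: output falls from 100 to 40 and has recovered to 70.
halfway = resilience_ratio(100.0, 40.0, 70.0)   # half of the loss regained
```

A value of 0 means no recovery has occurred yet and a value of 1 means the full loss has been regained.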
As shown, the numerator relates to the recovery until time t and the denominator represents the total loss due to disruption. Hosseini et al.  reviewed definitions and measures of system resilience. Their literature review was based on multiple domains including organizational, social, economic, and engineering using papers published between 2000 and April 2015. The major categories of assessment approaches are qualitative and quantitative with quantitative measures further defined as either probabilistic or deterministic.
The intent to analyze protracted subsystem disruptions leads to a focus on quantitative deterministic methods of calculating resiliency. The literature review by Hosseini et al. included 11 deterministic methods of quantification. Bruneau et al. utilized a method of integration based on the degradation in quality of infrastructure during the recovery period. Larger RL values indicate lower resilience, while smaller RL values imply higher resilience. Following Hosseini et al., RL is calculated based on the formulation in Eq. (2).
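As an illustration, the resilience loss is commonly written as the integral of (100 − Q(t)) over the recovery period, where Q(t) is infrastructure quality in percent. The sketch below approximates that integral with trapezoids over sampled points; the sampling scheme, function name, and data are assumptions made for the example.

```python
def resilience_loss(times, quality_percent):
    """Trapezoidal approximation of the integral of (100 - Q(t)) dt."""
    if len(times) != len(quality_percent) or len(times) < 2:
        raise ValueError("need matching sequences with at least two samples")
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        deficit_prev = 100.0 - quality_percent[i - 1]
        deficit_curr = 100.0 - quality_percent[i]
        total += 0.5 * (deficit_prev + deficit_curr) * dt
    return total


# Hypothetical event: quality drops to 50% at once and recovers linearly in 4 days.
rl = resilience_loss([0, 2, 4], [50.0, 75.0, 100.0])
```

For this instantaneous drop with linear recovery the result equals the area of the familiar resilience triangle, 0.5 × 50 × 4 = 100 percent-days.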
Zobel proposed a method based on the total possible loss over some suitably long time interval (T*), the percentage of functionality lost after disruption (X), and the time required for full recovery (T). Different combinations of X and T which result in the same level of resilience were analyzed, as shown in Eq. (3).
This metric is based on a linear recovery, making it unrealistic for some scenarios.
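A minimal sketch of this measure, assuming the commonly cited form R = 1 − XT/(2T*), in which the loss is the triangle formed by an initial drop X and a linear recovery over T. The function name and sample values are hypothetical.

```python
def predicted_resilience(x_lost, t_recover, t_star):
    """R = 1 - (X * T) / (2 * T*), with X given as a fraction between 0 and 1."""
    if not 0.0 <= x_lost <= 1.0:
        raise ValueError("x_lost must be a fraction between 0 and 1")
    if t_star <= 0 or not 0.0 <= t_recover <= t_star:
        raise ValueError("require 0 <= t_recover <= t_star and t_star > 0")
    return 1.0 - (x_lost * t_recover) / (2.0 * t_star)


# Two hypothetical events with different X and T but the same product X * T.
deep_and_short = predicted_resilience(0.50, 10.0, 50.0)
shallow_and_long = predicted_resilience(0.25, 20.0, 50.0)
```

Both events score the same value because only the product of X and T enters the formula, which is the trade-off between depth of loss and recovery time that the method examines.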
Alternative methods were proposed by Cox et al.  based on economic resilience using the difference in disruption (%∆DYmax) between the expected disruption (%∆Y) and maximum potential disruption (%∆Ymax). Therefore, an estimate of performance degradation is required. Such an estimation may be a challenge to precisely develop; however, the formulation is shown in Eq. (4).
Alternatively, Rose  considered time effects using a concept of dynamic resilience. The quantification of dynamic resilience is the difference in system recovery with hastened system recovery (SOHR) and without hastened system recovery (SOWR). This calculation is utilized over the total number of time steps (N) considered. The dynamic resilience calculation is shown in Eq. (5).
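The description above can be sketched as a sum, over the N time steps, of the difference between output with hastened recovery and output without it. The sketch assumes both series are sampled at the same time steps; the names and data are illustrative only.

```python
def dynamic_resilience(output_hastened, output_unhastened):
    """Sum over time steps of (SOHR - SOWR)."""
    if len(output_hastened) != len(output_unhastened):
        raise ValueError("both series must cover the same N time steps")
    return sum(h - w for h, w in zip(output_hastened, output_unhastened))


# Hypothetical system output (percent of normal) over N = 4 time steps.
with_hastening = [60, 80, 100, 100]
without_hastening = [60, 70, 80, 90]
dr = dynamic_resilience(with_hastening, without_hastening)
```

A larger value indicates that the hastening measures recovered more output over the period considered.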
Wang et al.  explored resilience in information systems based on the number of operations in the enterprise information system (m). The ratio of the demand time (di) and completion time of operation (ci) are weighted by the importance of operation (zi).
The larger the value of the metric the more resilient the system is determined to be. The calculation requires the assignment of a weight and assumes the number of operations is known. When attempting to quantify unknown events the number of operations can be difficult to estimate.
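A simplified sketch of the weighted ratio is given below: each operation contributes its demand time divided by its completion time, scaled by its importance weight. The published measure may include further terms, so this shows only the weighted-ratio idea; the tuple layout and values are hypothetical.

```python
def weighted_operation_resilience(operations):
    """Sum of z_i * d_i / c_i over the operations.

    operations -- iterable of (demand_time, completion_time, weight) tuples
    """
    total = 0.0
    for demand_time, completion_time, weight in operations:
        if completion_time <= 0:
            raise ValueError("completion time must be positive")
        total += weight * demand_time / completion_time
    return total


# Hypothetical enterprise information system with m = 2 operations.
ops = [
    (2.0, 4.0, 0.5),  # completed in twice the demanded time
    (3.0, 3.0, 0.5),  # completed exactly on time
]
score = weighted_operation_resilience(ops)
```

The example makes the dependence on the assigned weights explicit: changing either weight changes the score even when the timings are unchanged.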
Chen and Miller-Hooks quantify the “post-disruption expected fraction of demand that, for a given network, can be satisfied within pre-determined recovery budgets” (Hosseini et al.). The measure was based on transportation networks and compares the maximum demand that can be satisfied before disruption (Dw) and after disruption (dw) for pair (w).
Orwin and Wardle  considered the instantaneous and maximum disturbance in the quantification of resilience. The maximum absorbable force without upsetting system function (Emax) and effect of the disturbance on safety (Ej) at a given time (Tj) are used to define resilience.
Frameworks for local and global resilience were introduced by Enjalbert et al.  for modeling system safety in public transportation systems. A safety indication function (S(t)) is used to calculate resilience either instantaneously or over time, representing local and global, respectively. Global resilience is calculated from the time of disturbance (tb) to the end of the disturbance (te). The calculations are as follows:
Francis and Bekera  introduced a metric for dynamic resilience. The calculation uses the speed of recovery (Sp), original performance level (Fo), performance level at new stable level (
Cimellaro et al.  utilized quality of service to represent resilience. The method uses before disruption quality of service (Q1(t)), post disruption quality of service (Q2(t)), a control time (TLC) and a weighting factor (α) in developing a healthcare resilience metric.
Aside from the works investigated by Hosseini et al., Dessavre et al. introduced a new model and visual tools adding a stress dimension representing the force and stress of disruptive events. Defining the stress of the events is not a trivial task and is completely domain dependent.
A review of the concepts found in literature was completed for elements consistent with the modulus of resilience. Methods were limited to quantitative approaches which could be utilized with empirical data sets. Although the use of scaling factors was identified in literature [13, 18], such methods are not desired in the development of subsystem-based methods due to the subjectivity associated with them. A ratio-based approach has merit in its ability to normalize event effects and resulting recovery. Area-based calculations using integration are preferred to point calculations based on their ability to compensate for nonlinear restoration curves; however, complexity beyond the resilience triangle  would be necessary to capture differences in event magnitude and restoration response in disparate events.
The concept of a yield point was not identified in existing literature. A return to normal operation was typically used to identify the end of the restoration time period; however, this approach does not set the time based on the aim of the subsystem. Evaluations of subsystems beyond a critical point with respect to use of the subsystem output could lead to poor decision-making. One of the main weaknesses of current resilience metrics is that they do not relate the effects of a disruptive event to any of the event characteristics, unlike materials science. Materials science utilizes a change in length for evaluation of stress and strain; however, the difference in recovery response to a common cause and a special cause event was not found in the literature review. These distinctions serve to highlight the differences between reliability for normally occurring events and resiliency to low frequency events. Additionally, the need for subjective variables [10, 11, 12, 14, 15] does not lend itself well to empirical study.
The ability to normalize responses to different events is beneficial for evaluating the resiliency of different subsystems or different events on the same subsystem. The literature reviewed began analysis of the event from the start of restoration or treated the entire curve from time of event to the completed restoration as a single integral. This approach can confuse the quantities of force, stress, and strain. An equal force can result in different stress and strain based on the subsystem being reviewed. As a result, the descending slope and associated area prior to the start of recovery may prove informative of stress. Strain is more associated with the total area under the curve. The review of literature did not identify a bifurcation of the curve to delineate stress (prior to start of recovery) and strain (total area). Therefore, the assumptions of instantaneous loss and exponential recovery are not representative of many empirical cases.
In reviewing the concepts of resilience, a force is applied to a subsystem; the subsystem absorbs a portion of the force, experiences stress, and adapts to recover to a pre-disruption state. These references highlight the importance of considering the stress on the subsystem in determining the resiliency of a subsystem. Three primary points of measure for use in quantifying resiliency were identified: stress, total area of event, and change in length. Stress is a foundational variable of resiliency, as the term resiliency implies a response to a significant disruption. Therefore, only events of significance at the subsystem level are commonly referred to in terms of resilience. Additionally, the ability to compare resiliency events needs some level of normalization based on the associated stress for each event. Force continues to be applied until the subsystem decay ceases, allowing for subsystem assessment and initiation of recovery. The rate of subsystem decay influences the stress applied to the subsystem and the subsystem’s ability to bounce back. This connection exists due to the role of adaptation in the resiliency process. A slowly evolving scenario (i.e., slow subsystem decay) presents the subsystem an opportunity to adapt, resist, and recover in ways an acute decay will not. Therefore, when considering the normalization process of resiliency, both the decay (i.e., stress proxy) and recovery portions of the resiliency curve must be independently considered. The delayed decay provides an opportunity for improved response from the subsystem.
Total area of recovery best quantifies recovery and resiliency by compensating for the nonlinearity in the response function. As the subsystem attempts to recover, disruptions in the recovery process may cause discontinuities not captured by linear slope calculations. Similarly, time to recovery (e.g., 3 days to recovery) calculations may fail to represent intermediate progress in recovery.
Consideration of a failure point based on the aim of the subsystem aids in representing real-world scenarios. Recovery which occurs after a critical point of the subsystem would indicate a lack of resiliency. As an example, if a water subsystem requires 10 days to restore operation post contingency but the consumers of the water can only survive 4 days without water, the subsystem lacks resiliency. Attempts to quantify the subsystem’s resilience should stop at 4 days. Calculations beyond the 4-day time period no longer support the aim of the subsystem or the practical operation of the subsystem.
Lastly, change in length was included in the materials science calculation of the modulus of resilience. The change in length from the original length to the length under stress could be translated to a subsystem resilience construct to allow consideration of how subsystem recovery under lower stress common cause events and high stress special cause events are related. The consideration of a change in length may aid in joining concepts associated with reliability in the quantification of resilience.
Comparing these constructs with the reviewed literature results in the identification of conceptual gaps. The resulting resiliency values should reflect the subsystem performance for practical cases. Units are required based on subsystem parameters. The x-axis utilizes units of time, while the y-axis measures the units associated with the aim of the subsystem.
The methods of quantification reviewed begin the process of quantification at the point of recovery or assume no time delta between the initiating event and start of recovery. To support the incorporation of stress in the quantification of resilience, a bifurcation of the event curve is used as shown in Figure 3.
The use of ratio methods may provide consistency in scenarios of similar characteristics. When disparate characteristics are present, computed values may prove inconsistent with event outcomes. Depending on the event characteristics, either ratio methods or area-based methods may identify a less resilient subsystem response as more resilient. Figure 4 depicts the concept of less recovery time for less disruption. The scenario of Figure 4 is representative of a minor difference in subsystem response and would provide consistent rankings for resilience outcomes in many cases, where less area is representative of increased resilience.
Conversely, cases may exist where a longer recovery results from a less impactful initial event. The delayed recovery to a less impactful event could result from many factors including a lack of preparedness, inability to adapt, etc. In such cases, observation would assume that the subsystem which took longer to recover from a less impactful event is less resilient. However, present formulations may suggest the opposite. Figure 5 illustrates this scenario, where the smaller area is not representative of the more resilient outcome.
The fracture point should be set based on the aim of the subsystem. For example, if a drinking water subsystem failure requires a 7-day restoration period but 4 days is the survival period without water, the calculation of subsystem resiliency should be limited to a 4-day period. In some cases, the acknowledgement of a fracture point will result in the calculation of resiliency stopping prior to the subsystem returning to pre-disruption output levels. Figure 6 represents a case where the subsystem recovery takes longer than the subsystem failure point.
Calculations to quantify resiliency which consider values beyond the failure point are theoretical as opposed to practical in nature. The failure point should be given priority in quantifying resiliency.
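Limiting the calculation to the failure point can be sketched as an area computation that stops at a cutoff time. The example assumes a piecewise-linear disruption curve and a trapezoidal area, and it uses made-up values for the drinking water case (7-day restoration, 4-day survival period); none of the numbers are empirical.

```python
def area_until(times, disrupted, cutoff):
    """Trapezoidal area under the disruption curve from times[0] up to cutoff."""
    if len(times) != len(disrupted) or len(times) < 2:
        raise ValueError("need matching sequences with at least two points")
    total = 0.0
    for i in range(1, len(times)):
        t0, t1 = times[i - 1], times[i]
        y0, y1 = disrupted[i - 1], disrupted[i]
        if t0 >= cutoff:
            break
        if t1 > cutoff:
            # Interpolate the curve at the cutoff and shorten the last segment.
            y1 = y0 + (y1 - y0) * (cutoff - t0) / (t1 - t0)
            t1 = cutoff
        total += 0.5 * (y0 + y1) * (t1 - t0)
    return total


# Hypothetical water subsystem: fully out at day 1, half restored at day 4,
# fully restored at day 7. Values are fractions of the subsystem disrupted.
days = [0, 1, 4, 7]
fraction_out = [0.0, 1.0, 0.5, 0.0]

area_to_fracture = area_until(days, fraction_out, cutoff=4)  # practical limit
area_full_event = area_until(days, fraction_out, cutoff=7)   # theoretical only
```

Only `area_to_fracture` reflects the period during which the subsystem could still serve its aim; the remaining area accrues after the consumers' survival period has passed.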
An operational definition is derived from the combination of literature review and isomorphic adaptation of the modulus of resilience. Hence, resiliency is defined as the ability to limit proportional strain from abnormal stress to less than the subsystem yield point, through the achievement of recovery in less than the subsystem critical timeframes. This definition allows the use of quantitative measures in the calculation of resilience in a deterministic and normalized approach based on concepts from materials science.
An evaluation between two groups can result in an isomorphic application of findings from one structure to another. This mapping between groups can yield opportunities to apply known methodologies in an inter-disciplinary manner. The process of verifying an isomorphism requires the identification of elements in each structure and evaluating their equivalence. If equivalence is identified an opportunity for applying the computational framework may exist. The quantification of subsystem resilience was compared to resiliency as used in materials science. Materials science’s definition of resiliency includes the concepts of per unit volume, maximum energy, and integration from zero to the elastic limit. The modulus of resilience (Ur) is found from the stress-strain curve measured during the tensile test. Stress (σ) in the stress-strain curve is “the applied force per unit original undeformed cross-sectional area of the specimen”  as delineated in Eq. (15).
where F = force; A0 = cross sectional area.
Young’s modulus (E) serves as a measure of stiffness for a solid material. “Because of the difficulty in determining the elastic limit, it is commonly replaced by the proportional limit, which is the stress at which the stress-strain curve is out of linearity” .
where F = force; A = actual cross-sectional area; ∆L = amount of change in length; L0 = original length of the object.
“The modulus of resilience is the strain energy per unit volume absorbed up to the elastic limit for a tensile test and equals the area under the elastic part of the stress-strain curve” .
“This quantity indicates how much energy a material can absorb without deforming plastically” . Plastic deformation occurs when a material undergoes non-reversible changes in response to applied forces. The use of the stress-strain curve from materials testing is similar to conditions faced by disrupted subsystems regardless of type. Stress is the impact to the material under test, while strain is the resulting effects of the stress.
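For reference, the tensile-test quantities discussed above are commonly written in the following textbook forms. The symbols ε (strain), σ_y, and ε_y (stress and strain at the yield or proportional limit) are introduced here for illustration, and the closed form for U_r assumes linear elastic behavior up to that limit; the chapter's own numbered equations may be expressed differently, for example using the actual cross-sectional area A.

```latex
\begin{align}
\sigma &= \frac{F}{A_0} && \text{engineering stress} \\
\varepsilon &= \frac{\Delta L}{L_0} && \text{engineering strain} \\
E &= \frac{\sigma}{\varepsilon} = \frac{F / A_0}{\Delta L / L_0} && \text{Young's modulus (linear region)} \\
U_r &= \int_0^{\varepsilon_y} \sigma \, d\varepsilon
     = \frac{\sigma_y \varepsilon_y}{2}
     = \frac{\sigma_y^{2}}{2E} && \text{modulus of resilience}
\end{align}
```

The last line shows why stress, cross-sectional area, change in length, and original length are the constructs that need counterparts in a subsystem disruption.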
Based on the desire to apply a consistent methodology to quantify resilience regardless of disruption magnitude or subsystem size, the percentage of subsystem disrupted is proposed to achieve a per unit value. The area under the curve will then be integrated from the beginning to the end of the disruptive event. The integral used to determine the area under the curve is shown in Eq. (19).
where Ei = Event initial; Er = Event restored.
The area under the curve will then be applied to the maximum percentage of subsystem disrupted.
where SD = % of subsystem disrupted; Anl = Area under the curve to nonlinearity; At = Total area under the curve; Da = Duration of average disruption; De = Duration of event disruption.
Protracted subsystem disruptions create stress and strain due to an inability to complete the subsystem aim. The similarities between the tensile strength test used in materials science and the need to measure stress and strain in subsystems create an isomorphic relationship. Table 1 shows the parallels between materials science and protracted subsystem disruptions.
| Materials science | Protracted subsystem disruption | Comparison |
| --- | --- | --- |
| Stress applied | Peak percent of subsystem out of service | Percent out of service is equivalent to stress |
| Cross-sectional area | Area under curve from the origin to peak subsystem out of service | Area from zero to peak subsystem out of service is the point where the curve loses linearity |
| Actual cross-sectional area | Area under curve for entire disruptive event | Represents total strain experienced by subsystem |
| Change in length | Delta between subsystem’s average duration of disruptions and event disruption duration | Use of change in duration accounts for the change in length between average and protracted event |
| Original length | System’s average duration of disruptions | Accounts for average non-protracted disruption events |
The application of the modulus of resilience to a specific subsystem requires the identification of an aim the subsystem exists to accomplish. “Without an aim, there is no system”. The aim should be quantifiable with metrics available for analysis. The data must be accessible in order to serve as the basis for the resilience calculations and will vary based on the subsystem under study. Examples include percentage of successful operations or percentage of end users receiving service. The next section provides an empirical example in applying the modulus of resilience.
5. Application of the modulus of resiliency
The power industry was selected to provide an example for applying the modulus of resiliency using empirical data. The aim of the electric subsystem is to deliver electricity to all end use customers; therefore, data regarding the number of customers out of service can be used to quantify subsystem performance. The use of customers out of service in quantifying subsystem performance was supported by a review of regulatory reliability metrics used by Public Utility Commissions. For major electric utility disruptions, DOE situation reports provide customer outage information and are publicly available from the DOE website. Hurricanes are among the most prominent events to challenge utilities, and as a result, data for multiple hurricane events are available on the DOE website. Following data collection, plots can be constructed of the electric utility response in restoring customers. The inflection points were identified, and a yield point was designated by reviewing disaster preparedness data from the Capital Region Study. The study indicated that 73% of survey respondents had less than 10 days of food stored. Therefore, an event lasting longer than 10 days would most likely result in scarcity from food spoilage and diminished retail capabilities. With a known bifurcation and yield point, analysis can be completed.
Hurricanes Wilma and Irma presented an opportunity to compare resiliency of separate events in the same region. Following Wilma, the ability of several infrastructures to recover from severe events was reviewed in the Florida region. “[M]ore than $141.5 million has been obligated by FEMA for 119 Hazard Mitigation Grant Program projects to build stronger, safer more resilient communities in Florida”. Florida was once again subjected to a hurricane when Irma came ashore 12 years later. More than 6 million customers lost power as a result of Irma, compared to 4 million from Wilma. Although more than a decade apart, these two storms provide an opportunity to compare the recoveries following significant investment in resiliency. The comparison of the two resiliency indices can present an opportunity to calculate a cost per unit of resiliency and explore concepts such as diminishing returns or optimization from multi-hazard investment. Multi-hazard resiliency actions would provide an ability to address multiple HILP scenarios with a single investment. A resiliency index for each of the scenarios would be computed in order to create a composite change in resiliency for a given investment. The goal of this composite approach is to provide a means for justifying highly adaptable subsystem structures based on resiliency benefits.
The example demonstrates the process of calculating the resiliency index for a power utility scenario and comparing the response before and after the investment in resiliency. The values shown in Table 2 were extracted from United States Energy Information Administration (EIA) data. The additional data points at 0.5 and 1.5 days were included due to nonlinearities in customer outages associated with Hurricanes Wilma and Irma, respectively. Similarly, day 9 for Hurricane Wilma was approximated for the purpose of this analysis. The data required to calculate the change in length were obtained by collecting System Average Interruption Duration Index (SAIDI) data. SAIDI provides a baseline for the average interruption duration a customer experiences, which can be compared to the duration of the protracted system disruption to express a change in length.
|Day.||% Out of service (Hurricane Wilma 2005)||% Out of service (Hurricane Irma 2017)|
Following the collection of empirical data, the total area under the curve was calculated by dividing the outage curve into time steps and summing the areas of each time step as shown in Figures 7 and 8, respectively.
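The time-step summation described above can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's own calculation: it assumes each time step is treated as a trapezoid (the text does not state whether rectangles or trapezoids were used), and the outage values below are hypothetical placeholders rather than the Table 2 data. The function name `area_under_outage_curve` is likewise an illustrative choice.

```python
from typing import Sequence


def area_under_outage_curve(days: Sequence[float],
                            fraction_out: Sequence[float]) -> float:
    """Sum the area of each time step under an outage curve.

    Each step is treated as a trapezoid (an assumption of this sketch).
    The result has units of (fraction of customers out) x days.
    """
    if len(days) != len(fraction_out) or len(days) < 2:
        raise ValueError("need at least two matching (day, fraction) points")
    total = 0.0
    for i in range(1, len(days)):
        dt = days[i] - days[i - 1]
        if dt <= 0:
            raise ValueError("days must be strictly increasing")
        # Area of one time step: mean of the two endpoints times the width.
        total += 0.5 * (fraction_out[i - 1] + fraction_out[i]) * dt
    return total


# Hypothetical outage curve, NOT the Table 2 values.
days = [0.0, 0.5, 1.0, 2.0, 3.0]
fraction_out = [0.0, 0.4, 0.3, 0.1, 0.0]

# Total area under the curve.
total_area = area_under_outage_curve(days, fraction_out)

# Area up to the nonlinearity, here assumed to sit at day 0.5.
area_to_nonlinearity = area_under_outage_curve(days[:2], fraction_out[:2])

print(f"total area: {total_area:.3f}")
print(f"area to nonlinearity: {area_to_nonlinearity:.3f}")
```

Non-uniform spacing, such as the extra points at 0.5 and 1.5 days, is handled because each step uses its own width.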
The study region had a SAIDI of 60 minutes and a protracted outage duration of 12,960 minutes. Therefore, the resiliency index (RI) for Hurricane Wilma is determined as shown in Eq. (7).
The study region had a SAIDI of 57 minutes and a protracted outage duration of 12,960 minutes. Therefore, the resiliency index (RI) for Hurricane Irma is determined as shown in Eq. (22) based on EIA data (Tables 3 and 4).
|Day.||% Out of service (Hurricane Wilma 2005)||Area|
|Total area under curve||1.758|
|Area under curve to nonlinearity||0.258|
|Maximum % of customers out||0.350|
|Day.||% Out of service (Hurricane Irma 2017)||Area|
|Total area under curve||2.175|
|Area under curve to nonlinearity||0.560|
|Maximum % of customers out||0.640|
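As a rough check on the change-in-length inputs quoted above, the following sketch converts the protracted outage duration to days and forms a strain-like ratio. It assumes the standard engineering-strain form from tensile testing (change in length divided by original length), with SAIDI standing in for the original length and the protracted outage duration for the final length. That mapping is an assumption of this sketch and may differ from the chapter's own equations; the function name `strain_analog` is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def strain_analog(saidi_minutes: float, protracted_minutes: float) -> float:
    """Strain-like ratio (L_final - L_0) / L_0.

    Assumption of this sketch: L_0 is the SAIDI baseline and L_final is
    the protracted outage duration, both in minutes.
    """
    if saidi_minutes <= 0:
        raise ValueError("SAIDI must be positive")
    return (protracted_minutes - saidi_minutes) / saidi_minutes


protracted_minutes = 12_960  # value quoted in the text for both events

# 12,960 minutes corresponds to 9 days.
protracted_days = protracted_minutes / MINUTES_PER_DAY

wilma_ratio = strain_analog(60, protracted_minutes)  # SAIDI of 60 minutes
irma_ratio = strain_analog(57, protracted_minutes)   # SAIDI of 57 minutes

print(f"protracted duration: {protracted_days:.1f} days")
print(f"Wilma strain analog: {wilma_ratio:.2f}")
print(f"Irma strain analog:  {irma_ratio:.2f}")
```

Because the protracted duration is the same for both events, any difference in this ratio comes entirely from the SAIDI baseline.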
Change in resiliency is found by Eq. (3).
Determining a change in resiliency provides a quantitative measurement of subsystem response. The use of resiliency indices can aid in quantifying the efficacy of resiliency investment.
In this chapter, mean-based reliability was contrasted with the use of resiliency calculations for HILP events. Resiliency calculations are required, given the infrequent nature of protracted subsystem disturbances. Following a review of resiliency computations, a gap analysis was used to identify opportunities for ensuring that a resiliency calculation can capture the nonlinearities observed in empirical data. Parallels are drawn between the modulus of resilience construct from materials science and protracted subsystem disruptions, and an isomorphic application is defined. Finally, an example is presented for the power utility sector demonstrating the methods of collecting the inputs and completing the computations. These inputs include defining the aim of the system and failure point, data collection, determination of the bifurcation point, and the use of reliability data for calculating a change in length.
The ability to calculate resiliency regardless of the subsystem or scenario can assist in evaluating resiliency actions already taken or in planning new investment. The ability to compute resiliency on a common basis may also offer opportunities to optimize investment, based on interconnectedness, toward the subsystems which yield the greatest improvement. A more integrated approach may lead to increased systemic resiliency as opposed to more common heuristics-based, subsystem-specific approaches. The proposed method more closely adheres to the ontological and conceptual frameworks associated with the initial references to resiliency. Furthermore, subjective inputs are avoided, increasing the replicability and repeatability of associated research. By acknowledging a yield point specific to the aim of the subsystem, results from the resiliency index better represent the outcomes of real-world subsystems. Lastly, bifurcating the event curve allows the onset characteristics of the disruptive event to normalize the resiliency performance metric.
Further research on the distribution of events by type will be conducted to validate the anecdotal evidence regarding common cause and special cause events. These additional data will assist in the development of statistics for assessing the correlation between increasing interdependence and HILP events for critical subsystems. In order to test a wider array of empirical data sets, resiliency indices will be calculated using both historical and future HILP event data. The results of these analyses will be used to continually evaluate the efficacy of the metrics and identify opportunities for enhancements.
|HILP||high impact, low probability|
|MTBF||mean time between failure|
|MTTF||mean time to failure|
|MTTR||mean time to repair|
|DOE||Department of Energy|
|FEMA||Federal Emergency Management Agency|
|EIA||Energy Information Administration|
|SAIDI||System Average Interruption Duration Index|