A resilient system is a system that possesses the ability to survive and recover from the likelihood of damage due to disruptive events or mishaps. The concept that incorporates resiliency into engineering practices is known as engineering resilience. To date, engineering resilience is still predominantly application-oriented. Despite the increasing use of the engineering resilience concept, the diversity of its applications across engineering sectors complicates any universal agreement on its quantification and associated measurement techniques. There is a pressing need to develop a generally applicable engineering resilience analysis framework that standardizes the modeling, assessment, and improvement of engineering resilience across the broader engineering discipline. This paper provides a literature survey of engineering resilience from the design perspective, with a focus on engineering resilience metrics and their design implications. The currently available engineering resilience quantification metrics are reviewed and summarized, the design implications for the development of resilient-engineered systems are discussed, and the challenges of incorporating resilience into engineering design processes are evaluated. This study is intended to serve as a building block toward a generally applicable engineering resilience analysis framework that can be readily used for system design.

Introduction

Change occurs perpetually in life. For an engineered system to adapt to changes, this ability has to be designed into the system. This practice is also known as engineering resilience. To promote a better understanding of engineering resilience, there are several basic questions that should be considered: (1) What is engineering resilience? (2) Why is engineering resilience necessary? (3) Where could engineering resilience be implemented? (4) When is engineering resilience desired? and (5) How can engineering resilience be modeled and quantified, and used to improve the design of engineered systems?

Primarily popularized by researchers in the field of ecology, resilience in an ecosystem is defined as the speed with which an ecosystem returns to its equilibrium state following a perturbation [1]. This idea of “speed of returning to equilibrium” influenced the origin of the engineering resilience concept [2]. In engineering, the speed of returning to equilibrium is typically associated with (1) how fast an engineered system can adapt to deviations following a mishap and/or (2) how swiftly an engineered system can be restored from its disrupted states. Engineering resilience is the concept that fuses resilience ability into engineering practices. Resilience in engineering implies the ability of an engineered system to autonomously sense and respond to adverse changes in health conditions, to withstand failure events, and to recover from the effects of these unpredicted events [3]. A resilient system, from the perspective of the U.S. Department of Defense as reported in the literature [4], is a system that exhibits specific resilience properties, such as the ability to repel, resist, or absorb, the ability to recover, and the ability to adapt. A survey of the definitions of resilience reported in different disciplines can be found in Refs. [5,6]. Engineering resilience has been sought as an alternative or a complement to the traditional view of system safety for coping with the possibility of failure [7–9]. The resilience of engineered systems has been addressed from many different aspects, leading to the fast-growing engineering discipline referred to as “engineering resilience,” sometimes also called “resilience engineering” in the engineering community.

The continuous pursuit of better, safer, and longer lasting engineered systems has driven continuous growth in the complexity and scale of engineering systems [3,10]. Because they operate under unpredictable and uncertain conditions, complex engineered systems may require extraordinarily high safety precautions in design to account for unforeseen failure modes, such as those induced by adverse natural disasters. However, in the early design stage, it is very challenging, if not impossible, for system designers to identify all possible failure modes. Thus, considerable attention has been given to the view that engineering resilience must be designed into engineered systems in order to cope with system complexity and unforeseen failure modes.

To date, implementations of the engineering resilience concept have been widely reported across engineering disciplines. Many of them involve large, interconnected, complex systems, such as transportation systems [11–19], power systems [5,20–24], production systems [25–30], multitier supply chains [3,25,31–39], general infrastructure systems [5,20,40–47], health care systems [48–51], and many more. The implementation of engineering resilience is not limited to complex systems; the concept can also be applied to individual mechanical systems, such as aircraft actuators [52], aircraft controllers [53–55], or computer numerical control machining systems [56].

Traditional research efforts focused on developing systems with high reliability to prevent failures. Although the high reliability concept has improved system performance, there are two main reasons why high reliability alone is no longer sufficient in some instances: (1) High reliability is costly. Improving reliability typically involves backup, redundant, or standby systems and/or components, which require additional costs. These costs increase substantially as the system reliability level approaches the maximum achievable reliability; at some point, it is no longer economical to improve system reliability further, as the law of diminishing returns applies. (2) Failures could be inevitable in many engineering applications, even with very high system reliability. For instance, in probability theory an event of probability zero is not necessarily impossible, so a failure event assigned a zero probability of failure could still occur in engineering practice. In addition, in some cases the damage caused by failure events is unavoidable and uncontrollable, especially for adverse failure events induced by nature. Engineering resilience has presented itself as a turning point in recent research efforts toward a more systematic way of addressing failures of engineering systems. When achieving higher system reliability is no longer affordable and failure is inevitable, engineering resilience offers the ability to survive failures and to recover from calamities. Resilience is particularly appropriate when the system is expected to survive and recover from low-frequency, high-impact disruptions [57].

Although engineering resilience has gained popularity among designers, engineers, and practitioners, consensus has not yet been reached on how engineering resilience can be designed, quantified, and improved in engineered systems. This may be partly because the implementation of engineering resilience is highly dependent on the application context: it depends on the architecture of the system, the operating conditions, the type of disruptive events, and the magnitude of damage [57]. Different systems may be designed to be resilient to different disruptions, which would most likely require different approaches. The key question is how engineering resilience can be translated into unambiguous, quantifiable measures. To design or create resilience in a system, the set of actions that describes resiliency can then be expressed in the same quantifiable measures as engineering resilience. Once a proper way to quantify engineering resilience has been identified, system designs and operations can be modified to improve resilience.

This paper provides a literature survey of existing studies in engineering resilience from a system design perspective, with a focus on engineering resilience metrics and their design implications. It aims to offer a better understanding of the engineering resilience concept in the engineering design community and to help promote further development of generally applicable resilience quantification metrics, resilience analysis methodologies, and resilience design tools. These developments are expected to apply to a broad range of applications in the design of resilient-engineered systems. The rest of the paper is structured as follows. The conceptual attributes of an engineering resilience curve are first presented in Sec. 2, a survey of the available resilience quantification metrics is presented in Sec. 3, the design implications of engineering resilience are then discussed in Sec. 4, and the conclusions are summarized in Sec. 5.

Engineering Resilience Curve

Most engineered systems are exposed to uncertain, unpredictable, and potentially harsh operating conditions, which alter the system performance level, P(t), over time. Figure 1 shows the performance behavior of a resilient-engineered system compared with that of a nonresilient-engineered system after being subjected to a disruptive event.

A resilient-engineered system possesses the ability to recover its performance level from the disrupted state to the operating state, as indicated in Fig. 1(a). In contrast, a nonresilient-engineered system may gradually decline toward a significantly lower performance level after an unexpected disruptive event. Depending on the inherent capability of the system to withstand mishaps, the system may reach an unhealthy or degraded stable state (Fig. 1(b)), indicated by a lower performance level (Pv). If the system cannot survive the disruption, it will continue to worsen until it reaches a complete failure or collapse state (Fig. 1(c)). Figure 1 thus shows that engineering resilience is favorable when the system is subjected to disruptive events.

Since resilience has been generally associated with the losses of system performances after a disruptive event, a resilience curve is thus typically represented as a system performance curve, P(t), plotted against time, t. In general, there are four states in the timeline of the engineering resilience concept. As illustrated in Fig. 2, these four states are briefly explained as follows:

(1) Reliability state (SI): The baseline or original state, in which the system operates normally at performance level Po before the occurrence of a disruptive event.

(2) Unreliability state (SII): The vulnerable state, in which the system degrades to Pv following a disruptive event at time td.

(3) Recovery state (SIII): The state in which the system improves its performance functions as a result of restorative efforts. The restoration actions take place from tv to tn.

(4) Recovered steady state (SIV): System performance reaches a newly recovered steady state after the recovery state is successfully completed at time tn.
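To make the four states concrete, the sketch below encodes a piecewise-linear resilience curve, assuming straight-line degradation and recovery segments as in Fig. 2. The function names and the recovered level `p_n` are hypothetical illustrations, not part of any cited method.

```python
def resilience_curve(t, p_o, p_v, p_n, t_d, t_v, t_n):
    """Piecewise-linear P(t) through the four states (illustrative assumption).

    p_o: baseline performance (S_I), p_v: degraded level reached at t_v,
    p_n: recovered steady-state level (S_IV), reached at t_n.
    """
    if not (t_d <= t_v <= t_n):
        raise ValueError("expected t_d <= t_v <= t_n")
    if t < t_d:                       # S_I: normal operation
        return p_o
    if t < t_v:                       # S_II: linear drop from p_o to p_v
        return p_o + (p_v - p_o) * (t - t_d) / (t_v - t_d)
    if t < t_n:                       # S_III: linear recovery from p_v to p_n
        return p_v + (p_n - p_v) * (t - t_v) / (t_n - t_v)
    return p_n                        # S_IV: recovered steady state


def state_at(t, t_d, t_v, t_n):
    """Label the resilience state that time t falls into."""
    if t < t_d:
        return "S_I"
    if t < t_v:
        return "S_II"
    if t < t_n:
        return "S_III"
    return "S_IV"
```

Setting `t_d == t_v` reproduces a sudden drop, and choosing `p_n` above, equal to, or below `p_o` gives the improved, stabilized, or deteriorated outcomes of SIV.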

There are many variations of the engineering resilience curve apart from the one illustrated in Fig. 2. The various versions of engineering resilience curves originate from different perspectives that are mostly for conceptual and qualitative illustration of resilience in the application of interest. These variations are mostly due to differences in the unreliability profile and the recovery profile for different engineering applications. As a disruptive event typically varies in terms of severity and duration, the recovery response may also vary in different scenarios [36]. Figure 3 shows some examples of the conceptual attributes which lead to various forms of the engineering resilience curve.

Following a disruptive event, the impact level captures the severity of the event on the system performance. Impact level could be measured through the difference between the initial performance level and the performance after the disruptive event (Po − Pv) [36,60].

The unreliability profile and the degree of unreliability (θ) vary with the impact level and the inherent ability of the system to survive a disruption. Figure 3 shows three different unreliability profiles (u1, u2, u3). The first unreliability profile (u1) exhibits a sharp vertical performance drop (θu1 = 0 deg). In this scenario, the system is interpreted as unable to endure the impact of a disruption, where the disruption may be unavoidable, sudden, and destructive. The second unreliability profile (u2) shows a gradual decrease in system performance that stabilizes at a disrupted state before recovery takes place. In the literature [18,20,61], this scenario is often referred to as a five-state resilience curve, as depicted in Fig. 4. The third unreliability profile (u3) expresses a gradual decline in system performance followed by immediate recovery. Since engineering resilience is associated with swift recovery, the recovery action should take place as soon as the system senses a continuous drop in performance due to a disruptive event. The recovery action should be proactive and preferably triggered before the system settles into a stable disrupted state, as depicted in u2. The unreliability state and the disrupted state in Fig. 4 can generally be treated as one unreliability state, since both exhibit a nonoptimal performance level. Note that θu3 > θu2, which indicates that u3 endures the impact of the disruption better than u2. The performance loss area of u3 is smaller than that of u2, although both scenarios recover at the same time tv in Fig. 3.

The degree of recovery (γ) determines how much system performance can be recovered. Although some failure events cannot be foreseen, engineering resilience offers swift recovery abilities that return the system performance function rapidly to its ideal operating condition (SIV). There are three possible outcomes, as seen in Fig. 3: SIV could be improved (higher than baseline), stabilized (same as baseline), or deteriorated (lower than baseline), depending on the resilience ability built into the system and the availability of required resources. For simplicity, the unreliability and recovery profiles in most resilience curves in this paper are drawn as straight lines. In practical engineering applications, due to the presence of uncertainties, both unreliability and recovery profiles are more likely to exhibit nonlinear behavior; in some cases, convex and concave profiles are also observed [36,51]. Figure 5 shows four representative behaviors of a recovery profile.

Resilience Quantification Metrics

Quantification of engineering resilience plays an important role in defining the resilience of an engineered system and in applying the resilience concept in the engineering design process. Although it has been explored in diverse engineering disciplines, available engineering resilience quantification metrics still exhibit very little standardization, and agreement on a general quantifiable measure remains a challenge. Many different approaches and aspects (including uncertainties) should be taken into consideration when quantifying engineering resilience. Depending heavily on the application of interest, quantification metrics can be classified as deterministic or probabilistic and/or as static or dynamic [62].

In this section, the available metrics are grouped based on the approaches used to derive them; some metrics could fit into more than one category. Every available resilience quantification metric has strengths and weaknesses, depending on the purpose of the study and the application of interest. A compilation of resilience metrics reported in the literature is provided in this section to show the diversity of the available metrics. These resilience metrics are grouped into three categories, namely, metrics based on (1) the resilience curve, (2) pre- and postdisruption performances, and (3) reliability and restoration, which are detailed below. Note that the resilience quantification metrics presented in this paper are not exhaustive.

Resilience Metrics Based on Resilience Curve.

Since the resilience curve is often used to illustrate the resilient behavior of an engineered system undergoing a disruptive event, many researchers have used properties of the resilience curve to quantitatively measure the resilience level of the system. In the resilience curve, the area of concern is the shaded area in Fig. 3 or 6. This area is also referred to as the “impacted area (IA),” which defines the performance loss after a disturbance or disruptive event. If the area is enclosed by a nonlinear recovery profile, the performance loss can be computed by integration. In Ref. [63], the loss of resilience (Ψloss) is defined as the performance loss. Ψloss can be quantified by the magnitude of the expected degradation in performance quality over the recovery time, mathematically expressed in the following equation:

$$\Psi_{\mathrm{loss}} = \int_{t_d}^{t_n} \left[ P_o(t_o) - P(t) \right] dt \tag{1}$$

where Po(to) is the initial performance function before a disruptive event at time (td), and P(t) is the performance quality of a system which varies with time.
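When P(t) is only available as sampled data (for example, from a simulation or a monitoring log), Eq. (1) can be approximated numerically. A minimal sketch, assuming samples that span td to tn and using the trapezoidal rule (our choice of quadrature, not one prescribed by Ref. [63]):

```python
def loss_of_resilience(times, perf, p_o):
    """Approximate Eq. (1), the integral of [p_o - P(t)] over [t_d, t_n],
    with the trapezoidal rule on sampled performance values.

    times: non-decreasing sample times from t_d to t_n; perf: P(t) at those times.
    """
    if len(times) != len(perf) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt < 0:
            raise ValueError("times must be non-decreasing")
        # average performance deficit over the interval times its width
        total += 0.5 * dt * ((p_o - perf[i - 1]) + (p_o - perf[i]))
    return total


# Sudden drop to 0.4 at t_d = 0, then linear recovery to 1.0 at t_n = 6.
times = [float(k) for k in range(7)]
perf = [0.4 + 0.1 * k for k in range(7)]
psi_loss = loss_of_resilience(times, perf, p_o=1.0)  # triangle area 0.5 * 6 * 0.6
```

For a linear recovery the trapezoidal rule is exact, so the result equals the triangle area; for nonlinear profiles, finer sampling reduces the approximation error.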

The shaded area of concern in Fig. 6 is also known as the resilience triangle in the literature [59,60]. When the recovery profile in Fig. 6 is assumed to be linear, the area can be computed with a triangle formula and used to quantify resilience. As given in Ref. [64], the predicted resilience (Ψ) is expressed in Eq. (2), in which X is the fraction of lost performance (Po(to) − Pv(tv)) on a normalized scale, T is the time required to recover to normal operation (tn − td), and T* is a suitably long time interval (T ≤ T*).
$$\Psi = \frac{T^* - X T / 2}{T^*} = 1 - \frac{X T}{2 T^*} \tag{2}$$
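A minimal sketch of the resilience-triangle formula, assuming X is given as a fraction between 0 and 1 and T does not exceed T*; the input values are hypothetical.

```python
def triangle_resilience(x, t_recover, t_star):
    """Eq. (2): Psi = (T* - X*T/2) / T*, with X the lost fraction (0..1)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be a fraction between 0 and 1")
    if not 0.0 <= t_recover <= t_star:
        raise ValueError("expected 0 <= T <= T*")
    triangle_area = x * t_recover / 2.0   # area of the resilience triangle
    return (t_star - triangle_area) / t_star


# 60% of performance lost, 6 time units to recover, 100-unit horizon.
psi = triangle_resilience(0.6, 6.0, 100.0)
```

When performance is normalized so that Po(to) = 1 and the drop is sudden, the triangle area X·T/2 equals Ψloss from Eq. (1), so this formula is simply 1 − Ψloss/T*.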

The system performance does not necessarily show the steep or extreme drop illustrated in Fig. 6 in the aftermath of a disruptive event. Between td and tv, the system may experience a gradual performance degradation, as illustrated in Fig. 7, and most gradual performance drops exhibit nonlinear behavior. For nonlinear unreliability and recovery profiles, resilience can be explained as the functional capability of a system following a hazard over the control period (T = tn − td). As shown in Eq. (3), Ψ can be quantified as the normalized shaded region under the system response (describing the functionality of the system) after a disruptive event, denoted as AP(t) in Fig. 7 [65].

$$\Psi = \frac{1}{T} \int_{t_d}^{t_n} A_P(t)\, dt, \qquad T = t_n - t_d \tag{3}$$
During the performance loss period from td to tn, Ψ can be quantified by taking the ratio of the areas below the system response after a disruptive event (AP(t)) over the baseline system response (BP(t)) from time to to T* [20,46,61], mathematically shown in Eq. (4). BP(t) characterizes the system performance if no disruptions occur from time to to T*. AP(t) characterizes the system response in the presence of a disruptive event from time to to T*. Equation (4) is also referred to as the integral resilience [61].
$$\Psi = \frac{\int_{t_o}^{T^*} A_P(t)\, dt}{\int_{t_o}^{T^*} B_P(t)\, dt} \tag{4}$$

In cases where BP(t) is measured on a relative scale and assumed to be 100% (in other words, a constant value of 1.0), the integral in the denominator of Eq. (4) reduces to T* (taking to = 0), and thus Eq. (4) takes the form of Eq. (3) in this specific case. Note that the time period used by Renschler et al. [65] in Eq. (3) differs from the one in Eq. (4).
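The sketch below evaluates Eq. (4) from sampled curves, assuming AP(t) and BP(t) are sampled at the same time points and integrated with the trapezoidal rule (an illustrative choice). A repeated time stamp is used to represent a step drop. With BP(t) = 1, the denominator becomes the horizon length, which is the special case discussed above.

```python
def trapezoid(times, values):
    """Trapezoidal integral of sampled values; repeated times allow step changes."""
    return sum(
        0.5 * (times[i] - times[i - 1]) * (values[i] + values[i - 1])
        for i in range(1, len(times))
    )


def integral_resilience(times, a_p, b_p):
    """Eq. (4): area under A_P(t) divided by area under B_P(t) over [t_o, T*]."""
    if not (len(times) == len(a_p) == len(b_p)) or len(times) < 2:
        raise ValueError("need matching samples")
    baseline_area = trapezoid(times, b_p)
    if baseline_area <= 0:
        raise ValueError("baseline area must be positive")
    return trapezoid(times, a_p) / baseline_area


# Horizon [0, 10]: step drop to 0.4 at t = 2 (repeated time), recovery to 1.0 at t = 8.
times = [0.0, 2.0, 2.0, 8.0, 10.0]
a_p = [1.0, 1.0, 0.4, 1.0, 1.0]
b_p = [1.0] * len(times)
psi = integral_resilience(times, a_p, b_p)
```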

Instead of taking the integral directly, the area under AP(t) can also be quantified as the area under the baseline performance BP(t) minus the performance loss, indicated by the impacted area (IA) in Fig. 7. When the disruptive event occurs more than once over a long time period T*, Eq. (4) can be formulated accordingly as follows [23,24]:
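$$\Psi = \frac{\int_{t_o}^{T^*} B_P(t)\, dt - \sum_{i=1}^{N(T^*)} IA_i(t_i)}{\int_{t_o}^{T^*} B_P(t)\, dt} \tag{5}$$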
where N(T*) is the total number of occurrences during time T*, i is the event occurrence number, and IAi(ti) is the impacted area caused by ith event at time ti. When BP(t) is assumed to be a constant value of 1.0 and is combined with the scenario in which the occurrence of the disruptive event is based on a Poisson process, an expected resilience metric could be derived as [23,24]
$$E[\Psi] = 1 - \lambda\, E[IA] \tag{6}$$

where E[IA] is the expected impacted area caused by the disruptive event, accounting for all possible damage intensities, and λ is the occurrence rate of the disruptive event per year. P(t), AP(t), and BP(t) can be either deterministic or stochastic, depending on the application of interest. To better mimic reality, stochastic variables are preferred in resilience analysis because they incorporate probabilities, randomness, and uncertainties.
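The expected-value form can be checked by simulation. A minimal sketch, assuming BP(t) = 1 on [0, T*], Poisson event arrivals generated from exponential interarrival times, and impacted areas drawn independently from a uniform distribution with mean E[IA] (the uniform choice is purely illustrative). Under these assumptions, the expected total impacted area is λT*E[IA], so averaging Eq. (5) over many realizations should approach 1 − λE[IA].

```python
import random


def one_realization(lam, t_star, mean_ia, rng):
    """Eq. (5) with B_P(t) = 1 on [0, t_star]: (T* - sum of IA_i) / T*."""
    total_ia = 0.0
    t = rng.expovariate(lam)          # time of the first event
    while t < t_star:
        # hypothetical impact-area distribution: uniform with mean mean_ia
        total_ia += rng.uniform(0.0, 2.0 * mean_ia)
        t += rng.expovariate(lam)     # next Poisson arrival
    return (t_star - total_ia) / t_star


def monte_carlo_expected_resilience(lam, t_star, mean_ia, trials=5000, seed=1):
    """Average of independent realizations of Eq. (5)."""
    rng = random.Random(seed)
    return sum(one_realization(lam, t_star, mean_ia, rng) for _ in range(trials)) / trials


lam, t_star, mean_ia = 0.5, 50.0, 0.2       # events/year, years, performance-years
estimate = monte_carlo_expected_resilience(lam, t_star, mean_ia)
closed_form = 1.0 - lam * mean_ia           # 0.9 for these inputs
```

The agreement relies on the impacted areas being independent of the number of events; time-dependent or correlated damage would require the simulation rather than the closed form.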

In addition to the performance loss, other resilience dimensions can be derived from a resilience curve. Figure 8 depicts five resilience dimensions proposed in Ref. [36]: recovery, impact, performance loss, recovery profile function (f(t)), and weighted sum (g(t)). The description of each dimension and the corresponding equations are listed in Table 1. The resulting resilience value can be calculated as the sum of the weighted resilience dimensions in Eq. (7), where w1, …, w5 are the weights corresponding to the five dimensions of resilience.

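$$\Psi = \sum_{j=1}^{5} w_j D_j \tag{7}$$

where Dj denotes the jth resilience dimension listed in Table 1.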

Resilience Metrics Based on Pre- and Postdisruption Performances.

Engineering resilience is often associated with the performance loss of a system undergoing a disruptive event. Therefore, one approach to quantifying resilience is to measure performance changes, where resilience metrics are represented as the ratio of system performance before (pre-) and after (post-) disruption. Expressing resilience based on system performance is highly application-specific, as different applications generally have different performance functions. In addition, in many cases a single application can be described by multiple performance functions. For example, in a networked system, the performance function could be characterized in various ways, such as the flow/delivery value in the network, the system travel time (STT), or the demand that has to be satisfied.

When the flow/delivery value (V) is adopted as the system performance in a networked-system application, the corresponding resilience metric has been expressed as [11]
$$\Psi = \frac{V_{\mathrm{initial}} - V_{\mathrm{loss}}}{V_{\mathrm{initial}}} \tag{8}$$
where Vinitial is the initial amount of information that needs to be carried through the network, and Vloss is the information loss resulting from disruptions. In a similar networked-system application, a resilience index has been proposed and quantified as the difference between the optimal travel time (SO) and the critical system travel time (STT*) [19]. The proposed resilience index has been normalized relative to STT* as
(9)
Considering all the nodes in a networked system, the resilience metric based on the expected fraction of demand that can be satisfied, E(D), is shown in Eq. (10) [14,37], where Dw,pre is the original predisruption demand for the origin–destination (O–D) pair w, and Dw,post is the postdisruption maximum demand that can be satisfied for O–D pair w.
$$\Psi = \frac{E\left[\sum_{w} D_{w,\mathrm{post}}\right]}{\sum_{w} D_{w,\mathrm{pre}}} \tag{10}$$
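Because the expectation in this demand-based metric is taken over disruption scenarios, it can be evaluated by weighting each scenario by its probability. The sketch assumes a finite, hypothetical set of scenarios with known probabilities; the O–D pair names and demand values are invented for illustration, and O–D pairs missing from a scenario are assumed to have no satisfiable demand.

```python
def demand_resilience(d_pre, scenarios):
    """Expected postdisruption satisfied demand over total predisruption demand.

    d_pre: {od_pair: predisruption demand D_w,pre}
    scenarios: [(probability, {od_pair: max satisfiable demand D_w,post}), ...]
    """
    if abs(sum(p for p, _ in scenarios) - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    total_pre = sum(d_pre.values())
    expected_post = sum(
        p * sum(post.get(w, 0.0) for w in d_pre)   # unlisted pairs satisfy nothing
        for p, post in scenarios
    )
    return expected_post / total_pre


d_pre = {"A-B": 100.0, "A-C": 50.0}
scenarios = [
    (0.7, {"A-B": 100.0, "A-C": 50.0}),   # no effective damage
    (0.3, {"A-B": 40.0, "A-C": 20.0}),    # severe disruption
]
psi = demand_resilience(d_pre, scenarios)
```

In practice, each scenario's satisfiable demand would come from a network flow model of the damaged system; here it is simply given.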
In addition to the pre- and postdisruption ratio, a resilience formula was introduced based on the postdisruption reliability of each supplier in a networked-system application [33], which has been mathematically formulated as
(11)

where Ψi is the resilience of the demand node i, pj is the reliability of supply node j, qk is the reliability of supply link k, di is the demand quantity of demand node i, sj is the availability of supply node j, and ck is the capacity of supply link k, respectively.

In general, the maximum performance drop represents the worst-case postdisruption scenario for a system, denoted by Pmax in Fig. 9. Based on this worst-case scenario, a resilience index was defined as the ratio of the avoided postdisruption performance drop to the potential maximum performance drop [60], as expressed mathematically in Eq. (12).

(12)
When the performance drop is quantified as a percentage change in performance rather than as a performance output value, the resilience index can instead be formulated as Eq. (13) [17], where %ΔYm is the maximum percent change in direct output performance, and %ΔY is the estimated percent change in direct output performance:
$$\Psi = \frac{\%\Delta Y_m - \%\Delta Y}{\%\Delta Y_m} \tag{13}$$
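This percentage-based index is a simple ratio; the sketch below, with hypothetical percentages, shows how an avoided output drop maps to a resilience value.

```python
def static_resilience(pct_max_drop, pct_drop):
    """(%ΔY_m - %ΔY) / %ΔY_m: share of the maximum potential output drop avoided."""
    if pct_max_drop <= 0:
        raise ValueError("maximum percent change must be positive")
    return (pct_max_drop - pct_drop) / pct_max_drop


# Worst case: 20% output loss; estimated loss with resilience actions: 5%.
psi = static_resilience(20.0, 5.0)   # three quarters of the potential drop avoided
```

A value of 0 means the full worst-case drop occurred, and a value of 1 means the drop was avoided entirely.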
Besides the performance change before and after the disruptive event, the system recovery process is another performance function that has been used to quantify resilience while considering pre- and postdisruption conditions. The recovery process generally occurs in the aftermath of a disruptive event and can therefore be considered postdisruption system behavior. A more resilient system typically recovers faster. Thus, by comparing two recovery paths, denoted by YDR and YDU in Fig. 9, resilience can be quantified as in Eq. (14) [60], where YDR is the resilient recovery path, YDU is the normal recovery path, and m and n are the required recovery times, with m > n to represent a scenario in which less time is required to recover along the resilient response path.
(14)
In addition to the ratio between pre- and postdisruptions as seen in most of the previously mentioned equations, a weighting factor, α, was introduced in Ref. [51], which combines the integrals of the system performance, P(t), before and after a disruption during the control time, T = tn − td. Accordingly, the resilience metric can be represented mathematically as
(15)

Resilience Metrics Based on Reliability and Restoration.

As discussed previously, a resilient system is a system that possesses the ability to survive and recover from the likelihood of damage due to disruptive events or mishaps. For an engineered system, resilience has been defined as the ability of the system to sense and withstand adverse events and to recover from the effects of disruptive events [49]. A mathematical formula has been derived to quantitatively measure the resilience of engineered systems using two essential attributes, reliability and restoration. System reliability quantifies the ability of an engineered system to maintain its capacity and performance above a safety limit during a given period of time under stated conditions, whereas restoration measures the ability of an engineered system to restore its capacity and performance by detecting, predicting, and mitigating/recovering from the system-wide effects of adverse events. Mathematically, it can be expressed as
$$\Psi = R + \rho \tag{16}$$

in which the capacity restoration (ρ) can be considered as the degree of reliability recovery. The reliability and restoration can be derived as a set of conditional probabilities. The restoration in Eq. (16) was further quantified as a conditional probability of a system failure event (1 − R), a correct diagnosis event (ΛD), a correct prognosis event (ΛP), and a mitigation/recovery action success effect (κ) [52].
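A minimal sketch of the reliability-plus-restoration metric, assuming the failure, diagnosis, prognosis, and mitigation terms act as a chain of conditional probabilities so that ρ = (1 − R)·ΛD·ΛP·κ. The numerical values are hypothetical.

```python
def resilience_reliability_restoration(r, lam_d, lam_p, kappa):
    """Psi = R + rho, with rho = (1 - R) * Lambda_D * Lambda_P * kappa (assumed chain)."""
    for name, value in (("R", r), ("Lambda_D", lam_d), ("Lambda_P", lam_p), ("kappa", kappa)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a probability")
    # probability of: fail, then diagnose correctly, predict correctly, recover successfully
    rho = (1.0 - r) * lam_d * lam_p * kappa
    return r + rho, rho


psi, rho = resilience_reliability_restoration(r=0.9, lam_d=0.95, lam_p=0.9, kappa=0.8)
```

Because ρ can never exceed 1 − R under this assumption, Ψ stays between R and 1, which matches the 0 to 1 resilience scale used by many of the metrics in this section.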

In the presence of uncertainties, while taking into account disruptive events and system performance, conditional probabilities were employed [3,25] to quantify resilience as a function of disruptions (D), system specific characteristics (SSCs), reliability (R), and restoration (ρ).
(17)
Reliability of a system generally describes the ability of a system to perform its intended function for a predefined period without failure, and it is usually measured by a probability. For system reliability analysis, failure is usually defined based on a system performance of interest, P, which is generally represented as a function of system random input variables within a random space, where P < 0 indicates system failure. The random input space can be divided into two domains, the failure domain and the safe domain, by a limit state function defined by P = 0 [66,67]. The probability that the random input variables fall into the safe region is known as reliability, and the probability that they fall into the failure region is denoted as the probability of failure. With the reliability and probability of failure quantified, resilience can be represented as the recovery from system failure in a probabilistic manner. Given a system failure at time t1 and recovery from the failure at a later time t2, resilience was formulated between t1 and t2 as [67]
$$\Psi(t_1, t_2) = \Pr\left\{ P(t_2) \ge 0 \mid P(t_1) < 0 \right\} \tag{18}$$
where Ψ(t1, t2) is the conditional probability that the system recovers at t2 given that it fails at t1. Considering the state transition probability (PFS) from the failure state to the reliable state and the failure probability (PF), resilience can be further quantified as [67,68]
$$\Psi(t_1, t_2) = \frac{P_{FS}(t_1, t_2)}{P_F(t_1)} \tag{19}$$
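The ratio of the transition probability to the failure probability lends itself to Monte Carlo estimation: count samples that are failed at t1, and among them those that are safe again at t2. The limit-state functions below are hypothetical, chosen only to illustrate the counting, with a normally distributed capacity margin and a uniformly distributed recovery gain.

```python
import random


def estimate_transition_resilience(n=20000, seed=7):
    """Monte Carlo estimate of Psi(t1, t2) = P_FS(t1, t2) / P_F(t1)."""
    rng = random.Random(seed)
    n_fail_t1 = 0
    n_fail_t1_safe_t2 = 0
    for _ in range(n):
        x = rng.gauss(2.0, 1.0)        # hypothetical random input
        z = rng.uniform(0.0, 1.0)      # hypothetical recovery gain between t1 and t2
        p_t1 = x - 1.5                 # limit state at t1: failure if < 0
        p_t2 = x + z - 1.5             # limit state at t2 after recovery actions
        if p_t1 < 0:
            n_fail_t1 += 1
            if p_t2 >= 0:
                n_fail_t1_safe_t2 += 1
    if n_fail_t1 == 0:
        raise RuntimeError("no failures sampled at t1; increase n")
    p_f = n_fail_t1 / n                # P_F(t1)
    p_fs = n_fail_t1_safe_t2 / n       # joint: failed at t1 and safe at t2
    return p_fs / p_f, p_f, p_fs


psi, p_f, p_fs = estimate_transition_resilience()
```

For this toy model, P_F(t1) should be close to the standard normal probability of falling below −0.5 (about 0.31), and roughly half of the failed samples recover by t2.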

Reliability has also been expressed in terms of damage or performance loss in many available resilience quantification metrics. Resilience was quantified in Ref. [69] as Pr(A|i), the conditional probability that the system will meet predefined system performance standards (A) after disruptive event i. The performance standards introduced in Ref. [69] include robustness (r*) and rapidity (t*). Robustness is defined as the maximum acceptable loss, which can be considered as the ability of the system to endure failure or ensure reliability. Rapidity is defined as the maximum acceptable disruption time, that is, the maximum allowable time to full recovery. After a disruptive event, the initial loss (ro) and the time to full recovery (tn) should not exceed these performance standards, as shown in Fig. 10, where ro < r* and tn < t*. With this resilience quantification metric, a resilience objective requiring Pr(A|i) to meet a reliability goal R* can be set and represented as

$$\Pr(A \mid i) = \Pr\left( r_o < r^* \cap t_n < t^* \mid i \right) \ge R^* \tag{20}$$
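The resilience objective can be checked by sampling the consequences of a given event i. The sketch assumes, purely for illustration, that the initial loss ro and the recovery time tn are independent and uniformly distributed; the standards r*, t*, and the goal R* are hypothetical.

```python
import random


def prob_meets_standards(r_star, t_star, n=20000, seed=3):
    """Monte Carlo estimate of Pr(r_o < r* and t_n < t* | i)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        r_o = rng.uniform(0.0, 0.5)     # hypothetical initial loss (fraction)
        t_n = rng.uniform(10.0, 40.0)   # hypothetical time to full recovery (days)
        if r_o < r_star and t_n < t_star:
            hits += 1
    return hits / n


pr_a = prob_meets_standards(r_star=0.4, t_star=35.0)
objective_met = pr_a >= 0.9             # hypothetical reliability goal R* = 0.9
```

With these assumed distributions the exact probability is 0.8 × 25/30 ≈ 0.67, so the objective R* = 0.9 is not met; a designer would then tighten recovery times or reduce initial losses.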
Equation (20) provides a resilience metric that accounts for an individual disruptive event. In the presence of multiple disruptive events (e.g., different failure events), a conditional resilience metric was proposed in Ref. [70] that uses the percentage of system performance maintained in response to these disruptive events as
(21)
where ∏i Si represents the combination of a set of failure or disruptive events. Resilience has also been quantified based on the proportion of performance loss that has been restored from the disrupted state [18,44,61,71]:
$$\Psi(t \mid e_i) = \frac{P(t \mid e_i) - P(t_{vs} \mid e_i)}{P(t_o) - P(t_{vs} \mid e_i)} \tag{22}$$
where P(t|ei) is the performance function at time t given the disruptive event ei, and P(tvs|ei) is the performance at the disrupted state, as shown in Fig. 4, so that Ψ(t|ei) measures the proportion of the performance loss recovered by time t. Given a disruptive event ei, initial time to, time to a disrupted state tvs, time to recovery tvf, and time t ∊ (tvs, tvf), the resilience metric shown in Eq. (22) is also referred to as the quotient resilience [61]. In a similar manner, a resilience metric was constructed as the ratio of capacity restoration to the initial performance conditions [5], as
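To illustrate how the quotient resilience evolves during recovery, the sketch below assumes a linear recovery from the disrupted level back to the original level between tvs and tvf; the helper `linear_recovery` and all numbers are hypothetical.

```python
def quotient_resilience(p_t, p_vs, p_o):
    """(P(t|e) - P(t_vs|e)) / (P(t_o) - P(t_vs|e)): share of the loss recovered by t."""
    if p_o == p_vs:
        raise ValueError("no performance loss; quotient undefined")
    return (p_t - p_vs) / (p_o - p_vs)


def linear_recovery(t, p_vs, p_o, t_vs, t_vf):
    """Hypothetical linear recovery path from p_vs at t_vs to p_o at t_vf."""
    frac = min(max((t - t_vs) / (t_vf - t_vs), 0.0), 1.0)
    return p_vs + (p_o - p_vs) * frac


p_o, p_vs, t_vs, t_vf = 1.0, 0.4, 2.0, 8.0
history = [
    (t, quotient_resilience(linear_recovery(t, p_vs, p_o, t_vs, t_vf), p_vs, p_o))
    for t in (2.0, 5.0, 8.0)
]
```

The quotient starts at 0 at the disrupted state, rises as performance is restored, and reaches 1 once the original level is regained; a recovered level above P(to) would yield values above 1.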
$$\Psi = S_p \cdot \frac{F_r}{F_o} \cdot \frac{F_d}{F_o} \tag{23}$$

where Sp is the speed recovery factor, Fr is the performance at the recovered stable state, Fd is the performance level immediately postdisruption, and Fo is the original stable system performance level predisruption. Fd/Fo and Fr/Fo are deemed to be the absorptive capacity and the adaptive capacity of the system, as discussed in Ref. [5]. In this scenario, absorptive capacity can be considered as reliability whereas the adaptive capacity can be considered as restoration of reliability losses.
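A minimal sketch of the capacity-based metric, assuming it is the product Sp·(Fd/Fo)·(Fr/Fo) and reporting the absorptive and adaptive capacities separately. The speed recovery factor Sp is treated as a given input, since its functional form is not reproduced in this section; all values are hypothetical.

```python
def capacity_resilience(s_p, f_o, f_d, f_r):
    """Psi = S_p * (F_d / F_o) * (F_r / F_o); returns (psi, absorptive, adaptive)."""
    if f_o <= 0:
        raise ValueError("original performance F_o must be positive")
    absorptive = f_d / f_o   # fraction of performance retained right after disruption
    adaptive = f_r / f_o     # fraction of performance at the recovered stable state
    return s_p * absorptive * adaptive, absorptive, adaptive


psi, absorptive, adaptive = capacity_resilience(s_p=0.8, f_o=100.0, f_d=60.0, f_r=90.0)
```

Separating the two capacities makes it easier to see whether a low resilience value stems from a severe initial drop, an incomplete recovery, or a slow recovery captured by Sp.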

The resilience-curve-based quantification approaches discussed in Sec. 3.1 have also been used to calculate reliability and restoration. A resilience metric was defined in Ref. [58] as
(24)

where the failure profile (F) and recovery profile (ρ) are measured based on the failure event (f) and recovery event (p), respectively, over the performance P(t). The time notations are labeled in Fig. 11. Moreover, the efficiency of the system prior to disruption, Eo, is also believed to affect the recovery process. Resilience has been quantified in Ref. [72] for civil infrastructures under earthquake disruptions as the recovery over the loss of efficiency, taking into account Eo, the measure of damage sustained (Pd) after a disruptive event, and the measure of the recovery process (Pρ). The resulting resilience metric was formulated as

(25)

where E(Pρ) indicates the efficiency of the recovery curve.

Resilience Scale.

Although resilience has been quantified in different manners for different application purposes, as discussed in Secs. 3.1–3.3, it is important for the community to agree on a scale on which the resilience of an engineered system can be measured, which facilitates resilience analysis and, further, the assessment of resilience performance for different system design alternatives. A resilience scale allows one to evaluate how much resilience has been gained or lost in a system. As reported in the literature, most resilience metrics take values between 0 and 1 [25,52,65], or may be expressed as a percentage between 0% and 100%. Quantifying resilience based on different system performances of interest on a universal scale between 0 and 1 could potentially reduce the complications introduced by the many different resilience metrics, thereby yielding a generally applicable quantity.

First, as resilience can be considered one of the system characteristics, it is convenient to quantify it on a relative scale based upon the performance changes before and after a disruptive event. Second, when uncertainties are incorporated in resilience analysis, probabilistic resilience metrics can be used, which generally take a probability value between 0 and 1. On such a scale, a resilience value can be interpreted either in terms of system performance recovery after a disruptive event or, probabilistically, in terms of how likely the system is to survive or recover from the disruptive event. For example, a system with a resilience value of 0.9 can be interpreted as being 90% resilient to a particular disruptive event. Specifically, this could indicate a 90% probability that the system will survive a given disruptive event or recover to a predefined system performance within a given time period after the disruptive event.
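
To illustrate the probabilistic reading of the 0 to 1 scale, the sketch below estimates, by Monte Carlo simulation, the probability that a system either absorbs a disruption or recovers within a given time window. The failure probability, the exponential recovery-time model, and all parameter values are hypothetical assumptions, chosen so that a closed-form answer exists for comparison.

```python
import math
import random


def probabilistic_resilience(p_fail, mean_recovery_time, time_window, n_trials=100_000, seed=1):
    """Estimate P(system survives, or recovers within time_window) by Monte Carlo.

    Assumed (hypothetical) model: the disruption causes failure with probability
    p_fail, and the time to recover from a failure is exponentially distributed.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_trials):
        if rng.random() >= p_fail:  # the system absorbs the disruption
            successes += 1
        elif rng.expovariate(1.0 / mean_recovery_time) <= time_window:
            successes += 1          # failed, but recovered in time
    return successes / n_trials


def analytic_resilience(p_fail, mean_recovery_time, time_window):
    """Closed form for the same assumed model, used to check the simulation."""
    return (1 - p_fail) + p_fail * (1 - math.exp(-time_window / mean_recovery_time))


print(probabilistic_resilience(0.3, 5.0, 10.0))
print(analytic_resilience(0.3, 5.0, 10.0))  # ≈ 0.959
```

The simulated and closed-form values agree to within sampling error (about 0.96 here), which could be read as the system being about 96% resilient to this disruption under the stated assumptions.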

From the resilience scale perspective, success in engineering resilience points to the ability of a system to sense changes in its health conditions, to prevent and/or survive potential damage, and to recover successfully from the postdisruption effects. Failure in engineering resilience implies the inability of a system to adequately adapt to changes following a mishap, rather than system breakdowns or malfunctions per se [73]. In addition, when there are multiple potential disruptions, an engineered system may exhibit different resilience performance toward different disruptive events. Depending on the severity of the disruptions, the system could be resilient to one type of disruption but not to others [57].

Engineering Design Implication

Based on the surveyed resilience quantification metrics, this section discusses what engineering resilience has to offer from a system design perspective. Considering the probability of failure, a certain level of resiliency can be designed into a system to improve its performance against disruptive events, as depicted in Fig. 12. In order to develop a high-resilience and low-cost engineered system, two questions must be addressed with regard to integrating resilience into engineered systems: (1) How can the resilience quantification metrics be connected to system design parameters, so that the resilience of different design alternatives can be assessed? and (2) What resilience strategies can be used in engineering design to generate design alternatives and improve the resilience of engineered systems? These two key questions are discussed as follows. Section 4.1 discusses resilience attributes in general from a design perspective, Sec. 4.2 discusses predictive resilience analysis of system design alternatives, Sec. 4.3 describes potential resilience strategies that could be used in design to improve the resilience of an engineered system, and Sec. 4.4 discusses the challenges and further research needs in design for resilience.

Resilience Attributes for Design.

For a system to be resilient against disruptive events or potential failures, there are two essential properties that it should possess before or after the occurrence of a perturbation, as shown in Fig. 13. The first is the ability of the system to maintain its function without failure, generally referred to as “reliability.” The second is the ability of the system to recover from misfortunes, referred to as “recovery” or “recoverability.” These two key attributes of resilience can be designed and engineered to make an engineered system failure resilient. Reliability and recovery attributes have also been viewed as passive and proactive survival rates [25,52], static and dynamic resilience [57], or absorptive capacity and adaptive capacity [5].

Considering the resilience quantification metrics suggested by the resilience curve, a resilient-engineered system can be designed by minimizing the performance losses for a given disruptive event. This design strategy can be further realized through reducing the impact of a disruptive event, such as reducing the magnitude and duration of performance losses, or increasing the speed of recovery. Similar implications can be drawn from the resilience quantification metrics based on the pre- and postdisruption performances. Although these metrics provide a conceptual representation of resilience in a straightforward manner, incorporating them into engineering design is still very challenging. Due to the growing complexity of an engineered system, as well as the difficulty of precisely knowing how a system would respond to the disruptive event at the early stage of system design, it would be very challenging to measure the resilience level for different design alternatives precisely.
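
As an illustration of the resilience-curve view, the following sketch compares two hypothetical design alternatives by the area of performance loss under their assumed responses to the same disruption, normalized to a 0 to 1 scale. The sampled curves and the specific normalization (one minus the lost area over the target area in the control window) are assumptions for illustration; obtaining such curves reliably at the design stage is exactly the difficulty noted above.

```python
def performance_loss_area(times, performance, target):
    """Area between the target level and the performance curve (trapezoidal rule)."""
    area = 0.0
    for i in range(1, len(times)):
        loss_prev = max(target - performance[i - 1], 0.0)
        loss_curr = max(target - performance[i], 0.0)
        area += 0.5 * (loss_prev + loss_curr) * (times[i] - times[i - 1])
    return area


def normalized_resilience(times, performance, target):
    """1 - (lost area / target area) over the window; 1.0 means no performance loss."""
    total = target * (times[-1] - times[0])
    return 1.0 - performance_loss_area(times, performance, target) / total


# Hypothetical responses of two design alternatives to the same disruption at t = 1.
t = [0, 1, 2, 3, 4, 5]
design_a = [100, 60, 70, 80, 90, 100]    # deep drop, slow recovery
design_b = [100, 80, 95, 100, 100, 100]  # shallow drop, fast recovery
print(normalized_resilience(t, design_a, 100), normalized_resilience(t, design_b, 100))
```

Design B, with the shallower drop and faster recovery, scores 0.95 against 0.8 for design A, which mirrors the strategy of reducing the magnitude and duration of performance losses.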

Compared to the resilience metrics based on a resilience curve or on pre- and postdisruption performances, the resilience metrics based on reliability and restoration surveyed in Sec. 3.3 (which are often measured probabilistically) could offer a better choice for system designers when designing an engineered system to be failure resilient. As mentioned in Sec. 3.3, reliability can be precisely quantified as the probability that the system or component will perform its required functions under stated conditions for a specified operating period, and measured systematically by a probability distribution of time to failure. In addition, unreliability, survivability, and vulnerability are related terms that can be used to describe reliability in a system. The resilience concept extends the concept of reliability by incorporating the ability to recover from disruptive events. As suggested by the resilience quantification metrics based on reliability and restoration, not only must the reliability of a system be designed, but the ability to recover from a performance disruption must also be engineered in order for an engineered system to be failure resilient. Compared to the tremendous amount of research and development in design for reliability, research in design for restoration is still very limited, despite its importance in realizing engineering resilience.
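
As a minimal sketch of how reliability and restoration can combine on a probability scale, assume that the time to failure follows a Weibull distribution, so that R(t) = exp[−(t/η)^β], and that resilience is Ψ = R + (1 − R)ρ, where ρ is the conditional probability of restoring performance after a failure. Both forms and all parameter values are illustrative assumptions, not necessarily the specific metrics of Sec. 3.3.

```python
import math


def weibull_reliability(t, shape, scale):
    """R(t) = exp(-(t/scale)**shape): probability of surviving to time t."""
    return math.exp(-((t / scale) ** shape))


def reliability_restoration_resilience(reliability, restoration):
    """Assumed form Psi = R + (1 - R) * rho: survive, or fail and then be restored."""
    if not (0.0 <= reliability <= 1.0 and 0.0 <= restoration <= 1.0):
        raise ValueError("R and rho must be probabilities")
    return reliability + (1.0 - reliability) * restoration


# Hypothetical Weibull parameters: shape 2, scale 2000 h, evaluated at 1000 h.
R = weibull_reliability(t=1000.0, shape=2.0, scale=2000.0)  # exp(-0.25) ≈ 0.779
for rho in (0.0, 0.5, 0.9):
    print(rho, round(reliability_restoration_resilience(R, rho), 3))
```

With ρ = 0, Ψ reduces to R, reflecting how resilience extends reliability; any restoration capability (ρ > 0) raises Ψ above R.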

Besides reliability and recovery, other resilience attributes have also been studied, including the ability of a system to monitor its operations, anticipate potential failures, respond to failures, and learn from failures [74]. The ability to monitor includes tracking changes in the system's own performance as well as in its environment, allowing a disruptive event to be anticipated, minimized, or avoided. When a disruptive event has been anticipated, more coherent, timely, and effective responses can be expected from the system. If the system's responses are not the desired ones, the ability to learn allows the system to learn from experience, so that its abilities to monitor, anticipate, and respond can be enhanced.

Predictive Resilience Analysis.

When designing an engineered system to be failure resilient, it is essential for system designers to be able to assess the resilience levels of different design alternatives in order to make the best design decision. Many uncertainty factors should be taken into account when converting a conceptual framework into a designable resilience measure and further developing predictive resilience analysis techniques. A conceptual resilience framework is composed of many factors that affect system performance in terms of the resilience characteristics inherent in the system. As surveyed in Sec. 3, resilience quantification metrics have mostly related system performance outcomes after a disruption to system resilience. How the system responds in the aftermath of a disruption largely determines its resiliency level; thus, one of the essential and challenging tasks in predictive resilience analysis is to analyze system responses to disruptions at the early system design stage.

In the early design stage, an engineering assessment technique for predictive resilience analysis is very much needed so that system designers can gain the necessary knowledge of how the system responds to a disruptive event and whether the resilience level of a system design is sufficient. The methodologies and tools available in the literature for assessing engineering resilience in the design process are still very limited. This is primarily because assessing the future performance of a system in its operating stage during the design process is challenging. Although advanced system simulation techniques have given system designers more capability in predictive analysis, it is still challenging to take into account the interdependencies and complexities of an engineered system, the uncertainties associated with system design and operation, and the long-term emergent changes that may affect the system operating conditions.

One of the primary challenges in predictive resilience analysis is the development of effective system modeling techniques, so that the interdependencies and complexity of an engineered system can be modeled and the performance of the system undergoing a disruption can be simulated and analyzed at the design stage. Some preliminary studies addressing this challenge have been reported in the literature. One way to understand the design architecture of a complex engineered system is to use approaches from game theory and social network analysis [75], in which the interdependency between entities can be expressed in terms of algebraic connectivity. However, this approach requires accurate modeling of a complex engineered system as an interconnected graph, which could be very challenging when a large number of interdependent components and subsystems are considered, because the graphical model then grows tremendously in size. Thus, recent research efforts have been directed toward adopting a combination of logical and statistical approaches, such as Bayesian or Markov approaches: logical reasoning copes with complexity, and probability handles uncertainty. The Bayesian network (BN) approach has been proposed as a way to handle interdependencies [57,70], and has been applied to assessing the resiliency of a supply chain [3], a production line system [25], and a system-of-systems [70]. Figure 14 shows the BN modeling framework that has been reported for engineering resilience analysis and design [25]. In the BN approach, the important system characteristics or critical components are represented as nodes, the interdependencies between components are modeled as links, and the overall complexity of the system structure is captured through the combination of links and nodes. Moreover, in a BN, uncertainties are represented as conditional probabilities over multiple possible states.
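
To show how a BN encodes interdependencies, the sketch below builds a tiny hypothetical network (a disruption node D, two component nodes A and B that depend on D, and a system node S that depends on A and B) and performs exact inference by enumerating the joint distribution. The structure and all probabilities are illustrative assumptions, not the framework of Ref. [25]; realistic models would use a dedicated BN library and much larger tables.

```python
from itertools import product

# Hypothetical conditional probability tables (CPTs); all numbers are illustrative.
P_D = {True: 0.1, False: 0.9}       # P(disruption occurs)
P_A_UP = {True: 0.6, False: 0.98}   # P(component A up | D)
P_B_UP = {True: 0.7, False: 0.99}   # P(component B up | D)
# P(system up | A, B): a backup path keeps the system up fairly often if one component is down.
P_S_UP = {(True, True): 0.999, (True, False): 0.8, (False, True): 0.8, (False, False): 0.05}


def joint(d, a, b, s):
    """Joint probability from the chain-rule factorization P(D) P(A|D) P(B|D) P(S|A,B)."""
    pa = P_A_UP[d] if a else 1 - P_A_UP[d]
    pb = P_B_UP[d] if b else 1 - P_B_UP[d]
    ps = P_S_UP[(a, b)] if s else 1 - P_S_UP[(a, b)]
    return P_D[d] * pa * pb * ps


def query(s_value, d_value=None):
    """P(S = s_value [, D = d_value]) by summing the joint over unobserved variables."""
    total = 0.0
    for d, a, b in product([True, False], repeat=3):
        if d_value is not None and d != d_value:
            continue
        total += joint(d, a, b, s_value)
    return total


p_up = query(True)
p_disruption_given_down = query(False, d_value=True) / query(False)
print(round(p_up, 4), round(p_disruption_given_down, 4))
```

Observing that the system is down raises the probability that the disruption occurred from the prior of 0.1 to roughly 0.76, the kind of diagnostic reasoning across interdependent nodes that a BN supports.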
Considering the dynamic or evolving behavior of system performance over time, the dynamic Bayesian network (DBN) could be further employed [76–78]. However, updating a BN or DBN to accommodate changes in a complex system may be laborious and computationally intensive.
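
When a DBN is unrolled over time and reduced to a single system-state variable with no observations, it behaves like a discrete-time Markov chain. Under that simplifying assumption, and with hypothetical states and transition probabilities, the sketch below propagates the state distribution slice by slice to obtain the probability that the system is operating after a disruption.

```python
STATES = ("operating", "degraded", "failed")

# Hypothetical one-step transition probabilities P(next | current); each row sums to 1.
# Transitions from "degraded" or "failed" back to "operating" model recovery actions.
TRANSITION = {
    "operating": {"operating": 0.95, "degraded": 0.04, "failed": 0.01},
    "degraded": {"operating": 0.30, "degraded": 0.60, "failed": 0.10},
    "failed": {"operating": 0.20, "degraded": 0.10, "failed": 0.70},
}


def step(dist):
    """Propagate the state distribution one time slice forward."""
    return {
        nxt: sum(dist[cur] * TRANSITION[cur][nxt] for cur in STATES)
        for nxt in STATES
    }


def operating_profile(initial, n_steps):
    """Probability of being 'operating' at each time slice, starting from `initial`."""
    dist = dict(initial)
    profile = [dist["operating"]]
    for _ in range(n_steps):
        dist = step(dist)
        profile.append(dist["operating"])
    return profile


# Start right after a disruption that left the system failed with certainty.
profile = operating_profile({"operating": 0.0, "degraded": 0.0, "failed": 1.0}, 10)
print([round(p, 3) for p in profile])
```

Starting from a certain failure, the probability of operating rises from 0 to 0.2 after one slice and 0.36 after two, approaching the chain's steady state; changing the recovery entries of the table changes how quickly this curve rises.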

Besides the interdependency and complexity of an engineered system, it is also challenging to take into account the emergent behavior of the system due to recovery effects, as well as the evolving operating environment. An example would be the design of a transportation infrastructure system to accommodate more autonomous vehicles in the future. In a different application, a partially observable Markov decision process (POMDP) has been proposed for designing resilient spacecraft swarms [79]. Although a POMDP allows self-learning and is self-adaptive, demanding initial conditions are required to define the behavior, rewards, and actions that enable an accurate self-learning capability. Considering the evolving characteristics of complex adaptive systems (CASs), agent-based simulation could potentially be used by system designers as a sophisticated tool for analyzing disruptions in an adaptive, evolving simulation environment [26,79,80]. Although some initial efforts in modeling engineering systems for resilience assessment have been reported in the literature, more effective predictive resilience analysis methodologies and tools that can readily be used in various system design applications should be developed, in addition to uncovering different engineering resilience quantification metrics.
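
A minimal agent-based sketch is given below, assuming a hypothetical ring of interacting system elements in which a failed element recovers with a probability that grows with the number of working neighbors. The topology, recovery rule, and parameters are illustrative assumptions; the point is that the system-level recovery curve emerges from local agent interactions rather than being specified directly.

```python
import random


class Agent:
    """A hypothetical system element that can fail and recover."""

    def __init__(self, name):
        self.name = name
        self.working = True
        self.neighbors = []

    def update(self, rng, base_recovery=0.1, help_per_neighbor=0.15):
        # A failed agent recovers faster when more of its neighbors are working.
        if not self.working:
            helpers = sum(n.working for n in self.neighbors)
            if rng.random() < min(1.0, base_recovery + help_per_neighbor * helpers):
                self.working = True


def simulate(n_agents=20, n_failed=12, n_steps=15, seed=0):
    """Return the fraction of working agents at each step after a disruption."""
    rng = random.Random(seed)
    agents = [Agent(i) for i in range(n_agents)]
    for i, agent in enumerate(agents):  # ring topology: two neighbors each
        agent.neighbors = [agents[i - 1], agents[(i + 1) % n_agents]]
    for agent in rng.sample(agents, n_failed):  # the disruption
        agent.working = False
    history = [sum(a.working for a in agents) / n_agents]
    for _ in range(n_steps):
        # Asynchronous update: agents act in a random order each step.
        for agent in rng.sample(agents, n_agents):
            agent.update(rng)
        history.append(sum(a.working for a in agents) / n_agents)
    return history


print(simulate())
```

Because failed agents can only recover in this model, the returned fraction is nondecreasing; adding new failure events or time-varying rules would allow more complex emergent behavior to appear.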

Engineering Resilience Strategies.

As discussed in Sec. 4.1, there are two essential resilience attributes that an engineered system must possess in order to be failure resilient, namely, reliability and recovery. The resilience strategies discussed in this section focus on how to improve reliability and the ability to recover through system design. Since reliability and recovery are designable quantities, they can be used to transform conceptual resilience into designable resilience attributes, enabling system designers to develop resilient-engineered systems, as demonstrated in Fig. 15. Accordingly, design strategies used for improving reliability and recovery could be implemented to improve resilience in the system. In the rest of this section, design strategies for the improvement of reliability and recovery are discussed further.

Improving Reliability Through Design.

As one of the important design attributes for engineering resilience, reliability is a relatively mature concept within the design community. Reliability can be generally defined as the probability that the system or component will perform its required functions under stated conditions for a specified operating period. Substantial research efforts have been made in the past few decades in designing engineered systems for reliability, leading to well-established design frameworks and tools in the literature, such as the reliability-based design optimization framework [66,81–84], effective reliability analysis methods for design [85–89], and postdesign reliability assessment and growth.
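
The reliability definition above can be evaluated numerically. The sketch below estimates R = P(g(X) > 0) for a hypothetical stress-strength limit-state function by crude Monte Carlo sampling and checks it against the closed-form result for two independent normal variables. Reliability-based design optimization would embed such an estimate inside an optimization loop; crude Monte Carlo is used here only for simplicity, in place of the more efficient reliability analysis methods cited above.

```python
import math
import random


def mc_reliability(limit_state, sampler, n=100_000, seed=42):
    """Crude Monte Carlo estimate of R = P(g(X) > 0)."""
    rng = random.Random(seed)
    safe = sum(1 for _ in range(n) if limit_state(sampler(rng)) > 0)
    return safe / n


def sample_strength_stress(rng):
    """Hypothetical normal strength (mean 500, sd 40) and stress (mean 400, sd 30), in MPa."""
    return rng.gauss(500.0, 40.0), rng.gauss(400.0, 30.0)


def g(x):
    """Limit-state function: positive (safe) when strength exceeds stress."""
    strength, stress = x
    return strength - stress


estimate = mc_reliability(g, sample_strength_stress)
# Closed form for independent normals: beta = (mu_S - mu_L) / sqrt(sd_S**2 + sd_L**2) = 2.0
beta = (500.0 - 400.0) / math.sqrt(40.0**2 + 30.0**2)
exact = 0.5 * (1.0 + math.erf(beta / math.sqrt(2.0)))  # standard normal CDF at beta
print(estimate, exact)
```

The closed-form value is Φ(2) ≈ 0.977, and the sampled estimate should agree to within a few thousandths.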

There are different approaches and design strategies for improving the reliability of an engineered system or component. When considering a single failure mode, it is beneficial to understand the failure mechanism and the physics of failure, so that an appropriate reliability design strategy can be identified, such as discovering new materials, mechanisms, or design concepts, or developing a reliability growth plan. When considering reliability at the system level, with multiple components and failure modes, one of the most widely used design techniques for improving reliability is the incorporation of redundancy into the system. Reliability allocation can be used to optimally allocate reliability targets to components and subsystems in design while considering redundancy levels. In addition, because of the uncertainties present in most engineering applications, there is no way to guarantee that all failure modes are taken into account in the early design stage. Therefore, derating and diversity are other design techniques that can be adopted to improve reliability. Derating can be found in applications where higher-tolerance components are used for extra endurance instead of components with normal specifications. Diversity can be seen in logistics applications, such as having diverse suppliers to ensure the reliability of the continuous supply process.
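
The effect of redundancy can be seen with the standard series, parallel, and k-out-of-n reliability formulas for independent components. The sketch below assumes independence and uses hypothetical component reliabilities; it duplicates the weakest stage of a three-stage series system.

```python
from math import comb, prod


def series(reliabilities):
    """All components must work."""
    return prod(reliabilities)


def parallel(reliabilities):
    """At least one component must work (active redundancy)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


def k_out_of_n(k, n, r):
    """At least k of n identical, independent components must work."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


# Hypothetical three-stage system; the weakest stage (r = 0.90) is duplicated.
baseline = series([0.99, 0.95, 0.90])
with_redundancy = series([0.99, 0.95, parallel([0.90, 0.90])])
print(round(baseline, 4), round(with_redundancy, 4))
print(round(k_out_of_n(2, 3, 0.9), 3))
```

Duplicating the weakest stage raises system reliability from about 0.846 to about 0.931 in this hypothetical case, while a 2-out-of-3 arrangement of 0.9-reliable units gives 0.972.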

Besides design strategies for improving system reliability, failure diagnostics, prognostics and health management (PHM), and appropriate operation and maintenance (O/M) plans could also be developed to improve system reliability in operation. PHM is an emerging engineering discipline that has been applied to a large variety of engineered systems to improve system reliability [90–94]. It diagnoses the performance degradation of a system from its operational performance data and predicts the remaining useful life (RUL) of the system. PHM can significantly enhance the reliability, availability, and predictability of the system by providing early awareness of potential system failures, thus enabling optimized planning of failure mitigation and recovery activities.
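
As a minimal sketch of the prognostic step, the code below fits a linear trend to a hypothetical health index by least squares and extrapolates it to a failure threshold to estimate the RUL. The linear degradation model, the health-index data, and the threshold are assumptions; practical PHM systems typically use richer degradation models and quantify the uncertainty of the prediction.

```python
def fit_line(times, values):
    """Ordinary least-squares fit values ≈ a + b * t; returns (a, b)."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    b = sxy / sxx
    return v_mean - b * t_mean, b


def remaining_useful_life(times, health, failure_threshold):
    """Extrapolate the fitted linear degradation trend to the failure threshold."""
    a, b = fit_line(times, health)
    if b >= 0:
        return float("inf")  # no degradation trend detected
    t_fail = (failure_threshold - a) / b
    return max(t_fail - times[-1], 0.0)


# Hypothetical health index (1.0 = new) observed every 100 operating hours.
hours = [0, 100, 200, 300, 400]
health = [1.00, 0.96, 0.93, 0.88, 0.85]
print(remaining_useful_life(hours, health, failure_threshold=0.60))
```

For these data, the fitted trend reaches the threshold at roughly 1053 h, giving an RUL of about 653 h after the last observation.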

Improving the Ability to Recover.

Unlike reliability, which can be improved through design, the ability of an engineered system to recover often relates to the aftermath of disruptive events, which makes it more challenging for system designers to consider it thoroughly at the early stage of the design process. In many applications, a swift recovery process also depends on the amount of available resources and time. Thus, optimal resource allocation, high-level preparedness, and good collaboration can be designed into a system in relation to decision makers at the managerial level.

Redundancy is also in line with recovery strategies, since it offers an alternative path for maintaining system functionality when a disruptive event occurs due to the failure of a component or subsystem. Similarly, maintenance actions that mitigate potential failures or recover the functionality of failed components or subsystems are another design strategy that can enhance the ability of an engineered system to recover. Preventive maintenance is associated more with reliability attributes, because it is usually used to maintain the healthy condition of a system and prevent a complete system failure. In contrast, corrective maintenance is typically carried out to restore the system to an operational condition, leaning more toward recovery attributes. Developing an effective maintenance plan requires not only optimizing the maintenance planning but also designing the system so that the planned maintenance actions can be conducted effectively. Additionally, functional retrofits, which apply partial changes to a system at the operation stage to restore its capacity or improve its performance, have gradually become a major cost-effective means of maintaining the desired functionality of an engineered system over its lifecycle. Functional retrofits through partial system repair, replacement, or upgrade could be a viable strategy for improving the ability to recover, provided that these retrofits can be appropriately projected and engineered at the system design stage.
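
One standard way to see the combined effect of failure frequency and restoration speed is steady-state inherent availability, A = MTBF/(MTBF + MTTR), where corrective maintenance determines the mean time to repair (MTTR). The sketch below uses hypothetical numbers, and availability serves here only as a familiar proxy for the interplay of reliability and recovery.

```python
def steady_state_availability(mtbf, mttr):
    """Inherent availability A = MTBF / (MTBF + MTTR) for an alternating up/down process."""
    return mtbf / (mtbf + mttr)


# Hypothetical baseline: a failure every 500 h on average, 20 h to restore.
base = steady_state_availability(500.0, 20.0)
better_reliability = steady_state_availability(1000.0, 20.0)  # doubled MTBF
better_recovery = steady_state_availability(500.0, 10.0)      # halved MTTR
print(round(base, 4), round(better_reliability, 4), round(better_recovery, 4))
```

With these numbers, halving the MTTR yields exactly the same availability as doubling the MTBF, since A depends only on the ratio MTTR/MTBF; in design terms, investment in recoverability can be as valuable as investment in reliability.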

By diagnosing the performance degradation of a system from its operational performance data, the PHM technique could facilitate optimized planning of failure mitigation and recovery actions. PHM could thus not only improve system reliability by offering early awareness of system failures but also play an important role in improving the ability of an engineered system to recover from the aftermath of disruptive events. This is because PHM enables a proactive approach to addressing failures during the life cycle use phase, by detecting, diagnosing, and predicting the system-wide effects of disruptive events and providing valuable information for failure mitigation and recovery decisions. A resilient design would require the system to be intelligent enough to make autonomous decisions, recognizing the risk induced by a potential hazard or disruptive event and adjusting or reconfiguring itself in response [79,80]. Advanced resilience design could leverage the capabilities offered by PHM technology to develop self-learning or self-restructuring capabilities for a resilient-engineered system [76]. The PHM technique has been successful in lowering system lifecycle costs by providing precise information about failures in the operational stage. However, in order to realize resilience through failure prognosis and prognosis-informed maintenance or functional retrofits, a generally applicable PHM system development framework that ensures high accuracy and robustness needs to be developed for the design of resilient-engineered systems.

In summary, both reliability and recovery are essential resilience attributes that are quantifiable and designable. Some examples of design strategies for reliability and recovery improvements are listed in Table 2; these strategies continue to be refined as research and development in design for resilience progresses.

Challenges and Further Research.

From the surveyed resilience quantification metrics and the discussion of their engineering design implications, it is postulated that the resilience of an engineered system could be enhanced through better design. The enhancement could be realized by improving designable resilience attributes through appropriate design strategies. However, to achieve resilient designs of engineered systems, challenges from multiple aspects must be addressed. In this subsection, the challenges and further research needs are discussed to the best of the authors' knowledge.

As shown by most resilience quantification metrics, the resilience measure is closely tied to the changes in system performance throughout a disruptive event. Early awareness of potential disruptive events and their aftermath at the early system design stage is one of the primary challenges in resilient system design. This challenge is posed to system designers at the early design stage, as they have to be aware of potential disruptive events, the sources of complexity, and the uncertainties in their design applications. They also have to be aware of how these factors would affect the behavior of the system when undergoing one of the disruptive events, before the system is actually developed.

Disruptions can be categorized according to the types, sources, or impact levels, such as natural or man-made, external or internal, and local or global [95]. Disruptions do not necessarily need to have a sudden fatal impact on the system. Aging and degradation due to long hours of operation could be considered as disruptions as well. Furthermore, a minor disruption will only alter a small part of the system characteristics, whereas a disruption with severe impact could be fatal for the system. From the disruption aspects, the context (behavior, mode, and state), the duration (temporary, permanent, and trend), and the risk (likelihood and severity of damage) should be considered in the design of a resilience scheme [79].

Complexity is generally associated with the hierarchy and the collective behavior of the system. For example, the interdependencies between system components and subsystems, between different subfunctions, and between the system and its operating environment substantially increase the complexity of the system. A system generally consists of multiple components and subsystems that are interconnected and interact with each other in various ways. The collective behaviors of lower-level systems regulate the top-level system performance. Depending on the severity and impact of the disruptive events, partial failures, common cause failures, or cascading failures could occur. Each of these failures has negative effects on the system, indicated by an overall lower system performance level. From the system characteristic viewpoint, the architecture or hierarchy, the collective behavior, the interdependencies, and the functionality of the system should not be disregarded when designing for resilience.

To address the challenges outlined above in designing resilient-engineered systems, there is a great research need for a theoretical basis that furnishes a better understanding of how engineered systems achieve resilience and enables the development of engineering resilience principles readily applicable to engineering design. In the rest of this subsection, several emergent research needs are discussed from four different aspects. This discussion is not intended to be exhaustive, but rather to shed light on further research directions and to stimulate more valuable insights from the community to address the research challenges in designing resilient-engineered systems.

Early Awareness of Disruptive Events.

In the early design stage, it is essential for system designers to be aware of potential disruptive events for their design applications and to have the necessary knowledge of the likelihood of occurrence of each of these disruptive events. Although information on the failure rates of different types of system failures exists in the literature, these failure-induced disruptions are largely within the scope of a particular system or due to human error, and are primarily considered independently. Knowledge about disruptions induced by external factors, such as natural disasters or external environments, and about their cascading effects due to the interdependency between system components and subsystems, depends primarily on subjective expert judgments. Understanding the characteristics of these potential disruptive events would enable failure mitigation and recovery techniques to be considered in system designs. In addition, early awareness of potential disruptions would help the development of system monitoring, diagnostic, and prognostic techniques, so that these potential disruptive events could be avoided or their consequences minimized.

Capability of Predictive Resilience Analysis.

During the system design process, it is also essential for system designers to be able to assess the resilience levels of different design alternatives. Thus, enabling techniques for predictive resilience analysis applicable at the early design stage are of paramount importance. The development of advanced complex system modeling methodologies and associated system simulation tools would greatly enhance the capability of system designers in predictive resilience analysis. The modeling technique must be able to take into account the complexity of systems, simulate the aftermath of system disruptions and the system responses to these disruptions, consider the uncertainties associated with system design and operation, and further adapt to emergent changes in system operating conditions.

Recovery Strategies for Design.

As discussed in Sec. 4.3, recovery is an essential resilience attribute to be designed into a resilient-engineered system. However, recovering the performance of degraded or partially failed engineered systems has largely relied on maintenance activities or functional retrofits. The strategies that can be used in design depend primarily on the allocation of redundancy, as it offers an alternative path for maintaining system functionality when a disruptive event occurs due to failures of a component or subsystem. Although PHM research could improve the ability to recover by facilitating optimized planning of failure mitigation and recovery actions, the failure recovery strategies available to system designers at the design stage are very limited. Further research is very much needed in the new avenue of exploring diverse failure recovery strategies that can be readily used for engineering design. These research needs would generally fall into either developing new performance recovery pathways, such as the use of self-healing materials [96,97] in design, or better implementing existing recovery strategies, such as through advanced operation and maintenance planning methods. Additional effort would also be needed on design decisions among different recovery strategies and design alternatives for achieving the recovery of system performance after disruptions.

Cost Assessment and Systems Engineering.

With the increasing complexity and long projected useful lives of engineered systems, design decisions to ensure system resilience generally have to be made while simultaneously considering costs or affordability. Thus, a lifecycle cost assessment framework that takes into account all costs associated with improving each of the resilience attributes through the resilience strategies must be developed and incorporated into the decision-making process when designing a resilient-engineered system.
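
A lifecycle cost comparison of this kind can be sketched as the initial cost plus the discounted yearly costs of operation and maintenance and of expected disruption losses. The cost model below, in particular the assumption that the expected unrecovered loss scales with (1 − Ψ), is hypothetical, as are all the numbers.

```python
def present_value(cash_flows, rate):
    """Discount a list of yearly costs (years 1, 2, ...) to present value."""
    return sum(c / (1.0 + rate) ** year for year, c in enumerate(cash_flows, start=1))


def lifecycle_cost(initial, annual_om, disruption_rate, loss_per_disruption,
                   resilience, years, rate):
    """Initial cost + discounted O/M cost + discounted expected unrecovered disruption loss.

    Assumption (illustrative only): expected yearly loss is
    disruption_rate * loss_per_disruption * (1 - resilience).
    """
    expected_loss = disruption_rate * loss_per_disruption * (1.0 - resilience)
    return initial + present_value([annual_om + expected_loss] * years, rate)


# Two hypothetical design alternatives over a 20-year life at a 5% discount rate.
design_a = lifecycle_cost(1_000_000, 40_000, 0.5, 400_000, 0.70, 20, 0.05)
design_b = lifecycle_cost(1_200_000, 45_000, 0.5, 400_000, 0.90, 20, 0.05)
print(round(design_a), round(design_b))
```

With these assumed numbers, the more expensive but more resilient alternative B has the lower lifecycle cost (about 2.01 million versus about 2.25 million).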

In addition, when designing a resilient-engineered system, not only reliability but also the ability to recover from a disruption must be designed in order for the system to be failure resilient, as suggested by the resilience quantification metrics. Advanced systems engineering tools for design, such as those for tradespace exploration [98–100], must also be developed and used to facilitate the generation and assessment of different design alternatives considering interdependencies, design constraints, different design outcomes, and their lifecycle costs.
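
At its simplest, tradespace exploration screens design alternatives for those that are not dominated in the objectives of interest. The sketch below, with hypothetical alternatives scored on normalized cost and resilience, keeps the Pareto-optimal set; practical tradespace tools add many more objectives, constraints, and uncertainty.

```python
from typing import NamedTuple


class Alternative(NamedTuple):
    name: str
    cost: float        # normalized lifecycle cost; lower is better
    resilience: float  # resilience on a 0-1 scale; higher is better


def dominates(a, b):
    """a dominates b if it is no worse in both objectives and strictly better in one."""
    no_worse = a.cost <= b.cost and a.resilience >= b.resilience
    better = a.cost < b.cost or a.resilience > b.resilience
    return no_worse and better


def pareto_front(alternatives):
    """Keep only the non-dominated alternatives, sorted by cost."""
    front = [a for a in alternatives
             if not any(dominates(b, a) for b in alternatives if b is not a)]
    return sorted(front, key=lambda a: a.cost)


# Hypothetical tradespace of design alternatives.
candidates = [
    Alternative("baseline", 1.00, 0.80),
    Alternative("redundancy", 1.20, 0.90),
    Alternative("PHM + spares", 1.15, 0.92),
    Alternative("hardened", 1.30, 0.88),
    Alternative("full duplication", 1.60, 0.97),
]
for alt in pareto_front(candidates):
    print(alt.name, alt.cost, alt.resilience)
```

Here the alternatives labeled "redundancy" and "hardened" are dropped because "PHM + spares" is both cheaper and more resilient than either.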

Conclusion

This paper presented a literature survey of engineering resilience quantification metrics from a system design perspective. The engineering resilience quantification metrics reported in the literature were reviewed and summarized in three categories. Based on the surveyed resilience quantification metrics, the design implications for the development of resilient-engineered systems were discussed, with a focus on resilience attributes, predictive resilience analysis, and design strategies for resilience. The challenges of incorporating resilience into engineering design processes were discussed, and future research needs were outlined from four different perspectives, with the aim of inspiring future research directions and eliciting valuable insights from the community to address the research challenges in designing resilient-engineered systems. The presented study is expected to serve as a building block toward developing a generally applicable engineering resilience analysis and design framework that can be readily used for system design.

Acknowledgment

This research was partially supported by the National Science Foundation through Faculty Early Career Development (CAREER) award (CMMI-1351414) and by the Department of Transportation through University Transportation Center (UTC) Program.

Nomenclature

E[•] = expected value
ei = disruptive event
P(t) = system performance level over time
Po = initial system performance level before disruption
Pv = system performance level after disruption
R = system reliability
T = control period (T = tn − td)
td = occurrence time of the disruptive event
tn = time to new recovered state
to = initial time
tv = time to vulnerable or degraded state
T* = a long period of time
ρ = system recovery/restoration
Ψ = system resilience

References

1. DeAngelis, D. L., 1980, “Energy Flow, Nutrient Cycling, and Ecosystem Resilience,” Ecology, 61(4), pp. 764–771.
2. Walker, B., Holling, C. S., Carpenter, S. R., and Kinzig, A., 2004, “Resilience, Adaptability and Transformability in Social-Ecological Systems,” Ecol. Soc., 9(2), p. 5. http://www.ecologyandsociety.org/vol9/iss2/art5
3. Yodo, N., and Wang, P., 2016, “Resilience Analysis for Complex Supply Chain Systems Using Bayesian Networks,” AIAA Paper No. 2016-0474.
4. Goerger, S. R., Madni, A. M., and Eslinger, O. J., 2014, “Engineered Resilient Systems: A DoD Perspective,” Procedia Comput. Sci., 28, pp. 865–872.
5. Francis, R., and Bekera, B., 2014, “A Metric and Frameworks for Resilience Analysis of Engineered and Infrastructure Systems,” Reliab. Eng. Syst. Saf., 121, pp. 90–103.
6. Righi, A. W., Saurin, T. A., and Wachs, P., 2015, “A Systematic Literature Review of Resilience Engineering: Research Areas and a Research Agenda Proposal,” Reliab. Eng. Syst. Saf., 141, pp. 142–152.
7. Steen, R., and Aven, T., 2011, “A Risk Perspective Suitable for Resilience Engineering,” Saf. Sci., 49(2), pp. 292–297.
8. Hollnagel, E., Nemeth, C. P., and Dekker, S. W. A., 2008, Resilience Engineering Perspectives: Remaining Sensitive to the Possibility of Failure, Vol. 1, Ashgate, Aldershot, UK, pp. 1–13.
9. Dekker, S., Hollnagel, E., Woods, D., and Cook, R., 2008, “Resilience Engineering: New Directions for Measuring and Maintaining Safety in Complex Systems,” Lund University School of Aviation. https://www.researchgate.net/profile/Erik_Hollnagel/publication/238687807_Resilience_Engineering_New_directions_for_measuring_and_maintaining_safety_in_complex_systems/links/0046353148e497e765111408.pdf
10. Neches, R., and Madni, A. M., 2013, “Towards Affordably Adaptable and Effective Systems,” Syst. Eng., 16(2), pp. 224–234.
11. Omer, M., Mostashari, A., and Nilchiani, R., 2013, “Assessing Resilience in a Regional Road-Based Transportation Network,” Int. J. Ind. Syst. Eng., 13(4), pp. 389–408.
12. Mattsson, L. G., and Jenelius, E., 2015, “Vulnerability and Resilience of Transport Systems: A Discussion of Recent Research,” Transp. Res., Part A, 81, pp. 16–34.
13. Alipour, A., and Shafei, B., 2016, “Seismic Resilience of Transportation Networks With Deteriorating Components,” J. Struct. Eng., 142(8), p. C4015015.
14. Miller-Hooks, E., Zhang, X., and Faturechi, R., 2012, “Measuring and Maximizing Resilience of Freight Transportation Networks,” Comput. Oper. Res., 39(7), pp. 1633–1643.
15. Tamvakis, P., and Xenidis, Y., 2012, “Resilience in Transportation Systems,” Procedia-Soc. Behav. Sci., 48, pp. 3441–3450.
16. Reggiani, A., Nijkamp, P., and Lanzi, D., 2015, “Transport Resilience and Vulnerability: The Role of Connectivity,” Transp. Res., Part A, 81, pp. 4–15.
17. Cox, A., Prager, F., and Rose, A., 2011, “Transportation Security and the Role of Resilience: A Foundation for Operational Metrics,” Transp. Policy, 18(2), pp. 307–317.
18. Henry, D., and Ramirez-Marquez, J. E., 2012, “Generic Metrics and Quantitative Approaches for System Resilience as a Function of Time,” Reliab. Eng. Syst. Saf., 99, pp. 114–122.
19. Bhavathrathan, B. K., and Patil, G. R., 2015, “Capacity Uncertainty on Urban Road Networks: A Critical State and Its Applicability in Resilience Quantification,” Comput., Environ. Urban Syst., 54, pp. 108–118.
20. Ouyang, M., Dueñas-Osorio, L., and Min, X., 2012, “A Three-Stage Resilience Analysis Framework for Urban Infrastructure Systems,” Struct. Saf., 36, pp. 23–31.
21. Saurin, T. A., and Júnior, G. C. C., 2011, “Evaluation and Improvement of a Method for Assessing HSMS From the Resilience Engineering Perspective: A Case Study of an Electricity Distributor,” Saf. Sci., 49(2), pp. 355–368.
22. Farid, A. M., 2015, “Static Resilience of Large Flexible Engineering Systems: Axiomatic Design Model and Measures,” IEEE Syst. J., PP(99), pp. 1–12.
23. Ouyang, M., and Dueñas-Osorio, L., 2014, “Multi-Dimensional Hurricane Resilience Assessment of Electric Power Systems,” Struct. Saf., 48, pp. 15–24.
24. Ouyang, M., and Wang, Z., 2015, “Resilience Assessment of Interdependent Infrastructure Systems: With a Focus on Joint Restoration Modeling and Analysis,” Reliab. Eng. Syst. Saf., 141, pp. 74–82.
25. Yodo, N., and Wang, P., 2016, “Resilience Modeling and Quantification for Engineered Systems Using Bayesian Networks,” ASME J. Mech. Des., 138(3), p. 031404.
26. Wang, X., Qi, C., Wang, H., Si, Q., and Zhang, G., 2015, “Resilience-Driven Maintenance Scheduling Methodology for Multi-Agent Production Line System,” 27th Chinese Control and Decision Conference (2015 CCDC), Qingdao, China, May 23–25, pp. 614–619.
27. Salzano, E., Di Nardo, M., Gallo, M., Oropallo, E., and Santillo, L. C., 2014, “The Application of System Dynamics to Industrial Plants in the Perspective of Process Resilience Engineering,” Chem. Eng. Trans., 36, pp. 457–462.
28. Okoh, P., and Haugen, S., 2015, “Improving the Robustness and Resilience Properties of Maintenance,” Process Saf. Environ. Prot., 94, pp. 212–226.
29. Gu, X., Jin, X., Ni, J., and Koren, Y., 2015, “Manufacturing System Design for Resilience,” Procedia CIRP, 36, pp. 135–140.
30. Shirali, G. A., Mohammadfam, I., and Ebrahimipour, V., 2013, “A New Method for Quantitative Assessment of Resilience Engineering by PCA and NT Approach: A Case Study in a Process Industry,” Reliab. Eng. Syst. Saf., 119, pp. 88–94.
31. Spiegler, V. L., Naim, M. M., and Wikner, J., 2012, “A Control Engineering Approach to the Assessment of Supply Chain Resilience,” Int. J. Prod. Res., 50(21), pp. 6162–6187.
32. Soni, U., and Jain, V., 2011, “Minimizing the Vulnerabilities of Supply Chain: A New Framework for Enhancing the Resilience,” 2011 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), Singapore, Dec. 6–9, pp. 933–939.
33. Wang, D., and Ip, W. H., 2009, “Evaluation and Analysis of Logistic Network Resilience With Application to Aircraft Servicing,” IEEE Syst. J., 3(2), pp. 166–173.
34. Carvalho, H., Barroso, A. P., Machado, V. H., Azevedo, S., and Cruz-Machado, V., 2012, “Supply Chain Redesign for Resilience Using Simulation,” Comput. Ind. Eng., 62(1), pp. 329–341.
35. Sheffi, Y., and Rice, J. B., Jr., 2005, “A Supply Chain View of the Resilient Enterprise,” MIT Sloan Manage. Rev., 47(1), pp. 41–48. http://search.proquest.com/openview/ab02ef85c43466ea1085994bc7340615/1?pq-origsite=gscholar
36. Munoz, A., and Dunbar, M., 2015, “On the Quantification of Operational Supply Chain Resilience,” Int. J. Prod. Res., 53(22), pp. 6736–6751.
37. Dixit, V., Seshadrinath, N., and Tiwari, M. K., 2016, “Performance Measures Based Optimization of Supply Chain Network Resilience: A NSGA-II+ Co-Kriging Approach,” Comput. Ind. Eng., 93, pp. 205–214.
38. Brandon-Jones, E., Squire, B., Autry, C. W., and Petersen, K. J., 2014, “A Contingent Resource-Based Perspective of Supply Chain Resilience and Robustness,” J. Supply Chain Manage., 50(3), pp. 55–73.
39. Murino, T., Romano, E., and Santillo, L. C., 2011, “Supply Chain Performance Sustainability Through Resilience Function,” Winter Simulation Conference, pp. 1605–1616. http://dl.acm.org/citation.cfm?id=2431707
40. Gopalakrishnan, K., and Peeta, S., 2010, Sustainable and Resilient Critical Infrastructure Systems: Simulation, Modeling, and Intelligent Engineering, Springer, Chennai, India, pp. 84–91.
41. Hudson, S., Cormie, D., Tufton, E., and Inglis, S., 2012, “Engineering Resilient Infrastructure,” Proc. ICE-Civ. Eng., 165(6), pp. 5–12.
42. Reed, D. A., Kapur, K. C., and Christie, R. D., 2009, “Methodology for Assessing the Resilience of Networked Infrastructure,” IEEE Syst. J., 3(2), pp. 174–180.
43. Omer, M., Nilchiani, R., and Mostashari, A., 2009, “Measuring the Resilience of the Trans-Oceanic Telecommunication Cable System,” IEEE Syst. J., 3(3), pp. 295–303.
44. Baroud, H., Ramirez-Marquez, J. E., Barker, K., and Rocco, C. M., 2014, “Stochastic Measures of Network Resilience: Applications to Waterway Commodity Flows,” Risk Anal., 34(7), pp. 1317–1335.
45. Tamvakis, P., and Xenidis, Y., 2013, “Comparative Evaluation of Resilience Quantification Methods for Infrastructure Systems,” Procedia-Soc. Behav. Sci., 74, pp. 339–348.
46. Shafieezadeh, A., and Burden, L. I., 2014, “Scenario-Based Resilience Assessment Framework for Critical Infrastructure Systems: Case Study for Seismic Resilience of Seaports,” Reliab. Eng. Syst. Saf., 132, pp. 207–219.
47. Shah, S. S., and Babiceanu, R. F., 2015, “Resilience Modeling and Analysis of Interdependent Infrastructure Systems,” IEEE Systems and Information Engineering Design Symposium (SIEDS), Charlottesville, VA, Apr. 24, pp. 154–158.
48. Nemeth, C., Wears, R., Woods, D., Hollnagel, E., and Cook, R., 2008, “Minding the Gaps: Creating Resilience in Health Care,” Advances in Patient Safety: New Directions and Alternative Approaches (Performance and Tools), Vol. 3, National Center for Biotechnology Information, Bethesda, MD. http://www.ncbi.nlm.nih.gov/books/NBK43670/
49. Cimellaro, G. P., Fumo, C., Reinhorn, A. M., and Bruneau, M., 2008, “Seismic Resilience of Health Care Facilities,” 14th World Conference on Earthquake Engineering (14WCEE), Beijing, China, Oct. 12–17, Paper No. S21-001. http://www.iitk.ac.in/nicee/wcee/article/14_S21-001.PDF
50. Patterson, M. D., and Wears, R. L., 2015, “Resilience and Precarious Success,” Reliab. Eng. Syst. Saf., 141, pp. 45–53.
51. Cimellaro, G. P., Reinhorn, A. M., and Bruneau, M., 2010, “Seismic Resilience of a Hospital System,” Struct. Infrastruct. Eng., 6(1–2), pp. 127–144.
52. Youn, B. D., Hu, C., Wang, P., and Yoon, J. T., 2011, “Resilience Allocation for Resilient Engineered System Design,” J. Inst. Control, Rob. Syst., 17(11), pp. 1082–1089.
53. Rafi, M., and Steck, J., 2013, “Response and Recovery of an MRAC Advanced Flight Control System to Wake Vortex Encounters,” AIAA Paper No. 2013-5209.
54. Rafi, M., Steck, J. E., and Rokhsaz, K., 2012, “A Microburst Response and Recovery Scheme Using Advanced Flight Envelope Protection,” AIAA Paper No. 2012-4444.
55. Rafi, M., Steck, J. E., and Watkins, J., 2016, “Application of a Kalman Filter for Reduction of Sensor/Turbulence-Induced Noise Within a Model Reference Adaptive Controller,” AIAA Paper No. 2016-1625.
56. Li, J., and Xi, Z., 2014, “Engineering Recoverability: A New Indicator of Design for Engineering Resilience,” ASME Paper No. DETC2014-35005.
57. Uday, P., and Marais, K., 2015, “Designing Resilient Systems-of-Systems: A Survey of Metrics, Methods, and Challenges,” Syst. Eng., 18(5), pp. 491–510.
58. Ayyub, B. M., 2014, “Systems Resilience for Multihazard Environments: Definition, Metrics, and Valuation for Decision Making,” Risk Anal., 34(2), pp. 340–355.
59. Ayyub, B. M., 2015, “Practical Resilience Metrics for Planning, Design, and Decision Making,” ASCE-ASME J. Risk Uncertainty Eng. Syst., Part A, 1(3), p. 04015008.
60. Rose, A., 2007, “Economic Resilience to Natural and Man-Made Disasters: Multidisciplinary Origins and Contextual Dimensions,” Environ. Hazards, 7(4), pp. 383–398.
61. Dessavre, D. G., Ramirez-Marquez, J. E., and Barker, K., 2016, “Multidimensional Approach to Complex System Resilience Analysis,” Reliab. Eng. Syst. Saf., 149, pp. 34–43.
62. Hosseini, S., Barker, K., and Ramirez-Marquez, J. E., 2016, “A Review of Definitions and Measures of System Resilience,” Reliab. Eng. Syst. Saf., 145, pp. 47–61.
63. Bruneau, M., Chang, S. E., Eguchi, R. T., Lee, G. C., O'Rourke, T. D., Reinhorn, A. M., Shinozuka, M., Tierney, K., Wallace, W. A., and von Winterfeldt, D., 2003, “A Framework to Quantitatively Assess and Enhance the Seismic Resilience of Communities,” Earthquake Spectra, 19(4), pp. 733–752.
64. Zobel, C. W., and Khansa, L., 2014, “Characterizing Multi-Event Disaster Resilience,” Comput. Oper. Res., 42, pp. 83–94.
65. Renschler, C. S., Frazier, A., Arendt, L., Cimellaro, G. P., Reinhorn, A. M., and Bruneau, M., 2010, “A Framework for Defining and Measuring Resilience at the Community Scale: The PEOPLES Resilience Framework,” MCEER, Buffalo, NY, p. GCR10-930. https://www.researchgate.net/profile/Amy_Frazier/publication/284507306_Framework_for_defining_and_measuring_resilience_at_the_community_scale_The_PEOPLES_resilience_framework/links/565e082408ae1ef92983a0ea.pdf
66. Wang, Z., and Wang, P., 2014, “A Maximum Confidence Enhancement Based Sequential Sampling Scheme for Simulation-Based Design,” ASME J. Mech. Des., 136(2), p. 021006.
67. Li, Y., and Lence, B. J., 2007, “Estimating Resilience for Water Resources Systems,” Water Resour. Res., 43(7), p. W07422.
68. Attoh-Okine, N. O., Cooper, A. T., and Mensah, S. A., 2009, “Formulation of Resilience Index of Urban Infrastructure Using Belief Functions,” IEEE Syst. J., 3(2), pp. 147–153.
69. Chang, S. E., and Shinozuka, M., 2004, “Measuring Improvements in the Disaster Resilience of Communities,” Earthquake Spectra, 20(3), pp. 739–755.
70. Han, S. Y., Marais, K., and DeLaurentis, D., 2012, “Evaluating System of Systems Resilience Using Interdependency Analysis,” 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Seoul, Korea, Oct. 14–17, pp. 1251–1256.
71. Pant, R., Barker, K., Ramirez-Marquez, J. E., and Rocco, C. M., 2014, “Stochastic Measures of Resilience and Their Application to Container Terminals,” Comput. Ind. Eng., 70, pp. 183–194.
72. Franchin, P., and Cavalieri, F., 2015, “Probabilistic Assessment of Civil Infrastructure Resilience to Earthquakes,” Comput.-Aided Civ. Infrastruct. Eng., 30(7), pp. 583–600.
73. Rahimi, M., and Madni, A. M., 2014, “Toward a Resilience Framework for Sustainable Engineered Systems,” Procedia Comput. Sci., 28, pp. 809–817.
74. Hollnagel, E., 2011, “RAG-The Resilience Analysis Grid,” Resilience Engineering in Practice: A Guidebook, Ashgate, Farnham, UK, pp. 275–295.
75. Mehrpouyan, H., Haley, B., Dong, A., Tumer, I. Y., and Hoyle, C., 2015, “Resiliency Analysis for Complex Engineered System Design,” Artif. Intell. Eng. Des., Anal. Manuf., 29(1), pp. 93–108.
76. Cai, B., Liu, Y., Zhang, Y., Fan, Q., Liu, Z., and Tian, X., 2013, “A Dynamic Bayesian Networks Modeling of Human Factors on Offshore Blowouts,” J. Loss Prev. Process Ind., 26(4), pp. 639–649.
77. Hu, J., Zhang, L., Ma, L., and Liang, W., 2011, “An Integrated Safety Prognosis Model for Complex System Based on Dynamic Bayesian Network and Ant Colony Algorithm,” Expert Syst. Appl., 38(3), pp. 1431–1446.
78. Murphy, K. P., 2002, “Dynamic Bayesian Networks: Representation, Inference and Learning,” Doctoral dissertation, University of California, Berkeley, CA. http://www.cs.ubc.ca/~murphyk/Thesis/thesis.pdf
79. Sievers, M., and Madni, A. M., 2016, “Agent-Based Flexible Design Contracts for Resilient Spacecraft Swarms,” AIAA Paper No. 2016-0476.
80. Farid, A. M., 2015, “Designing Multi-Agent Systems for Resilient Engineering Systems,” Industrial Applications of Holonic and Multi-Agent Systems, Springer, Berlin, pp. 3–8.
81. Youn, B. D., and Wang, P., 2008, “Bayesian Reliability-Based Design Optimization Using Eigenvector Dimension Reduction (EDR) Method,” Struct. Multidiscip. Optim., 36(2), pp. 107–123.
82. Du, X., and Chen, W., 2004, “Sequential Optimization and Reliability Assessment Method for Efficient Probabilistic Design,” ASME J. Mech. Des., 126(2), pp. 225–233.
83. Tu, J., Choi, K. K., and Park, Y. H., 1999, “A New Study on Reliability-Based Design Optimization,” ASME J. Mech. Des., 121(4), pp. 557–564.
84. Liang, J., Mourelatos, Z. P., and Tu, J., 2004, “A Single-Loop Method for Reliability-Based Design Optimization,” ASME Paper No. DETC2004-57255.
85. Youn, B. D., Xi, Z., and Wang, P., 2008, “Eigenvector Dimension Reduction (EDR) Method for Sensitivity-Free Probability Analysis,” Struct. Multidiscip. Optim., 37(1), pp. 13–28.
86. Youn, B. D., and Wang, P., 2009, “Complementary Intersection Method for System Reliability Analysis,” ASME J. Mech. Des., 131(4), p. 041004.
87. Wang, Z., and Wang, P., 2012, “A Nested Extreme Response Surface Approach for Time-Dependent Reliability-Based Design Optimization,” ASME J. Mech. Des., 134(12), p. 121007.
88. Wang, Z., and Wang, P., 2016, “Accelerated Failure Identification Sampling for Probability Analysis of Rare Events,” Struct. Multidiscip. Optim., 54(1), pp. 137–149.
89. Xu, H., and Rahman, S., 2004, “A Generalized Dimension-Reduction Method for Multidimensional Integration in Stochastic Mechanics,” Int. J. Numer. Methods Eng., 61(12), pp. 1992–2019.
90. Pecht, M., 2008, Prognostics and Health Management of Electronics, Wiley, Hoboken, NJ.
91. Bai, G., Wang, P., and Hu, C., 2015, “A Self-Cognizant Dynamic System Approach for Prognostics and Health Management,” J. Power Sources, 278, pp. 163–174.
92. Wang, P., Youn, B. D., and Hu, C., 2012, “A Generic Probabilistic Framework for Structural Health Prognostics and Uncertainty Management,” Mech. Syst. Signal Process., 28, pp. 622–637.
93. Hu, C., Youn, B. D., Wang, P., and Yoon, J. T., 2012, “Ensemble of Data-Driven Prognostic Algorithms for Robust Prediction of Remaining Useful Life,” Reliab. Eng. Syst. Saf., 103, pp. 120–135.
94. Wang, P., Tamilselvan, P., and Hu, C., 2014, “Health Diagnostics Using Multi-Attribute Classification Fusion,” Eng. Appl. Artif. Intell., 32, pp. 192–202.
95. Madni, A. M., and Jackson, S., 2009, “Towards a Conceptual Framework for Resilience Engineering,” IEEE Syst. J., 3(2), pp. 181–191.
96. Hager, M. D., Greil, P., Leyens, C., van der Zwaag, S., and Schubert, U. S., 2010, “Self-Healing Materials,” Adv. Mater., 22(47), pp. 5424–5430.
97. Toohey, K. S., Sottos, N. R., Lewis, J. A., Moore, J. S., and White, S. R., 2007, “Self-Healing Materials With Microvascular Networks,” Nat. Mater., 6(8), pp. 581–585.
98. Sitterle, V. B., Freeman, D. F., Goerger, S. R., and Ender, T. R., 2015, “Systems Engineering Resiliency: Guiding Tradespace Exploration Within an Engineered Resilient Systems Context,” Procedia Comput. Sci., 44, pp. 649–658.
99. Spero, E., Avera, M. P., Valdez, P. E., and Goerger, S. R., 2014, “Tradespace Exploration for the Engineering of Resilient Systems,” Procedia Comput. Sci., 28, pp. 591–600.
100. Spero, E., Bloebaum, C. L., German, B. J., Pyster, A., and Ross, A. M., 2014, “A Research Agenda for Tradespace Exploration and Analysis of Engineered Resilient Systems,” Procedia Comput. Sci., 28, pp. 763–772.