A Strategy to Improve Infrastructure Survivability via Prioritizing Critical Nodes Protection

From an engineering point of view, the survivability of a system is defined as its ability to continue to operate despite a natural or human-made disturbance; for example a serious mechanical fault, a human error, or a malicious cyber or physical attack. In the context of critical infrastructures, due to their relevance for the public wellness, it is mandatory to improve the robustness of such systems in order to ensure the availability of essential services such as the distribution of water, gas and electrical power. Nowadays, due to the increasing number of cyber incidents, the definition of protection strategies, able to improve the survivability level of this infrastructure, is at the heart of the scientific debate. In this chapter we propose a procedure based on three steps aimed at improving infrastructure survivability. In the first stage we propose some approaches to identify the criticality degree of each subsystem composing the infrastructure, in the second stage we propose a method to aggregate multiple criticality evaluations performed by subject matter experts by providing a unique holistic indicator. Finally, on the basis of such indicator, we propose a protection strategy to improve the robustness of the entire system.


Introduction
The physical and cyber protection of critical infrastructures (CIs) is crucial to ensure the availability of multiple essential services. Concerning the physical security aspects, critical infrastructures are, in most cases, complex and geographically distributed systems hence hard to protect. Regardless of the specific scenario, a CI can be represented as a set of sub-systems able to interact and cooperate in order to provide services that are essential for the economy, society and public wellness. For example, in gas distribution systems, the cooperation of metering and regulation stations is fundamental to guarantee the proper functioning of the entire infrastructure. In power grids and water distribution infrastructures, the availability of electrical power and water, depends respectively on the joint action of singular sub-systems such as bus or water supply stations. Analogously, the correct operation of a plant depends on the right operativeness of several elements as illustrated by the 4STER European project.
Critical infrastructure are characterized by a high level of interconnection and interdependency where the operation of a subsystem is essential for the functioning of others. In such a context, the disruption of a subsystem can easily escalate creating waterfall effect impacting multiple services and geographic areas. Therefore, in order to guarantee the functioning of the entire infrastructure it is necessary to protect adequately each sub-system from fault or exogenous events potentially capable of compromising normal operativity levels. As reported in [1], on the 28th September 2003, in Italy and some areas of Switzerland, about 56 million people lost power due to a storm-tossed tree branch that hit Swiss power lines. About 30,000 people remained trapped in trains, several hundred passengers were stranded on underground transit systems, and there were significant knock-on effects across other critical infrastructures. Similarly, the 2005 Hurricane Katrina [2] caused widespread power outages throughout Louisiana, Mississippi, Alabama, Florida, Kentucky and Tennessee due to the cascading effects initiated by a local event. Another example is the 2011 Great East Japan earthquake [3] and the resulting tsunami: 1.5 million households did not have access to their water supply, 4.4 million households were left without electricity, and all the local railway services were halted, and communications were suspended.
Domino effects over the entire infrastructure due to local fault are not caused only by accidental faults or natural disasters, but could also be intentionally caused by malicious actors. For example, with the increasing reliance of CI on Information & Communication Technology (ICT) malicious actors can perform attacks via cyberspace triggering service disruptions significant economic losses and even kinetic effects. This has been particular concerning in relation to the energetic sector with a significant increase of cyber threats capable of causing outages and blackout in power systems.
The first example of how a cyber attack can affect the operativity of CI causing mechanical damage was provided by the Aurora project [4]. This was a test performed by the Idaho National in which the simulation of a cyberattack led to the destruction of a 27-ton generator. Another Significant example is represented by the Stuxnet worm. The worm was able to modify the rotation speed of particular motors installed inside the centrifuges used for the uranium enrichment in plant in Iran. Similarly, recent blackouts in Ukraine in 2015 and 2016 were respectively caused by Blackenergy3 and CrashOverride, two malware specially designed to cause blackouts via cyber intrusion [5].
In addition, we have to consider impacts on workers' safety. Power plants, water plants, gas plants can provoke accidents and enormous damages. Seveso plants can be used for the storage of hazardous materials: an attack aimed at these plants can also cause a domino effect. The capability to adjust machine parameters in order to improve performance or simply in order to change behavior can make other people with criminal intent adjust parameters so that workers and others can be put at risk of harm. Example of parameters can be speeds, forces, torques that can be put at dangerous levels. In addition, graphical interfaces used for human-machine interaction can be altered so that people could see a situation not corresponding to reality (not reported error codes or messages, different values of parameters or measures). In order to identify hazards associated to the use of a machine or a set of machines, procedures like HAZOP, HAZID, accident reviews must be taken into consideration. Anyway, security and safety must be considered as part of the normal working processes and not always this happens.
The main common aspect about these cited events is that a local event is able to compromise the functionality of the entire plants due to a domino effect. The identification of the most critical sub-systems is a crucial point for the definition of effective protection strategies able to improve the survivability of the systems. To this end, it is fundamental to identify adequate metrics and indicators to quantify the criticality rate associated to each sub-system, especially in highly heterogeneous contexts.

Related works
From the literature, one of the typical strategies to obtain such metrics is to simulate the effects of negative events, such as local faults, in order to provide insights on the most critical elements, for which protection needs to be raised. In particular, a well-established approach is to focus on intentional attacks, considering a rational attacker that aims at maximizing the damage while keeping low the effort required for his/her malicious action. Starting from the seminal works of Arulsevan et al. [6] it has become paramount that attacks that take into account the topology of the infrastructure, can select more effectively the target sites, increasing the damage dealt (e.g., in terms of disconnection of large portions of the infrastructure by causing services interruption). In [7][8][9][10] multiple approaches for the identification of critical nodes in infrastructure networks are presented. All these methods consists in optimization problems able to discover the nodes whose removal from the network compromise the connectivity of the entire system. All these approaches requires initial assumptions about the attacker budget and preferences despite this information are not available in general in a real context. Moreover, the results of these approaches are able to highlight the most critical node in a network but not provide a metric capable of quantifying the degree of criticality for each node of the infrastructure. In more details, the approach presented in [7] proposes a method, able to identify the most critical nodes, based on the result of an optimization problem characterized by the presence of assumptions about the strategy of an attacker in terms of available budget and dimension of disconnected components. Similar assumptions are considered also in the approach presented in [8,9], the authors propose a method which aims at minimizing the attack cost against the infrastructure with constraints about the features of the network. Finally, assumptions about the attacker preferences are also required in the formulation presented in [10]. In general, centrality measures, such as the node degree or betweenness centrality are often adopted as criticality measures, while in [11] the authors propose a critical index for the elements of a CI by analyzing the solutions of a multi objective optimization problem without any assumption about the attacker behaviour. However, the adoption of a unique metric or indicator about the criticality rate of each node of the system is quite unrealistic due to the complex nature of the infrastructures. Two approaches able to consider multiple metrics with the aim to compute a final aggregated criticality holistic indicator are presented in [12,13]. The proposed approaches take into account multiple indicators based on multiple data source (topology data, field-related data, expert evaluations, etc.) but not provide a final step necessary to define a defensive strategy and evaluate its effectiveness.

Contribution and outline of the chapter
In this chapter we want to propose a procedure able to define a defensive strategy for CIs based on multiple node criticality measures. In more details, the procedure is based on three steps, as depicted in Figure 1: In the first stage (Section 2) we provide some specific criticality measure for CIs based on the connectivity of the system. The identification of the criticality measures is a fundamental stage in the defensive strategy definition process. In literature, graph centrality measures are often adopted as criticality measures for infrastructure but these approaches (e.g. Node degree or node betweenness) are quite ineffective as proved in [11]. In the second stage (Section 3) a methodology to merge multiple criticality metrics, based on the well-known Analytic Hierarchy Process [14], is described in order to overcome the limit about the application of a single metric in a complex environment. Moreover, such methodology allows considering also the criticality evaluations given for a subset of infrastructure nodes. The definition of the defensive strategy is provided in the last step (Section 4) and its effectiveness is proved by analyzing the global robustness of the network with respect to multiple robustness evaluation methods. Finally, in (Section 5), the application of the three-step procedure is illustrated with respect to the case study network with the aim of proving the effectiveness of the proposed strategy.

Notation
Let us denote by |X| the cardinality of a set X; moreover, we represent vectors via boldface letters, and we use k m to indicate a vector in  m whose components are all equal to k, while by I n we identify the n Â n identity matrix. Finally, we denote the sign of x ∈  by sign x ð Þ and by sign X ð Þ the entry-wise sign of a matrix X. Let Figure 2). The adjacency matrix of a graph G is starting at a node v i ∈ V and ending at a node is a subset of links in E that connects v i and v j without creating loops. An undirected graph G ¼ V, E f gis connected if each node can be reached by each other node by means of the links in E.
For the sake of clarity, we report here the notation adopted in the rest of the chapter.
Normalized pairwise connectivity for a graph with adjacency matrix A and without considering nodes v i s:t:

Node criticality metrics based on network connectivity
As mentioned above the first step of the proposed approach for the identification of a defensive strategy is the identification of metrics of interests able to evaluate the network criticalities from multiple points of view. Despite in literature this process is often reduced to a simple centrality measure computation, in this section we propose two other applicable approaches, based on the infrastructure connectivity, to compute the criticality of each sub-systems of a CI. For the sake of clarity, in this context we represent the entire infrastructure via undirected graph An edge connects two nodes if a real physical connection exists between the two corresponding sub-systems.
Both the approaches for the critical node identification, presented in this section are based on the concept of connectivity. In our models, when a node is attacked and is unable to operate, we remove the node and the incident edges from the graph. The deletion of particular critical nodes could compromise the connectivity of the other elements of the network. Notice that, for each node v i we consider a removal cost c i > 0. With the aim to measure the degree of connectivity of the graph G, we adopt the Pairwise Connectivity (PWC), it is an index that captures the overall degree of connectivity of a graph on the basis of the couples of nodes connected by means of edges in G. where is connected via a path in G, and is zero otherwise. Noting that the maximum number of couples of nodes in a graph with n nodes is n nÀ1 ð Þ 2 , the normalized pairwise connectivity (NPWC) is defined as: Remark 1 NPWC G ð Þ is a measure of connectivity of the graph G, in fact, it is easy to note that G connected⇔NPWC G ð Þ ¼ 1: When NPWC G ð Þ< 1, the graph is not connected, but the larger NPWC G ð Þ is, the more G is "close" to be a connected graph.
□ We now provide a more descriptive definition of a NPWC by taking into account a subset of attacked nodes. Let A be the adjacency matrix of an undirected graph G ¼ V, E f gand let x ∈  n be a column vector whose entries x i = 0 if the i-th node has been removed due to an attack or a fault and x i = 1 otherwise, we define the connectivity as: whereÂ ij ¼ A ij x i x j , 1 n is a column vector composed by n entries equal to 1.

A critical index based on optimization problem
The definition of the Critical Index χ i for a node v i , come directly from the solutions of a multi-objective problem defined by assuming the point of view of a malicious attacker.
In Eq. (5) the behavior of an attacker is defined as a multi-objective optimization problem characterized by two conflicting objectives: the reduction of the connectivity in terms of NPWC and the simultaneous minimization of the required attack effort in terms of removal cost. We reiterate that if an attacker want to disconnect a node v i from the graph then (s)he must pay a cost c i .
Problem 1 where x represents the vector of decision variables, whose entries x i are equal to 0 if the node v i is involved in the attack, 1 otherwise and where and where c ¼ c 1 … c n ½ T is the vector whose entries represent the cost necessary to remove each node from the graph.
As described in [11], in general, a multi-objective problem is characterized by the presence of multiple optimal solutions x j ð Þ collected in the Pareto front set P.  Figure 3. In [11], the Critical Index χ i is defined as in Eq. (8): where |P| represents the number of solutions in the Pareto front. In other words it is defined as the ratio between the frequency with which a node v i is involved in the attacks listed in the Pareto front and its cardinality. If the critical index χ i is close to 0 this implies that the node is rarely involved in attack plans, instead, the closer it is to 1, more frequently the node is involved in optimal attack strategies.

A critical index based on a cooperative game
An alternative approach for the identification of the most critical nodes in a network is presented in [15]. Analogously to the critical index based on the results of the multi-objective optimization problem, the proposed method is based on the concept of NPWC. Differently from the previous critical index, this measure come from the game theory and is based on the solution of a cooperative game.
A cooperative game, sometimes called a value game or a profit game, is a competition among groups of players. Formally, a cooperative game is defined by a set of players P and a characteristic function v : 2 N )  þ which associate to all possible coalitions of players a utility rate. The function describes how much collective payoff a set of players can gain by forming a coalition. Let P be the set of players, and g : 2 P )  þ a function that satisfies the following properties: The cooperative game Γ P, g ð Þ is defined by the couple P, g ð Þ where the elements of P are the players of the game and the characteristic function of the game g S ð Þ estimates the utility of each coalition S ∈ 2 P .
Cooperative games can be solved via multiple approaches, the Shapley value [16] is one of the possible concepts of solution. The Shapley value assigns to each player i ∈ P, a reward ϕ i . The larger is the contribution given by i in all the possible coalitions of players, based on the function g, the larger is the reward ϕ i for the player i.
The Shapley value is a column vector Φ whose entries are ϕ i are defined according to Eq. (9).
With the aim to adopt these concepts to provide a critical index able to quantify the criticality of each node of the network, a cooperative game Γ N, nPWC ð Þis defined. The set of players is represented by the set of nodes N while the characteristic function g is NPWC (Eq. (3)). Notice that, in [15], it is demonstrated that the NPWC satisfy the two fundamental properties of a characteristic function.
The solution of the proposed game will assign a reward to each node in V proportional to its contribution to the connectivity expressed in terms of NPWC, hence the Shapley value can be considered a valid node criticality metric.

A multi-criteria vulnerability detection index
As briefly introduced in Section 1, a research of the most critical nodes based on a single metric is practically worthless and extremely simplistic. In this section we propose an approach able to provide a holistic indicator able to take into account multiple criticality evaluations based on multiple metrics also in presence of incomplete data. The proposed method is based on the well-known Analytic Hierarchy Process (AHP) introduced by Saaty [17]. For a given set of m alternatives, relative utility ratios r i =r j are defined by experts. Such a setting is typical in contexts involving human decision-makers, which are usually more comfortable providing relative comparisons among the utilities of the different alternatives (e.g., "Alternative i is twice better than alternative j"), rather than directly assessing an absolute utility value of each alternative (i..e, "The value of alternative i is ..."). The AHP is a procedure able to estimate the absolute utilities r i starting from the given utility ratios r i =r j . See [17] for additional notions about the AHP.
We now suppose to have m different metrics M 1 … M m . According to these metrics, the entries of the column vectors r 1 ð Þ … r m ð Þ represents the criticality rate of each node of the graph. Notice that the method is applicable also if for some metrics the criticality ratio of some node is not available [12]. Finally, let w 1 … w m be positive weights defined by subject-matter experts (SMEs) representing the relevance of each metric. The larger is the weight associated to the i-th metric, the larger the influence of such metric in the final holistic indicator. Such weights can be obtained also resolving AHP on the basis of pair-wise comparisons between the different metrics.
For each metric we define the n Â n matrix R i ð Þ whose entries are defined as follows: In other words, the matrix R i ð Þ collects the relative utility ratios between the a-th and b-th nodes according to the i-th metric if both the evaluation are available. Notice that some ratio r to this reason, we treat zero-valued entries as not available data.
By considering the matrices R i ð Þ , we aim at finding the aggregated holistic indicator r * ∈  n that solves the following problem.
Problem 2 Find r * ∈  n that solves The holistic indicator r * is a new node criticality measure that represents a compromise between the m initial metrics M 1 … M m by taking into account the SMEs preferences w i . In other words, Problem 2 aims at finding the criticality indicator r * a , assigned to the a-th node, such that the ratios r * a =r * b minimize the deviation from the ratios R i ð Þ for the m considered metrics.

Defensive strategy definition and evaluation
In this section we propose a methodology to define a defensive strategy able to improve the survivability of the network with a focus on the connectivity maintenance with respect to nodes deletion. As introduced in Section 2, an attack cost c i is associated to each node v i . Our aim is the definition of a new distribution of the budget in order to minimize the loss of connectivity in case of malicious attacks.
Let B ¼ P n i¼1 c i the defensive budget computed on the basis of the initial removal costs. We propose a new allocation of the budget by defining the removal cost proportionally to the holistic indicator r * described in Section 3. Hence, we define the new removal costsĉ i as follows:ĉ It is now necessary evaluate the robustness of a network with a particular defensive strategy. As introduced in Section 2.1, due to its multi-objective nature, Problem 1 is characterized by the presence of multiple optimal solutions collected in the Pareto front P. Each optimal solution x j ð Þ is associated to a couple of values: a particular connectivity value f 1 x j ð Þ À Á and an attack cost f 2 x j ð Þ À Á , where f 1 and f 2 represent the two objective function of Problem 1.
In [11], the global robustness index Q is defined as the area under the polygonal chain connecting the points ( ) in the Pareto front using trapezoidal rule for numerical integration.
As depicted in Figure 3, Q is a measure of the overall robustness of the network. In fact, the larger is the area, the higher is the value of the objectives associated to the solutions in the Pareto front; hence, high values of the global robustness index correspond to networks where the attacker is not able to deal large damage, or deals large damage only for large effort.

Case study
In this section we prove the effectiveness of the proposed three-stage methodology able to improve the network survivability via critical nodes protection. The proposed strategy is tested on the CI represented by the network depicted in Figure 4. Notice that the case study is based on a network that does not represents a real infrastructure. The network is composed by n ¼ 15 nodes and e ¼ 35 edges. As discussed in Section 2, the first step of the methodology is devoted to the identification of criticality measures able to take into account the effects about the disconnection of a node from the graph by evaluating the loss of connectivity of the entire infrastructure. Notice that, the removal costs c i are set to 1 for each node of the infrastructure.
The first columns in Table 1 collect the metrics defined by Eqs. 5 and 6 respectively. Concerning the distribution of the critical indices χ i , the largest value are associated to the nodes 10 and 3. Notice that, the deletion of such nodes divides the nodes in two partitions, hence it strongly compromises the connectivity of the network in terms of nPWC.
Similar results are obtained by considering the computation of the Shapley value in order to solve the cooperative game as described in Section 2.2. We remark that this approach assigns a reward to each node of the network according to their contribution to the connectivity of the entire network by considering all the possible partitions of nodes. Notice that, the results computed via Shapley not consider the removal cost c i while the results of Problem 1 take into account also this aspect, moreover, in this case study all the removal costs c i are set to 1.
Finally, the fourth and fifth columns of Table 1 collect the node degree and the betweenness centrality [18] for each node in the graph. In the last column of Table 1, we show the criticality rate for each node according to the new holistic indicator computed as in Eq. (7) considering m=4 metrics (i.e. the critical index, the Shapley Value, the node Degree and the Betweenness centrality). According to the procedure defined in Section 3, we have set the metric relevance as follows: w 1 ¼ 0:3, w 2 ¼ 0:3, w 3 ¼ 0:2, and w 4 ¼ 0:2 in order to emphasize the criticality metrics based on the concept of PWC.
The nodes color in Figure 4 depends on the aggregated criticality values, according to the colormap. On the basis of this new indicator, the node 10 is the most critical node of the graph, in fact the deletion of this node strongly compromise the connectivity of the network and the creation of two disconnected partitions. Due to the same reason, a high criticality rate is also assigned to the nodes 4 and 3. Despite the node 14 is not essential for the connectivity, this node is characterized by a high node degree, in fact it is considered, according to the holistic indicator, as the fourth most critical node in the network.
Starting from the results obtained by computing the holistic indicator r * , we adopt a defensive strategy by defining a new attack costĉ i , for each node, proportional to its holistic criticality rate as defined in Eq. (8). Notice that the defensive budget B ¼ P n i¼1 c i ¼ 15. The effectiveness of the proposed defensive strategy is proved by considering the global robustness index Q , we remark that it came from the solution of Problem 1 and it is defined as the area under the Pareto front. As depicted in Figure 5, the new allocation of the defensive budget B is very effective to contrast an attacker especially with limited budget. In more details, in case of uniform defensive strategy (i.e. all the attack costs set to 1) the area under the Pareto front is equal to Q ¼ 0:1229, while the new budget allocation (Eq. (8)) based on the holistic indicator r * improves the network robustness by increasing the area to Q ¼ 0:1591.

Conclusions
In this chapter we provide a methodology for the definition of a defensive strategy via prioritizing the critical nodes of the network. Due to the complexity of a CI, the adoption of a unique metric for the identification of the node criticality is simplistic, to this end we propose a strategy, based on the AHP, able to merge multiple metrics which take into different aspects of the network. Moreover, the proposed aggregation procedure is applicable also in case of incomplete data. Among the multiple metrics applicable in the merging process, in this chapter we propose two metrics characterized by a focus on the network connectivity. In the one hand the critical index is computed on the basis of a multi objective optimization problem. Assuming an attacker perspective and knowing the topology of the network, the problems aims at identifying the nodes whose removal compromise the connectivity of the entire system. On the other hand, we propose the adoption of the Shapley value as a criticality evaluation by defining a cooperative game among the nodes of the network. Finally, we propose the definition of a defensive strategy that assigns to each node a removal cost proportional to the holistic indicator. Future improvement will be devoted to the inclusion of a final check able to include a final validation based on expert opinions. One of the possible validity check is based on the well-known face validity approach [19], it refers to the transparency or relevance of a test as it appears to test participants.