Ranks of nodes in the network given in Figure 3 by Wks.
The knowledge of node’s ability and importance in spreading information in a complex network is important for developing efficient methods either to decelerate spreading in the case of diseases or to accelerate spreading in the case of information flow, which would benefit the whole population. Some systems are highly affected by a small fraction of influential nodes. Number of fast and efficient spreaders in a network is much less compared to the number of ordinary members. Information about the influential spreaders is significant in the planning for the control of propagation of critical pieces of information in a social or information network. Identifying important members who act as the fastest and efficient spreaders is the focal theme of a large number of research papers. Researchers have identified approximately 10 different methods for this purpose. Degree centrality, closeness centrality, betweenness centrality, k‐core decomposition, mixed degree decomposition, improved k‐shell decomposition, etc., are some of these methods. In this expository article, we review all previous works done in the field of identifying potential spreaders in a network.
- social networks
- information diffusion
- node centrality
- k‐shell decomposition
- improved k‐shell decomposition
- weighted k‐shell decomposition
- directed networks
- degree centrality
- closeness centrality
- betweenness centrality
Identifying and analyzing various kinds of network have become an important theme in the frontiers of research for the past 50 years. It is an emerging area which demands research activities of interdisciplinary and collaborative nature. Network research spreads over a variety of fields such as Mathematics, Physics, Chemistry, Computer Science, Biology and Social Sciences. Recently, network research has proved its importance by establishing itself as a new domain of research. It brings together and coordinates activities and research findings of researchers from other fields. Based on the nature and type of subject area and research, the entire field is divided into many subfields. Social networks, biological networks, chemical networks, physical networks, information networks, etc., are some of the most important ones.
Networks are being used for representing, analyzing and explaining complex systems. Complex systems have structural and behavioral characteristics. Both characteristics influence each other. Structural properties enable us to classify networks as random networks, scale‐free networks, small‐world networks and so on. Networks have been almost completely studied on the basis of its structure. Current scenario in the field is to establish relationship between the structural properties and the behavior of the network. When we speak of behavior, the behavior of the members and the behavior of the entire network come into picture. Behavior of the entire network is either the average of the behavior of the members or the totality of the behavior of the members.
An important area in the behavioral study on a network is the study of flow of information among the members. Information in this context of study is very general, which can be interpreted as spread of virus either in human contact networks [1, 2] or in a computer network [3, 4]. In some context, it is a spread of person‐specific information in ego networks , or rumor that spreads in social networks [6, 7]. Irrespective of the nature and content of the information, there are some important factors which greatly affect the whole process of information diffusion in a social network. Structure of the underlying network is one factor. It is a global property. Ego networks and neighborhood of each actor in a network are important properties, which are classified as local properties. Rumor spread in social networks has got special attention as a kind of information spread. Rumor is unverified and instrumentally relevant information, the origin of which is uncertain and usually spread by word of mouth. Gossip and urban legend are also forms of information which spread in a social network by the way of social contacts. Character of the actors who receive a piece of information and transmit to other is a third factor, which would also greatly affect the information diffusion process.
In this chapter, we concentrate on some aspects of network‐specific properties that influence transmission of information. Role of actors in a network varies according to their relative importance in the network and difference in their character. Characteristic difference and its effect are far from the scope of this chapter. Details of contacts in a network give us very fruitful information about the transmission process. For example, if the network is completely connected, a piece of information can be received by any other in a single step. In a disconnected network, the chance of not receiving the information by some actor cannot be completely removed. If the structure of the network is like a tree, average time taken for the information to reach the nodes depends on the place where it is generated. If the source of information has connection with all the remaining actors, the spread can end in one step. Networks can be modeled as graphs in Mathematics in which nodes represent interacting elements and edges represent interaction among the interacting elements.
Importance of nodes in the transmission of information is a matter of great concern in literature of information diffusion in networks [8, 9, 10]. As mentioned in the previous paragraph, some nodes that have large number of connections are highly important in the sense that any piece of information in their hands may spread faster than that in the hands of others. In other words, degree of nodes in a network is a measure of the importance of nodes. Centrality is a measure of a network that characterizes the importance of nodes. In other words, centrality measures are useful to analyze how “central” an individual node in a network. It is a function that assigns a real number value to each vertex in a network. Based on the value of this function, we can rank the nodes in the network. This measure in some context is used to study flow of something in a network. In some other context, it is a measure of cohesion among actors . A centrality measure which is meant for some purpose may not give the right conclusion in a different context. In the following section, we review some important centrality measures along with an analysis of its ability to rank nodes. Each centrality measure is developed either as a generalization of an existing measure or as a new method which rectifies the shortcoming of an existing methods. We also propose a new measure of centrality which considers the importance of all nodes and edges present in a network while calculating the function value of each node. This measure can uniquely rank each different node in a network in a different class. If two nodes are equally important in the network topology, they fall in the same class. Thus, this method is seems to be more powerful than all other methods.
Clear knowledge of powerful or influential nodes in a network is valuable for many reasons. Certain nodes play a key role in the propagation of information or in the spreading of disease. If we know the role of the nodes in dissemination of information, we can manage to control the speed of information transfer through the network. This technique has great applications in the control of disease and in blocking the diffusion of annoying information such as rumors, negative behaviors, spreading of virus and so on. It can also be used in the campaigning of government, political parties, NGOs, public agencies and even in advertisements.
In general, networks are dynamic. Dynamic networks change over time. In a dynamic network, some nodes which are very important in some point of time may lose its status at some later point of time. Some nodes which are initially inactive may become very important through the interactions with other members in the network. Centrality measures are classified as local metrics, global metrics and hybrid metrics. There are many centrality methods which can be used to rank spreaders such as degree, k‐shell, mixed degree decomposition, neighborhood coreness, extended neighborhood coreness and so on.
2. Comparison of important centrality measures
In this section, we discuss important centrality measures and relative importance. We also mention their drawbacks as evident from various contexts.
2.1. Degree centrality
In 1978, Freeman introduced the degree centrality . He asserted that the degree of a focal node is the number of adjacencies in a network, that is, the number of nodes that the focal node is connected to. This is a basic centrality, and it is often used as a first step in the study of a network. The degree measure can be formulated as:
where is the focal node and represents all other nodes. is the total number of nodes. is the adjacency matrix, in which , if node and are connected and 0, otherwise.
The degree of a node is defined as the number of other nodes connected directly to a given node. In degree decomposition, we rank the nodes of a network according to the descending order of degrees of nodes. Nodes with highest degree are ranked , nodes with next higher degree as rank and so on. This ranking method is known as degree decomposition. The complexity of this method is , where is the number of nodes. It is a local metric. This is the first centrality measure that appeared in literature.
As an illustration, a network is given in the Figure 1 and degree of all vertices is calculated. There are six vertices of degree one. These vertices are labeled as 1, 2, 3, 7, 9 and 15. The vertex labeled 14 has degree two. The vertices 5, 6 and 13 have degree three, and the vertices 4, 10, 11 and 12 have degree four. The vertex 8 has degree six. In the sense of degree centrality, the vertex 8 is the most powerful in the network and the vertices 4, 10, 11 and 12 occupy the next lower position. In this class, there are four members. A close look in the network will reveal the fact that the status of the vertices 11 and 12 is same. But 10 is more powerful than 11 and 12, because it is connected to the most powerful member in the network (vertex 8). But 4 in that group is much weaker than 10, 11 and 12. Clearly, this ranking scheme lacks accuracy.
In the case of weighted networks, degree centrality can be extended as sum of weights. This measure has been labeled as node strength and formulated as follows:
where is the weighted adjacency matrix, in which is greater than 0 if node is connected to node , and the value represents the weight of the link between and . This is equal to the usual degree centrality if the network is binary.
In both case, node strength is a rough measure, as it only takes into consideration a node’s total connections with other nodes in the network. Clearly, degree and strength are both indicators of the level of connections of a node to other nodes of the network. It is important to incorporate both these measures for studying the new centrality of a node.
In Ref. , a new centrality measure is proposed. It is the average of the number of nodes connected to a focal node and the strength of that node. It is calculated using the following formula.
In the formula, is a parameter that can set between 0 and 1. If is near 0, the degree is more important, and if it is near 1, then node strength is more important.
Directed networks add complexity to degree centrality as two additional aspects of a node’s involvement are to be incorporated in the calculations. The activity of a node can be quantified by the number of ties that originate from a node, denoted by , and the number of ties that are directed toward a node, denoted by . For a weighted network, and can be defined as the total weight attached to the outgoing and incoming ties, respectively. Taking into account the weights of the directed ties, the previous formula is modified in  to assess a node’s importance in a directed network:
The value of α in these equations is defined as above.
2.2. Closeness centrality
The closeness centrality was also proposed by Freeman . It is based on the length of the shortest paths among nodes in a network. In a binary network, the shortest path is found by minimizing the intermediary nodes, and its length is defined as the minimum number of ties linking the two nodes, directly or indirectly. We define the shortest distance between the vertices and as
Here stands for intermediary nodes on a path between nodes and . The basic assumption in shortest distance is that the intermediary nodes increase the cost of the interaction and the higher number of intermediary nodes increases the time taken for interaction between nodes. Again intermediary nodes are powerful third parties and can distort information or delay information passage between nodes. In unweighted networks, the shortest path of two nodes is through the smallest number of intermediary nodes.
Closeness centrality relies on the length of the paths from a node to all other nodes. Closeness centrality is defined as:
We know that diseases are more likely to be transferred from one person to another person if they have frequent interaction. Frequency of interaction can be quantified and used as weights of links. In weighted networks, distance between two nodes and is defined as
where is intermediary nodes and denotes its weights.
Here, closeness centrality is defined as
In Ref. , Opsahl et al. propose a new closeness centrality by introducing the tuning parameter α, in finding the least costly path. This ensures that the both the weights of the ties and the number of intermediary nodes are considered in the identification of length of the path. Thus, length of the path is defined as
In the formula, α is a tuning parameter. If α = 0, the proposed measure produces the same output as in the case of unweighted network. When α = 1, this method gives the same result as the method proposed by Freeman. A value α < 1 assigns the path with greatest number of intermediary nodes, the longest distance. Conversely, α > 1, the impact of additional intermediary nodes is relatively unimportant compared to the strength of the ties and paths with more intermediaries.
2.3. Betweenness centrality
Betweenness centrality is measured for each vertex in a network. It quantifies the number of times each node appeared as a bridge along the shortest path between any two other nodes. This measure was also proposed by Freeman in 1958 . Betweenness centrality for unweighted networks is defined as
where is the number of shortest paths between nodes and and is the number of shortest paths between nodes and and passing through node . This quantity is added over all pairs of vertices in the network. In the case of weighted networks, betweenness centrality is defined as
This can also be generalized to directed networks. The identification of the shortest path, and their lengths, in directed networks is similar to the process of undirected networks with an additional constraint. A path from one node to another node cannot be treated as connection in the reverse direction. Thus, a directed path is said to exist only if all edges in the path are directed in the same direction. As a result, the values of betweenness centrality calculated for a directed graph may significantly vary from that of the underlying undirected graph.
2.4. K‐core decomposition
In 2010, Garas et al. put forward a fast node ranking method called k‐shell decomposition  for large networks. The k‐shell or k‐core decomposition method partitions a network into substructures. This method assigns an index to each node, which is the rank of the node in the network, according to its importance. Nodes with high values of the are located at the center or core of the network, and nodes with low values of lie in the periphery of the network. This way, the network is described by a layered structure, exposing the hierarchy of its nodes.
Now, we present a calculation of k‐shell for nodes in the network of Figure 1. First, we removes recursively from the network all nodes of the network with degree 1 and we assign the integer value to them. This procedure is repeated iteratively until there are only nodes with degree in the network. Subsequently, we removes all nodes with degree . Again, this procedure is repeated iteratively until there are only nodes with degree left in the network and assign to them the integer value and so on. This procedure is applied until all nodes of the network have been assigned to one of the k‐shells. Thus, we partition all nodes of the network into different shells with an integer value. This is how the original k‐shell decomposition method works.
From the above network, we can assign to the vertices 1, 2, 3, 4, 7, 9, 14 and 15 in the first step. Then, we can assign to the vertices 5, 6 and 13. Finally, the vertices 8, 10, 11 and 12 get the value These vertices are the most important in the network according to the assumptions of k‐shell decomposition method. There are three categories of vertices. Each contains more than one vertex, which are of different nature. Hence, we arrive at the conclusion that this method is far from attaining the aim of the method. Computational complexity associated with the k‐shell method is . It is a global metric. As a means to overcome this inefficiency of the k‐shell decomposition method, a modified method called mixed degree decomposition method was proposed.
2.5. Mixed degree decomposition method
It was in 2013 that Zeng and Zhang proposed a modification of k‐shell method called the mixed degree decomposition (MDD) . The method is described as follows. The k‐shell method is a dynamical network decomposition procedure in which the residual degree (number of links connected to the remaining nodes) of nodes is updated in each step while all the information of the removed nodes is dropped. In mixed degree decomposition method, both residual degree and exhausted degree (number of links connected to the removed nodes) of the nodes are recorded and the decomposition is based on both of them. For node , exhausted degree is denoted by and residual degree is denoted by . In each iteration, the nodes are removed according to the mixed degree defined by
where is a tunable parameter between 0 and 1. The detailed decomposition is described below.
Initially, of each node is equal to 0, since there is no removed node in the network.
Remove all nodes with least degree, denoted by M and assign them to the M‐shell.
Update of all the remaining nodes by . Then, remove all nodes with smaller than or equal to M and assign them to the M‐shell also. This step is recursively carried on until of all remaining nodes are larger than M.
Repeat steps 2 and 3 as M value increases until all nodes in the network have been assigned to one of the shells.
When this MDD method coincides with the degree centrality method, and when this method coincides with the usual k‐shell decomposition. The MDD method is no longer integer since can be decimal when we take between 0 and 1. The following simple example (Figure 2) illustrates the procedure of MDD method.
If a virus originates from a node with large exhausted degree, not only it has the same probability as the other nodes in the same shell to infect the nodes in the higher shells, but also it has a bigger branch of nodes in the lower shells to infect, so that this virus will end up covering much more nodes at the end. Thus, the information of the exhausted degree cannot be overlooked when ranking the nodes. The frequency of appearance of different nodes in MDD method is higher than in k‐shell decomposition. The k‐shell has limited number of ranks, and the number of nodes in each rank is quite high, which means that the node differences are not well distinguished in k‐shell decomposition. In MDD method, the nodes more differently ranked than k‐shell decomposition. So this method is preferred over the k‐shell decomposition method. Liu et al. proposed an improvement on the k‐shell decomposition in the year 2013. It is discussed below.
2.6. Improved k‐shell decomposition method
Improved k‐shell decomposition method  is calculated in terms of the distance from a target node to the network core, the spreading influence of the nodes within the same k‐shell values could be distinguished. The formula for improved k‐shell decomposition is that where is the largest k‐core value of the network, is the shortest distance from the node to the node , and is the set of nodes whose k‐shell value is the maximum. Using this value, we can rank the nodes of the network. Calculated values of for the network in Figure 1 are as given below. Here, nodes of the network are divided into eight different shells using the new method.
In Ref. , Young Deng et al. proposed a refinement for the k‐shell decomposition for unweighted network.
Here, the weight of the edge connection between and is defined as , where and are the degrees of node and , respectively. Next, for each node , we calculate the weighted degree using the following measure
where is the set of neighboring nodes of , and is a positive tuning parameter between 0 and 1 that can be set according to the requirement and data. If this parameter is near 1, then the degree of nodes is taken as favorable. If , this method becomes the usual k‐shell decomposition. If is close to 0, then the high edge weight is taken as favorable. Since the weighted degree may be no longer an integer, the weighted degree is approximated to the nearest integer. After this, the pruning routine is same as the k‐shell decomposition method but is based on the weighted degree . Using this method, we can partition the node set of the network into more shells.
Consider the network given in Figure 3.
Using the usual k‐shell decomposition, the nodes are portioned as follows. The vertices 1, 2, 3, 4, 7, 9, 14 and 15 have Vertices 5, 6 and 13 have Vertices 8, 10, 11 and 12 have
By the weighted k‐shell decomposition (Wks) described above, the node partitions obtained are given in Table 1. Thus, the new method partitions set of nodes into seven shells, which indicate this method is more efficient (Table 1).
|Sl. no.||Value of||Vertex labels|
|2||2||1, 2, 3|
|4||4||4, 5, 6, 13|
An improved version of the weighted decomposition method was suggested by Reji Kumar et al. . The calculation of the rank of the nodes is based on the following procedure. The total of weight of the vertices are calculated to obtain the value . So Here n is the number of nodes. The right‐hand side is multiplied by q to make an integer. Next, the vertex 1 is deleted to obtain a new graph . Subsequently, we find Then, we delete the vertex 2 to obtain and find exactly as before. This procedure is repeated to obtain values of all vertices. Finally, we find the value for the vertex . The vertices are ranked on the basis of these values.
2.7. Decomposition using k‐shell iteration factor
Very recently, Wang and Zhao put forward a method for fast ranking of influential nodes  in complex networks. He also improves the k‐shell decomposition method using iteration factor. The method is as follows. Suppose node is a node in the network and its k‐shell value is k. In the k‐shell decomposition, the total iteration number is m and is removed in the nth iteration of the k‐shell decomposition, where . Let denotes the k‐shell iteration factor of node which is defined as follows;
The k‐shell iteration factor of each node can thus be obtained with this formula. For example, for the above unweighted figure for node 3, , n = 1 and m = 2; therefore, . For node 4, n = 2, and m = 2, and . Here, nodes 3 and 4 are in the same k‐shell, and this method distinguishes them with different value. Thus, if we rank the nodes of the network with the calculated value, this method will give a refinement of k‐shell decomposition.
2.8. Weighted k‐shell decomposition
One major limitation of most centrality measures, including k‐core decomposition method, is that they are designed for unweighted networks. However, in practice, real‐world networks are weighted, and their weights carry very valuable information about the strength or pace of flow of information through the edges. This information is highly critical in the study of transmission of information. In a weighted network, every node has two underlying attributes, their degree and weight of the edges incident with that node. Since weights are associated with networks links, the nodes weights are calculated as the sum over all link weights incident with a particular node. It may happen in network that a node with high degree can have small weight and vice versa. There are situations in which weights play important role. In such networks weights are related to some measured property. For example, in trade flow, capital flow, etc., nodes with high weight can usually be an important player. Thus, in such systems, the presence of nodes with high degree and relatively low weights may vary from the results obtained by methods that are based only on the degree of nodes. Under this approach, one completely neglects the weights and performs the analysis on unweighted network. In the second approach, consider only links and their weights and neglects degree.
Garas et al. gave the following method for weighted networks , called weighted k‐shell decomposition method (Wk‐shell). This method applies the same pruning method as in k‐shell decomposition. Here, we consider both degree of a node and weights of its links. We also assign for each node a weighted degree k′. The weighted degree of a node is defined as
where is the degree of the node , and is the sum of all link weights. and are two parameters. If , the weight and degree equally get equal consideration in the calculation. If , then the above equation becomes
In the case of unweighted networks, where the weighted degree is equivalent to the usual node degree and we use the same usual k‐shell decomposition method. In general, the weighted method is able to split further the cores obtained by the unweighted method and help us find the most central of the central nodes. Thus, it is reasonable to assume that the weighted k‐shell partitioning method provides us with a more accurate node ranking for representing the node’s spreading power.
In a recent article, Reji Kumar et al.  proposed a method in which degree and weights of all vertices and edges get relative representation in a network. This method is named as m‐ranking method. This method is discussed in the subsequent section.
2.9. The m‐ranking of nodes
Total power is calculated for each node in the m‐ranking method. Total power of a node is defined by the formula,
In the formula, is a parameter between 0 and 1 and is another parameter. It is good practice to choose an integer close to the average degree of the nodes of the graph. The first series contains at most terms, and the second series contains at most terms, where is the diameter of the graph. Here, is the degree of the node and is the sum of the degrees of the nodes of the graph at a distance away from node . Similarly, is the sum of the weights of the incident edges of the node and is the sum of the weights of all edges which are steps away from the node . Since we are considering degree of all nodes and weights of all edges, usually the total power of all nodes will be different except for vertices which are same with respect to an isomorphism. When value is very large, this method tends to usual degree centrality. We rank the nodes in the descending order of total power . If we put , this method can be applied to unweighted networks. In Ref. , the authors have verified the reliability of this method using rank correlation method.
PageRank  is a method for rating web pages effectively by measuring the human interest and attention devoted to them. In the network of web pages, a webpage is a node and a hyperlink is a directed link in the network. It is the ranking method used by Google. In PageRank, a hyperlink is understood as an endorsement relationship. In this method, number and quality of links to a page are used to determine a rough estimate of how influential the web page is. This method was first introduced in 1976 by Gabriel Pinski and Francis Narin. The PageRank is defined as a function
where is the page rank of and is the page rank of which is linked to . is the number of outbound links on page , and is a parameter between 0 and 1.
A large variety of node centrality measures are being used by network scientists to identify potential spreaders of information in a network. Degree centrality, betweenness centrality, closeness centrality, k‐core decomposition, mixed degree decomposition, improved k‐shell decomposition, weighted k‐shell decomposition, page ranking and m‐ranking method, etc., are important methods, which are discussed in this paper. These methods help us to rank the members in a network according to their importance. A method, which is suitable in some context, may not fit in another situation. Each method has its own strengths and weakness. The weakness of the methods motivates social scientists to redefine or modify the existing methods, which in turn leads to the development of new methods to rank the nodes in a better way. All the methods, except m‐ranking method, have been criticized about its inefficiency in uniquely ranking the members. In addition, some methods may give higher rank to less important nodes, while most important nodes being underestimated. It has been claimed that m‐ranking method is far better than all other methods, which are described above.
This research was completed with the financial assistance of University Grants Commission, India. First author is grateful to University Grants Commission for sanctioning a major research project titled, “Modeling of social ties; a modified approach”, No. F. No. 40‐243/2011 (SR).