Controllable Subspace as a New Characterization of Influence of Nodes in Complex Networks

In this chapter, we investigate the influence of a node on a network. By virtue of the classical control theory, the influence of a node is represented by the controllable sub-space, which is further transformed into a specified graph named general-cacti. An algorithm is developed to calculate the influence, which contains the searching of different kinds of circles and the longest path in the directed graph. Moreover, eight real networks are studied and simulations show that (1) a node in dense and homogeneous networks could have more influence comparing with nodes in sparse and heterogeneous networks, (2) any single studied classical centrality measure including output degree, betweenness and PageRank could not rank the influence and (3) a node with high rank in all these three measures could have more influence.


Introduction
With the development of society and technology, nowadays lots of industry and manmade systems have become larger and larger. Different from traditional systems, these systems have a large amount of state variables and complex connection between states. Usually the variables, which capture the connection between states, are unable to measure and at the same time these variables can be easily changed for reasons of temperature, humidity and so on. Hence, these complex systems are difficult to analyze by using classical control theory. In the 1970s, a new control conception, structural controllability, was first introduced by Ching-Tai Lin in 1974 [1]. In Ref. [1] the complete structural controllability of systems with single input was studied. Shields and Pearson [2] extended the result of Lin to multiple input structural systems. Glover and Silverman [3] simplified the work of Shields and Pearson. The conception of structural controllability allows part of systems of the same structure, though with measure of zero, be uncontrollable. Mayeda and Yamada [4] introduced the conception of strong structural controllability to eliminate this kind of uncontrollability. Reinschke et al. [5] studied the conception in multiinput and multioutput systems. Jarczyk et al. [6] revisited the previous results and gave out a new graph-theoretic characterization of strong structural controllability. The solid foundation of structural controllability had been laid down from then on. For those structural uncontrollable systems, the structural controllable subspace is studied by Hosoe [7] and Poliak [8]. All these studies shed light on the study on the systems where the variables, which capture the connections between states, are unmeasured or mutable. When the system becomes larger, the methods and algorithms mentioned by the above literature studies become inapplicable for the reason of complexity of computation. Recently, Liu et al. [9] combined the structural controllability theory with the complex network, by using the algorithm of maximum match to calculate the minimum input nodes for complete control and methods of complex network science to analyze the properties of the input nodes. Since then more and more studies have been dedicated to controllability of complex network. Nepusz and Vicsek [10] used edge dynamics to describe the controllability of complex networks. Banerjee and Roy [11] exploited the property of driver nodes and made some supplement for the result of Liu et al. All the existing literature studies about the controllability of complex networks are focused on complete controllability. How about those complex networks, who are not controllable with the given diver nodes? Among hundreds even thousands or billions nodes, what role does a single node actually play? How many nodes will it affect, if it is chosen as a driver node? What properties do those nodes with powerful control ability have? All these questions motivated us to study the problem when networks are not fully controlled.
The chapter is organized as follows. In Section 2, some useful preliminaries and the model descriptions are introduced. In Section 3, an algorithm to search for the influence of a single node is proposed and we apply it to several real complex networks in Section 4. Finally, Section 5 contains concluding remarks.

Preliminaries and model description
Consider a linear time-invariant control system where xðtÞ∈R n , uðtÞ∈R m and we assume that A and B are structural matrices, which means all elements in the matrix are either fixed (zero) or unfixed (indeterminate) and the indeterminate parameters are unrelated. Hereafter Eq. (1) is called a structural system and denoted by ðA, BÞ.
Let ½AB denote the matrix of the system and GðV, EÞ denote the digraph of the system. We generate a digraph GðV, EÞ from ðA, BÞ, with n þ m nodes, , v n represent the state nodes of ðA, BÞ and v nþ1 , …, v nþm represent the control nodes. And all those edges are generated as follows: for every fixed entry m ij of ½AB, the graph contains an oriented edge ðv i , v j Þ. For a structural system ðA, BÞ, his corresponding fixed system is denoted by ðÂ,BÞ, where unfixed parameters in ðA, BÞ are filled by one class of unrelated real numbers.
Definition 1: A system ðA, BÞ is structural controllable if all its possible fixed systems ðÂ,BÞ are controllable.
The above definition was first introduced by C-T Lin. He reduced the property of structural property to the property of the graph of the system.
Definition 2 (stem) [1]: A stem is an elementary path originating from an input vertex (see Figure 1).
Definition 3 (bud) [1]: A bud is an elementary cycle with an additional edge that ends, but not begins, in a vertex of the cycle. Vertex v1 is called the origin of the bud (see Figure 2).
Definition 4 (cactus) [1]: A cactus is defined recursively as follows. A stem is cacti. Given a stem S 0 and buds B 1 , Figure 3).
We call the red node in Figure 3, i.e., the input vertex, driver node and the nodes connected to driver node are defined as driven nodes(e.g., node d1 and d2 in Figure 3.    Theorem 1 (Lin) [1]: The single input system pair ðA, bÞ is structurally controllable iff the graph of ðA, bÞ is spanned by cacti.
Many works have been done to study the complete controllability of a system. Hosoe considered the problem for the first time, where the system is not fully controllable. He defined the maximal dimension of controllability matrix of the structural system as the generic dimension of controllable subspace, denoted by d c ¼ gennericrank½ A AB Á Á Á A n−1 B . And give out two methods for determining the generic dimension of controllable subspace of a system. We show the graph-theoretic method here (see Theorem 1).
Theorem 2 [2]: Let d c be the generic dimension of the controllable subspace of the structural system ðA, BÞ and GðA, BÞ represents the graph of ðA, BÞ. Assume that ½A B is irreducible. Then d c ¼ maxG ∈G {jEðGÞj} where G denotes the subgraphs of GðA, BÞ which is defined by and jEðGÞj represents the number of edges in graphG. Comparing with the graphic properties of complete controllable system in Lin [1], the graph of controllable subspace contains more circles than cactus. Those circles are not directly connected to the cactus in the form of buds. They are connected to the cactus through at least one node. We define the circle inG as a general-bud. Then replacing all the buds in cacti by generalbuds, we can have the definition of general-cacti (see Figure 4). In Figure 4, those circles with dash lines are the defined general-bud.
In our research, we consider the influence of a single node in the complex network. By the influence of node i, we mean the controllable subspace of the system when we only control node i. Then we denote the structural input vector as b i , where all elements in vector b i are fixed zero except for the ith element. Therefore, we can just study the controllable subspace of structural system ðA, b i Þ, whose corresponding structural matrix and digraph are denoted as ½A b i and G i . Let us define the controllability of a node i in a complex network as P i . Then we can easily derive P i from Theorem [1], i.e., where G i denotes the set of general-cacti originated from node i in digraph G i . To ignore the influence of the network size, the influence of node i is normalized as p i ¼ P i =N.

Algorithm
To find the largest general-cacti, we classify the circle in the network into three categories. The first one is circles with no inbound (see Figure 5a). This kind of circles cannot be arrived by the input, so they are not in our general-cacti. The second one is circles with no outbound edges (see Figure 5b). This kind of circles can only act as general-buds in the general-cacti without affecting the competition of two different general-cactus. The third one is circles with both inbound and outbound edges (see Figure 5c). These circles can also be part of a stem, which means these nodes affect the dimension of general-cacti.
Our instinct is exploring the longest path to act as a stem and then add all the circles as the general-buds. Nevertheless, the third kind of circles makes it difficult to search for the largest general-cacti. As shown in Figure 6, the blue stem in (b) is longer than the red one in (a), but the circle can also act as a general-bud to increase the dimension of controllable subspace. Hence, the dimension of general-cacti in (a) is larger than the one in (b).
However, if the circle of the third kind is not large enough, the choice of the longest path will not affect the influence of the origin node significantly. When the circle is large enough, we set it as a general-bud to prevent it from participating in the stem forming. The definition of "large" affects the accuracy of our method. However, from the real database we find that the circles are either very small or very large. Our method is efficient in real networks. Thus far we can give an approach to discover the influence of a single node in the complex network.   3.1. Procedure to discover the influence of a single node • Search the reachable nodes from node i in the original graph and delete all those unreachable nodes.
• Search for the second category circles in the current graph, count the number (N 1 ) of their nodes, then delete all those nodes and the edges connected to them.
• Search for large circles in the current graph, count the number (N 2 ) of their nodes, then delete all those nodes and the edges connected to them.
• Search for the longest path originate from the input in the current graph, count the number (N 3 ) of the nodes, then delete all those nodes and the edges connected to them.
• Search for all the circles left in the current graph, count the number (N 4 ) of the nodes, then delete all those nodes and the edges connected to them.
The first, second and fourth steps of this procedure are all about search for the directed circle in the digraph. We recursively delete the nodes without any input or without any output. Then we can get all the nodes in circle. Using the depth-first search method, we can separate these nodes into different circles and count for the number of nodes of each circle. For the circle, which has small circles inside, we just count the largest circle.
The third step of this procedure is the most difficult one. To the best of our knowledge, there exist no optimal algorithms to search for the longest path in the digraph. The existing algorithms of searching for the shortest path cannot be modified to search for the longest path, for the existing of directed circles. We modify the classical depth-first search method to search for the longest path in a digraph with directed circles (see Algorithm 1).

Algorithm 1 Algorithm to search for the longest path in a digraph
Require: v ≠ NULL In the algorithm, we use vector visited and circle to mark the node. Node i is visited (in a circle) if visitedðiÞ ¼ 1 (circleðiÞ ¼ 1), otherwise unvisited (not in a circle). We use deep to record the current depth and vector vexdeep to record the depth of every vertex. The most significant difference between our method and the depth-first search method is that we allow one vertex to be detected more than one time and at the same time we record the circles in the network for searching in other directions. The time complexity of our algorithm is OðNMÞ, where N is the number of nodes in the network and M denotes the number of edges.

Simulation results
We now apply the proposed algorithm to several kinds of real networks. It is infeasible to search all the nodes to find out the most influential node, for the unbearable complexity. Hence, we rank the node by their ODCR (outcome-degree centrality rank), BCR (betweenness centrality rank) and PR (page rank), then analysis the first three nodes of every categories. Form Table 1, we can find that nodes in dense and homogeneous networks, like network Caenorhabditis elegans, network Seagrass, network s838, network s420 and network s208, can control more percentage of nodes than nodes in sparse and heterogeneous networks. This phenomenon implies that nodes are more influential in dense and homogeneous networks. We think it is reason why dense and homogeneous networks need less driver nodes for full structural control which is observed in [9].
From Figure 7, we find influence of node does not rank according to any single studied centrality measure. In fact, in network Yeast-1, nodes ODR3, BCR1 and PR1 are the same node and it has largest influence 1.15% among the considered nodes. In network Yeast-2, nodes ODR2, BCR2 and PR2 are the same node and it has the largest influence 0.44%. In network EC-2, nodes ODR1, BCR1 and PR1 are the same node and it has the largest influence 0.96%. In network elegans, nodes ODR1, BCR2 and PR2 are the same node and it has the largest influence 58.59%. In network Seagrass, nodes ODR1 and BCR1 are the same node and it has the largest influence 24.49%. In network s838, nodes ODR1, BCR1 and PR1 are the same node and it has the largest influence 31.64%. In network s240, nodes ODR1 and PR1 are the same node and it has the largest influence 32.54%. In network s208, nodes ODR1, BCR2 and PR1 are the same node and it has the largest influence 34.43%. Hence, we can conclude that if a node has a high ODCR, BCR and PR at the same time, this node can have lager influence.

Conclusions
In this chapter, we have introduced the notion of the influence of a node in complex networks and gave out a useful algorithm to count for that property. Meanwhile we applied this method to exploit the influence of nodes in some real networks. Results have shown that nodes in dense and homogeneous networks were more influential than nodes in sparse and heterogeneous networks. And in the same network node Type Regulatory Neuronal Food web Electronic circuits Name TRN-Yeast-1 [12] TRN-Yeast-2 [12] TRN-EC-2 [12] Caenorhabditis elegans [13] Seagrass [14] S838 [12] S420 [ which has high rank in ODR, BCR and PR simultaneously could be more influential.
Next we will try to study the characteristic of the most influential node in a complex network and the influence of a set of nodes.