Protein-Protein Interaction Networks: Structures, Evolution, and Application to Drug Design

Introduction
Since proteins exert their functions through interactions with other proteins rather than in isolation, networks of protein interactions are indispensable for understanding protein functions and disease mechanisms and for discovering novel targets of therapeutic drugs (Hase et al. 2009, Barabasi et al. 2011, Vidal et al. 2011). With the recent influx of genome-wide protein interaction data, many researchers have studied the structures and statistics of protein-protein interaction networks (PINs). To discover novel drug target genes, it is informative to understand the topological and statistical characteristics of PINs and how disease and drug target genes are distributed over the networks. Moreover, because the statistical properties of PINs are the result of long-term evolution, analysis of PIN architecture from the viewpoint of comparative genomics and molecular evolution is of particular importance.
In this chapter, we first summarize current knowledge of the statistical properties of PINs. We then discuss possible evolutionary mechanisms generating those properties and review studies related to drug discovery and diseases as applications of the analysis of PIN structure. Finally, we briefly discuss the prospects for medical studies that integrate network and evolutionary biology.

Genome-wide data of protein-protein interactions
Genome-wide protein-protein interaction data have been obtained from several organisms, including Escherichia coli (Arifuzzaman et al. 2006), Saccharomyces cerevisiae (Uetz et al. 2000, Ito et al. 2001, Guldener et al. 2006, Reguly et al. 2006, Yu et al. 2008), Plasmodium falciparum (LaCount et al. 2005), Arabidopsis thaliana (Arabidopsis Interactome Mapping Consortium 2011), Caenorhabditis elegans (Li et al. 2004, Simonis et al. 2009), Drosophila melanogaster (Giot et al. 2003), and Homo sapiens (Rual et al. 2005, Stelzl et al. 2005). Table 1 summarizes the PIN datasets that are currently available. These data were mainly obtained by high-throughput experimental techniques such as yeast two-hybrid (Y2H) screens and tandem affinity purification followed by mass spectrometry (APMS) screens (Deane et al. 2002, Parrish et al. 2006, Lavallee-Adam et al. 2011), as well as by extensive literature curation by experts.
Y2H screens examine an interaction between two proteins by expressing their genes in the yeast nucleus as fusion proteins (Parrish et al. 2006). One protein is fused to the DNA-binding domain of a transcription factor (e.g., Gal4 or LexA), and the other protein is fused to the transcription activation domain of the transcription factor. When the two proteins interact with each other, the DNA-binding domain and the activation domain are indirectly connected. The activation domain is thereby brought to the promoter of a reporter gene (e.g., LacZ) and activates its transcription. The interaction between the two proteins can then be detected from the expression of the reporter gene.
In APMS screens, affinity purification selectively purifies a protein complex that includes a protein of interest (the bait protein) (Lavallee-Adam et al. 2011). Mass spectrometry then identifies possible interacting partners of the bait protein from the purified complex.

It has been pointed out that genome-wide PIN data identified by high-throughput experiments contain a large number of false positive interactions (Hakes et al. 2008). Y2H screens may detect interactions between two proteins that actually reside in different subcellular locations (Deane et al. 2002). APMS studies identify many false positive interactions caused by inadequate purification (Lavallee-Adam et al. 2011).
Literature-curated PIN datasets are likely to be more reliable, because the interactions included in such datasets were obtained from small-scale experiments. However, those data are derived from hypothesis-driven research focusing on proteins that are presumed to be biologically important, and thus the datasets can be highly biased (Arabidopsis Interactome Mapping Consortium 2011). Therefore, to study the global structure of PINs, researchers should use several datasets obtained by different methods.

Statistical properties of PINs
In PINs, a protein and a physical interaction between two proteins are represented as a node and a link, respectively. A series of studies have revealed that PINs have several interesting properties from the viewpoint of network architecture.

Scale-freeness
The number of links attached to a given node is called its degree. The degree distribution P(k), the fraction of nodes with degree k in a network, has been used to characterize the global structure of a network. Erdös and Renyi (1960) investigated a random network with N nodes, in which each pair of nodes is linked with a uniform probability p. This network contains approximately pN(N-1)/2 randomly placed links. Erdös and Renyi (1960) showed that, in a random network, the distribution P(k) follows the Poisson distribution (Fig. 1A, left). Therefore, most nodes have degrees that are nearly equal to the mean degree <k> among all nodes in the network.
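The random network described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the network size (N = 500), the link probability (p = 0.02), and the random seed are assumptions chosen for the example and do not come from the studies cited here.

```python
import random
from collections import Counter

def random_network(n, p, seed=0):
    """Random network: link each pair of nodes independently with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def degree_distribution(adj):
    """Return P(k), the fraction of nodes with degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}

# Illustrative parameters (assumed, not taken from any dataset).
adj = random_network(n=500, p=0.02)
pk = degree_distribution(adj)
mean_k = sum(k * f for k, f in pk.items())  # should be close to p * (N - 1)
```

In such a network the degrees concentrate around the mean degree, which is the behaviour that the Poisson form of P(k) expresses.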
On the other hand, the distribution P(k) of various technological, social, and biological networks, including PINs, is known to follow a power law, i.e., P(k) ~ k^{-γ} (Albert et al. 1999; Fig. 1A, right). These networks are highly heterogeneous; they have a large number of low-degree nodes and a small but significant number of high-degree nodes, which are called hubs. A network following a power law does not have a typical degree that characterizes most of its nodes (such as the mean degree <k> in a random network), and thus it is called a "scale-free" network. Scale-free networks have been shown to be very robust against random removal of nodes, although selective removal of hubs drastically changes their structures (Jeong et al. 2001, Han et al. 2004).

Small-worldness
The cluster coefficient of node i is defined as
$$C_i = \frac{2 e_i}{k_i (k_i - 1)},$$
where k_i is the degree of node i, and e_i is the number of links among the k_i neighbors of node i (Watts & Strogatz 1998) (see Fig. 1B). In other words, e_i is the number of triangles that pass through node i. C_i is equal to 1 when all neighbors of node i interact with one another, while C_i is 0 when there are no links among the neighbors of node i. The mean of the cluster coefficient among all nodes, <C>, reflects the density of triangles ("cliques") within a network.
The shortest path length between a pair of nodes is the smallest number of links (distance) that must be traversed to travel from one node to the other (Barabasi & Oltvai 2004). The mean shortest path length among all possible pairs of nodes in a network is denoted by <L>. Watts and Strogatz (1998) found that a random network has a much smaller value of <L> than a regular lattice. Based on this observation, they defined a "small-world" network as a network that has a value of <L> as small as that of a random network but is highly clustered like a regular lattice. In a random network, <L> ~ log N/log <k>, and <C> = <k>/N, where N is the number of nodes.
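As a minimal sketch of the two measures just defined, the following Python code computes the cluster coefficient of a node and the mean shortest path length by breadth-first search. The four-node toy network and the function names are hypothetical; the code also assumes that nodes with fewer than two neighbors are assigned a cluster coefficient of 0 and that unreachable pairs are ignored when averaging path lengths.

```python
from collections import deque
from itertools import combinations

def cluster_coefficient(adj, i):
    """C_i = 2 e_i / (k_i (k_i - 1)); taken as 0 when k_i < 2 (assumption)."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # e_i: number of links among the neighbors of node i
    e = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * e / (k * (k - 1))

def mean_shortest_path(adj):
    """Mean shortest path length <L> over all reachable pairs of nodes."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for t, d in dist.items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs

# Hypothetical toy network: A is linked to B, C, and D; B and C are also linked.
toy = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
c_a = cluster_coefficient(toy, "A")   # one link among three neighbors: 2*1/(3*2)
mean_c = sum(cluster_coefficient(toy, u) for u in toy) / len(toy)
mean_l = mean_shortest_path(toy)
```

Here node A has three neighbors with a single link among them, so its cluster coefficient is 1/3.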
In PINs, the value of <L> is small and the value of <C> is much higher than that of a random network; therefore, PINs are generally considered to be small-world networks. However, several studies showed that PINs are actually "ultra-small", because <L> is considerably smaller than that in a random network (Chung & Lu 2002, Cohen & Havlin 2003, Hase et al. 2008). In a PIN, proteins are located close to each other, suggesting that a perturbation applied to a single protein would affect the behaviour of many other proteins and even the entire PIN.

Assortativity
Another statistical characteristic of a network is the correlation between the degrees of nodes that are linked to each other (Callaway et al. 2001, Newman 2002, Costa et al. 2007). The Pearson correlation coefficient r of the degrees at both ends of a link is used to evaluate the degree correlation. Networks with r > 0 and r < 0 are called assortative and disassortative networks, respectively. In an assortative network, hubs tend to be connected to each other (Fig. 1C, left), while in a disassortative network, hubs tend to have links to low-degree nodes (Fig. 1C, right).
<K_nn(k)>, the mean degree among the neighbors of all nodes of degree k ("nn" stands for "nearest neighbors"), is also used to evaluate the assortativity of a network (Pastor-Satorras et al. 2001, Maslov & Sneppen 2002, Costa et al. 2007, Hase et al. 2008). In an assortative network, <K_nn(k)> increases as k increases, while in a disassortative network <K_nn(k)> is a decreasing function of k (Fig. 1C). If there is no correlation between the degrees of the nodes at both ends of a link (i.e., r = 0), <K_nn(k)> is independent of k and is equal to <k^2>/<k>.
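Both measures of degree correlation can be sketched as follows. This is a minimal illustration under stated assumptions: each link is counted once in each direction when computing r, the network is not regular (otherwise the variance of the degrees is zero and r is undefined), and the star-shaped toy network and all function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def degree_correlation(adj):
    """Pearson correlation r of the degrees at both ends of a link."""
    xs, ys = [], []
    for u in adj:
        for v in adj[u]:  # each link appears once per direction
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def knn_by_degree(adj):
    """<K_nn(k)>: mean neighbor degree, averaged over all nodes of degree k."""
    by_k = defaultdict(list)
    for u, nbrs in adj.items():
        if nbrs:
            by_k[len(nbrs)].append(mean(len(adj[v]) for v in nbrs))
    return {k: mean(vals) for k, vals in sorted(by_k.items())}

# Hypothetical toy network: one hub linked to three low-degree nodes.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
r = degree_correlation(star)
knn = knn_by_degree(star)
degrees = [len(nbrs) for nbrs in star.values()]
uncorrelated_knn = mean(k * k for k in degrees) / mean(degrees)  # <k^2>/<k>
```

The star network is the extreme disassortative case: every link joins the hub to a node of degree 1, so r is -1 and <K_nn(k)> falls from 3 at k = 1 to 1 at k = 3.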
It has been shown that the yeast PIN is a disassortative network (Maslov & Sneppen 2002). Therefore, in the yeast PIN, interactions between high- and low-degree nodes are favoured, while those between hubs are suppressed. The biological significance of this structure is unclear. Maslov and Sneppen (2002) proposed that, in the yeast PIN, a hub protein forms a functional module of a cell together with a large number of low-degree neighbors. They then hypothesized that the suppression of links between hubs minimizes unfavourable cross-talk among different functional modules and makes networks robust against perturbations.
If this hypothesis is true, the disassortative structure observed in the yeast PIN is under natural selection, and disassortativity should be commonly found among the PINs of any organism. However, by examining PINs from five eukaryote species, Hase et al. (2010) found that the disassortative structure is not a common feature of PINs. The distribution of <K_nn(k)> in a PIN can be approximated by <K_nn(k)> ~ k^{-ν}, and the value of ν is used to quantify the extent of the disassortative structure of a network. Hase et al. (2010) showed that the yeast, worm, fly, and human PINs are disassortative (ν = 0.47, 0.29, 0.35, and 0.26, respectively), while the malaria parasite PIN is not disassortative (ν = 0.02). This observation indicates that the "selectionist view" of Maslov and Sneppen (2002) is not necessary for explaining the disassortative structure of PINs. In section 4, we will see the evolutionary mechanisms generating the difference in assortativity among species.
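One simple way to estimate ν from a set of <K_nn(k)> values is an ordinary least-squares fit on log-log axes. The sketch below assumes this fitting procedure purely for illustration; the text does not state how ν was estimated in the cited study, and the function name and the synthetic input are hypothetical.

```python
import math

def fit_nu(knn_by_k):
    """Estimate nu in <K_nn(k)> ~ k^(-nu) by least squares on log-log axes.

    knn_by_k maps a degree k to the corresponding <K_nn(k)> value.
    """
    pts = [(math.log(k), math.log(v)) for k, v in knn_by_k.items() if k > 0 and v > 0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    slope = sum((x - mx) * (y - my) for x, y in pts) / sum((x - mx) ** 2 for x, _ in pts)
    return -slope  # the slope of log <K_nn(k)> against log k is -nu

# Synthetic input following an exact power law with nu = 0.47 (not real PIN data).
synthetic = {k: 10.0 * k ** -0.47 for k in range(1, 50)}
nu = fit_nu(synthetic)
```

On real data the estimate depends on choices such as binning and the range of k included, so a value obtained this way need not match published values.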

Evolutionary mechanisms generating structures of PINs
To account for the emergence of the PIN architecture described above, researchers have developed several network growth models and conducted simulation studies using these models. Moreover, the statistical properties of PINs have been analyzed from the viewpoint of comparative genomics and molecular evolution. In this section, we review evolutionary studies of PINs.
Barabasi and Albert (1999) suggested that the emergence of scale-freeness can be explained by two basic mechanisms: network growth and preferential attachment. The process of network growth adds a new node to a network (red node in Fig. 2A). The process of preferential attachment introduces a new link between the new node and each of the other nodes with a probability proportional to the degree of the latter node. For example, the probability that the red node in Fig. 2A gains a new link connected to a blue node is three times higher than that to a black node. Due to these two processes, a node with a higher degree gains a larger number of links, and thus the degrees of high-degree nodes increase faster than those of low-degree nodes, generating a scale-free network.
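The two processes can be simulated with a short Python sketch. The version below is one common variant and rests on assumptions not stated in the text: each new node attaches exactly m links, growth starts from a small fully connected core, and all names and parameter values are illustrative.

```python
import random

def preferential_attachment(n, m=2, seed=0):
    """Grow a network to n nodes; each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Assumed starting point: a fully connected core of m + 1 nodes.
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}
    # Node i appears deg(i) times in this list, so a uniform draw from it
    # selects a node with probability proportional to its degree.
    stubs = [i for i in adj for _ in adj[i]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs += [new, t]
    return adj

net = preferential_attachment(n=2000, m=2)
degrees = [len(nbrs) for nbrs in net.values()]
n_links = sum(degrees) // 2
mean_degree = sum(degrees) / len(degrees)
max_degree = max(degrees)  # hubs: far above the mean degree
```

Comparing max_degree with mean_degree shows the heterogeneity produced by the rich-get-richer process: a few early nodes accumulate many more links than a typical node.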

Preferential attachment and gene duplication
In fact, by comparing the genomes of E. coli, A. thaliana, Schizosaccharomyces pombe, and S. cerevisiae, Eisenberg and Levanon (2003) demonstrated that the number of interactions that a protein gained during its evolution is roughly proportional to the degree of the protein. This observation is consistent with preferential attachment.
What is the genetic mechanism of network growth and preferential attachment in the evolution of PINs? A plausible mechanism is gene duplication. Let us consider a small PIN containing both a high-degree node (node A) and low-degree nodes (nodes B, C, and D) (Fig. 2B, middle). We assume that the number of nodes in a network increases by gene duplication, and that a new node has the same interacting partners as the original node. When node B is duplicated, for example, node A acquires a new link and thus the degree of node A increases by one. The same thing happens when node C or node D is duplicated. On the other hand, if node A is duplicated, the degree of each of nodes B, C, and D increases by one. Under the assumption that gene duplication occurs randomly with an equal probability for all nodes, the probability that node A acquires a new link is three times higher than that for each of the other nodes.
In general, when we compare a high-degree node (e.g., A) and a low-degree node (e.g., B), a given node (e.g., C) is more likely to be a neighbor of the high-degree node than of the low-degree node. Therefore, a high-degree node gains new links faster than a low-degree node does. For this reason, gene duplication can account for the "rich-get-richer" mechanism.
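The argument can be written compactly. As an illustrative formalization (the symbol a_ij is introduced here and is not used elsewhere in the chapter), let a_ij = 1 if nodes i and j are linked and a_ij = 0 otherwise, and suppose that each node is duplicated with the same probability ε and that at most one duplication occurs at a time. Node i gains a link whenever one of its neighbors is duplicated, so

$$P(\text{node } i \text{ gains a link}) = \sum_{j} a_{ij}\,\varepsilon = \varepsilon\, k_i .$$

For the network in Fig. 2B, k_A = 3 and k_B = k_C = k_D = 1, which gives 3ε for node A and ε for each of the other nodes. The probability of gaining a link is thus proportional to the degree, as in preferential attachment.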

Duplication and divergence model
A pair of genes generated by duplication undergoes one of three fates: (i) neofunctionalization, (ii) subfunctionalization, or (iii) nonfunctionalization. After gene duplication, one of the duplicated genes is released from selective pressure because a redundant copy of the gene is present. Therefore, the gene can tolerate the accumulation of random mutations and in some cases acquires a novel function (Ohno 1970). This process is called neofunctionalization. In the subfunctionalization process, on the other hand, each of the duplicated genes accumulates mutations, and the functions of the ancestral gene are partitioned between the two genes (Force et al. 1999). In the nonfunctionalization process, one of the duplicated genes loses its function and becomes a pseudogene due to deleterious mutations. Among the three processes, neofunctionalization and subfunctionalization contribute to the evolution of proteins (Lynch et al. 2000, Blanc et al. 2004, He et al. 2005, Freilich et al. 2006).
In the duplication-divergence model, neofunctionalization and subfunctionalization are modelled as the attachment of new links and the removal of links generated by gene duplication, respectively. For the subfunctionalization process, there are two different models: symmetric divergence and asymmetric divergence. In the former, links are eliminated from both of the duplicated nodes, while in the latter, elimination of links occurs in only one of the two nodes generated by duplication (Fig. 3A).
Wagner (2002) reported that one of the duplicated proteins retains a significantly larger number of interactions than the other. For this reason, several network growth models adopted the asymmetric divergence model (Kim et al. 2002, Wagner 2003, Chung et al. 2003, Ispolatov et al. 2005c). However, "complete" asymmetric divergence, in which links are eliminated from only one of the duplicates, is unrealistic, and the actual divergence process should be intermediate between symmetric and asymmetric divergence (Hase et al. 2010). Simulation and analytical studies have demonstrated that duplication-divergence models can generate scale-free networks with a small-world property (Sole et al. 2002, Kim et al. 2002, Pastor-Satorras et al. 2003, Chung et al. 2003, Raval 2003).
However, it has been pointed out that some statistical features of PINs cannot be reproduced by the model of Sole et al. (2002). The yeast and fly PINs show a much larger <C> than the networks generated by the model of Sole et al. with the same number of nodes and links as the actual PINs (Sole et al. 2002, Middendorf et al. 2005, Ispolatov et al. 2005a). To overcome this problem, Vazquez et al. (2003) proposed the heterodimerization (HD) model. In their model, symmetric divergence eliminates links from both of the duplicated nodes with a uniform probability α, and the HD process attaches a new link between the two duplicated nodes with another uniform probability β, forming a heterodimer (Fig. 3A).
When gene duplication occurs for a self-interacting protein, the duplicated proteins will interact with each other. Therefore, β in Vazquez et al. (2003) represents the probability that a randomly selected protein is self-interacting and that the new HD link between the two duplicated proteins survives after divergence. Simulation and analytical studies have shown that the HD model can reproduce scale-free networks with a <C> similar to that of the yeast and fly PINs (Vazquez et al. 2003, Middendorf et al. 2005, Ispolatov et al. 2005a). This is because the HD process creates triangles, and a network containing a large number of triangles shows a high value of <C>. A computational study based on a machine learning technique showed that, among seven network growth models, the HD model best reproduced the fly PIN (Middendorf et al. 2005).
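The following Python sketch illustrates why the HD process raises the number of triangles. It is one possible reading of the model described above, not a reproduction of the published implementation: growth is assumed to start from a single linked pair, and symmetric divergence is implemented by removing, with probability α for each shared neighbor, the link from one randomly chosen duplicate. All names and parameter values are illustrative.

```python
import random

def hd_model(n, alpha, beta, seed=0):
    """Duplication with symmetric divergence (alpha) and an HD link (beta)."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # assumed seed network: one linked pair
    while len(adj) < n:
        orig = rng.choice(list(adj))
        dup = len(adj)
        # Duplication: the copy inherits every partner of the original.
        adj[dup] = set(adj[orig])
        for nbr in adj[dup]:
            adj[nbr].add(dup)
        # HD process: link the two duplicates with probability beta.
        if rng.random() < beta:
            adj[orig].add(dup)
            adj[dup].add(orig)
        # Symmetric divergence: for each shared neighbor, with probability
        # alpha, remove the link from one duplicate chosen at random.
        for nbr in list(adj[dup] - {orig}):
            if rng.random() < alpha:
                loser = rng.choice((orig, dup))
                adj[loser].discard(nbr)
                adj[nbr].discard(loser)
    return adj

def count_triangles(adj):
    """Count each triangle once by ordering its three nodes."""
    return sum(1 for u in adj for v in adj[u] for w in adj[v]
               if u < v < w and w in adj[u])

without_hd = hd_model(n=300, alpha=0.4, beta=0.0)
with_hd = hd_model(n=50, alpha=0.0, beta=1.0)
triangles_without_hd = count_triangles(without_hd)
triangles_with_hd = count_triangles(with_hd)
```

In this sketch, a network grown from a single link without HD links stays bipartite, because a duplicate always has the same partners as the original; it therefore contains no triangles, and only the HD links can create them.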

Non-uniform heterodimerization model
By conducting simulation studies, Hase et al. (2008) showed that, to reproduce the value of <C> in the yeast PIN with the HD model, the number of HD links in the simulated networks has to be much larger than that in the yeast PIN. A similar observation was made for the fly PIN (Ispolatov et al. 2005a and b). This means that the HD model is insufficient for explaining the evolution of PINs.
As shown in Fig. 3B, when two duplicated nodes share one, two, and three common neighbors, an HD link between them generates one, two, and three new triangles, respectively. The high <C> in a PIN indicates that the network contains many triangles. Therefore, if a new HD link is attached preferentially between duplicated nodes sharing a larger number of common neighbors, the value of <C> in a simulation-generated network is expected to become higher. Based on this reasoning, Hase et al. (2008) proposed the non-uniform heterodimerization (NHD) model, in which a new HD link is added between duplicated nodes with a probability proportional to the number of neighbors shared by those nodes (Fig. 3B). Simulation studies demonstrated that the NHD model could indeed reproduce both the high value of <C> and the small number of HD links in the yeast PIN. In the evolution of PINs, duplication of a self-interacting protein adds an HD link between the duplicated proteins. Some HD links were conserved in evolution, while others were eliminated because mutations occurred at the interacting sites of the duplicated proteins. In the HD model, the survival rate of HD links is uniform; in the NHD model, on the other hand, it is assumed to be proportional to the number of common neighbors of the duplicated nodes (Fig. 4A). In the yeast PIN, the probability that two homologous nodes retain an HD link increases as the number of neighbors shared by the two nodes increases, which is consistent with the NHD model rather than the HD model (Fig. 4A).
A possible explanation for this observation is as follows. It is expected that, when a given pair of proteins share a large number of common neighbors, the degrees of these proteins should be high. In fact, in the yeast PIN, when two homologous proteins are connected by an HD link, there is a positive correlation between the number of neighbors common to the homologues and the mean degree of the two proteins (Fig. 4B). Moreover, several studies showed that high-degree proteins tend to show a low evolutionary rate in the yeast PIN (Fraser et al. 2002, 2003, Fraser 2005). Therefore, it is suggested that the survival rates of HD links are also positively correlated with the number of common neighbors shared by the two homologous proteins.
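The relationship between common neighbors, new triangles, and the NHD attachment probability can be checked on a small example. In the sketch below, the proportionality constant c, the cap at 1, the toy network, and the function names are all hypothetical choices made for illustration.

```python
def common_neighbors(adj, a, b):
    """Number of neighbors shared by nodes a and b."""
    return len((adj[a] & adj[b]) - {a, b})

def nhd_link_probability(adj, a, b, c=0.1):
    """NHD-style probability of an HD link: proportional to the number of
    common neighbors. The constant c is a hypothetical parameter."""
    return min(1.0, c * common_neighbors(adj, a, b))

def count_triangles(adj):
    """Count each triangle once by ordering its three nodes."""
    return sum(1 for u in adj for v in adj[u] for w in adj[v]
               if u < v < w and w in adj[u])

def triangles_added_by_link(adj, a, b):
    """Number of new triangles created by linking a and b (adj is not modified)."""
    before = count_triangles(adj)
    trial = {u: set(nbrs) for u, nbrs in adj.items()}
    trial[a].add(b)
    trial[b].add(a)
    return count_triangles(trial) - before

# Hypothetical duplicates "a" and "b" sharing two neighbors, "x" and "y".
toy = {"a": {"x", "y"}, "b": {"x", "y"}, "x": {"a", "b"}, "y": {"a", "b"}}
shared = common_neighbors(toy, "a", "b")
added = triangles_added_by_link(toy, "a", "b")
prob = nhd_link_probability(toy, "a", "b")
```

With two shared neighbors, the HD link creates two triangles, and the assumed attachment probability is twice that of a pair sharing a single neighbor.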

Degree-dependent duplicability and assortativity
Duplication and divergence models, including the NHD model, can explain various aspects of the architecture of PINs (Pastor-Satorras et al. 2003, Vazquez 2003, Hase et al. 2008). However, these models cannot explain the differences in the overall structures of PINs among species. As mentioned in section 3, the yeast, worm, fly, and human PINs are disassortative, while the malaria parasite PIN is non-disassortative.
A possible evolutionary scenario that can explain the difference in assortativity of PINs among species is as follows (Hase et al. 2010). Let us consider a disassortative network containing low- and high-degree nodes (e.g., A and C, respectively), in which the low- and high-degree nodes are linked to each other (Fig. 5A, middle). Duplication of a low-degree node (e.g., node A) increases the value of ν in the disassortative network, because the degree of its high-degree neighbor increases (Fig. 5A, left). On the other hand, duplication of a high-degree node (e.g., node C) raises the degrees of its low-degree neighbors, and thus the value of ν decreases (Fig. 5A, right). For this reason, duplication of low- and high-degree nodes would make the value of ν in a disassortative network larger and smaller, respectively. Therefore, it is suggested that the assortativity of a PIN is related to degree-dependent gene duplicability. If this is the case, the disassortative structure of PINs is merely a byproduct of preferential duplication of low-degree proteins, and we do not need to assume any adaptive meaning for this structure, as mentioned in section 3.
[...] Black squares represent <K_nn(k)> in the malaria parasite PIN. Dashed lines in black, blue, and red represent k^{-0.02}, k^{0.01}, and k^{-0.22}, respectively. (D) and (E) indicate correlations between the degree and the duplicability in the yeast and malaria parasite PINs, respectively. Bars in blue, yellow, and red show the mean duplicability among low-, middle-, and high-degree proteins, respectively. The species name above each diagram denotes the species whose genome was compared with that of S. cerevisiae or P. falciparum. *, **, and *** represent P < 0.05, P < 0.01, and P < 0.001, respectively, by the Wilcoxon rank-sum test with the Bonferroni correction.

Structures of PINs and their relationships with disease genes and drug targets
As we have seen above, PINs are characterized by several interesting properties that differ from those of a random network. Therefore, understanding diseases and the mechanisms of drug action in the context of PIN architecture may allow us to address some fundamental properties of disease genes and drug target molecules. Indeed, the numbers of disease genes and drug targets are very small. Only 10% of human genes are known to be disease genes (Amberger et al. 2009), and only 435 genes are targets of therapeutic drugs (Rask-Andersen et al. 2011). Why are the numbers of drug targets and disease genes so small? Are these genes distributed randomly over the human PIN? Are there any quantifiable correlations between drug target genes and their statistical properties in the human PIN? To address these questions, drug target and disease genes were mapped onto the human PIN and their statistical properties in the PIN were investigated. Moreover, by using biological networks including the human PIN, several studies showed that the side effects of drugs depend on the statistical features of their targets in the network. In this and subsequent sections, we review the application of network analyses to medical research.

Statistical properties of disease genes and drug targets in the human PIN
Elimination of a hub protein affects many proteins in a network (Jeong et al. 2001, Yu et al. 2008). Therefore, it was previously hypothesized that genes encoding hub proteins are associated with diseases (Barabasi et al. 2011). Several studies reported that the mean degree among disease genes is in fact significantly higher than that among non-disease genes (Wachi et al. 2005, Jonsson & Bates 2006, Xu & Li 2006).
A human gene is defined as essential when knock-out of its orthologous gene causes embryonic or postnatal lethality or sterility in mouse (Liang & Li 2007). Liang & Li (2007) reported that essential genes tend to encode hub proteins in the human PIN.
However, Wachi et al. (2005), Jonsson & Bates (2006), and Xu & Li (2006) did not account for the fact that only a small number of disease genes are also essential (essential disease genes), while the vast majority of disease genes are actually non-essential. Because essential disease genes encode hub proteins, the mean degree of disease genes appeared to be high in the three studies. In contrast, non-essential disease genes do not show any tendency to encode hub proteins (Goh et al. 2007). Rather, they tend to encode low- and middle-degree proteins (Feldman et al. 2008). Mutations in high-degree proteins cause dysfunction of many neighboring proteins, leading to severe impairment of developmental and physiological processes. Individuals carrying such mutations cannot survive until their reproductive years and are likely to be removed from the population. For this reason, non-essential disease genes are enriched among low- and middle-degree genes.
Hase et al. (2009) investigated drug target genes to see whether they have specific statistical features in the PIN. They found that most drug target genes encode middle-degree proteins and some encode low-degree proteins, while there are almost no drug targets among high-degree proteins (see Fig. 6). This degree distribution is similar to that of disease genes, and, not surprisingly, drug target genes significantly overlap with disease genes (Yao & Rzhetsky 2008). These results indicate that middle-degree proteins are likely to be the most advantageous targets for therapeutic drugs.
Oncogenes tend to encode high-degree proteins (Jonsson & Bates 2006), and thus they are less likely to be targets for drugs, or one must accept major potential side effects. A possible strategy for designing anti-cancer therapy with less severe side effects is to develop a novel combination of drug compounds that targets several low- and middle-degree proteins, because such a combination could generate synergistic effects in cancer treatment, and low- or middle-degree targets are expected to induce less severe side effects than high-degree targets.
[...] and high-degree (k > 30) proteins, respectively. Drug targets (red nodes) are mapped on the network. White, yellow, green, and blue links represent interactions between high- and low-degree proteins, those between middle-degree proteins, those between high- and middle-degree proteins, and those between high-degree proteins, respectively. Middle-degree proteins are extensively connected to each other, while links between high-degree proteins are rather suppressed. For clarity, low- and middle-degree proteins that do not have any interactions with high-degree proteins are not shown.

Predicting candidate drug targets and their side effects based on biological networks
To develop a new drug, it is critical to accurately predict its side effects, because almost 30% of candidate drugs are rejected in clinical stages due to unexpected toxicity or concerns about drug safety (Kola & Landis 2004, Billingsley 2008). Severe adverse reactions may be found long after the approval of drugs (e.g., Rosiglitazone), and in such cases, those drugs may go out of production (e.g., Rofecoxib) (Moore et al. 2007).
The chemical structures of drugs have been used to predict their adverse side effects and target proteins (Kuhn et al. 2008, Campillos et al. 2008, Yamanishi et al. 2010). Campillos et al. (2008) developed a large-scale database of adverse side effects of drugs. By combining the database with information on the chemical structures of drugs, they constructed a similarity metric between two drugs. Under the assumption that drugs with higher similarity in this metric are more likely to share the same target proteins, they inferred candidate targets for the drugs.
However, if the target proteins of two drugs are close in a molecular network, such drugs may cause similar downstream effects in the network and thus have similar side effects. To understand the molecular mechanisms of drug action and the associated adverse effects in greater detail, it makes sense to view the targets of drugs in the context of biological networks, including the genome-wide human interactome (Pache et al. 2008, Zanzoni et al. 2009).
Recently, Brouwers et al. (2011) investigated how the side-effect similarities of drugs depend on the closeness of their targets in the human PIN. They found that a certain number of drug pairs without common targets show similar side effects when their targets are close in the human PIN. Moreover, Wang et al. (2011) reported that drug side effects are significantly associated with network distances between drug target genes and disease genes, i.e., the targets of failed drugs that cause severe adverse side effects are closer to disease genes than the targets of approved drugs. Thus, selecting targets that are too close to disease genes is not always the best strategy (Wang et al. 2011), although the pharmaceutical industry has tended, especially after 1996, to select targets of new drugs that are close to the corresponding disease genes in biological networks (Yildirim et al. 2007).
With the recent influx of information on biological networks, especially the human interactome, analyses like those of Brouwers et al. (2011) and Wang et al. (2011) can be refined and adapted to infer still unknown adverse side effects of drugs and to predict possible target genes. Indeed, by integrating information on the human PIN with similarities between two genes (e.g., GO semantic and sequence similarity) and those between two drugs (e.g., chemical and drug therapeutic similarity), several recent studies attempted to develop methods to predict possible targets for therapeutic drugs (Zhao & Li 2010, Perlman et al. 2011).
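A network distance of the kind discussed above can be computed by breadth-first search from a set of disease genes. The sketch below is only an illustration of the general idea; the exact distance measures used in the cited studies are not specified in this chapter, and the toy network and all node names are hypothetical.

```python
from collections import deque

def distances_from(adj, sources):
    """Shortest-path distance from the nearest source node to every
    reachable node, by multi-source breadth-first search."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Hypothetical toy network: one disease gene (D1), two intermediate
# proteins (P1, P2), and two drug targets (T_near, T_far).
toy = {
    "D1": {"P1", "T_near"},
    "T_near": {"D1"},
    "P1": {"D1", "P2"},
    "P2": {"P1", "T_far"},
    "T_far": {"P2"},
}
dist = distances_from(toy, {"D1"})
target_distance = {t: dist[t] for t in ("T_near", "T_far")}
```

In this toy example T_near is one link away from the disease gene and T_far is three links away; comparing such distances between groups of targets is the kind of analysis the paragraph describes.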

Possibilities of medical studies with integration of PINs and evolutionary studies
The human PIN is still incomplete, and there are many proteins for which no information on protein-protein interactions is available (Venkatesan et al. 2009). Evolutionary information on genes (e.g., evolutionary rate and duplicability) is significantly correlated with their statistical properties in PINs (see sections 2 and 3); therefore, such information can be utilized to compensate for the incompleteness of the human PIN. Rambaldi et al. (2008) reported that most cancer genes are singletons and have interactions with many genes. This finding indicates that both gene duplicability and network information are useful for predicting candidate cancer genes. Modifying currently available methods by integrating evolutionary information would improve the accuracy of predicting disease and drug target genes.
Recently, large-scale PINs became available for several prokaryotes, including methicillin-resistant Staphylococcus aureus (MRSA) (Cherkasov et al. 2010), Treponema pallidum (Titz et al. 2008), Campylobacter jejuni (Parrish et al. 2007), Mycoplasma pneumoniae (Kuhner et al. 2009), and Mesorhizobium loti (Shimoda et al. 2008) (see Table 1). Some of them are pathogenic. By investigating the evolution of their PINs, we may be able to understand the processes of acquiring pathogenicity and developing drug resistance from the viewpoint of network architecture. Cherkasov et al. (2010) suggested that, in the MRSA PIN, hubs are essential for network stability and may be prospective antimicrobial drug targets. However, almost all known antimicrobial targets have relatively few interactions, and hubs have largely been overlooked as drug targets. If hubs in pathogens have no orthologous genes in human and evolve very slowly, then by targeting such hubs we may be able to develop novel antibacterial drugs with high efficacy and few side effects, and without the development of drug resistance. With the recent influx of PINs from pathogenic organisms and genomes from various bacterial species, analyses integrating comparative genomics with PINs will become key to identifying still unknown disease mechanisms and novel targets for antibacterial drugs.

Conclusion
In this chapter, we described various aspects of the architecture of PINs, such as scale-freeness, small-world properties, and assortativity. Computational studies based on network-growth models and comparative genomics revealed how the accumulation of local changes in PINs affects their overall architecture during evolution. We also discussed possible applications of PINs and evolutionary studies to medical research. With the expected explosion of omics data (e.g., PINs and SNPs from human) in the near future, an integration of networks and genetics will be among the most powerful strategies to elucidate unknown mechanisms of disorders and discover novel targets for efficacious drugs.

Fig. 1. Measures of a network structure. (A) A random network (left) and a scale-free network (right). The degree distribution P(k) is shown below the networks. (B) Cluster coefficient. Red lines represent links among three neighbors of node A. The numbers of links (e_A) among nodes B, C, and D (the neighbors of

Fig. 2. Network growth by preferential attachment. (A) Preferential attachment. A red node is added to the network. The probability that a new link is attached between red and blue nodes (3ε) is three times higher than that between red and black nodes (ε). (B) Network growth with gene duplication. Red nodes represent duplicated nodes. Gene duplication occurs with an equal probability (ε) for all nodes. When node A is duplicated, the degrees of nodes B, C, and D increase by one (right), whereas when either node B, C, or D is duplicated, the degree of node A increases by one (left).
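The preferential attachment rule in panel (A) can be illustrated with a minimal Python sketch. It assumes that each new node adds exactly one link and that the attachment probability is exactly proportional to the current degree of the target node; the function names and the two-node starting network are hypothetical choices made for illustration, not part of any published implementation.

```python
import random


def attachment_probabilities(degree):
    """Probability that a new link attaches to each node (proportional to degree)."""
    total = sum(degree.values())
    return {node: k / total for node, k in degree.items()}


def grow_preferential_attachment(n_nodes, seed=0):
    """Grow an undirected network by adding one node and one link per step.

    Assumption: growth starts from a single link between nodes 0 and 1.
    """
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}
    edges = [(0, 1)]
    for new in range(2, n_nodes):
        nodes = list(degree)
        weights = [degree[v] for v in nodes]
        # A node with degree 3 is chosen three times as often as one with degree 1.
        target = rng.choices(nodes, weights=weights, k=1)[0]
        edges.append((new, target))
        degree[target] += 1
        degree[new] = 1
    return edges, degree


edges, degree = grow_preferential_attachment(200)
print(max(degree.values()), "links on the best-connected node")
```

In this sketch the node of degree 3 and the node of degree 1 in panel (A) correspond to weights 3 and 1, which reproduces the 3ε versus ε ratio described in the caption.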
Sole et al. (2002) proposed a model based on neofunctionalization and asymmetric divergence. According to their model, after duplication generates a new node, a neofunctionalization process attaches a new link between either of the duplicated nodes and each of the other nodes with a uniform probability θ, and then asymmetric divergence eliminates links to only one of the duplicated nodes with a uniform probability α (Fig. 3A).
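One duplication step of this kind can be sketched in Python as follows. This is only an illustrative reading of the description above: the graph is stored as a dictionary of neighbor sets without self-loops, the duplicated node that gains each new link is chosen at random (an assumption, since the text says "either"), and the function name `duplicate_and_diverge` is hypothetical.

```python
import random


def duplicate_and_diverge(adj, node, new, theta, alpha, rng):
    """Return a copy of `adj` after duplicating `node` into `new`.

    adj   : {node: set of neighbors}, undirected, no self-loops (assumption)
    theta : probability of a new link per other node (neofunctionalization)
    alpha : probability of losing each link of the copy (asymmetric divergence)
    """
    adj = {v: set(nbs) for v, nbs in adj.items()}  # work on a copy
    # Duplication: the copy inherits every link of the original node.
    adj[new] = set(adj[node])
    for nb in adj[new]:
        adj[nb].add(new)
    # Neofunctionalization: for each other node, add a link to one of the
    # duplicated nodes with probability theta (which one is picked at random).
    for other in list(adj):
        if other in (node, new):
            continue
        if rng.random() < theta:
            dup = rng.choice((node, new))
            adj[dup].add(other)
            adj[other].add(dup)
    # Asymmetric divergence: only links of the copy are removed.
    for nb in list(adj[new]):
        if rng.random() < alpha:
            adj[new].discard(nb)
            adj[nb].discard(new)
    return adj


star = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
grown = duplicate_and_diverge(star, "A", "A2", theta=0.1, alpha=0.5,
                              rng=random.Random(1))
print(sorted(grown["A2"]))
```

Replacing the last loop with one that also visits the links of the original node would give a symmetric divergence variant.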

Fig. 3. Network growth models based on gene duplication and divergence. A pair of red nodes is generated by gene duplication. (A) The HD model with asymmetric or symmetric divergence processes. Nodes A and A' are generated by gene duplication. In the symmetric divergence, each of the links to nodes A and A' is eliminated with a uniform probability α. On the other hand, in the asymmetric divergence, each of the links to node A' is eliminated with a uniform probability α. After the divergence process, an HD link (a red line) between two duplicated nodes (nodes A and A') is attached with a uniform probability β. (B) The NHD model. An HD link (red link) is attached between nodes A and A' with a probability proportional to the number (n_N) of common neighbors shared by these nodes. (C) The DDD model. The probability of duplication of a given node depends on the degree of the node. If a node has k links, the node is duplicated with a probability proportional to 1 + kσ, where σ is a parameter of the duplicability of a node.
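The degree-dependent duplication rule in panel (C) can be sketched as a weighted choice in Python. The weight 1 + kσ follows the caption; clipping negative weights to zero for high-degree nodes when σ < 0 is an assumption added here so that the weights remain valid, and the function names are hypothetical.

```python
import random


def duplication_weights(degree, sigma):
    """Unnormalized duplication weight 1 + k*sigma for each node.

    Assumption: weights that would be negative are clipped to zero.
    """
    return {node: max(0.0, 1.0 + k * sigma) for node, k in degree.items()}


def pick_node_to_duplicate(degree, sigma, rng):
    """Choose the node to duplicate with probability proportional to its weight."""
    weights = duplication_weights(degree, sigma)
    nodes = list(weights)
    return rng.choices(nodes, weights=[weights[v] for v in nodes], k=1)[0]


degree = {"hub": 10, "mid": 4, "leaf": 1}
rng = random.Random(0)
# sigma < 0 favors low-degree nodes; sigma = 0 duplicates all nodes equally.
picks = [pick_node_to_duplicate(degree, -0.05, rng) for _ in range(1000)]
print({node: picks.count(node) for node in degree})
```

With σ = 0 every node has weight 1, which corresponds to the uniform duplication shown in Fig. 2B.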

Fig. 4. HD links in the yeast PIN and in the networks by the HD and NHD models (Hase et al. 2008). (A) Distribution of P_HD(n_N), the probability that an HD link exists between two homologous proteins when they share n_N common neighbors. Green squares, blue diamonds, and red crosses indicate the values for the yeast PIN, the network by the NHD model, and that by the HD model, respectively. (B) Distribution of k_HP(n_N), the mean degree of proteins that are connected by an HD link and share n_N common neighbors.
Hase et al. (2010) proposed a novel duplication and divergence model named the "degree-dependent duplication (DDD) model", in which duplication of nodes occurs depending on their degree (see Fig. 3C). Simulation studies based on the DDD model revealed that preferential duplication of low-degree nodes can successfully reproduce the disassortative structure observed in the yeast, worm, and fly PINs, while preferential duplication of high-degree nodes generates non-disassortative networks similar to the malaria parasite PIN (see Fig. 5B and 5C). Moreover, Hase et al. (2010) evaluated the dependency of gene duplicability on degree by analyzing orthologous relationships of genes extracted from 55 eukaryotic proteomes. The analyses demonstrated that proteins with a lower degree indeed have higher duplicability in disassortative PINs (the yeast, worm, and fly PINs) (Fig. 5D), whereas high-degree proteins tend to have high duplicability in non-disassortative PINs (the malaria parasite PIN) (Fig. 5E).

Fig. 5. The DDD model and the extent of assortativity in networks (Hase et al. 2010). (A) Duplication of a node alters the distribution of <K_nn(k)> and the value of ν in a network. A diagram below a network shows the distribution of <K_nn(k)> and the value of ν in the network. (B) The distribution of <K_nn(k)> in the networks generated by the DDD model for yeast. Blue diamonds and red crosses show the results of simulation with σ = -0.05 and 0, respectively (as for σ, see Fig. 3C). Black squares represent <K_nn(k)> in the yeast PIN. Dashed

Fig. 6. Degree distribution of drug targets (Hase et al. 2009). (A) P_DT(k) represents the fraction of drug targets to all proteins for the degree of k. The dashed line in red represents the probability that a randomly selected protein is a drug target. (B) White, yellow, and blue nodes represent low- (k = 1-5), middle- (k = 6-30), and high-degree (k > 30) proteins, respectively. Drug targets (red nodes) are mapped on the network. White, yellow, green, and blue links represent interactions between high- and low-degree proteins, those between middle-degree proteins, those between high- and middle-degree proteins, and those between high-degree proteins, respectively. Middle-degree proteins are extensively connected to each other, while links between high-degree proteins are rather suppressed. For clarity, low- and middle-degree proteins that do not have any interactions with high-degree proteins are not shown.
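The quantity P_DT(k) in panel (A) and the degree classes in panel (B) can be computed with a short Python sketch. The class boundaries follow the caption; the function names and the toy protein identifiers are hypothetical and do not correspond to the actual data set.

```python
from collections import Counter


def drug_target_fraction_by_degree(degree, drug_targets):
    """Return {k: fraction of proteins with degree k that are drug targets}."""
    n_all = Counter(degree.values())
    n_dt = Counter(k for protein, k in degree.items() if protein in drug_targets)
    return {k: n_dt[k] / n_all[k] for k in sorted(n_all)}


def baseline_fraction(degree, drug_targets):
    """Probability that a randomly selected protein in the network is a drug target."""
    return sum(1 for protein in degree if protein in drug_targets) / len(degree)


def degree_class(k):
    """Degree classes from the caption: low (1-5), middle (6-30), high (> 30)."""
    if k <= 5:
        return "low"
    if k <= 30:
        return "middle"
    return "high"


# Toy data for illustration only.
degree = {"p1": 1, "p2": 1, "p3": 6, "p4": 40}
targets = {"p1", "p3"}
print(drug_target_fraction_by_degree(degree, targets))
print(baseline_fraction(degree, targets))
```

Comparing each value of P_DT(k) with the baseline fraction corresponds to reading the curve against the dashed red line in panel (A).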

Table 1. PIN datasets. Y2H, yeast two-hybrid screens; APMS, tandem affinity purification followed by mass spectrometry screens. "Manually curated" indicates that interactions obtained from high-throughput screens and the literature were manually integrated by experts.