Analysis of Protein Interaction Networks to Prioritize Drug Targets of Neglected-Diseases Pathogens

Aldo Segura-Cabrera1,5, Carlos A. García-Pérez1, Mario A. Rodríguez-Pérez2, Xianwu Guo2, Gildardo Rivera3 and Virgilio Bocanegra-García4 1Laboratorio de Bioinformática 2Laboratorio de Biomedicina Molecular 3Laboratorio de Biotecnología Ambiental 4 Laboratorio de Medicina de Conservación Centro de Biotecnología Genómica, Instituto Politécnico Nacional 5U.A.M. Reynosa Aztlán, Universidad Autónoma de Tamaulipas, Reynosa México


Introduction
Many technological, social, and biological systems have been modeled as large networks, providing invaluable insight into how such systems work. Systems biology is an emerging, multidisciplinary field that studies the interactions of cellular components by treating them as parts of an integrated system. It has shown that functional molecules are embedded in complex networks of interrelationships, and that most cellular processes depend on functional modules rather than on isolated components. Large amounts of biological network data of different types are available, e.g., protein-protein interaction, transcriptional regulatory, signal transduction, and metabolic networks. Since proteins carry out most biological processes, protein interaction networks (PINs) are of particular importance. Advances in the functional genomics and systems biology of model organisms such as Saccharomyces cerevisiae, Caenorhabditis elegans, and Drosophila melanogaster have contributed to the development of experimental and computational methods, and also to the understanding of complex human diseases. The availability of these methods has facilitated systematic efforts to create large-scale data sets of protein interactions, which are modeled as PINs.
Usually, a PIN is represented as a graph in which the proteins are the nodes and the interactions are the edges. According to complex network theory, PINs are scale-free networks characterized by a power-law degree distribution. In scale-free networks, most nodes have only a small number of links, whereas a small percentage of nodes interact with a disproportionately large number of others. The nodes with a large number of links in PINs are called hub proteins. Functional genomics studies showed that, in PINs, the deletion of a hub protein is more likely to be lethal to the organism than the deletion of a sparsely connected protein, a phenomenon known as the centrality-lethality rule.
This chapter introduces the reader to the basic concepts of network analysis and outlines why it is important for predicting protein function and essentiality. Work involving PINs of neglected-disease pathogens is explained so that the reader will understand the current state of its application to the prioritization of drug targets. The experimental and computational methods most commonly used to identify and predict PINs, and the strategies for identifying multiple potential drug targets in neglected-disease pathogens, are also outlined, using several biological databases in an integrated way.
To achieve this goal, the chapter includes three sections. First, we present an outline of the conceptual development of network biology. Applied functional genomics involving the analysis of PINs of model organisms has led to the development of methods and principles for elucidating protein function. We also explain how these concepts are connected with protein essentiality in order to identify the "weak" points in the PINs of neglected-disease pathogens, and how they can be used to prioritize drug targets. In the second section, we outline the experimental and computational methods that are most extensively used to identify and predict PINs. Some new approaches for predicting PINs are also introduced. These include the probabilistic integrated network methods, which have been shown to increase the accuracy and coverage of PINs. The primary research articles are reviewed and their potential future applications are explained. This section mainly focuses on analyzing the PINs of the most prevalent neglected-disease pathogens, for which the use of drugs is often limited by factors including high cost, low efficacy, toxicity, and the emergence of drug resistance. The potential use of these methods as an integrated strategy for prioritizing and identifying drug targets of neglected-disease pathogens is put forward, and the argument for future research involving the application of many tools and strategies is discussed. In the final section,
Biological systems consist of interacting cellular components, which has led to the use of graph theory and graph-based mathematical tools in which the individual components are represented by nodes and the interactions by links (Fig. 1). Albert and Barabási (2002) described general properties shared by several networks, ranging from the Internet to social and biological networks. The analysis of the topology of those networks showed that they deviate substantially from the randomly built networks studied by Erdös and Rényi (Fig. 1a) (Erdös and Rényi 1960). These networks did not show the bell-shaped frequency distribution of the number of links per node that is expected from randomly formed networks; instead, they showed a power-law distribution, which is characteristic of scale-free networks (Fig. 1b and 1c) (Amaral et al., 2000, Albert 2005).
In a scale-free network, the majority of nodes have only a few links, whereas very few nodes have a large number of links. The latter nodes are called hubs and they represent the most vulnerable points of a network (Barabasi and Albert 1999, Albert et al., 2000, Jeong et al., 2001, Yu et al., 2004a, Tew et al., 2007). The topological features of networks can be quantified by topological parameters whose information content ranges from the local level (e.g., single nodes or links) to the network-wide level (e.g., connections and relationships between nodes). For example, the nodes of a graph can be characterized by the number of links they have, that is, the number of other nodes to which they are connected. This parameter is called the "node degree". In directed networks, it is possible to distinguish the number of directed links that point toward the node (in-degree) from the number of directed links that point away from the node (out-degree). The node degree characterizes individual nodes; to relate this parameter to the whole network, a degree distribution can be defined. The degree distribution P(k) represents the fraction of nodes that have degree k; it is obtained by counting the number of nodes N(k) that have k = 1, 2, … links and dividing it by the total number of nodes N. The degree distributions of numerous networks, such as the Internet, social, and biological networks, follow a power law (Fig. 1b and 1c), which is defined by the functional form P(k) ~ k^{-γ}, where γ is the degree exponent, usually taking values in the range 2 < γ < 3 (Barabasi and Oltvai 2004). This function is intimately linked to the growth of the network, in which new nodes attach preferentially to already established nodes, a property that is also thought to characterize the evolution of biological systems (Jeong et al., 2000).
The distance between any two nodes in a network can be defined by the path length, that is, the number of links that must be traversed to go from one node to the other. There may be many alternative paths between two nodes in a network; the path with the smallest number of links between the selected nodes (the shortest path) is of special interest. A common characteristic of several biological networks, including metabolic networks (Jeong et al., 2000, Wagner and Fell 2001) and PINs (Giot et al., 2003, Yook et al., 2004), is that any two nodes can be connected by a path of only a few links. The main biological implications of this characteristic relate to: i) the capacity of biological networks to respond rapidly to perturbations; ii) their capacity to use alternative routes for the same input and output; and iii) their ability to compensate efficiently for perturbations in essential pathways.
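The node degree, the degree distribution P(k) = N(k)/N, and the shortest path length described above can be illustrated with a minimal Python sketch. The six-node network and the protein labels are hypothetical placeholders, not experimental data, and the function names are chosen here only for illustration.

```python
from collections import Counter, deque

# Hypothetical toy PIN: labels are placeholders, not real proteins.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C"), ("E", "F")]

def build_adjacency(edges):
    """Return an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_distribution(adj):
    """P(k) = N(k) / N: fraction of nodes that have exactly k links."""
    n_nodes = len(adj)
    counts = Counter(len(neighbours) for neighbours in adj.values())
    return {k: n_k / n_nodes for k, n_k in sorted(counts.items())}

def shortest_path_length(adj, source, target):
    """Breadth-first search; returns the number of links, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adj[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

adj = build_adjacency(EDGES)
p_k = degree_distribution(adj)
hub = max(adj, key=lambda node: len(adj[node]))  # node with the highest degree

print("P(k):", p_k)
print("hub:", hub)
print("path length B-F:", shortest_path_length(adj, "B", "F"))
```

In a real PIN, the exponent γ would be estimated from P(k) over thousands of nodes; a network this small only illustrates the bookkeeping.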
Another important concept derived from network analysis is modularity, which describes how a group of physically or functionally linked nodes work together to achieve a particular function. The topological parameter used to quantify modularity in a network is the clustering coefficient Ci, which represents the ratio between the number of links connecting the nodes adjacent to node i and the total possible number of links among them (Watts and Strogatz 1998). It is worth noting that, at first sight, the modularity concept might seem to contradict the scale-free nature of networks, because the presence of modules implies that there are clusters of nodes that are relatively isolated from the rest of the network. However, it has been demonstrated that modularity and scale-free properties naturally co-occur in biological networks, indicating that modules are not independent; instead, they combine to form a hierarchical network (Fig. 1c) (Ravasz et al., 2002). Biological networks, including PINs and metabolic networks, are good examples of network modularity because they exhibit a high average Ci, which is associated with a high level of network robustness (Alon et al., 1999, Ravasz et al., 2002, Barabasi and Oltvai 2004). The most common representation of a module or cluster in a network is a highly interconnected group of nodes. The biological implication of the modularity concept is that the nodes that make up a module tend to participate in related biological processes and pathways, for example protein and nucleic-acid synthesis, protein degradation, signal transduction, and metabolic pathways (Ma'ayan et al., 2005). Analyses of experimental PINs have shown that they have a remarkably modular character (Giot et al., 2003, Yook et al., 2004). These findings in experimental PIN maps have been used to improve the understanding of pleiotropic effects, and of how perturbations of genes or proteins can propagate through the network and produce apparently unrelated or extensive effects.
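The clustering coefficient defined above can be written as Ci = 2·ni / (ki·(ki − 1)), where ki is the degree of node i and ni is the number of links among its neighbours. The following sketch computes it for a hypothetical four-node network; assigning Ci = 0 to nodes with fewer than two neighbours is a convention assumed here, not something stated in the text.

```python
from itertools import combinations

# Hypothetical toy network as an adjacency map {node: set of neighbours}.
ADJ = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

def clustering_coefficient(adj, node):
    """Ci = (links among the neighbours of i) / (possible links among them)."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention: undefined cases are set to zero
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean Ci over all nodes, a simple network-wide modularity indicator."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)

for name in sorted(ADJ):
    print(name, round(clustering_coefficient(ADJ, name), 3))
print("average:", round(average_clustering(ADJ), 3))
```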
In addition to modules, small and recurring sub-graphs with well-defined topologies, known as interaction motifs, can be identified within a network (Fig. 2). Frequency analysis of these interaction motifs revealed that they are over-represented when compared to a randomized version of the same network, suggesting that not all sub-graphs are equally significant and that interaction motifs form functionally separable building blocks of cellular networks (Mangan and Alon 2003, Wuchty et al., 2003, Alon 2007). For example, triangle motifs, also called feed-forward loops in directed networks, appear in both transcription-regulatory and neural networks. Likewise, there is evidence suggesting that motifs of a specific type aggregate to form large motif clusters, which also appear to be commonly associated with certain functional roles (Milo et al., 2002, Shen-Orr et al., 2002, Wuchty et al., 2003). For example, in the E. coli transcription regulatory network most motifs overlap, so that the individual motifs are no longer clearly separable (Shen-Orr et al., 2002).
Fig. 2. Some types of interaction motifs found in biological networks.
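As an illustration of motif detection, the sketch below counts feed-forward loops (X→Y, Y→Z, and X→Z) in a small directed network. The edge list is hypothetical, and the comparison against randomized networks that is needed to establish over-representation is deliberately omitted.

```python
# Hypothetical directed regulatory network: (regulator, target) pairs.
DIRECTED_EDGES = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]

def count_feed_forward_loops(edges):
    """Count ordered triples (x, y, z) with edges x->y, y->z and x->z."""
    successors = {}
    for source, target in edges:
        successors.setdefault(source, set()).add(target)
        successors.setdefault(target, set())
    count = 0
    for x, x_targets in successors.items():
        for y in x_targets:
            if y == x:
                continue  # ignore self-regulation
            for z in successors[y]:
                if z != x and z != y and z in x_targets:
                    count += 1
    return count

n_ffl = count_feed_forward_loops(DIRECTED_EDGES)
print("feed-forward loops:", n_ffl)
```

To judge significance, the same count would be repeated on many degree-preserving randomizations of the network and the observed value compared with that null distribution.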
The relevance of a node in mediating the flow of communication among other nodes in the network is quantified by its betweenness centrality, which is defined as the total number of non-redundant shortest paths going through a certain node or edge (Freeman 1977). Girvan and Newman (2002) proposed that the edges with high betweenness are the ones that lie "between" network clusters; therefore, the information flow within a network can be altered by removing these edges. Using an edge-betweenness-based method, Dunn et al. (2005) showed that the proteins within clusters in PINs tend to share similar functions. Moreover, Yu et al. (2007) reconsidered the classical meaning of betweenness as a measure of the centrality of the nodes in a PIN. They defined the nodes with the highest betweenness centrality as "bottlenecks" and found that bottleneck nodes have a higher probability of being essential.
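Node betweenness can be computed efficiently with Brandes' algorithm, which is sketched below for an unweighted, undirected network. Note the assumption made here: when several shortest paths connect a pair of nodes, each path contributes a fractional count, which is the usual formulation and may differ in detail from the counting used in the studies cited above. The toy network, two triangles joined through a single node "M", is hypothetical.

```python
from collections import deque

# Hypothetical network: two triangles (A-B-M and C-D-M) sharing node "M".
ADJ = {
    "A": {"B", "M"},
    "B": {"A", "M"},
    "M": {"A", "B", "C", "D"},
    "C": {"M", "D"},
    "D": {"M", "C"},
}

def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    centrality = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        predecessors = {v: [] for v in adj}
        n_paths = {v: 0 for v in adj}
        distance = {v: -1 for v in adj}
        n_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Accumulate path dependencies, starting from the farthest nodes.
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each unordered pair was visited from both endpoints, so halve the totals.
    return {v: c / 2.0 for v, c in centrality.items()}

bc = betweenness_centrality(ADJ)
bottleneck = max(bc, key=bc.get)
print(bc)
print("bottleneck candidate:", bottleneck)
```

In this toy case every shortest path between the two triangles passes through "M", so it is the only node with non-zero betweenness.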
It is worth noting that the topological parameters can be combined with one another or with additional functional annotations of the network nodes (genes or proteins). Thus, a network provides testable predictions ranging from single interactions to essential genes and functional modules (del Rio et al., 2009). Likewise, the functions of unannotated genes or proteins can be predicted on the basis of the annotation of their interacting partners. This approach to predicting protein or gene function is known as "guilt by association". Additionally, the integration of information related to diseases or specific phenotypes with network approaches enhances the understanding of human diseases, pharmacological response, and phenotype prediction (Ideker and Sharan 2008, Lee et al., 2008a, Lee et al., 2010, Wang and Marcotte 2010, Lee et al., 2011).
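A minimal sketch of function prediction from interacting partners is a majority vote over the annotations of a protein's direct neighbours. The protein names, annotation labels, and the simple voting rule are illustrative assumptions; published methods typically weight neighbours by interaction confidence.

```python
from collections import Counter

# Hypothetical network and annotations (placeholders, not curated data).
ADJ = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1"},
    "P3": {"P1"},
    "P4": {"P1"},
    "P5": set(),
}
ANNOTATIONS = {"P2": "DNA repair", "P3": "DNA repair", "P4": "transport"}

def predict_function(adj, annotations, protein):
    """Return the most common annotation among annotated neighbours, or None."""
    votes = Counter(
        annotations[n] for n in sorted(adj.get(protein, ())) if n in annotations
    )
    if not votes:
        return None  # no annotated partner, so no prediction is possible
    return votes.most_common(1)[0][0]

prediction = predict_function(ADJ, ANNOTATIONS, "P1")
print("predicted function of P1:", prediction)
```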

Experimental methods
In the postgenomic era, the accumulation of protein-protein interaction data has enabled systems biology studies at the PIN level (von Mering et al., 2002). However, PIN analysis requires methods amenable to high-throughput (HT) screening, such as large-scale versions of yeast two-hybrid (Y2H) and tandem affinity purification coupled to mass spectrometry (TAP-MS), for performing systematic screens (Ito et al., 2001a, Cusick et al., 2005). In addition, there is a wide variety of methods to detect, analyze, and quantify protein interactions, including surface plasmon resonance spectroscopy, nuclear magnetic resonance (NMR), X-ray crystallography, and fluorescence-based technologies. These techniques provide detailed information on the physical properties of protein interactions.
These methods are extremely useful; here, however, we highlight the techniques that can be applied to determine protein-protein interactions on a large scale. In particular, the outcomes of the Y2H system and TAP-MS are used further to perform in silico global network analysis. Both techniques were intensively applied to map the PIN of yeast, the first model organism with available PINs (Uetz et al., 2000, Ito et al., 2001b, Gavin et al., 2002, Ho et al., 2002, Ito et al., 2002, Tong et al., 2004, Yu et al., 2008). Afterwards, large-scale efforts were made to determine PINs for other minor eukaryotic model organisms: D. melanogaster (Giot et al., 2003) and C. elegans (Li et al., 2004); for pathogenic microorganisms: Helicobacter pylori, Campylobacter jejuni, Treponema pallidum, M. tuberculosis (Wang et al., 2010), herpes simplex virus 1 (Lee et al., 2008b), and Kaposi's sarcoma-associated herpesvirus (Uetz et al., 2006, Rozen et al., 2008); and for major eukaryotic organisms: Arabidopsis thaliana (de Folter et al., 2005) and humans (Rual et al., 2005, Stelzl et al., 2005, Gandhi et al., 2006). Even though these PINs are not complete, they provide insight into how particular properties of proteins are integrated at the systems level, and they also serve as a useful resource for predicting the functional role of genes or proteins.

Yeast two-hybrid (Y2H) system
The Y2H system has considerably accelerated the in vivo large-scale screening of protein interactions, enabling the detection of physically interacting proteins by exploiting the modular organization of eukaryotic transcriptional activators. Eukaryotic transcription activators are formed by at least two distinct domains, one responsible for binding to a promoter DNA region (BD) and the other for activating transcription (AD). It is well known that splitting the BD and AD domains inactivates transcription, but that transcription can be restored if a BD domain is re-associated with an AD domain (Fields and Song 1989). Thus, the standard Y2H system includes a BD domain fused to the "bait" protein-coding region and an AD domain fused to the "prey" protein-coding region. When the BD-bait and AD-prey fusions are co-expressed in the nucleus of yeast cells, the bait-prey interaction reconstitutes a functional transcription factor that activates the transcription of a reporter gene (Fig. 3). The most widely used Y2H system is based on GAL4/LexA, where the GAL4 protein controls the expression of the LacZ gene encoding beta-galactosidase.
The main advantages of the Y2H system are: i) the DNA (not the protein) is manipulated to study both bait and prey proteins (Walhout and Vidal 2001a); ii) it identifies protein interactions in vivo; iii) it can identify transient protein interactions; and iv) it is amenable to high-throughput screening (Buckholz et al., 1999, Uetz and Hughes 2000, Walhout and Vidal 2001b, Ito et al., 2002, Rual et al., 2005).
The drawbacks include: i) a high proportion of false positives and false negatives (Vidal and Legrain 1999, Ito et al., 2002); ii) the forced sub-cellular localization of bait and prey in the yeast nucleus, which might preclude certain interactions from taking place (Cusick et al., 2005); for example, membrane protein interactions cannot be identified by the standard Y2H system because the AD-prey fusion is retained at the membrane, which prevents the reconstitution of a functional transcription factor (Xia et al., 2006); iii) the over-expression of the tested proteins, which modifies the relative concentrations of potential interaction partners in comparison to the in vivo state; iv) the presence of auto-activators, i.e., proteins that initiate transcription by themselves (Cusick et al., 2005); and v) the differences in post-translational modifications and protein folding processes between yeast and other organisms (Shoemaker and Panchenko 2007). Given these limitations, several modifications have been made to improve the quality of Y2H results, including the development of membrane Y2H, the inclusion of different promoters for the reporter genes, the use of low-copy vectors, and the reduction of auto-activators. Once these drawbacks are reduced, the quality of Y2H results improves significantly (Lehner et al., 2004, Li et al., 2004, Rual et al., 2005, Yu et al., 2008).
Fig. 3. The Y2H system. Y2H detects interactions between proteins X and Y, where X is linked to the BD domain, which binds to the promoter DNA region.

Tandem affinity purification-tag coupled to mass spectrometry (TAP-MS)
The TAP-MS method is a powerful approach to determine the composition of relevant protein complexes. In this method, a target protein-coding region is fused with a DNA sequence encoding an affinity tag; the tagged protein is expressed together with the other cellular proteins, followed by a two-step affinity purification (AP) and elucidation of the complex components by mass spectrometry (MS). A typical TAP tag is formed by an immunoglobulin-interacting domain of protein A (protA) and a calmodulin-binding peptide (CBP) (Fig. 4). The protA and CBP domains are separated by a short recognition sequence for the site-specific tobacco etch virus protease (TEV protease). The TEV site allows proteolytic elution of the protein complex from IgG-sepharose after the first affinity-purification step, which is based on the protA/IgG-sepharose interaction. The eluted protein complex is further purified by binding to a calmodulin affinity resin, eluted with EGTA, and processed for identification by MS analysis.
Like the Y2H system, the TAP-MS method shows a high rate of false positives and false negatives, and it misses many transient interactions. In contrast to the Y2H system, the TAP-MS method can elucidate higher-order interactions beyond binary interactions and, therefore, provides direct information on protein complexes. Several large-scale studies of protein complexes have been performed using the TAP-MS method (Gavin et al., 2002, Ho et al., 2002, Gavin et al., 2006). For example, Gavin et al. (2006) used 5,500 ORFs fused to DNA sequences encoding an affinity tag to analyze the PIN of S. cerevisiae. They found 491 complexes, of which 257 were novel, showing that the PIN of S. cerevisiae has a modular organization. In addition, Stingl et al. (2008) elucidated the urease interactome of H. pylori. They combined the tandem affinity purification protocol with in vivo cross-linking in order to capture transient interactions, which represents an improvement to the TAP-MS method.
Fig. 4. TAP-MS method. TAP purifies protein complexes and removes contaminant molecules, and MS identifies the complex components.
The use of orthogonal experimental approaches has demonstrated that Y2H and TAP-MS interaction data sets contain mostly highly reliable interactions. It has been suggested that integrating the data from the two approaches can also increase confidence in either data set, and this has provided support for deriving predictions from these approaches (Cusick et al., 2005). Moreover, Venkatesan et al. (2009) developed a framework to estimate various quality parameters associated with the methods currently used to identify PINs. The combination of these quality parameters (screening completeness, assay sensitivity, sampling sensitivity, and precision) has provided an estimate of the size of the human binary interactome and a path toward the completion of its mapping.
The technical and biological limitations (Cusick et al., 2005) of the aforementioned methods do not diminish their impact on PIN studies; instead, these methods mark a paradigm shift from the one-gene/one-function reductionist approach to a more systemic approach that can capture all potential interactions encoded in a genome or proteome.

Protein interaction databases
The huge amounts of protein interaction data produced by high-throughput experimental methods such as Y2H and TAP-MS, and analyzed by bioinformatics, have led to the formation of several research groups that have made important efforts to design and set up databases of carefully analyzed information, providing useful scientific knowledge about protein-protein interactions. Table 1 shows a summary of the most significant public databases of protein-protein interactions published to date. These databases contain interactions obtained by direct submission from experimentalists, by text mining, and from other data sources. There are also other online resources that integrate information from several of the databases listed in Table 1, or that provide tools to browse and visualize such data, for example APID (Prieto and De Las Rivas 2006, Hernandez-Toro et al., 2007) and PINA (Wu et al., 2009). The information deposited in these databases is verified using automated algorithms or manual curation, as in the DIP database (Deane et al., 2002). Altogether, protein interaction databases are an invaluable resource for projects that aim to analyze PINs of organisms ranging from viruses to humans.

Table 1. Public databases of protein-protein interactions (columns include number of interactions and website).

Computational methods to predict protein interactions networks (PINs)
In parallel with the experimental methods, several computational methods have been designed to predict protein-protein interactions. Initially, these methods were strictly limited to proteins whose three-dimensional structures had been determined (structure-based methods). The completion of genome sequences has provided large amounts of genomic information, making it possible to analyze the genomic context of a given gene. Thus, a number of computational methods and resources have been developed to predict protein interactions from genomic information (genomic context-based methods), even in cases where the three-dimensional structures are still unknown (Galperin and Koonin 2000, Huynen et al., 2000, Huynen and Snel 2000).
Hereinafter, we will describe computational methods and resources available for protein interaction prediction that exploit the genomic and biological contexts of proteins for complete genomes.

Gene neighborhood
The gene neighborhood method exploits the notion that genes whose products physically interact, or that are functionally associated with the same process or pathway, tend to be adjacent to each other in the genome (Fig. 5a) (Tamames et al., 1997, Overbeek et al., 1999, Bowers et al., 2004). For example, Dandekar et al. (1998) showed that a conserved neighborhood relationship can be used as a fingerprint, suggesting that the proteins encoded by these genes may physically interact. The most representative example of this phenomenon is found in bacterial operons, where genes that work together are generally transcribed as a unit. Furthermore, operons that encode co-regulated genes are usually conserved. The neighborhood relationship tends to be more relevant when it is conserved across different species (Tamames et al., 1997). Hence, the gene neighborhood method, like many comparative genomics approaches, becomes more robust when a larger number of genomes is used for the prediction. Since operons and gene neighborhoods are uncommon in eukaryotic species (Zorio et al., 1994, Blumenthal 1998, Liu and Han 2009, Fitzpatrick et al., 2010), this method is principally applicable to bacteria, where such genome properties are relevant.
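The idea of conserved gene neighborhood can be sketched by counting in how many genomes two genes are adjacent. In the example below each genome is a hypothetical ordered list of ortholog labels, and the threshold `min_genomes` is an arbitrary illustrative parameter; strand, intergenic distance, and chromosome circularity are ignored for brevity.

```python
from collections import Counter

# Hypothetical gene orders; labels stand for ortholog groups, not real genes.
GENOMES = {
    "genome1": ["geneA", "geneB", "geneC", "geneD"],
    "genome2": ["geneC", "geneA", "geneB", "geneE"],
    "genome3": ["geneB", "geneA", "geneD", "geneC"],
}

def conserved_neighbours(genomes, min_genomes=2):
    """Return gene pairs that are adjacent in at least `min_genomes` genomes."""
    counts = Counter()
    for gene_order in genomes.values():
        # A set ensures that each pair is counted at most once per genome.
        adjacent = {
            frozenset((left, right))
            for left, right in zip(gene_order, gene_order[1:])
            if left != right
        }
        counts.update(adjacent)
    return {
        tuple(sorted(pair)): n for pair, n in counts.items() if n >= min_genomes
    }

candidates = conserved_neighbours(GENOMES)
print(candidates)
```

Pairs that remain adjacent across more, and more distantly related, genomes would be considered stronger candidates for functional association.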

Phylogenetic profiles
The phylogenetic profile method is based on the co-occurrence of pairs of genes across multiple genomes (Fig. 5b). When a pair of orthologous genes remains together across many distant species, this is taken to reflect concerted evolution and to indicate that the genes need to be simultaneously present because they participate in the same biological process or pathway, or because their products physically interact. A phylogenetic profile is commonly represented as a vector of the presence or absence of a gene across multiple genomes (Fig. ), where "1" or "0" at each position of the profile denotes presence or absence, respectively (Ouzounis and Kyrpides 1996, Rivera et al., 1998, Pellegrini et al., 1999).
The main drawbacks of this method are: i) it can only be applied to complete genomes; ii) the robustness of the prediction depends on the number and distribution of the genomes used to build the profile, so that a pair of genes with similar profiles across many bacterial, archaeal, and eukaryotic genomes is much more likely to interact than a pair found to co-occur in a small number of closely related species; iii) it has a high computational cost, since many complete genomes must be compared; and iv) it can fail to detect homology between distant organisms.
As with other genomic context methods, the accuracy of these predictions is expected to improve over time as the number of completely sequenced genomes increases.
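A phylogenetic profile comparison can be sketched as follows. Each profile is a presence (1) or absence (0) vector over the same ordered set of genomes; the profiles, the Jaccard similarity measure, and the threshold are illustrative assumptions rather than the specific scoring used in the cited studies.

```python
from itertools import combinations

# Hypothetical profiles over six genomes (1 = gene present, 0 = absent).
PROFILES = {
    "geneX": (1, 0, 1, 1, 0, 1),
    "geneY": (1, 0, 1, 1, 0, 1),
    "geneZ": (0, 1, 1, 0, 1, 0),
}

def profile_similarity(p, q):
    """Jaccard index: genomes containing both genes / genomes containing either."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0

def co_occurring_pairs(profiles, min_similarity=0.8):
    """Return (gene1, gene2, similarity) for pairs with similar profiles."""
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        score = profile_similarity(profiles[g1], profiles[g2])
        if score >= min_similarity:
            pairs.append((g1, g2, score))
    return pairs

linked = co_occurring_pairs(PROFILES)
print(linked)
```

With only six genomes an identical profile carries little weight; as noted above, the prediction becomes more robust as the number and diversity of genomes grows.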

Gene fusion
The gene fusion method is based on the observation that some interacting proteins or protein domains have homologs in other genomes that are fused into a single protein chain, termed a Rosetta Stone protein (Fig. 5c). Thus, gene fusion events have been proposed as a means of identifying potential protein-protein interactions and metabolic or regulatory networks (Sali 1999, Galperin and Koonin 2000). The information about gene fusion events can be combined with phylogenomic profiling and the identification of conserved chromosomal localization to test hypotheses leading to the characterization of proteins of unknown function (Marcotte et al., 1999a, Marcotte 2000, Enright and Ouzounis 2001). Marcotte et al. (1999) found 6,809 potentially interacting pairs of non-homologous proteins in E. coli and showed that, for more than half of the pairs, both members were functionally associated. Other approaches with similar results have been used, including in eukaryotic genomes (Enright and Ouzounis 2001).
The drawbacks of this method are related to the domain complexity of eukaryotic proteins, the presence of promiscuous domains, and large degrees of paralogy (Enright et al., 2002).
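The logic of the gene fusion method can be sketched under a strong simplification: every protein is assumed to be already assigned to a homology family, so that a fused reference protein is simply one that contains two different families. The protein names and family labels are hypothetical, and real implementations rely on sequence alignment rather than precomputed labels.

```python
from itertools import combinations

# Hypothetical query proteome: protein -> homology family.
QUERY = {"q1": "famA", "q2": "famB", "q3": "famC"}

# Hypothetical reference proteome: protein -> families found along its chain.
REFERENCE = {"fusedAB": ["famA", "famB"], "soloC": ["famC"]}

def fusion_linked_pairs(query, reference):
    """Pair query proteins whose families co-occur in one reference protein."""
    links = {}
    for ref_name, families in reference.items():
        family_set = set(families)
        hits = sorted(p for p, fam in query.items() if fam in family_set)
        for a, b in combinations(hits, 2):
            # Require different families so the pair is non-homologous.
            if query[a] != query[b]:
                links.setdefault((a, b), []).append(ref_name)
    return links

fusion_links = fusion_linked_pairs(QUERY, REFERENCE)
print(fusion_links)
```

Promiscuous domains would generate many spurious pairs in a scheme like this one, which is why the drawbacks listed above matter in practice.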
Currently, there are excellent resources implementing the genomic context-based methods. The most notable are the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) and ProLinks. The STRING (URL: http://string-db.org) and ProLinks (URL: http://prl.mbi.ucla.edu) resources provide a web interface giving comprehensive access to gene context information in 1,100 and 900 complete genomes, respectively (Szklarczyk et al., 2011, Bowers et al., 2004).

Interologs
The use of homology relationships is a key paradigm in molecular biology and genomics. This approach has been extensively exploited to predict protein structure (Abagyan and Batalov 1997, Brenner et al., 1998, Rost 1999), to study sub-cellular localization (Nair and Rost 2002) and enzymatic activity (Devos and Valencia 2001, Todd et al., 2001), and for comparative genomics (Marcotte et al., 1999b, Pellegrini et al., 1999). In this context, an interolog is defined as a conserved interaction between a pair of proteins of a given organism that have interacting homologs in another organism (Yu et al., 2004b). For example, the experimental observation that two yeast proteins interact is extrapolated to predict that the two corresponding homologs in human also interact in a similar way. Walhout and Vidal (2001b) used yeast experimental interaction data (Uetz et al., 2000, Ito et al., 2001b) to infer similar interactions in worm (Fig. 6). Mika and Rost (2006) suggested that the extrapolation of interactions between distant organisms has to be undertaken with some caution. They found that homology transfer is only accurate at high levels of sequence identity, and that it is more reliable between protein pairs from the same species than between protein pairs from different organisms. Likewise, Wiles et al. (2010) developed a scoring scheme to assess the confidence of interolog predictions. They predicted protein interactions across five species (human, mouse, fly, worm, and yeast) based on available experimental evidence and conservation across species, and developed the Interolog Finder (URL: http://www.interologfinder.org) to provide access to these data.
Fig. 6. The interolog method. A and B are interacting proteins in worm, and A' and B' are the human homologs of A and B. A' and B' are therefore predicted to interact in human in a similar way.
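The interolog transfer described above can be sketched as a mapping of known interactions through a homolog table. The protein identifiers, the sequence-identity values, and the cutoff `min_identity` are hypothetical; the cutoff only illustrates the caution that transfer is accurate at high sequence identity and does not reproduce any published threshold.

```python
# Hypothetical experimental interactions in the source organism.
SOURCE_INTERACTIONS = [("yA", "yB"), ("yB", "yC")]

# Hypothetical homolog table: source protein -> [(target protein, identity)].
HOMOLOGS = {
    "yA": [("hA", 0.85)],
    "yB": [("hB1", 0.80), ("hB2", 0.35)],
    # "yC" has no detectable homolog in the target organism.
}

def transfer_interologs(interactions, homologs, min_identity=0.7):
    """Predict target interactions whose two homologs both pass the cutoff."""
    predicted = set()
    for a, b in interactions:
        for a_target, a_identity in homologs.get(a, ()):
            for b_target, b_identity in homologs.get(b, ()):
                if a_target == b_target:
                    continue  # self-interactions are skipped in this sketch
                if min(a_identity, b_identity) >= min_identity:
                    predicted.add(tuple(sorted((a_target, b_target))))
    return predicted

interologs = transfer_interologs(SOURCE_INTERACTIONS, HOMOLOGS)
print(sorted(interologs))
```

Lowering `min_identity` increases coverage at the cost of transferring interactions through weakly similar homologs, which is the trade-off a confidence score for interologs is meant to capture.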

Integrative approaches
Currently, high-confidence PIN data sets are limited; however, they still provide a framework onto which other types of biological information can be integrated. Thus, new approaches that integrate several types of data, including protein-protein interactions, text mining, homology-based, and functional genomics approaches (Lee et al., 2004, Chua et al., 2007, Lee et al., 2008a, Pena-Castillo et al., 2008, Linghu et al., 2009, Lee et al., 2010, Wu et al., 2010, Lee et al., 2011, Szklarczyk et al., 2011), have proven to be the most effective way to assign function to uncharacterized proteins that are components of the network (Fig. 7).
Fig. 7. General scheme for integrative approaches. N1, N2, N3 and N4 are networks representing four data sources. Each node is a protein, while each edge is a binary relationship. The edges are converted to a common weight that is consistent across the different data sources. N1, N2, N3 and N4 are then combined and re-scored to form the final high-confidence network N'.
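The combine-and-re-score step of Fig. 7 can be sketched as follows, assuming that each source network already provides edge weights on a common 0 to 1 confidence scale. The combination rule used here, S = 1 − Π(1 − s_i), treats the sources as independent lines of evidence; it is one simple choice made for illustration and is not claimed to be the exact scheme of any resource cited in this chapter. Network contents and the threshold are hypothetical.

```python
# Hypothetical source networks: {(protein, protein): confidence in [0, 1]}.
N1 = {("P1", "P2"): 0.5, ("P2", "P3"): 0.3}
N2 = {("P2", "P1"): 0.6}
N3 = {("P1", "P2"): 0.2, ("P3", "P4"): 0.4}

def combine_scores(scores):
    """Combine independent confidences: 1 - product of (1 - s_i)."""
    remaining_doubt = 1.0
    for score in scores:
        remaining_doubt *= 1.0 - score
    return 1.0 - remaining_doubt

def integrate(networks, threshold=0.7):
    """Merge the networks and keep edges whose combined score passes the threshold."""
    evidence = {}
    for network in networks:
        for (a, b), score in network.items():
            edge = tuple(sorted((a, b)))  # undirected: (P2, P1) equals (P1, P2)
            evidence.setdefault(edge, []).append(score)
    combined = {edge: combine_scores(s) for edge, s in evidence.items()}
    return {edge: s for edge, s in combined.items() if s >= threshold}

high_confidence = integrate([N1, N2, N3])
print(high_confidence)
```

Here the edge P1-P2 is supported weakly by three sources, yet its combined score (about 0.84) exceeds that of any single source, which illustrates how integration can raise both confidence and coverage.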
The most representative example of these approaches is STRING, which integrates experimental as well as predicted interaction information, mostly from the methods mentioned above. STRING provides easy access to this integrated information (URL: http://string-db.org). Moreover, for each protein-protein interaction it provides a confidence score and supplementary information such as protein domains and 3D structures, all within a stable and consistent identifier space. Version 9.0 of STRING includes information on more than 1,100 completely sequenced organisms, ranging from bacteria and archaea to humans; its interaction prediction algorithms are executed periodically, so that the data are updated as new genome sequence information becomes available (Szklarczyk et al., 2011).
Similarly, several groups have integrated multiple networks to predict protein functions, interactions, and functional modules, including data from multiple sources that range from co-expression patterns and sequence similarity to genomic context-based methods (Kemmeren et al., 2002, Jansen et al., 2003, Lee et al., 2004, Lu et al., 2005, Chua et al., 2007, Lee et al., 2008a, Pena-Castillo et al., 2008, Linghu et al., 2009, Lee et al., 2010, Wu et al., 2010, Lee et al., 2011). For example, Marcotte's group has shown the predictive power of an integrated functional network for C. elegans (Lee et al., 2008a). First, they computationally built an integrated functional network covering approximately 82% of C. elegans genes. Second, they used this network to predict the effects of perturbing individual genes on the organism's phenotype, identifying genes causing specific phenotypes ranging from cell cycle defects in single embryonic cells to life-span alterations, neuronal defects, and altered patterning of specific tissues. They selected a set of candidate genes and their interactions associated with a phenotype and used RNAi to test whether targeting these candidate genes suppressed that phenotype. They found that 20% of such interactions suppressed the studied phenotype, whereas in a large-scale RNAi screen alone the inactivation of only 0.9% of genes produced this effect. Therefore, predictions arising from interactions of the integrated network are 21-fold better than those expected by chance. They suggested a network-guided scheme to accelerate research by using screening methods to identify genes and interactions for pathways of interest in human diseases.
The main limitation of integrative approaches is related to the availability of functional association data for genes/proteins. For example, these methods cannot make extensive predictions if no associations are available, as in the case of a novel genome with no sequence or domain homology to known sequences, poorly studied genomes, or a lack of functional genomics studies.

Drug targets prioritization
Despite the advent of high-throughput techniques sparked by the genomics revolution, the discovery and development of new drugs for neglected-disease pathogens has lagged in recent years owing to serious problems such as high cost, poor compliance, low efficacy, poor safety, and the evolution of antibiotic resistance, among others (Schmid 1998).
Target identification is the first step in the drug discovery process, and this task can provide the foundation for years of dedicated research in the pharmaceutical industry (Read et al., 2001). Compared with all the other steps in drug discovery, this stage is complicated by the fact that the identified drug target must satisfy a variety of criteria to permit progression to the next step. For example, the target must be selectively present in the pathogen, i.e., target-coding genes that are conserved across different pathogens and have no human homologs represent attractive candidates for new broad-spectrum drugs (Schmid 2006); it must be relevant to the pathogenesis process (Galperin and Koonin 1999, Sakharkar et al., 2004); it must be essential to the pathogen's growth and survival (Koonin et al., 1998, Thanassi et al., 2002, Galperin and Koonin 2004); and it should be suitable for expression and assays, with structures or models available to initiate rational drug design (Aguero et al., 2008). Hence, the integrated use of the above-mentioned criteria is considered the basic scheme in drug target prioritization approaches. The values of these criteria can be obtained by querying publicly available bioinformatics resources and databases. Examples include metabolic pathway databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Ogata et al., 1999, Kanehisa and Goto 2000), protein classification sets such as Clusters of Orthologous Groups (COGs), Gene Ontology (GO), and resources to evaluate the "druggability" of proteins (Hopkins and Groom 2002, Russ and Lampel 2005, Hambly et al., 2006), such as the "Structure-based DrugEBIlity" online service at EBI (URL: https://www.ebi.ac.uk/chembl/drugebility/structure).
For drug targets of neglected-disease pathogens, the TDR Targets Database (URL: http://tdrtargets.org) is an extensive resource for neglected tropical diseases (Aguero et al., 2008). This database includes extensive genetic, biochemical, and pharmacological data related to tropical disease pathogens, as well as computationally predicted druggability for potential targets. The database contains data on the tuberculosis pathogen M. tuberculosis; the leprosy pathogen M. leprae; the malaria parasites Plasmodium falciparum and P. vivax; the toxoplasmosis parasite Toxoplasma gondii; the trematode Schistosoma mansoni; the filariasis helminth Brugia malayi and its intracellular symbiont bacterium Wolbachia; and the kinetoplastid parasites Leishmania major, Trypanosoma brucei, and T. cruzi, which are responsible for kala-azar and other forms of leishmaniasis, sleeping sickness, and Chagas disease, respectively.
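The basic prioritization scheme outlined above (selectivity, relevance to pathogenesis, essentiality, druggability, and structural information) can be expressed as a simple filter-and-rank procedure. The Python sketch below is only an illustration under assumptions: the `Candidate` fields, the weights, and the toy proteins are hypothetical, and it does not reproduce the scoring used by the TDR Targets Database or any other resource. In practice the field values would come from queries to the databases named in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    has_human_homolog: bool     # selectivity criterion
    essential: bool             # needed for pathogen growth and survival
    pathogenesis_related: bool  # relevant to the pathogenesis process
    druggability: float         # predicted druggability in [0, 1]
    has_structure: bool         # structure or model available

# Hypothetical weights, chosen only to make the example concrete.
WEIGHTS = {"essential": 2.0, "pathogenesis": 1.0,
           "druggability": 1.0, "structure": 0.5}

def score(candidate):
    """Weighted sum of the criteria a candidate satisfies."""
    return (WEIGHTS["essential"] * candidate.essential
            + WEIGHTS["pathogenesis"] * candidate.pathogenesis_related
            + WEIGHTS["druggability"] * candidate.druggability
            + WEIGHTS["structure"] * candidate.has_structure)

def prioritize(candidates):
    """Drop candidates with human homologs, then rank the rest by score."""
    selective = [c for c in candidates if not c.has_human_homolog]
    return sorted(selective, key=score, reverse=True)

# Toy candidates, invented for illustration only.
candidates = [
    Candidate("T1", False, True, True, 0.8, True),
    Candidate("T2", True, True, True, 0.9, True),    # excluded: human homolog
    Candidate("T3", False, False, True, 0.9, False),
    Candidate("T4", False, True, False, 0.2, False),
]

ranked = prioritize(candidates)
ranked_names = [c.name for c in ranked]
```

Treating the absence of a human homolog as a hard filter and the remaining criteria as weighted terms is one design choice among several; a purely weighted scheme would instead keep T2 in the list with a penalty.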

PINs, drug targets, and neglected-disease pathogens
Network analysis is a broadly applicable tool for the drug discovery and development process. Any type of association data linking a gene to another gene, a protein, or a compound can be modeled, visualized, and analyzed as a network (Lee et al., 2004, Chua et al., 2007, Lee et al., 2008a, Linghu et al., 2009, Lee et al., 2010, McGary et al., 2010, Wu et al., 2010, Lee et al., 2011). Hence, data from pre-clinical and clinical trial studies can be included in network analyses (Nikolsky et al., 2005). Thus, networks could represent the standard for data integration and analysis. Network analysis involving neglected-disease pathogens is a very young area of research. Moreover, despite the availability of experimentally determined PINs of model organisms such as S. cerevisiae, C. elegans, and D. melanogaster, and of some bacterial pathogens such as H. pylori, C. jejuni, and Treponema pallidum, the number of experimentally determined PINs of neglected-disease pathogens is limited. For example, LaCount et al. (2005) identified protein-protein interactions of P. falciparum through a high-throughput version of the yeast two-hybrid system (LaCount et al., 2005). They found 2,846 unique interactions in more than 32,000 P. falciparum protein fragments. To determine clusters of interacting proteins, they used computational methods such as analysis of network connectivity, gene co-expression, and enrichment of Gene Ontology terms. The network analysis identified two protein clusters, one related to chromatin modification, transcription, messenger RNA stability, and ubiquitination, and the other implicated in the invasion of host cells. They suggested that the information provided by this network may be relevant to understanding the basic biology of the parasite and to discovering new drug and vaccine targets. Wang et al. (2010) built a PIN of the M. tuberculosis H37Rv strain based on a high-throughput bacterial two-hybrid method. They found more than 8,000 novel interactions and performed a cross-species PIN comparison, showing 94 conserved sub-networks between M. tuberculosis and several prokaryotic PINs (Wang et al., 2010).
Additionally, despite the lack of data, several computational studies that aim to predict PINs of neglected-disease pathogens and prioritize drug targets have been performed. Florez et al. (2010) built an in silico PIN of L. major by combining information from the PSIMAP, PEIMAP, and iPfam databases and using the interologs method (Florez et al., 2010). They predicted 33,861 interactions for 1,366 proteins and analyzed the PIN by calculating topology parameters such as connectivity and betweenness centrality, detecting 142 potential and specific drug targets without human orthologs (Fig. 8). Pedamallu and Posfai (2010) [...] respectively, and found that the highly connected region contains 363 and 340 proteins in the B. malayi and C. elegans PINs. They suggest that core cellular functions of the two related organisms have similar complexity and that further analysis of these highly connected regions may provide clues about genes missing from a conserved pathway, or proteins missing from a complex.
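The two topology parameters mentioned above, connectivity (degree) and betweenness centrality, can be computed with the standard library alone. The Python sketch below uses Brandes' algorithm for unweighted, undirected graphs; it is offered as an illustration and is not the software used in the cited studies. The toy network and its node names are hypothetical.

```python
from collections import deque

def build_graph(edges):
    """Adjacency sets for an undirected, unweighted network."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def degree(graph):
    """Connectivity: number of interaction partners of each protein."""
    return {node: len(neighbors) for node, neighbors in graph.items()}

def betweenness(graph):
    """Unnormalized shortest-path betweenness (Brandes' algorithm)."""
    centrality = dict.fromkeys(graph, 0.0)
    for source in graph:
        order = []                                # nodes in visiting order
        predecessors = {node: [] for node in graph}
        n_paths = dict.fromkeys(graph, 0)         # shortest paths from source
        n_paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:                              # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        dependency = dict.fromkeys(graph, 0.0)
        for w in reversed(order):                 # farthest nodes first
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2.0 for node, value in centrality.items()}

# Toy PIN, invented for illustration: H is a hub, X bridges two regions.
edges = [("H", "a"), ("H", "b"), ("H", "c"), ("H", "X"),
         ("X", "Y"), ("Y", "d"), ("Y", "e")]
pin = build_graph(edges)
deg = degree(pin)
btw = betweenness(pin)
ranked_by_betweenness = sorted(btw, key=btw.get, reverse=True)
```

In this toy network, protein X has only two partners yet lies on many shortest paths, so it ranks high by betweenness while ranking low by degree. This is why the two measures are usually reported together when candidate targets are selected from a PIN.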
Similarly, computational studies have been developed to model host-pathogen PINs for neglected-disease pathogens. For example, Dyer et al. (2007) integrated public intra-species PIN datasets with protein-domain profiles to predict a human-P. falciparum PIN. They found 516 protein interactions between these two organisms and showed that Plasmodium proteins interacting with human proteins are co-expressed in DNA microarray datasets associated with developmental stages of the Plasmodium life cycle (Dyer et al., 2007). Dyer et al. (2008) analyzed the landscape of human proteins interacting with pathogens. They integrated human-pathogen PINs for 190 pathogen strains from seven public databases and found that both viral and bacterial pathogens tend to interact with proteins that have many interacting partners (hubs) and with those that are central to many paths (bottlenecks) in the human PIN (Dyer et al., 2008). Similar results were obtained by Navratil et al. (2011). They used a high-quality, manually curated and validated dataset of virus-host protein interactions to depict the "human infectome" (Navratil et al., 2011). Additionally, they showed, by using functional genomic RNAi data, that the high centrality of targeted proteins was correlated with their essentiality for the viral life cycle. They also performed a simulation of cellular network perturbations and showed a stealth attack of viruses on proteins bridging cellular functions, a property that could be essential in the molecular etiology of some human diseases (Fig. 9). Doolittle and Gomez (2011) predicted interactions between dengue virus (DENV) and its hosts, both human and the insect vector Aedes aegypti. They implemented a protocol based on structural similarity between DENV and host proteins, and they supported a subset of the predictions by mining the literature. After filtering based on shared Gene Ontology cellular component, they predicted over 2,000 interactions between DENV and humans, as well as 18 interactions between DENV and the A. aegypti vector (Doolittle and Gomez 2011). They suggested that these specific interactions between virus and host proteins are involved in interferon signaling, transcriptional regulation, stress, and the unfolded protein response.
Fig. 9. The human infectome by Navratil et al. (2011).
The most relevant outcome of such computational studies is the identification of human and pathogen proteins to target experimentally for developing new drugs. These studies also provide different roadmaps and emerging approaches for developing projects to model and analyze PINs of neglected-disease pathogens. For example, novel therapies for human diseases employ multi-target drugs (Borisy et al., 2003, Csermely et al., 2005) and compounds targeted to inhibit protein-protein interactions (Emerson et al., 2003, Klein and Vassilev 2004, Vassilev 2004, Vassilev et al., 2004).

Conclusions
Because of the development of massive analysis technologies in genomics and computational biology, we can outline a trend toward the interplay and integration of computational and experimental techniques. Thus, methods and resources that combine both approaches to identify protein interactions will be used as routine protocols in the future.
Even though the use of network biology approaches in drug discovery is in its initial stages, these approaches have already contributed to meaningful drug development decisions by accelerating hypothesis-driven biology, modeling specific physiologic problems in target validation or clinical physiology, and providing rapid characterization and interpretation of disease-relevant cell systems.
Despite the lack of experimental functional genomics and PIN data for neglected-disease pathogens, computational approaches represent a starting point and a complement to current high-throughput screening projects whose aim is to delineate the complete genomes of neglected-disease pathogens. Moreover, integrative computational approaches have proven to be a powerful guide for large-scale studies, improving and facilitating the rational identification of therapeutic targets.
It is clear that for those organisms whose genome has not yet been sequenced, it will be difficult to implement the aforementioned protocols. That is the case for some nematodes and trypanosomal parasites such as T. cruzi, S. mansoni, B. malayi, and O. volvulus, and for the soil-transmitted helminths (e.g., A. lumbricoides and T. trichiura). However, according to NCBI Entrez Genome (URL: http://www.ncbi.nlm.nih.gov/genomes/leuks.cgi; Sep 29, 2011), most of them are in the "assembly" stage. Once the genome of a neglected-disease pathogen is available, we can use the information from experimental PINs of model organisms such as C. elegans to model and predict the PINs of such pathogens, enabling the discovery of the hub and bottleneck proteins that modulate the infectious process and their prioritization as drug targets.
While the computational approaches analyzed here are by nature probabilistic, i.e., they offer the likelihood of association of a given pair of proteins, they clearly indicate the utility of inferring functionally relevant correlations from the available genomic databases for systematic drug target identification. The further improvement of computational approaches will help to increase the availability of systematically collected biological data and will provide an easy scheme for the integration of different types of data within network analysis, thus enhancing the role of such approaches in drug discovery.
Finally, comprehensive repositories of functional genomic data for neglected-disease pathogens will be created. Hence, as soon as large molecular datasets are processed with the help of network analysis, a growing set of predicted pathways and PINs will emerge and offer a new paradigm for rethinking how to revolutionize the drug discovery process.

Fig. 1. Three types of network models and their associated distributions: (a) random network, (b) scale-free network, and (c) hierarchical network.

Fig. 5. Genomic context-based methods. (a) Gene neighborhood plots for four organisms, showing a pair of genes (blue and magenta) which are in close proximity in all four organisms. (b) Example phylogenetic profiles of four proteins from three organisms. Proteins 1 and 4 have the same pattern of co-occurrence in all three organisms, and may physically interact based on this evidence. (c) A gene fusion event between two proteins (green and magenta) in two organisms. Proteins a and b from organism 1 are predicted to interact because they form part of a single protein in organism 2.
Fig. 8. Predicted PIN of Leishmania major by Florez et al. (2010). The red nodes represent predicted essential proteins without human orthologs.

Table 1. Most representative databases of protein-protein interactions. (E) high-throughput experimental data; (S) structural data; (C) manual curation; and (I) integrative resource. The number of interactions was updated on September 29, 2011.