Embryonic stem cells were first isolated from the mouse in 1981. Two landmark papers (Evans & Kaufman, 1981; Martin, 1981) described the isolation, from the blastocyst, of a cell line with four key features. It grew rapidly and could be maintained by passaging. It had a normal karyotype, unlike embryonal carcinoma cells. It could be induced to differentiate into a wide variety of cell types, either by injecting the cells into mice or by culturing them in vitro without feeder cells. The most important property of these embryonic stem cells is their ability to differentiate into most cell types, a property termed “pluripotency”. Understanding the molecular mechanisms of pluripotency will enable scientists to use stem cells more effectively, particularly in tissue repair and regeneration (Murry & Keller, 2008). One approach towards understanding these mechanisms is to study the interactions between the proteins involved.
This chapter will explain what a protein interaction network is and why it is used to study pluripotency. It will cover the methods used to build protein interaction networks and to validate the interactions they contain. The chapter will also present an integrated dataset that merges current knowledge into a single protein interaction network. Based on this integrated network, we will discuss which proteins constitute key factors in pluripotency, how these key factors are connected in the network and which protein machineries they recruit to set up the pluripotent state. Finally, the chapter will look at the future challenges in completing the protein interaction network and using it to manipulate pluripotency.
2. What is a protein interaction network?
A protein interaction network comprises proteins as nodes and protein-protein interactions as undirected links between the nodes. Drawing networks allows researchers to manage and interpret large datasets. The data are interpreted by borrowing concepts from other fields, such as graph theory, to describe network properties. Such interpretations can explain how the structure of the network serves its biological function. For example, graph theory provides several parameters that can be computed for a network. These parameters describe the architecture of the network so that it can be compared with other networks. This comparison can provide insights into the behaviour of the network, particularly when it is compared with a similar network that is already better understood. The most fundamental parameter is the number of links a node has. This is referred to as ‘the degree of a node’ and is denoted by “k”. To describe all the nodes in the network, the number of nodes with each degree can be plotted as a distribution curve (Figure 1). Depending on the shape of this curve, networks can be grouped into different classes (Barabasi & Oltvai, 2004). Random networks show a Poisson distribution (Figure 1A), whereas scale-free networks show a power-law distribution (Figure 1B). The embryonic stem cell protein interaction network belongs to the class of scale-free networks.
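To illustrate the degree k and the degree distribution, the following minimal Python sketch does two things. It counts the links of each node in an undirected network given as a list of interacting pairs. It then tallies how many nodes have each degree. The toy edge list and its node names are hypothetical placeholders, not data from any study.

```python
from collections import Counter

def degrees(edges):
    """Return the degree k of each node in an undirected network."""
    # Normalize each link to a sorted pair so A-B and B-A are counted once;
    # self-interactions are ignored.
    links = {tuple(sorted(pair)) for pair in edges if pair[0] != pair[1]}
    deg = Counter()
    for a, b in links:
        deg[a] += 1
        deg[b] += 1
    return deg

def degree_distribution(edges):
    """Map each degree k to the number of nodes that have degree k."""
    return dict(sorted(Counter(degrees(edges).values()).items()))

# Hypothetical toy network: one hub and four sparsely linked nodes.
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"),
             ("a", "b"), ("b", "hub")]  # the last pair repeats hub-b

print(degrees(toy_edges)["hub"])       # 4
print(degree_distribution(toy_edges))  # {1: 2, 2: 2, 4: 1}
```

On a log-log plot, a power law P(k) ∝ k^(-γ) appears roughly as a straight line. A Poisson distribution, by contrast, peaks near the average degree and falls off quickly on either side. Storing each link as a sorted pair means that a duplicate report of the same interaction, in either orientation, is counted only once.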
In scale-free networks, most nodes have very few links and only a few nodes have many links (Figure 1B). Systems that are approximately scale-free include many biological networks, such as the yeast proteome, prokaryotic and eukaryotic transcription networks and all metabolic networks. The internet is also approximately scale-free (Albert, 2005; Barabasi, 2009). Networks of this class are robust against the random failure of single components. Besides the degree distribution, other network properties include the average number of neighbours, the average (characteristic) path length, the network diameter and the clustering coefficient (Barabasi & Oltvai, 2004). Classifying networks by their degree distribution is one way in which graph theory can link networks to universal laws or organizational principles. Understanding these principles makes it easier to predict protein functions and to generate testable hypotheses. It also makes it easier to simulate manipulations of protein components and see whether they produce the desired outcome.
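The other properties listed above can be computed from the same kind of edge list. The plain-Python sketch below works on a hypothetical four-node network. It computes the average number of neighbours, the local and average clustering coefficient, and the characteristic path length and diameter via breadth-first search. It assumes an unweighted, undirected network. It averages path lengths only over pairs of connected nodes, which is one common convention among several.

```python
from collections import deque
from itertools import combinations

def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for a, b in edges:
        if a != b:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj

def average_neighbours(adj):
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def local_clustering(adj, node):
    """Fraction of pairs of the node's neighbours that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * linked / (k * (k - 1))

def average_clustering(adj):
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

def bfs_distances(adj, source):
    """Shortest path length (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_length_and_diameter(adj):
    """Characteristic path length and diameter over all connected node pairs."""
    lengths = [d for s in adj for t, d in bfs_distances(adj, s).items() if t != s]
    return sum(lengths) / len(lengths), max(lengths)

# Hypothetical network: a triangle a-b-c with d attached to c.
toy_adj = adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(average_neighbours(toy_adj))        # 2.0
print(local_clustering(toy_adj, "c"))     # one of three neighbour pairs is linked
print(path_length_and_diameter(toy_adj))  # mean 16/12, diameter 2
```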
3. Building the protein interaction network
3.1. Methods used to build the network
Different methods can be used to build a protein interaction network. The simplest is to build the network from protein interactions reported in the literature. A second method is to extrapolate interactions observed in other organisms onto the orthologs in the organism of interest. However, both methods are limited to known interactions. In addition, the second method may introduce false interactions through wrongly mapped orthologs or through interactions that are not conserved between species.
Discovering protein-protein interactions without a priori knowledge, and using them to construct a network, requires a high-throughput method. Two approaches have been tested for this: the yeast 2-hybrid approach and the affinity purification-mass spectrometry approach. In the yeast 2-hybrid approach, interactors are identified by sequencing the plasmids that encode them. In the affinity purification approach, they are identified by mass spectrometry. Both approaches have been used to construct a protein interaction network for the yeast proteome (Gavin, et al., 2006; Gavin, et al., 2002; Ho, et al., 2002; Ito, et al., 2001; Uetz, et al., 2000). For a pluripotency-associated protein interaction network, the aim is to include only the components of pluripotency. Hence this network should be a subnetwork of the proteome. Oct3/4, a key factor for pluripotency, is selected as the bait from which other proteins may be discovered. The network grows when proteins that interact with Oct3/4 are used as the next baits. By iterating this process, the protein interaction network for pluripotency can be completed. Care should be taken during iteration to avoid extending the network into the interactions of proteins that are not associated with pluripotency. Most of the datasets for the embryonic stem cell pluripotency-associated network were generated by affinity purification-mass spectrometry. The yeast 2-hybrid approach has been attempted with Oct3/4. However, it discovered significantly fewer interactors (Li, et al., 2008) than affinity purification-mass spectrometry. This suggests that Oct3/4 may recruit most of the subunits of macromolecular complexes through only a few direct interactors. Alternatively, it may need several partners at once to stabilize its contacts. The yeast 2-hybrid approach identifies only binary interactions, so it will not discover indirect or cooperative interactors of Oct3/4.
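The iterative bait strategy described above is essentially a breadth-first expansion from Oct3/4. The sketch below illustrates it under two assumptions. `pull_down` is a hypothetical stand-in for one affinity purification-mass spectrometry experiment that returns the preys found with a bait. `is_relevant` is a hypothetical filter representing whatever evidence is used to decide that a prey is pluripotency-associated and should become the next bait. That filter is what keeps the network from spreading into the rest of the proteome. Apart from Oct3/4, all protein names and results are placeholders.

```python
def expand_from_bait(seed, pull_down, is_relevant, max_rounds=3):
    """Grow an interaction network by using interactors as the next baits.

    pull_down(bait)  -> set of preys (hypothetical experiment stand-in)
    is_relevant(p)   -> True if prey p should itself be used as a bait
    Returns the set of undirected links and the set of baits used.
    """
    links, used_baits = set(), set()
    frontier = {seed}
    for _ in range(max_rounds):
        next_frontier = set()
        for bait in frontier:
            used_baits.add(bait)
            for prey in pull_down(bait):
                if prey == bait:
                    continue
                links.add(tuple(sorted((bait, prey))))
                if is_relevant(prey):
                    next_frontier.add(prey)
        # Never re-use a protein that has already served as a bait.
        frontier = next_frontier - used_baits
        if not frontier:
            break
    return links, used_baits

# Placeholder results: P1-P3 pass the relevance filter, Q1 does not.
fake_results = {"Oct3/4": {"P1", "P2"}, "P1": {"Oct3/4", "P3"}, "P2": {"Q1"}}
links, baits = expand_from_bait(
    "Oct3/4",
    pull_down=lambda bait: fake_results.get(bait, set()),
    is_relevant=lambda prey: prey in {"P1", "P2", "P3"},
)
print(sorted(baits))  # ['Oct3/4', 'P1', 'P2', 'P3']
print(len(links))     # 4
```

Q1 is recorded as an interactor of P2 but is never used as a bait, so the network does not grow beyond it. The `max_rounds` cap is an arbitrary safeguard added for the sketch, not part of the published strategy.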
3.2. Methods used to validate the network
A major concern with the use of affinity purification-mass spectrometry or yeast 2-hybrid approaches is the presence of false positives. In the yeast 2-hybrid system, biologically irrelevant interactions can occur between two proteins inside the yeast nucleus and give a false-positive signal. In affinity purification-mass spectrometry, false positives are caused by background proteins that are not completely removed during affinity purification. Experiments include mock purifications to identify background proteins. However, the mass spectrometer cannot capture and identify every peptide in a sample. As a result, sampling of the peptide population is not saturated. This makes estimates of relative protein abundance inaccurate. Separating signal from noise by comparing abundance in the actual and mock purifications therefore also becomes inaccurate.
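The comparison with mock purifications can be sketched as a simple enrichment score. The example below approximates protein abundance by spectral counts, the number of spectra assigned to each protein. It adds a pseudocount so that proteins never seen in the mock do not cause a division by zero. It also requires a minimum count in the bait purification, because low counts are the most affected by incomplete peptide sampling. The thresholds and protein names are arbitrary placeholders, and published pipelines use more elaborate statistical scoring.

```python
def enrichment(bait_counts, mock_counts, pseudocount=1.0):
    """Ratio of (bait + pseudocount) to (mock + pseudocount) spectral counts."""
    proteins = set(bait_counts) | set(mock_counts)
    return {
        p: (bait_counts.get(p, 0) + pseudocount) / (mock_counts.get(p, 0) + pseudocount)
        for p in proteins
    }

def candidate_interactors(bait_counts, mock_counts, min_ratio=3.0, min_count=5):
    """Preys enriched over the mock and detected often enough to trust."""
    ratios = enrichment(bait_counts, mock_counts)
    return sorted(
        p for p, r in ratios.items()
        if r >= min_ratio and bait_counts.get(p, 0) >= min_count
    )

# Placeholder spectral counts from one bait and one mock purification.
bait = {"P1": 20, "P2": 6, "P3": 2, "Background": 30}
mock = {"P2": 1, "Background": 25}
print(candidate_interactors(bait, mock))  # ['P1', 'P2']
```

P3 illustrates the sampling problem described above. It is absent from the mock, but with only two spectra in the bait purification, its apparent enrichment could simply reflect undersampling.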
In view of these shortcomings, validating the datasets becomes very important. The most direct method of validating a protein-protein interaction is reciprocal co-precipitation, which can be done by expressing the two proteins in a cell culture system. However, some interactions are indirect and occur via a third protein. If that protein is absent from the cell, co-precipitation analysis will give a negative result. Even after a direct association has been verified, it is important to examine the functional significance of the interaction, because not all physical interactions are functionally significant. For example, both Oct1 and Oct3/4 can interact with Sox2, but only the Oct3/4-Sox2 complex activates Fgf4 expression (Yuan, et al., 1995). Hence, multiple validations are necessary to verify that a protein has a role in pluripotency. Validations that have been employed include: (i) evidence for the presence of the interactor in embryonic stem cells; (ii) evidence that the interactor and the bait co-exist in a common subcellular location; (iii) indication that the abundance of the interactor changes upon differentiation; (iv) indication that the interactor regulates the genes of known embryonic stem cell transcription factors or vice versa; (v) gain or loss of pluripotency of embryonic stem cells when the gene of the interactor is knocked out, suppressed by RNA interference or overexpressed. For this last validation, several readouts of pluripotency can be used:
- alkaline phosphatase staining;
- embryonic stem cell morphology;
- Oct3/4 or Nanog transcript levels;
- profiling of lineage markers;
- expression of stage-specific embryonic antigens (SSEA) 1, 3 and 4.
Finally, validation can also be done by examining the phenotypes of mice in which the interactor gene is knocked out, suppressed by RNA interference or overexpressed.
Genetic or functional redundancy is a feature of pluripotency. Interactors whose single-gene knockout shows no effect could therefore be further evaluated by double or triple knockouts of related genes.
4. Analysis of current datasets
The earliest protein-protein interaction network in embryonic stem cells used Nanog as the first bait protein. However, the datasets of later work were mostly built upon Oct3/4 (Liang, et al., 2008; Pardo, et al., 2010; van den Berg, et al., 2010; Wang, et al., 2006). Other proteins used as baits are Sall4, Tcfcp2l1, Dax1, Esrrb, Rex1, Nac1 and Zfp281 (van den Berg, et al., 2010). These proteins were chosen because they had been found to interact with Oct3/4. To gain a more complete view of the pluripotency protein interaction network, we integrated the datasets from the four published embryonic stem cell protein interaction networks into one (Figure 2). The integrated network comprises 239 proteins, of which 131 are directly associated with Oct3/4. As expected, the distribution of the nodes according to their degree follows a power-law curve (Figure 2). In theory, this would suggest that pluripotency is mediated by a highly robust mechanism that is insensitive to the loss of many of its individual components.
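Integrating several published datasets amounts to taking the union of their interaction lists after harmonizing protein names, while recording which studies support each link. The minimal sketch below assumes each dataset is a list of interacting pairs. The study labels, placeholder proteins and alias table are hypothetical. Oct4 and Oct3/4 are used only to show why synonyms must be merged before nodes are counted.

```python
from collections import defaultdict

def integrate(datasets, aliases=None):
    """Merge interaction lists into one network.

    datasets: {study label: iterable of (protein_a, protein_b)}
    aliases:  optional {synonym: preferred name}
    Returns {sorted protein pair: set of supporting study labels}.
    """
    aliases = aliases or {}
    merged = defaultdict(set)
    for study, pairs in datasets.items():
        for a, b in pairs:
            a, b = aliases.get(a, a), aliases.get(b, b)
            if a != b:
                merged[tuple(sorted((a, b)))].add(study)
    return dict(merged)

def proteins(network):
    """All proteins that take part in at least one link."""
    return {p for link in network for p in link}

def direct_partners(network, protein):
    """Proteins linked directly to the given protein."""
    return {b if a == protein else a for a, b in network if protein in (a, b)}

datasets = {
    "study_1": [("Oct3/4", "P1"), ("Oct3/4", "P2")],
    "study_2": [("P1", "Oct4"), ("P2", "P3")],
}
network = integrate(datasets, aliases={"Oct4": "Oct3/4"})
print(len(proteins(network)))                      # 4
print(sorted(direct_partners(network, "Oct3/4")))  # ['P1', 'P2']
```

Counts of this kind correspond to the totals quoted above (239 proteins, 131 direct Oct3/4 partners). The exact procedure used to build the integrated network is not described here. Keeping the supporting studies for each link also shows which interactions were reported more than once.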
However, more work is required before such a conclusion can be fully accepted, because the protein interaction network is currently incomplete. At this stage, the network structure can be strongly skewed by the methods used to generate it (Futschik, et al., 2007). Essential proteins such as Oct3/4 tend to be more highly connected than nonessential proteins. This could be a true property, a consequence of their having been studied more thoroughly, or a combination of the two (Hakes, et al., 2008). As data accumulate, the ability of systems biology to catalogue and integrate data will yield more meaningful results (Pieroni, et al., 2008).
4.1. The key factors in pluripotency
From the literature, Oct3/4 is already known to be a key factor in pluripotency. In mice, loss of Oct3/4 results in embryos that fail to form a pluripotent inner cell mass (Nichols, et al., 1998). In these embryos, the inner cell mass cells take on a trophoblast fate and subsequently fail to proliferate. In adult cells, provision of Oct3/4 together with various cocktails of transcription factors induces pluripotency (Nakagawa, et al., 2008; Takahashi & Yamanaka, 2006). The other components of these cocktails can vary, but Oct3/4 is indispensable. The level of Oct3/4 is also important for the subsequent maintenance of pluripotency. Decreasing Oct3/4 to half its physiological level converts embryonic stem cells into trophectoderm. Increasing it by less than twofold converts them into primitive endoderm and mesoderm (Niwa, et al., 2000). Finally, as the embryo develops, the level of Oct3/4 decreases in cells that differentiate. In germ cells, where pluripotency is retained, Oct3/4 expression is maintained (Scholer, et al., 1990; Scholer, et al., 1989). Taken together, these observations are evidence for the role of Oct3/4 in both inducing and maintaining pluripotency. The key role of Oct3/4 is obvious and does not need a protein interaction network to reveal it. However, the emergence of other protein hubs (nodes with a high number of links) can suggest new key factors. After Oct3/4, the most highly connected proteins are Esrrb and Tcfcp2l1, with 82 and 87 links respectively. The importance of Esrrb in pluripotency is corroborated by the observation that this protein can help to induce pluripotency in fibroblasts (Feng, et al., 2009). There are no similar observations for Tcfcp2l1. Nevertheless, its hub position in the network suggests that this protein might be another important coordinator of pluripotency.
Recently, RNA interference has offered a means of functionally screening the genome. This approach is complementary to the protein interaction network for finding key factors of pluripotency. To find genes needed for the maintenance of pluripotency, individual genes are knocked down by RNA interference. Combining the datasets from several studies (Ivanova, et al., 2006; Zhang, et al., 2006), including two genome-wide screens (Ding, et al., 2009; Hu, et al., 2009), identified a total of 166 proteins. The protein interaction network had already identified Esrrb as a hub. Consistent with this, Esrrb was also one of the 166 proteins found to be important for the maintenance of pluripotency (Table 1). However, only 15 of the 166 genes, including Esrrb, are in the protein interaction network. This suggests that RNA interference has found other key components that protein-protein interaction studies have not yet discovered. Conversely, 224 proteins in the protein interaction network were not found by RNA interference. These proteins could be involved in the induction of pluripotency but not in its maintenance. Alternatively, RNA interference may have missed them because of functional redundancy, which is one mechanism underlying the robustness of the network.
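Comparing the RNA interference hits with the protein interaction network is simple set arithmetic. The sketch below uses placeholder identifiers chosen only to reproduce the set sizes quoted above: 239 network proteins, 166 RNA interference hits and 15 shared proteins. It confirms that 224 network proteins lack RNA interference support and that 151 hits are not yet in the network.

```python
def overlap_summary(network_proteins, rnai_hits):
    """Count proteins shared by, or unique to, the network and the screen."""
    network_proteins, rnai_hits = set(network_proteins), set(rnai_hits)
    return {
        "shared": len(network_proteins & rnai_hits),
        "network_only": len(network_proteins - rnai_hits),
        "rnai_only": len(rnai_hits - network_proteins),
    }

# Placeholder identifiers that only reproduce the set sizes in the text.
shared_ids = {f"S{i}" for i in range(15)}
network_ids = shared_ids | {f"N{i}" for i in range(224)}
rnai_ids = shared_ids | {f"R{i}" for i in range(151)}
assert len(network_ids) == 239 and len(rnai_ids) == 166

print(overlap_summary(network_ids, rnai_ids))
# {'shared': 15, 'network_only': 224, 'rnai_only': 151}
```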
For human embryonic stem cells, no protein interaction network based on yeast 2-hybrid or affinity purification-mass spectrometry approaches has been generated. However, determinants of human embryonic stem cell pluripotency have been identified by a genome-wide RNA interference screen (Chia, et al., 2010). The screen identified a total of 566 genes, and a protein interaction network based on these has been reported. Possible interactions between any of the 566 genes were mapped using the online database STRING. STRING stores known interactions and includes interactions transferred from orthologs. Of the 566 genes, 279 have some form of protein-protein interaction within the group; this network is shown in Figure 3.
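Mapping a list of screen hits onto an interaction database means keeping only the interactions whose two partners are both on the list (the induced subnetwork). The genes that retain at least one link are then counted. The sketch below assumes the database has already been exported as (gene_a, gene_b, score) tuples, loosely modelled on the confidence scores that STRING attaches to each interaction. The gene names, scores and cut-off are hypothetical.

```python
def induced_subnetwork(genes, interactions, min_score=0.0):
    """Interactions among the given genes, plus the genes that have any.

    interactions: iterable of (gene_a, gene_b, score) tuples (assumed format)
    """
    genes = set(genes)
    links = {
        tuple(sorted((a, b)))
        for a, b, score in interactions
        if a in genes and b in genes and a != b and score >= min_score
    }
    connected = {g for link in links for g in link}
    return links, connected

# Hypothetical screen hits and database records.
hits = {"G1", "G2", "G3", "G4", "G5"}
records = [("G1", "G2", 0.9), ("G2", "X9", 0.95),
           ("G3", "G4", 0.2), ("G4", "G1", 0.7)]
links, connected = induced_subnetwork(hits, records, min_score=0.4)
print(sorted(connected))  # ['G1', 'G2', 'G4']
```

The G2-X9 record is dropped because X9 is not a screen hit. G3 is excluded only because of the score cut-off, so the number of genes with at least one interaction depends on the confidence threshold chosen.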
The human network also shows a power-law distribution (Figure 3). The hubs in the network are POLR2E with 26 links, MYO6 with 19 links, and EP300 and CDC42, both with 18 links. Notably, human OCT4 is not one of the hubs. Again, this is most probably an artifact of the incomplete network, owing to the lack of publications on OCT4 interactions. Although human OCT4 did not emerge as a key factor, it is known to be important for the pluripotency of human embryonic stem cells. This emphasizes the need for more work on constructing the network before reliable deductions can be made.
4.2. How key factors network with one another
Proteins such as Oct3/4 and Esrrb are transcription factors, and they appear to be key factors in pluripotency. On the genome, these transcription factors cluster at embryonic stem cell-specific genes. This supports the notion that their collaborations form codes that ensure selective transcriptional activation (Chen, et al., 2008; Kim, et al., 2008). It remains to be confirmed whether this clustering requires direct protein-protein interactions or simply reflects binding at the same locations. Protein-protein interactions between these transcription factors could provide the structural changes needed to regulate gene expression for pluripotency. It has been suggested that collaborations involving more transcription factors activate embryonic stem cell-specific genes. Transcription factors with few interactions, by contrast, would activate more general genes.
In the integrated dataset, 78 proteins carry the GO annotation “transcription factor”. Figure 4 shows a protein interaction network of these transcription factors. Because the network is incomplete, some transcription factors important for pluripotency do not yet fall within the highly interactive zone. For example, Sox2 is important for regulating pluripotency genes but has few mapped collaborators, probably because no lab has yet investigated the Sox2 interactome. The current network therefore serves as a guide for future research.
4.3. How key factors recruit protein machineries
The nucleosome remodeling and histone deacetylase (NuRD) complex (Ahringer, 2000) was the most prominent complex identified in the embryonic stem cell protein interaction network (Liang, et al., 2008; Pardo, et al., 2010; van den Berg, et al., 2010). All the components of this complex are in the network. Each component interacts with one or more of the five transcription factors studied in greater detail by van den Berg, et al. (2010): Nanog, Esrrb, Oct3/4, Tcfcp2l1 and Sall4. These five factors are themselves tightly associated with one another (Figure 5).
This suggests that these transcription factors co-recruit the same machinery, NuRD. NuRD would then regulate pluripotency through histone deacetylation as a gene repression mechanism. Indeed, case studies have shown that NuRD has specific developmental roles rather than being required for general cellular functions (Ahringer, 2000; Ch'ng & Kenyon, 1999; Mannervik & Levine, 1999). Besides NuRD, Pardo et al. (Pardo, et al., 2010) reported other complexes, most of which are involved in chromatin remodeling. Confirming these findings would expand our knowledge of how much each of these complexes contributes to pluripotency. This matters because there is also contrary evidence. Chromatin remodeling factors such as the Polycomb Group proteins and the Polycomb Repressive Complex appear not to be required for the maintenance of pluripotency in embryonic stem cells (Azuara, et al., 2006; Boyer, et al., 2006; de Napoles, et al., 2004; Lee, et al., 2006; Montgomery, et al., 2005; Niwa, 2007; O'Carroll, et al., 2001). The chromatin of the embryonic stem cell is believed to be “loose”, allowing transcription factors free access. At the same time, repressors are present to prevent spontaneous differentiation. Placing the different chromatin modifiers in the protein interaction network may help to clarify their roles in pluripotency. Besides the chromatin modifiers, Esrrb was also found to recruit the basic transcriptional machinery to the protein interaction network (van den Berg, et al., 2010). However, the other transcription factors in the network do not appear to use this mechanism. It remains to be seen whether it is directly related to the regulation of pluripotency.
5. Future challenges
Ironically, pluripotency is best demonstrated by its loss. A population of cells is pluripotent if it can differentiate into many cell types, but once that happens, pluripotency is lost. In the embryonic stem cell, the molecules underlying pluripotency balance two opposing features: the readiness to differentiate and the prevention of differentiation. To understand the molecular mechanism of pluripotency, we need to keep this concept in mind, and simulations of pluripotency should reproduce these two opposing forces. The current protein interaction network does not distinguish between them. Furthermore, the multifunctionality of proteins must be considered, and assigning “jobs” to individual proteins may be more confusing than helpful. It may be more meaningful to assign processes to the edges of the network rather than to the nodes. For example, the interaction (edge) between Oct3/4 and Cdx2 serves the purpose of “gene repression”. The interaction (edge) between Oct3/4 and Sox2 serves the purpose of “gene activation”. Hence, instead of attaching both functions to the Oct3/4 node, the annotations can be placed on the edges.
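Annotating edges rather than nodes can be represented directly in a data structure keyed by unordered protein pairs. The sketch below encodes only the two examples given above: Oct3/4-Cdx2 for gene repression and Oct3/4-Sox2 for gene activation. The function names are illustrative. The roles of a node are derived from its edges instead of being stored on the node.

```python
# Process labels are stored on links; frozenset makes each pair unordered.
EDGE_ROLES = {
    frozenset({"Oct3/4", "Cdx2"}): "gene repression",
    frozenset({"Oct3/4", "Sox2"}): "gene activation",
}

def edge_role(a, b):
    """Process annotated on the a-b interaction, if any."""
    return EDGE_ROLES.get(frozenset({a, b}), "unannotated")

def node_roles(protein):
    """All processes a protein takes part in, derived from its edges."""
    return {role for pair, role in EDGE_ROLES.items() if protein in pair}

print(edge_role("Sox2", "Oct3/4"))   # gene activation
print(sorted(node_roles("Oct3/4")))  # ['gene activation', 'gene repression']
print(node_roles("Cdx2"))            # {'gene repression'}
```

Asking for the roles of Oct3/4 still returns both processes. However, each process remains tied to a specific partner, and that information is lost when both labels are attached to the Oct3/4 node.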
Because of the protocols employed, a protein interaction network is a single snapshot of the protein-protein interactions in a population of cells at a given time. To understand how embryonic stem cells are able to differentiate into different cell types, further information will have to be integrated. The final protein interaction network should include information on protein subcellular location and protein concentration. All of this will change over time as the cell progresses through the cell cycle and undergoes fate changes. One study examined system-level changes in mouse embryonic stem cells during fate change, across three mechanistic layers: epigenetic, transcriptional and translational. It showed that changes in nuclear protein levels are not accompanied by concordant changes in the corresponding mRNA levels (Lu, et al., 2009). This suggests that translational and post-translational mechanisms, rather than transcriptional regulation, play important roles during the loss of pluripotency. For full understanding and successful simulation, information from the protein interaction network, the gene regulatory network and the microRNA networks of ES cells should be fed back into one another.
In yeast, protein-protein interaction networks have been integrated with transcriptional profiling. This led to the discovery of new network features described as party hubs and date hubs (Vidal, et al., 2011). Party hubs are nodes that are co-expressed with their protein partners. Date hubs are nodes that are not always transcribed at the same time and place as their partners. Biologically, party and date hubs may represent two kinds of protein-protein interactions. Transient interactions, between transcription factors or between transcription factors and other protein complexes, correspond to date hubs. Static interactions between the subunits of a stable protein complex correspond to party hubs. The first type of interaction usually encodes instructions or messages, while the second functions mainly to execute processes as a module. Identifying these interactions will help us to understand how cell fate decisions are made and how they are executed.
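A common way to make the party/date distinction operational uses expression profiles measured across a series of conditions or time points. For each hub, the average Pearson correlation between its profile and those of its interaction partners is computed. A high average suggests co-expression (party hub). A low average suggests that the partners are present at different times or places (date hub). The profiles, names and the 0.5 cut-off below are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression information
    return cov / (sx * sy)

def classify_hub(hub, partners, expression, cutoff=0.5):
    """Label a hub 'party' or 'date' from its mean co-expression with partners."""
    avg = mean(pearson(expression[hub], expression[p]) for p in partners)
    return ("party" if avg >= cutoff else "date"), avg

# Hypothetical expression profiles over four conditions or time points.
expression = {
    "H1": [1, 2, 3, 4], "A": [2, 4, 6, 8], "B": [1, 2, 3, 5],
    "H2": [1, 2, 3, 4], "C": [4, 3, 2, 1], "D": [1, 3, 1, 3],
}
print(classify_hub("H1", ["A", "B"], expression)[0])  # party
print(classify_hub("H2", ["C", "D"], expression)[0])  # date
```

In practice, the cut-off would be chosen from the observed distribution of average correlations rather than fixed in advance.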
Given the large number of proteins that have been associated with pluripotency, there may also be alternative means of achieving pluripotency. After all, pluripotency is a cellular state rather than a cellular composition. Proteins such as Ronin (Dejosez, et al., 2008; Zwaka, 2008), which show strong associations with pluripotency, may operate via a separate network.
As data accumulation approaches the boundaries of the pluripotency-associated protein interaction network, extra effort will be needed to look for interactions among low-abundance proteins and to validate the network. With a more complete embryonic stem cell protein interaction network, new hypotheses can be formulated. As other fields generate more systems biology data, it will also become possible to compare non-pluripotent and pluripotent networks. Once ready, the embryonic stem cell protein interaction network will serve as a point of comparison with other stem cells, with differentiated cells and with cancer cells. Such comparisons may bring out features unique to each of these cellular conditions. Finally, in view of the differences between humans and mice, the same work will have to be repeated with human embryonic stem cells. Knowledge gained from meeting the challenges of mouse embryonic stem cell research should allow much faster progress on the human embryonic stem cell project.
Overall, we see great promise in obtaining answers and insights from a mature protein interaction network. Currently, a total of 239 proteins form the mouse embryonic stem cell protein interaction network. More work is required to construct this network. This work must be closely accompanied by attempts to annotate the purpose and nature of the interactions, as discussed above. Another 151 proteins found by genome-wide RNA interference screening to have a role in pluripotency have yet to be connected to the protein interaction network. Multiple validations to confirm the involvement of these proteins in pluripotency are also necessary. In the network, the transcription factors collaborate amongst themselves. A core group of them recruits the same machinery, namely NuRD. Some studies suggest that other chromatin modification machineries are also recruited; their roles remain to be investigated. When the network is reasonably saturated, systems biology analysis should give greater insight into network properties. Including information on the dynamic properties of the protein interaction network would also improve its predictive capabilities.
This work is supported by the Agency for Science, Technology and Research, Singapore.