Dynamics of Protein Complexes Tracked by Quantitative Proteomics

Cellular proteins rarely function as individual entities; instead, they form multi-molecular complexes that are themselves interconnected in dense functional networks. These networks can carry out a diverse range of highly coordinated biological processes. The characterization of protein-protein interaction networks is therefore crucial, not only to elucidate the local function and regulation of single proteins but also, and above all, to capture a comprehensive snapshot of cellular activity as a whole system. The term “protein complexes” describes structures of varying nature, as protein complexes can be formed by either stable or transient interactions. Stable, long-term interactions can bridge core components of large multi-protein complexes, or molecular machineries, such as the RNA polymerase II complex and the 26S proteasome. On the other hand, interactions that are transient and dynamic in nature, such as those in enzyme/substrate complexes, are often highly sensitive to regulatory stimuli and signaling events. In all cases, however, protein interactions are subject to strict regulation and vary with changes in the cellular environment. In humans, the activity and expression levels of cellular proteins may diverge between differentiated states and thus lead to a specific protein interactome network map for each cell type. In addition, within a particular cell type, protein interactions depend upon physiological and pathological conditions, for example cell proliferation or the stress response, and protein interactomes may thus fluctuate, reflecting the spatial and temporal complexity of cellular activity. Deciphering the dynamics of these protein interaction networks by assembling sets of interactomes that reflect different cellular conditions, rather than simply drawing a comprehensive map of a static protein interactome, remains one of the key challenges in cell biology.
Once this challenge is met, it will greatly aid our understanding of the complex mechanisms underlying normal cell behavior and of how they are modified by genetic alterations, cancer and other diseases. Is this goal completely unrealistic, not to say utopian, given that the mapping of the human protein interactome has not yet been completed? I am eager to believe that the accumulation of outstanding studies and the emergence of powerful new techniques will lead to the achievement of this ambitious project. In fact, various high throughput methodologies have already proven very efficient at characterizing protein-protein interactions on a proteome scale, yet the scientific community is still far from building a dynamic map of the human protein interactome.

analysis of protein complex dynamics, as they enable a relative quantification of protein interaction intensities and therefore allow comparison between different cellular conditions (see section 3: SILAC-based Quantitative Proteomics). Another essential point is that proteins analyzed by AP-MS techniques are expressed under near-physiological conditions, with correct regulation and post-translational modifications (Kocher and Superti-Furga, 2007).

Other biochemical techniques
Alternative high throughput methods have been developed to analyze protein interactions; however, these are not as widely used as Y2-H and AP-MS. Among them, the LUMIER approach (LUminescence-based Mammalian IntERactome mapping) is based on the IP of FLAG-tagged baits that are co-expressed in mammalian cells with putative interaction partners fused to the Renilla luciferase (RL) enzyme. The intensity of the interaction between the two proteins of interest can then be determined by measuring luciferase enzymatic activity in FLAG immunoprecipitates (Barrios-Rodiles et al., 2005). Applied to the analysis of the transforming growth factor-β (TGFβ) pathway, this semi-quantitative methodology was shown to efficiently detect protein interactions that depend on pathway-specific post-translational modifications (PTMs) and, interestingly, interactions involving membrane proteins, which are usually under-represented in large-scale studies due to their poor recovery in fractionation procedures (Barrios-Rodiles et al., 2005). Both LUMIER and AP-MS approaches identify protein interactions within protein complexes using quantitative measurements, and therefore allow comparisons between different cellular conditions (for example, in the absence or presence of TGFβ signaling). However, unlike AP-MS, the LUMIER technique requires the overexpression and tagging of both baits and preys, which limits the reliability and the coverage of the results obtained.
Protein-fragment Complementation Assay (PCA) enables the detection of binary protein-protein interactions (PPIs) in vivo, in their natural environment. Using the PCA approach described by Tarassov et al (Tarassov et al., 2008), baits and preys are fused to F[1,2] and F[3], the complementary N- and C-terminal fragments of a mutant mDHFR reporter protein that is insensitive to the DHFR inhibitor methotrexate but retains full catalytic activity. The F[1,2] and F[3] fragment fusions are expressed in Saccharomyces cerevisiae MATa and MATα strains, respectively, which are then mated and selected for methotrexate resistance. If the proteins of interest physically interact, the DHFR fragments are brought together and fold into their native structure, thus reconstituting reporter activity and permitting the survival of the diploid colonies. Binary protein interactions can therefore be deduced directly from the measurement of colony growth. This methodology is an interesting alternative to the Y2-H assay, as it enables the identification of direct binary interactions (less than 82 Å between the two proteins of interest) in vivo and, unlike Y2-H, is based on proteins in their native subcellular location and post-translationally modified state (Tarassov et al., 2008).
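The PCA read-out, colony growth under methotrexate selection, has to be converted into binary interaction calls. The Python sketch below is purely illustrative and is not the scoring procedure of Tarassov et al.: it assumes that most tested bait-prey pairs do not interact, takes the median colony size as an estimate of background growth, and calls an interaction when growth exceeds a hypothetical multiple (background_factor) of that background. The function name, input format and threshold are all assumptions.

```python
from statistics import median


def call_pca_interactions(colony_sizes, background_factor=3.0):
    """Return the bait-prey pairs whose colony growth exceeds background.

    colony_sizes: dict mapping (bait, prey) -> colony area (arbitrary units),
    measured on methotrexate selection plates (hypothetical input format).
    background_factor: hypothetical multiple of the median colony size used
    as the cut-off (an assumption for illustration only).
    """
    # Most tested pairs are assumed not to interact, so the median colony
    # size is taken as an estimate of background growth.
    background = median(colony_sizes.values())
    threshold = background_factor * background
    return {pair for pair, size in colony_sizes.items() if size > threshold}


# Hypothetical growth measurements for five bait-prey pairs
sizes = {("A", "B"): 950, ("A", "C"): 40, ("A", "D"): 55,
         ("B", "C"): 60, ("B", "D"): 35}
print(call_pca_interactions(sizes))  # {('A', 'B')}
```

A median is used rather than a mean so that a few strongly growing (interacting) pairs do not inflate the background estimate.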

In silico approaches
The techniques described above are all based on large-scale experimental methodologies that search for physical interactions in vivo and/or in vitro. In contrast, in silico approaches, which represent an alternative way to obtain protein interaction information, rely upon the curation of published studies describing either low or high throughput protein interaction analyses (an experiment reporting fewer than 40 interactions is considered low throughput). Literature-curated protein interactions are reported in web-based interaction databases, most of which are now freely available online. Of note, low-throughput analyses account for about a third of the total number of interactions in the Biomolecular Interaction Network Database (BIND), i.e. 67,789 low-throughput interactions out of a total of 206,859 (Isserlin et al., 2011). Other curated protein interaction resources include the Munich Information Center for Protein Sequences (MIPS) protein interaction database, the Molecular INTeraction database (MINT), the Database of Interacting Proteins (DIP), the protein InterAction database (IntAct), the Biological General Repository for Interaction Datasets (BioGRID) and the Human Protein Reference Database (HPRD), which currently report more than 200,000 protein interactions resulting from large-scale curation of several thousand publications. In silico approaches are valuable for compiling, and regularly updating, all the evidence for protein interactions reported in the literature, and for making it publicly available to the scientific community; however, they are not as reliable as generally presumed (Cusick et al., 2009). Indeed, they rely upon individual studies that are of highly variable quality and often do not provide curators, or readers in general, with essential information such as correct gene names, species and precise experimental parameters, which makes reliable interpretation difficult. In addition, most protein interactions contained in databases (>75%) are supported by only one publication, with only 5% of all interactions being described in three or more publications (Cusick et al., 2009). However, the main issue raised by these in silico approaches is that they rely upon small-scale focused studies, which are by definition biased towards hypothesis-driven investigations and tend to examine proteins that are already known and therefore have a higher probability of being investigated again. In contrast, high throughput strategies rely upon unbiased, discovery-driven explorations, which are absolutely necessary to unveil new, unpredictable protein interactions and investigate novel functions (Cusick et al., 2009). High throughput methods therefore appear to be essential tools for uncovering the vast number of protein interactions that remain to be identified in order to assemble a comprehensive map of the human protein interactome network.
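Statistics such as the proportion of interactions supported by a single publication, or the low/high throughput split reported for BIND, are simple counts over curated records. A minimal Python sketch, assuming a hypothetical input that maps each publication to the interaction pairs it reports and treating each publication as a single experiment (a simplification), with interactions treated as undirected:

```python
from collections import defaultdict

# Per the text: an experiment reporting fewer than 40 interactions is low throughput.
LOW_THROUGHPUT_MAX = 40


def summarize_curation(publications):
    """Summarize literature-curated interactions.

    publications: dict mapping a publication ID -> list of (protein_a, protein_b)
    pairs reported in that publication (hypothetical input format).
    """
    support = defaultdict(set)  # undirected interaction -> supporting publications
    low_throughput = set()
    for pub_id, pairs in publications.items():
        # frozenset makes (A, B) and (B, A) the same undirected interaction
        unique_pairs = {frozenset(pair) for pair in pairs}
        if len(unique_pairs) < LOW_THROUGHPUT_MAX:
            low_throughput.add(pub_id)
        for pair in unique_pairs:
            support[pair].add(pub_id)

    n = len(support)
    single = sum(1 for pubs in support.values() if len(pubs) == 1)
    three_plus = sum(1 for pubs in support.values() if len(pubs) >= 3)
    return {
        "interactions": n,
        "low_throughput_publications": low_throughput,
        "fraction_single_publication": single / n if n else 0.0,
        "fraction_three_or_more": three_plus / n if n else 0.0,
    }


# Hypothetical toy corpus: three small studies and one 50-interaction screen
corpus = {
    "p1": [("A", "B"), ("B", "C")],
    "p2": [("B", "A")],
    "p3": [("A", "B")],
    "screen": [(f"X{i}", f"Y{i}") for i in range(50)],
}
summary = summarize_curation(corpus)
print(summary["interactions"], sorted(summary["low_throughput_publications"]))
# 52 ['p1', 'p2', 'p3']
```

In this toy corpus, the only interaction supported by three publications (A-B) comes entirely from small studies, which illustrates how repeated, hypothesis-driven work concentrates evidence on already known pairs.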

Comparison of technique limitations
No single high throughput method is perfect and, on its own, none will enable the comprehensive characterization of the human protein interactome and its dynamic properties. Drawbacks are inherent to each technique and are therefore inevitable. For example, in Y2-H, protein interactions are investigated in yeast, which does not reflect the native environment of human proteins and is characterized by improper PTMs, processing and regulation of proteins in general. In addition, baits and preys are overexpressed and constrained to the nucleus, which might force interactions that would not occur under natural conditions. These limitations have to be taken into account when interpreting results obtained through large-scale Y2-H screens, which might not be ideal for analyzing protein interaction dynamics but offer the advantage of identifying binary interactions between the protein pairs tested. In contrast, the AP-MS technique enables the comprehensive identification of all individual interaction partners (direct or indirect) of any given bait, thereby leading to the characterization of multi-protein complexes. Interestingly, the combination of AP-MS with quantitative proteomics strategies can efficiently, and quantitatively, reveal changes occurring in protein complexes between different conditions. A limitation of AP-MS techniques is that protein complexes have to be "artificially" extracted from their natural environment, e.g. by cell lysis or possibly by cellular fractionation, and immunopurified before MS analysis, which may perturb the complex and disrupt weak interactions. To avoid this problem, proteins can be cross-linked before extraction. Alternatively, protein purification may be performed under low-stringency conditions to preserve weak interactions, which are often of great biological importance as they can, for example, reveal regulatory mechanisms (discussed in more detail below). These limitations suggest that each assay may be more efficient at capturing particular types of protein interaction. This might explain, at least partially, the small overlap between the protein interactions identified by these fundamentally different techniques (Cusick et al., 2005; Figeys, 2008). This small overlap does not necessarily reflect poor reliability of the reported data but rather the incomplete coverage of these approaches, none of which can individually capture the whole interactome network, owing to the technical limitations described above and a detection sensitivity that remains limited (Lemmens et al., 2010). It may therefore be very powerful to combine these complementary strategies, each of which provides a partial view of the whole system, with AP-MS identifying protein complexes in their natural environment and Y2-H indicating binary interactions within these complexes (Boulon et al., 2010b).
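The overlap between techniques is usually quantified by comparing interaction sets. Because AP-MS identifies co-complex members rather than binary contacts, its results must first be converted into pairs. The Python sketch below assumes the commonly used "spoke" model (the bait paired with each prey), which is our choice for illustration, and uses the Jaccard index as one possible overlap measure. All data and names are hypothetical.

```python
def spoke_model(bait, preys):
    """Convert one AP-MS purification into bait-prey pairs (spoke model).

    This is an assumption: AP-MS reports co-complex membership, not direct
    contacts, so any conversion to binary pairs is a model.
    """
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def overlap_report(binary_pairs, apms_pairs):
    """Compare a binary (e.g. Y2-H) dataset with AP-MS-derived pairs."""
    y2h = {frozenset(pair) for pair in binary_pairs}  # undirected pairs
    apms = set(apms_pairs)
    shared = y2h & apms
    union = y2h | apms
    return {
        "shared": shared,
        "jaccard": len(shared) / len(union) if union else 0.0,
        "y2h_only": y2h - apms,
        "apms_only": apms - y2h,
    }


# Hypothetical data: one Y2-H screen and one AP-MS purification with bait A
y2h_hits = [("A", "B"), ("C", "D")]
apms_hits = spoke_model("A", ["B", "E"])
report = overlap_report(y2h_hits, apms_hits)
print(round(report["jaccard"], 3))  # 0.333
```

Note that an AP-MS pair absent from the Y2-H data (here A-E) may simply be an indirect, co-complex association rather than a false positive, which is one reason why a low overlap does not by itself indicate poor data quality.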

Reducing the numbers of false positives and false negatives: A major challenge
Before interpreting any high throughput protein interaction data, however, it is necessary to discriminate between genuine interactions and non-specific ones, i.e. the false positive interactions that are inevitably recovered in all large-scale studies. This represents one of the major challenges in the field, given that non-specific contaminants often represent more than 50% of the identified protein interaction partners. False negatives, in contrast, are often transient interactions and/or interactions that only occur in specific conditions and cannot be detected by the experimental setup. These false negatives constitute another important issue in these assays, as low affinity and/or low abundance specific interaction partners are generally of great biological importance for understanding protein function, regulation and dynamics. To overcome these issues, it is essential to strive for the highest signal-to-noise ratio, which encompasses both sensitivity and reliability. High sensitivity, which reduces the number of false negatives, will be achieved by improving the performance of detection tools and optimizing experimental procedures. High reliability, which reduces the number of false positives or experimental contaminants, also depends on an optimal experimental setup but, above all, relies on adequate data analysis. To date, most data obtained by large-scale studies undergo computational assessments that provide confidence scores or biological significance for each protein interaction identified, based on comparison with other approaches. For example, protein interactions are considered more reliable if they are supported by data showing phylogenetic conservation, genetic interactions, subcellular co-localization, similar functional annotations in Gene Ontology (GO) or correlated expression profiles between the interacting proteins (Ge et al., 2001; Tong et al., 2004). Indeed, a correlation has been shown between protein interactions and protein localization and expression, with 27% of interacting proteins sharing the same subcellular localization (Ge et al., 2001; Reguly et al., 2006; Tong et al., 2004). In addition, confidence scores also reflect the integration of different properties of the interaction network generated by the analysis, e.g. interaction bi-directionality and network topology (Cloutier et al., 2009; Ewing et al., 2007). However, the calculation of these confidence scores raises two problems. First, these confidence scores are based on the comparison, and the correlation, between new protein interactions and previous observations, thus leading to a bias towards already known data at the expense of unpredictable interactions. Second, confidence scores that rely upon properties of the network can only be calculated when a sufficient number of experiments have been performed, i.e. this analysis pathway is limited to large-scale studies and cannot be applied as efficiently to small-scale studies.
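The logic of such confidence scores can be sketched in Python as a weighted sum of supporting evidence plus a bonus for bi-directionality (A recovers B as bait and B recovers A). The weights, bonus and evidence keys below are hypothetical and do not reproduce any published scoring scheme. Like the real scores discussed above, this toy score rewards interactions that agree with prior knowledge and is therefore biased against unexpected ones.

```python
# Hypothetical weights, for illustration only; real scoring schemes differ.
EVIDENCE_WEIGHTS = {
    "conserved": 1.0,            # phylogenetic conservation
    "genetic_interaction": 1.0,
    "colocalized": 0.5,          # same subcellular localization
    "shared_go": 0.5,            # similar GO annotations
    "coexpressed": 0.5,          # correlated expression profiles
}
RECIPROCAL_BONUS = 1.0           # hypothetical bonus for bi-directionality


def is_reciprocal(a, b, observed):
    """True if A recovered B as bait and B recovered A as bait.

    observed: set of (bait, prey) tuples from the screen.
    """
    return (a, b) in observed and (b, a) in observed


def confidence(evidence, reciprocal):
    """Sum the weights of the evidence flags that are True, plus the bonus."""
    score = float(sum(w for key, w in EVIDENCE_WEIGHTS.items()
                      if evidence.get(key, False)))
    if reciprocal:
        score += RECIPROCAL_BONUS
    return score


observed = {("A", "B"), ("B", "A"), ("A", "C")}
print(confidence({"colocalized": True, "shared_go": True},
                 is_reciprocal("A", "B", observed)))  # 2.0
print(confidence({}, is_reciprocal("A", "C", observed)))  # 0.0
```

The bi-directionality term also illustrates the second problem mentioned above: it can only be evaluated when both proteins have been used as baits, which is rarely the case in small-scale studies.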

Analyzing the dynamics of protein complexes
Characterizing protein interactions alone, without specifying the cellular context in which each interaction was found, might be misleading. Indeed, as discussed previously, all protein interactions are dynamic and may vary in response to changes in the cellular environment, either physiological or pathological. This means that all protein interaction networks are subject to extreme plasticity, which needs to be taken into account if one wants to draw a faithful map of the human interaction network, or rather faithful maps of the human interaction networks. Does that mean that the scientific community needs to assemble as many protein interaction network maps as there are possible conditions? This might seem unrealistic, and it certainly is. But one should definitely aim to record the precise conditions in which each protein interaction was found. This is absolutely essential, when assembling the different pieces of the protein interaction network jigsaw puzzle, to avoid bringing together pieces that simply do not match, e.g. mixing together, in a single network, protein interactions that specifically occur in proliferating cells with interactions that specifically occur in differentiated tissues. In the end, many proteins potentially interact with each other, but the interesting question is "when?". Many researchers are interested in these problems and have already sought variations in interaction patterns in different cellular contexts, such as cancer and neurodegenerative diseases (Lim et al., 2006). Interestingly, the combination of affinity purification with quantitative proteomics overcomes most limitations in the sense that it provides both high sensitivity, thanks to the development of high-performance MS equipment, and high reliability. Quantitative proteomics indeed enables (i) the identification of protein interactions in their natural environment, with native PTMs and subcellular localization, (ii) the efficient discrimination between specific interaction partners and the non-specific background of contaminants, i.e. proteins that bind non-specifically to the affinity matrices, and (iii) the comparison of protein interaction intensities between different conditions.
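Recording the context of each interaction is, in practice, a data-modelling question. As a hypothetical Python sketch (the field names are our own and do not correspond to any particular database schema), each observation can carry its condition and detection method, so that condition-specific networks are assembled separately rather than merged into a single static map:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One observed interaction, annotated with its experimental context."""
    protein_a: str
    protein_b: str
    condition: str  # free-text label, e.g. "proliferating" or "differentiated"
    method: str     # e.g. "AP-MS" or "Y2-H"


def network_for(interactions, condition):
    """Return the undirected edges observed under one condition only."""
    return {frozenset((i.protein_a, i.protein_b))
            for i in interactions if i.condition == condition}


records = [
    Interaction("A", "B", "proliferating", "AP-MS"),
    Interaction("A", "C", "differentiated", "AP-MS"),
    Interaction("B", "A", "proliferating", "Y2-H"),
]
# The A-B edge is reported twice (two methods) but appears once per network.
print(len(network_for(records, "proliferating")))   # 1
print(len(network_for(records, "differentiated")))  # 1
```

Keeping the method alongside the condition also preserves the information needed to weigh evidence from complementary techniques, as discussed in the previous sections.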

SILAC-based quantitative proteomics: A method of choice to reliably analyze specific protein interactions and protein complex dynamics
MS-based proteomics is not inherently quantitative. The quantitative proteomics strategies developed recently mostly involve isotope labeling, either via metabolic incorporation in vivo/in cellulo (¹⁴N/¹⁵N metabolic labeling, Stable Isotope Labeling by Amino acids in Cell culture (SILAC)) (Ong et al., 2002), chemical modification in vitro (ICAT, iTRAQ) (Gygi et al., 1999; Ranish et al., 2003) or enzymatically catalyzed incorporation (¹⁸O labeling) (Yao et al., 2001). Alternatively, label-free strategies for protein quantification are based either on the comparison of precursor signal intensities for each peptide across multiple LC-MS runs, or on spectral counting (Collier et al., 2010; Wepf et al., 2009). These different methods for quantitative proteomics will not be detailed in this chapter, as they are described in this book by Sap and Demmers (Quantitation in mass spectrometry based proteomics) and by Leroy, Matallana and Wattiez (Gel free proteome analysis - isotopic labeling vs. label free approaches for quantitative proteomics). It is noteworthy, however, that isotope labeling appears to be more sensitive than label-free approaches for detecting small variations, which means that until label-free techniques become more robust and statistically reliable, isotope labeling strategies might be more powerful for analyzing subtle changes in protein interaction intensities between different conditions.
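To make the label-free option concrete, spectral counting compares how many MS/MS spectra are assigned to each protein in different runs. The following Python sketch is a deliberately simple illustration (total-count normalization plus a pseudocount, both our own assumptions), not one of the published label-free pipelines cited above:

```python
import math


def spectral_count_log2fc(counts_ip, counts_ctrl, pseudocount=1.0):
    """log2 fold change of normalized spectral counts, IP versus control.

    counts_*: dict mapping protein -> number of MS/MS spectra assigned to it.
    Each run is normalized to its total spectral count; the pseudocount
    (an assumption) avoids division by zero for proteins absent from one run.
    """
    total_ip = sum(counts_ip.values())
    total_ctrl = sum(counts_ctrl.values())
    if total_ip == 0 or total_ctrl == 0:
        raise ValueError("each run needs at least one assigned spectrum")
    result = {}
    for protein in set(counts_ip) | set(counts_ctrl):
        ip = (counts_ip.get(protein, 0) + pseudocount) / total_ip
        ctrl = (counts_ctrl.get(protein, 0) + pseudocount) / total_ctrl
        result[protein] = math.log2(ip / ctrl)
    return result


fc = spectral_count_log2fc({"A": 7, "B": 1}, {"A": 1, "B": 7})
print(round(fc["A"], 2), round(fc["B"], 2))  # 2.0 -2.0
```

For proteins identified by only a few spectra, the pseudocount dominates the estimate, which gives an intuition for why small variations are harder to detect with label-free counting than with isotope labeling.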

Triple labeling SILAC pull-down workflow
Among isotope labeling strategies, SILAC has emerged as a simple and powerful approach, now widely used to study protein-protein interactions in various organisms and cell types (Boulon et al., 2010a; Mann, 2006; Trinkle-Mulcahy et al., 2008). SILAC methodology relies upon the metabolic labeling of proteins in cell culture through the incorporation of light, medium and heavy isotope-containing amino acids (arginine and lysine) that can be resolved and quantified by MS. Light amino acids contain the naturally predominant isotopes of carbon, nitrogen and hydrogen, i.e. "unlabeled" ¹²C, ¹⁴N and ¹H, whereas medium- and heavy-labeled arginine (R) and lysine (K) refer to (i) medium (R6K4): [¹³C₆]arginine (R6) and 4,4,5,5-D4-lysine (K4) and (ii) heavy (R10K8): [¹³C₆,¹⁵N₄]arginine (R10) and [¹³C₆,¹⁵N₂]lysine (K8) (Ong et al., 2002). Cells are cultured in SILAC media containing light, medium or heavy amino acids for at least 5-6 doublings to ensure complete incorporation of the isotopic amino acids. Various studies have used the SILAC methodology to characterize dynamic changes in protein interactions (reviewed by Dengjel et al., 2010; Gingras et al., 2007; Vermeulen et al., 2008). For example, Blagoev et al identified novel proteins binding to the SH2 domain of the adapter protein Grb2 upon EGF stimulation, using a GST-SH2 fusion protein and GST-based affinity purification (Blagoev et al., 2003). Only two SILAC conditions were used (light and heavy), which allowed the comparison of SH2-interacting proteins in untreated versus EGF-stimulated cells, with no control IP. Similarly, Foster et al exploited the SILAC method to identify proteins that interact with GLUT4 in an insulin-dependent manner (Foster et al., 2006). Recently, Kaake et al reported an interesting study based on three SILAC IP experiments performed in parallel in yeast. Their approach was called QTAX (Quantitative analysis of TAP in vivo X-linked protein complexes). TAP-tagged Rpn11 was used as bait to characterize proteasome interaction partners in three different cell cycle phases (G1, S and M) (Kaake et al., 2010). For each cell cycle phase, a double labeling SILAC strategy was performed, which included an internal negative control and an Rpn11-specific pull-down, therefore allowing discrimination between putative contaminants and specific interaction partners. However, the comparison of interaction partners between the three cell cycle phases relied upon independent experiments and, therefore, separate MS runs. As a result, subtle variations in protein interaction intensities may not have been detectable in this study.
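The label definitions above translate directly into the mass differences that allow MS to resolve the three forms of each peptide. Assuming standard monoisotopic isotope masses, the Python sketch below computes the expected mass shift of a peptide for each label and the corresponding spacing on the m/z axis for a given charge state. It also includes an idealized incorporation model in which pre-existing light protein is diluted by half at each doubling (ignoring protein turnover and amino-acid recycling). Under this assumption, 5 and 6 doublings give roughly 97% and 98% labeling, which is consistent with the recommendation above. Function names and example peptides are hypothetical.

```python
# Mass differences between heavy and light isotopes (Da)
C13_SHIFT = 1.0033548  # 13C - 12C
N15_SHIFT = 0.9970349  # 15N - 14N
D_SHIFT = 1.0062767    # 2H (D) - 1H

# Per-residue mass shifts of the labels described in the text
LABELS = {
    "light": {"R": 0.0, "K": 0.0},                       # R0K0
    "medium": {"R": 6 * C13_SHIFT,                       # R6
               "K": 4 * D_SHIFT},                        # K4
    "heavy": {"R": 6 * C13_SHIFT + 4 * N15_SHIFT,        # R10
              "K": 6 * C13_SHIFT + 2 * N15_SHIFT},       # K8
}


def label_shift(sequence, label):
    """Total mass shift (Da) of a peptide, summed over its R and K residues."""
    shifts = LABELS[label]
    return sum(shifts.get(residue, 0.0) for residue in sequence)


def mz_spacing(sequence, label, charge):
    """Spacing on the m/z axis between the light and labeled forms of an ion."""
    return label_shift(sequence, label) / charge


def labeled_fraction(doublings):
    """Idealized labeled fraction after n doublings (no turnover or recycling)."""
    return 1.0 - 0.5 ** doublings


peptide = "AGFAGDDAPR"  # hypothetical tryptic peptide ending in arginine
for label in ("medium", "heavy"):
    print(label, round(mz_spacing(peptide, label, charge=2), 3))
print([round(labeled_fraction(n), 2) for n in (5, 6)])
```

For a doubly charged tryptic peptide ending in arginine, the medium and heavy forms are thus expected about 3.01 and 5.00 m/z units above the light form, which is the triplet pattern exploited in the workflow described below.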
The SILAC workflow described in this chapter merges advantages from these different strategies. It is based on a triple labeling SILAC pull-down approach, which compares, within a single MS run, an internal negative control, for the identification of putative non-specific contaminants, and two IPs of interest performed in two different conditions, for the direct comparison of protein interaction intensities between the two conditions tested (Figures 1A and 1B). This protocol has proven efficient for characterizing both specific and dynamic interactions and is easily accessible to all laboratories. In brief, as summarized in Figure 1A, in the case of a GFP-IP, parental cells are grown in light (R0K0) medium, whereas GFP-protein-expressing cells are grown in medium (R6K4) and heavy (R10K8) media. The light condition is used for the control IP, the medium condition for the IP of interest in control conditions (untreated cells) and the heavy condition for the IP of interest in treated cells (chemical inhibitors, stress, etc.). Medium and heavy conditions can also be used to compare changes in protein complexes between cell cycle phases, cell types, etc. After cell lysis, extracts from each cell line are precleared before the GFP-protein is affinity-purified using GFP-Trap® affinity matrix (Chromotek) (Rothbauer et al., 2008). Eluates from each condition are then combined and digested with trypsin. The resulting peptides are analyzed by LC-MS/MS and can be quantified with the MaxQuant software developed by the Mann group (Cox and Mann, 2008; Cox et al., 2009). As seen in Figure 1C, each peptide identified in the triple labeling SILAC IP experiment shows a typical MS spectrum with three main peaks that correspond to its light (L), medium (M) and heavy (H) isotopic forms, respectively. The relative abundance of each form is determined from its peak area by MaxQuant, which provides M/L, H/L and H/M ratios for each peptide. Protein ratios can then be derived from the median ratio of all peptides identified for that protein.
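The ratio calculation described above can be sketched in Python as follows. This is a simplified illustration that assumes the three peak areas of each peptide are already available. MaxQuant's actual quantification involves additional steps (for example, ratio normalization) that are not reproduced here, and the input format is hypothetical.

```python
from statistics import median


def peptide_ratios(areas):
    """SILAC ratios for one peptide from its L, M and H peak areas."""
    light, medium, heavy = areas["L"], areas["M"], areas["H"]
    return {"M/L": medium / light, "H/L": heavy / light, "H/M": heavy / medium}


def protein_ratios(peptides):
    """Median of each peptide ratio over all quantified peptides of a protein.

    Peptides with a missing (zero) peak are skipped here for simplicity;
    returns None if no peptide can be quantified.
    """
    usable = [p for p in peptides if min(p["L"], p["M"], p["H"]) > 0]
    if not usable:
        return None
    per_peptide = [peptide_ratios(p) for p in usable]
    return {key: median(r[key] for r in per_peptide)
            for key in ("M/L", "H/L", "H/M")}


# Hypothetical peak areas for four peptides of one protein
peaks = [
    {"L": 100, "M": 400, "H": 200},
    {"L": 50, "M": 300, "H": 150},
    {"L": 80, "M": 240, "H": 240},
    {"L": 0, "M": 90, "H": 40},  # no light peak: skipped in this sketch
]
print(protein_ratios(peaks))  # {'M/L': 4.0, 'H/L': 3.0, 'H/M': 0.5}
```

The median is robust to a single outlier peptide (the third peptide's H/M of 1.0 does not shift the protein ratio), which is one reason why medians rather than means are a sensible way to summarize peptide ratios.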

Triple labeling SILAC pull-down data analysis
In triple labeling SILAC IP experiments, baits and genuine interaction partners are expected to show high M/L and/or H/L ratios, as opposed to experimental contaminants, e.g. proteins that bind non-specifically to the affinity matrix, which are expected to have M/L and H/L ratios close to 1. Proteins that show M/L and H/L ratios close to zero are likely to be environmental contaminants, such as keratins (Figures 1B and 1C). In contrast, H/M SILAC ratios indicate changes in protein interaction intensities. For example, proteins showing a SILAC H/M ratio < 1 are expected to have a decreased interaction with the bait in treated versus untreated cells, whereas proteins showing a SILAC H/M ratio > 1 are expected to have an increased interaction with the bait upon treatment. Of note, it can be very powerful to perform several SILAC IP experiments in parallel to study the dynamics of protein complexes in more than two conditions. In this case, the first experiment can be carried out as described above, whereas the others can omit the negative control and directly compare protein interactions in three different conditions (Boulon et al., 2010b). Putative contaminants are thus deduced from the first experiment, and many different conditions can be compared in a reliable manner. Figure 2 shows one way of visualizing triple SILAC IP data, by plotting log2(H/M) (y-axis) versus log2(M/L) (x-axis) SILAC ratios for all proteins identified in the experiment. On this type of graph, most proteins usually cluster around the origin, with M/L and H/M ratios close to 1, and therefore log2 ratios close to 0 (Figure 2). These proteins are likely to be contaminants, as described above, which often represent more than 50% of all proteins identified in AP-MS experiments. In contrast, putative genuine interaction partners of the bait typically localize to the right side of the graph, with M/L SILAC ratios above a certain threshold, which may vary between experiments. Of note, no single threshold can unambiguously separate contaminants from genuine interaction partners (discussed below). To analyze protein interaction dynamics, one should focus on these putative specific interaction partners. To start with, the bait itself should display high M/L and H/L SILAC ratios, indicating that it was efficiently immunopurified; otherwise, the IP protocol might need to be optimized. If the efficiency of the IP is the same in the two conditions tested, the log2(H/M) ratio of the bait protein should be 0. In practice, this is often not the case, owing to changes in the expression levels and/or accessibility of the bait induced by the treatment. One way around this problem is to draw a second x-axis using the bait protein as a reference. Proteins located below this new x-axis show a decreased interaction with the bait in treated cells, whereas proteins located above it show an increased interaction (Figure 2). This simple visualization method provides, at a glance, an objective read-out of the dynamics of protein complexes.
Figure 2. All protein groups identified and quantified by MaxQuant are represented on the graph. The experimental design is similar to Figure 1, with endogenous Rpb1 being used as bait. Cellular extracts from the light condition are incubated with a control antibody (control IP), while cellular extracts from the medium and heavy conditions are incubated with an antibody against endogenous Rpb1. Cells cultured in the heavy condition are treated with α-amanitin and Leptomycin B (LMB) for 15 hours. On the x-axis, the log2(M/L) ratio correlates with the enrichment of Rpb1 IP versus control IP. Proteins with high log2(M/L) ratios are expected to be specific interaction partners. However, no single threshold can unambiguously separate contaminants from genuine interaction partners. On the y-axis, log2(H/M) correlates with the enrichment in α-amanitin+LMB-treated cells versus untreated cells. Putative experimental contaminants cluster around the origin. The bait, Rpb1, is spotted in red. The dotted red line shows an alternative x-axis defined by the bait, which separates the proteins whose interaction with the bait is increased after the treatment (above) from those whose interaction is decreased (below). Proteins within the red oval are proteins whose interaction with the bait is decreased by two-fold or more. Proteins that show a log2(M/L) ratio > 2 are spotted in orange, RNA polymerase II subunits in purple and the R2TP/prefoldin-like complex in green.
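The reading of Figure 2 described above, an M/L threshold for specificity followed by H/M changes measured relative to the bait's own H/M rather than to 1, can be expressed in a few lines of Python. The sketch assumes that protein-level M/L and H/M ratios are available as positive numbers. The default thresholds mirror the Rpb1 example (log2(M/L) > 2 and a two-fold change relative to the bait), but since no single threshold unambiguously separates contaminants from genuine partners, they should be treated as adjustable assumptions. Protein names and ratios are hypothetical.

```python
import math


def classify(proteins, bait, ml_threshold=2.0, fold=2.0):
    """Classify proteins from a triple SILAC IP.

    proteins: dict mapping protein name -> (M/L ratio, H/M ratio); ratios
    must be positive because they are log-transformed.
    ml_threshold: log2(M/L) cut-off for putative specific partners
    (experiment-dependent assumption).
    fold: minimal change in H/M, relative to the bait's own H/M, that is
    called significant (two-fold in the Rpb1 example).
    """
    bait_log_hm = math.log2(proteins[bait][1])  # bait-defined reference x-axis
    cutoff = math.log2(fold)
    calls = {}
    for name, (ml, hm) in proteins.items():
        if math.log2(ml) <= ml_threshold:
            calls[name] = "putative contaminant"
            continue
        delta = math.log2(hm) - bait_log_hm  # distance from the bait axis
        if delta <= -cutoff:
            calls[name] = "decreased"
        elif delta >= cutoff:
            calls[name] = "increased"
        else:
            calls[name] = "unchanged"
    return calls


# Hypothetical ratios; note that the bait's own H/M is 0.5, not 1
data = {
    "bait": (16.0, 0.5),
    "partner_1": (8.0, 0.125),
    "partner_2": (8.0, 0.5),
    "partner_3": (16.0, 2.0),
    "background_1": (1.1, 1.0),
}
for name, call in classify(data, "bait").items():
    print(name, call)
```

Here partner_2 has an H/M of 0.5, which would naively suggest a two-fold loss, yet it is called "unchanged" because it follows the bait itself. This is exactly the correction provided by the bait-defined x-axis.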

Application to the analysis of RNA polymerase II complex
This triple SILAC IP method has been applied efficiently to the analysis of RNA polymerase II complex dynamics (Boulon et al., 2010b). The RNA polymerase II (RNAPII) complex is an essential multi-protein complex that is involved in the transcription of all mRNAs and of capped non-coding RNAs. The structure and subunit composition of this enzyme have been characterized in detail. The RNAPII complex is formed by 12 subunits, Rpb1 to Rpb12, and Rpb1 and Rpb2, the two largest subunits, form the catalytic core of the enzyme. However, relatively little is known about its assembly mechanisms. Recently, a set of RNAPII interaction partners of unknown function was identified by AP-MS (Cloutier et al., 2009; Jeronimo et al., 2007). In collaboration with the Bertrand group, we explored the dynamics of the RNAPII complex using the triple SILAC IP strategy described above, in order to capture the function of these different factors. Interestingly, we could show that some of these interaction partners, which are part of the R2TP/prefoldin-like complex, in fact participate in the assembly of the RNAPII holoenzyme in the cytoplasm (Boulon et al., 2010b). In this work, we took advantage of the transcription inhibitor α-amanitin, which is known to induce the degradation of Rpb1, the largest RNAPII subunit, and the disassembly of the remaining subunits, which are exported to the cytoplasm (Boulon et al., 2010b; Nguyen et al., 1996). In addition, α-amanitin combined with leptomycin B (LMB) treatment leads to the accumulation of newly synthesized Rpb1 in the cytoplasm, which cannot be imported into the nucleus (Boulon et al., 2010b). Four triple SILAC IP experiments were thus performed in parallel, using endogenous Rpb1, GFP-Rpb3 and GFP-hSpagh as baits. In this chapter, endogenous Rpb1 IPs are described as examples. In brief, in the first experiment, the light condition was used for the control IP, whereas the medium and heavy conditions were used for endogenous Rpb1 IP in untreated cells ("assembled" Rpb1) versus α-amanitin+LMB-treated cells ("unassembled" Rpb1). Eluted Rpb1 and its associated partners were digested with trypsin and analyzed by LC-MS/MS, and relative SILAC ratios were calculated by MaxQuant. Figure 2 shows the Rpb1 IP dataset plotted as log2(H/M) against log2(M/L) ratios. The identification of specific interaction partners of Rpb1 (log2(M/L) > 2) revealed the presence of all RNAPII subunits and a set of additional factors, some of which belong to the R2TP/prefoldin-like complex that was previously described by other AP-MS approaches (Cloutier et al., 2009; Jeronimo et al., 2007). RNAPII subunits are marked in purple, whereas R2TP/prefoldin-like complex factors are marked in green. Interestingly, using H/M ratios, we could observe drastic changes in Rpb1 interaction partners between the two conditions tested. We showed (i) that the association between Rpb1 and the other RNAPII subunits is lost upon α-amanitin+LMB treatment (interactions were arbitrarily considered significantly decreased when a two-fold or greater change was observed upon treatment, relative to the bait H/M reference ratio) and (ii) that the interaction of Rpb1 with the R2TP/prefoldin-like complex is not affected by the treatment. This indicated both that the holoenzyme is disassembled upon treatment and that R2TP/prefoldin-like factors bind to "unassembled" Rpb1, suggesting that these factors of unknown function might be involved in the stabilization and assembly of RNAPII subunits, which was later confirmed by other approaches (Boulon et al., 2010b). A second triple SILAC IP experiment was performed in parallel, again using endogenous Rpb1 as bait, to directly compare three different conditions, i.e. untreated cells versus cells treated with either α-amanitin+LMB or actinomycin D.
Actinomycin D is another transcription inhibitor, which induces stalling of the whole RNAPII complex on DNA within the nucleus. Therefore, actinomycin D is not expected to induce the disassembly of the complex. By comparing untreated versus actinomycin D-treated cells, we could indeed observe that actinomycin D has no effect on the association of Rpb1 with the other RNA polymerase II subunits. In addition, when directly comparing α-amanitin+LMB versus actinomycin D treatments, it was clear that the association between Rpb1 and the R2TP/prefoldin-like complex is much stronger in cells treated with α-amanitin+LMB. This confirmed that unassembled Rpb1 is specifically associated with the R2TP/prefoldin-like complex and that this association is not an indirect consequence of transcription inhibition. This example shows that the triple SILAC IP strategy can be applied efficiently to the high confidence identification of specific interactions and to the analysis of protein complex dynamics across several conditions (three different conditions were compared in this study). Here, data mining is enhanced by the integration of several major criteria, including reliable quantitative SILAC ratios. In addition, the quality of MS data depends strongly on the number of peptides identified and quantified for each protein and on the total sequence coverage. These parameters should therefore also be taken into account to evaluate the reliability of MS results. Interestingly, this SILAC IP strategy can be combined with complementary approaches, including Y2-H, to characterize binary interactions within protein complexes, and fluorescence microscopy, to uncover the subcellular localization of protein interactions (Boulon et al., 2010b).
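The two additional quality criteria mentioned above, the number of quantified peptides and the sequence coverage, are straightforward to compute. The following Python sketch uses a hypothetical input format and illustrative default thresholds that would need to be set for each experiment:

```python
def sequence_coverage(protein_length, peptide_spans):
    """Fraction of residues covered by at least one identified peptide.

    peptide_spans: list of (start, end) positions, 1-based and inclusive.
    Overlapping peptides are counted once.
    """
    covered = set()
    for start, end in peptide_spans:
        covered.update(range(start, end + 1))
    return len(covered) / protein_length


def passes_quality(n_quantified_peptides, coverage, min_peptides=2, min_coverage=0.05):
    """Hypothetical quality filter; both thresholds are illustrative assumptions."""
    return n_quantified_peptides >= min_peptides and coverage >= min_coverage


spans = [(1, 10), (5, 20), (50, 59)]  # hypothetical peptide positions
cov = sequence_coverage(100, spans)
print(cov, passes_quality(len(spans), cov))  # 0.3 True
```

Such a filter would typically be applied before interpreting SILAC ratios, so that proteins quantified from a single peptide are flagged as less reliable rather than silently trusted.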

Optimization of experimental procedures
As discussed previously, one major challenge of AP-MS experiments is the reliable discrimination between genuine protein interaction partners and non-specific contaminants. This is facilitated both (i) by optimizing the experimental procedure to increase IP efficiency (high specific signal) and to reduce the background of contaminants (low non-specific noise), thereby achieving a high signal/noise ratio (Boulon et al., 2010a; Trinkle-Mulcahy et al., 2008), and (ii) by an efficient data analysis pipeline that allows putative contaminants to be reliably identified. I will first discuss the important points to consider when optimizing a triple SILAC IP workflow. The triple SILAC IP protocol described in Figure 1 is shown for the IP of GFP-tagged baits and the identification of their specific interaction partners in two different conditions, i.e. untreated versus treated cells. However, the triple SILAC co-IP procedure is far from restricted to this example and can be applied to many different types of investigation. One has to keep in mind, though, that both the reliability and the sensitivity of the resulting datasets depend strongly on the experimental parameters chosen. It is therefore necessary to anticipate possible pitfalls and to design an "optimal" protocol suited both to the question asked and to the tools available (ten Have et al., 2011). Important features include the choice of the tag/antibody, the conjugation of antibodies to affinity matrices and the IP protocol.

Choice of the tag/antibody
Both tag-based and endogenous pull-down experiments have advantages and drawbacks. Whenever possible, antibodies targeting the endogenous bait should be favored. Indeed, endogenous proteins avoid several problems usually associated with tags: they are naturally expressed in their native cellular environment, with correct expression regulation, PTMs and, above all, proper interaction partners. However, this strategy relies on a specific, high-affinity antibody that isolates the endogenous bait protein efficiently, which is often not available. In any case, antibody affinity and specificity should always be checked carefully. Notably, the Swedish Human Protein Atlas project, funded by the Knut and Alice Wallenberg Foundation, was initiated to generate high-quality, affinity-purified human antibodies in a high-throughput manner, allowing a systematic exploration of the human proteome by antibody-based proteomics (Uhlen et al., 2010). By May 2011, 11,300 Prestige Antibodies covering more than 50% of the human proteome had been developed (http://www.proteinatlas.org). Not all of them have been tested for IP efficiency, however, and there may still be a long way to go before all human proteins can be immunoprecipitated using this antibody library. In contrast, tagged baits provide a scalable and general method to identify specific protein interaction partners. Different types of tags are commonly used in affinity-purification experiments, such as fluorescent tags (e.g. GFP), the His tag and the Flag tag. In addition, a TAP (Tandem Affinity Purification) tag methodology can be used instead of a one-step procedure (Rigaut et al., 1999). Although this two-step method reduces the amount of contaminants recovered in the IP eluate, it also decreases the overall protein yield and risks losing biologically relevant low-affinity and/or low-abundance interaction partners. Alternatively, the GFP tag has proven effective for affinity purification, owing (i) to its low background of non-specific interactions and (ii) to the efficient recovery made possible by the recently developed GFP-Trap® (Chromotek) affinity matrices (Rothbauer et al., 2008; Trinkle-Mulcahy et al., 2008). In addition, the GFP tag can be used in a dual strategy combining fluorescence microscopy and affinity purification (Trinkle-Mulcahy et al., 2008). All tags, however, can potentially affect protein structure, localization and turnover, altering both protein function and association with specific partners. This problem may be countered by testing different tag positions, for example at the C or N terminus. The fact that recombinant proteins are usually overexpressed in mammalian cells represents another important perturbation of the system. Interestingly, the BAC TransgeneOmics strategy, developed by the Hyman lab, allows GFP-tagged proteins to be expressed under their endogenous promoters and can be used in high-throughput approaches for the identification of specific interaction partners, such as QUBIC (QUantitative BAC-green fluorescent protein InteraCtomics) (Hubner et al., 2010; Poser et al., 2008). In all cases, generating stable cell lines expressing recombinant proteins, rather than using transient transfection, avoids problems linked to the heterogeneity of gene integration and expression levels between cells.

Conjugation of antibodies to affinity matrices
Antibodies are conjugated, covalently or not, to bead matrices (e.g. sepharose, agarose and magnetic beads). When combined with MS, covalent conjugation of the antibody to the beads is highly recommended; otherwise large amounts of antibody can be eluted from the beads along with the specific protein complexes and compete with other proteins for MS identification. The type of beads used for each pull-down experiment is also worth considering, as the efficiency and cleanliness of different bead types may vary according to the cell type and the type of extract used. In our experience, Dynabeads (Invitrogen) work well for nuclear extracts, whereas Sepharose and Agarose beads (GE Healthcare) can give lower backgrounds with cytoplasmic and whole cell extracts (Trinkle-Mulcahy et al., 2008).

Cell extraction and immunoprecipitation protocol
Depending on the protein complexes of interest, cell lysis and protein extraction may be a challenging part of the procedure. In particular, membrane proteins and proteins attached to macromolecular entities, including chromatin and subnuclear compartments, are difficult to release and are therefore often under-represented in protein interaction studies using "normal" extraction procedures. Specific purification protocols may thus be considered, such as the modified chromatin immunopurification (mChIP) method (Lambert et al., 2009). To reduce the amount of non-specific binding in a co-IP experiment, several options may be considered, including a pre-clearing step (pre-incubation of cellular extracts with bead matrices alone), keeping incubation times to a minimum (1 h maximum) and using high-stringency buffers (e.g. buffers with adjusted detergent and salt concentrations). As with the TAP-tag strategy, increasing buffer stringency may reduce the number of false positives but also increase the number of false negatives, by losing precious transient interaction partners, which are certainly the most difficult, but also the most interesting, proteins to identify. Therefore, to preserve all genuine interaction partners, both stable and transient, medium- or low-stringency buffers may be favored. As a result, however, many contaminants remain in the analysis and need to be reliably identified and distinguished from the specific interaction partners.

An additional criterion to identify putative contaminants: The Protein Frequency Library
Even though SILAC IP strategies have proven successful in identifying stable interaction partners, relying on isotope labeling ratios alone does not entirely solve the contaminant problem. Indeed, no single ratio threshold can unambiguously separate non-specific binders from genuine interaction partners (Figure 2). There is usually no doubt about proteins identified with high SILAC ratios, which are often genuine stable interaction partners, but every SILAC IP experiment also contains low-abundance and/or low-affinity genuine interaction partners (transient interactions) that show low SILAC ratios (between 1 and 1.5 to 2) and are therefore embedded in the background of contaminants. A high threshold eliminates both contaminants and transient interaction partners, whereas an overly cautious low threshold keeps both. Hence, SILAC ratios alone cannot consistently and unambiguously separate contaminants from specific interaction partners. To address this issue, a new methodology called the Protein Frequency Library (PFL) was developed, which provides an additional objective criterion for data analysis (Boulon et al., 2010a). The principle of the PFL is based on the knowledge that proteins frequently […] in the data repository (x axis). When proteins are sorted from the highest to the lowest percentage (Figure 3B), the proteins appearing nearest the origin of the graph have the highest probability of being contaminants. The PFL can be applied to data from any MS pull-down experiment, as an additional criterion to evaluate the probability that each identified protein is a false positive binding non-specifically to the affinity matrix. When applied to SILAC data, the PFL results can be superimposed on the 2D graph plotting M/L on the x axis and H/M on the y axis, by highlighting proteins whose frequency of detection is above a chosen threshold value. The optimal threshold has to be determined according to the number of experiments used to generate the PFL and is expected to decrease as new data are added to the repository (Boulon et al., 2010a). A multidimensional structure that includes all datasets and their associated metadata allows the PFL to be filtered to obtain protein detection frequencies relevant to each specific set of experimental parameters. Of all experiments recorded in the database, only those performed with the chosen set of experimental parameters are used, which yields a "customized" PFL (Figure 3A). This is of great importance, given that the nature of IP contaminants is highly correlated with the experimental parameters used. For example, contaminants differ greatly according to the bead type chosen, e.g. magnetic or sepharose beads. We have indeed shown that cytoskeletal proteins "stick" to magnetic beads (Dynabeads), whereas positively charged nuclear proteins are more prone to bind non-specifically to sepharose beads (Trinkle-Mulcahy et al., 2008). The PFL can therefore be considered a dynamic list of "contaminants" that can be filtered for each specific set of experimental parameters. This avoids the need for a large set of control experiments exhaustively covering every possible combination of experimental parameters. The PFL is thus equally applicable to low- and high-throughput IP experiments. Its use is not restricted to the Lamond laboratory: the PFL is now freely accessible online after registration (http://www.peptracker.com/datavisual/). Figure 3A shows a PFL interface that can be used to specify the experimental parameters on which the library is filtered, e.g. organism, cell extract, bead type, etc. All users can therefore select their own experimental parameters and obtain a list of putative contaminants for this specific set of conditions. However, a minimum of 15 independent IP experiments in the experimental count (the number of experiments taken into account to generate the new customized PFL) may be necessary to provide reliable results. Of note, the PFL is a dynamic tool that is automatically updated as data from new experiments are added to the data repository, thereby increasing in accuracy. The current PFL is necessarily limited to the experiments performed in the Lamond laboratory. However, it is foreseen that external users will in the future be able to upload their own data, thereby broadening the spectrum of experimental parameters available and the impact of the PFL on the scientific community. In my experience, the PFL is especially helpful in identifying "outsiders", i.e. genuine interaction partners of low abundance and/or low affinity, which are otherwise lost among the large non-specific background of contaminants and therefore often overlooked in AP-MS studies. Interestingly, this tool is an example of meta-analysis: the PFL is generated by integrating data from many independent MS IP experiments, performed by independent researchers using various experimental parameters. This process therefore allows better data mining. One essential point to note is that any such analysis relies on recorded associated metadata, which provide crucial support for improved data analysis.
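The principle of a customized PFL, as described above, can be illustrated with a short sketch: experiments are first filtered on their metadata, the percentage of selected experiments in which each protein was detected is computed, and proteins above a chosen frequency threshold are flagged as putative contaminants. This is an illustration under assumptions, not the implementation behind the online PFL: the `Experiment` record, the metadata keys and the protein names are hypothetical, and the 40% threshold is arbitrary. Only the minimum of 15 experiments follows the recommendation given above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Experiment:
    metadata: dict  # e.g. {"organism": "human", "beads": "sepharose"}
    proteins: set   # identifiers of proteins detected in this pull-down


def customized_pfl(experiments, min_experiments=15, **filters):
    """Detection frequency (%) of each protein across matching experiments."""
    selected = [e for e in experiments
                if all(e.metadata.get(key) == value for key, value in filters.items())]
    if len(selected) < min_experiments:
        raise ValueError(f"only {len(selected)} matching experiments, "
                         f"fewer than {min_experiments}")
    counts = Counter(p for e in selected for p in e.proteins)
    n = len(selected)
    pfl = {p: 100.0 * c / n for p, c in counts.items()}
    # Most frequently detected proteins (most likely contaminants) first.
    return sorted(pfl.items(), key=lambda item: (-item[1], item[0]))


def flag_putative_contaminants(pfl, threshold_pct):
    """Proteins detected in at least threshold_pct percent of experiments."""
    return {p for p, freq in pfl if freq >= threshold_pct}


# Hypothetical repository: 16 sepharose and 4 magnetic-bead experiments.
experiments = []
for i in range(16):
    detected = {"histone_H1", f"partner_{i}"}
    if i % 2 == 0:
        detected.add("ribosomal_L7")
    experiments.append(Experiment({"organism": "human", "beads": "sepharose"}, detected))
for i in range(4):
    experiments.append(Experiment({"organism": "human", "beads": "magnetic"},
                                  {"actin", "histone_H1"}))

pfl = customized_pfl(experiments, organism="human", beads="sepharose")
contaminants = flag_putative_contaminants(pfl, threshold_pct=40.0)
```

Rather than being discarded outright, the flagged proteins could be highlighted on the M/L versus H/M plot, so that a protein with a convincing SILAC ratio but a high detection frequency is examined rather than automatically rejected.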

Standardization of data analysis and storage
The SILAC IP strategy presented in this chapter can be used in low-throughput studies, but it can also be scaled up to support large-scale surveys of protein interactome dynamics. In any case, assembling comprehensive interactome maps will require the integration of both low- and high-throughput protein interaction studies. In fact, small-scale datasets are of great value to protein interaction databases, provided they are supported by enough metadata (organism, bait, cell type, treatment, etc.), as they often report high-resolution analyses that provide details missing from large-scale datasets, such as binding sites and dynamic information, thereby increasing the local coverage of large interactome networks (Orchard and Hermjakob, 2011; Sanderson, 2009). The main challenge faced by the scientific community is not generating more protein interaction data from either low- or high-throughput studies. In fact, the number of interaction studies is increasing exponentially as new technologies emerge and become accessible to most research groups worldwide. Instead, the major issues lie in the quality and homogeneity of the interaction data generated. Poor-quality data that cannot be interpreted or exploited are of little interest to the scientific community. Data generated in different groups, using different instruments and therefore different file formats and analysis pipelines, cannot be accurately compared with each other, leading to the accumulation of independent datasets that cannot be integrated. As all published data are inherently of variable quality, there is a need to increase the overall reliability of interaction datasets and to develop data standards. This will rely on strict quality control of all data uploaded to public repositories. As proposed by Olsen and Mann, high-quality data could be selected through "social-network like mechanisms", which would calculate confidence scores for each specific result (e.g. each protein-protein interaction) based on the number of times it is retrieved in independent studies using different techniques (Olsen and Mann, 2011). This would help eliminate results of poor reliability and thereby enhance confidence in protein interaction databases. Data standardization can probably be considered one of the main challenges in the field. Currently, interaction data can be found in many different formats, depending on the vendor and on the analysis pipeline. Therefore, creating a common standard data format for the scientific community, which would facilitate the exchange, comparison and integration of datasets, is absolutely essential and requires intense international coordination. In 2004, a consortium of databases, including BIND, DIP, IntAct, MINT and MIPS, agreed to develop a community standard data model for the representation and exchange of protein interaction data (Hermjakob et al., 2004). These databases were grouped into IMEX (International Molecular interaction EXchange) (Orchard et al., 2007). The standard format, PSI-MI XML, was developed by members of the Molecular Interaction (MI) group of the Proteomics Standards Initiative (PSI) of the Human Proteome Organization (HUPO). Of note, the PSI-MI format cannot yet handle quantitative MS data. For the storage of MS data, the PRIDE (PRoteomics IDEntifications) database, hosted at the EBI (European Bioinformatics Institute), is a centralized, standards-compliant public repository for MS-based proteomics data that compiles protein and peptide identifications (http://www.ebi.ac.uk/pride). Along with the standardization of protein interaction data formats, efficient and reliable recording of metadata is crucial to better analyze and exploit datasets. Indeed, metadata provide information required for data mining, comparison and retrospective studies, as shown for the PFL. To address this issue, a HUPO project has led to the development of MIMIx (Minimum Information required for reporting a Molecular Interaction experiment) (Orchard et al., 2007). As described by Orchard et al., MIMIx represents a "compromise" between the vast amount of information that would be necessary to precisely describe and reproduce an interaction experiment, which should be present in any original publication, and the burden placed on scientists who upload their data into databases. As guidelines, the MIMIx checklist specifies several experimental parameters that need to be accurately reported, including the host organism, correct molecule identifiers from major databases (UniProt and RefSeq), the detection method, etc. In addition, a proper controlled vocabulary should be used (for example, bait/prey), and confidence values should be attributed to the interaction whenever possible (Orchard et al., 2007). These guidelines may (i) increase the usefulness and clarity of publications reporting interaction data and (ii) improve the systematic recording of protein interaction data in public resources, making them accessible to a wider community. Finally, efficient analysis of protein interaction data relies on powerful visualization techniques that can represent large and dynamic interactome network maps. There is clearly a large demand for such tools, which will need to be met by the development of new visualization software. Many tools are already available, including Cytoscape, whose plugins allow it to interact with relational databases, Osprey, which is associated with the BioGRID database, GeneGo MetaCore and Ingenuity. These tools provide good graphical interfaces to visualize protein interactome networks, although downstream data analysis may also rely on lab-oriented tools such as the PFL.
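To make the notions of minimum metadata and recurrence-based confidence more concrete, the sketch below scores each interaction by the number of distinct publications and distinct detection methods reporting it, and skips records with incomplete metadata. The field list is a small hypothetical subset inspired by the MIMIx checklist, not the checklist itself, and the scoring is only one simple interpretation of the "social-network like" idea of Olsen and Mann; identifiers such as `P1` are placeholders rather than real UniProt accessions.

```python
from collections import defaultdict
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class InteractionRecord:
    # Hypothetical minimal subset of MIMIx-style fields.
    host_organism: str
    bait_id: str           # placeholder identifiers, not real accessions
    prey_id: str
    detection_method: str  # e.g. "AP-MS", "Y2H"
    publication: str


def missing_fields(record):
    """Names of metadata fields left empty in a record."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


def confidence_scores(records):
    """Map each undirected pair to (n distinct publications, n distinct methods)."""
    studies = defaultdict(set)
    methods = defaultdict(set)
    for r in records:
        if missing_fields(r):
            continue  # incomplete metadata: excluded from scoring
        pair = frozenset((r.bait_id, r.prey_id))
        studies[pair].add(r.publication)
        methods[pair].add(r.detection_method)
    return {pair: (len(studies[pair]), len(methods[pair])) for pair in studies}


records = [
    InteractionRecord("Homo sapiens", "P1", "P2", "AP-MS", "study_A"),
    InteractionRecord("Homo sapiens", "P2", "P1", "Y2H", "study_B"),
    InteractionRecord("Homo sapiens", "P1", "P2", "AP-MS", "study_A"),  # repeated report
    InteractionRecord("Homo sapiens", "P1", "P3", "AP-MS", "study_C"),
    InteractionRecord("Homo sapiens", "P1", "P4", "", "study_D"),  # no detection method
]
scores = confidence_scores(records)
```

Here the P1-P2 interaction, reported by two independent studies with two different techniques, scores higher than P1-P3, while the P1-P4 record is not scored at all because its detection method is missing.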

Conclusion
The scientific community has invested immense effort in mapping the human protein interactome network. However, the characterization of a static interactome only provides a list of possible interactions, without addressing when these interactions occur and how they are regulated. It is therefore necessary to move toward a more functional analysis by studying the dynamics of protein interactions. The combination of SILAC-based quantitative proteomics with affinity purification techniques currently provides a reliable strategy both to identify specific protein interaction partners and to analyze subtle changes in protein interactions between different conditions. One can envision that the development of new techniques and analysis tools will also favor the use of label-free approaches in the future. However, despite a growing number of outstanding studies and the increasing performance of technologies, many challenges remain before a dynamic map of the human interactome can be assembled. In particular, a coordinated international effort to standardize data formats and develop powerful software for data analysis and visualization may make it possible to efficiently exploit, compare and integrate datasets generated all over the world, thereby increasing the reliability and usefulness of protein interaction data. New insights into the dynamics of the human protein interactome would undoubtedly benefit both basic and clinical sciences, by providing essential information about the function of individual proteins, the connections between them and the functional organization of the cell as a whole system. This may rely on the identification of key protein "hubs", i.e. proteins that are highly connected in interactome networks and may have crucial roles in specific disease pathways. Interestingly, significant correlations have been found between protein interactome maps and disease-associated gene networks, suggesting that protein interactomes could be used predictively to identify non-intuitive disease-related genes and putative drug targets.
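In its simplest form, a hub can be identified by its degree, i.e. the number of distinct partners it has in the interactome network. The sketch below assumes that interactions are given as an undirected edge list; the `degree_map` and `hubs` functions, the toy network and the degree cut-off are hypothetical, and real network analyses often rely on more elaborate centrality measures.

```python
from collections import defaultdict


def degree_map(edges):
    """Number of distinct partners per protein in an undirected network."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # self-interactions do not add partners
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(partners) for node, partners in neighbours.items()}


def hubs(edges, min_degree):
    """Proteins with at least min_degree partners, most connected first."""
    degree = degree_map(edges)
    return sorted((n for n, d in degree.items() if d >= min_degree),
                  key=lambda n: (-degree[n], n))


# Toy network; ("A", "H") repeats the H-A interaction in reverse orientation.
edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"),
         ("A", "B"), ("C", "E"), ("A", "H")]
top = hubs(edges, min_degree=3)
```

Because partners are stored as sets, the repeated H-A interaction is counted once, so protein H has four partners and is the only protein passing the cut-off of three.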

Acknowledgment
I am very grateful to Yasmeen Ahmad for critical reading of the chapter. I thank Aymeric Bailly for advice and suggestions. I thank the Lamond and Bertrand laboratories for fruitful discussions regarding the development of the strategies described in this chapter. I apologize to those investigators whose studies were not included in this chapter due to space limitations. This work was supported by a Human Frontier Science Program long-term fellowship to the author.
Fig. 1. Overview of triple labeling SILAC analysis of protein interaction partners. (A) Overview showing the workflow of a representative triple SILAC IP analyzing changes in the specific interaction partners of a GFP-tagged bait stably expressed in U2OS cells in response to a drug treatment. References to the R0K0, R6K4 and R10K8 culture conditions can be found in the body text. (B) Diagram illustrating the SILAC principle of differential labeling. The bait and its specific interaction partners should only be retrieved in the medium and heavy conditions, thereby showing high M/L and/or H/L SILAC ratios, whereas non-specific contaminants are present in all three conditions, thereby showing M/L and H/M ratios close to 1. (C) Typical MS spectra obtained for representative peptides of a specific interaction partner (top), an experimental contaminant binding non-specifically to the affinity matrix (middle) and an external environmental contaminant (bottom). IP: immunoprecipitation; L: light; M: medium; H: heavy; GFP-Trap_A®: GFP-binding protein coupled to a monovalent matrix (Chromotek).