Dynamic Proteomics: Methodologies and Analysis

Proteins are dynamic, and any detailed description of the proteome must reflect the variations in protein properties. For example, most proteins form complexes with other protein partners, can undergo various post-translational modifications and can accumulate in different subcellular compartments. Spatial and temporal variations between proteins in different compartments and/or cell types mean that each experiment for mass spectrometric analysis must be carefully designed to optimise the data that can be obtained. Recent improvements in experimental methodologies, and in the resolution and sensitivity of mass spectrometers, have expanded the complexity of proteomic analysis that is now possible [1]. In this chapter we outline current workflows and methodologies that facilitate complex proteomic analyses, from the design and execution of experiments through to the analysis and interpretation of the resulting data.


Introduction
SILAC labelling can be used to quantify a wide range of biological experiments based on differential comparisons of two or three cell states or conditions. For example, immuno-precipitation and protein-protein interaction analysis, cellular fractionation for localisation studies, and measurements of protein synthesis, degradation and turnover can all be quantified using the SILAC approach [2][3][4][5][6][7]. The SILAC approach can also be used to carry out high throughput analyses on entire proteomes and can help to identify subsets of proteins that respond to specific cellular perturbations.
Reliable interpretation of SILAC data requires computational analysis. A typical data set comprises numerous peptide and protein identifications, with several isotope ratio and/or intensity values associated with each identification. Widely accessible spreadsheet applications such as Excel are commonly used for this task. The interpretation of these data is often the most complex part of a proteomics experiment. This chapter discusses how to approach data quality assurance and culling, and how to model the data so that valid conclusions can be drawn.
Mass spectrometry-based proteomics is evolving into a multidimensional form of analysis (e.g. identification, quantification, space and time), in which we not only identify and quantify the proteome but also characterise changes in protein properties (e.g. subcellular location) at different time points and under different conditions (e.g. response to a drug treatment). These types of analyses help to provide a functional characterisation of the genome and may facilitate the application of proteomics to clinical studies.

Mass Spectrometry of proteins
Mass spectrometry of proteins is based on several principles of chemistry and physics, namely the measurement of mass and the generation of charged molecules, or ions. Given the known composition of amino acids (Figure 1) and the inferred knowledge of protein composition (the protein products are predicted from the genetic code determined by the Human Genome Project [8]), we can compute in silico the predicted molecular weight of every protein.
Additionally, the mass change resulting from any modification (for example, reduction and alkylation of cysteine di-sulphide bonds in the laboratory) can be accurately predicted, so matches can be made between these calculated values and the experimental ion masses measured in a mass spectrometer.
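As a minimal sketch of this in silico calculation, the following Python computes the monoisotopic mass and m/z of a peptide from its sequence, optionally adding the mass of a carbamidomethyl group to each cysteine. The function names and the table layout are illustrative assumptions, not part of any particular software package; the residue masses are standard monoisotopic values rounded to five decimal places.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565            # H2O accounting for the free N- and C-termini
PROTON = 1.007276            # charge carrier in positive ion mode
CARBAMIDOMETHYL = 57.021464  # added to each alkylated cysteine


def peptide_mass(sequence, alkylated=True):
    """Neutral monoisotopic mass of a peptide (hypothetical helper)."""
    mass = WATER + sum(RESIDUE_MASS[aa] for aa in sequence)
    if alkylated:
        # Every cysteine is assumed to carry the modification.
        mass += CARBAMIDOMETHYL * sequence.count("C")
    return mass


def peptide_mz(sequence, charge, alkylated=True):
    """m/z of the peptide ion carrying `charge` protons."""
    return (peptide_mass(sequence, alkylated) + charge * PROTON) / charge
```

Comparing such predicted values with the measured ion masses, within the mass tolerance of the instrument, is the basis of the matching described above.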
For complex protein mixtures (e.g. cellular lysates, immuno-precipitates, whole organism and/or tissue lysates), protein analysis is typically performed using the following methodology: protein solubilisation; a separation step to fractionate the complex protein mixture (e.g. gel electrophoresis, isoelectric focussing or size exclusion chromatography); reduction and alkylation (to disrupt di-sulphide bonds in proteins and add a carbamidomethyl modification to cysteine residues, which inhibits di-sulphide re-formation); digestion using a proteolytic enzyme such as trypsin (bottom-up, or peptide-level, analysis is the most common form of protein analysis in mass spectrometry); reversed-phase HPLC (which reduces the complexity of peptide samples sufficiently for the instrument to measure the individual ions); and electro-spray ionisation and tandem mass spectrometry (Figure 2). Through electro-spray ionisation, peptides become charged (mostly positively), and this charge is used to control their movement through the instrument. The mass spectrometer then performs a survey scan (characterising all of the peptide ion masses present in a given time window), followed by several sequencing scans, which isolate and fragment peptides, one ion mass at a time, by colliding each selected peptide ion with inert gas molecules, thereby generating fragment or daughter ions. These fragment ions characterise the amino acid sequence of the selected peptide. Using this analysis strategy, the current generation of mass spectrometers can generate ~30,000 or more spectra from a typical protein lysate, thereby identifying and quantifying hundreds to thousands of separate proteins, depending on the complexity of the sample.
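The digestion step can be mimicked in silico to predict which peptides a protein should yield. The sketch below is illustrative only: it cleaves after lysine (K) or arginine (R), and it assumes the commonly applied rule that trypsin does not cleave when the next residue is proline, which is not stated in the text above. The function name and the `missed_cleavages` parameter are hypothetical.

```python
def tryptic_digest(protein, missed_cleavages=0):
    """Return the peptides expected from a tryptic digest of `protein`.

    Cleavage is assumed after K or R, except when followed by P.
    Peptides spanning up to `missed_cleavages` uncut sites are included.
    """
    if not protein:
        return []
    # Positions at which the chain is cut (index of the residue after K/R).
    sites = [
        i + 1
        for i, aa in enumerate(protein[:-1])
        if aa in "KR" and protein[i + 1] != "P"
    ]
    bounds = [0] + sites + [len(protein)]
    peptides = []
    for start in range(len(bounds) - 1):
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end >= len(bounds):
                break
            peptides.append(protein[bounds[start]:bounds[end]])
    return peptides
```

For example, `tryptic_digest("AAKGGRPLLKEE")` gives the three peptides AAK, GGRPLLK and EE; the arginine followed by proline is left uncut under the assumed rule.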

SILAC
Relative quantification of proteins in two or more samples is aided by isotope labelling techniques such as SILAC [9]. SILAC (Stable Isotope Labelling of Amino acids in Cell culture) is a quantitative method in which specific amino acids (typically arginine and/or lysine) carrying heavy, non-radioactive isotopes of carbon, nitrogen and hydrogen (namely 13C, 15N and deuterium) are incorporated into proteins in cell culture, by growing cells in media depleted of the corresponding unlabelled amino acids. After approximately 6 cell division cycles the vast majority of proteins are completely substituted with the heavy isotope-labelled amino acids (Figure 3) [9]. These basic amino acids are the cleavage sites for the enzyme trypsin, so tryptic peptides typically contain a single SILAC-labelled residue. Modern mass spectrometers are extremely sensitive instruments that can detect the changes in mass caused by the presence of different isotopes. For example, 'medium' labelling of proteins is generally achieved using L-arginine-13C6,14N4 and L-lysine-2H4 (R6K4). The 'medium' labelled arginine and lysine therefore have an increased mass of 6 Da and 4 Da, respectively, relative to the normal 'light' isotopes in naturally occurring amino acids. The MS spectra display these differences as distinctive double (for 2 SILAC labels) or triple (for 3 SILAC labels) peaks, offset from the mass of the endogenous/light peptide. Using this technology, experimental scenarios have been established that allow the characterisation of the dynamic proteome [2-4, 6, 7]. SILAC-specific instructions and some product information can be found in the Supplementary Methods section under 'SILAC'. Please note that other suppliers are available and can be found on the internet.
Note: 'Light labelled' proteins should be prepared from cells grown on media identical to the heavy labelled media (i.e. the same media, to which amino acids with the normal 'light' isotopes and dialysed calf serum are added). Using 'normal' non-SILAC media is not sufficient, as this will have a different composition, due mostly to the non-dialysed calf serum used in typical media. Non-dialysed calf serum may contain more small molecules than dialysed serum, which could change the growth rate of cells and therefore give differential growth conditions between the control sample and the sample of interest.
Figure 3. The principle of SILAC. Cells that have been grown for >6 generations in SILAC media contain proteins completely substituted with heavy isotope-labelled amino acids. These different mass labels can be used for distinguishing and comparing proteins in a wide range of experimental conditions. When labelled cells are mixed with a control sample carrying a different SILAC label, the resulting spectra of each peptide allow accurate relative quantitation.
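The expected spacing between the light, medium and heavy peaks of a peptide follows directly from its arginine and lysine content. The sketch below illustrates this; the text above gives only the nominal shifts (6 Da and 4 Da for R6K4), so the more precise monoisotopic shifts in the table, including those for the heavy R10K8 label (taken here to be 13C6,15N4-arginine and 13C6,15N2-lysine), are assumptions of this example, as are the function names.

```python
# Mass added per labelled residue (Da). Nominal values are 6/4 (medium)
# and 10/8 (heavy); the decimals are assumed monoisotopic shifts.
LABEL_SHIFT = {
    "light":  {"R": 0.0,       "K": 0.0},       # R0K0
    "medium": {"R": 6.020129,  "K": 4.025107},  # R6K4
    "heavy":  {"R": 10.008269, "K": 8.014199},  # R10K8
}


def label_shift(peptide, label):
    """Total mass shift of `peptide` relative to its light form."""
    shifts = LABEL_SHIFT[label]
    return sum(shifts.get(aa, 0.0) for aa in peptide)


def mz_spacing(peptide, label, charge):
    """Distance on the m/z axis between the light and labelled peaks."""
    return label_shift(peptide, label) / charge
```

A doubly charged peptide ending in a single lysine would therefore show its medium peak about 2 m/z units, and its heavy peak about 4 m/z units, above the light peak. A peptide containing neither arginine nor lysine shows no shift and cannot be quantified by SILAC.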

Protein interaction proteomics
As mentioned in the introduction, many proteins do not act in isolation but form complexes with partner proteins, and a major goal is to identify the detailed composition of these multi-protein complexes. However, the dynamic nature of the proteome means that there may not be a unique description of a protein complex. For example, at different cell cycle stages and/or under changed conditions (e.g. following drug treatment) the partner proteins in a complex might change, or vary in abundance and/or modification state. Our aim, therefore, is to analyse both the composition and the dynamic nature of protein complexes.
Figure 4. Immuno-precipitation. The beads used in immuno-precipitation experiments contribute a significant amount of non-specifically binding proteins, which therefore need to be accounted for with a bead control. This is prepared using cell lysate and the beads of choice, without the antibody, giving a sample that contains only background proteins. With this information the genuine interactors can more easily be determined.
Immuno-precipitation, which harnesses the specificity of antibodies for their protein targets, has long been the gold standard for protein interaction experiments, particularly in combination with traditional western blot analysis. With the application of mass spectrometry to characterise immuno-precipitates, the analysis has now expanded to identifying hundreds of proteins in each IP. A large percentage of the proteins pulled down in an immuno-precipitation experiment bind non-specifically, for example to the beads used as the solid substrate for the antibody rather than to the bait or target protein (Figure 4). The beads often have a high general binding affinity for protein [4,10]. Without good controls, these non-specifically binding proteins can obscure identification of the specific proteins of interest. To allow for this, a bead control is often included as part of the experiment. A bead control is prepared by applying an equal amount of the cell lysate of choice to the beads being used for the immuno-precipitation, without antibody. This generates a sample that will predominantly contain non-specific binding proteins, which can be identified during the analysis and distinguished from the genuine protein interaction partners in the complex of interest. While label free MS analysis is effective for this protocol, differential SILAC labelling to distinguish the control and experimental conditions (e.g. R0K0 for the bead control, R6K4 for the protein of interest, R10K8 for the protein plus drug) improves the accuracy of quantitation and the efficiency of this kind of analysis.
Extensive analysis of hundreds of immuno-precipitations, with lysates from various cell lines and bead types, has been used to generate a database recording protein identification frequencies, i.e. the number of previous experiments in which any given protein was identified [4]. The more often a protein is identified in different IP experiments, the more likely it is to be a non-specific binder. This protein frequency library is available as an online resource for comparing immuno-precipitation results; see http://www.peptracker.com/datavisual/.
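The two lines of evidence described above, enrichment over the bead control and the frequency with which a protein has been seen in previous IPs, can be combined in a simple filter. The sketch below is purely illustrative: the function name, the category labels and both thresholds are hypothetical and would need to be chosen from the distribution of the actual data.

```python
import math


def classify_interactor(ratio_vs_beads, prior_frequency,
                        min_log2_ratio=1.0, max_frequency=0.3):
    """Classify one protein from a SILAC IP (hypothetical thresholds).

    ratio_vs_beads  : SILAC ratio of the IP over the bead control (> 0)
    prior_frequency : fraction of previous IP experiments (0 to 1) in
                      which the protein was identified
    """
    enriched = math.log2(ratio_vs_beads) >= min_log2_ratio
    rarely_seen = prior_frequency <= max_frequency
    if enriched and rarely_seen:
        return "candidate interactor"
    if enriched:
        return "enriched but frequently seen"
    return "likely background"
```

A protein with a ratio of 8 over the bead control that was seen in 5% of earlier experiments would be reported as a candidate interactor, whereas the same ratio for a protein seen in 90% of earlier experiments would be flagged for closer inspection rather than accepted outright.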
A basic immune-precipitation protocol can be found in the Supplementary Methods section under 'Immuno-precipitation Protocol'.

Dynamic proteomics: How it's done
The use of SILAC labelling enables a wide range of assay formats to be designed for quantitative comparison of protein properties under different conditions. For example, using SILAC in conjunction with cellular fractionation, immuno-precipitation and time course experiments, it is possible to analyse the kinetics of protein transport, synthesis, degradation and interaction.
Using a combination of physical and chemical separation methods, including differential density centrifugation, it is possible to fractionate cells and isolate subcellular organelles and components such as cytoplasm, nucleoplasm and membranes. Commercial kits are also available that can be used to fractionate cells before MS analysis. The cellular fractionation we use most commonly concentrates on distinguishing between cytoplasmic and nuclear localisation of proteins in eukaryotic cells, allowing analysis of the compartmentalisation of protein function and of nucleo-cytoplasmic transport under different cell growth conditions and responses [3,11]. Figure 5 illustrates the procedure, and the specifics of the methodology can be found in the Supplementary Methods under 'Cellular Fractionation'.
The principles of the fractionation strategy, as applied to mammalian cells grown in culture, are as follows. A hypotonic (low salt) buffer is applied to freshly trypsinised cells, which causes the cells to swell; gentle mechanical disruption with a Dounce homogeniser then ruptures the outer cell membrane. The resulting suspension is centrifuged such that larger organelles, including the nucleus (which at this stage is intact), spin down into a pellet, whilst the soluble material and smaller cytoplasmic material stay in the supernatant. Thereafter stronger mechanical disruption (e.g. sonication) is employed to lyse the nucleus, and one or more additional fractionation steps (e.g. density gradients) are used to separate organelles and other subcellular structures based on properties such as their size, density and/or shape.
This can be combined with MS-based approaches and SILAC to determine changes in the subcellular organisation of the proteome induced by stress or other perturbations (e.g. UV or drug treatment). This is done by growing cells in media with different SILAC labels, using one of the labels (e.g. 'light') as an untreated control sample while exposing cells grown in a different label (e.g. 'medium' or 'heavy') to the perturbation. After incubating for the desired time, which will vary depending on the treatment being performed, equal numbers of cells from each control and experimental sample can be mixed and the fractionation protocol carried out. Alternatively, the cell fractionation can be performed separately for the different samples, and the fractions then mixed to combine equal amounts of protein from each (Figure 5). In this technique, proteins that remain unchanged by the perturbation will show a SILAC ratio between the control and experimental isotopic forms of ~1 (or 0, if a log ratio is plotted). In contrast, proteins that have been altered as a result of the experimental treatment (e.g. moved from cytoplasm to nucleus) will show either an increased or a decreased SILAC ratio, according to the design of the experiment. This conveniently highlights the particular subset of proteins that respond to a specific perturbation and provides, in parallel, a direct comparison with the bulk response of the large number of cell proteins sampled in high throughput.
This approach can also be used in combination with a pulse SILAC experimental set-up, as discussed below. The cellular fractionation protocol described above allows the characterisation of changes in the steady state localisation of proteins and of the kinetics of protein movement, but this is not the full story. Although the location of a protein is fundamental to its function, the change induced by the experimental variable might also affect protein turnover, by changing the rate of protein synthesis, degradation or both. So how do we characterise this? Pulse SILAC techniques provide an elegant experimental procedure for characterising the time dynamics of the proteome [3,12,13]. This involves generating a population of cells completely labelled in medium label (e.g. R6K4) and then switching the media to heavy (e.g. R10K8). Over time, the medium labelled protein is replaced by heavy labelled protein. Collecting cells at various time points and mixing these with light labelled cells (50:50, as usual for SILAC), which act as a steady state control of protein expression (Figure 6), gives samples that characterise protein synthesis and degradation [13].
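The comparison of individual proteins against the bulk response can be sketched as follows. Ratios are converted to log2 so that unchanged proteins centre on 0, and proteins lying far from the bulk are flagged. The robust cut-off used here (median plus or minus k scaled median absolute deviations) is an assumption made for illustration, not the method prescribed in this chapter, and the function name and default `k` are hypothetical.

```python
import math
import statistics


def flag_changed(ratios, k=3.0):
    """Return {protein: log2 ratio} for proteins far from the bulk.

    ratios : dict mapping protein identifier to SILAC ratio
             (treated over control, > 0)
    k      : number of scaled MADs from the median taken as 'changed'
    """
    log_ratios = {p: math.log2(r) for p, r in ratios.items()}
    centre = statistics.median(log_ratios.values())
    mad = statistics.median(abs(v - centre) for v in log_ratios.values())
    spread = 1.4826 * mad  # scales the MAD to match a normal std. dev.
    return {
        p: v for p, v in log_ratios.items()
        if abs(v - centre) > k * spread
    }
```

With five proteins at ratios between 0.9 and 1.1 and one protein at a ratio of 8, only the last is flagged, with a log2 ratio of 3.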
The benefit of this kind of experimental set-up is evident in the downstream data analysis. The decrease in the medium to light ratio describes the degradation rate of a given protein, whilst the increase in the heavy to light ratio describes the synthesis of new protein. The time point at which these two curves intersect (assuming there are sufficient time points for accurate measurement) describes the time required for turnover of 50% of the protein. Analysis of proteome turnover in the HeLa and HCT116 cell lines has been carried out and made publicly available at http://www.peptracker.com/turnoverInformation/.
Bearing in mind that different cell lines have varying cell cycle lengths, the online Protein Turnover Viewer allows new results to be compared with this database and can hence reveal differences in behaviour between cell lines. The Protein Turnover Viewer has an easily navigable interface in which UniProt identifiers are used to look up the turnover data for a protein of interest. This technique is not only useful for analysis of steady state, or 'normal', protein turnover. It is also well suited to studying drug treatment kinetics, microRNA effects, DNA damage (e.g. UV or chemically induced) or physiological perturbations (e.g. hypoxia or other forms of stress). Analysis of the resulting data is more complex than for a simple SILAC experiment and the data set is larger, but it provides a wealth of useful information about protein dynamics.
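The intersection of the two curves can be estimated directly from the measured time points. The sketch below uses linear interpolation between the two time points that bracket the crossing; this is a simplifying assumption for illustration (a fitted kinetic model may be more appropriate for real data), and the function name is hypothetical.

```python
def crossover_time(times, medium_ratio, heavy_ratio):
    """Estimate when the M/L and H/L curves intersect.

    times        : time points in increasing order
    medium_ratio : medium-to-light ratio at each time point (decaying)
    heavy_ratio  : heavy-to-light ratio at each time point (rising)
    Returns the interpolated time, or None if the curves never cross
    within the sampled range.
    """
    diffs = [m - h for m, h in zip(medium_ratio, heavy_ratio)]
    for i in range(len(times) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0:
            return times[i]
        if d0 * d1 < 0:
            # Sign change: interpolate linearly to the zero crossing.
            fraction = d0 / (d0 - d1)
            return times[i] + fraction * (times[i + 1] - times[i])
    if diffs and diffs[-1] == 0:
        return times[-1]
    return None
```

The accuracy of the estimate depends on how closely the time points are spaced around the crossing, which is why a sufficient number of time points is needed.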

Data analysis
Data analysis of SILAC experiments needs to be tailored to the specific question, but the beginnings of the analysis process are very similar and can follow this method: MaxQuant → Data Culling → Population Statistics → Data Grouping

MaxQuant
MaxQuant is a comprehensive software package, created by Jürgen Cox and Matthias Mann, that is widely used for the analysis and quantitation of MS-based proteomic data, including SILAC [14,15]. It is made available as freeware and can be downloaded from http://maxquant.org/. Starting from the raw MS data from the mass spectrometer, MaxQuant performs peak picking, mass recalibration, SILAC pair matching and quantification, label free quantification and database searching (using the Andromeda search engine), and outputs peptide and protein data in extensive detail [14,15]. While other commercial and freeware software options are also available for the analysis of MS data, we routinely use the MaxQuant package, which works very well for the protocols described here.
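The data culling step that follows MaxQuant processing can be scripted rather than performed by hand in a spreadsheet. The sketch below reads a tab-delimited protein table and discards entries flagged as reverse (decoy) hits, contaminants or identified only by a modification site, together with entries lacking a numeric ratio. The column names used as defaults are assumed from typical MaxQuant proteinGroups.txt output and differ between versions, so they should be checked against the actual file; the function name is hypothetical.

```python
import csv
import io
import math


def cull_protein_groups(handle,
                        ratio_column="Ratio H/L normalized",
                        flag_columns=("Reverse",
                                      "Potential contaminant",
                                      "Only identified by site")):
    """Return [(protein IDs, ratio)] for rows passing the basic filters.

    handle : open text file (or file-like object) with tab-separated data
    Column names are assumptions; adjust them to the file being read.
    """
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        # Flagged rows carry a '+' in the corresponding column.
        if any((row.get(col) or "").strip() == "+" for col in flag_columns):
            continue
        try:
            ratio = float(row.get(ratio_column) or "")
        except ValueError:
            continue  # no ratio was reported for this protein group
        if math.isnan(ratio):
            continue
        kept.append((row["Protein IDs"], ratio))
    return kept
```

In use, the function would be called on an opened results file, for example `with open("proteinGroups.txt") as handle: proteins = cull_protein_groups(handle)`, where the file name is again an assumption.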

Data grouping
Data grouping is a way of making large data sets easier to manage. Ideally, proteomic data are managed in a database in which experimental values are linked to reliable metadata describing the experimental parameters [2][3][4][15].
Online proteomic databases, such as PRIDE (http://www.ebi.ac.uk/pride/) [16], allow mass spectrometry based experimental data to be uploaded and subsequently compared with the other datasets they contain. Several other MS data repositories (namely Tranche and PeptideAtlas) have combined with PRIDE to form ProteomeXchange (http://www.proteomeexchange.org), which enables submission from a single webpage and combines the data from all three repositories. In depth analytics on these data have not been performed; comparisons are mainly based on protein identification and classification.
Quantitative comparisons of datasets in this forum are not possible, but grouping and result set selection according to numerous metadata and protein identifiers is possible. Comparison of experimental datasets with absolute quantification data is possible through PaxDB (http://pax-db.org) [17], which not only contains data for most model organisms but has also correlated absolute quantitation information from 28 datasets and computed the average parts per million value for thousands of proteins. These data can be searched 100 identifiers at a time.
Using the MaxQuant software for data processing allows the grouping and separation of data from individual MS analyses [14,15]. MaxQuant can combine data from all the protein fractions of a sample (if it was pre-fractionated before MS) and keep samples from different conditions separate, while still combining and outputting the results in one Excel sheet. This facilitates direct comparison between all samples, with all ratio/intensity data present.
When the appropriate population statistical analyses have been performed and a statistically valid significance cut-off has been calculated, the candidates for up- or down-regulated proteins from each group can be identified. When analysing proteome dynamics, these results can also be compared with other variables, for example in a cell fractionation experiment performed in conjunction with a time course of a drug treatment. Time course data can also be analysed to determine trends. It is important to have a zero time point, which describes the basal protein level, and to use this to normalise values from the later time points before detecting and grouping trends. Most proteins will show little or no change over time, but specific groups may show trends, for example reflecting regulation as a result of the cell cycle, which appear as one or more peaks/troughs (Figure 8). These trends can be identified by clustering analysis (here performed with StatistiXL, http://www.statistixl.com/features/cluster.aspx) and further correlated with other data, such as GO terms or protein network information (network analysis was performed with the STRING database, http://string-db.org/ [18]). In the example shown, network analysis of the proteins found to have similar expression trends indicated that the proteins identified were linkers between 2 or more functional networks, showing the transfer of effect through regulation over time. With any other kind of grouping, for example by GO term or subcellular location, this association between known networks would not be detected; it is only seen in the regulation trend association.
Figure 8. Hierarchical clustering of protein ratios over time, leading to effective grouping of expression trends. This kind of trend grouping and analysis was not possible by grouping according to GO terms, cellular location or network association. Network analysis of the clustered groups after hierarchical clustering is nevertheless advisable, as interaction between known networks is often identified.
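The normalisation and trend grouping described above can be sketched in a few lines. The example below is a stand-in for the StatistiXL analysis, not a reproduction of it: each time course is expressed as a log2 change relative to its zero time point, proteins showing little change are set aside, and the remainder are grouped by average-linkage agglomerative clustering on a correlation distance. The function names and both thresholds are hypothetical.

```python
import math


def normalise_to_zero(series):
    """Express each value as a log2 change relative to the zero time point."""
    return [math.log2(value / series[0]) for value in series]


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def cluster_trends(profiles, max_distance=0.5, min_change=0.5):
    """Group proteins with similar trends (average linkage, 1 - r distance).

    profiles     : dict mapping protein to values over time, zero point first
    max_distance : merging stops once the closest clusters exceed this
    min_change   : proteins never exceeding this |log2 change| are dropped
    """
    norm = {p: normalise_to_zero(s) for p, s in profiles.items()}
    changing = {p: s for p, s in norm.items()
                if max(abs(v) for v in s) >= min_change}
    clusters = [[p] for p in sorted(changing)]

    def linkage(c1, c2):
        dists = [1 - pearson(changing[a], changing[b])
                 for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Given two proteins that peak mid-way through the time course, one that shows a trough and one that barely changes, this returns the two peaking proteins as one group and the trough protein as another, with the unchanged protein excluded. The resulting groups could then be passed to a network analysis tool as described above.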

Conclusions
Life scientists working in the proteomics field have had the privilege of being at the cutting edge of an emerging technology that has opened up new possibilities for improving experimental design and data analysis. As proteomics can be "characterized more by its diversity than a common methodological or subject orientation" [1], the applications developed to accommodate this diversity should be made available and accessible to the wider scientific community.
The methods described here allow the description and measurement of protein-protein interactions, changes in proteome localisation and rates of synthesis and degradation. While the bench-top methodologies are relatively straightforward, the key to harnessing the biological value of the experiments often lies in the methods used to analyse the resulting data. We recommend systematic recording and management of all data, from all experiments. Systematically recorded, detailed metadata can be used to extract information and obtain new results by comparing data trends across many different and often unrelated experiments. We term this approach 'Super Experiments'.
All protocols discussed above can be found on the greproteomics.lifesci.dundee.ac.uk and www.lamondlab.com websites.

SILAC: Stable Isotope Labelling of Amino acids in Cell culture
The following protocol provides a step by step guideline for preparing SILAC media and growing labelled cells in tissue culture.
Media can be bought ready-made or prepared by the user prior to use. Cells should be grown for a minimum of 6 passages for complete labelling.

Cellular fractionation protocol
This protocol provides an effective technique for fractionating a variety of different cell types into cytoplasmic, nucleoplasmic and nucleolar fractions. The exact recipes for the solutions required throughout the protocol are provided at the end.
The nuclear pellet can be stored in any volume of buffer at -80 °C and can be spun out again using the same centrifugation parameters as in step 5.

Immuno-precipitation protocol
This technique is very useful for the purification of a protein of interest. It works through the formation of an antigen-antibody complex, which is attached to an agarose, sepharose or metallic bead. The bead coupled to an antibody provides a matrix to which the protein of interest can bind, allowing the other, undesired components of the whole cell extract to be washed away. The sample eluted from the beads can then be further processed by gel electrophoresis and MS.
The protocol that follows is a very generic standard procedure presented as an initial recommendation for those who have not performed or optimised an IP previously.
Reagents required. 1. Place whole cell extract aliquot in a round-bottomed vial to ensure good mixing. Add antibody to the required specific dilution for what you're using (you may need to consult your information booklet for antibody dilution guidelines.) When using cell