Mass spectrometry-based proteomics, the large-scale analysis of proteins by mass spectrometry, has emerged as a powerful technology over the past decade and has become an indispensable tool in many biomedical laboratories. Many strategies for differential proteomics have been developed in recent years. These either incorporate heavy stable isotopes or rely on label-free comparisons and their statistical assessment, and each has specific strengths and limitations. This chapter gives an overview of the current state of the art in quantitative, or differential, proteomics and illustrates it with several examples.
- Mass spectrometry
- heavy isotope labelling
- chemical tagging
- 18O labelling
Analysis of the proteome using mass spectrometry has proven to be an indispensable tool in biomedical research over the past 15 years or so. Originally, because of technical limitations, only qualitative measurements were performed to identify the proteins in a sample. However, the need to add a quantitative dimension to proteomics analyses rapidly became evident. Several technologies were therefore developed that, in combination with mass spectrometry, supply researchers with quantitative data to investigate, e.g. the dynamics of a particular proteome. In this chapter, a brief overview of quantitative approaches in mass spectrometry-based proteomics is given. Current protocols for quantitative analysis and software solutions for data analysis are discussed, and examples from the field (including our own laboratory) illustrate the power of these methods.
2. Mass spectrometry-based proteomics
Before the application of mass spectrometry, protein analysis was mostly based on the purification of single proteins or protein complexes, followed by experiments on the purified material. Such biochemical experiments are usually laborious, and their outcome depends heavily on how well the protein can be purified. Although mass spectrometry as a technique to study small molecules dates back to the beginning of the 20th century, its use in peptide and protein analysis is more recent. The development of both matrix-assisted laser desorption/ionization (MALDI) and electrospray ionization (ESI) in the 1980s was key, as these techniques allowed the ionization of biomolecules such as peptides, proteins and nucleic acids, which made their detection by mass spectrometry possible. The 2002 Nobel Prize in Chemistry was awarded to John B. Fenn and Koichi Tanaka for the development of these ionisation techniques. Once biomolecules, in particular peptides and proteins, could be analysed by mass spectrometry, the key step towards proteomics had been made. Equally important were the advent of the genomic age, which supplied the databases that are instrumental for the analysis and identification of proteins, and the technical advances in both mass spectrometers and the (bio)informatics infrastructure that is essential for handling large data sets.
Mass spectrometry in itself is not an inherently quantitative technique. The biochemical and biophysical properties of proteins and peptides are quite variable, which leads to large differences in properties such as ‘sprayability’ (ionisation efficiency) and, thus, in the resulting ion intensities of different peptides, even when these are present in equimolar amounts in the sample. For mass spectrometry to be useful not only for qualitative but also for quantitative analysis, these caveats need to be addressed. Concerning the different types of mass spectrometers, there are several physical principles to choose from. While it goes beyond the scope of this chapter to discuss all of these in detail, it is useful to be aware of the different possibilities, as these may influence the performance of the quantitative analysis. The types of mass analysers most widely used in proteomics are (1) time-of-flight (ToF), (2) quadrupole, (3) (Paul) ion trap, (4) Fourier transform ion cyclotron resonance (FTICR) and (5) orbitrap analysers [2,3]. In ToF analysis, the time an ion takes to travel a fixed distance after acceleration is measured; because this flight time depends on the velocity of the ion, it can be used to determine its mass-to-charge ratio (m/z). A quadrupole filters ions according to the stability of their trajectories through combined static and oscillating electric fields, while the Paul ion trap is a related device that uses static direct current and radio frequency oscillating electric fields to trap ions. In an FTICR mass spectrometer, ions are trapped in a strong magnetic field and the periodic movement of the ions is translated back to
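The ToF principle can be made concrete with the textbook relation for an idealised linear analyser: an ion of mass m and charge z accelerated through a potential U gains kinetic energy zeU = 1/2 mv², and covers a field-free flight path L in time t = L/v, so ions with a higher m/z arrive later. The sketch below assumes no reflectron, no initial energy spread and hypothetical instrument settings (20 kV, 1.5 m); real instruments calibrate m/z empirically rather than from first principles.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulomb
DALTON = 1.66053906660e-27           # kilograms per dalton

def flight_time(mass_da: float, charge: int, voltage: float, length: float) -> float:
    """Idealised linear ToF: z*e*U = 1/2*m*v**2 and v = L/t, so t = L*sqrt(m / (2*z*e*U))."""
    mass_kg = mass_da * DALTON
    return length * math.sqrt(mass_kg / (2 * charge * ELEMENTARY_CHARGE * voltage))

def mass_to_charge(time_s: float, voltage: float, length: float) -> float:
    """Invert the relation above: m/z (in Da per elementary charge) = 2*e*U*t**2 / L**2."""
    return 2 * ELEMENTARY_CHARGE * voltage * time_s**2 / length**2 / DALTON

# Hypothetical instrument settings: 20 kV acceleration and a 1.5 m flight tube.
U, L = 20_000.0, 1.5
t = flight_time(1000.0, 1, U, L)
print(f"flight time of a 1000 Da singly charged ion: {t * 1e6:.2f} microseconds")
print(f"recovered m/z: {mass_to_charge(t, U, L):.3f}")
```

Because t scales with the square root of m/z, a fourfold heavier ion (at the same charge) takes twice as long, and a 2000 Da ion carrying two charges arrives together with a 1000 Da singly charged ion.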
3. Overview of quantitation methods
The first (semi-)quantitative approach to proteomics was two-dimensional difference gel electrophoresis (2D-DIGE). In this technique, proteins are labelled with fluorescent dyes (CyDyes) and separated according to size and charge, which allows two conditions to be compared against an internal standard on the same gel [4,5]. After separation of the proteins, the spots are analysed using specialised software that measures the relative fluorescence intensities. Spots that appear to be differentially regulated can then be excised from the gel and identified using mass spectrometry. The use of an internal standard, usually a mix of the two conditions measured, gives this method its quantitative properties. However, the limitations of 2D-DIGE, and of 2D gel electrophoresis for complex samples in particular, have led to decreased use of the technique. Because of its limitations in the number of samples that can be compared and in the analysis of membrane-bound proteins, as well as its relatively low proteome coverage, alternative technologies have now superseded 2D-DIGE as a quantitative proteomics method. Techniques currently used for quantitation are summarized in Figure 1 and discussed further below.
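The role of the internal standard can be sketched numerically. In the minimal example below, each spot volume in a sample channel is divided by the volume of the same spot in the pooled internal-standard channel of the same gel, and these standardised abundances are then averaged and compared between conditions. The spot volumes are invented and the names are hypothetical; real DIGE software also performs spot detection, matching between gels and background subtraction, all omitted here.

```python
from statistics import mean

def standardised_abundance(sample_volume: float, standard_volume: float) -> float:
    """Spot volume in a sample channel relative to the pooled internal standard on the same gel."""
    return sample_volume / standard_volume

# Hypothetical spot volumes (arbitrary fluorescence units) for one protein spot on two gels.
# Each gel carries one sample of each condition plus the same pooled internal standard.
gels = [
    {"control": 1200.0, "treated": 2600.0, "standard": 1900.0},
    {"control": 900.0, "treated": 2100.0, "standard": 1450.0},
]

control = [standardised_abundance(g["control"], g["standard"]) for g in gels]
treated = [standardised_abundance(g["treated"], g["standard"]) for g in gels]

# Gel-to-gel differences in loading or scanning largely cancel because both
# conditions are expressed relative to the standard imaged on the same gel.
fold_change = mean(treated) / mean(control)
print(f"treated/control fold change: {fold_change:.2f}")
```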
Nowadays, the techniques most frequently used to quantify proteins by mass spectrometry involve labelling proteins with isotopically labelled tags, which the mass spectrometer can distinguish because they differ in mass. Differential mass tags result in a (usually small) mass difference between the ‘light’ and ‘heavy’ sample, while protein and peptide properties such as the retention time on a chromatography column are largely unaffected. This allows the differentially tagged proteins to be analysed simultaneously in a single mass spectrum or LC-MS run. Several methods based on the addition of labelled tags are used in modern proteomics, each with its strong and weak points. Furthermore, with the development of more sensitive and faster mass spectrometers, methods that quantify proteins in a label-free manner have been developed, including spectral counting and the comparison of ion intensities. These techniques have the distinct advantage of requiring no (chemical) labelling of the sample, but the trade-off is lower quantitative accuracy. All of these techniques are described in this chapter, together with several examples of how they are used to answer current biomedical questions.
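Because the light and heavy forms of a peptide co-elute and differ only in the mass of the tag, they appear side by side in the same spectrum, separated in m/z by the mass difference divided by the charge state; the ratio of their intensities reports the relative abundance in the two samples. A minimal sketch, assuming a hypothetical 1500 Da peptide and an 8 Da label difference chosen purely for illustration:

```python
PROTON = 1.007276  # mass of a proton in Da

def mz(neutral_mass: float, charge: int) -> float:
    """m/z of a peptide carrying `charge` extra protons."""
    return (neutral_mass + charge * PROTON) / charge

def pair_mz(light_mass: float, mass_shift: float, charge: int) -> tuple[float, float]:
    """Expected m/z values of the light and heavy forms of the same peptide."""
    return mz(light_mass, charge), mz(light_mass + mass_shift, charge)

def light_heavy_ratio(light_intensity: float, heavy_intensity: float) -> float:
    """Relative abundance of the two conditions from the paired peak intensities."""
    return light_intensity / heavy_intensity

# Hypothetical 1500.0 Da peptide with an 8 Da difference between the labels.
for z in (1, 2, 3):
    light, heavy = pair_mz(1500.0, 8.0, z)
    print(f"z={z}: light {light:.3f}, heavy {heavy:.3f}, spacing {heavy - light:.3f}")
```

The spacing shrinks from 8 to 4 to about 2.67 m/z units as the charge increases, which is why the charge state must be known before a light/heavy pair can be assigned.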
4. Metabolic labelling
The use of amino acids containing either light or heavy stable isotopes of nitrogen and/or carbon in the growth medium is an approach that was introduced by the Mann lab. Because the labelling takes place at the very beginning of the proteomics workflow, samples can be mixed at the earliest possible time point. Consequently, systematic errors that may be introduced during sample handling are reduced [8,9]. Although this method has proven to be a powerful way to perform quantitation in proteomics in many different applications, metabolic labelling also has several disadvantages, most importantly that it cannot be applied to human tissue samples. Because the samples need to be metabolically active in order to incorporate the label, this automatically precludes, e.g. blood and biopsy samples, which makes it impossible to use metabolic labelling in a diagnostic setting. Furthermore, for some metabolic labelling approaches, reliable software for data analysis is still lacking. The labelling itself is quite time-consuming, as it takes some time for cell cultures to become completely labelled. Finally, metabolic labelling can be costly because of the large amounts of expensive labelled reagents required. Below, several types of metabolic labelling are discussed in more detail.
4.1. 15N labelling
The use of heavy nitrogen (15N) to label whole model organisms dates back to the 1960s, when it was applied to plants for the first time (see  for a review on the matter). In the late 1990s, this strategy took off for other organisms such as
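Because 15N labelling replaces every nitrogen atom, the light/heavy mass difference of a peptide is not constant but depends on its amino-acid composition. The sketch below counts the nitrogen atoms of a peptide and multiplies by the mass difference between 15N and 14N (standard monoisotopic masses). It assumes complete labelling; for partial incorporation it reports only the average shift, not the resulting isotope distribution. The peptide sequences are arbitrary examples.

```python
# Nitrogen atoms per amino acid residue (backbone plus side chain).
NITROGENS = {
    "G": 1, "A": 1, "S": 1, "P": 1, "V": 1, "T": 1, "C": 1, "L": 1, "I": 1,
    "N": 2, "D": 1, "Q": 2, "K": 2, "E": 1, "M": 1, "H": 3, "F": 1, "R": 4,
    "Y": 1, "W": 2,
}
N15_MINUS_N14 = 15.0001088989 - 14.0030740048  # about 0.99703 Da per nitrogen atom

def nitrogen_count(peptide: str) -> int:
    """Total nitrogen atoms in a peptide; the free termini add no extra nitrogen."""
    return sum(NITROGENS[aa] for aa in peptide)

def n15_shift(peptide: str, incorporation: float = 1.0) -> float:
    """Heavy-minus-light mass difference (average shift if incorporation < 1)."""
    return nitrogen_count(peptide) * N15_MINUS_N14 * incorporation

for pep in ("PEPTIDEK", "HWRGR"):
    print(pep, nitrogen_count(pep), round(n15_shift(pep), 4))
```

Two peptides of similar length can therefore differ by several daltons in their shift, which is one reason why data analysis for 15N labelling is more involved than for labelling a few defined residues.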
4.2. 13C labelling
Another prime candidate for metabolic labelling is 13C, as carbon is a key player in protein chemistry. 13C labelling has been successfully used in the determination of protein turnover rates. For instance, by feeding
In cultured cells, the metabolic labelling method of choice is stable isotope labelling by amino acids in cell culture (SILAC), which uses isotopically labelled amino acids (see Figure 2 for a typical SILAC workflow). For the labelled amino acids to be incorporated into proteins efficiently, it is necessary to determine whether the organism studied is auxotrophic for them. A cell or organism that is auxotrophic for an amino acid cannot synthesize it and therefore depends on its supply in the food or in the growth medium. Usually in SILAC, labelled lysine and arginine are used, which is particularly useful for proteins that are digested with trypsin: since trypsin cleaves after lysine and arginine, in principle all peptides except the C-terminal peptide of a protein are labelled. If the cells are auxotrophic for the selected amino acids, all proteins in a cell are generally completely labelled after several cell doublings. Conversely, this means that the cells must be dividing, which precludes the use of this technique on primary tissue samples. A complication described in the literature that could interfere with the quantitation of SILAC-labelled proteins is the metabolic conversion of arginine to proline. While lysine and arginine are relatively stable in the cell, the cell can synthesize proline from excess arginine, which yields heavy-labelled proline and thereby splits the heavy signal of proline-containing peptides. Obviously, this is undesirable and should be accounted for either experimentally or during data analysis (see e.g. ).
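The point that trypsin leaves a labelled lysine or arginine at the C-terminus of almost every peptide can be made concrete with a toy in silico digest. This sketch assumes the commonly used heavy lysine (13C6,15N2; +8.0142 Da) and heavy arginine (13C6,15N4; +10.0083 Da), complete incorporation, no arginine-to-proline conversion and a simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages). The protein sequence and function names are hypothetical.

```python
import re

# Assumed heavy-minus-light mass differences of the SILAC amino acids (Da).
HEAVY_SHIFT = {"K": 8.014199, "R": 10.008269}

def tryptic_digest(protein: str) -> list[str]:
    """Simplified trypsin: cleave C-terminal to K or R unless the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]

def silac_shift(peptide: str) -> float:
    """Heavy-minus-light mass difference, summed over every K and R in the peptide."""
    return sum(HEAVY_SHIFT.get(aa, 0.0) for aa in peptide)

protein = "MSGLKAPRPEVAKGDYRWSTLLQ"  # hypothetical sequence
for pep in tryptic_digest(protein):
    print(f"{pep:<12} shift = {silac_shift(pep):.4f} Da")
```

In this toy example the C-terminal peptide (WSTLLQ) carries no label, while APRPEVAK carries two because the R-P bond is not cleaved.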
Labelling using SILAC can also be used to examine post-translational protein modifications such as phosphorylation and ubiquitination in a quantitative manner. An example of this is a phosphoproteomic study in yeast after the knockout of a kinase that plays a role in growth and division . SILAC can in principle be used for any cultured cell type. A recent study from our lab into hormonal signalling in
An interesting technological development in recent years has been the emergence of fully labelled SILAC organisms, such as fruit flies, mice and rats, which allows for
Finally, the so-called ‘super-SILAC’ standard is a pool of multiple SILAC-labelled cell lines that is spiked into experimental samples. By spiking all samples with this standard, quantitation becomes possible without the need to label the samples themselves. This allows SILAC-based quantitation in patient tissue, which evidently cannot be labelled using traditional SILAC. Note that the SILAC standard should contain cell lines that are representative of the tissue to be studied, which limits this technique to tissues for which representative cell lines exist. For a more in-depth review on this topic, see .
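In a super-SILAC experiment each unlabelled (light) sample is measured against the same heavy standard, so two samples can be compared through a ratio of ratios: each sample is first expressed relative to the standard measured in its own run, and these ratios are then divided. A minimal sketch with hypothetical intensities and function names, assuming the same amount of standard was spiked into both samples:

```python
def ratio_to_standard(light: float, heavy: float) -> float:
    """Light (tissue sample) over heavy (spiked super-SILAC standard) for one peptide."""
    return light / heavy

def ratio_of_ratios(sample_a: tuple[float, float], sample_b: tuple[float, float]) -> float:
    """Compare two samples, each given as (light, heavy) intensities from its own run.

    Normalising each sample to the heavy standard measured in the same run corrects
    for run-to-run differences in instrument response, provided equal amounts of
    standard were spiked into both samples.
    """
    return ratio_to_standard(*sample_a) / ratio_to_standard(*sample_b)

# Hypothetical (light, heavy) intensities for one peptide in two patient samples.
tumour = (4.0e6, 2.0e6)
normal = (1.5e6, 1.9e6)
print(f"tumour/normal: {ratio_of_ratios(tumour, normal):.2f}")
```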
5. Chemical labelling strategies
The use of chemical labelling strategies for relative quantitation in proteomics dates back to the late 1990s. The major advantages of chemical techniques over metabolic labelling are the reduced cost and the higher speed of sample processing and analysis. Whereas labelling cells with SILAC may take up to several days, chemical labelling protocols are usually completed in less than an hour. Chemical labelling can be applied to any protein sample, not just metabolically active samples, and some of the techniques allow a high number of samples to be analysed simultaneously. However, since chemical labelling is done either at the protein or at the peptide level, at a relatively late stage in the sample preparation protocol, systematic errors are introduced more readily. Also, labelling at the protein level requires specific amino acids such as cysteine or lysine, so that peptides lacking these amino acids cannot be quantified [10,24].
5.1. Labelling with an Isotope-Coded Affinity Tag (ICAT)
The first chemical labelling technique described for quantitative mass spectrometry was the isotope-coded affinity tag (ICAT). In ICAT, a thiol-reactive group is used to conjugate the tag to cysteine residues in the protein. Apart from the reactive group, the tag contains a linker and a biotin moiety. The linker carries either eight hydrogen atoms (light version) or eight deuterium atoms (heavy version), so that two differentially labelled conditions can be distinguished by the 8 Da shift per labelled cysteine in the mass spectrum. The biotin moiety of the tag can be used to affinity-purify the tagged peptides after trypsinisation. The weakness of ICAT lies in the requirement for cysteine residues in the peptide, which limits the number of peptides that can be tagged. Furthermore, the presence of deuterium causes a shift in elution times when peptides are fractionated by HPLC, which hampers subsequent data analysis. This elution-time shift was later solved by introducing 13C instead of deuterium into the linker. ICAT labelling has, for instance, been used to investigate the redox state of proteins in a study of the formation of reactive oxygen species and how the cell deals with them. The ability to use ICAT in human samples has been exploited in screening cerebrospinal fluid samples of Alzheimer patients for novel prognostic biomarkers.
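Two consequences of ICAT chemistry are easy to show in code: only cysteine-containing peptides are tagged, and the heavy/light spacing grows with the number of cysteines (about 8.05 Da per cysteine for a d8 versus d0 linker, using standard isotope masses). The peptide list and function name below are hypothetical; a real workflow would also account for the mass of the tag itself and for the charge state.

```python
D_MINUS_H = 2.014101778 - 1.007825032  # mass difference deuterium minus hydrogen (Da)
ICAT_SHIFT_PER_CYS = 8 * D_MINUS_H     # heavy (d8) versus light (d0) linker, ~8.05 Da

def icat_pairs(peptides: list[str]) -> dict[str, float]:
    """Keep only peptides an ICAT reagent can tag (those containing cysteine)
    and return the expected heavy-minus-light mass difference of each."""
    return {p: p.count("C") * ICAT_SHIFT_PER_CYS for p in peptides if "C" in p}

# Hypothetical tryptic peptides: only two of the four contain a cysteine.
example_peptides = ["LVNELTEFAK", "YICDNQDTISSK", "AEFVEVTK", "CCTESLVNR"]
for peptide, shift in icat_pairs(example_peptides).items():
    print(f"{peptide:<14} +{shift:.3f} Da")
```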
5.2. Labelling with Isotope-Coded Protein Labels (ICPL)
Labelling using isotope-coded protein labels (ICPL) is based on a principle similar to that of ICAT. In ICPL, lysine residues, which are more common than cysteine residues, are labelled in intact proteins. The mass difference between the light- and heavy-labelled forms of a peptide depends on the number of labelled lysine residues in the peptide and can be determined fairly simply, which provides strong constraints for database searches. A disadvantage of labelling lysine residues is that modifying the side chain prevents trypsin from cleaving at that lysine. Cleavage then occurs only after arginine residues, which results in much longer peptides after trypsin digestion, some of which may not be detectable. It is therefore recommended either to use another or an additional protease for protein digestion, or to perform the labelling at the peptide level after proteolytic cleavage. A study on tumour cell senescence in which ICPL was successfully used is a good illustration of the power of quantitative proteomics in general: senescence was found to affect several important tumourigenesis proteins, such as cMYC, and key metabolic enzymes, such as ATP synthase.
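The effect of blocking lysines on trypsin digestion can be illustrated with a toy in silico digest. This sketch assumes a simplified trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages and that every lysine side chain has been modified; the protein sequence and the function name `toy_digest` are hypothetical.

```python
import re

def toy_digest(protein: str, lysines_blocked: bool = False) -> list[str]:
    """Simplified trypsin rule: cleave after K or R unless followed by P.

    If lysine side chains are blocked (e.g. by protein-level ICPL labelling),
    only arginine remains a cleavage site.
    """
    sites = "R" if lysines_blocked else "KR"
    pattern = rf"(?<=[{sites}])(?!P)"
    return [p for p in re.split(pattern, protein) if p]

protein = "GASKLLVEKTDRYQKPLSGEFRAMK"  # hypothetical sequence

for blocked in (False, True):
    peptides = toy_digest(protein, blocked)
    print(f"lysines blocked={blocked}: {peptides}")
```

With blocked lysines the three short N-terminal peptides merge into one longer peptide, which is the effect that motivates using an additional protease or labelling after digestion.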
5.3. Isobaric tagging
Tandem mass tags (TMT) and isobaric tags for relative and absolute quantitation (iTRAQ) are based on labelling peptides with isobaric tags. Here, the label is conjugated to the N-termini and lysine residues of peptides, so that in principle every peptide is labelled (Figure 4). The reporter groups of the various isobaric tags differ in mass, but this difference is compensated by a balancing linker moiety, so that all tags have the same total mass. As a consequence, differentially labelled peptides end up in the same precursor peak in the mass spectrum. Only when this peak is subsequently selected for fragmentation are the tags cleaved at the linker, which leads to the appearance of peaks corresponding to the different tags (‘reporter ions’) in the low
Due to the high number of samples that can be measured in one run, its applicability to human tissue samples and the availability of high-resolution mass spectrometers capable of ion detection in the low
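After fragmentation, relative quantitation with isobaric tags reduces to comparing reporter-ion intensities between channels. The sketch below uses a hypothetical 3-plex data set and corrects for unequal sample loading by dividing each channel's ratios by their median, which assumes that most peptides do not change. It does not correct for the isotopic impurities of the reagents, which real analyses usually do; all names and values are illustrative.

```python
from statistics import median

# Hypothetical reporter-ion intensities: one row per peptide-spectrum match (PSM),
# one column per channel. Channel 1 received 20% more sample overall; only the
# last peptide truly changes (3-fold up in channel 2).
reporters = [
    [1000.0, 1200.0, 1000.0],
    [2000.0, 2400.0, 2000.0],
    [1500.0, 1800.0, 1500.0],
    [500.0, 600.0, 1500.0],
]

def channel_factors(rows: list[list[float]], reference: int = 0) -> list[float]:
    """Median ratio of each channel to the reference channel across all PSMs.

    Assumes most peptides are unchanged, so the median reflects loading differences.
    """
    n_channels = len(rows[0])
    return [median(row[c] / row[reference] for row in rows) for c in range(n_channels)]

def normalised_ratios(rows: list[list[float]], reference: int = 0) -> list[list[float]]:
    """Per-PSM ratios to the reference channel, corrected by the channel factors."""
    factors = channel_factors(rows, reference)
    return [[row[c] / row[reference] / factors[c] for c in range(len(row))] for row in rows]

for ratios in normalised_ratios(reporters):
    print([round(r, 2) for r in ratios])
```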
5.4. Dimethyl labelling
A simple method of labelling peptides for relative quantitation is dimethylation. Either light (H-containing) or heavy (D-containing) dimethyl groups are conjugated to the N-terminus of the peptides and to free lysine side chains. The advantages of dimethyl labelling include low cost, high speed and the possibility of automated sample preparation. However, since labelling occurs at the peptide level, variation between runs is still inherent to the process [10,34]. The first version of dimethyl labelling was limited to two different flavours. However, by using isotopic isomers (‘isotopomers’) of formaldehyde containing either only D or a combination of 13C and D, up to three samples can now be compared in a single run (Figure 5). Although this is still fewer than the number of labels available for isobaric tagging, it is significantly cheaper. Dimethyl labelling can be used for a variety of quantitative measurements, for instance after a pulldown or immunoprecipitation enrichment protocol. Using an antibody to probe for phosphopeptides in combination with labelling allows phosphorylation events to be monitored quantitatively. Another recently introduced possibility is the use of dimethylation to study DNA–protein interactions, e.g. by using an oligonucleotide to pull down proteins and performing the dimethyl labelling on the enriched proteins. These widely different applications show the power of dimethylation as a quantitative proteomics tool.
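Because one dimethyl label is added to the peptide N-terminus and one to each lysine, the spacing between the three channels scales with the number of labelled amines. The sketch below uses the mass additions commonly reported for triplex dimethyl labelling (the reagent combinations in the comments are the usual ones, stated here as an assumption) and a hypothetical 1000 Da peptide; it ignores peptides with blocked N-termini.

```python
# Mass added per labelled primary amine (two methyl groups), in Da.
DIMETHYL_SHIFT = {
    "light": 28.0313,         # CH2O + NaBH3CN
    "intermediate": 32.0564,  # CD2O + NaBH3CN
    "heavy": 36.0757,         # 13CD2O + NaBD3CN
}

def labelled_amines(peptide: str) -> int:
    """Free peptide N-terminus plus one side-chain amine per lysine."""
    return 1 + peptide.count("K")

def dimethyl_masses(unlabelled_mass: float, peptide: str) -> dict[str, float]:
    """Neutral masses of the three dimethyl-labelled forms of one peptide."""
    n = labelled_amines(peptide)
    return {channel: unlabelled_mass + n * shift for channel, shift in DIMETHYL_SHIFT.items()}

# Hypothetical peptide of 1000.0 Da containing one lysine (two labelled amines).
masses = dimethyl_masses(1000.0, "PEPTIDEK")
for channel, mass in masses.items():
    print(f"{channel:<13} {mass:.4f}")
```

For a peptide with one lysine the channels are spaced by about 8 Da, enough to keep the isotope envelopes of the three forms apart at typical charge states.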
5.5. 18O labelling
Another way to differentially label samples for quantitative purposes is the use of heavy oxygen. This labelling method differs from other labelling protocols in that the label is incorporated during the digestion of proteins into peptides. When the tryptic digestion is performed in water containing 18O instead of 16O, the carboxyl terminus of every peptide incorporates two 18O atoms. This method can be very fast, with labelling reportedly achieved in 15 min. A potential pitfall is that the labelling may be incomplete when not performed correctly, leading to multiple peaks in the MS spectrum and therefore to difficulties in quantitation [25,38]. Our lab has described a protocol that avoids incomplete labelling and ensures full incorporation of the heavy oxygen label. By using immobilized trypsin under acidic conditions, all proteolytic peptides could be fully labelled with heavy oxygen with no traces of back-exchange. The labelling protocol was implemented into a protein–protein interaction analysis pipeline to differentiate between
Another way to achieve consistent labelling is to use a protease other than trypsin, e.g. β-lactamase, which incorporates only a single heavy oxygen atom instead of two, so that only one labelled form is produced.
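With trypsin, each peptide from the heavy sample can appear with one or two 18O atoms, at roughly +2 and +4 Da from the light (16O) form, which is why incomplete labelling complicates quantitation. The sketch below computes the expected peak positions and, from the three peak intensities, the fraction of fully labelled peptide and a heavy/light ratio that sums both labelled forms. It assumes the intensities have already been corrected for the overlapping natural isotope envelope of the light peptide, a step omitted here; peptide mass and intensities are hypothetical.

```python
O18_MINUS_O16 = 17.9991596 - 15.9949146  # about 2.0042 Da per exchanged oxygen atom
PROTON = 1.007276

def expected_mz(light_mass: float, charge: int, n_o18: int) -> float:
    """m/z of a peptide carrying n_o18 heavy oxygen atoms at its C-terminus."""
    return (light_mass + n_o18 * O18_MINUS_O16 + charge * PROTON) / charge

def fully_labelled_fraction(i_one_o18: float, i_two_o18: float) -> float:
    """Fraction of the heavy-labelled peptide that carries two 18O atoms."""
    return i_two_o18 / (i_one_o18 + i_two_o18)

def heavy_to_light_ratio(i_light: float, i_one_o18: float, i_two_o18: float) -> float:
    """Heavy/light ratio summing both labelled forms, so incomplete labelling
    does not bias the result (given isotope-corrected intensities)."""
    return (i_one_o18 + i_two_o18) / i_light

# Hypothetical doubly charged 1500.0 Da peptide: positions of the three forms.
for n in (0, 1, 2):
    print(n, round(expected_mz(1500.0, 2, n), 4))
print(fully_labelled_fraction(1.0e5, 9.0e5), heavy_to_light_ratio(1.0e6, 1.0e5, 9.0e5))
```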
6. Absolute Quantitation (AQUA)
All label-based approaches described above are geared towards relative quantitation. In many cases, though, it would be valuable to measure absolute protein quantities instead. To obtain absolute quantitation results, synthesized peptides or proteins containing heavy isotope labels that correspond to the target peptide or protein of interest can be spiked into the sample at a known concentration, after which the intensities of target and standard are compared. The standard peptide can carry one or more post-translational modifications if needed. Because the spiked standard provides absolute rather than relative quantitation, this technique has been dubbed absolute quantitation (AQUA). Spike-in standards that can be used for AQUA include peptides with stable isotopes incorporated into one or several amino acids, constructs in which several peptides are strung together (which has the added advantage that multiple peptides can be quantified in one run), or entirely labelled proteins to quantify the amount of protein. As with other quantitation techniques, the stage at which the label is introduced largely determines the extent of the systematic quantitation error. In studying hormonal influences on blood pressure, and more specifically angiotensin II, spiking in synthesized heavy-labelled angiotensin has been used to determine absolute levels in plasma. In this way, it was shown that patients with chronic kidney disease had strongly increased levels of angiotensin II. These results show that AQUA can be useful in biomarker research, although it has many more applications, such as assessing the levels of enzymes in prokaryotes.
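The AQUA calculation itself is a simple proportion: the endogenous amount equals the light/heavy peak-area ratio multiplied by the known amount of spiked standard. A minimal sketch with hypothetical peak areas, spike amount and sample volume, assuming identical ionisation of the light and heavy forms and a measurement within the linear range of the instrument:

```python
def absolute_amount(light_area: float, heavy_area: float, spiked_amount: float) -> float:
    """Endogenous amount = (light / heavy peak area) x amount of heavy standard spiked in.

    Assumes the light (endogenous) and heavy (standard) peptides ionise identically,
    so that their area ratio equals their molar ratio. For a protein target measured
    via a spiked peptide, it also assumes digestion released the peptide completely.
    """
    return light_area / heavy_area * spiked_amount

# Hypothetical XIC peak areas; 50 fmol of heavy standard spiked into 10 uL of plasma.
amount_fmol = absolute_amount(light_area=3.2e5, heavy_area=8.0e5, spiked_amount=50.0)
print(f"{amount_fmol:.1f} fmol in sample, {amount_fmol / 10.0:.1f} fmol/uL")
```

The digestion assumption in the docstring is where the stage of label introduction matters: a peptide standard added after digestion cannot correct for incomplete proteolysis, whereas a fully labelled protein standard can.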
7. Label-free quantitation
With the development of better and faster mass spectrometers with higher sensitivity and improved duty cycles, the number of studies that use label-free quantitation (LFQ) methods has increased over the past few years. The obvious advantage of LFQ is that no sample processing other than the standard LC-MS procedures is needed. Furthermore, there is no need for often expensive labelling kits. There are two major approaches to label-free quantitation: spectral counting and intensity-based quantitation. Quantitation by spectral counting is based on the observation that more abundant peptides are detected and fragmented more often by the mass spectrometer, so that the MS/MS count gives information about the abundance of the protein. However, several issues should be taken into account here. In general, larger proteins generate more proteolytic peptides, which increases the chance that multiple peptides of such a protein are detected. Furthermore, every peptide has different physicochemical properties, which influence its ionizability and, therefore, its detectability in the mass spectrometer. To address this, several modifications of spectral counting have been developed that incorporate mathematical corrections, such as the normalised spectral abundance factor (NSAF), which divides the spectral count of a protein by its length, or the exponentially modified protein abundance index (emPAI), which relates the number of observed peptides to the number of theoretically observable peptides. In intensity-based quantitation, on the other hand, quantitation is based on the total amount of a peptide detected in a specific retention time window, determined as the area under the curve of its extracted ion chromatogram (XIC). LFQ has benefited greatly from recent developments in mass spectrometer hardware, as these increase the number of quantifiable features in a given LC-MS run and allow averaging over more peptides for protein quantitation.
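Both spectral-counting corrections mentioned above can be written in a few lines. In this sketch, with hypothetical proteins and counts, two proteins with equal spectral counts but a fourfold difference in length receive very different NSAF values, and emPAI is computed from the observed and theoretically observable peptide numbers as 10^(N_observed / N_observable) - 1.

```python
def nsaf(spectral_counts: dict[str, int], lengths: dict[str, int]) -> dict[str, float]:
    """Normalised spectral abundance factor.

    The spectral abundance factor SAF = SpC / L (spectral count over protein length)
    is divided by the sum of SAFs over all proteins in the run.
    """
    saf = {protein: spectral_counts[protein] / lengths[protein] for protein in spectral_counts}
    total = sum(saf.values())
    return {protein: value / total for protein, value in saf.items()}

def empai(observed_peptides: int, observable_peptides: int) -> float:
    """emPAI = 10 ** (N_observed / N_observable) - 1."""
    return 10 ** (observed_peptides / observable_peptides) - 1

# Hypothetical proteins with equal spectral counts but a fourfold difference in length.
counts = {"long_protein": 40, "short_protein": 40}
lengths = {"long_protein": 800, "short_protein": 200}
print(nsaf(counts, lengths))
print(round(empai(5, 10), 3))
```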
For ion-intensity-based quantitation to be reproducible, normalization steps are required, as differences in the total amount of protein loaded onto the LC-MS system and instrument variability need to be accounted for. Powerful software has therefore been developed to perform this type of peptide and protein quantitation (see  for an in-depth review). An interesting label-free quantitation technique has been described that combines peptide counts, spectral counts and ion intensities into the so-called normalized spectral index. Using this method, the variance between multiple LC-MS runs was largely eliminated, which makes it a promising approach for reproducible label-free quantitation.
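Intensity-based LFQ combines two steps discussed above: integrating each peptide's XIC and normalising between runs. The sketch below integrates an XIC with the trapezoidal rule and then applies median normalisation on a log2 scale, one common normalisation among several, which assumes that most peptides do not change between runs. All data and function names are hypothetical.

```python
import math
from statistics import median

def xic_area(times: list[float], intensities: list[float]) -> float:
    """Area under an extracted ion chromatogram by the trapezoidal rule."""
    return sum((t1 - t0) * (i0 + i1) / 2
               for t0, t1, i0, i1 in zip(times, times[1:], intensities, intensities[1:]))

def median_normalise(runs: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Shift each run's log2 intensities so that all runs share the same median."""
    logs = {run: {pep: math.log2(v) for pep, v in feats.items()} for run, feats in runs.items()}
    run_medians = {run: median(vals.values()) for run, vals in logs.items()}
    target = median(run_medians.values())
    return {run: {pep: 2 ** (lv - run_medians[run] + target) for pep, lv in vals.items()}
            for run, vals in logs.items()}

# Hypothetical XIC areas per peptide; run B was loaded with twice as much sample.
runs = {
    "A": {"p1": 100.0, "p2": 200.0, "p3": 400.0},
    "B": {"p1": 200.0, "p2": 400.0, "p3": 800.0},
}
print(xic_area([0.0, 1.0, 2.0], [0.0, 10.0, 0.0]))
print(median_normalise(runs))
```

After normalisation the twofold loading difference disappears and each peptide has the same value in both runs, as expected when nothing changed biologically.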
8. Software applications for quantitative mass spectrometry
Quantitative proteomic data are typically very complex, and their analysis requires specialized software. The main challenge concerns incomplete data: even modern mass spectrometers cannot sample and fragment every peptide ion present in a complex sample, so only a subset of the peptides and proteins present can be identified. Over the past years, several strategies for mass spectrometry-based quantitative proteomics, and corresponding computational methods for processing quantitative data sets, have been developed (reviewed in [50,51]), as different quantitative LC-MS methods require different software solutions for data analysis. Quantitation can be achieved by comparing peak intensities in differential stable isotope labelling, via spectral counting, or by using the ion current in label-free LC-MS measurements. Many freely available software packages have been published; they differ in instrument compatibility and in the processing functionality with which they handle these fundamentally different quantitation methods. Researchers have to choose the software solution appropriate for their quantitative proteomics experiments based on the experimental and analytical requirements. Since it goes beyond the scope of this chapter to discuss all of the available software tools separately, we refer the reader to an extensive and up-to-date overview of software solutions, including links to websites for downloads, at http://www.ms-utils.org.
9. Concluding remarks
In summary, all mass spectrometry-based quantitation methods have their particular strengths and weaknesses, and researchers have to choose, from the multitude of methods that have emerged for the quantitative analysis of simple and complex (sub-)proteomes, the one best suited to their specific research question. This choice depends on the availability of high-resolution mass spectrometers and LC equipment, the expertise present in the lab and the financial aspects involved. Quantitative proteomics methods have matured and can now be applied at a large scale to the study of proteomes and their dynamics. Using the labelling methods described in this chapter, thousands of proteins can be identified and quantified in a single experiment. However, there is still room for improvement, both in the experimental strategies for the quantitative analysis of very complex mixtures and their post-translational modifications, and in the bioinformatics and statistical approaches needed to interpret the results meaningfully. The ultimate goal is to generate quantitative proteomic data at a scale that allows the comprehensive investigation of a biological phenomenon.