Plant Protein Analysis

Introduction
The influx of data from the past ten years of large-scale plant genome sequencing projects has yielded the sequence, complete or at its final assembly level, of several plant genomes, including Arabidopsis thaliana, Oryza sativa, Zea mays, Brachypodium distachyon, Cucumis sativus, Populus trichocarpa, Medicago truncatula, Glycine max, Malus domestica, Physcomitrella patens, Selaginella moellendorffii, Sorghum bicolor, Theobroma cacao, Vitis vinifera, Prunus persica, Ricinus communis and Vigna radiata. This knowledge, combined with the implementation of classical and innovative parallel high-throughput proteomic technologies associated with new protein search algorithms, has triggered a growing interest in plant proteomics as a means of comprehensively analysing cellular functions, up to the level of the whole organism, under different physiological and environmental conditions. A number of reviews providing detailed insights into the basic lines of plant proteomics studies have recently been written (Baginsky, 2009; Rose et al. 2004). In addition, a number of initiatives such as the International Plant Proteomics Organization (INPPO) and The Plant Proteomics Database (PPDB) have been launched recently to organize the massive amount of information that has emerged within the field of plant proteomics (Agrawal et al. 2011, Sun et al. 2009). Figure 1 highlights the rapid increase of scientific interest in plant proteomics that has occurred in the last ten years, with model species including Arabidopsis (Van Norman & Benfey, 2009) and rice (Agrawal & Rakwal, 2011) opening the way for the study of non-model plant species.

The majority of plant proteomics studies to date can be divided into two basic categories. The first involves protein annotation and profiling, with the aim of separating and cataloguing as many proteins extracted from whole cells and organelles as possible to provide a snapshot of the major constituents of the proteome. The most notable examples of descriptive plant proteomics are studies carried out in different organs of Arabidopsis (Giavalisco et al. 2005, Baerenfaller et al. 2008, Joshi et al. 2011) and in rice (Agrawal et al. 2009, Koller et al. 2002, Ferrari et al. 2011) where, respectively, 13,029 and 2,528 unique proteins have been identified from several tissues. However, it should be noted that entire proteomes of single cell types cannot yet be fully mapped, as will be explained later, and to date the number of protein entries in the UniProtKB database for plant organisms is still limited to just above 500,000, which corresponds to less than 1/10th of the total number of entries (Schneider et al. 2009). The second category of proteome analysis aims at revealing changes in protein expression in response to physiological and environmental stimuli and is commonly termed comparative proteomics. It has been employed in a variety of studies including wood formation (Gion et al. 2005), response to cold stress (Neilson et al. 2011), drought (Riccardi et al. 2004), heavy metal toxicity (Villiers et al. 2011, Visioli et al. 2010a), flower development (Theissen et al. 2001) and seed development (Hajduch et al. 2005).

Among the many plant species sequenced, only a few have been intensively studied. These include Arabidopsis thaliana, the first plant to be sequenced (Kaul et al. 2000), which has a short life cycle and is easy to handle; rice (Oryza sativa), which is used as a model for cereal monocots (Matsumoto et al. 2005); maize (Schnable et al. 2009); and poplar, which is used as a model plant for woody species and for its economic and eco-physiological relevance (Tuskan et al. 2006). Although the analysis of the "green" proteome has grown rapidly, we are still far from an integrated understanding of the plant proteome and from identifying the roles of the many proteins involved in cross-talk between interconnected metabolic pathways.

A challenge in comparative proteomics is the difficulty in delivering large-scale protein quantification (Schulze & Usadel 2010) to assay global protein changes elicited by biotic/abiotic events. A second problem is the inadequacy of current technologies for analysing a representative proportion of the expressed proteins present in a plant sample (Patterson 2004). This is mainly due to the dynamic range of protein concentrations within plant cells, which is estimated to be as wide as 10^5 to 10^6 (Patterson & Aebersold 2003). Abundant proteins such as RuBisCO (ribulose-1,5-bisphosphate carboxylase/oxygenase), the world's most abundant protein, can comprise up to 40% of total protein content in green tissue. The same is true for seed storage proteins or other housekeeping proteins, which can be present at levels of 10^5 to 10^7 molecules per cell. These highly abundant proteins hinder the detection of low-abundance proteins such as kinases, phosphatases, regulatory proteins, transcription factors and rare membrane proteins, whose concentrations are below 10-100 molecules per cell. To deplete the more abundant proteins from plant samples, many protocols require selective fractionation steps such as sucrose density gradient centrifugation or FPLC anion-exchange chromatography (Xi et al. 2006). Unfortunately, many of these approaches can be laborious, time consuming or require expensive equipment.

Furthermore, proteomes are much more dynamic than genomes, resulting in a considerable increase in complexity when gene expression is analysed at the protein level. While the human genome consists of approximately 30,000 genes, the corresponding proteome is expected to include between 200,000 and 2 million proteins due to splicing and post-translational modifications (Gygi et al. 1999). A similar situation is expected in plants. For instance, thousands of phosphorylation sites have been characterized in plant proteins (Heazlewood et al. 2008), and it is likely that different post-translational modifications of storage proteins could explain the discrepancy between the abundances of these proteins and of the corresponding mRNAs found in many plants living in temperate climates (Dai et al. 2007, Holdsworth et al. 2008, Rose et al. 2004).

Recently protein arrays, which allow fast and parallel data analysis with miniaturization and automation, have been emerging as a tool to supplement classical proteomics concepts. Protein arrays are able to profile and functionally characterize recombinant proteins encoded by globally or differentially expressed cDNA clones (Bussow et al. 2001) or by the high-throughput sub-cloning of ORFs (Jahn et al. 2001).
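The copy numbers quoted in this section for abundant and rare proteins can be set side by side with a few lines of arithmetic. The sketch below is only an illustration: the constant and function names are hypothetical, and the numbers are the per-cell ranges given in the text, not new measurements.

```python
import math

# Copy numbers per cell as quoted in the text (molecules per cell).
ABUNDANT_RANGE = (1e5, 1e7)   # storage and housekeeping proteins
RARE_RANGE = (10.0, 100.0)    # kinases, transcription factors, etc.


def orders_of_magnitude(high: float, low: float) -> float:
    """Return log10(high / low), the abundance gap in orders of magnitude."""
    if high <= 0 or low <= 0:
        raise ValueError("copy numbers must be positive")
    return math.log10(high / low)


# Narrowest gap: least abundant "abundant" protein vs most abundant "rare" one.
min_gap = orders_of_magnitude(ABUNDANT_RANGE[0], RARE_RANGE[1])
# Widest gap: most abundant protein vs rarest protein.
max_gap = orders_of_magnitude(ABUNDANT_RANGE[1], RARE_RANGE[0])

print(f"abundance gap: {min_gap:.1f} to {max_gap:.1f} orders of magnitude")
```

Under these figures the gap between the two classes spans roughly three to six orders of magnitude, which is in line with the dynamic range of 10^5 to 10^6 cited above and shows why depletion of abundant proteins is usually needed before rare ones can be detected.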

Protein extraction
Isolation of intact total protein is the first and most critical step in any proteomics study, and the analysis of plant proteomes presents very specific problems when compared to other organisms. Proteins in plant cells are present at relatively low concentrations and constitute highly heterogeneous populations as a consequence of their functional diversity. Aspects of this complexity include polypeptide molecular size; complexes (e.g. "clusters" or "modules" of interacting molecules that carry out cellular functions); spatial and time-dependent concentrations (e.g. proteins in the nucleus for transcription or in the mitochondrion for energy regeneration); charge (pI ranges from 3 to 12); and localization, which ranges from proteins present in compartments like the cytosol or in distinct organelles like the mitochondrion or plastid to highly hydrophobic proteins embedded within the different cell membranes. As a consequence, a multi-step procedure is often necessary to extract subsets of specific proteins.

The key to protein isolation is the efficient solubilization of different protein types, including membrane proteins, with a minimum of handling time. The technique also needs to be suitable for downstream proteomics analysis procedures, with minimal post-extraction artefacts and non-proteinaceous contaminants. The presence in plant cells of multiple interfering substances such as proteases, polyphenols, tannins, pigments and waxes, together with a high carbohydrate/protein ratio, further complicates the eventual extraction, solubilization and separation procedures, which even under optimal conditions result in a reduction of approximately 25% in the expected proteome (Patterson 2004). No single protein extraction protocol can capture an entire proteome; consequently, a range of different extraction protocols involving many permutations of physical and chemical treatments, solvents and buffers has been reported in the literature (Rose et al. 2004, Baginsky 2009). A schematic outline of protein extraction methods is shown in Figure 2. In some cases specialized protocols have been developed to extract a specific subset of proteins such as membrane or cell wall-associated proteins (Everberg et al. 2004). Specific mass spectrometry-compatible protein extraction protocols have also been developed (Sheoran et al. 2009). In addition, sequential extraction of tissues with a series of solvents can be effective in decreasing protein complexity and in enhancing the detection of low-abundance proteins (Maltman et al. 2002).

Fig. 2. Flow chart of protein extraction methods, which are highly dependent on the cell type, tissue and organs to be analyzed.

Extraction of plant proteins generally involves physical disruption by mechanical means, grinding, sonication, or chemical or enzymatic lysis of the cell, and release of the contents into an extraction medium. Proteins are dissolved in a buffer solution as close as possible in composition to the original intracellular medium with respect to pH and ionic strength. Additional components are also added to help protein solubilization, to protect proteins from hydrolysis or oxidation and to remove non-proteinaceous constituents from the aqueous extract. Subsequent separation and analytical steps are often intolerant of these additives: for instance, inorganic salts may interfere with electrospray mass spectrometry, detergents with chromatographic and electrophoretic separations and with MALDI mass spectrometry, while protease inhibitor cocktails may interfere with the digestion of proteins by trypsin. Thus it is essential to design extraction strategies with full knowledge of the nature and sensitivities of further processing and analytical steps.

Two excellent and complementary methods currently in use to prepare a total plant protein extract are: i) trichloroacetic acid (TCA)/acetone precipitation and ii) phenol extraction in combination with different extraction buffers. Homogenization of the sample in 10% TCA dissolved in acetone almost immediately inactivates proteases and precipitates proteins; in addition, it provides a means for delipidating membranes and releasing membrane-associated proteins. This procedure also allows interfering substances to be washed out from the precipitated proteins and provides a clean sample for isoelectric focusing. While the TCA/acetone procedure is extremely effective for many plant tissues, particularly for young growing vegetative tissues, the method can sometimes result in the co-extraction of polymeric contaminants such as polysaccharides and phenolic compounds. In this case the second protocol, involving protein solubilization in phenol, with or without SDS, and subsequent precipitation with methanol and ammonium acetate, is preferred (Hurkman & Tanaka 1986).

One way to identify rare or hydrophobic proteins and to increase the overall detectable proportion of the proteome is to reduce the protein complexity. Protein profiling of isolated organelles provides information about their enzymatic inventory and allows conclusions to be drawn about the compartmentalization of metabolic pathways. A number of studies have analyzed the proteomes of plant sub-cellular organelles including plastids (chloroplast, amyloplast, etioplast) (Baginsky et al. 2007, Ferro 2010), mitochondria (Heazlewood et al. 2004), vacuoles (Schmidt et al. 2007) and peroxisomes (Reumann et al. 2007). A significant contribution of organelle proteomics to cell biology comes from the sub-cellular localization of proteins and enzymes that cannot be inferred from genome sequences. Chloroplast proteome analysis, for example, revealed that many proteins in the organelle were imported into the chloroplast via the secretory pathway, without a predictable N-terminal transit peptide (Friso et al. 2004). This surprising finding may explain why some chloroplast proteins are glycosylated (Villarejo et al. 2005). Similarly, a recent proteome survey of Arabidopsis peroxisomes revealed the presence of unexpected proteins in the peroxisomal matrix. Additional validation with GFP-tagged proteins allowed the characterization of a novel peroxisomal targeting sequence (Reumann et al. 2007).

Protein separation
Two approaches have generally been used for the analysis of plant proteins. Gel-based methods involve the separation of proteins from a complex mixture and are typically accomplished by 2D-PAGE. With gel-free approaches, protein fractionation is carried out using liquid chromatography devices. Both techniques involve the subsequent identification and characterization of proteins by mass spectrometry.

Initial analyses were carried out by separating protein samples in the first dimension using self-constructed isoelectric focusing (IEF) gels, followed by second-dimension PAGE. In the last few years the reproducibility, sample loading and resolution of 2D gel electrophoresis have significantly improved with the introduction of immobilised pH gradient strips in the first dimension. After separation, proteins are visualized by different staining techniques such as silver staining and Coomassie Brilliant Blue (CBB) and quantified by densitometry. An example of a plant sample arrayed by 2D-PAGE is outlined in Figure 3. By employing 2D-PAGE analyses it has been possible to analyse the rice and Arabidopsis proteomes (Kamo et al. 1995, Tsugita et al. 1994), to undertake comparative quantification of cold or salt-stressed plants and to identify the responsive protein mediators of the stress signal (Hajheidari et al. 2005). Recently 2D-PAGE has been used for establishing a protein reference map for soybean root hair cells (Brechenmacher et al. 2009).

Fig. 3. Two-dimensional gel electrophoretogram of a total protein extract from barley tissue (Marmiroli et al. 1993).

Although 2D-PAGE is a robust and relatively straightforward technique that allows for the separation of up to 10,000 discrete proteins, it has been criticized for being cumbersome and labor-intensive, due to the time-consuming process of image analysis and to gel-to-gel variations that can complicate reproducibility (Taylor et al. 2011). Even with advanced 2D-PAGE analysis software, a high number of computationally generated 2D-PAGE spots have to be compared in a manual validation to obtain reliable accuracy (Hajduch et al. 2006). Moreover, 2D-PAGE provides only a rough estimate of a protein's quantity, due to variations in the staining efficiency of individual gels and to its dependency on sample processing. 2D gels of plant proteins are also problematic due to post-translational modifications such as phosphorylation, glycosylation and myristoylation, which cause proteins encoded by the same gene to migrate to different locations on the gel. The same holds true for multiple protein isoforms arrayed by 2D-PAGE. Low copy number proteins such as transcription factors, which are of considerable interest in plant biology, are liable to lie beyond the detection limits of 2D-PAGE. Furthermore, the number of spots resolved varies depending on the chosen tissue and plant species, and a single spot can often contain multiple protein species, complicating protein identification. Larger integral membrane proteins tend to be poorly soluble under common experimental conditions and are thus under-represented in 2D gels. Carbamylation, deamidation and isoaspartate formation sometimes occur during denaturing IEF, resulting in changes in a protein's isoelectric point and causing the horizontal strings of spots seen on 2D gels. 2D-PAGE is also notoriously difficult to automate, which limits throughput and results in greater experimental variability. In addition, the 2D-PAGE approach is generally more suitable for the analysis of soluble and peripheral membrane proteins.

Recently proteome analyses have been performed using "gel-less" procedures based entirely on liquid chromatography (LC). The main advantage of LC is that crude protein extracts can be analysed after few purification steps, thus achieving a higher level of reproducibility than most of the chemical procedures and allowing a better comparison of protein patterns (Lambert 2005). The use of LC or two-dimensional liquid chromatography (2D-LC) separations is a robust method for characterizing large numbers of total plant protein samples and proteins from plant organelles or sub-cellular compartments, followed by selective intact-protein analysis by MS (Pirondini et al. 2006). Among the different LC approaches, a 2D-LC separation technique called PF-2D, based on chromatofocusing (CF) in the first dimension and high performance reversed phase (HPRP) liquid chromatography in the second dimension, has recently been developed, allowing the fine separation of large amounts of heterogeneous proteins. A dedicated software package then converts complex chromatograms of a large number of fractions into easily visualized 2-D maps, "virtual gels", in which pH is plotted against the retention time (Figure 4).
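The idea of a "virtual gel" can be illustrated with a minimal sketch that bins chromatographic peaks into a pH versus retention-time grid and then subtracts two such grids. This is not the vendor software mentioned above: the function names, the bin widths (0.5 pH units and 1 minute) and the peak values are all hypothetical and chosen only to make the example easy to follow.

```python
from collections import defaultdict


def build_virtual_gel(peaks, ph_step=0.5, rt_step=1.0):
    """Bin (pH, retention_time_min, intensity) peaks into a 2-D map.

    Returns a dict mapping (ph_bin_index, rt_bin_index) to summed intensity.
    The bin widths are illustrative assumptions, not instrument settings.
    """
    gel = defaultdict(float)
    for ph, rt, intensity in peaks:
        key = (int(ph // ph_step), int(rt // rt_step))
        gel[key] += intensity
    return dict(gel)


def differential_map(gel_a, gel_b):
    """Cell-by-cell intensity difference (gel_b minus gel_a)."""
    cells = set(gel_a) | set(gel_b)
    return {cell: gel_b.get(cell, 0.0) - gel_a.get(cell, 0.0) for cell in cells}


# Hypothetical peak lists: (first-dimension pH, second-dimension RT, intensity).
control_peaks = [(5.2, 12.4, 100.0), (5.3, 12.7, 50.0), (7.8, 20.1, 30.0)]
treated_peaks = [(5.2, 12.4, 100.0), (7.8, 20.1, 90.0)]

control_gel = build_virtual_gel(control_peaks)
treated_gel = build_virtual_gel(treated_peaks)
difference = differential_map(control_gel, treated_gel)

# List cells by decreasing absolute change between the two samples.
for cell, delta in sorted(difference.items(), key=lambda kv: -abs(kv[1])):
    print(f"pH bin {cell[0]}, RT bin {cell[1]}: change {delta:+.1f}")
```

Summing intensities per cell is the simplest possible choice; a real comparison would also need normalization between runs and alignment of retention times, which are left out here.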
Fig. 4. PF-2D virtual separation gel of total protein extract from Arabidopsis thaliana (Pirondini et al. 2006).

In silico analysis of different "virtual gels" can be used to generate a complete catalogue of the qualitative and quantitative differences existing between different proteomes. Such an approach has been successfully applied to the identification of proteins involved in the plant proteomic response to heavy metals and viruses (Larson et al. 2008, Visioli et al. 2010b).

Affinity chromatography has also demonstrated its potential in plant proteomics to overcome challenges associated with the enrichment of low-abundance proteins or to deplete high-abundance proteins. Many tags are currently used in plant protein purification, including green fluorescent protein (Peckham et al. 2006), glutathione S-transferase (Sridhar et al. 2006), hexahistidine (Koroleva et al. 2009) and maltose binding protein (Koroleva et al. 2009). To improve the purification of plant protein complexes, new protein tags (TAP tags) based on the biotin carboxyl carrier domain have been developed (Qi & Katagiri 2009). One important application of these techniques has been the investigation of post-translational modifications (PTMs) in plant proteins. For example, protein phosphorylation is one of the most extensively studied PTMs in plants, where immobilized metal affinity chromatography (Fe-IMAC) is widely used to enrich phosphopeptides from complex peptide mixtures (Kersten et al. 2009). The same approaches can be employed to study other PTMs such as glycosylation and ubiquitination (Morelle 2008). Affinity chromatography has also been applied to map protein-protein interactions by isolating protein complexes (Morris 2008).

Protein identification
Over the past decade the increasing availability of ESTs and genomic sequence data, along with rapid advances in MS, has paved the way for a new era of protein identification and quantification. Generally two forms of mass spectrometry are used for protein identification, both of which employ "soft" ionization techniques (Fenn 2002, Tanaka et al. 1988). The first is matrix-assisted laser desorption ionization (MALDI)-time of flight (TOF) mass spectrometry, used to perform peptide mass fingerprinting (PMF). The second is electrospray ionization (ESI), which is usually coupled to high performance liquid chromatography (HPLC) sample separation and is often used in tandem mass spectrometry to undertake peptide fragmentation. With the rapid increase in the popularity of MS, an assortment of instruments developed for different budgets and needs has become available (e.g. from Waters, AB Sciex, Bruker Daltonics, Shimadzu, Agilent Technologies and Thermo Scientific). The improved mass accuracy, mass resolution and sensitivity allow for the rapid identification of picomoles or even femtomoles of proteins and peptides if matching genomic sequence data are available. The principle of mass spectrometry is outlined in Figure 5.

Among the different MALDI-based MS techniques, MALDI-TOF has for many years been the most widespread MS analysis approach. Though it is not the most rigorous approach to protein identification, it still represents an economically convenient alternative to more complex MS systems, especially when proteomic analyses are carried out on plant species whose genome/protein databases are complete or well annotated. In a typical MALDI-TOF analysis the first step is the excision of 2-D gel plugs containing the selected protein spot of interest, or the collection of a low-complexity fraction resulting from sample purification. The second step involves protein digestion (Shevchenko et al. 2007) with a site-specific cleavage agent (e.g. trypsin or CNBr). The resulting mixture of peptides is then mixed with a matrix solution of α-cyano-4-hydroxycinnamic acid (CHCA), whose function is to absorb most of the energy coming from a UV laser fired at the sample. The desorbed, ionized peptides are accelerated into the TOF analyzer, where lighter ions travel faster than heavier ions; the time taken to travel down the analyzer and reach the detector therefore varies according to their mass-to-charge ratio, producing a mass spectrum. Finally, the list of masses produced from the mass spectrum is interrogated against a protein database (e.g. SwissProt, NCBInr) using a software package (e.g. MASCOT) with an experimental mass accuracy of ca. 10 ppm. The peptide masses derived from the spectrum are compared to proteins in the database that have been digested "in silico" to produce a list of possible matches. This approach is referred to as peptide mass fingerprinting (PMF); it is relatively straightforward to perform and the spectra are usually simple to interpret. A scoring mechanism is employed to assess the likelihood of a correct identification. Robust protein identification requires the correct assignment of the molecular weights of at least four or five peptides. In the absence of exhaustive protein or genomic database information, large expressed sequence tag (EST) databases have been used for protein identification. An example of a peptide mass fingerprint experiment is shown in Figure 6. Oeljeklaus et al. 2008).

MALDI-TOF mass spectrometry has been used, for instance, in the characterization of the Arabidopsis thaliana proteome (Giavalisco et al. 2005). The authors report the identification of 2,943 spots from 2-DE, corresponding to 663 different gene products. This is a small number considering that more than 35,000 proteins are coded by the Arabidopsis genome. A survey of the proteomes of six tissues from the model legume Medicago truncatula produced 2D-PAGE reference maps from which 551 proteins were identified (Watson et al. 2003). In this case the overall successful identification rate was 55%, a figure that is considered good in the absence of a fully sequenced genome, although it depended on the tissue in question. For example, identification was achieved for 43% of the proteins extracted from root tissue, while the figure for leaves was 76%. The difference presumably reflects differences in the quality of the separations, in the information in the databases and in the availability of ESTs. An investigation of soybean seed filling successfully profiled 679 2-DE protein spots at five sequential developmental stages (Hajduch et al. 2005). Analysis of each of these protein spots by MALDI-TOF yielded the identity of 422 of them, representing 216 non-redundant proteins. In nuclei isolated from rice suspension cell cultures, from a total of 549 proteins resolved on 2-DE, 190 proteins were identified by MALDI-TOF MS from 257 major protein spots (Khan & Komatsu 2004). In Populus nigra cultivated under cadmium stress, a subset of 20 out of 126 spots was identified by MALDI-TOF MS. Proteins that were more abundant in the metal-exposed plants were located in the chloroplast and in the mitochondrion, suggesting the importance of these organelles in the response and adaptation to metal stress (Visioli et al. 2010b). MALDI-TOF MS has also been used for the identification of differentially expressed proteins in rice leaves in the presence of arsenic (Ahsan et al. 2010) and of alterations of the barley root proteome in response to salt stress conditions (Witzel et al. 2009).

To reduce the influence of ion-suppression effects in MALDI-TOF MS measurements and to obtain more peptide peaks, the tryptic peptides can be separated using an off-line combination of a capillary reverse-phase HPLC column with MALDI-TOF. LC-MALDI techniques do not suffer from the time constraints imposed by the transient presence of peptides eluting from a column, and each sample can be analyzed more than once. LC-MALDI has been used, for instance, for the identification of proteins involved in different plant signaling processes (Karlova et al. 2006). The beneficial features of MALDI have led to this ionization technique being incorporated into tandem instruments such as those with quadrupole ion trap/TOF, quadrupole TOF and TOF/TOF geometries. The advantage of hyphenated MS over single MALDI-TOF fingerprinting is that the precise sequence of amino acids in each peptide can be determined, allowing a more reliable identification. Examples of the application of tandem MALDI in plant proteome analysis range from the characterization of the Medicago truncatula cell wall proteome (Gokulakannan & Niehaus 2010) to the analysis of the glycoproteome of tomato and barley (Català et al. 2011). The identification of proteins involved in metabolic pathways affected by different cropping regimes (Nawrocki et al. 2011), in the cadmium response in poplars (Kieffer et al. 2008) and in the effect of salt stress on sorghum leaves (Swami et al. 2011) was also carried out taking advantage of hyphenated MALDI MS analysis.

An emerging technique in plant biology based on MALDI, and made possible by advances in instrumentation, is MALDI-imaging MS (MSI). This technique can be applied at both the tissue and the single-cell level, providing information on the spatial distribution of specific molecules (Kaspar et al. 2011). Whereas many plant metabolite profiles have been described so far, no comparable plant protein analyses are available; the only application of this technique is the identification of a precursor of a secreted peptide hormone in Arabidopsis (Kondo et al. 2006).

To summarize, MALDI-TOF analysis is extremely fast with regard to data acquisition, requires little expertise, is tolerant to contaminants such as salts and detergents, is easy to automate, allows the analysis of a large number of samples in a short period of time and can be relatively inexpensive; protein identification relies purely upon the accurate matching of peptide masses. Unfortunately the data can be ambiguous and rely heavily on the availability of a proteomic or genomic sequence, or at least a substantial EST collection, for the species being studied. Cross-species PMF studies on four plant species (Mathesius et al. 2002), for instance, concluded that PMF data are not particularly useful for inter-species protein identification except for highly conserved proteins.
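The peptide mass fingerprinting workflow described in this section (in silico digestion, mass calculation and matching within a ppm tolerance) can be sketched in a few lines. The sketch rests on several assumptions: it uses a simple count of matched peptides rather than the probabilistic scoring of packages such as MASCOT, it ignores missed cleavages and modifications, and the function names, the two protein sequences and the simulated spectrum are all hypothetical. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056      # added once per peptide (free N- and C-termini)
PROTON = 1.00728      # MALDI mostly yields singly protonated [M+H]+ ions
MIN_PEPTIDES = 4      # the text asks for at least four or five matches


def tryptic_digest(sequence, min_length=5):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return [p for p in peptides if len(p) >= min_length]


def peptide_mh(peptide):
    """Monoisotopic m/z of the singly protonated peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed, theoretical, tol_ppm=10.0):
    """Count theoretical masses that have an observed peak within tol_ppm."""
    return sum(
        1 for t in theoretical
        if any(abs(o - t) / t * 1e6 <= tol_ppm for o in observed)
    )


def rank_proteins(observed, database, tol_ppm=10.0):
    """Return (name, matched_peptides) pairs, best candidate first."""
    scores = {
        name: count_matches(
            observed, [peptide_mh(p) for p in tryptic_digest(seq)], tol_ppm
        )
        for name, seq in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical two-entry database and a simulated spectrum: the peptides
# of protA, each shifted by 5 ppm to mimic measurement error.
DATABASE = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK",
    "protB": "MSDNGELEDKPPAPPVRMSSTIFSTGGKDPLAANHSLKPLPSVLR",
}
observed = [peptide_mh(p) * (1 + 5e-6) for p in tryptic_digest(DATABASE["protA"])]

ranking = rank_proteins(observed, DATABASE)
best_name, best_hits = ranking[0]
verdict = "confident" if best_hits >= MIN_PEPTIDES else "ambiguous"
print(best_name, best_hits, verdict)
```

A plain match count was chosen because it mirrors the "at least four or five peptides" rule of thumb; it also makes the limitation noted above visible, since a protein missing from the database, or one from a poorly sequenced species, simply never accumulates enough matches.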

Electrospray ionization (ESI)
A different method for protein identification by mass spectrometry is peptide fragmentation by means of electrospray ionization tandem mass spectrometry (ESI-MS/MS).The technique provides structural information about the peptide which can be used for more reliably protein identifications when analysed against protein databases (Grossmann et al. 2005).The first step of tandem MS involves sample digestion (e.g. with trypsin), the resulting peptides are loaded onto an HPLC coupled to an ESI mass spectrometer which allows the analysis of ionized molecules in solution.During electrospray ionization peptides enter the ion source as a fine mist of droplets via a needle which is surrounded by an accompanying flow of nitrogen gas.A high voltage is applied to the needle through which the solution arrives in the source causing the droplets produced to be charged on the surface.This whole process results in the ions being released from the liquid droplet to produce gas phase ions that are drawn into the first mass analyzer and separated according to their mass-to-charge ratio.ESI is commonly used as ionization technique in tandem mass spectrometry (MS/MS) which adds a second dimension to mass spectrometric selection improving the specificity of the technique and allowing structural analysis of peptides.Multiple stages of mass analysis separation can be accomplished by individual mass analyzer elements separated in space by a fragmentation cell.Examples include TOF, Fourier-transform ion Cyclotron Resonance (FTCIR), ion trap, quadrupole, orbitrap and linear quadrupole ion trap.(Cotter et al. 2007, Douglas et al. 2005, Hardman& Makarov 2003, Marshall et al. 
1998). The first mass analyzer detects the whole spectrum of peptide ions present in the sample (MS scan); precursor ions of interest are then fragmented inside a collision cell by collision with inert gas molecules (e.g. argon or nitrogen), in a process called collision-induced dissociation (CID), to produce a fragmentation spectrum of the selected peptide. This process produces a series of fragment ions that can differ by single amino acids, allowing a portion of the peptide sequence to be deduced and subsequently used in a bottom-up approach for protein identification by database interrogation. Tandem mass spectrometry has been used to analyze the proteomes of Arabidopsis thaliana, Oryza sativa and Medicago truncatula by taking advantage of their extensively sequenced and/or annotated genomes and proteomes. For instance, the most extensive plant proteomic analysis reported to date was conducted with Arabidopsis thaliana and led to the identification of 13,029 proteins on the basis of 86,456 unique peptides, representing approximately 50% of the predicted expressed genes (Baerenfaller et al. 2008).

Because MALDI-TOF has limited applicability to the proteomes of organisms with unsequenced genomes, de novo sequence data derived from peptide fragmentation have been particularly useful for proteome analysis of non-model plants. De novo sequences can be searched against protein databases of relatives of the organism under investigation using MS-BLAST, on the basis of close protein identities. De novo sequencing in plant proteomics has been employed for the analysis of barley thylakoid membrane proteins (Granvogl et al. 2006), proteome analysis of opium poppy cell cultures (Zulak et al.
2009), oak (Quercus ilex) (Jorge et al. 2006) and banana (Liska & Shevchenko 2003). The combination of LC-MS/MS analysis with new single- and two-step affinity purification methods for plant proteins has stimulated interest in the isolation and characterization of plant protein complexes (Pflieger et al. 2011). Even if the results are far from exhaustive and the structure-function relationships of these protein assemblies are still poorly understood, the identification and characterization of these complexes are necessary to fully understand cellular dynamics and homeostasis.

Over the past decade MS techniques have advanced, and alternative non-gel approaches have been developed to address technical limitations inherent in 2D-PAGE/MS/MS. One such "shotgun" approach, referred to as multidimensional protein identification technology (MudPIT) (Link et al. 1999), consists of a two-dimensional chromatographic separation prior to electrospray mass spectrometry, followed by database searching. Shotgun proteomics refers to the direct and rapid analysis of the entire protein complement of whole organelles, cells and tissues, starting from chemical or enzymatic digestion of proteins to generate a highly complex set of peptides that is well beyond the separation capacity of 2D-PAGE. The theoretical peak capacity of the MudPIT system has been calculated to be ca. 23,000 proteins (Wolters et al.
2001), making this system a powerful tool for proteomics. The rationale behind this method is that, because peptides are analytically more tractable than intact proteins, standardized protocols can be developed to tackle proteome-wide measurements by means of peptide analysis only. Sample preparation is relatively straightforward: the proteins are denatured, the cysteines are reduced and alkylated, and the proteins are then digested to produce a complex mixture of peptides. Peptides are then separated prior to analysis by tandem MS. The first dimension is normally a strong cation exchange (SCX) column with high loading capacity and high-resolution separation capacity. Peptides are eluted from the cation exchanger in a series of salt steps of increasing concentration onto the second dimension, a reversed-phase (RP) chromatography column. A subsequent RP gradient separates the eluting peptides according to their hydrophobicity and delivers them, after each salt step, into a tandem mass spectrometer for selection and fragmentation. In contrast to the traditional 2-DE/MS/MS approach, the shotgun method is largely unbiased, providing a strategy for the efficient detection of low-abundance and hydrophobic proteins. A typical qualitative shotgun analysis of plants such as rice and Arabidopsis thaliana can in theory identify in the range of 200 to 1,000 proteins (Froehlich et al. 2003). Application of this "shotgun" approach has allowed the identification of more than 1,000 distinct proteins from rice leaf and root samples (Breci & Haynes 2007) and 294 ubiquitinated proteins in Arabidopsis thaliana (Maor et al. 2007). Shotgun protein analysis has also led to the identification of 44 differentially expressed proteins, out of a set of 3,004 non-redundant proteins previously identified, in the rice reduced culm number1 mutant when compared to wild-type rice (Lee et al. 2011).
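
The statement earlier in this section that CID yields fragment ions differing by single amino acids can be illustrated with a short sketch. It assumes the common simplification that CID mainly produces singly charged b ions (N-terminal fragments) and y ions (C-terminal fragments); the function names and the example peptide are hypothetical. The last function reads residues back from the mass differences between consecutive b ions, which is the basic idea behind de novo sequencing. Leucine and isoleucine have identical masses and are therefore reported together.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def precursor_mz(peptide, charge):
    """m/z of the intact peptide carrying `charge` protons, as seen in ESI."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def fragment_ladder(peptide):
    """Singly charged b and y ions for every backbone cleavage site."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)           # N-terminal part
        y_ions.append(sum(masses[i:]) + WATER + PROTON)   # C-terminal part
    return b_ions, y_ions


def read_ladder(b_ions, tol=0.01):
    """Infer residues from the differences between consecutive b ions."""
    residues = []
    for lower, upper in zip(b_ions, b_ions[1:]):
        delta = upper - lower
        candidates = [aa for aa, m in RESIDUE_MASS.items()
                      if abs(m - delta) <= tol]
        residues.append("".join(sorted(candidates)) or "?")
    return residues


if __name__ == "__main__":
    b, y = fragment_ladder("GLSVR")   # made-up tryptic peptide
    print([round(m, 3) for m in b])
    print([round(m, 3) for m in y])
    print(read_ladder(b))             # residues 2..4 read from the b series
```

In real spectra the ladder is usually incomplete and noisy, so only a partial sequence tag is recovered; that is why such tags are searched with error-tolerant tools such as MS-BLAST rather than matched exactly.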

Protein quantitation
Determining the relative abundance of proteins in organisms or tissues subjected to a variety of environmental or physiological conditions is a central goal of most plant proteomic studies. Techniques such as two-dimensional difference gel electrophoresis (2D-DIGE) (Timms & Cramer 2008), which permit changes in protein abundance to be assessed more readily, have partially overcome the limitations caused by inter-gel variation. Another advantage of 2D-DIGE is that it requires small amounts of protein (0.025 mg) compared with standard 2D-PAGE (ca. 0.2 to 1 mg). The technique involves covalent labelling of two different protein samples with fluorescent cyanine dyes (for example Cy2, Cy3 and/or Cy5, which fluoresce at different wavelengths) prior to two-dimensional electrophoresis, and provides sub-nanogram sensitivity. The fluorescence intensity at each of the wavelengths for Cy3 and Cy5 is measured and, after gel matching with dedicated software, intensity ratios are used to evaluate the relative abundance of proteins in the two samples. A variety of plant proteomic studies have used DIGE (Granlund et al. 2009, Schenkluhn et al.
2010) to investigate abiotic stresses such as freezing, the effect of UV on maize, aluminium stress in tomato, and the effects of abscisic acid (ABA) and beta-aminobutyric acid (BABA) on Malus pumila.

In general, MS analysis of proteins by MALDI or LC-MS/MS is not inherently quantitative because of the different physical and chemical properties of tryptic peptides: differences in charge state, ionization competition, peptide length, non-homogeneous sample introduction, amino acid composition or post-translational modification, together with limitations in sample handling, all result in variations in ion intensity, even for peptides that belong to the same protein. As a consequence, MS signals are notoriously variable and unpredictable, and are therefore a potential source of significant error in quantitative proteomic studies. Despite these hurdles, a number of comparative strategies have been adopted; they can be categorized as either stable-isotope-labelling or label-free approaches.

Protein quantification by means of stable isotope labelling is based on the fact that a peptide labelled with an isotopic mass tag (2H, 13C, 15N, 18O) differs from the unlabelled peptide only in its mass and exhibits the same chemical properties during chromatography. In MS spectra obtained from peptide samples after chromatographic separation, the ratio of the MS signal intensities, or peak areas, of the labelled and unlabelled forms of a peptide permits accurate relative quantification. Labelling can be introduced at different steps during sample preparation. In metabolic labelling, whole cells or organisms are labelled in vivo through the growth medium. In chemical post-extraction labelling, the isotopic modification is added to proteins or tryptic peptides through a chemical reaction. The most common strategies for chemical labelling include isotope-coded affinity tag (ICAT) and isobaric tag for
relative and absolute quantitation (iTRAQ). In the ICAT method, two protein mixtures representing two cell states are treated with the light and heavy isotopologues, respectively, of a reagent that carries a biotin affinity tag and a cysteine-reactive group. To minimize error, the two samples are then pooled, digested with a protease and subjected to avidin affinity chromatography to isolate the labelled, biotinylated peptides, which reduces sample complexity by about 10-fold. LC-MS/MS analysis is then performed to determine the abundance ratio for each identified peptide. So far there have been few reports using ICAT in plant proteomics (Dunkley et al. 2004, Majeran et al. 2005), because ICAT can only quantify peptides that contain cysteine, a rare amino acid present in only a fraction of proteins and peptides. In these plant proteomics studies, ICAT was employed to study organellar proteomes using fractionation of cellular organelles in Arabidopsis thaliana and maize (Dunkley et al. 2004, Majeran et al. 2005). ICAT has also been used with proteins from solubilized mitochondria of Arabidopsis thaliana in order to investigate protein complexes (Hartman et al. 2007). A similar method, called isotope-coded protein label (ICPL), based on the labelling of more frequent amino acid groups, has been developed (Kellermann 2008) and has been used for comparative quantification of cell-wall proteins of Medicago truncatula plants interacting with nitrogen-fixing rhizobia (Hahner et al. 2007).

The iTRAQ method (Applied Biosystems) is based on isobaric tags, i.e.
tags that have the same mass, primarily designed for chemically labelling the N-terminus of peptides generated from digests of proteins isolated from cells in, for example, two different physiological conditions. The labelled samples are combined, fractionated by nanoLC and analyzed by tandem mass spectrometry. Database searching of the peptide fragmentation data results in the identification of the labelled peptides and hence of the corresponding proteins. Fragmentation of the tag attached to the peptides generates a low-molecular-mass reporter ion that is unique to the tag used to label each digest. The reporter ion intensities enable relative quantification of the peptides in each digest, and hence of the proteins from which they originate. In quantitative plant proteomics, iTRAQ has been used to quantify 45 proteins from Arabidopsis thaliana cells treated with the bacterial pathogen Pseudomonas syringae (Kaffarnik et al. 2009). Labelling with iTRAQ has also been used to investigate the proteome of guard cells in Arabidopsis thaliana mutants impaired in the Gα subunit GPA1, in order to understand the signalling role played by trimeric G-proteins in plants (Zhao et al. 2010). This study allowed the identification of 18 proteins that are differentially expressed in the mutant, including ATP synthase, enzymes of the Calvin cycle and proteins involved in the stress response. In a study of grape proteomes at different stages of ripeness, between 1,000 and 1,400 proteins were identified, of which 91 in the exocarp and 58 in the mesocarp were upregulated during fruit maturation (Lucker et al. 2009). iTRAQ has also been widely used to study phosphoproteomic responses to elicitor treatment by comparing several time points post-treatment (Nuhse et al.
2007), protein degradation in chloroplasts and developmentally induced changes in chloroplast proteomes in maize and Brassica.

Unlike chemical labelling, which is typically applied after protein extraction, fractionation and digestion, metabolic labelling takes place at the very first stage, i.e. at the level of protein biosynthesis. In the procedure called stable isotope labelling by amino acids in cell culture (SILAC), labelled essential amino acids (usually deuterated leucine) are added to amino acid-deficient culture media and thus become incorporated into all proteins. In general, SILAC has the advantage of a simpler analysis compared with metabolic labelling with 15N. Usually, a single amino acid is used for SILAC. If the supplied amino acid is Lys or Arg, digestion with trypsin, which cleaves after these two amino acids, yields peptides that differ from their unlabelled counterparts by a single labelled residue. The mass difference between the light and heavy peptides in the MS scan is therefore known and consistent. Experimental cell populations grown with the different isotopologues are treated in a specific way, such as cytokine stimulation, and the protein populations are then harvested and compared. Because the label is embedded directly in the amino acid sequence of every protein, the extracts can be pooled directly. Purified proteins or peptides preserve the exact ratio of labelled to unlabelled protein, as no further synthesis takes place. Relative quantitation takes place at the level of the peptide mass spectrum or peptide fragment mass spectrum, exactly as in any other stable isotope method, by calculating the ratio of the MS peak intensities, or areas, of the light and heavy peptides. In some plants SILAC gives a label incorporation of only approximately 70%, which is not satisfactory for many global proteomics applications; this is because plants are very versatile autotrophs and are able to synthesize all 20 amino acids necessary for protein synthesis. The other
disadvantage of SILAC is that the labelled amino acids are expensive in the amounts needed for efficient labelling, so this method is likely to remain limited to plant cell cultures. The only organisms of the plant kingdom that have been efficiently SILAC-labelled are auxotrophic mutants of Chlamydomonas (Naumann et al. 2007) and cultured cells of Arabidopsis (Gruhler et al. 2005). Nevertheless, the ability of plants to synthesize amino acids from inorganic salts provides an opportunity for a simpler labelling strategy. 15N-KNO3 was first used successfully in potato plants, where 98% of the total protein was labelled with 15N. Arabidopsis thaliana plants can also be labelled successfully, because the labelling does not affect plant development (Ippel et al. 2004). In another study, hydroponic isotope labelling of entire plants was used for relative protein quantification in seven-week-old Arabidopsis thaliana plants subjected to oxidative stress (Bindschedler et al. 2008).

Label-free quantification strategies are becoming increasingly popular for comparing samples (Schulze & Usadel 2010). The rationale behind these methods is to compare peptide abundance, as a measure of the corresponding protein, between multiple LC-MS/MS analyses (Proll et al.
2007). Ideally, samples for label-free comparisons are run consecutively on the same LC-MS/MS setup to avoid variations due to the system (column properties, temperatures) and thereby allow precise reproduction of retention times. Label-free approaches are inexpensive and give high proteome coverage, since every protein that is identified by one or more peptide spectra can be quantified. There are currently two different label-free strategies, which use either MS1 precursor ion data (i.e. the MS survey scan) or MS2 tandem mass spectrometry data (i.e. MS/MS) to estimate changes in the relative abundance of proteins between samples. The MS1-based methods infer changes in relative protein abundance from direct measurement and comparison of the mass spectrometric signal intensity of peptide precursor ions belonging to a particular protein (Wiener et al. 2004). The rationale of this approach is that the height or area of a peak at a given m/z is a measure of the number of ions of that particular mass detected within a given time interval. The process of determining the peak area is referred to as ion extraction and results in a so-called extracted ion chromatogram of a given ion species. Such extracted ion chromatograms can be produced for each m/z across all the LC-MS/MS runs within an experiment, and the resulting peak areas can then be compared quantitatively. Only the same ion species can be compared between samples, however, because ionization efficiency differs among peptide species. The MS2-based methods, on the other hand, estimate differences in relative protein expression from either the extent of protein sequence coverage or the number of tandem mass spectra generated, a technique also known as spectral counting (Zybailov et al. 2009). This quantitation method does not require any protein labelling, uses a simple additive procedure for quantitative evaluation, and does not rely on chromatographic peak integration or
retention time alignment. Relative quantification through spectral counting is achieved by comparing the number of MS/MS spectra for the same protein between two or more MS/MS analyses. The absolute amount of each protein within the sample can be estimated from the exponentially modified protein abundance index (emPAI), which is calculated from the number of observed peptides for each protein divided by the number of theoretically observable peptides, a fraction that has been described as the protein abundance index (PAI) (Rappsilber et al. 2002). The emPAI, along with another similar index, absolute protein expression (APEX) profiling, has been used to analyze differential protein expression in root nodules of Medicago truncatula in response to drought (Larrainzar et al. 2007) and to determine the abundance of stromal proteins in chloroplasts from Arabidopsis thaliana (Zybailov et al. 2008). Sucrose-induced changes in the phosphorylation levels of Arabidopsis plasma membrane proteins have also been analysed by spectral counting (Niittyla et al. 2007).
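
As a rough illustration of the MS2-based indices described above, the sketch below computes PAI as the ratio of observed to observable peptides and emPAI as 10 raised to the PAI minus 1, which is the form in which the index is commonly defined. It also converts emPAI values into an approximate molar percentage by dividing by their sum, and compares spectral counts between two runs after normalising to the total number of spectra per run. The function names, the normalisation choice and all numbers are assumptions made for illustration; they are not taken from the studies cited.

```python
def empai(observed_peptides, observable_peptides):
    """emPAI = 10**PAI - 1, where PAI = observed / observable peptides."""
    if observable_peptides <= 0:
        raise ValueError("observable_peptides must be positive")
    pai = observed_peptides / observable_peptides
    return 10 ** pai - 1


def molar_percent(empai_by_protein):
    """Approximate protein content in mol %: emPAI / sum(emPAI) * 100."""
    total = sum(empai_by_protein.values())
    return {name: 100.0 * value / total
            for name, value in empai_by_protein.items()}


def spectral_count_ratio(count_a, total_a, count_b, total_b):
    """Ratio of spectral counts, each normalised to total spectra in its run."""
    return (count_a / total_a) / (count_b / total_b)


if __name__ == "__main__":
    # Made-up counts: (observed peptides, observable peptides) per protein.
    peptide_counts = {"protein_1": (12, 20), "protein_2": (3, 15)}
    indices = {name: empai(obs, possible)
               for name, (obs, possible) in peptide_counts.items()}
    for name, share in molar_percent(indices).items():
        print(f"{name}: emPAI={indices[name]:.2f}, about {share:.1f} mol %")
    # Same protein in two runs with different total numbers of spectra.
    print(spectral_count_ratio(30, 12000, 10, 9000))
```

The exponential transform reflects the empirical observation that the number of peptides detected grows roughly with the logarithm of protein amount, so these values are best read as estimates rather than precise concentrations.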

Concluding remarks and future perspectives
Proteome analysis, along with profiling tools such as transcriptomics and metabolomics, is becoming an essential component of the emerging "systems biology" approach. It is clear from much of the current literature (Ning et al. 2011) that proteomics as a whole, including plant proteomics, is changing in scale and focus, from the initial objective of identifying as many individual proteins as possible in a given biological sample to the development of high-throughput, parallel and quantitative technologies for analyzing proteomes in a dynamic context. Methods such as metabolic labelling using, for instance, CO2 via photosynthesis or inexpensive nitrogen salts in protein synthesis offer new ways to quantify plant proteomes and can even be exploited for labelling organisms that feed on plant material. Several proof-of-principle studies have demonstrated the linearity and/or reproducibility of label-free quantification for the analysis of complex mixtures (Wang et al. 2003). Comparative studies have also shown that the results obtained with spectral counting and with ion intensity measurements are generally in good agreement (Wienkoop et al.
2006), with spectral counting covering a slightly higher dynamic range and measurements of ion abundance being more accurate for the determination of protein ratios. With the development of modern high-precision mass spectrometers, label-free quantification is becoming an appealing alternative, as mass accuracy increases and narrower mass-to-charge windows improve the reliability of mapping peptides across samples. However, reproducibility of retention times over different LC-MS/MS runs remains crucial for precision in label-free quantification using peptide ion intensities. In addition, evaluation of proteomics data is facilitated if variation between experiments is minimized. In this context, plants are well-suited experimental organisms for achieving low statistical variability, through their clonal reproduction and their ability to grow in highly standardized and controlled environments. Not surprisingly, most quantitative plant proteomics studies performed so far have utilized Arabidopsis thaliana as the model organism. This plant has excellent features for proteomics studies: its genome is fully sequenced, genetic mutants for comparative experiments are available, and it has a relatively short life cycle and can conveniently be cultivated under laboratory conditions, making it readily amenable to metabolic stable isotope labelling. With the completion of further plant sequencing projects and the advent of high-throughput global proteome analysis via non-gel-based shotgun approaches, proteomics studies will become more and more appealing for an increasing number of plant species. Moreover, the combination of new quantitative MS methods with biochemical, biological and genetic approaches is adding new dimensions to the characterization of cellular processes, resulting in improved knowledge of (plant) biological systems. This is exemplified by the combined use of advanced protein quantification strategies and elaborate
phosphopeptide enrichment techniques (e.g. LC-FTICR-MS), which have promoted phosphoproteomics as a tool with extraordinary potential for spatio-temporal analysis of entire signalling pathways in plants.

The main current bottleneck in plant proteomic studies is still the wide dynamic range of protein abundance. Global abundance measurements in Saccharomyces cerevisiae have revealed a bell-shaped distribution of proteins spanning approximately six orders of magnitude in abundance (Ghaemmaghami et al. 2003), while only approximately three or four orders of magnitude can be covered by modern LC-MS/MS methods for complex samples. The proteins identified therefore represent only a small fraction of the complete proteome or sub-proteome of plants and organelles. For this reason, proteome fractionation and intelligent strategies for the enrichment of protein targets have to be developed. For example, the estimated total number of genes in the rice genome lies in the range of 32,000 to 50,000 for Oryza sativa ssp. japonica, whereas comprehensive analysis of rice leaf, root and seed tissue using 2-DE followed by tandem MS and MudPIT has led to the identification of only 5.1% to 7.9% of the expected number of proteins. These data clearly demonstrate that further developments are needed to increase the resolving power of these methods and allow the detection of the low-abundance proteins present in the "extractome". On the technical side, improvements in pre-electrophoretic fractionation and in mass spectrometry scan speed will likely contribute to deeper proteome coverage in the future. For example, an atmospheric pressure MALDI (AP-MALDI) source has been developed (Doroshenko et al.
2002), which is relatively simple to interface to mass analyzers. Surface-enhanced laser desorption/ionization (SELDI) has been shown to be very powerful for the selective ionization of peptides and protein fractions, although it has not been applied to plant systems so far (Poon 2007).

Another challenge in large-scale, quantitative plant proteomics experiments lies in the application of new data-mining strategies. Irrespective of the methods applied for protein identification, advanced bioinformatics and statistical tools for data evaluation are essential to extract biologically meaningful information from the plethora of qualitative and quantitative data obtained in global-scale experiments. Recently, a single, centralized, authoritative resource for protein sequences and functional information, UniProt, has been created by combining the information contained in Swiss-Prot, the translated EMBL nucleotide sequence database (TrEMBL) and the Protein Information Resource Protein Sequence Database (PIR-PSD) (Schneider et al. 2004).

To conclude, qualitative and quantitative plant proteomics, especially MS-based proteomics, will be applied to more and more non-model plant species for comprehensive and in-depth characterization of plant-environment interactions and of plant growth and differentiation, providing a more reliable basis for understanding the emerging phenomena of phenotypic plasticity and epigenetic variation.

Acknowledgment
The authors gratefully acknowledge Caterina Agrimonti, Mariolina Gulli, Elena Maestri, Marta Marmiroli and Giovanna Visioli for their support in the preparation of the manuscript, and the University of Parma for providing financial support in buying the Lab PF-2D and for access to the MALDI-TOF/MS and Orbitrap facilities at CIM (Centre for Interfaculty Measures) necessary for the plant proteomic studies carried out in our laboratory. Thanks also to Foundation, AGER and RISINNOVA for the financial support of the project "Integrated genetic and genomic systems for updating rice varieties along the Italian rice food chain".

Fig. 1. Growth index of scientific papers using the term "plant proteome" in the PubMed search engine (updated to July 2011).

Fig. 5. Main functions of a mass spectrometer

Fig. 6. MALDI-TOF MS spectrum of a Thlaspi protein (Visioli et al. 2010a). The peaks marked with • represent trypsin autolysis peaks that were used to internally calibrate the mass spectrum. The peaks marked with ■ represent peaks identified as peptides produced by the trypsin digestion of a protein of interest.

and finally MS analysis of the eluted proteins. PMF has been used for proteome analyses of model as well as crop plants (Colas et al. 2010, Glinski & Weckwerth 2006, Hajduch et al. 2005, Mooney et al. 2004, Oeljeklaus et al. 2008). For instance, MALDI-TOF mass spectrometry has been used in the characterization of the Arabidopsis thaliana proteome (Giavalisco et al. 2005). The authors report the identification of 2,943 spots from 2-DE, derived from 663 different gene products. This is a small number considering that more than 35,000 proteins are encoded by the Arabidopsis genome. A survey of the proteomes of six tissues from the model legume Medicago truncatula produced 2D-PAGE reference maps from which 551 proteins were identified (Watson et al. 2003). In this case, the overall successful identification rate was 55%, a figure that is considered good in the absence of a fully sequenced genome, although it depended on the tissue in question. For example, identification was achieved for 43% of the proteins extracted from root tissue, while the figure for leaves was 76%. The difference presumably reflects differences in the quality of the separations, in the information in the databases and in the availability of ESTs. An investigation of soybean seed filling resolved 679 2-DE protein spots at five sequential developmental stages (Hajduch et al. 2005). Analysis of each of these protein spots by MALDI-TOF yielded the identity of 422 of these proteins, representing 216 non-redundant proteins. In nuclei isolated from rice suspension culture cells, from a total of 549 proteins resolved on 2-DE, 190 proteins were identified by MALDI-TOF MS from 257 major protein spots (Khan & Komatsu 2004). In Populus nigra cultivated

2D-PAGE, two dimensional gel electrophoresis; 2D-LC, two dimensional liquid chromatography; APEX, absolute protein expression; CF, chromatofocusing; CNBr, cyanogen bromide; DIGE, difference gel electrophoresis; emPAI, exponentially modified protein abundance index; ESI, electrospray ionization; ESTs, expressed sequence tags; FTICR, Fourier transform ion cyclotron resonance; HPLC, high performance liquid chromatography; HPRP, high performance reversed phase; ICAT, isotope-coded affinity tag; ICPL, isotope-coded protein label; IEC, ion exchange liquid chromatography; IEF, isoelectric focusing; IMAC, immobilized metal affinity chromatography; iTRAQ, isobaric tag for relative and absolute quantitation; LC, liquid chromatography; MALDI, matrix assisted laser desorption ionization; MS, mass spectrometry; MudPIT, multidimensional protein identification technology; PF-2D, two dimensional protein fractionation; PMF, peptide mass fingerprinting; PTM, post-translational modifications; SELDI, surface enhanced laser desorption ionization; SILAC, stable isotope labelling by amino acids in cell culture; TOF, time of flight; TPP, three phase partitioning; TrEMBL, translated European molecular biology laboratory.