Assessment of Proteomics Strategies for Plant Cell Wall Glycosyltransferases in Wheat, a Non-Model Species: Glucurono(Arabino)Xylan as a Case Study

The proteome of a tissue is a snapshot of the entire set of proteins expressed by that tissue at a given developmental stage. This set of proteins can be considered an integrated and complex response of the tissue to a set of environmental and growth conditions. Thus, analysis of a proteome can provide valuable clues about the physiological status of a tissue (or a cell) at a given time. Proteomics, an expanding scientific field, is a promising tool for investigating such proteomes in a high-throughput manner. Although proteomics has proved helpful in elucidating difficult cellular processes, its use in studying plant cell wall polysaccharide biosynthesis in non-model plants remains challenging. In this chapter, we evaluate the ability of two proteomics strategies to specifically identify three Golgi-resident glycosyltransferases (GTs), TaGT43-4, TaGT47-13, and TaGT75-4, involved in glucurono(arabino)xylan (GAX) biosynthesis in wheat, an economically important non-model crop plant (Zeng et al., 2008, 2010). GAX is the second most abundant polymer in the biomass of grass plants, and there is an urgent need to elucidate its biosynthetic pathways so that plant biomass can be engineered for biofuel production and other human needs (Faik, 2010).


Introduction

Proteomics in plant cell wall polysaccharides biosynthesis
In the model plant Arabidopsis (Arabidopsis thaliana), about 10% of the genome (~2,500 genes) is dedicated to cell wall metabolism and function (Carpita 2011). Among these genes, 17% are putative GTs identified through homology searches, and only a dozen have characterized biochemical functions. The sequences of these enzymes are available through the Carbohydrate-Active enZYmes database (CAZy), which classifies them into GT families on the basis of protein sequence similarity (Coutinho 2003). It is anticipated that more GTs (yet to be identified) are needed to synthesize all of the polysaccharides currently found in plant cell walls. Most GTs are predicted to be integral membrane proteins; however, recent work indicates that some GTs have a cleavable signal peptide and associate closely with integral membrane proteins in a complex that secures their proper subcellular localization (Zeng et al., 2010). The GAX polymer, like all hemicelluloses with the exception of callose, is synthesized in the Golgi by a multi-protein complex called the xylan synthase complex (XSC). The Golgi apparatus is a multifunctional organelle used by plant cells not only to synthesize cell wall polymers, but also to process and modify glycoproteins and to sort them to their destinations. Thus, the proteins that populate Golgi membranes are very complex in nature and content (Simon et al., 2008). The challenges in studying Golgi-resident GTs involved in plant cell wall biosynthesis come from the fact that they are difficult to purify in an active state and are usually considered low-abundance proteins. This may explain why there are very few reports on the purification of these GTs (Perrin et al., 1999; Faik et al., 2002; Zeng et al., 2010). Instead, researchers have put more effort into developing strategies to isolate membrane preparations that are relatively enriched in endomembranes or organelles (plasma membrane, endoplasmic reticulum, Golgi) along with enriched GT activities. However, these GT-enriched preparations are still contaminated with other, unrelated proteins. Hence, high-throughput MS methods optimized for identifying proteins in such partially purified GT preparations are currently lacking.

Proteomics in non-model plants
Economically important non-model crop plants such as wheat currently lack publicly available genomic and protein sequence information. Applying proteomics methods to these species is therefore challenging and requires cross-species identification. Fortunately, genomes and their protein-coding gene sequences are now available for several plant species, including Arabidopsis thaliana, Oryza sativa, Populus trichocarpa, Sorghum bicolor, Brachypodium distachyon, Vitis vinifera, Zea mays, Medicago, and Glycine max (Figure 1). These nucleotide sequences, as well as the predicted amino acid sequences, are available through the Phytozome website (http://www.phytozome.net/) or NCBI (http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.html), both of which hold considerably more sequences than the UniProt database, which contains only 531,192 protein entries from different plant species.
The key factor to consider in cross-species identification is the annotation of genes in databanks. For example, the UniProt database has only ~29,500 protein entries that are reviewed (annotated). In the case of Arabidopsis, whose genome sequence was completed and published more than a decade ago (Arabidopsis Genome Initiative, 2000), ~10% of proteins are still not annotated and ~30% are listed with unknown biochemical function. Nevertheless, given this wealth of genomic resources, a protein from a non-model plant can now be identified through BLAST searches with peptide sequences obtained from MS/MS spectra by de novo sequencing. Even if a homologous protein is not listed in protein databases, similarity at the nucleotide sequence level (~35%) can be sufficient for reliable identification of a homolog of the peptide sequence by comparison with translated plant gene sequences (tBLASTn).
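The idea behind tBLASTn, searching a protein or peptide query against nucleotide sequences translated in all six reading frames, can be sketched in a few lines. This is a simplified illustration, not BLAST: it reports only exact matches, whereas tBLASTn performs gapped local alignment with a substitution matrix. Because leucine and isoleucine have identical masses, MS/MS-derived sequences cannot distinguish them, so both are treated as L. The function names and the gene fragment in the example are hypothetical; the peptide is one of the RGP peptides reported later in this chapter.

```python
# Six-frame translation with exact peptide lookup (a toy version of tBLASTn).
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_hits(peptide: str, nucleotide: str) -> list[tuple[str, int, int]]:
    """Return (strand, frame offset, residue position) for every exact hit.

    I and L are isobaric, so both are mapped to L before comparing.
    """
    query = peptide.upper().replace("I", "L")
    seq = nucleotide.upper()
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            protein = translate(strand_seq[offset:]).replace("I", "L")
            start = protein.find(query)
            while start != -1:
                hits.append((strand, offset, start))
                start = protein.find(query, start + 1)
    return hits


# Hypothetical gene fragment encoding YVDAVLTIPK in the second forward frame.
gene = "G" + "TATGTTGATGCTGTTCTTACTATTCCTAAA"
print(six_frame_hits("YVDAVITLPK", gene))  # I/L swapped in the query on purpose
```

The I/L normalization matters in practice: a de novo sequence read as "ITL" should still match a gene encoding "LTI", because the two residues produce identical fragment masses.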

Advantages and disadvantages of proteomics strategies
Proteomics methods for high-throughput identification of proteins involve three main steps. The first step is proteolytic digestion of the protein sample (mostly with trypsin). This step needs optimization, as incomplete digestion may result in a loss of protein information. The second step consists of three processes: (i) fractionation of the resulting peptides by liquid chromatography (LC); (ii) ionization of the separated peptides through a source such as Matrix-Assisted Laser Desorption/Ionization (MALDI) or ElectroSpray Ionization (ESI); and (iii) analysis of the ionized peptides (received from the source) by tandem mass spectrometry (MS/MS), in which selected peptide ions are fragmented, typically by collision-induced dissociation (CID). This step generally delivers large, information-rich files of MS/MS spectra (Bodnar et al., 2003). The third and critical step is the use of these MS/MS spectra to identify the exact proteins, or at least their closest homologs, in the databases. Depending on the complexity of the sample, separating the proteins prior to digestion can increase the chances of identifying the target proteins, even in complex samples. Gel-based methods such as SDS-PAGE and two-dimensional (2-D) PAGE are widely used to separate the proteins of a sample and thus reduce its complexity. Although tremendous progress has been made in LC and MS techniques in recent years, the application of these methods is still challenging for many complex biological processes such as plant cell wall biosynthesis (see section 1.1). There are two general proteomics strategies: gel-free and gel-based.

Fig. 1. Predicted number of protein-coding genes and sizes of plant genomes currently available in public databases. Red lines indicate genomes whose sequencing has been completed and published; purple lines indicate unpublished but publicly available genomes; black lines indicate plants with incomplete or not fully assembled genomes (less available). Species names are indicated along with the sizes of their genomes. These sequences are available through the Phytozome and/or NCBI websites. Adapted from http://synteny.cnr.berkeley.edu/wiki/index.php/Sequenced_plant_genomes
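The sequence information delivered by CID can be illustrated by computing the singly charged b- and y-ion ladders expected for a peptide: b ions carry the N-terminal residues plus a proton, y ions carry the C-terminal residues plus water and a proton, and consecutive ions in either ladder differ by exactly one residue mass. This is a simplified sketch (unmodified residues, no neutral losses or higher charge states); the function name is hypothetical, and the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # carried by C-terminal (y) fragments
PROTON = 1.007276   # charge carrier of a singly protonated ion


def fragment_ladders(peptide: str) -> tuple[list[float], list[float]]:
    """Return singly charged m/z values b1..b(n-1) and y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide.upper()]
    b_ions, y_ions = [], []
    b_sum = y_sum = 0.0
    for i in range(len(masses) - 1):
        b_sum += masses[i]          # grows from the N-terminus
        y_sum += masses[-1 - i]     # grows from the C-terminus
        b_ions.append(b_sum + PROTON)
        y_ions.append(y_sum + WATER + PROTON)
    return b_ions, y_ions


b, y = fragment_ladders("YVDAVLTIPK")
# Differences between consecutive ions spell out the sequence ("sequence tags").
print([round(x, 3) for x in b])
print([round(x, 3) for x in y])
```

Reading mass differences along such a ladder is what both database search engines (when scoring fragment matches) and de novo tools (when reconstructing sequence) rely on.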

Gel-free proteomics strategy
Direct analysis of the protein composition of a biological sample (without prior separation by electrophoresis) can be achieved by a gel-free shotgun approach called MudPIT (multidimensional protein identification technology) (Washburn et al., 2001; McDonald et al., 2002). The method uses two-dimensional liquid chromatography (LC/LC) to separate the peptides (as opposed to 2-D PAGE separation of the proteins) before MS/MS analysis. In the first dimension, a strong cation exchange (SCX) column separates the peptides on the basis of their charge. In the second dimension, the peptides eluted from the SCX column are further fractionated according to their hydrophobicity by reverse-phase chromatography. Depending on the mass spectrometer used, the MudPIT strategy may have some limitations in identifying low-abundance proteins. For example, the amount of protein sample that can be loaded onto the SCX column is limited and the column saturates easily, which can significantly impair the identification of low-abundance proteins (e.g., GTs) in complex samples. Also, complex samples require optimization of the running time (retention time) for optimal protein identification (Chen et al., 2005). Lastly, although MudPIT has been used successfully to identify more than 1,000 proteins from a complex biological sample (Washburn et al., 2001), the dynamic range between the most and least abundant proteins/peptides (10,000 to 1) can still limit the identification of very low-abundance proteins (Wolters et al., 2001). Another key factor to consider when using gel-free proteomics methods is the precipitation step, which is used not only to concentrate the proteins, but also to remove small soluble compounds (including detergents) that are easily ionized and would mask low-abundance peptide ions (Newton et al., 2004). Optimizing this step is therefore critical, because it may also result in the loss of some proteins from the sample. Ethanol, chloroform/methanol, cold acetone, trichloroacetic acid (TCA), and TCA/acetone are all MS-compatible reagents that can be used for protein precipitation (Ferro et al., 2003). However, because each protein has specific physicochemical characteristics, every precipitation method has its own advantages and disadvantages. For example, although ethanol and TCA precipitate around 90% of proteins, several proteins can be eliminated, reduced, or enriched in the precipitated samples (Zellner et al., 2005; Chen et al., 2010). Furthermore, ethanol is known to precipitate salts along with the proteins, which may require dialysis of the samples before further analysis. In our previous work, cold acetone was found to precipitate sucrose along with the proteins when the sucrose content of a fraction exceeded 25% (w/w) (Zeng et al., 2010). The increase in sucrose content in the sample decreased peptide ionization and produced low-quality MS/MS spectra.
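The two-dimensional peptide separation used by MudPIT can be pictured with a toy model: peptides are first grouped by the positive charge they carry at the acidic pH typically used for SCX loading (the free N-terminus plus each Lys, Arg, and His side chain), and each SCX "salt step" is then ordered from hydrophilic to hydrophobic, mimicking reverse-phase elution. This is only a sketch under stated assumptions: the Kyte-Doolittle hydropathy average (GRAVY) is a crude proxy for reverse-phase retention, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (crude proxy for reverse-phase retention).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def scx_charge(peptide: str) -> int:
    """Approximate positive charge at low pH: N-terminus + K, R and H."""
    return 1 + sum(peptide.count(aa) for aa in "KRH")


def gravy(peptide: str) -> float:
    """Grand average of hydropathy (mean Kyte-Doolittle value)."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def mudpit_order(peptides: list[str]) -> dict[int, list[str]]:
    """First dimension: group by SCX charge (low to high salt step).
    Second dimension: order each group from hydrophilic to hydrophobic."""
    steps: dict[int, list[str]] = {}
    for pep in peptides:
        steps.setdefault(scx_charge(pep), []).append(pep)
    return {z: sorted(group, key=gravy) for z, group in sorted(steps.items())}


for charge, group in mudpit_order(["YVDAVLTIPK", "TGLPYLWHSK", "VPEGFDYELYNR"]).items():
    print(charge, group)
```

The model also makes the loading limitation intuitive: every peptide in the digest competes for the same SCX binding capacity, so abundant proteins contribute most of the peptides in each step.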

Gel-based proteomics strategy
The most direct and easiest way to separate proteins in a complex sample is 1-D SDS-PAGE (Laemmli, 1970). The main advantage of 1-D SDS-PAGE over 2-D PAGE is that virtually all proteins can be solubilized and separated on the gel (up to 10-15 µg per sample). Loading larger amounts of protein on a 1-D SDS-PAGE gel can be detrimental, as abundant proteins can spread over a large area of the gel and may mask low-abundance proteins (Zhu et al., 2010). This can be problematic when working with proteins from photosynthetic organs rich in ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO), an abundant protein that is difficult to eliminate (Komatsu et al., 1999). Protein solubility is known to be the Achilles' heel of 2-D PAGE, as it results in the loss of low-abundance proteins, proteins with extreme pI, and hydrophobic proteins (integral membrane proteins) (O'Farrell, 1975; Santoni et al., 2000; Ephritikhine et al., 2004). In addition, Golgi proteins are known to aggregate easily during sample preparation/precipitation and do not run well on 2-D PAGE gels (Asakura et al., 2006). Once 1-D SDS-PAGE separation is completed, the gels are usually cut into small, equal slices, or individual visible bands are excised. These slices/bands are digested with trypsin, and the released peptides are further fractionated by LC and sequenced by MS/MS (see next section).

Protein identification methods (MS/MS spectra processing)
Despite the availability of a large repository of protein-coding gene sequences from many plant species that can be used as databases for proteomic studies (Figure 1), the most daunting task in proteomics is still the sensitive and accurate identification of proteins from these databases using MS/MS spectra (Nesvizhskii et al., 2007; Patterson 2003; Service 2008). MS/MS spectra can be processed for protein identification via two main strategies:
i. Database-dependent strategy, in which the experimental masses of peptide ions (the parent ion and its fragment ions) from MS/MS spectra are compared with the theoretical peptide masses derived from in silico digestion of the proteins in a database. This search yields peptide hits, which are scored and ranked from best to worst match; proteins matched by two or more hits are considered candidates. In the case of non-model plants, protein identification can be challenging, since either the exact protein is absent from the database or the database does not even contain an evolutionarily related protein. This underscores the need for continued efforts to sequence the genomes of as many plants as possible so that proteomics projects in non-model plants can succeed. Many bioinformatics algorithms, such as SEQUEST, Mascot, X!Tandem, OMSSA, and Phenyx, have been developed for database-dependent searches using MS/MS spectra (for details about these algorithms, readers are referred to http://www.proteomesoftware.com/Proteome_software_link_software.html).
ii. Database-independent strategy, which converts the MS/MS fragmentation spectra into candidate de novo amino acid sequences. Several bioinformatics algorithms, such as PepNovo, NovoHMM, and PEAKS, were designed for the heavy computation needed to extract amino acid sequence information and make MS/MS spectra interpretable. These de novo peptide sequences are then used in BLAST searches of non-redundant protein databases (e.g., NCBI, Swiss-Prot). For non-model plants, this combination (de novo sequencing plus BLAST searching) has been shown to improve both the accuracy and the rate of protein identification (Shevchenko et al., 2001).
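The first stage of the database-dependent strategy, in silico tryptic digestion followed by matching the observed precursor mass against theoretical peptide masses, can be sketched as follows. Assumptions: trypsin cleaves after K or R except before P, up to two missed cleavages are allowed (as in the Mascot searches described later in this chapter), cysteine carries a fixed carbamidomethyl group, and only precursor masses are compared. Real search engines also score the fragment ions. All function names and the toy database are hypothetical.

```python
# Monoisotopic residue masses (Da); C includes carbamidomethylation (+57.02146).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 160.03065, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_peptides(protein: str, missed_cleavages: int = 2) -> list[str]:
    """Cut after K/R unless followed by P; join up to `missed_cleavages` pieces."""
    sites = [i + 1 for i, aa in enumerate(protein[:-1])
             if aa in "KR" and protein[i + 1] != "P"]
    bounds = [0] + sites + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptides.append(protein[bounds[i]:bounds[j]])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_precursor(mz: float, charge: int, database: dict[str, str],
                    tolerance_da: float) -> list[tuple[str, str]]:
    """(protein id, peptide) pairs within `tolerance_da` of the observed mass."""
    observed = (mz - PROTON) * charge  # neutral mass from the measured m/z
    return [(pid, pep)
            for pid, seq in database.items()
            for pep in tryptic_peptides(seq)
            if abs(peptide_mass(pep) - observed) <= tolerance_da]


toy_db = {"toy": "MAKPRGKR"}          # hypothetical database entry
print(tryptic_peptides("MAKPRGKR", missed_cleavages=0))
print(match_precursor(204.1343, 1, toy_db, tolerance_da=0.1))
```

The tolerance argument is where instrument resolution enters: a ±2.0 Da ion-trap window admits many more candidate peptides than a ±0.1 Da Orbitrap window, which lowers the discriminating power of each match.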

Application of proteomics strategies to the identification of GAX synthesizing GTs
In our previous work, we demonstrated that GAX synthase is a multi-enzyme complex (XSC) formed of at least two known GTs (TaGT43-4 and TaGT47-13) and a mutase (TaGT75-4, also called reversibly glycosylated polypeptide, RGP) (Zeng et al., 2010). According to native gel electrophoresis data, this XSC has an apparent MW of ~250 kDa. In this chapter, two proteomics strategies (MudPIT and Gel-LC-MS/MS) were evaluated for efficient identification of these three proteins in membrane fractions enriched in wheat GAX synthase activity (partially purified activity). The goal was to develop a proteomics workflow for optimal identification of candidate proteins involved in plant cell wall biosynthesis in organisms with no available genome sequence information.

Experimental procedures
3.1.1 Preparation of membrane fractions enriched in GAX synthase complex
GAX synthase activity is routinely monitored as the amount of [14C]glucuronic acid (GlcA) transferred from UDP-[14C]GlcA into ethanol-insoluble GAX polymer in the presence or absence of UDP-xylose (UDP-Xyl) (Zeng et al., 2008, 2010). It has been shown that the rice Golgi complex has distinct compartments that can be separated by density gradient centrifugation in the presence of EDTA or MgCl2 (Mikami et al., 2001). We therefore fractionated Golgi-enriched membranes prepared from etiolated wheat seedlings on a continuous 25%-40% (w/v) sucrose density gradient supplemented with 1 mM EDTA and used our enzyme assay to monitor the distribution of GAX synthase activity (Figure 2). Based on the specific activity and protein content of fraction #3 (as measured by spectrophotometry), we achieved ~11-fold purification with a 59% yield (Table 1). However, SDS-PAGE analysis of this fraction showed only a limited number of visible protein bands (Figure 3), suggesting that an even higher degree of purification was achieved. Together, these results indicate that fraction #3 is a good starting material for the proteomics analyses in our study.
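The fold purification and yield reported for fraction #3 follow the usual purification-table arithmetic: fold purification is the ratio of specific activities (activity per mg protein) of the fraction and the starting material, and yield is the fraction of total starting activity recovered. The sketch below shows that arithmetic; the class name and the numbers in the example are hypothetical and are not the values of Table 1.

```python
from dataclasses import dataclass


@dataclass
class Fraction:
    name: str
    total_activity: float  # pmol GlcA incorporated per hour (pmol/h)
    total_protein: float   # mg

    @property
    def specific_activity(self) -> float:
        """pmol/h/mg, the unit used in Table 1."""
        return self.total_activity / self.total_protein


def purification_summary(start: Fraction, step: Fraction) -> tuple[float, float]:
    """Return (fold purification, % yield) of `step` relative to `start`."""
    fold = step.specific_activity / start.specific_activity
    yield_pct = 100.0 * step.total_activity / start.total_activity
    return fold, yield_pct


# Hypothetical numbers, for illustration only.
start = Fraction("Golgi-enriched membranes", total_activity=1200.0, total_protein=30.0)
f3 = Fraction("fraction #3", total_activity=600.0, total_protein=1.5)
fold, yld = purification_summary(start, f3)
print(f"{fold:.1f}-fold purification, {yld:.0f}% yield")
```

Because total protein is estimated with the Bradford reagent, any bias in that assay propagates directly into the fold-purification figure, which is one reason the SDS-PAGE pattern is a useful independent check.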

Nanospray tandem mass spectrometry procedures
All proteomics analyses were carried out at the Mass Spectrometry and Proteomics Facility (http://www.ccic.ohio-state.edu/MS/) of the Campus Chemical Instrument Center (CCIC) of The Ohio State University (Columbus, OH). Liquid chromatography-nanospray tandem mass spectrometry (nano-LC/MS/MS) for global protein identification was performed on an LTQ XL or an LTQ Orbitrap XL (Thermo Scientific) mass spectrometer using the two protocols described below.

Table 1. Partial purification of wheat GAX synthase activity from Golgi-enriched microsomal membranes. The activity was measured as [14C]GlcA incorporation from UDP-[14C]GlcA (900 cpm/pmol) into ethanol-insoluble materials and expressed as pmol GlcA incorporated per hour per milligram of protein (pmol/h/mg). Protein content was estimated using the Bradford reagent.

Nano-LC/MS/MS on LTQ XL
Nano-liquid chromatography tandem mass spectrometry (nano-LC/MS/MS) was performed on a Thermo Scientific LTQ XL mass spectrometer (linear quadrupole ion trap, MSn) equipped with a nanospray source operated in positive ion mode. The LC system was an UltiMate™ 3000 system from Dionex (Sunnyvale, CA). Solvent A was water containing 50 mM acetic acid and solvent B was acetonitrile. Five microliters of each sample were first injected onto the µ-Precolumn Cartridge (Dionex, Sunnyvale, CA) and washed with 50 mM acetic acid. The injector port was then switched to inject, and the peptides were eluted off the trap onto the column. A 5 cm × 75 µm ID ProteoPep II C18 column (New Objective, Inc., Woburn, MA), packed directly in the nanospray tip, was used for chromatographic separations. Peptides were eluted directly off the column into the LTQ system using a gradient of 2-80% B over 45 min at a flow rate of 300 nL/min. The total run time was 65 min. MS/MS spectra were acquired according to standard conditions established in the lab. Briefly, the nanospray source was operated with a spray voltage of 3 kV and a capillary temperature of 200°C. The scan sequence of the mass spectrometer was based on the TopTen™ method: the analysis was programmed for a full scan recorded between m/z 350 and 2000, followed by MS/MS scans of the ten most abundant peaks in the spectrum in consecutive instrument scans, generating product ion spectra from which amino acid sequences are determined.
The CID fragmentation energy was set to 35%. Dynamic exclusion was enabled with a repeat count of 2 within 10 seconds, a mass list size of 200, an exclusion duration of 350 seconds, a low mass width of 0.5 Da, and a high mass width of 1.5 Da.
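The interplay between TopTen precursor selection and dynamic exclusion can be illustrated with a small simulation using the settings quoted above (repeat count 2 within 10 s, exclusion for 350 s, window from 0.5 Da below to 1.5 Da above the excluded m/z). This is a simplified sketch, not the instrument software's implementation: the class and function names are hypothetical, repeat counting is keyed on exact m/z values, and the 200-entry mass list limit is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicExclusion:
    repeat_count: int = 2
    repeat_duration: float = 10.0      # s
    exclusion_duration: float = 350.0  # s
    low_width: float = 0.5             # Da below the excluded m/z
    high_width: float = 1.5            # Da above the excluded m/z
    _history: dict = field(default_factory=dict, repr=False)   # m/z -> selection times
    _excluded: list = field(default_factory=list, repr=False)  # (m/z, excluded until)

    def is_excluded(self, mz: float, t: float) -> bool:
        return any(t < until and ex - self.low_width <= mz <= ex + self.high_width
                   for ex, until in self._excluded)

    def record(self, mz: float, t: float) -> None:
        """Log a selection; exclude the m/z once it hits the repeat count."""
        recent = [x for x in self._history.get(mz, []) if t - x <= self.repeat_duration]
        recent.append(t)
        self._history[mz] = recent
        if len(recent) >= self.repeat_count:
            self._excluded.append((mz, t + self.exclusion_duration))


def top_n(peaks: list[tuple[float, float]], t: float,
          exclusion: DynamicExclusion, n: int = 10) -> list[float]:
    """Pick up to n most intense, non-excluded precursors from one full scan."""
    chosen = []
    for mz, _intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
        if len(chosen) == n:
            break
        if exclusion.is_excluded(mz, t):
            continue
        chosen.append(mz)
        exclusion.record(mz, t)
    return chosen


excl = DynamicExclusion()
scan = [(500.0, 1e6), (600.0, 5e5)]  # hypothetical (m/z, intensity) peaks
for t in (0.0, 1.0, 2.0, 400.0):
    print(t, top_n(scan, t, excl, n=1))
```

With n=1, the dominant ion at m/z 500 is fragmented twice, then excluded, which frees the next MS/MS slot for the weaker ion at m/z 600. This is exactly the mechanism that gives low-abundance peptides a chance to be sequenced.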

Capillary-LC/MS/MS on LTQ Orbitrap XL
Capillary-liquid chromatography-nanospray tandem mass spectrometry (capillary-LC/MS/MS) for global protein identification was performed on a Thermo Scientific LTQ Orbitrap XL mass spectrometer equipped with a microspray source (Michrom Bioresources Inc., Auburn, CA) operated in positive ion mode. Samples were separated on a capillary column (0.2 × 150 mm, Magic C18AQ, 3 µm, 200 Å; Michrom Bioresources Inc., Auburn, CA) using an UltiMate™ 3000 HPLC system from LC Packings, a Dionex company (Sunnyvale, CA).
Each sample was injected onto the µ-Precolumn Cartridge (Dionex, Sunnyvale, CA) and desalted with 50 mM acetic acid for 10 min. The injector port was then switched to inject, and the peptides were eluted from the trap onto the column. Mobile phase A was 0.1% formic acid in water, and mobile phase B was 0.1% formic acid in acetonitrile. The flow rate was set at 2 µL/min. Typically, mobile phase B was increased from 2% to 50% over 90-250 min, depending on the complexity of the sample, to separate the peptides. Mobile phase B was then increased from 50% to 90% in 5 min, held at 90% for another 5 min, and brought back quickly to 2% in 1 min. The column was equilibrated at 2% mobile phase B (98% A) for 30 min before the next sample injection. MS/MS data were acquired with a spray voltage of 2 kV and a capillary temperature of 175°C. The scan sequence of the mass spectrometer was based on the data-dependent TopTen™ method: the analysis was programmed for a full scan recorded between m/z 300 and 2000, followed by MS/MS scans of the ten most abundant peaks in the spectrum in consecutive scans, generating product ion spectra from which amino acid sequences are determined. The resolution of the full scan was set at 30,000 to achieve high mass accuracy. The CID fragmentation energy was set to 35%. Dynamic exclusion was enabled with a repeat duration of 30 seconds, an exclusion duration of 350 seconds, a low mass width of 0.5 Da, and a high mass width of 1.5 Da. Repeated MS/MS acquisition of the same peptide was excluded after it had been detected three times.
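The mobile-phase program above is a piecewise-linear gradient, which can be written as a small function returning %B at any time. Assumption: time zero is taken as the moment the injector is switched and elution from the trap begins (after the 10-min desalting step); the function name is hypothetical.

```python
def percent_b(t: float, gradient_min: float = 90.0) -> float:
    """Mobile phase B (%) at time t (min) after the trap is switched in line.

    2 -> 50% B over `gradient_min` (90-250 min depending on sample complexity),
    50 -> 90% in 5 min, hold 5 min at 90%, back to 2% in 1 min,
    then 30 min re-equilibration at 2% B.
    """
    g = gradient_min
    points = [(0.0, 2.0), (g, 50.0), (g + 5, 90.0), (g + 10, 90.0),
              (g + 11, 2.0), (g + 41, 2.0)]
    if t <= 0:
        return points[0][1]
    for (t0, b0), (t1, b1) in zip(points, points[1:]):
        if t <= t1:  # linear interpolation within this segment
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return points[-1][1]


for gradient in (90.0, 250.0):
    cycle = gradient + 41  # gradient + wash + return + equilibration (min)
    print(f"{gradient:.0f}-min gradient: {cycle:.0f}-min cycle, "
          f"{percent_b(gradient / 2, gradient):.0f}% B at mid-gradient")
```

Writing the program this way makes the cost of longer gradients explicit: stretching the separating segment from 90 to 250 min improves peptide resolution for complex samples, while the fixed 41-min tail for washing and re-equilibration stays the same.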

MS/MS data processing
Sequence information from the MS/MS spectra was processed by converting the raw data files into a merged file (.mgf) using an in-house program, RAW2MZXML_n_MGF_batch (merge.pl, a Perl script). The resulting .mgf files were searched using Mascot Daemon (Matrix Science, version 2.2.1; Boston, MA) against the full SwissProt database version 54.1 (283,454 sequences; 104,030,551 residues) or the NCBI database. The precursor ion mass tolerance was set to 2.0 Da, because the data were acquired on an ion trap (LTQ) mass analyzer, and the fragment mass tolerance was set to 0.5 Da (for analysis on the high-resolution Orbitrap, the precursor ion mass tolerance was set to 0.1 Da). Methionine oxidation and carbamidomethyl cysteine were considered as variable modifications. Up to two missed cleavages were permitted. Peptides with a score below 40 were filtered out. Protein identifications were checked manually, and proteins with a Mascot score of 50 or higher, supported by at least two unique peptides each having a b- or y-ion sequence tag of five residues or more, were accepted. Mascot also provides a histogram of the MOWSE score distribution for hits and a significance threshold (represented by a green region). The score is defined as -10 × log10(P), where P is the absolute probability that the observed match is a random event. Therefore, significant matches give scores higher than the significance threshold (outside the green region). Note that significance threshold values and hit scores are strongly affected by increasing the mass tolerance in an MS/MS fingerprint search.
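The score definition and the acceptance rules above can be expressed directly. Under S = -10 × log10(P), a peptide score of 40 corresponds to P = 10^-4 and a protein score of 50 to P = 10^-5. The sketch below converts between scores and probabilities and applies the automated part of the acceptance criteria (protein score of at least 50, at least two unique peptides scoring 40 or more); the manual check for a five-residue b- or y-ion tag is not modelled. The function and class names are hypothetical and are not part of Mascot.

```python
import math
from dataclasses import dataclass


def mascot_score(p: float) -> float:
    """Score = -10 * log10(P), P = probability that the match is random."""
    return -10.0 * math.log10(p)


def random_match_probability(score: float) -> float:
    """Inverse of mascot_score."""
    return 10.0 ** (-score / 10.0)


@dataclass
class PeptideHit:
    sequence: str
    ion_score: float


def accept_protein(protein_score: float, hits: list[PeptideHit],
                   min_protein_score: float = 50.0,
                   min_peptide_score: float = 40.0,
                   min_unique: int = 2) -> bool:
    """Automated part of the acceptance rules described in the text."""
    unique = {h.sequence for h in hits if h.ion_score >= min_peptide_score}
    return protein_score >= min_protein_score and len(unique) >= min_unique


print(mascot_score(1e-4), random_match_probability(50.0))
hits = [PeptideHit("YVDAVLTIPK", 55.0), PeptideHit("TGLPYLWHSK", 45.0)]
print(accept_protein(62.0, hits))  # hypothetical scores
```

The logarithmic scale explains why small score differences matter: each additional 10 points makes a random match ten times less likely.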

Gel-LC-MS/MS analysis
1-D SDS-PAGE was carried out according to Laemmli (1970). Separating gels were 10% or 12% acrylamide (run at 150 V until the loading dye reached the bottom), and molecular weight markers (Precision Plus Protein™ Kaleidoscope, Bio-Rad) were used. Proteins were visualized by Coomassie blue or silver staining. Briefly, the protein sample (50-60 µg) was solubilized in 20 µL Invitrosol (Invitrogen) plus 1 µL of 10% amidosulfobetaine-14 (ASB-14) detergent and 2 µL of 10X SDS-PAGE loading dye, and heated at 90°C for 7 min before fractionation on a 1-D SDS-PAGE acrylamide gel. The gel was rinsed with water for 10 min to remove excess SDS before staining with Coomassie blue for 1 h and destaining overnight. Figure 3 shows a typical silver-stained 1-D SDS-PAGE gel of fraction #3.
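Apparent molecular weights of bands on such gels, such as the ~250 kDa estimate for the XSC or the 30-180 kDa window used for slicing, are usually read from a standard curve relating log10(MW) of the marker proteins linearly to their relative mobility (Rf). A minimal least-squares sketch is shown below. The log-linear relation holds only approximately and within the resolving range of the gel, and the marker Rf values in the example are hypothetical, not measured positions of the Precision Plus markers.

```python
import math


def fit_standard_curve(markers: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares fit of log10(MW) = slope * Rf + intercept.

    `markers` holds (relative mobility Rf, MW in kDa) pairs read off the gel.
    """
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    n = len(markers)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(rf: float, slope: float, intercept: float) -> float:
    """Apparent MW (kDa) of a band from its relative mobility."""
    return 10 ** (slope * rf + intercept)


# Hypothetical marker positions (Rf, kDa), for illustration only.
markers = [(0.25, 100.0), (0.5, 10 ** 1.5), (0.75, 10.0)]
slope, intercept = fit_standard_curve(markers)
print(f"band at Rf 0.40 ~ {estimate_mw(0.40, slope, intercept):.0f} kDa")
```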

In gel digestion
Gel pieces were excised in two ways (see Figure 3): (i) only the most visible protein bands were manually excised and individually digested with trypsin, or (ii) the gel region from 30 to 180 kDa was cut into 20 to 40 equal small slices, each of which was individually digested with trypsin. The resulting peptides were then analyzed by LC-MS/MS as described above.
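Cutting the 30-180 kDa region into equal-width slices does not give equal MW ranges per slice. Assuming log10(MW) varies linearly with migration distance across this region, equal distances correspond to equal steps in log10(MW), so each slice spans the same MW ratio, (180/30)^(1/n). The sketch below computes the approximate MW boundaries of each slice under that assumption; the function name is hypothetical.

```python
import math


def slice_boundaries(high_kda: float = 180.0, low_kda: float = 30.0,
                     n_slices: int = 20) -> list[tuple[float, float]]:
    """Approximate (upper, lower) MW in kDa covered by each equal-width slice.

    Assumes log10(MW) is linear in migration distance between the two limits.
    """
    log_hi, log_lo = math.log10(high_kda), math.log10(low_kda)
    step = (log_hi - log_lo) / n_slices
    return [(10 ** (log_hi - i * step), 10 ** (log_hi - (i + 1) * step))
            for i in range(n_slices)]


for n in (20, 40):
    first_hi, first_lo = slice_boundaries(n_slices=n)[0]
    print(f"{n} slices: each spans ~{100 * (first_hi / first_lo - 1):.1f}% in MW, "
          f"first slice {first_hi:.0f}-{first_lo:.0f} kDa")
```

Under this assumption, each slice covers roughly a 9% MW range with 20 slices and about 4.6% with 40 slices, so finer slicing separates co-migrating proteins better at the cost of twice as many LC-MS/MS runs.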
For manual excision of visible bands, gel pieces were digested with sequencing-grade trypsin from Promega (Madison, WI) using MultiScreen Solvinert filter plates from Millipore (Bedford, MA). Briefly, bands were trimmed as closely as possible to minimize background polyacrylamide. Gel pieces were then washed twice in nanopure water for 5 min each before being destained twice with 1:1 (v/v) methanol:50 mM ammonium bicarbonate for 10 min. The gel pieces were dehydrated with 1:1 (v/v) acetonitrile:50 mM ammonium bicarbonate and then rehydrated by incubation in dithiothreitol (DTT) solution (25 mM in 100 mM ammonium bicarbonate) for 30 min, followed by incubation in iodoacetamide solution (55 mM in 100 mM ammonium bicarbonate) for 30 min in the dark. The gel bands were washed again with two cycles of water and dehydrated with 1:1 (v/v) acetonitrile:50 mM ammonium bicarbonate. Protease digestion was carried out by rehydrating the gel pieces in 12 ng/µL trypsin in 0.01% ProteaseMAX surfactant for 5 min. The gel pieces were then overlaid with 40 µL of 0.01% ProteaseMAX surfactant:50 mM ammonium bicarbonate and gently mixed on a shaker for 1 h. The digestion was stopped by adding 0.5% TFA. MS analysis was performed immediately to ensure high-quality tryptic peptides with minimal non-specific cleavage; otherwise, samples were frozen at -80°C until analysis.
For gel excision into 20-40 equal slices, robotic digestion was carried out using the Ettan Spot Handling Workstation (Amersham Biosciences). The slices were placed in a 96-well plate, robotically washed, and digested according to the Ettan Spot Handling Workstation 2.1 User Manual. Briefly, gel slices were washed in 100 µL of 50% methanol/5% acetic acid for 30 min. This washing step was repeated three times, and the slices were kept in a storage solution of 50 µL of 50% methanol/5% acetic acid until digestion. Gel slices were digested with sequencing-grade trypsin from Promega (Madison, WI). The digestion procedure started with the addition of 100 µL water for 10 min, followed by 100 µL acetonitrile for 10 min. The water and acetonitrile were removed, and the gel slices were rehydrated and incubated with DTT (32 mM in 100 mM ammonium bicarbonate) for 30 min prior to incubation in iodoacetamide solution (80 mM in 100 mM ammonium bicarbonate) in the dark for 30 min. The gel slices were washed again with cycles of acetonitrile and 100 mM ammonium bicarbonate in 10 and 5 min increments. The slices were dried for 10 min and then incubated in 25 µL of 50 mM ammonium bicarbonate containing trypsin (5 µg/mL) for 180 min. The peptides were extracted from the polyacrylamide with 50 µL of 50% acetonitrile and 5% formic acid (three times), and the extracts were pooled. The pooled extracts were mixed with 50 µL of acetonitrile and incubated for 15 min before drying for 15 min. MS analysis was performed immediately to ensure high-quality tryptic peptides with minimal non-specific cleavage; otherwise, samples were frozen at -80°C until analysis.

MudPIT analysis
The first MudPIT analysis of fraction #3 was carried out on a freeze-dried sample that was resuspended in 5X Invitrosol protein solubilizer (Invitrogen) and diluted to 1X Invitrosol by adding 25 mM ammonium bicarbonate solution. Ten microliters (10 µL) of DTT (5 mg/mL in 100 mM ammonium bicarbonate) were added to the sample and incubated for 15 min. Ten microliters (10 µL) of iodoacetamide solution (15 mg/mL in 100 mM ammonium bicarbonate) were then added, and the sample was incubated for another 15 min in the dark. However, we noticed that this freeze-dried sample contained high concentrations of sucrose and salts, which contributed to the production of low-quality MS/MS spectra. Therefore, removal of sucrose and salts by precipitation was necessary before trypsin digestion. Two precipitation methods were evaluated: TCA/acetone and chloroform/methanol, both of which are known to be effective for desalting and removing lipids from protein samples. In our case, the TCA/acetone combination was better at removing sucrose and salts from our protein sample (fraction #3), as the quality of the MS/MS spectra improved and scores were higher than the significance threshold (see section 2.3). Trypsin was prepared in 50 mM ammonium bicarbonate and added to the protein solution at a trypsin:protein ratio of 1:25 (w/w), and the mixture was incubated for 60 min at 37°C (no difference was observed when the incubation time was extended up to 16 h).
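The reagent amounts in these protocols are given partly in mg/mL and partly in mM. A short conversion sketch, using the standard molar masses of DTT (154.25 g/mol) and iodoacetamide (184.96 g/mol), shows that the 5 mg/mL DTT and 15 mg/mL iodoacetamide stocks used here correspond to roughly 32 mM and 81 mM, close to the 32 mM and 80 mM solutions of the robotic in-gel protocol. The helper names and the 50 µg protein amount in the example are hypothetical.

```python
MOLAR_MASS = {"DTT": 154.25, "iodoacetamide": 184.96}  # g/mol


def mg_per_ml_to_mm(conc_mg_per_ml: float, reagent: str) -> float:
    """mg/mL equals g/L; dividing by g/mol gives mol/L, times 1000 gives mM."""
    return conc_mg_per_ml / MOLAR_MASS[reagent] * 1000.0


def mg_needed(conc_mm: float, volume_ml: float, reagent: str) -> float:
    """Mass (mg) of reagent for volume_ml of a conc_mm solution.

    mM x mL = umol; umol x g/mol = ug; divide by 1000 for mg.
    """
    return conc_mm * volume_ml * MOLAR_MASS[reagent] / 1000.0


def trypsin_ug(protein_ug: float, ratio: float = 25.0) -> float:
    """Trypsin mass for a 1:ratio (w/w) enzyme:protein digestion."""
    return protein_ug / ratio


print(f"5 mg/mL DTT           = {mg_per_ml_to_mm(5, 'DTT'):.1f} mM")
print(f"15 mg/mL iodoacetamide = {mg_per_ml_to_mm(15, 'iodoacetamide'):.1f} mM")
print(f"1 mL of 25 mM DTT needs {mg_needed(25, 1, 'DTT'):.2f} mg")
print(f"trypsin for 50 ug protein at 1:25: {trypsin_ug(50):.1f} ug")
```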

Protein composition analysis of fraction #3 via MudPIT strategy
In our first MudPIT trial, the analysis was carried out on the LTQ XL (ion trap) instrument using the MS/MS ion search mode, with carbamidomethylation of cysteine residues as a fixed modification and methionine oxidation as a variable modification. The peptide mass tolerance was set to ± 2 Da, the fragment mass tolerance to ± 0.8 Da, and protein mass was unrestricted. Under these conditions, LTQ analysis of fraction #3 produced a total of 13,504 peptide queries that were used to search the green plant databases at NCBI (6,350,093 sequences).
The search yielded 74 hits above the significance threshold (individual ion scores >59, p<0.05), matched by only 501 peptide sequences (~4% of the 13,504 peptide queries). The decoy value was 41, corresponding to a false discovery rate of 11.33%. Most of the identified hits were protein entries with similar annotations (redundancy) from different related species (orthologous proteins). Therefore, for simplicity, entries with similar annotations were eliminated, which reduced the total number of identified proteins to 25 (Table 2). The low protein identification rate could be due to suppression of the ionization of peptides from low-abundance proteins by peptides from more abundant proteins.
Another possibility is interference from small molecules (e.g., sucrose, salts) present in the sample, which could produce low-quality MS/MS spectra. Alternatively, some of these peptide queries may be species-specific and have no match in the database. Therefore, to improve the protein identification rate, we reanalyzed the same sample (fraction #3) on a high-resolution Orbitrap mass spectrometer (LTQ Orbitrap XL). The high resolving power of the Orbitrap, which determines masses with very high accuracy, has proved useful for analyzing complex peptide mixtures: narrower peaks allow more m/z ions to be detected, so that ions of similar m/z and the multiply charged ions produced by ESI can be resolved from one another. Our results indicated that, even with narrower peptide mass tolerance (± 0.1 Da) and fragment mass tolerance (± 0.5 Da), the Orbitrap analysis generated 20,123 peptide queries (6,619 more than the LTQ analysis). This analysis identified a total of 51 non-redundant proteins, including 37 new proteins (marked by "*" in Table 2). This increase in the protein identification rate results from an increase in the number of matching peptides (908 peptides, 407 more than in the LTQ analysis) and from the new MS/MS spectra generated by the higher resolution of the Orbitrap. The Orbitrap analysis also improved the scores (i.e., MOWSE scores) of the identified proteins. Interestingly, some of the proteins identified in the LTQ analysis were absent from (or had lower scores in) the Orbitrap analysis, which may indicate a higher false-positive rate in the LTQ analysis. In addition, the decoy value for the Orbitrap analysis was 54, corresponding to a ~7% false discovery rate (compared with 11.33% in the LTQ analysis). It is worth mentioning that cleaning our samples of residual sucrose, lipids, and salts further reduced the false-positive rate to ~3%, although without a major increase in the protein identification rate.
Table 2. List of proteins identified in fraction #3 by MudPIT using LTQ or Orbitrap instruments. Fraction #3 was obtained from the sucrose density gradient in Figure 2. Hits corresponding to wheat proteins are in italic. Unique hits identified in the Orbitrap analysis are marked by "*".
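The false discovery rates quoted above (11.33% for the LTQ run and ~7% for the Orbitrap run) come from a target-decoy search, in which the spectra are also searched against a decoy database of reversed or randomized sequences. A minimal sketch of the commonly used estimate (decoy matches divided by target matches above the same score threshold) is given below, together with the fraction of peptide queries that found a match. The function names and the decoy/target counts in the example are hypothetical; the target counts underlying our decoy values are not reported here.

```python
def decoy_fdr(decoy_matches: int, target_matches: int) -> float:
    """Estimated false discovery rate (%) from a target-decoy search.

    Assumes the simple ratio decoy / target, where both counts are matches
    scoring above the same significance threshold.
    """
    if target_matches <= 0:
        raise ValueError("target_matches must be positive")
    return 100.0 * decoy_matches / target_matches


def identification_rate(matched_queries: int, total_queries: int) -> float:
    """Percentage of MS/MS peptide queries that matched a database entry."""
    if total_queries <= 0:
        raise ValueError("total_queries must be positive")
    return 100.0 * matched_queries / total_queries


if __name__ == "__main__":
    # Hypothetical decoy/target counts, for illustration only.
    print(f"Estimated FDR: {decoy_fdr(15, 300):.1f}%")
    # 501 matched peptides out of 13,504 queries (LTQ MudPIT run above).
    print(f"Queries matched: {identification_rate(501, 13504):.1f}%")
```

The second call gives 3.7%, the "~4%" reported for the LTQ run.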
In the context of GAX biosynthesis, only the Orbitrap analysis resulted in the identification of several hits corresponding to reversibly glycosylated polypeptides (RGPs, GT75 family) from rice (gi|3646373, gi|4158221, gi|108709682) and wheat (gi|4158232). These proteins were identified by the following peptides (all with scores higher than 100): GIFWQEDIIPFFQNATIPK, NLDFLEMWRPFFQPYHLIIVQDGDPSK, YVDAVLTIPK, TGLPYLWHSK, VPEGFDYELYNR, NLLSPSTPFFFNTLYDPYR, and EGAPTAVSHGLWLNIPDYDAPTQMVKPR. However, both LTQ and Orbitrap analyses failed to identify TaGT43-4 and TaGT47-13 or any members of the GT43 or GT47 families. These results underscore the limitations of MudPIT in identifying some GTs involved in plant cell wall biosynthesis.
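The RGP peptides listed above are consistent with trypsin specificity: cleavage after lysine (K) or arginine (R), but not when the next residue is proline, which is why NLDFLEMWRPFFQPYHLIIVQDGDPSK and EGAPTAVSHGLWLNIPDYDAPTQMVKPR retain internal RP and KP motifs. A minimal in silico digestion following this simplified rule is sketched below. It ignores missed cleavages and other known exceptions to the rule, and the function name is hypothetical.

```python
def tryptic_digest(sequence: str, min_length: int = 1) -> list[str]:
    """Simplified in silico trypsin digestion.

    Cleaves on the C-terminal side of K or R unless the next residue is P.
    Missed cleavages are not modelled.
    """
    seq = sequence.upper()
    peptides = []
    start = 0
    for i, residue in enumerate(seq):
        next_residue = seq[i + 1] if i + 1 < len(seq) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide not ending in K/R
    # Optionally drop very short peptides, which are rarely informative.
    return [p for p in peptides if len(p) >= min_length]


if __name__ == "__main__":
    # RP and KP motifs are not cleaved, so each peptide comes back intact.
    for pep in ("NLDFLEMWRPFFQPYHLIIVQDGDPSK", "EGAPTAVSHGLWLNIPDYDAPTQMVKPR"):
        print(tryptic_digest(pep))
    print(tryptic_digest("AKPRGKR"))  # toy sequence
```

Applied to a full candidate protein sequence, the same function lists the peptides one would expect to observe, which could help judge whether a given GT yields enough peptides of a detectable size.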

Analysis of individual visible bands on 1-D SDS-PAGE
When individual visible bands on the gel (according to Coomassie blue staining) were digested with trypsin and the resulting peptides analyzed by LTQ, a total of 169 proteins were matched by 2,106 peptide sequences (~5% of the total peptide sequences; Table 3). Among these 169 proteins, only three (gi|4158232, gi|2218152, and Os01g0926600) were identified as GTs involved in GAX biosynthesis, and one of these three GTs (Os01g0926600) was newly identified by this analysis (not present in the MudPIT analysis). Table 4 lists the non-redundant proteins identified by the analysis of each individual protein band. This strategy identified 57 new proteins (not present in the MudPIT analysis). Among these new hits, Os01g0926600 (MW 47,271) was identified from the gel band around 50 kDa by the peptide IEGSAGDVLEDDPVGR (score 79). This rice protein has an exostosin domain and belongs to family GT47, which also contains wheat members known to be involved in GAX synthesis (Zeng et al., 2010). Again, this strategy failed to identify any members of the GT43 family.
Table 3. Proteins involved in GAX biosynthesis identified by Gel-LC-MS/MS analysis of visible bands on the SDS-PAGE gel of fraction #3 (Figure 3). Fraction #3 was from the EDTA-supplemented sucrose gradient (see Figure 2). The approximate MW of gel bands is indicated along with the number of proteins identified under the same band.
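Peptide assignments such as IEGSAGDVLEDDPVGR rest on matching the mass of an observed precursor ion to the theoretical mass of a candidate peptide within the search tolerance (for example the ± 0.1 Da peptide tolerance used in the Orbitrap searches). The sketch below computes the monoisotopic mass and m/z of an unmodified peptide from standard residue masses and checks an observed precursor against it. It ignores modifications such as carbamidomethylated cysteine or oxidized methionine, the function names are hypothetical, and the observed value in the example is invented for illustration.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once for the peptide termini (H and OH)
PROTON = 1.007276  # mass of a proton


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


def mz(sequence: str, charge: int) -> float:
    """Theoretical m/z of the [M + zH]z+ ion."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


def matches(sequence: str, observed_mz: float, charge: int,
            tol_da: float = 0.1) -> bool:
    """True if the observed precursor's neutral mass is within tol_da of the peptide."""
    observed_mass = observed_mz * charge - charge * PROTON
    return abs(observed_mass - peptide_mass(sequence)) <= tol_da


if __name__ == "__main__":
    peptide = "IEGSAGDVLEDDPVGR"
    theoretical = mz(peptide, 2)
    print(f"{peptide} [M+2H]2+: {theoretical:.4f}")
    # Hypothetical observation 0.02 m/z above theory (0.04 Da in neutral mass).
    print(matches(peptide, theoretical + 0.02, charge=2))
```

Because the tolerance is applied to the neutral mass, a given m/z error is multiplied by the charge state; this is one reason why tighter tolerances become practical only on high-accuracy instruments such as the Orbitrap.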

Analysis of equal slices covering the 30-180 kDa area of the 1-D SDS-PAGE gel
When the gel area between 30 and 180 kDa was cut into 20-40 equal slices and each slice was digested with trypsin, LC-MS/MS analysis on the LTQ identified a total of 233 proteins. These hits were matched by 1,283 peptide sequences, ~0.9% of the total peptide sequences (Table 5). Table 5 lists the total number of peptide queries obtained from each slice, the number of hits identified in NCBI databases, and the number of peptides that matched these hits. It also lists the top hit from the analysis of each slice (often the same top hit appears in many slices).
Table 5. Proteins involved in GAX biosynthesis identified in fraction #3 by the Gel-LC-MS/MS and LTQ strategy. The gel area between 30 and 180 kDa of the SDS-PAGE gel was cut into 20-40 slices (see Figure 3), and each slice was digested with trypsin and analyzed.
Some of the 233 hits were unique to this strategy (not found in the previous analyses). Among these unique hits, two Chlamydomonas reinhardtii GTs (gi|159470791 and gi|159471277) belonging to the GT47 family (both annotated as exostosin-like glycosyltransferases) were identified by the peptide RVAEADIPRL (score 56). In addition, this strategy identified the exact wheat RGP protein (TaGT75-4, gi|4158232) with the peptides VPEGFDYELYNR and YVDAVLTIPK (both with a score of 59). Therefore, this strategy successfully identified TaGT75-4 and a homolog of TaGT47-13, but failed to identify TaGT47-13 itself or any homolog of TaGT43-4.

Discussion
Hemicellulosic polymers such as GAX represent up to 40% (w/w) of grass cell walls (in particular in growing tissues). In sharp contrast with the abundance of these polymers, the GTs that synthesize them are present in low amounts in the Golgi membranes of the plant cell. This observation suggests that these enzymes are highly active and may not be required in large quantities. This low abundance of GTs has been the main limiting factor in applying proteomics approaches to plant cell wall biosynthesis. To further complicate the issue, isolating GTs from Golgi membranes (or simply disrupting these membranes) generally results in a drastic reduction or loss of transferase activity in vitro. Detecting this weak transferase activity in vitro requires very sensitive biochemical assays (e.g., assays based on [14C]-radiolabeled sugars). Because the loss of transferase activity varies from one GT to another, biochemical assays are not the best way to estimate the abundance of these enzymes in a particular protein preparation. All these factors should therefore be taken into consideration when working with plant cell wall GTs. In this work, such an in vitro assay was used to monitor the distribution of GAX synthase activity (from Golgi-enriched membranes) on a linear sucrose density gradient supplemented with EDTA, as described earlier (Zeng et al., 2010). According to our in vitro assay, fraction #3 was substantially enriched in GAX synthase activity (Figure 2), and it can be assumed that this fraction is also enriched in the TaGT43-4, TaGT47-13, and TaGT75-4 proteins. Fraction #3 is therefore an excellent starting material for evaluating the ability of proteomics strategies to identify these three GTs within a mixture of proteins. Furthermore, because genome and protein sequence information from five grass species is currently publicly available (Figure 1), proteomics analysis of wheat could be expected to succeed.
Our analyses indicated that the gel-based proteomics approach (Gel-LC-MS/MS) gave superior results compared with the gel-free approach (MudPIT). In the MudPIT strategy, the LTQ and Orbitrap analyses identified a total of 83 non-redundant proteins, but only 14 of these proteins were common to both analyses (Figure 4). The Orbitrap, however, gave higher scores and protein identification rates. The Gel-LC-MS/MS strategy, on the other hand, identified a total of 180 non-redundant proteins, 83 of which were in common with the MudPIT analyses (97 new proteins) (Figure 4). Regarding the ability to identify GTs, the Gel-LC-MS/MS strategy identified most of the GTs associated with GAX biosynthesis. Intriguingly, none of the strategies used identified TaGT43-4 or any close homolog in the NCBI database. Three possibilities could explain this result: (i) the TaGT43-4 protein may be lost during the precipitation step of sample preparation; (ii) TaGT43-4 is a very active enzyme present in only small amounts in fraction #3, possibly below the detection limit of the LC-MS/MS methods used in this work; or (iii) the TaGT43-4 protein is somehow resistant to trypsin digestion.
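Overlap comparisons such as those summarized in Figure 4 reduce to set operations on the non-redundant accession lists produced by each analysis. The sketch below shows this bookkeeping; the function name and the accession identifiers are hypothetical, and in practice the inputs would be the deduplicated hit lists exported from each search.

```python
def compare_hit_lists(first: set[str], second: set[str]) -> dict[str, int]:
    """Counts for a two-set (Venn-style) comparison of protein accessions."""
    return {
        "total_non_redundant": len(first | second),  # union
        "in_common": len(first & second),            # intersection
        "only_first": len(first - second),
        "only_second": len(second - first),
    }


if __name__ == "__main__":
    # Hypothetical accessions, for illustration only.
    mudpit = {"P001", "P002", "P003", "P004"}
    gel = {"P003", "P004", "P005", "P006", "P007"}
    print(compare_hit_lists(mudpit, gel))
```

Working with sets of accessions rather than raw hit tables ensures that redundant entries are counted once, which matters when, as here, many hits are orthologous entries with similar annotation.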
Our hypothesis is that most of the TaGT43-4 protein was lost during the precipitation step, as Golgi proteins are known to aggregate easily during precipitation and are very difficult to re-solubilize in a buffer containing detergent. Although ASB-14 and SDS have been shown to be suitable detergents for solubilizing hydrophobic proteins (Herbert, 1999), they may not have efficiently re-solubilized the freeze-dried or TCA/acetone-precipitated wheat Golgi proteins in this study. In support of this hypothesis, fraction #3 should be enriched in Golgi proteins (Zeng et al., 2010), but our analysis indicates that it was actually enriched in endoplasmic reticulum (14%), tonoplast (17%), and plastid (28%) proteins, whereas Golgi proteins represented only 2% of the total hits (according to the NCBI annotation of possible subcellular localizations) (Figure 5). A reliable precipitation-re-solubilization strategy therefore appears to be a crucial step that must be optimized for minimal protein loss. Alternatively, improved enrichment strategies that overcome protein loss during the precipitation-re-solubilization step should be developed. Although none of the proteomics strategies employed here revealed the exact identity of some of the GTs associated with GAX biosynthesis, proteomics is still a powerful tool, as many low-abundance GTs (among the 2% of proteins from the Golgi) could be identified. Furthermore, this work demonstrated that working with a non-model species without a fully sequenced genome, such as wheat, did not seem to be an issue, as a large proportion (40-60%) of the proteins identified either came from wheat sequences available in the databases or were the closest homologs of the anticipated wheat proteins from other grass species (rice, barley, maize, or sorghum). Another limitation in applying proteomics to plant cell wall biosynthesis is the capacity of the mass analyzer to acquire as many MS/MS spectra as possible, which determines the protein detection rate.
To overcome all these issues, and depending on the complexity of the protein sample, we propose a workflow for carrying out a successful proteomics analysis (Figure 6). In this workflow, the first step is to assess the quality of the sample by optimizing the precipitation step (removal of salts and contaminants) without any protein loss. Our work demonstrated that this precipitation-re-solubilization step is crucial for a successful proteomics analysis of Golgi membrane proteins. Depending on the complexity of the sample, the simplest strategy to try is MudPIT fractionation combined with LTQ analysis. If the sample contains more than 500 proteins, combining MudPIT with high-resolution mass spectrometry (e.g., Orbitrap) could be the easiest strategy to test. For more complex protein samples (more than 1,000 proteins), it may be necessary to combine 1-D SDS-PAGE fractionation, LC-MS/MS, and the high resolving power of the Orbitrap analyzer for optimal protein identification (Figure 6).
Fig. 6. A proteomics workflow pipeline for efficient protein identification from unknown samples.
The distribution of GAX synthase activity over the sucrose density gradient was intriguing. In the absence of EDTA, all GAX synthase activity equilibrated at the expected density of ~1.16 g/mL (fractions 17 and 18 in Figure 2), along with the activity of the Golgi marker IDPase (Zeng et al., 2010). Including EDTA in the sucrose gradient split the activity into three density regions: around 1.09 g/mL (fractions 2 and 3 in Figure 2), around 1.14 g/mL (fractions 12 and 13 in Figure 2), and around 1.16 g/mL (fractions 17 and 18 in Figure 2). Fraction #3 contained the highest GAX synthase activity. This shift in density may suggest that GAX synthase activity is associated with various Golgi compartments. The presence of such Golgi compartments was reported earlier by Mikami et al. (2001) in rice but not in tobacco cells. They showed that the rice Golgi complex fractionated into several compartments by simple centrifugation on a density gradient in the presence of EDTA or MgCl2. More recently, Asakura et al. (2006) used this strategy to isolate (and analyze by proteomics) rice cis-Golgi membranes labeled with green fluorescent protein (GFP) fused to the cis-Golgi marker SYP31 (a member of the SNARE family; soluble N-ethylmaleimide-sensitive factor attachment protein receptors). Interestingly, their proteomics results gave a protein composition very similar to ours, except that no members of the GT43, GT47, or GT75 families were identified in their study (Asakura et al., 2006). Taken together, these results suggest that the cis-Golgi is less tightly attached to the medial- and trans-Golgi compartments in grasses. We therefore propose a possible explanation for the effect of EDTA on the dissociation of these Golgi compartments (Figure 7). Our hypothesis is that some ions (e.g., Ca2+) are involved in linking the cis-Golgi (and probably the trans-Golgi network [TGN]) to the medial- and trans-Golgi cisternae. In the absence of EDTA, the whole Golgi complex would equilibrate at an apparently high density (~1.16 g/mL, fractions 17 and 18 in Figure 2). Upon addition of EDTA to the sucrose gradient, the ions are chelated, leaving the different Golgi compartments to equilibrate at their corresponding densities (1.09, 1.12, and 1.16 g/mL in Figure 2). In any case, fractionating the Golgi complex from grasses in the presence of EDTA is an excellent tool for isolating different Golgi compartments.
Fig. 7. A diagrammatic representation of the possible effect of EDTA on the dissociation of Golgi compartments (cis-, medial-, and trans-Golgi, and the trans-Golgi network [TGN]). EDTA would chelate metal ions that mediate the attachment (red lines) of the cis-Golgi compartment to the medial- and trans-Golgi cisternae.

Conclusion
We evaluated two proteomics strategies for the MS/MS identification of GTs associated with the GAX synthase complex in wheat, a species for which little genome sequence is available. The evaluation was based on the capacity of each strategy to identify the exact wheat proteins TaGT43-4, TaGT47-13, and TaGT75-4, or at least their closest homologs from other grass species such as rice, barley, maize, or sorghum. These strategies therefore depend on the quality of the MS/MS spectra and on cross-species matching, using error-tolerant BLAST searches and de novo peptide sequences generated from the MS/MS spectra. Our data indicated that the highest number of unique hits (180 proteins) was obtained with the Gel-LC-MS/MS strategy using the Orbitrap analyzer, but the failure of this method to identify TaGT43-4 and/or its homologs underscores the importance of optimizing the sample preparation step (precipitation-re-solubilization). Based on our results and interpretations, we have proposed a workflow chart that combines routinely used proteomics methods with optimization steps to help increase the detection rate of plant cell wall GTs from non-model plants.
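The decision points of the proposed workflow (Figure 6) can be summarized as a simple rule on the estimated number of proteins in the sample. The sketch below encodes only the thresholds stated in the Discussion (more than 500 and more than 1,000 proteins). The function name is hypothetical, it assumes that the precipitation-re-solubilization step has already been optimized, and because the protein count of an unknown sample is itself a rough estimate, the thresholds should be read as guidance rather than hard cut-offs.

```python
def suggest_strategy(estimated_proteins: int) -> str:
    """Starting strategy for a sample of the given estimated complexity.

    Follows the thresholds of the proposed workflow; assumes sample
    clean-up (precipitation-re-solubilization) has already been optimized.
    """
    if estimated_proteins < 0:
        raise ValueError("estimated_proteins must be non-negative")
    if estimated_proteins > 1000:
        return "1-D SDS-PAGE fractionation + LC-MS/MS on an Orbitrap"
    if estimated_proteins > 500:
        return "MudPIT + Orbitrap"
    return "MudPIT + LTQ"


if __name__ == "__main__":
    for n in (200, 800, 1500):
        print(n, "->", suggest_strategy(n))
```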

Acknowledgment
The author would like to thank Dr. Green-Church for her valuable comments and editing of the experimental procedures dealing with proteomics analysis. Thanks to Wei Zeng and Nan Jiang for the excellent work preparing protein samples for proteomics. This work was supported by the National Science Foundation (grant no. IOS-0724135 to A.F.).

Fig. 2. Distribution of GAX synthase activity after fractionation of wheat Golgi-enriched microsomal membranes on a linear (25%-40%) sucrose density gradient. Microsomal membranes were prepared from 6-day-old etiolated wheat seedlings. Fractions of 1 mL were collected, and each fraction was tested for GAX synthase activity ([14C]GlcA incorporation in the presence of UDP-xylose, as cpm/reaction) and for sucrose density (g/mL).

Fig. 3. SDS-PAGE (10%) and silver staining analysis of fraction #3 obtained from Figure 2. The arrows on the left indicate the positions of the visible bands that were manually excised and digested with trypsin before ESI-LC-MS/MS-TRAP analysis. The area indicated on the right of the same gel corresponds to the area excised into 20-40 equal slices that were subjected to in-gel trypsin digestion followed by ESI-LC-MS/MS-TRAP analysis. Protein identification was carried out using the Mascot program to search green plant databases at NCBI. For comparison, the original Golgi-enriched membranes were analyzed. Molecular mass markers (MW) are indicated on the right.

Fig. 5. Classification of proteins identified in fraction #3 according to the NCBI annotation of their possible subcellular localization.

Hits marked by "*" are unique to the Orbitrap analysis.

Table 4. Proteins newly identified by LC-MS/MS analysis of individual visible bands on SDS-PAGE gels (not identified in the MudPIT analysis). Only proteins with scores >55 and/or two peptide matches are listed.