Comparative Proteomics of Tandem Mass Spectrometry Analyses for Bacterial Strains Identification and Differentiation



Introduction
The bacterial proteome is the collection of functional and structural proteins present in the cell. It comprises diverse classes of proteins with different cellular functions. Protein accounts for the majority of the cell dry weight, which makes it an ideal cellular component for bacterial characterization (Loferer-Krobacher et al., 1998). Because the bacterial proteome is so diverse, its proteins must be determined, identified, and characterized in order to understand their cellular functions (Costas et al., 1990). Moreover, studying the bacterial proteome is essential for identifying pathological proteins for vaccine development, for diagnosing infectious diseases and providing countermeasures against them, and for understanding biological systems. The availability of microbial genome sequence information has opened an expansive area of research in bacterial proteomics. Proteomic studies address the functional proteins produced as a result of changes in gene expression. Comparative proteomic studies allow the examination of bacterial strain differences, both phenotypic and genetic, as well as bacterial growth under various nutrient and environmental conditions, e.g., nutrient type, growth phase, temperature, and exposure to chemical compounds such as antibiotics.
Comparative proteomics also gives the researcher a tool to begin characterizing the functions of the large proportion of "hypothetical" or "unknown" proteins revealed by genome sequencing and database comparisons. Comparative proteomics has been widely applied to microbial identification and characterization through several mass spectrometry techniques, with tandem mass spectrometry proving to be an effective and reliable approach (Aebersold, 2003; Anhalt & Fenselau, 1975; Dworzanski, 2006; Hillkamp, 2000; Jabbour, 2005; Krishnamurthy, 2000). This chapter addresses the use of comparative proteomics and the application of tandem mass spectrometry in the identification and differentiation of bacterial strains.

www.intechopen.com
Tandem Mass Spectrometry -Applications and Principles 200

Overview of the utilization of tandem mass spectrometry in bacterial identification and differentiation
Mass spectrometry techniques have been used extensively for the rapid identification and differentiation of microbes in general and bacteria in particular. The predominant approaches used for bacterial identification and differentiation include electrospray ionization tandem mass spectrometry (ESI-MS/MS); matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS); surface-enhanced laser desorption/ionization (SELDI) mass spectrometry; one- or two-dimensional sodium dodecyl sulfate-polyacrylamide gel electrophoresis (1- or 2-D SDS-PAGE); and hybrid techniques that combine mass spectrometry, gel electrophoresis, and bioinformatics. These methods provide either fingerprints of the bacterial proteins (e.g., MALDI-TOF-MS) or amino acid sequence information obtained by tandem MS analysis, in which ionized tryptic peptides derived from bacterial proteins are fragmented by collision-induced dissociation (CID), electron transfer dissociation (ETD), or post-source decay (PSD) (e.g., ESI-MS/MS). This chapter addresses the use of tandem mass spectrometry techniques in the differentiation of bacterial strains. Tandem mass spectrometry techniques have seen significant use and success in interrogating the protein component of biological species, including virus proteins, protein toxins, and bacteria, for identification and characterization purposes (Demirev & Fenselau, 2008a, 2008b; Dworzanski & Snyder, 2005; Ho, 2002; Ecker, 2005; Fox, 2002, 2006; Hofstadler, 2005; Lambert, 2005; Nagele, 2003; Pennigton, 1997; Sampath, 2007; Wilkins, 2006; Williams, 2002). Investigations of the protein component of biological systems constitute the realm of proteomics (Nagele, 2003; Pennigton, 1997). The LC-tandem MS technique is well suited to handling, in a reproducible fashion, the complex and comprehensive suites of proteins present in biological threat microorganisms (Williams, 2002).
The vast amount of protein and peptide data generated by a typical LC-tandem MS analysis needs to be processed in an efficient and timely manner. The need for data reduction has spawned a number of successful bioinformatics software tools for this task (Fox, 2002, 2006; Yates, 1998; Kuwana, 2002). Furthermore, new genomes are constantly being sequenced, which enlarges the database of bacterial genomes available for interrogating a biological sample (Dworzanski & Snyder, 2005). A major portion of the Centers for Disease Control (CDC) Category A, B, and C biological threat agents have their genomes fully sequenced and available for bioinformatics coupled to MS-based proteomics (NCBI website, 2010; Integrated genomic, 2010; Rotz, 2002). The US Government has initiated extensive efforts in the detection and identification of biological threat species through Defense Advanced Research Projects Agency (DARPA) programs that explore the "detect to protect" and "detect to treat" paradigms (National Research Council [NRC], 2005; Demirev, 2005). Those initiatives cover general health risk, bioterrorism, homeland security, agricultural monitoring, quality of foodstuffs, environmental monitoring, and biological warfare agents in battlefield situations (Demirev & Fenselau, 2008a). The concerns include incidents such as a ricin attack (Bevilacqua, 2010) and the Bacillus anthracis spore attack on the US postal system in the fall of 2001 (Demirev & Fenselau, 2008b; Dworzanski & Snyder, 2005; Friess, 2010; Ho, 2002; Wilkins, 2006).

Proteomic analyses by LC-MS have been used in the characterization of bacteria (Castanha, 2006; Dworzanski, 2004, 2006; Lambert, 2005). Given the success of tandem MS-based proteomics in bacterial characterization, a comparative proteomic study examined whether outer membrane protein (OMP) extracts and whole cell protein extracts, analyzed independently, can distinguish between strains of the same species (Jabbour et al., 2010). Typically, either whole cell protein extracts are investigated, or select portions of the bacterium, such as the outer membrane, are isolated and the proteins are extracted therefrom. In the membrane, the OMPs act as active mediators between the cell and its environment and are often associated with virulence in Gram-negative pathogens. In pathogenic Escherichia coli, multiple OMPs are required for intestinal colonization, and others play a role in the type III secretion system responsible for delivering effector proteins to host cells (Garmendia, 2005; Ide, 2001; McDaniel, 1997; Wachter, 1999).

Outer membrane proteins for bacterial strains differentiation
Outer membrane proteins (OMPs) of Gram-negative bacteria act as active mediators between the cell and its environment and are often associated with virulence in Gram-negative pathogens (Jerse et al., 1990; Kaper et al., 2004; Koebnik et al., 2000). Avirulent strains often lack one or more of the plasmids or genes encoding proteins needed for virulence. These differences in OMP expression between virulent and avirulent strains of Gram-negative bacteria could potentially be exploited to distinguish among strains. Therefore, OMPs are potential biomarkers for bacterial strain differentiation. Off-line 2-D separation by chromatofocusing and reversed-phase LC, with electrospray ionization time-of-flight (ESI-TOF) MS and matrix-assisted laser desorption/ionization (MALDI) TOF-MS detection, has been used to analyze whole cell protein extracts of non-pathogenic and pathogenic (O157:H7) E. coli strains (Zheng, 2005). In addition to commonly shared proteins, those analyses found seven unique proteins in a non-pathogenic E. coli strain and five unique proteins expressed in the pathogenic O157:H7 strain. These intracellular, non-OMP proteins were the basis for distinguishing the E. coli strains; however, the information was not cross-referenced against a proteome database using bioinformatics. A series of Enterobacteria was investigated and cross-referenced with on-line protein databases (Pribil, 2005). OMPs were investigated by MALDI-TOF tandem MS, where microgram amounts of cells were briefly subjected to trypsin digestion on a stainless steel target plate. Four Enterobacteria were investigated and their protein mass spectra were analyzed. Peptide analyses provided protein identifications, and multiple assignments allowed database searches to match the Enterobacteria species: E. coli, E. herbicola, E. cloacae, and Salmonella typhimurium. Some of the distinguishing proteins originated in the cellular milieu, and unique OMPs were identified in all four species.
Top-down proteomics and matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF/TOF) tandem mass spectrometry were used to differentiate protein extracts of E. coli strains. Six ions found in a collection of mass spectra originated from proteins that could distinguish between pathogenic and non-pathogenic E. coli strains by tandem TOF mass spectrometry. A unique protein biomarker ion at m/z 7705.6 (putative uncharacterized protein YahO) was found in the pathogenic O157:H7 strain and in its pathogenic nearest neighbor, the O55:H7 (infantile diarrhea) strain. Another ion at m/z 9737.5, indicative of the acid stress chaperone-like protein HdeA, was found in the O157:H7 strain. An ion at m/z 9063.4 in the mass spectrum of non-pathogenic E. coli RM3061 was absent from the O157:H7 mass spectrum. Tandem TOF mass spectrometry identified this peak as the HdeB acid stress chaperone-like protein, which was useful for discriminating this non-pathogenic E. coli strain. In another study, the membranes of the Enterobacteria S. typhimurium and Klebsiella pneumoniae were isolated, and the proteins were extracted and separated by 2-D electrophoresis (Fagerquist, 2010). The excised protein spots were digested with trypsin and analyzed by MALDI-TOF-MS and peptide mass fingerprinting. The masses predominantly originated from OMP peptides and were searched against microorganism databases for identification. Twenty-five and fourteen unique proteins were reproducibly found in S. typhimurium and K. pneumoniae, respectively (Lamontagne, 2007).
Pathogenic E. coli, such as the O157:H7 strain, is a public health pathogen responsible for common foodborne and waterborne illnesses. This bacterium contains a full complement of OMPs. Yersinia pestis is classified as a Category A pathogen and is an important potential biowarfare agent. Virulent Y. pestis contains three plasmids encoding multiple OMPs that are required for virulence (Ben-Gurion & Shafferman, 1981; Ferber, 1981; Filippov, 1990). For example, the pCD1 plasmid encodes several Yersinia OMPs and a type III secretion system, which are needed for survival and entry into host eukaryotic cells (Cornelis, 2002; Ramamurthi, 2002). Additionally, the pPCP1 plasmid encodes an OMP plasminogen activator that interferes with clotting and complement (Titball, 2003). Avirulent strains often lack one or more of the plasmids or genes encoding proteins needed for virulence, and it is these differences in OMP expression between virulent and avirulent strains of Gram-negative Enterobacteria that could potentially be exploited to distinguish among strains. Alternatively, high-throughput tandem mass spectrometry-based proteomics was applied to characterize cellular proteins and to produce amino acid sequence information for peptides derived from these proteins for E. coli and Y. pestis. Whole cell protein extracts and cell membrane OMP extracts were compared and contrasted using the in-house BACid bioinformatics tools for species- and strain-level discrimination (Jabbour, 2010).

Bioinformatics tools for bacterial strains differentiation using tandem mass spectrometry
The use of MS techniques for bacterial differentiation relies on comparing proteomic information generated from either intact protein profiles (top-down analysis) or the product ion mass spectra of digested peptide sequences (bottom-up analysis) (Warscheid, 2003; Washburn, 2001). In top-down analysis, bacterial differentiation is accomplished by comparing the MS data of intact proteins with an experimental mass spectral database containing the mass spectral fingerprints of the studied microorganisms (Craig, 2004). Conversely, bacterial differentiation using the product ion mass spectra of digested peptides is accomplished by running search engines against publicly available sequence databases to infer identification (Eng, 1994; Warscheid, 2004). Several peptide search algorithms (e.g., SEQUEST and MASCOT) have been developed to identify peptides using proteomics databases generated from either fully or partially genome-sequenced organisms (Craig, 2004; Xiang, 2000).
Recent developments in the microbial differentiation field have focused on improving the selectivity of MS data processing. The product ion mass spectrum-SEQUEST approach was reported for the identification of specific bacteria using a custom-made, limited database of sequences (Keller, 2002; VerBerkmoes, 2005). Another approach used open reading frame (ORF) translator programs to predict possible protein sequences from all probable ORFs and correlate them with the genomic sequences to identify microorganisms (Chen, 2001). This approach did not show advantages over the product ion mass spectrum method with regard to strain-level discrimination (Wolters et al., 2001). However, a more recent proteomics approach to bacterial differentiation is a hybrid that combines protein profiling and sequence database searching using accurate mass tags (Lipton et al., 2002; Norbeck et al., 2006). This approach was used to probe defined mixtures of bacteria to evaluate its capabilities. Alternatively, an emerging bioinformatics approach has been validated that is based on a cross-correlation between the product ion spectra of the tryptic peptides and their corresponding bacterial proteins, which are derived from an in-house comprehensive proteome database of genome-sequenced microorganisms (Jabbour, 2010). Using this proteome database allowed a faster search of the product ion spectra than genomic database searching. It also eliminates inconsistencies observed in publicly available protein databases, which arise from the use of non-standardized gene-finding programs when a proteome database is constructed. The proposed approach uses an ensemble of bioinformatics tools for the classification and potential identification of bacteria based on peptide sequence information. This information is generated from the liquid chromatography tandem mass spectrometry (LC-MS/MS) analysis of tryptic digests of bacterial protein extracts and the subsequent profiling of the sequenced peptides to create a matrix of sequence-to-microbe (STM) assignments. This proteomics approach is unsupervised and reveals the relatedness between the analyzed samples and the database of microorganisms using a binary matrix. The binary matrix is analyzed using diverse visualization and multivariate statistical techniques for bacterial classification and identification.
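To make the idea of a binary sequence-to-microbe matrix concrete, the following minimal Python sketch builds one from a toy data set. It is an illustration under stated assumptions, not the in-house implementation: the function names `build_stm_matrix` and `matches_per_organism`, the organism labels, and the sequences are all hypothetical, and a peptide is simply counted as assigned to an organism when it occurs as a substring of one of that organism's proteins.

```python
def build_stm_matrix(peptides, proteomes):
    """Return {peptide: {organism: 0 or 1}}.

    A cell is 1 when the peptide occurs in at least one protein of that
    organism (plain substring match, an assumption made for this sketch).
    """
    return {
        pep: {
            org: int(any(pep in protein for protein in proteins))
            for org, proteins in proteomes.items()
        }
        for pep in peptides
    }


def matches_per_organism(matrix):
    """Column sums: number of sample peptides assigned to each organism."""
    totals = {}
    for row in matrix.values():
        for org, flag in row.items():
            totals[org] = totals.get(org, 0) + flag
    return totals


# Toy data: hypothetical sequences and organism labels.
proteomes = {
    "organism_A": ["MKTAYIAKQRLLDE", "GGSVVNK"],
    "organism_B": ["MKTAYIAKQR", "PPLLTEK"],
}
peptides = ["TAYIAK", "GGSVVNK", "PPLLTEK", "AAAA"]

matrix = build_stm_matrix(peptides, proteomes)
for pep, row in matrix.items():
    print(pep, row)
print(matches_per_organism(matrix))
```

Each row is one sequenced peptide and each column one database organism, so the rows that contain a single 1 are the ones that can discriminate between organisms.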

Bacterial strains growth and culture conditions
The pathogenic strains employed in the present study were E. coli O157:H7 and Y. pestis Colorado 92 (CO92). The non-pathogenic strains employed were E. coli K-12 and Y. pestis A1122. Working cultures were prepared by streaking cells from cryopreserved stocks onto tryptic soy agar (TSA), followed by incubation for approximately 18 hours at 37 °C for E. coli and 30 °C for Y. pestis strains. After incubation, all working culture plates were stored at 4 °C. Cells from working cultures were used to inoculate broth cultures for each strain, which consisted of 100 mL of trypticase soy broth (TSB) for E. coli strains and 100 mL of brain heart infusion (BHI) for Y. pestis strains. Cultures were incubated for approximately 18 hours at 37 °C for E. coli strains and 30 °C for Y. pestis strains with rotary aeration at 180 rpm. After incubation, broth cultures were pelleted by centrifugation (2,300 RCF at 4 °C for 10 min), washed, and resuspended in 10 mL of HEPES buffer, followed by heating at 95 °C for 1 hour to lyse the cells. After heating, a portion of each sample was plated onto TSA and incubated for five days at the appropriate temperature to confirm the absence of growth before samples were removed from the BSL-2 or BSL-3 laboratory. As a safety measure, total cellular protein samples (whole cell protein extracts) were likewise heated for one hour, and the absence of growth was confirmed on agar plates.

Isolation of the Outer Membrane Proteins (OMPs)
After lysis of the whole cells by heating at 95 °C for one hour, the cell debris was pelleted by centrifugation at 2,300 RCF at 4 °C for 10 min. The supernatant was then centrifuged at 100,000 x g for one hour to pellet the proteins. The pellet was resuspended in 1 mL of HEPES buffer, 1 mL of a 2% Sarkosyl solution (N-lauroylsarcosine sodium salt solution) was added, and the sample was incubated at room temperature for 30 min. Samples were centrifuged at 100,000 x g for one hour, and the pellet containing the OMPs was resuspended in 1 mL of HEPES buffer.

Processing of whole cell lysates and OMPs samples
All protein samples were ultrasonicated (20 s pulse on, 5 s pulse off, 25% amplitude, 5 min total duration), and a small portion of each lysate was reserved for 1-D gel analysis. The lysates were centrifuged at 14,100 x g for 30 min to remove any debris. The supernatant was then added to a Microcon YM-3 filter unit (Millipore, Catalogue # 42404) and centrifuged at 14,100 x g for 30 min. The effluent was discarded. The filter membrane was washed with 100 mM ammonium bicarbonate (ABC) and centrifuged for 20 min at 14,100 x g. Proteins were denatured by adding 8 M urea and 3 g/L dithiothreitol (DTT) to the filter and incubating overnight at 37 °C on an orbital shaker at 60 rpm. Twenty microliters of 100% acetonitrile (ACN) was added to the tubes and incubated at room temperature for 5 min. The tubes were then centrifuged at 14,100 x g for 40 min and washed three times using 150 µL of 100 mM ABC solution. On the last wash, the ABC was left on the membrane for 20 min with shaking, followed by centrifugation at 14,100 x g for 40 min. The Microcon filter unit was then transferred to a new receptor tube, and the proteins were digested with 5 µL of trypsin in 240 µL of ABC solution + 5 µL of ACN. Proteins were digested overnight at 37 °C on an orbital shaker set to 55 rpm. Sixty microliters of 5% ACN/0.5% formic acid (FA) was added to each filter to quench the trypsin digestion, followed by two minutes of vortexing to mix the sample. The tubes were centrifuged for 30 min at 14,100 x g. An additional 60 µL of the 5% ACN/0.5% FA mixture was added to the filter and centrifuged. The effluent was then analyzed using LC-ESI-tandem MS.

LC-tandem MS analysis of peptides
The tryptic peptides were separated on a capillary Hypersil C18 column (300 Å, 5 µm, 0.1 mm i.d. × 100 mm) using a Surveyor LC system from ThermoFisher (San Jose, CA 95101). Elution was performed using a linear gradient from 98% A (0.1% FA in water) and 2% B (0.1% FA in ACN) to 60% B over 60 min at a flow rate of 200 µL/min, followed by 20 min of isocratic elution. The resolved peptides were electrosprayed into a linear ion trap mass spectrometer (LTQ, Thermo Scientific, San Jose, CA 95101) at a flow rate of 0.8 µL/min. Product ion mass spectra were obtained in data-dependent acquisition mode, which consisted of a survey scan over the m/z range of 400-2000 followed by seven scans on the most intense precursor ions, activated for 30 ms at an excitation energy level of 35%. Dynamic exclusion was activated for 3 min after the first MS/MS spectrum acquisition for a given ion. Uninterpreted product ion mass spectra were searched against a microbial database with TurboSEQUEST (Bioworks 3.1, Thermo Scientific, San Jose, CA 95101), followed by application of an in-house proteomic algorithm for bacterial identification of the replicate analyses.
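The data-dependent acquisition cycle described above (one survey scan over m/z 400-2000, MS/MS on the seven most intense precursors, and a 3 min dynamic exclusion) can be sketched in a few lines of Python. This is a simplified illustration, not the instrument control logic: the function name `select_precursors`, the 0.5 m/z exclusion tolerance, and the toy peak list are assumptions introduced for the example.

```python
def select_precursors(survey_scan, now_min, excluded, top_n=7,
                      exclusion_min=3.0, mz_range=(400.0, 2000.0), mz_tol=0.5):
    """Pick precursors for MS/MS from one survey scan.

    survey_scan: list of (mz, intensity) tuples.
    excluded:    dict mapping an excluded m/z to the time (min) it was selected;
                 it is updated in place.
    mz_tol is an assumed exclusion window; the text does not specify one.
    """
    # Release precursors whose exclusion period has elapsed.
    for mz in [m for m, t in excluded.items() if now_min - t >= exclusion_min]:
        del excluded[mz]

    # Keep peaks inside the survey range that are not currently excluded.
    candidates = [
        (mz, intensity)
        for mz, intensity in survey_scan
        if mz_range[0] <= mz <= mz_range[1]
        and not any(abs(mz - ex) <= mz_tol for ex in excluded)
    ]

    # Most intense first, then take the top N.
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    chosen = [mz for mz, _ in candidates[:top_n]]

    # Start the exclusion clock for everything just selected.
    for mz in chosen:
        excluded[mz] = now_min
    return chosen


# Toy survey scan: nine in-range peaks plus one intense peak below m/z 400.
scan = [(350.0, 10000.0)] + [(410.0 + 10 * i, 900.0 - 100 * i) for i in range(9)]
exclusion_list = {}
for t in (0.0, 1.0, 3.0):
    print(t, select_precursors(scan, t, exclusion_list))
```

In the toy run, the two weakest peaks are only fragmented in the second cycle, once the seven strongest have been excluded, which is the effect dynamic exclusion is meant to have on low-abundance peptides.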

Protein database and database search engine
A protein database was constructed in FASTA format using the annotated bacterial proteome sequences derived from the fully sequenced chromosomes of 1433 bacteria, including their sequenced plasmids (as of May 2011). A PERL program (http://www.activestate.com/Products/ActivePerl) was written to automatically download these sequences from the National Institutes of Health National Center for Biotechnology Information (NCBI) site (http://www.ncbi.nlm.nih.gov). Each database protein sequence was supplemented with information about the source organism and the genomic position of the respective open reading frame (ORF), embedded in a header line. The database of bacterial proteomes was constructed by translating putative protein-coding genes and consists of tens of millions of amino acid sequences of potential tryptic peptides obtained by the in silico digestion of all proteins (assuming up to two missed cleavages). The experimental product ion mass spectra of bacterial peptides were searched with the SEQUEST algorithm (Warscheid, 2003) against this constructed proteome database of microorganisms. The SEQUEST scoring parameters considered when searching the product ion mass spectra of peptides were Xcorr, deltaCn, Sp, RSp, and deltaMpep. The search results were filtered using Xcorr thresholds of 1.90, 2.20, and 3.75 for peptide ions of +1, +2, and +3 charge, respectively (Ma, 2009; Wu, 2003). These parameters provided a uniform matching score for all candidate peptides. The generated outfiles of these candidate peptides were then validated using the PeptideProphet algorithm (Keller et al., 2002). Peptide sequences with a probability score of 95% or higher were retained in the dataset and used to generate a binary matrix of sequence-to-bacterium (STB) assignments. The binary matrix was populated by matching the peptides with their corresponding proteins in the database: a score of one was assigned for a match and a score of zero for a non-match. Each column in the binary matrix represents the proteome of a given bacterium, and each row represents a tryptic peptide sequence from the LC product ion mass spectral analyses. A sample microorganism was matched with a database bacterium according to the number of unique peptides that remained after degenerate peptides were filtered from the binary matrix. The classification and identification of candidate microorganisms were verified through hierarchical clustering analysis and taxonomic classification (Jabbour et al., 2010). The SEQUEST-processed product ion mass spectra of the peptide ions were compared to an NCBI protein database with the BACid software developed in-house (Dworzanski et al., 2006). BACid provided a taxonomically meaningful and easy-to-interpret output. It calculated the probabilities that a peptide sequence assignment to a product ion mass spectrum was correct and used accepted spectrum-to-sequence matches to generate an STB binary matrix of assignments. Validated peptide sequences, either present or absent in various strains (STB matrices), were visualized as assignment bitmaps and analyzed by a BACid module that used phylogenetic relationships among bacterial species as part of a decision tree process. The bacterial classification and identification algorithm assigned organisms to taxonomic groups (phylogenetic classification) based on an organized scheme that begins at the phylum level and proceeds through class, order, family, genus, and species to the strain level. BACid was developed in-house using PERL, MATLAB, and Microsoft Visual Basic.
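Two steps of the database workflow above lend themselves to a short sketch: the in silico tryptic digestion with up to two missed cleavages, and the charge-dependent Xcorr filter combined with the 95% probability cut-off. The Python below is a minimal illustration and not the authors' PERL code. The cleavage rule (after K or R, but not before P) is the commonly used trypsin rule and is an assumption here, as are the function names, the use of "greater than or equal to" for the thresholds, and the rejection of charge states outside +1 to +3.

```python
def cleave(protein):
    """Split a protein after K or R, except when the next residue is P
    (assumed trypsin rule; the text does not state one)."""
    fragments, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])
    return fragments


def tryptic_peptides(protein, max_missed=2):
    """All peptides obtained by joining up to max_missed + 1 adjacent fragments."""
    fragments = cleave(protein)
    peptides = set()
    for i in range(len(fragments)):
        for j in range(i, min(i + max_missed + 1, len(fragments))):
            peptides.add("".join(fragments[i:j + 1]))
    return peptides


# Xcorr thresholds per precursor charge state, as given in the text.
XCORR_MIN = {1: 1.90, 2: 2.20, 3: 3.75}


def passes_filters(charge, xcorr, probability, min_probability=0.95):
    """Keep a peptide-spectrum match only if it clears both cut-offs."""
    threshold = XCORR_MIN.get(charge)
    if threshold is None:
        return False  # charge states outside +1..+3 are not covered by the text
    return xcorr >= threshold and probability >= min_probability


# Hypothetical sequence and scores, for illustration only.
print(sorted(tryptic_peptides("AKRPGKCDE")))
print(passes_filters(charge=2, xcorr=2.5, probability=0.97))
```

Allowing missed cleavages multiplies the number of candidate peptides per protein, which is consistent with a database of 1433 proteomes expanding to tens of millions of peptide sequences.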

Comparative proteomic differentiation between the whole cell and the OMP extracts for the E. coli O157:H7 strain
The whole cell protein extracts of E. coli strain O157:H7 were prepared and analyzed by LC-ESI-MS/MS. The bioinformatics analyses involved nearest-neighbor analysis, using the Euclidean single linkage approach, to arrive at a set of proteins for species and strain matching to the database. Figure 1 shows the identification and classification of the experimental sample, a whole cell extract, as the E. coli O157:H7 strain. However, this identification is equally shared with E. coli UTI89, which is a causative agent of human urinary tract infections. Although E. coli UTI89 is related to E. coli O157:H7, it is missing certain proteins, such as the OMP HU2 outer membrane and flagella-related proteins, that are distinctly expressed in E. coli O157:H7 (vide infra). A comparative list of the strain-unique proteins and the total number of identified proteins for these E. coli O157:H7 extracts is shown in Table 1. Bioinformatics analysis of the peptide product ion mass spectra yielded five and eight unique proteins from the E. coli O157:H7 whole cell and OMP extracts, respectively. Figure 2 shows the nearest-neighbor similarity linkage results for the OMP extract of E. coli O157:H7. This dendrogram shows an unambiguous strain-level differentiation of E. coli O157:H7 from the other E. coli strains. It is worth mentioning that the next nearest neighbor, E. coli UTI89, is relatively distant at approximately 2.2 linkage units, unlike in the result from the whole cell protein extract (Figure 1). This result indicates that OMP extracts can potentially serve as a source of strain-unique biomarkers for bacterial strain differentiation. Moreover, a closer look at the bioinformatics data showed that the whole cell preparation yielded considerably more identified proteins (162) than the OMP extract (89). However, the number of unique proteins identified from the OMP extract (eight proteins) was greater than that from the whole cell protein extract (five proteins) (Table 1). These numbers of unique proteins are very similar to those reported for the whole cell protein extracts of the E. coli strains investigated by Zheng et al. (2005), who found five unique proteins in the E. coli O157:H7 strain. However, this does not imply that the additional OMPs are absent from the whole cell extract. Rather, a higher abundance of non-OMPs, i.e., the remaining proteins in the cell, may have suppressed the detection of the OMPs in the whole cell protein extracts by tandem MS. Mass spectral analysis can suffer from ionization suppression when large numbers of ionizable species are present. Generally, a whole cell extract contains a significantly larger number of ionizable peptides, with a greater abundance of non-outer membrane tryptic peptides, than an OMP extract. Therefore, whole cell protein extract analysis likely experiences a degree of ionization suppression during mass spectral analysis.
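The difference between an ambiguous and an unambiguous identification can be illustrated with a small nearest-neighbor calculation on binary assignment vectors. The Python sketch below only computes Euclidean distances from the sample to each database entry and reports ties; it does not reproduce the full single-linkage dendrograms of Figures 1 and 2. The strain labels and bit patterns are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbors(sample, profiles):
    """Return (names of the closest profiles, all distances).

    More than one name is returned when several database entries are
    equally close, i.e. when the identification is ambiguous.
    """
    distances = {name: euclidean(sample, vec) for name, vec in profiles.items()}
    best = min(distances.values())
    nearest = sorted(n for n, d in distances.items() if math.isclose(d, best))
    return nearest, distances


# Hypothetical binary assignment vectors (one position per peptide).
sample = [1, 1, 1, 1, 1]
profiles = {
    "strain_A": [1, 1, 1, 1, 0],
    "strain_B": [1, 1, 1, 0, 1],
    "strain_C": [1, 0, 0, 1, 0],
}

nearest, distances = nearest_neighbors(sample, profiles)
print("nearest:", nearest)
for name, d in sorted(distances.items()):
    print(f"{name}: {d:.3f}")
```

Here two hypothetical strains are equally distant from the sample, which mirrors the shared O157:H7/UTI89 match seen with the whole cell extract; a peptide set in which only one strain matches every position would resolve the tie.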

Comparative proteomic differentiation between the whole cell and the OMP extracts for the E. coli K-12 strain
The results of the strain-level differentiation of the whole cell and OMP extracts of E. coli K-12 are shown in Figures 3 and 4, respectively. The results indicate that both extracts provided a sufficient number of identified proteins to correctly identify the E. coli K-12 strain. Figure 3 shows that, for the whole cell protein extract, the sample was equally similar to the E. coli K-12 and W3110 strains. This is in agreement with the literature, which reports that E. coli W3110 is actually a substrain of K-12 (Baglioni et al., 2003; Yamada et al., 1993). It is worth mentioning that, for the whole cell extract (Figure 3), the sample/K-12/W3110 group of E. coli strains lies approximately 0.03 linkage units from the next nearest-neighbor group, which includes the E. coli 536/UTI89/CFT073/O157:H7 strains. Hence, the whole cell protein extract was able to delineate the sample containing E. coli K-12 from the E. coli O157:H7 strain. Figure 4 shows the nearest-neighbor Euclidean similarity linkage analysis for the OMP extracts of the E. coli K-12 sample. This dendrogram shows that the OMP extract enhanced the strain differentiation compared with the whole cell extract. Although the sample matched the non-pathogenic W3110 strain, the two labels signify the same organism (vide supra). No ambiguity was observed in the strain differentiation. Moreover, the linkage distance between the sample/K-12/W3110 and the 536/UTI89/CFT073/O157:H7 groups of E. coli strains is larger for the OMP extract (0.10) than for the whole cell extract (Figure 3).
Table 2 presents a list of the unique proteins. The total numbers of identified proteins found in the proteomics analysis of the K-12 strain were 194 and 112 for the whole cell protein and OMP extracts, respectively. The number of strain-unique proteins identified by the bioinformatics algorithm was greater in the OMP extracts (ten proteins) than in the whole cell extracts (eight proteins). These numbers of unique proteins from the K-12 extracts are very similar to those reported for the whole cell protein extracts of the E. coli strains investigated by Zheng et al. (2005). That work found seven unique proteins in the non-pathogenic E. coli 88-0447 (O136STa).
Overall, the comparative proteomic analyses of the E. coli whole cell extracts showed that 162 proteins were identified for the E. coli O157:H7 strain versus 194 for the E. coli K-12 strain (Tables 1 and 2). Upon removing the highly conserved, house-keeping, degenerate, and energy transfer proteins from both strains, the number of strain-unique proteins was eight for E. coli K-12 and five for E. coli O157:H7. For the OMP extracts, the total number of experimentally determined proteins also differed between the two E. coli strains: the O157:H7 strain had 89 total identified proteins compared with 112 for the K-12 strain. Upon removing the highly conserved, house-keeping, and energy transfer proteins from both strains, the number of strain-unique proteins in the OMP extracts was eight for E. coli O157:H7 and ten for E. coli K-12 (Tables 1 and 2).
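As a rough way to read the counts above side by side, the share of identified proteins that are strain-unique can be computed for each extract. This ratio is introduced here only as an illustration and is not a metric used in the original analysis; the counts are those quoted in the text, and the function name `unique_fraction` is hypothetical.

```python
def unique_fraction(unique, total):
    """Fraction of identified proteins that are strain-unique."""
    return unique / total


# (strain, extract) -> (strain-unique proteins, total identified proteins),
# taken from the counts quoted in the text.
extracts = {
    ("O157:H7", "whole cell"): (5, 162),
    ("O157:H7", "OMP"): (8, 89),
    ("K-12", "whole cell"): (8, 194),
    ("K-12", "OMP"): (10, 112),
}

for (strain, extract), (unique, total) in extracts.items():
    percent = 100 * unique_fraction(unique, total)
    print(f"{strain:8s} {extract:10s} {unique:2d}/{total:3d} = {percent:.1f}%")
```

With these counts, the OMP extracts come out near 9% strain-unique proteins for both strains, against roughly 3% to 4% for the whole cell extracts, which is in line with the observation that the OMP fraction is the richer source of strain-level markers.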

Comparative proteomic differentiation between the whole cell and the OMP extracts for the Yersinia pestis CO92 strain
The LC-tandem MS and bioinformatics results for the proteins present in the whole cell and OMP extracts of Y. pestis CO92 were compared. Figure 5 shows the identification results for the whole cell protein extract of Y. pestis CO92. The dendrogram indicates an ambiguous strain-level differentiation between the experimental sample and the database Y. pestis CO92 entry. The bioinformatics analysis of the whole cell extract of Y. pestis CO92 matched five Yersinia strain entries in the database. The CO92 experimental strain was matched to the only avirulent Y. pestis strain (91001) in the database as well as to the virulent Antiqua, CO92, Nepal 516, and IP32953 Y. pestis strains. However, the Y. pestis KIM strain resided two linkage units from the sample and the remaining five Y. pestis strains in the nearest-neighbor similarity linkage analysis. The set of unique proteins for the whole cell protein extract of Y. pestis CO92 shows only four biomarkers associated with its reported virulence factors (Table 3). For example, products of the Y. pestis virulence plasmids were found in the mass spectral analyses and are listed in Table 3: the plasminogen activator protease precursor encoded by pPCP1, the low-calcium response protein encoded by pCD1, and the toxin protein and the fraction 1 capsule protein (chaperonin protein) encoded by pMT1. The chaperonin protein was present in higher abundance than the other protein biomarkers. The unique set of proteins had the closest match with Y. pestis strains compared to other similar bacteria in the database, as seen in both dendrograms in Figures 5-6.
From analyses of both protein extracts, a comparison of the number of total, experimentallydetermined number of proteins showed a difference between the two protein methods as applied to the Y. pestis sample.The whole cell protein and OMP approaches had 182 and 136, respectively, total identified proteins (Table 3).Upon removing the highly conserved, house-keeping, and energy transfer proteins from both strains, the number of strain-unique proteins (Table 3) for the whole cell protein and OMP approaches was four and thirteen, respectively.Even with a significant amount of unique proteins, the OMP differentiation capability did not provide a significant benefit (1.4 linkage units) with respect to the four proteins from the whole cell approach (1 linkage unit) as detailed in the dendograms in Figures 5-6.
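The idea of ranking database strains by their distance to the sample, and of reading off the gap to the next nearest neighbor, can be sketched as follows. This is a minimal illustration under stated assumptions: the binary presence/absence profiles, the strain labels, and the helper names (`euclidean`, `nearest_neighbors`) are hypothetical, and the sketch does not reproduce the linkage analysis or the distance scale used to build the dendrograms in Figures 5-6.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(sample, database, k):
    """Return the k database entries closest to the sample as (name, distance)."""
    ranked = sorted((euclidean(sample, vec), name) for name, vec in database.items())
    return [(name, dist) for dist, name in ranked[:k]]


# Hypothetical presence (1) / absence (0) profiles over five unique proteins.
database = {
    "strain_A": [1, 1, 0, 1, 0],
    "strain_B": [1, 0, 0, 1, 0],
    "strain_C": [0, 0, 1, 0, 1],
}
sample = [1, 1, 0, 1, 0]

hits = nearest_neighbors(sample, database, k=2)
# Gap between the best match and the next nearest neighbor.
margin = hits[1][1] - hits[0][1]

print(hits)
print(margin)
```

A margin of zero would correspond to the ambiguous case described for the whole cell extract, where several database strains are equally close to the sample; a larger margin corresponds to a cleaner strain-level call.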

Comparative proteomic differentiation between the whole cell and the OMP extracts for the Y. pestis A1122 strain
A comparison of the LC-tandem MS and bioinformatics results of the proteins present in the whole cell and OMP extracts of the avirulent Y. pestis A1122 was performed. Figure 7 shows the nearest-neighbor similarity linkage analysis of the whole cell extract of the avirulent Y. pestis A1122 strain. For each extraction method, a unique set of proteins had the closest match with Y. pestis strains compared to other similar Gram-negative bacteria in the database. In Figure 7, the dendrogram shows the similarity linkage for the whole cell protein extract from Y. pestis A1122, in which the sample was matched to the pathogenic KIM, CO92 and Nepal 516 strains. The equidistant next nearest neighbors to this group are the 91001 and Antiqua strains. The linkage distance between these two groups of Y. pestis strains is minimal. On the basis of these results, the unique set of proteins (Table 4) from the experimental Y. pestis A1122 sample produced the closest similarity index to the CO92 and Nepal 516 virulent strains from whole cell protein extract preparations. A similar situation was observed with whole cell protein extracts between the CO92 sample and the 91001/Antiqua/CO92/Nepal 516/IP32953 strains (Figure 5). As shown in Table 4, three strain-unique proteins were identified out of a total of 164 proteins from the analysis of the A1122 strain.
On the other hand, the OMP analysis in Figure 8 shows that the sample was identified at the strain level as Y. pestis 91001. This finding is encouraging given that Y. pestis 91001 is the only avirulent strain in the proteome database, which also includes several pathogenic Y. pestis strains. Because the avirulent Y. pestis A1122 strain has not been sequenced, or its sequence is not publicly available, its absence from the database provided an indirect test of the robustness of the proteomics approach in classifying a non-database bacterium against the database entries. It is worth mentioning that the constructed proteome database consists of more than 1400 fully sequenced bacteria whose genomes had been translated into their corresponding protein sequences. All the samples studied were compared to all the proteomes in the constructed database, and the top 20 closest near-neighbors were selected for further comparative proteomics analyses. This also provides confidence for identification at the species level (Figure 8). However, an equal similarity index is also shared with the Nepal 516 strain. The Antiqua strain is a very close nearest neighbor to the 91001 and Nepal 516 cluster of strains. The CO92 strain is relatively more removed from the 91001/Nepal 516 and Antiqua strains. On the basis of these results, the unique set of proteins for the experimental Y. pestis A1122 sample produced the same similarity index for the database Y. pestis 91001 and Nepal 516 strains from the OMP extract preparation (Table 4). Figure 8 shows that there is a very small linkage distance between the groups of Y. pestis strains. Thus, the OMP analysis produces very similar classification results (very small linkage distances) for the six Y. pestis strains in the genome database. Table 4 lists the six unique proteins, from a total of 94 proteins, for the Y. pestis 91001 strain found in the OMP extract of the experimental A1122 strain.
From analyses of the whole cell protein extracts, the total number of identified proteins was 182 (Table 3) and 164 (Table 4) for Y. pestis CO92 and Y. pestis A1122, respectively. Upon removing the highly conserved, housekeeping, and energy transfer proteins from both strains, the number of strain-unique proteins was four for Y. pestis CO92 and three for Y. pestis A1122. From analyses of the OMP extracts, a comparison of the total number of experimentally determined proteins also showed a difference between Y. pestis CO92 and Y. pestis A1122. The CO92 strain had 136 total identified proteins compared to 94 for the A1122 strain. Upon removing the highly conserved, housekeeping, and energy transfer proteins from both strains, the number of strain-unique proteins was 13 for Y. pestis CO92 and 6 for Y. pestis A1122.
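The protein counts reported in this section can be collected in one place to compare the share of strain-unique proteins per extract. The short script below is a bookkeeping sketch only: the totals and unique counts are transcribed from the text above, while the dictionary layout and the `unique_fraction` helper are illustrative choices, not part of the original analysis.

```python
# (total identified proteins, strain-unique proteins), transcribed from the text.
counts = {
    ("E. coli O157:H7", "whole cell"): (162, 5),
    ("E. coli O157:H7", "OMP"): (89, 8),
    ("E. coli K-12", "whole cell"): (194, 8),
    ("E. coli K-12", "OMP"): (112, 10),
    ("Y. pestis CO92", "whole cell"): (182, 4),
    ("Y. pestis CO92", "OMP"): (136, 13),
    ("Y. pestis A1122", "whole cell"): (164, 3),
    ("Y. pestis A1122", "OMP"): (94, 6),
}


def unique_fraction(total, unique):
    """Percentage of identified proteins that are strain-unique."""
    return 100.0 * unique / total


fractions = {key: unique_fraction(*value) for key, value in counts.items()}

for (strain, extract), pct in fractions.items():
    total, unique = counts[(strain, extract)]
    print(f"{strain:16s} {extract:10s} {unique:2d}/{total:3d} = {pct:4.1f}%")
```

With these numbers, the strain-unique share is higher in the OMP extract than in the whole cell extract for all four samples, although, as noted for Y. pestis CO92, a larger unique set does not by itself guarantee a proportionally larger linkage distance.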

Conclusion
Comparative proteomics of tandem mass spectrometry data showed that the OMP extracts provided equal or better discrimination than the whole cell extracts with respect to the distance or similarity linkage with the next nearest neighbor(s). Also, the OMP extracts of all studied strains showed a correct database bacterial match, with linkage similarity improved over the whole cell extract. The improved strain-level differentiation using the OMP extract could be due to ionization suppression in the whole cell extract, which could mask the detection of important peptides that would otherwise be classified as unique biomarkers. However, whole cell lysates can be an appropriate option for the differentiation of Gram-positive bacterial strains, and the results reported herein support their potential application in bacterial species and potential strain differentiation. Also, inclusion of more relevant bacteria such as Francisella tularensis, Burkholderia, and other Gram-negative genera and species may provide a more comprehensive outlook on the importance of OMPs in comparison to the whole cell extract. These additions may also provide decision information as to the relative merit of applying OMP vs. whole cell protein extraction techniques in the analysis of an experimental bacterial sample for classification and diagnostic purposes. Overall, tandem MS-based proteomics and bioinformatics were shown to have utility in the comparative proteomics study for the differentiation of Gram-negative bacterial strains. Different numbers of distinguishing, unique proteins were obtained by the bioinformatics procedure from the whole cell and OMP extracts. This resulted in different degrees of separation between the correctly determined database organism and the next nearest neighbor organism(s). Moreover, this approach relies on taxonomic correlation within the constructed proteome database, and thus it is possible to infer an identification for a sample organism that is not present in the database. This capability is supported by the fact that prokaryotic organisms are arranged in hierarchical order, so that the number of proteins they share increases as one moves from the phylum level down to the strain level, and decreases in the opposite direction. Such properties will allow the utilization of this

Fig. 1. Euclidean linkage similarity dendrogram of the nearest-neighbor classification of the whole cell extract of E. coli O157:H7.

Fig. 3. Euclidean linkage similarity dendrogram of the nearest-neighbor classification of the whole cell extract of the E. coli K-12 strain.

Fig. 5. Euclidean linkage similarity dendrogram of the nearest-neighbor classification of the whole cell extract of the Yersinia pestis CO92 strain.

Figure 6 shows the identification results for the OMP extracts of the Y. pestis CO92 sample. The dendrogram indicates an unambiguous, and correct, strain-level identification with the Y. pestis CO92 strain in the proteome database. The experimental sample and the database Y. pestis CO92 entry are one linkage distance unit from the next nearest neighbor group consisting of the 91001/Antiqua/Nepal 516 strains. The set of unique proteins for virulent Y. pestis CO92 includes known biomarkers associated with virulence factors (Table 3).

Fig. 7. Euclidean linkage similarity dendrogram of the nearest-neighbor classification of the whole cell extract of the Yersinia pestis A1122 strain.

Table 1. Identified unique protein lists detected in the whole cell and OMP extracts of E. coli O157:H7.

Table 2. Identified unique protein lists detected in the whole cell and OMP extracts of E.

Table 3. Identified unique protein lists detected in the whole cell and OMP extracts of the Y. pestis CO92 strain.

Table 4. Identified unique protein lists detected in the whole cell and OMP extracts of the Y. pestis A1122 strain.