Proteases used for proteolytic digestion of protein extracts retrieved from biological material such as tissue, body fluids or cell extracts. Table 1 presents the enzyme class, pH and temperature optimum, inorganic ion cofactor and cleavage specificity of each protease. In addition, a representative application and literature source are given.
Sample preparation is a key step in proteomics; however, there is no consensus in the community about a standard method for preparing proteins from clinical samples such as tissues or biofluids. In this chapter, we discuss some important steps in sample preparation for bottom-up proteome profiling with mass spectrometry (MS). Specifically, tissues, which are an important source of biological information, are of interest because of their availability. Tissues are most often stored as fresh frozen (FF) or formalin-fixed paraffin-embedded (FFPE). While FF tissues are more readily available, paraffin embedding has historically been routinely used for tissue preservation. However, formaldehyde-induced crosslinks formed during FFPE tissue preservation present a challenge to the protocols used for protein retrieval. Moreover, in our view, an important aspect to consider is the amount of material available at the start of a protocol, since this directly determines the choice of protocol needed to minimize sample loss and maximize detection of peptides by MS. This “MS sensitivity” is of special importance when working with patient samples, which are unique and often available only in limited amounts. Optimizing the methods used to analyze the proteins in such samples is important because their molecular information can be used in a patient’s diagnosis and treatment.
- sample preparation
- mass spectrometry
Proteomics is an important tool in the study of human biological material, with the aim of extracting knowledge that can improve a patient’s treatment outcomes. Molecular information obtained from patient samples can complement pathological observations, with the goal of faster and more accurate diagnosis and subsequent treatment. Molecular analysis of tissue by proteomics can lead to disease classification and reveal underlying disease pathways that can further serve as targets for medical treatment.
Sample size and origin are important aspects of sample preparation. Today, numerous sample preparation procedures exist that aim to improve sensitivity of detection or protein recovery from a sample. Release of proteins from native or artificial material is a crucial step in sample preparation, and the additives used to improve protein recovery, such as detergents, chaotropes, buffers and salts, must be considered. Moreover, special groups of proteins (e.g. membrane proteins), which are involved in key cellular functions and may be targets of pharmaceutical treatment, are often challenging to isolate and analyze. Their amphipathic nature may require the use of appropriate enrichment procedures to achieve better detection.
Further, sample loss during most standard preparation procedures is inevitable, and it is even more pronounced when minute amounts of material are being processed. To minimize sample loss and thus increase the sensitivity of the analysis at the MS step, several technologies have recently been developed. Specifically, technologies that allow detection of proteins down to the level of a single cell have become available. Some of these technologies, such as nanoPOTS and microPOTS, have already been applied to human tissues. These new possibilities to analyse small regions of tissue samples with sufficient sensitivity are opening the door to many applications, such as profiling of selected regions of a tumorous zone or detection of proteins from cellular subpopulations. These new applications, aimed at working with 1 to 100s or 1000s of cells, will likely have increasing importance in clinics, but only if they can be developed into routine and robust methods.
2. Tissue preservation
Human tissue samples are a valuable source of information for diagnostics; therefore, a lot of effort has gone into developing preservation methods that minimize the changes that can occur over time in storage. For example, following clinical surgery, tissues need to be stored according to protocols that minimize chemical, enzymatic, mechanical or thermal degradation and protect their molecular content. Today, tissues are most often preserved as fresh frozen (FF) or formalin-fixed paraffin-embedded (FFPE) tissues.
2.1 Fresh frozen tissues
FF tissues are usually obtained by snap freezing to a temperature below −70°C, most often in dry ice or liquid nitrogen (Figure 1B). To minimize variability in sample storage, and thus to minimize potential effects on the molecular structure and integrity of the tissue, the European Human Frozen Tumour tissue bank (TuBa-Frost) standardized tissue preservation by freezing in 2006 [1, 2]. An important aspect of preserving tissues by the FF method is preventing the formation of artefacts that might change the tissue structure and morphology. For example, ice crystals that can disrupt structures within the tissue may form during the freezing procedure due to moisture present within the tissue. An alternative to snap freezing is embedding in optimal cutting temperature (OCT) compound, which contains polyvinyl alcohol, polyethylene glycol (PEG) and benzalkonium chloride. The OCT compound preserves tissue and enables optimal microdissection of the tissue. However, where samples will later be analyzed by mass spectrometry (MS), OCT compound must be removed prior to analysis. This is usually achieved by washing the tissue with a special grade of alcohol or Carnoy’s fluid, or with the use of other protocols for sample purification.
2.2 Formalin-fixed paraffin-embedded tissues
An alternative to preservation of tissue by the FF process is the use of FFPE methods (Figure 1A), which are routinely used by pathologists around the globe to preserve tissue by embedding in paraffin. The FFPE process preserves tissues by chemical fixation, most often in 10% formalin, followed by embedding in paraffin to form a tissue block for subsequent slicing. The combination of formalin fixation with paraffin embedding allows for long-term storage of tissues. FFPE tissues are also often used for histopathological studies, a routine process in the examination of a patient’s biopsies and clinical material. However, formalin is known to chemically modify proteins in the fixed tissues, causing cross-linking between proteins and modifications, most often methylation (+14 Da) and, to a lesser extent, formation of methylene and methylol adducts. As a consequence of these formaldehyde-induced modifications, the molecular weight or physicochemical properties of fixed proteins can be altered.
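A minimal Python sketch of how such mass shifts might be taken into account when comparing an observed peptide mass with its theoretical value is given here. The shift values are monoisotopic masses computed from the elemental composition of each adduct (an assumption of this sketch, since the text quotes only the nominal +14 Da), and the function name, dictionary keys and tolerance are hypothetical, chosen for illustration only.

```python
# Monoisotopic mass shifts (Da) for the formaldehyde-induced modifications
# named in the text. Values are derived from elemental compositions; the
# labels and the default tolerance are illustrative assumptions.
FORMALDEHYDE_SHIFTS = {
    "methylene (+C)": 12.00000,
    "methylation (+CH2)": 14.01565,
    "methylol (+CH2O)": 30.01056,
}


def annotate_delta_mass(observed, theoretical, tol=0.01):
    """Return the modifications whose shift matches observed - theoretical.

    `tol` is an absolute tolerance in Da.
    """
    delta = observed - theoretical
    return [
        name
        for name, shift in FORMALDEHYDE_SHIFTS.items()
        if abs(delta - shift) <= tol
    ]
```

In practice, database search engines handle this by declaring such shifts as variable modifications; the sketch only illustrates the underlying comparison.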
3. Preparation of the sample for bottom-up proteomics
Protein extraction and the subsequent preparation for LC–MS analysis represent one of the key steps in proteomics (Figure 2). While numerous protocols have been reported, they have mainly focused on preparation from large amounts (i.e. micrograms to milligrams) of material, which limits their utility in the study of patient clinical samples. Notably, protein extraction from FFPE-preserved tissues requires removal of formaldehyde-formed cross-links, which is usually carried out by heating samples in a buffered solution at an elevated temperature (95°C or 100°C). The most common buffers used for protein extraction are ammonium bicarbonate, tris(hydroxymethyl)aminomethane (Tris), and radioimmunoprecipitation assay (RIPA) buffer. Detergents such as sodium dodecyl sulfate (SDS), sodium deoxycholate (SDC), RapiGest SF surfactant™ (Waters) and PPS Silent Surfactant™ (Expedeon) have been routinely added to the buffer to improve protein solubilization efficiency and thus enhance protein extraction. In addition to optimizing the extraction buffers, many studies have also optimized other parameters, such as the incubation time of the extraction and/or the addition of various proteases, to improve protein coverage during subsequent LC–MS/MS analysis.
Traditional detergents and chaotropes such as SDS and urea have been widely used for protein solubilization; however, they are also well known to inhibit digestion at higher concentrations and are incompatible with the reversed-phase liquid chromatography (RPLC) separation used to introduce samples for MS analysis. Therefore, their concentration must be kept low at the time of proteolysis in order to preserve the effectiveness of proteases used for protein digestion. Failure to do so often leads to incomplete protein solubilization and denaturation. Also, the presence of detergents in the sample might interfere with later instrumental analysis; therefore, different purification methods have been developed for detergent removal to improve the LC–MS outcome. The choice of the most effective procedure depends on the physicochemical properties of the detergent. Some procedures remove detergent on the basis of size exclusion (i.e. molecular weight cut-off filters) or with spin columns containing appropriate detergent-removal resins. Moreover, heating of the sample in urea buffers often leads to covalent modification of proteins via carbamylation, which might affect peptide retention time during RPLC separation and, if not accounted for, will interfere with identification. To circumvent these problems caused by MS-incompatible detergents, significant effort went into the development of reagents that avoid these complications. To this end, acid-labile detergents such as RapiGest SF surfactant™ (Waters) and PPS Silent Surfactant™ (Expedeon) were developed that can be easily removed after proteolysis by simple measures such as decreasing the pH. For example, the MS-compatible surfactant ProteaseMAX™ (Promega) enhances tryptic, chymotryptic and LysC digestion and then degrades during the course of the digestion reaction.
Another compound, Invitrosol™ (Thermo Fisher Scientific), is a homogeneous surfactant that does not impact tryptic digestion and elutes during RPLC in three peaks well separated from where peptides elute.
3.2 Sample digestion
Classical bottom-up proteomic sample preparation aims to turn protein extracts into peptides via a process of protein cleavage, or digestion, with proteases. Notably, proteins extracted from biological material tend to keep their native tertiary structure, which is mostly held together by non-covalent interactions of amino acid side groups. It is thus essential to disrupt the tertiary structure and linearize the protein sequence to ease the access of proteases to cleavage sites. Protein tertiary structure is frequently disrupted by chaotropic and denaturing reagents. Disulfide bonding also contributes to tertiary structure via a covalent bond between cysteine side chain groups, also termed an S-S bridge. Disulfide bonds are most often broken by reducing agents, leaving free sulfhydryl groups that allow the protein to unfold more fully. Dithiothreitol (DTT), tris(2-carboxyethyl)phosphine (TCEP), tris(3-hydroxypropyl)phosphine (THPP) and 2-mercaptoethanol (2-ME) are the most commonly used reducing agents. Sulphur-containing reagents such as 2-ME and DTT break the S-S bridge by thiol-disulfide exchange, while phosphorus-containing reagents form a phosphine oxide as a result of disulfide bond reduction. Reduction is commonly followed by alkylation of the free sulfhydryl groups to prevent disulfide bonds from reforming. In this chemistry, a free sulfhydryl group performs a nucleophilic attack on the alpha carbon of an alkylating reagent, creating a covalent bond between the alkyl group and cysteine. There is a wide palette of alkylating reagents that may be used, but in proteomic sample preparation the most commonly used reagents include iodoacetamide, iodoacetic acid, N-ethylmaleimide (NEM) and S-methyl methanethiosulfonate. Covalent modification of a free sulfhydryl group leaves a mass tag on each cysteine that must be considered as a mass shift to cysteine during interpretation of peptide tandem mass spectra.
Alkylated proteins are then further processed by proteolytic cleavage into shorter segments, peptides, which are then easily detected in a bottom-up experiment carried out by LC–MS/MS analysis. As mentioned above, peptides may be produced by enzymatic methods but also by chemical methods, which can be either specific or unspecific (Table 1). In both cases there are a variety of protocols available to digest proteins into peptides for mass spectrometry-based proteomic analysis.
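To illustrate how the cysteine mass tag enters peptide mass calculations, a minimal Python sketch follows. The residue masses and alkylation shifts are standard monoisotopic values added here for illustration (they are not quoted in this chapter), and the function and dictionary names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini

# Mass added to each cysteine by the alkylating reagents named in the text.
CYS_ALKYLATION_SHIFT = {
    "iodoacetamide": 57.02146,        # carbamidomethyl
    "iodoacetic acid": 58.00548,      # carboxymethyl
    "N-ethylmaleimide": 125.04768,    # NEM adduct
    "MMTS": 45.98772,                 # methylthio, from S-methyl methanethiosulfonate
}


def peptide_mass(sequence, alkylation=None):
    """Monoisotopic mass of a peptide, optionally with all Cys alkylated."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    if alkylation is not None:
        # Assumes complete alkylation: every cysteine carries the tag.
        mass += sequence.count("C") * CYS_ALKYLATION_SHIFT[alkylation]
    return mass
```

The sketch assumes complete alkylation of every cysteine; search engines usually express this as a fixed modification on cysteine, whereas incomplete or off-target alkylation would have to be treated as a variable modification.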
|Protease|Class|pH optimum / ion cofactor|Temperature [°C]|Cleavage specificity|Example application|Reference|
|---|---|---|---|---|---|---|
|Trypsin|Serine|7–8 / Ca2+|37|Arg, Lys (C-term)|Primary central nervous system lymphoma||
|LysC|Serine|8.5|37|Lys (C-term)|Whole liver SDS lysates||
|LysN|Metalloproteinase|7–9 / Zn2+|Thermostable|Lys (N-term)|HEK 293 cells||
|Chymotrypsin|Serine|8 / Ca2+|37|Hydrophobic AAs (C-term)|Cerebrospinal fluid (CSF)||
|Pepsin|Aspartic|1.5–2.5|37|Preferentially Phe, Leu (C-term)|Human liver tissue||
|Thermolysin|Metalloproteinase|5.0–8.5 / Zn2+|65–85|Ala, Met, Ile, Leu, Val, Phe (N-term)|Human liver tissue||
|AspN|Metalloproteinase|6.5–8.0 / Zn2+|40|Asp (N-term)|Brain and liver tissue from C57BL/6 mouse||
|GluC|Serine|4.0, 7.8|37|Glu, Asp (C-term)|Cerebrospinal fluid (CSF), brain and liver tissue from C57BL/6 mouse|[28, 30]|
|ArgC|Cysteine|7.2–8.0 / Ca2+|37|Arg, Lys (C-term)|Cerebrospinal fluid (CSF), brain and liver tissue from C57BL/6 mouse|[28, 30]|
|CNBr|Chemical|n/a|n/a|Met (C-term)|Extracellular matrix of human mammary and liver tissue|[29, 31]|
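The cleavage specificities listed in the table can be expressed as simple rules for in silico digestion, which is how database search tools predict the peptides expected from a protein sequence. The Python sketch below is a simplified illustration under stated assumptions: the regular expressions encode textbook rules only (for trypsin, cleavage after Lys or Arg unless followed by Pro), the function and rule names are hypothetical, and real enzymes show exceptions that are not modelled here.

```python
import re

# Each rule is a zero-width regular expression that matches the position
# at which the peptide bond is cut. Simplified textbook rules only.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",  # C-terminal to Lys/Arg, not before Pro
    "lysc": r"(?<=K)",             # C-terminal to Lys
    "lysn": r"(?=K)",              # N-terminal to Lys
    "aspn": r"(?=D)",              # N-terminal to Asp
    "gluc": r"(?<=[DE])",          # C-terminal to Glu/Asp
    "cnbr": r"(?<=M)",             # C-terminal to Met (chemical cleavage)
}


def digest(sequence, rule, missed_cleavages=0, min_length=1):
    """Return peptides from an in silico digest of a one-letter sequence."""
    sites = [m.start() for m in re.finditer(CLEAVAGE_RULES[rule], sequence)]
    # Ignore cut positions at the very start or end of the sequence.
    bounds = [0] + [s for s in sites if 0 < s < len(sequence)] + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Join up to `missed_cleavages` additional neighbouring fragments.
        last = min(i + 2 + missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptide = sequence[bounds[i]:bounds[j]]
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides
```

For example, under these simplified rules `digest("MAKRPGDEKLM", "trypsin")` yields `MAK`, `RPGDEK` and `LM`; the Arg-Pro bond is left intact. Allowing missed cleavages enlarges the list and is one reason why less specific digestion makes database searches more demanding.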
Bottom-up proteomics frequently relies on proteolytic enzymes that digest a protein at specific sites. Predictable digestion rules for a given protease restrict the set of candidate peptides, which makes the database search faster, computationally less demanding and more accurate. Trypsin is the most common protease in bottom-up proteomics, cleaving peptide bonds at the C-terminus of arginine and lysine when not followed by proline. Notably, maintaining an optimal temperature of 37°C at a pH optimum between 7 and 8 in the presence of Ca2+ ions in the digestion buffer is important for the reaction to proceed efficiently. The enzyme to substrate ratio is also important, and for trypsin it is often from 1:20 to 1:100 (w:w). In some instances LysC endoproteinase, which is isolated from
GluC, ArgC, LysN and AspN are also popular proteases in bottom-up proteomics, as they predictably produce peptides that are complementary or orthogonal to those of trypsin, with different substrate affinities. GluC is a serine protease isolated from
Broad specificity protease digestion is less common in bottom-up sample preparation; nevertheless, it is used to digest rigid protein structures that resist digestion with common proteases. Proteinase K is one such serine endopeptidase isolated from fungus
Thrombin is a serine protease which is proteolytically activated during the clotting process from an inactive prothrombin precursor. It is highly specific towards the Leu-Val-Pro-Arg-Gly-Ser motif. Therefore, it is most often used to cleave a specific linker with this sequence motif that has been inserted into recombinant fusion protein constructs. There is a wide palette of this type of protein tag removal endopeptidase, namely Factor Xa cleaving the Ile-Glu/Asp-Gly-Arg motif, Enteropeptidase cleaving the Asp-Asp-Asp-Asp-Lys motif, TEV Protease cleaving the Glu-Asn-Leu-Tyr-Phe-Gln-Gly motif, Rhinovirus 3C Protease cleaving the Leu-Glu-Val-Leu-Phe-Gln-Gly-Pro motif and several others. Further details of protein tag removal proteases will not be discussed, as they do not fall within the scope of this chapter.
Finally, it should be noted that reproducible protein cleavage can also be achieved in non-enzymatic reactions mediated by chemical reagents. The most frequent chemical reagents used to cleave peptide bonds are dilute acids, such as hydrochloric acid, formic acid and acetic acid, or other reagents such as cyanogen bromide (CNBr), hydroxylamine and 2-nitro-5-thiocyanobenzoate (NTCB). Exposure of proteins to dilute acids results in kinetically favored cleavage of peptide bonds at aspartic acid and, with time, at other residues as well, while CNBr cleaves at the less abundant methionine. NTCB is specific towards cysteine, while hydroxylamine cleaves peptide bonds between asparagine and glycine. Generally, chemically mediated cleavage targets peptide bonds of less common amino acids, producing long peptides useful in middle-down proteomics.
4. Technologies for analysis of limited sample amounts
Given that there is no technology to amplify proteins as may be done for nucleic acids with the polymerase chain reaction, proteomics has historically faced limitations in terms of the amount of starting material required for success. Traditional proteomics approaches to sample preparation, such as filter-aided sample preparation (FASP), in-gel digestion and in-solution digestion, typically require at least several micrograms of a protein sample, which can be complicated to retrieve from representative clinical samples that are by default limited in availability. Therefore, the traditional method of defining proteomes has generally produced knowledge of the underlying biology that reflects averages over the mixtures of different cell types present in tissue.
As proteomics and the requisite mass spectrometry instrumentation have evolved, microscale proteomic pipelines that decrease the amount of protein required to sub-microgram levels have become available. Microscale proteomics pipelines rely on modifications of traditional proteomics pipelines, frequently accompanied by cell sorting, laser capture tissue microdissection (LCM) or single cell extraction methods. Microdevices such as nano-capillary columns, microfluidic chips, miniaturised ESI introduction interfaces and miniaturised enzyme reactors are often required. Microscale proteomics provides a clearer picture of reality, as it substantially increases sensitivity and spatial proteome resolution and leads to a better understanding of how protein networks interact at the microscopic level. Despite these obvious benefits, microscale proteomics still requires special instrumentation, which for the moment makes implementation of these protocols somewhat difficult across laboratories worldwide.
One promising recent technology of this kind is nanoPOTS (nanodroplet processing in one pot for trace samples) (Figure 3A). The nanoPOTS platform is intended for processing small cell populations in nanoliter volumes. NanoPOTS benefits from downscaling the processing volumes, which in turn substantially reduces surface-associated sample losses. The final step of nanoPOTS is solid phase extraction (SPE), which concentrates, desalts and efficiently introduces a sample to the nanoLC fluidics. Recently, a modification of nanoPOTS termed microPOTS was reported; it is a more easily adopted variant that does not require a robotic platform. It has been reported that nanoPOTS could identify >3000 proteins from 10 cultured mammalian cells, while microPOTS has been reported to reproducibly identify up to 1200 and 1800 proteins from 25 HeLa cells and 50 μm square mouse liver tissue, respectively. Several nanoPOTS modifications have been reported since it was introduced. For example, Zhu et al. claim that a combination of nanoPOTS with fluorescence activated cell sorting (FACS) could detect 670 protein groups from a single mammalian cell. Later, a combination of nanoPOTS, nanoLC separation operated at 20 nL/min and an Orbitrap Eclipse Tribrid mass spectrometer led to a further increase in sensitivity, identifying ~1000 protein groups from a single HeLa cell. Extraordinarily low sample requirements make nanoPOTS well suited to LC–MS/MS tissue imaging. Spatially resolved proteomic maps of a mouse blastocyst embedding into placenta have been produced using a combination of nanoPOTS and LCM. The nanoPOTS-LCM combination produced quantitative tissue images for >2000 proteins with 100-μm spatial resolution, which substantially outperformed classical protein imaging mass spectrometry (IMS). The universality of nanoPOTS is well documented in several publications summarising results from thin sections of pancreas, liver and brain tissue as well as plant samples.
Submicrogram detection limits have also been reached by introducing a carrier proteome, which decreases adsorption of the proteome of interest, in combination with TMT labelling (Figure 3B). The carrier proteome spike-in helped the method known as Single-Cell-ProtEomics-by-Mass-Spectrometry (SCoPE MS) to overcome extensive losses due to adsorption of proteins to surfaces (e.g. LC columns), while the TMT labels distinguish the carrier proteome from the analysed proteomes. Moreover, TMT labels enable relative protein quantitation of multiple samples/conditions in one LC–MS run. The SCoPE MS approach has enabled detection of >1000 proteins from a single mouse embryonic stem cell. Specht et al. further exploited the quantitative potential of TMT labels and claimed to reproducibly quantitate >1000 proteins in a SCoPE MS experiment investigating the heterogeneity of differentiating monocytes.
On-column digestion with an immobilised protease (immobilised enzyme reactor, IMER) reduces sample requirements down to the sub-microgram level, especially when combined with a miniaturised column diameter. Various nanostructured materials, such as nanoporous materials, nanoparticles, nanofibers and nanotubes, have been used successfully in IMER nanobiocatalysis, leading to enzyme stabilisation and increased apparent enzyme activity per unit mass of immobilisation host. Several sub-microgram proteomic setups combining IMER with downstream microfluidic platforms have been reported [40, 41, 42].
The microfluidic platform termed Open tubular lab-on-column combines LysC and trypsin enzymatic digestion on a 20 μm inner diameter (ID) column with an on-line connected nano LC–MS/MS system. Open tubular lab-on-column benefits from the very narrow capillary ID and IMER column ID, which prevent excessive peptide dilution and adsorption to fluidics. The authors detected the biomarker Axin 1 in 10 ng of HCT15 colon cancer cells. Huang et al. characterised 348 proteins from 25 mouse blastocysts on a platform termed SNaPP, which couples enzymatic digestion on a 150 μm ID IMER to nanofluidics. Naldi et al. coupled an SCX column-based IMER proteomic reactor to a nano-proteomic platform capable of protein capture, reduction, alkylation, digestion and first-dimension SCX peptide pre-separation followed by LC–MS/MS. These authors claim that the platform performs with as little as 200 ng of protein starting material. Moreover, the integrated Proteome Analysis Device (iPAD) couples a 10-port valve, digestion loop and SPE trap column in a microfluidic setup that is intended for micro sample preparation prior to mass spectrometry. The authors claim that the iPAD approach is capable of identifying 813 proteins in approx. 100 Dukes’ type C colorectal adenocarcinoma cells.
Capillary electrophoresis (CE) is an efficient and sensitive separation technique that reliably resolves proteins or peptides. Historically, it has been less robust than nanoLC, but recently this has begun to change. Specifically, the introduction of CE-ESI interfaces that do not lead to excessive peptide dilution has made CE-MS applicable in microproteomics. Several reports describe various proteomic pipelines coupling CE to MS. An ultrasensitive electrokinetically pumped nanospray ionization source coupled with CE was able to identify 283 proteins from 80 ng of MCF7 breast cancer cells. Moreover, the detection limit of spiked-in angiotensin II in bovine serum albumin digest was 2 attomole/injection. Although animal proteomics does not fall within scope of this chapter it is worth mentioning that CE-MS input allowed analysis down to 50 ng of
5. Conclusions and future perspectives
Developments in proteomics for identifying clinically relevant proteins have been widely used in scientific research. Sample preparation is considered one of the key steps of the analysis, and as such a variety of protocols have been used to minimize variability and to obtain the best sensitivity and protein recovery from the material.
The development of technologies that could be applied in a medical context and potentially used for screening of patient samples has accelerated in recent years. Technological evolution has also provided platforms for proteome screening of limited cell numbers, and some technologies have clearly demonstrated success at the single cell level. Cellular heterogeneity arises during tumour development and can confound analysis. Therefore, advancement of the tools for profiling cellular subpopulations or regions of tumours has great potential to provide novel insight into mechanisms of tumour growth. Moreover, integration of the developed tools with machine learning algorithms to discover and map molecules that manifest pathological development will likely lead to a better understanding of mechanisms of oncogenesis and potentially uncover therapeutic targets.
This work was supported by the International Centre for Cancer Vaccine Science, carried out within the International Research Agendas program of the Foundation for Polish Science, co-financed by the European Union under the European Regional Development Fund. The University of Victoria-Genome BC Proteomics Centre is grateful to Genome Canada and Genome British Columbia for financial support for Genomics Technology Platforms (GTP) funding for operations and technology development (264PRO).