Finding Needles in Haystacks: The Use of Quantitative Proteomics for the Early Detection of Colorectal Cancer

Colorectal cancer (CRC) is a common and treatable disease if diagnosed early. Current population screening programs are suboptimal, and consequently, there is a need for the development of new methodologies for early diagnosis of CRC. In the past 10 years, unprecedented technological advancements in the field of mass spectrometry (MS)-based proteomics have progressively increased the sophistication and utility of these investigations, leading to the draft mapping of the human proteome. These exciting studies have shaped our mechanistic understanding of the human genome and begun to provide us with a suite of novel biomarkers to predict the onset, progression and severity of many debilitating diseases. Thus, sophisticated MS workflows coupled with revolutionary protein quantification techniques hold promise for the field of MS-based plasma proteomics, and are particularly valuable in the context of early stage identification of curable CRC. However, within the last 40 years, no new plasma protein biomarkers of CRC have been translated into clinical practice. Here, we discuss the application of proteomic technologies within the field of CRC, highlighting contemporary MS-based plasma proteomic strategies that could be exploited to deliver on the promise of a panel of sensitive and specific plasma-based biomarkers with which to non-invasively detect early stage CRC.


Introduction
Colorectal carcinoma (CRC) is a common form of cancer that is estimated to be responsible for approximately 694,000 deaths worldwide each year [1]. It is the third most common form of cancer in males and the second in females, with an estimated 1.4 million new cases diagnosed annually. The natural history of progression of adenomatous polyps to CRC has been well described by the adenoma-carcinoma sequence, a stepwise process which recognizes that the majority of CRC arises from adenomatous polyps (Figure 1) [2]. Colon and rectal cancer is staged from radiological, histopathological and intraoperative findings with the TNM (tumor-nodes-metastasis) system [3] or, historically, with the Dukes staging system [4]. The stage at diagnosis correlates with prognosis; the 5-year survival of patients with stage one disease is 90%, stage two is 71%, stage three is 53% and stage four is only 14% [5]. Therefore, diagnosis and treatment of early stage disease is associated with significantly better outcomes than late stage disease.
Screening programs aim to detect asymptomatic patients with early stage disease where there is a conferrable survival benefit. Investigations used for screening require appropriate levels of sensitivity and specificity to ensure an adequate probability of detecting disease and of correctly excluding patients without the disease in question. Fecal immunochemical test (FIT) stool screening suffers from low predictive values for CRC and, as such, positive tests can lead to unnecessary investigation with colonoscopy and other modalities. When considering the discovery of a biomarker for clinical use, the test must have both high sensitivity and specificity to capture the appropriate patient cohort without falsely reassuring patients. In addition, it must be specific in early stage disease, where the natural history of the disease can be successfully altered by surgical intervention. Currently, participation in CRC screening programs is suboptimal, particularly given that early diagnosis and subsequent treatment significantly correlate with improved outcomes. Depending on the country or region, and the screening test offered, participation can be as low as 40% of the target population [10]. In the context of this poor compliance and its subsequent effects on patient morbidity and mortality, there has been increased interest in the role of plasma-based biomarkers as a screening tool for the detection of early stage CRC.
Early stage screening for CRC is via stool-based tests or endoscopic or radiological investigations. Stool-based tests include guaiac-based fecal occult blood (FOB) tests or fecal immunochemical tests (FIT) [6,7]. Other methods include colonoscopy, computed tomography colonography, flexible sigmoidoscopy and capsule colonoscopy [8]. Currently, FIT is the main method of population-based screening for average-risk patients as it has 83% sensitivity and 93% specificity [9]. However, FIT suffers from low compliance rates [10]. Colonoscopy is also used for screening and diagnosis, but it is a procedure associated with risk, with complications estimated to occur in 0.5-2.8 per 1000 procedures and a mortality rate of 0.007% [11]. Patients who participate in screening programs and undergo colonoscopy have an estimated 90% lower incidence of colon cancer than those who do not [12]. Early detection of polypoid disease and subsequent removal of polyps prevents progression to CRC [8].
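To illustrate why a test with good sensitivity and specificity can still have a low predictive value in a screening population, the following minimal Python sketch applies Bayes' rule to the FIT figures quoted above (83% sensitivity, 93% specificity). The prevalence of 0.5% is a hypothetical value chosen only for illustration and is not taken from the cited studies; the function name is likewise an assumption.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    tp = sensitivity * prevalence              # true positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # false positive fraction
    fn = (1 - sensitivity) * prevalence        # false negative fraction
    tn = specificity * (1 - prevalence)        # true negative fraction
    return tp / (tp + fp), tn / (tn + fn)

# FIT sensitivity and specificity as quoted in the text; prevalence is HYPOTHETICAL
ppv, npv = predictive_values(0.83, 0.93, 0.005)
print(f"PPV = {ppv:.1%}, NPV = {npv:.2%}")
```

Under this assumed prevalence, fewer than one in ten positive results would correspond to true disease, which is the sense in which a screening test can be accurate yet still trigger many unnecessary follow-up investigations.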
Over the last 2 decades, unprecedented technological advancement in protein-based mass spectrometry (proteomics) has radically changed the landscape of biomarker research [13] (Table 1). This has facilitated the characterization of complex cellular proteomes [14][15][16][17][18][19], research that has identified hundreds of over- and under-expressed proteins in carcinoma patients using tumor tissue, histological sections, plasma or fecal samples when compared to matched normal tissues [20][21][22][23][24]. Despite this, with the exception of Carcinoembryonic antigen (CEA) and Cancer antigen 19-9 (CA 19-9) [25], no new protein biomarkers have made it into routine clinical practice [21,26,27]. In this book chapter, we present an overview of the diagnostic and prognostic protein biomarkers of early stage CRC to aid in the development of future screening tools that will continue to increase the rate at which early stage CRC is diagnosed and treated. We also review the use of contemporary proteomic approaches to address many of the long-standing challenges in the field of human CRC plasma proteomics and speculate on the future clinical applications of these technologies (Figure 2).

Figure 2.
Overview, and advantages and disadvantages of using gel-free quantitative proteomic approaches for the identification of plasma protein biomarkers of early stage colorectal carcinoma.

Proteomic technologies for the identification of plasma proteins of early stage CRC
Blood or plasma is the most attractive non-invasive material available for the identification of clinically relevant protein biomarkers for the screening or diagnosis of CRC. Most commonly, candidate protein biomarkers of early stage CRC are identified using MS-based proteomics techniques. Below we outline the limitations and advantages of the most common sample preparation and proteomics techniques used to identify candidate biomarkers in the plasma of patients with early stage CRC. These techniques face a number of limiting factors, which have reduced the utility of proteins revealed by proteomics. Indeed, factors including the extreme dynamic range of proteins within plasma [28], the variability in collection and processing methods [21], preanalytical and analytical processes [29], and the inherent heterogeneity of patient samples [30] have all hindered consensus on which biomarkers are the most relevant for use in the setting of early stage disease.
A small number of highly abundant proteins, such as albumin, IgG, antitrypsin, IgA, transferrin, haptoglobin and fibrinogen, comprise 90% of the human plasma proteome [31]; therefore, little analytical capacity is left for the identification of lower abundance proteins to be used as early stage markers of CRC [32]. Researchers have thus turned to immunodepletion strategies to enrich for low abundance proteins, resulting in a 25% increase in identified proteins and a 4-fold increase in enrichment of non-targeted plasma proteins using peptide isoelectric focusing (IEF) followed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) [31]. These pioneering studies have paved the way for high-resolution LC-MS/MS studies on depleted samples, routinely affording researchers the capacity to identify hundreds if not thousands of plasma proteins during the course of a proteomics investigation [21] (Figure 2).
In the context of proteogenomic approaches to biomarker discovery [33], recent studies have also made some progress in reducing variability during collection and processing, revealing the suitability of human plasma proteins for qualitative and quantitative proteomic analysis after collection and storage for up to 48 hours at room temperature in cell-free DNA-optimized blood collection tubes [21]. These tubes have been developed to limit the deterioration of non-protein-based biomarkers [34], and now protein biomarkers [21], that is caused by delays in processing, temperature and handling.
Although not yet in widespread use, future studies may show that these cell stabilization tubes reduce plasma contamination by proteins originating from blood cells during collection and storage, thus increasing the reproducibility of proteomics-based biomarker discovery projects (Figure 2).

Gel-based separation platforms
Two-dimensional electrophoresis (2DE) coupled to mass spectrometry is a very accurate and sensitive method of large-scale protein separation using human CRC tissue [35]. This preparative platform, which resolves protein mixtures on the basis of the proteins' isoelectric point and molecular weight, has been extensively employed using CRC tissue [36][37][38]. These techniques can be combined with any analytical MS platform to identify changes in protein abundance between samples. Results of these studies are most commonly validated in plasma using orthogonal immunological techniques including ELISA, flow cytometry and immunoblotting. Recently, two-dimensional fluorescence difference gel electrophoresis (2D-DIGE) was employed on early and late stage CRC plasma samples, identifying apolipoprotein A1 (APOA1) as a potential marker of early stage CRC [39]. Interestingly, this study also showed decreased levels of galectin-7 (GAL-7) in patients with early stage disease compared to healthy controls. CRC tissue examination of GAL-7 revealed 100% negative immunoreactivity, implying that it might not originate from the tumor tissues [39].
Gel-based separation approaches coupled to mass spectrometry face significant limitations related to their reproducibility, low sample number capacity, poor resolution of low abundant potential biomarker proteins, poor resolution of highly acidic/basic proteins and of proteins with extreme size or hydrophobicity, and co-migration of multiple proteins in a single spot that renders comparative quantification rather inaccurate [40]. Therefore, more recently researchers have largely focused on gel-free approaches for the identification of biomarkers of early stage CRC.

Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS)
SELDI-TOF MS, also known as ProteinChip® technology, is a high-throughput technique that can purify and identify plasma protein biomarkers [41]. The method offers simplicity as proteins are bound to a solid-phase chromatographic surface, which helps protein isolation from crude mixtures, with non-bound proteins being washed away. The remaining bound proteins are mixed with an energy-absorbing matrix such as sinapinic acid (SPA) or α-cyano-4-hydroxycinnamic acid (CHCA) to induce ionization and desorption of the proteins on the surface of the plate. MALDI-TOF MS is then used to determine the mass-to-charge ratio (m/z) of the desorbed molecules, which are analyzed as they fly down the TOF tube and detected as peaks in a mass spectrum [42]. The normalized peak intensity is directly proportional to the concentration of the corresponding protein molecule in the sample.
One of the earliest reports of SELDI-TOF MS for the identification of early stage CRC plasma biomarkers identified a four protein peak m/z profile (m/z: 3191.5, 3262.9, 3396.3 and 5334.4) that was able to discriminate CRC from healthy controls with a sensitivity and specificity exceeding 90% [43]. Furthermore, two additional protein peaks (m/z: 9184.4 and 9340.9) were described as being able to discriminate plasma from patients with primary CRC from those with metastatic CRC [43]. In the same year, a similar study employing SELDI-TOF revealed a set of two protein peaks (m/z: 8132 and 4002) that could discriminate CRC from control, again with >90% sensitivity and specificity [44]. This study was followed up some years later using an independent patient case-control series of blood samples collected at multiple sites. However, the latter study failed to distinguish plasma of CRC patients from that of healthy controls using these two protein peaks. Rather, the study identified two new protein peaks (m/z: 3961 and 5200) in CRC plasma compared to healthy controls, again yielding very high sensitivity and specificity [45]. Drift and intensity of m/z were suggested to be responsible for the variation in reproducibility between the studies, an inherent limitation of SELDI-TOF based biomarker discovery projects, mostly underpinned by the wide dynamic range of human plasma.
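The reproducibility problem described above can be pictured as a peak-matching exercise: a discriminating peak reported in one study is only "found" in another if an observed peak falls within some m/z tolerance of it. The sketch below is illustrative only; the 0.1% relative tolerance and the observed peak list are hypothetical values, not data from the cited studies, and the function name is an assumption.

```python
def match_peaks(reference_mz, observed_mz, rel_tol=0.001):
    """Map each reference m/z to the closest observed m/z within rel_tol, else None."""
    matches = {}
    for ref in reference_mz:
        closest = min(observed_mz, key=lambda obs: abs(obs - ref))
        within = abs(closest - ref) <= rel_tol * ref
        matches[ref] = closest if within else None
    return matches

reference = [8132.0, 4002.0]          # the two peaks reported in [44]
observed = [4010.3, 6120.0, 8135.5]   # HYPOTHETICAL peak list from a later run
result = match_peaks(reference, observed)
print(result)
```

With these made-up numbers, one reference peak is recovered and the other is lost to a drift of only a few m/z units, which shows how sensitive peak-based classifiers are to calibration between sites and instruments.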

Chromatographic separation platforms
While the analysis of intact proteins by 2DE is likely to continue to play an important role in comparative studies of the CRC tissue proteome, recent technical developments have heralded a new era in proteomics where the emphasis is placed on peptides rather than on whole proteins. Trypsin-based proteomics is now well recognized as the starting point in any proteomics investigation. Hydrolysis of peptide bonds in proteins is achieved using proteolytic enzymes, resulting in the generation of an even more complex peptide mixture. However, the smaller size of peptides makes them much more homogenous structures than proteins. This, coupled with the continued maturation of nanoscale chromatographic strategies and the revolution of electrospray ionization MS (ESI-MS) [46], has meant that the rapid and detailed analysis of the human proteome using tryptic peptides is now commonplace in the MS laboratories of the world [47] (Figure 2).
DOI: http://dx.doi.org/10.5772/intechopen.80942
Tryptic digestion at the whole proteome level increases the complexity of a protein sample; therefore, peptide purification techniques including reversed-phase high performance liquid chromatography (RP-HPLC) are key for achieving increased sensitivity by separating peptides so that they elute into the MS over time. RP-HPLC is most commonly used for one-dimensional (1D) peptide purification in proteomics. In RP-HPLC, peptides are generally retained due to hydrophobic interactions with the stationary silica phase. Polar mobile phases, such as water mixed with acetonitrile, are subsequently used to elute the bound peptides in order of decreasing polarity (increasing hydrophobicity). While reversed-phase chromatography can be used as the sole separation procedure for moderately complex peptide mixtures prior to LC-MS/MS analysis, it is generally considered to have insufficient resolution for the analysis of more complex mixtures. This reflects the fact that although an MS instrument can perform mass measurements on several co-eluting peptides, if many peptides co-elute, the instrument cannot fragment them all and valuable information is likely to be irretrievably lost. Therefore, 2D-HPLC fractionation strategies including ion exchange chromatography (IEX), strong cation exchange (SCX), hydrophilic interaction liquid chromatography (HILIC), electrostatic repulsion-hydrophilic interaction chromatography (ERLIC) and hydrophilic strong anion exchange chromatography (hSAX) are commonly employed prior to RP-HPLC. These 2D approaches were used in the draft mapping of the human proteome [48,49], and continue to be a key preparative step in the successful application of whole proteome based investigations.

Isobaric labeling
In addition to their utility in building an in-depth understanding of the CRC plasma proteome, gel-free strategies have also proven to be particularly amenable for use in comparative profiling applications. Indeed, since peptides are inherently less variable than their parent proteins, it has been argued that they constitute a more reliable basis for quantitative comparisons. This property has been exploited for the development of a suite of isobaric-tag based labeling strategies, which facilitate the simultaneous comparison of complex proteomic mixtures from different sample populations. The most common of these approaches used in plasma introduce an isobaric tag covalently bound to the N-terminus and side chain amines of plasma peptides, e.g. isobaric tags for relative and absolute quantitation (iTRAQ) [50] and tandem mass tags (TMT) [51]. Each of these approaches allows for relative quantification between samples based on the intensities of the reporter ions produced by precursor-ion fragmentation in the low m/z region of spectra. In each technique, the isobaric tags possess identical chemical properties to ensure similar behavior during chromatographic peptide separation and MS1 analysis, but upon fragmentation yield reporter ions with easily distinguishable masses. As such, chromatographic separation platforms have become viable alternatives to 2DE for the differential analysis of complex protein mixtures [52,53]. The MS operates in Data Dependent Acquisition (DDA) mode so that during each duty cycle the MS cycles through a short survey scan of the eluting peptides or precursor-ions, then a series of n (~10-15) MS2 scans, during which each precursor-ion is isolated and fragmented and its fragment-ions are detected. Database searching is then performed on the MS2 fragmentation spectra and used to identify the sequence of their MS1 parent peak.
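As a rough illustration of reporter-ion quantification, the sketch below takes the reporter intensities measured in one MS2 spectrum and expresses each channel relative to a chosen reference channel as a log2 ratio. The channel labels and intensities are hypothetical, the function name is an assumption, and real workflows add steps (for example isotope-impurity correction and normalization across spectra) that are omitted here.

```python
import math

def reporter_log2_ratios(intensities, reference_channel):
    """Return log2(channel / reference) for each reporter channel in one MS2 spectrum."""
    ref = intensities[reference_channel]
    if ref <= 0:
        raise ValueError("reference channel has no signal")
    return {
        channel: math.log2(value / ref)
        for channel, value in intensities.items()
        if value > 0  # channels without reporter signal are skipped
    }

# HYPOTHETICAL reporter intensities for one peptide spectrum (arbitrary units)
spectrum = {"126": 1000.0, "127": 2000.0, "128": 500.0}
ratios = reporter_log2_ratios(spectrum, reference_channel="126")
print(ratios)
```

Because all samples are pooled before LC-MS/MS and read out from the same spectrum, the ratios are internal to each measurement; protein-level values are then typically obtained by summarizing the ratios of all peptides assigned to a protein.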
Limitations in this technology underpin some of the variation seen in MS-based biomarker studies since MS2 spectra rarely allow unambiguous identification of the precursor-ions. Nevertheless, the application of quantitative DDA (iTRAQ) to investigate a panel of 10 CRC plasma samples revealed Orosomucoid 2 (ORM2) to be elevated compared to 10 healthy control samples [54]. ORM2 expression was confirmed in CRC tissues compared with corresponding adjacent normal mucosa; however, no significant association between ORM2 concentrations and TNM stage or histological grade was shown. Nevertheless, an interesting finding to arise from this work was that plasma levels of ORM2 were higher in patients with inflammatory bowel disease than in patients presenting with either a normal colorectum, hyperplastic polyps, or adenoma [54]. Thus, ORM2 appears to function in modulating the activity of the immune system, potentially mediating escape from immune recognition, an important first step during transformation.
A recent study by our group assessed whether plasma samples of CRC patients stored in specialized blood collection tubes (e.g., PAXgene or STRECK; referred to as "BCT"), designed to reduce plasma DNA (pDNA) contamination and enhance low-abundance DNA target detection, were amenable to comparative and quantitative proteomics [21]. Samples from eight patients with Stage I-IIA disease and one patient with Stage IIIB disease were collected pre- and post-resection, in both BCT and EDTA tubes, and subjected to comparative and quantitative analyses using TMT. Of the 641 unique proteins identified across all samples, 184 proteins showed a ±0.5 log2 fold-change in peptide abundance pre- versus post-operation. Label-free targeted proteomics validation using parallel reaction monitoring (PRM, discussed below) showed that the most well recognized blood marker of CRC, CEA, was significantly more abundant pre- compared to post-operation in patients with early stage disease when collected and stored in BCT prior to MS. The same trend was also seen for gelsolin (GSN), structural maintenance of chromosomes protein 1B (SMC1B), E3 ubiquitin-protein ligase SHPRH (SHPRH), and semaphorin-3C (SEMA3C), highlighting the importance of preanalytical considerations during biomarker investigations using proteomic-based techniques [21].
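A fold-change criterion of this kind can be expressed in a few lines. The sketch below interprets the threshold as an absolute log2 fold-change of at least 0.5, which is an assumption about how the criterion was applied; it is not the analysis pipeline used in [21], the protein labels and abundances are made-up numbers, and the function name is hypothetical.

```python
import math

def changed_proteins(pre, post, threshold=0.5):
    """Return {protein: log2(pre/post)} where |log2 fold-change| >= threshold."""
    changed = {}
    for protein in pre.keys() & post.keys():  # proteins quantified in both conditions
        log2_fc = math.log2(pre[protein] / post[protein])
        if abs(log2_fc) >= threshold:
            changed[protein] = log2_fc
    return changed

# HYPOTHETICAL abundances pre- and post-operation (arbitrary units)
pre_op = {"P1": 300.0, "P2": 150.0, "P3": 1000.0}
post_op = {"P1": 100.0, "P2": 100.0, "P3": 1050.0}
hits = changed_proteins(pre_op, post_op)
print(sorted(hits))
```

A log2 threshold of 0.5 corresponds to a roughly 1.4-fold change in either direction, so it is a fairly permissive filter; in practice it would normally be combined with a statistical test across patients.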

Label-free quantification
Label-free mass spectrometry has recently emerged as a quantitative tool for the analysis of CRC plasma proteins. In the absence of isobaric-tag based modifications, this rapid, low-cost technology relies on a workflow in which individual samples are analyzed separately (e.g. by LC-MS or LC-MS/MS) prior to protein quantitation via precursor ion (intact peptide) signal intensity or via spectral counting. The development of high-resolution accurate mass time-of-flight (TOF) and Orbitrap MS facilitates the extraction of precursor ion peaks at the MS1 level, with identification performed at the MS2 level (Table 1). The m/z ratios for all ions are detected and their signal intensities at a particular chromatographic retention time recorded. Owing to the tight correlation between signal intensity and ion concentration, relative peptide levels between samples can be determined directly from these peak intensities. Similarly, spectral counting exploits the strong correlation between protein abundance and the number of MS/MS spectra. This approach involves counting the number of peptide-specific spectra identified in different biological samples and the subsequent integration of these data for all measured peptides of the protein(s) that are quantified. Examples of the application of label-free based proteomic profiling in the context of CRC include comparative analyses of the plasma samples from a cohort of 118 CRC patients compared with 96 healthy controls [55]. This study reported the identification of 373 plasma proteins, with 69 linked to CRC. Of the 69 CRC-associated proteins, two, Macrophage mannose receptor 1 (MRC1) and S100A9, were verified as being upregulated in CRC by immunoblot analysis and proved effective in distinguishing CRC from healthy controls with high accuracy, using ELISA analyses [55].
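A minimal sketch of spectral counting follows: peptide-spectrum matches (PSMs) are tallied per protein and divided by the total number of identified spectra in the run, so that runs of different depth can be compared. The PSM lists and protein labels are hypothetical, the function name is an assumption, and normalization by run total is one simple choice among several (protein length is often also taken into account).

```python
from collections import Counter

def spectral_count_fractions(psm_proteins):
    """Return each protein's share of all identified MS/MS spectra in one run."""
    counts = Counter(psm_proteins)
    total = sum(counts.values())
    return {protein: n / total for protein, n in counts.items()}

# HYPOTHETICAL PSM-to-protein assignments for two separately analyzed samples
run_crc = ["P1", "P1", "P1", "P2", "P3", "P3"]
run_control = ["P1", "P2", "P2", "P2", "P3", "P3", "P3", "P3"]

crc = spectral_count_fractions(run_crc)
control = spectral_count_fractions(run_control)
fold_change_p1 = crc["P1"] / control["P1"]
print(fold_change_p1)
```

Because each sample is acquired in its own run, this kind of comparison depends heavily on consistent sample handling and chromatography between runs, which is the trade-off for avoiding labeling reagents.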

Multiple and parallel reaction monitoring (MRM/PRM)
Targeted proteomics using multiple reaction monitoring (MRM; also known as selected reaction monitoring or SRM) [56] or parallel reaction monitoring (PRM) [57] enables absolute quantitation of multiple peptides per chromatographic experiment by exploiting the unique capabilities of triple quadrupole (QQQ) and quadrupole Orbitrap MS and the unique characteristics of the targeted peptides. Analysis is performed by the acquisition of selected events across the chromatographic retention time: predefined pairs of precursor-ion and product-ion masses for MRM, or individual precursor-ions for PRM. The technique becomes an absolute quantitation tool by spiking isotopically labeled synthetic peptide(s) into the complex sample of interest, which act as internal standards for any peptide(s) of interest. The labeled peptide standards are designed to mimic those generated by tryptic sample digestion, differing by only a few Daltons depending on the isotopic label used. This enables endogenous and isotope-labeled peptides to be subjected to targeted MS/MS analysis and differentiated by the unique MS2 mass spectra provided by the isotopic label. MRM assay development and optimization are key, and often laborious, elements of this method of targeted quantitation. This burden is somewhat mitigated using PRM-based targeted proteomics, owing to the high-resolution mass accuracy of quadrupole Orbitrap MS for precursor-ion selection and the monitoring of all MS2 fragment-ions used for quantitation in parallel.
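The principle of quantitation against a spiked isotope-labeled standard can be sketched as follows: because the labeled (heavy) standard and the endogenous (light) peptide behave alike during chromatography and ionization, the ratio of their summed fragment-ion peak areas, multiplied by the known spiked amount, estimates the endogenous amount. All numbers and names below are hypothetical, and the sketch assumes a linear response and a single-point calibration.

```python
def endogenous_amount(light_areas, heavy_areas, spiked_fmol):
    """Estimate endogenous peptide amount (fmol) from paired light/heavy peak areas."""
    if len(light_areas) != len(heavy_areas):
        raise ValueError("light and heavy transitions must be paired")
    heavy_total = sum(heavy_areas)
    if heavy_total <= 0:
        raise ValueError("no signal for the heavy standard")
    return sum(light_areas) / heavy_total * spiked_fmol

# HYPOTHETICAL peak areas for three fragment ions of one peptide,
# with 50 fmol of heavy standard spiked into the sample
light = [1200.0, 800.0, 400.0]
heavy = [2400.0, 1600.0, 800.0]
amount = endogenous_amount(light, heavy, spiked_fmol=50.0)
print(amount)
```

In an MRM assay the paired areas would come from the predefined precursor/product-ion transitions, whereas in PRM they can be chosen after acquisition from the full MS2 spectrum of each targeted precursor.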
High-throughput targeted proteomics using MRM in immunodepleted blood plasma has previously been employed to measure the abundance of large numbers of candidate CRC plasma proteins using 137 [23] and 1045 [20] confirmed CRC patients. These powerful studies highlight the capabilities of current MS technologies. Indeed, no fewer than 187 and 392 candidate marker proteins were simultaneously monitored, respectively. These analyses have aided in the development of candidate panels of plasma protein markers that can be monitored simultaneously to identify CRC in the symptomatic population [20,23].

SWATH MS
Sequential windowed acquisition of all theoretical fragment-ion mass spectra (SWATH-MS) is a quantitative MS approach heralded as among the most important recent developments in proteomics research [58]. Driven by the recent advances in speed and sensitivity of the new generation of high-resolution Triple-TOF MS, these technologies afford the ability not only to determine which proteins are present in the proteome, but also to quantitate them accurately without the need for label-based methods and without being limited to small numbers of targeted peptides. This is due to the lower duty cycle of a Triple-TOF MS compared to Orbitrap-based mass analyzers [59]. SWATH-MS operates in Data Independent Acquisition (DIA) mode, in which all ions within a selected m/z range are fragmented and analyzed in a second stage of tandem mass spectrometry. In combining the unique advantages of traditional DDA (high-throughput) and MRM (high reproducibility and consistency) technologies, SWATH-MS can be deployed for both discovery and quantitation of all detectable peptides present in complex biological samples.
SWATH-MS also affords the added advantage that it does not rely on prior knowledge of the precursor peptide ions, instead acquiring information in a DIA manner and thus avoiding laborious assay development. The SWATH-MS workflow involves two key steps beginning with the generation of a spectral library (e.g. via conventional LC-MS/MS) through which acquired peptides are identified. During this acquisition mode, the mass spectrometer is programed to step within 2-4 s cycles through a set of precursor acquisition windows covering the mass range accessible by a quadrupole mass analyzer and also that in which most tryptic peptide precursors should fall (400-1200 m/z). During each cycle, the mass spectrometer fragments peptide precursors and records a complete, high accuracy fragment ion spectrum for all precursors that elute on the chromatograph. This is then followed by acquisition of SWATH-MS data for each sample under analysis, interrogation and matching against the spectral library to identify peptides, and finally extraction of specific peptide ions to enable area-under-the-curve quantitation between samples.
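The window-stepping scheme and the final area-under-the-curve step can be sketched together. The 25 m/z window width is an assumption (the text gives only the 400-1200 m/z range), the extracted-ion trace is invented for illustration, and both function names are hypothetical.

```python
def swath_windows(start=400.0, stop=1200.0, width=25.0):
    """Return contiguous (low, high) precursor isolation windows covering [start, stop]."""
    windows = []
    low = start
    while low < stop:
        windows.append((low, min(low + width, stop)))
        low += width
    return windows

def peak_area(times, intensities):
    """Trapezoidal area under an extracted fragment-ion chromatogram."""
    return sum(
        (t1 - t0) * (i0 + i1) / 2
        for t0, t1, i0, i1 in zip(times, times[1:], intensities, intensities[1:])
    )

windows = swath_windows()  # ASSUMED 25 m/z windows over 400-1200 m/z
print(len(windows), windows[0], windows[-1])

# HYPOTHETICAL extracted-ion trace: retention time (s) versus intensity
area = peak_area([0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 10.0, 30.0, 10.0, 0.0])
print(area)
```

Every precursor eluting within a window is fragmented together in each cycle, so the same peak-area calculation can be repeated later for any peptide in the spectral library without re-acquiring the sample.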
The first SWATH-MS study of CRC plasma also simultaneously assessed protein biomarkers from pancreatic cancer, lung cancer, prostate cancer, and ovarian cancer, all from patients diagnosed with early forms of these diseases. This sophisticated study employed sample enrichment and subsequent detection of tissue-specific secreted protein profiles via SWATH-MS. These data were used to generate a digital representation of the proteins from within each plasma sample that could be queried for the presence and quantity of specific peptides using a targeted data analysis [60]. Tumor-specific biomarkers were detected for individual cancer types, as well as a common biomarker, Thrombospondin-1 (THBS1), which was significantly altered in the blood of four of five carcinomas (CRC, lung, prostate and ovarian) [61]. These groundbreaking studies highlight the potential of the new generation of analytical MS techniques for the detection of early stage disease.

Overview of biomarkers of colorectal adenocarcinoma
Plasma biomarkers used in clinical practice include Carcinoembryonic antigen and Cancer antigen 19-9; however, these investigations have limited use in early diagnosis of CRC [6]. A variety of plasma and histological biomarkers of early and late stage CRC, including heat shock proteins, matrix metalloproteinases, complement component proteins, Annexins and S100 proteins, are discussed and summarized in Table 2.

Carcinoembryonic antigen
Carcinoembryonic antigen (CEA) is a cell-surface, high molecular weight glycoprotein important for cell adhesion, discovered in 1965 by Gold and Freedman as a component of human colon carcinoma and foetal tissue [62]. The production of CEA typically ceases at birth and it is present in very low concentrations in healthy patients. It can, however, be elevated in CRC, other types of cancer and non-malignant conditions [63]. CEA is one of the most commonly used biomarkers of CRC worldwide; however, its sensitivity for the detection of CRC is too low to be useful as a diagnostic tool, with plasma elevation >5 μg/L reported in 3, 25, 45 and 65% of Dukes type A, B, C and D disease, respectively [63,64]. Limited evidence supports a role of CEA as a marker of CRC prognosis and recurrence; its sensitivity as an indicator of recurrence is estimated to be 80% [65] and post-operative elevation is particularly sensitive for the detection of hepatic and retroperitoneal metastases.

Table 2.
Synopsis of biomarker identification including sample type, pre-analytic workflow and MS technique (where applicable). Surviving final row: proteins listed for RT-PCR and immunoblot analysis: Nm23-H1, S100A8, S100A9, adenosylhomocysteinase. Methods: semi-quantitative PCR and immunoblot analysis; immunoblot analysis performed for plasma samples. Finding: S100A8 and S100A9 were significantly increased in the plasma of CRC and colorectal adenoma patients compared to controls [36].

Cancer antigen 19-9
Cancer (or Carbohydrate) antigen 19-9 (CA 19-9) is a clinical biomarker used in various diseases. Elevation can occur in benign conditions such as biliary and pancreatic disease, pulmonary disease, renal failure and autoimmune disease, as well as malignant conditions of the pancreas, colon, rectum, liver, ovary and lung. CA 19-9 is therefore considered a non-specific biomarker of CRC [66] and is a classical marker for late stage disease and metastasis. For this reason it is not appropriate for use as a screening, or diagnostic, marker of carcinoma [67]. CA 19-9 can be used in tandem with CEA for post-operative monitoring to detect recurrence, or as a prognostic indicator, as pre-operative elevation without corresponding elevation of CEA is associated with a poorer 5-year survival [68]. Pre-operative elevation of both CEA and CA 19-9 is predictive of increased cancer mortality compared to non-elevated pre-operative levels [69].

Heat shock proteins
Heat shock proteins (HSP) are a type of stress-inducible protein present at low levels in the cells of all organisms under normal physiological conditions [70]. They have been functionally linked to cell apoptosis, protein homeostasis and cell growth mediation, and play an important role during fertilization [70][71][72][73][74][75][76]. HSPs also function as chaperones, and act in protein assembly and unfolding. Various members of the HSP family have been postulated to have roles in antigen presentation and as chaperones of peptides to major histocompatibility complex class I and class II [75,76]. HSPs are typically classified into five subunits or families according to their molecular weight: large HSP (HSP110, glucose-regulated protein 170), HSP90, HSP70, HSP60 and small HSPs (HSP27, HSP40). Significant research has focused on the role of HSPs in disease progression and on their role as therapeutic targets and as biomarkers.

HSP27
HSP27 is a member of the small HSP family; it has an anti-apoptotic role and acts as a chaperone to prevent misfolded protein aggregation. It is considered to be modulated by mitogen-activated protein kinase through phosphorylation. Abnormal HSP27 expression has been demonstrated in various cancer types, including ovarian, prostate, breast and colon cancer, as well as non-malignant conditions such as neurological and cardiovascular disease [76]. The overexpression of HSP27 in histological colon and rectal cancer samples was assessed in a large cohort of 404 patients with 2DE and tandem mass spectrometry (MS/MS) combined with a large validation set using tissue microarrays (TMA). The authors found that overexpression of HSP27 was present in both colon and rectal cancer and was associated with poorer cancer-free survival in the rectal cancer cohort [77]. Furthermore, the use of immunohistochemistry (IHC) and TMA analytical approaches has revealed that high HSP27 and HSP70 are associated with poorer clinical outcomes in primary resected CRC [78].

HSP40
HSP40 is also a member of the small HSP family and acts as a co-chaperone to HSP70. This family is further subdivided into DNAJA, DNAJB and DNAJC, subgroups that have been shown to participate either in tumor progression or, conversely, in tumor suppression in different types of cancer [75]. HSP40 overexpression in CRC has been demonstrated (along with HSP70) in 50 histological samples using IHC and immunoblotting [79].

HSP60
HSP60 is a chaperone protein with functions including transport and mitochondrial protein folding. This protein has been associated with a wide range of cancers including prostate, breast, cervical, bladder, hepatic and CRC [71]. HSP60 is also elevated in non-cancer conditions such as chronic hepatitis and liver cirrhosis [80]. 2DE coupled to LC-MS/MS using 28 histological adenocarcinoma samples and a 789 patient IHC validation set revealed that HSP60 was overexpressed along with S100A9 and translationally controlled tumor protein (p < 0.001 for each) [81]. This study also identified the beta subunit of 14-3-3 as a prognostic marker [69]. Additionally, IHC and immunoblotting were used to demonstrate, in histological samples from 44 patients, that HSP60 was elevated in tumor tissues and that there was a significant association between HSP60 levels, tumor differentiation and tumor stage [82]. Comparison of colonic tumor samples and matched normal tissue with 2D-DIGE and immunoblotting confirmed overexpression of HSP60 (3.25-fold change ratio) [83]. The authors of this study also developed an immunoassay for serum HSP60 detection, confirming a statistically significant elevation in serum HSP60 levels in CRC compared to controls (p = 0.0001) [83].

HSP70
HSP70 has 13 subgroup family members. It is associated with cytosolic calcium level homeostasis, and inhibition of HSP70 expression has been shown to stimulate release of intracellular calcium in cell culture. Calcium induces cell death by a caspase-dependent mechanism in CRC cell lines, whereas HSP70 functions in the stabilization of lysosomes and inhibition of apoptosis [84]. Importantly, in other types of cancer such as pancreatic and prostate cancer, HSP70 has been shown to upregulate cell survival [84]. In a study of 33 CRC patient plasma samples, using ELISA assays, serum levels of HSP70 were significantly elevated (≥2.25 ng/ml) in cancer patients compared to healthy controls. The sensitivity and specificity of elevated serum HSP70 in the CRC group were reported as 96.77% and 96.96%, respectively [85]. It has been further demonstrated using ELISA testing that a high serum concentration of HSP70 is associated with increased mortality (p = 0.005) [86]. Additionally, the use of immunostaining has shown that mitochondrial HSP70 overexpression correlates with poor survival (p = 0.04) [87]. Independent IHC analyses of 81 primary CRC tissues revealed that overexpression of HSP70, as well as HSP110, is associated with highly advanced clinical stages and positive lymph node involvement [88].
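As an illustration of how sensitivity and specificity values such as those above follow from a concentration cut-off, the minimal sketch below classifies samples as positive when the measured level is at or above the threshold and then counts correct calls in each group. Only the 2.25 ng/ml cut-off is taken from the text; the function name and every concentration value are hypothetical and are not data from [85].

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    case_levels: Sequence[float],
    control_levels: Sequence[float],
    cutoff: float,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a 'level >= cutoff' positive call."""
    if not case_levels or not control_levels:
        raise ValueError("both groups must contain at least one sample")

    # True positives: cases whose level reaches the cut-off.
    true_pos = sum(level >= cutoff for level in case_levels)
    # True negatives: controls whose level stays below the cut-off.
    true_neg = sum(level < cutoff for level in control_levels)

    sensitivity = true_pos / len(case_levels)
    specificity = true_neg / len(control_levels)
    return sensitivity, specificity


# Hypothetical serum concentrations in ng/ml (illustrative only).
crc_levels = [2.9, 3.4, 2.3, 1.8, 4.1]
control_levels = [1.1, 0.9, 2.4, 1.6, 1.2]

sens, spec = sensitivity_specificity(crc_levels, control_levels, cutoff=2.25)
print(f"sensitivity = {sens:.2%}, specificity = {spec:.2%}")
```

With small groups, a single misclassified sample shifts either value by several percentage points, which is one reason figures derived from small cohorts should be interpreted cautiously.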

HSP90
HSP90 activates hypoxia-inducible factor-1 and nuclear factor-κB, which in turn regulate epithelial to mesenchymal transition, invasion and motility of CRC [89]. HSP90 has been shown in various studies to be overexpressed in CRC and may serve as a potential biomarker for CRC. In a small study of histological adenocarcinoma samples using an iTRAQ labeling method and a QStar LC-MS/MS approach, a total of 82 altered proteins were found in CRC patients, which included overexpression of HSP90α and significant downregulation of gelsolin. The results also suggested that HSP70 had decreased expression in the same samples [90]. Further validation using immunoprecipitation, MALDI-TOF-MS and immunoblotting confirmed that HSP90α is overexpressed in tumor cells and is correlated with poor prognosis and metastatic disease [91]. Plasma HSP90α levels were also significantly elevated in an analysis of 77 CRC patients compared to controls [92], thus highlighting the potential biomarker utility of this protein.

Matrix metalloproteinases and their tissue inhibitors
Matrix metalloproteinases (MMPs) are a diverse class of at least 25 zinc-dependent endopeptidases, which have important physiological functions and have also been implicated in the invasion, progression and metastasis of CRC. Accordingly, MMPs have been proposed as therapeutic targets and as diagnostic and prognostic biomarkers. MMP subclasses have been detected in various types of cancer, including breast cancer and melanoma, and are therefore not specific to a single cancer type [93].

MMP1 and MMP13: collagenases
MMP1 functions to degrade type I, II and III collagen. MMP13 is structurally similar to MMP1 and likewise cleaves collagens, as well as degrading extracellular matrix proteins including fibrillar collagen, fibronectin, tenascin C and aggrecan core protein 1 [94]. Immunostaining of 133 CRC samples showed that MMP1 expression was significantly correlated with hematogenous colorectal metastasis [95]. Increased MMP1 expression is also associated with poor prognostic factors such as invasion level, lymph node metastasis and hepatic metastasis [96]. Similarly, MMP13 overexpression in CRC has been shown to be associated with poor prognosis [93].

MMP2 and MMP9: gelatinases
The gelatinase group of MMPs also functions to degrade the extracellular matrix, with collagen and gelatin as its main substrates. Overexpression of MMP2 may promote CRC invasiveness through its degradation of β1 integrins, thereby enhancing motility and decreasing cell adhesion [97]. Quantification of tumor, normal tissue and plasma samples using ELISA in 72 patients identified upregulation of MMP1, MMP2, MMP3 and MMP9 in carcinoma. MMP2 overexpression was also significantly associated with lymph node metastasis [98].

MMP7: matrilysin
MMP7 promotes tumor invasion by proteolytic cleavage of extracellular matrix proteins such as proMMP2 and proMMP9, and it is also involved in cellular proliferation and the regulation of apoptosis. Overexpression of MMP7 is found in 80% of CRC [99] and is associated with poor prognosis. This protein has been shown to have a sensitivity of greater than 92% for identifying colonic adenomas in mouse models. Additionally, mouse models have implicated overexpression of MMP7 in tumourigenesis [100,101], whilst in humans MMP7 has been implicated in the progression of adenoma to carcinoma. Accordingly, MMP7 has been demonstrated in numerous IHC studies to be overexpressed in adenoma and in various stages of carcinoma [102][103][104].

MMP12: metalloelastase
MMP12 is predominantly expressed in macrophages and degrades a wide range of substrates. MMP12 has been shown to be overexpressed in CRC; however, this increased expression is associated with a decreased risk of hepatic metastasis and decreased vascular endothelial growth factor expression [105,106]. It is therefore postulated that MMP12 may have a protective role, a notion supported by the range of pro-tumourigenic effects recorded following MMP12 inhibition [106,107]. Conversely, along with MMP7 and MMP10, elevated serum levels of MMP12 have been suggested to be associated with poor CRC prognosis [108].

Tissue inhibitors of metalloproteases
Tissue inhibitors of metalloproteases (TIMPs) have been implicated in tumourigenesis. TIMP1 overexpression is associated with advanced stages of CRC [109]. IHC studies have demonstrated a significant correlation between TIMP2 expression in inflammatory cells and increasing tumor size, lymph node involvement and the presence of metastasis [110]. TIMP3 has been described as an independent prognostic marker for CRC, with strong cytoplasmic staining being associated with longer survival in rectal cancer patients [111].

Annexins
Annexins are calcium-regulated, phospholipid- and membrane-binding proteins from a multigene family. They function in membrane processes such as structural control, cell transport and calcium-regulated exocytosis, and act as linkers between membranes or between membranes and the cytoskeleton. In humans, the annexin family consists of annexins A1-A11 and A13 [112]. The sensitivities of annexin A3, A4 and A11 peptides for detecting early stage CRC have been reported to exceed those of CEA, and as such these peptides are promising biomarkers for the early detection of CRC [26].
A shotgun proteomics analysis (LC-MS/MS) of extracellular vesicle proteins, with selected reaction monitoring, performed on CRC cell culture lines has demonstrated annexin A3, annexin A4 and annexin A11 overexpression, particularly in early stage CRC patients. Reported sensitivities of annexin A3, A4 and A11 are in the range of 82.1-85.7% for stage one disease and 89.3-96.4% for stage two disease [26], highlighting a potential role for these annexins as early stage disease biomarkers. Notably, the same study reported the sensitivity of CEA for early stage disease to be as low as 38.8% [26]. Importantly, progressive increases in annexin A3 abundance have been shown to correlate strongly with disease progression from normal tissue to adenoma and finally to carcinoma [26].
Annexin A2 overexpression has been confirmed in a small cohort of histological samples using a 2D-LC-MS/MS approach with iTRAQ labeling, with results validated by immunoblotting and IHC [90]. Conversely, a study examining serum levels of annexin A2 found that the protein was significantly lower in CRC patients compared to healthy controls; annexin A2 levels were also inversely related to tumor size and stage [113]. In addition to annexin A2, altered annexin A4 expression has been demonstrated in CRC via the application of a label-free LC-MS/MS approach, with validation in CRC serum samples confirming its overexpression and thus its potential as a biomarker of the disease [114]. Annexin A10 is not frequently overexpressed in CRC, with elevation recorded in only an estimated 5.8% of patients. However, it too is associated with poor overall survival and poorer progression-free survival, particularly in late stage cancers. As such, annexin A10 may be considered as a prognostic marker when present [115]. Similarly, annexin A13 expression is associated with lymph node metastasis; however, it is not associated with tumor stage or differentiation [116].

Complement component C3
Overexpression of complement component C3 (C3) and its fragment, C3 anaphylatoxin (C3a), has been demonstrated in fecal, serum and histological samples from CRC patients. C3 is a component of the innate immune system, with functions including promotion of phagocytosis, local inflammatory responses and aiding the adaptive immune response. When upregulated, C3 may also have a role in host cell damage and aid in foreign pathogen invasion [117]. C3 overexpression in stool samples from CRC patients was demonstrated in two different cohort studies [118,119], the second also showing a downregulation of proteinase 3 (PRTN3) and ataxia-telangiectasia mutated protein (ATM). Elevated levels of C3a were further demonstrated in serum samples using SELDI-TOF-MS and validated with MS and ELISA, with the authors reporting a sensitivity of 96% and a specificity of 96.21%. They also found C3a to be increased in the serum of 81.6% of patients with adenomas [120].

Complement component C9
Complement component C9 (C9) is a constituent of the complement system that has important functions in the innate immune system. It is a terminal constituent of the membrane attack complex (MAC) and thereby aids in the immune system response to cell death [121]. Changes in C9 expression have been described in both fecal and plasma samples. In a series of 315 stool samples analyzed using a combination of LC separation, LTQ-FT hybrid MS and QE label-free MS, C9 and C3, together with S100A8 and S100A9, were found to be overexpressed [118]. A UHPLC-LC-MS approach and plasma-based immunoassay, using 187 proteins previously described in the literature, demonstrated significant elevation of C9 in CRC plasma samples [23]. Similarly, an analysis of 31 CRC plasma samples revealed overexpression of C9 compared to healthy controls, as well as reduced expression of apolipoprotein AI [122].

S100 proteins
S100 proteins are a family of calcium-binding proteins consisting of 24 members subdivided into three groups: broadly, those with intracellular regulatory functions only, those with extracellular functions only, and those with both intracellular and extracellular functions [123]. The proteins S100A8 and S100A9 form a heterocomplex that is postulated to function in myeloid differentiation, cell transport, nuclear factor interaction and calcium-related phagocytosis [123]. Mouse models have demonstrated accumulation of S100A8/A9-positive cells in areas of dysplasia and adenoma, as well as promotion of the MAPK and NF-κB signaling pathways [124]. IHC staining has demonstrated overexpression of S100A8, S100A9, adenosylhomocysteinase (AHCY) and Nm23-H1 in CRC tumor cell cytoplasm; in the same study, S100A8 and S100A9 were also significantly increased in the plasma of CRC patients [36]. S100A8 has also been shown to have increased expression at progressive CRC stages (Dukes A-D) compared with controls [125]. Overexpression of minichromosome maintenance complex component 4 (MCM4) and S100A9 has also been shown in the mouse proximal colonic fluid proteome using label-free MS [126]. The same study identified overexpression of chitinase 3 like 1 (CHI3L1) protein in adenomas, advanced adenomas and CRC; this overexpression was further confirmed in the serum of all three patient subtypes compared to controls [126]. A 2DGE LC-MS/MS based analysis of Dukes stage B CRC also identified S100A9, HSP60 and TCTP as overexpressed proteins. In addition to histological and plasma samples, S100A8 and S100A9 have been shown to be overexpressed in fecal samples, also using an LC-MS/MS approach [118,119]. Additionally, S100A11 has been identified among a cohort of 23 upregulated proteins in CRC samples using a combined targeted LC-MS/MS and SRM approach [22].

Conclusions
The development of non-invasive modalities with high patient compliance that can unequivocally detect and diagnose early stage CRC will afford the greatest opportunity for early intervention strategies to treat asymptomatic patients and ultimately improve the survival of patients with CRC. However, current screening methods are inadequate and there remains a pressing need to establish reliable biomarkers of early stage CRC. The resolution that is now achievable with advanced quantitative MS-based proteomic workflows and instrumentation holds the promise of unlocking the secrets of early stage disease that could be exploited to prevent or cure CRC. However, the inherent issues that have plagued MS-based biomarker discovery projects over the last 20 years moderate this optimism. Limited sample sizes, particularly in the early stage setting, the wide dynamic range of blood plasma, the low concentrations of individual early stage-specific protein biomarkers and the lack of reproducibility of MS investigations have together meant that no new biomarkers of early stage CRC have entered the clinical setting since the discovery of CEA.
Over the coming years, the limitations of most current MS-based biomarker discovery projects will be resolved, largely thanks to recent developments in sophisticated techniques and technologies that not only simplify pre-analytical issues but also address analytical limitations. Improvements in sample preparation techniques that potentially do away with immunodepletion, or with the enrichment techniques that are currently necessary for the successful implementation of MS-based plasma biomarker investigations, will increase the reproducibility of future projects [127]. Analytical techniques that employ wider MS/MS windows for the simultaneous detection and quantification of low-abundance potential biomarkers, such as SWATH-MS strategies, are important developments that will continue to arm our ever-evolving arsenal of MS technologies and help resolve the issue of both detecting and quantifying low-abundance potential markers of early stage disease.
It is likely that, in the age of proteogenomics, the greatest increase in resolution of early stage disease markers will come from the high-throughput simultaneous detection and quantification of protein and non-protein based biomarkers. Indeed, the combination of ctDNA and protein biomarkers in the plasma of patients with resectable pancreatic ductal adenocarcinoma showed a staggering 99.5% specificity, providing hope of early stage diagnosis for one of the most aggressive forms of gastrointestinal cancer [128]. Non-invasive blood tests combining non-protein and protein biomarkers represent an exciting approach for the early detection of any cancer type and hold the greatest potential for increasing the survival of CRC patients worldwide.
© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.