Advances in Molecular Subclassification of Colorectal Cancer

This chapter will highlight the advances made in our understanding of the molecular landscape of colorectal cancer (CRC) through the development of molecular subclassification systems, and their potential predictive and prognostic utility. Firstly, the comprehensive integrative analysis of 224 colorectal cancer samples performed by The Cancer Genome Atlas (TCGA) Research Network will be described, highlighting the potential therapeutic targets identified. The development of molecular subclassification systems by independent groups, primarily through gene expression profile analysis, will then be described, and their potential clinical and prognostic associations discussed. The chapter will go on to describe the four consensus molecular subtypes of colorectal cancer proposed by an international consortium that applied unsupervised clustering techniques to these independent classification systems, and will discuss the clinical and prognostic associations of these four subtypes. Finally, the utility of molecular subclassification in colorectal cancer will be briefly explored.


Introduction
Colorectal cancer (CRC) was one of the earliest solid tumours to be molecularly characterised. In the late 1980s, Vogelstein et al. described the stepwise progression from adenoma to carcinoma through the accumulation of genetic and epigenetic events [1]. This model provided insight into how driver alterations in the main oncogenes (KRAS, NRAS, BRAF and PIK3CA) and tumour suppressor genes (APC, TP53 and PTEN) are implicated in the biology of CRC [2]. The accumulation of these genetic alterations leads to carcinogenesis through deregulation of key pathways involved in cell proliferation, differentiation and apoptosis. It is now known that abnormalities of the Wnt signalling pathway are almost ubiquitous in sporadic CRC and usually arise from mutations of the APC gene [3].
Subsequent genetic and epigenetic exploration of CRC identified significant molecular heterogeneity in this disease. This heterogeneity was clinically evident in the differing responses to systemic therapy and the varying clinical course of patients with tumours of the same stage. Biomarker discovery in CRC has arisen through the analysis of responders and nonresponders to targeted agents, leading to the discovery that RAS mutations confer resistance to anti-epidermal growth factor receptor (EGFR) therapies. More recently, a deeper understanding of the underlying biology of CRC has revealed that the clonal, stromal and immune characteristics of tumours are also important when considering therapeutic targets. Accurately defining molecularly distinct subgroups, and identifying the underlying genetic drivers and novel therapeutic targets within each subgroup in order to rationalise drug development, therefore remains of paramount importance in CRC.

Early molecular characterisation of colorectal cancer
It is now well established that the majority (85%) of sporadic CRC cases exhibit chromosomal instability (CIN), with changes in chromosome number and structure such as deletions, gains, translocations and amplifications. CIN is associated with inactivating mutations or losses of the APC tumour suppressor gene, which occur early in the adenoma-carcinoma sequence [3]. The remaining 15% of sporadic CRCs demonstrate microsatellite instability (MSI), with changes in the number of repeats, and hence the length, of microsatellites. In sporadic CRC, MSI arises from defective DNA mismatch repair (MMR) caused by epigenetic silencing of the MLH1 gene through promoter hypermethylation [4]. Epigenomic studies have shown that MSI tumours have a high CpG island methylator phenotype (CIMP-H), which involves aberrant methylation of CpG-rich gene promoter regions. This silences the expression of critical tumour suppressor genes such as MLH1, thereby leading to the development of CRC [5]. Familial syndromes, such as Lynch syndrome/hereditary non-polyposis colorectal cancer (HNPCC), arise through germline inactivating mutations of genes encoding MMR proteins, namely MLH1, MSH2, PMS2 and MSH6.
Clinicopathological features and mutational status of CRC tumours differ according to this classification. Sporadic MSI-high (MSI-H) tumours are more likely to be right-sided (proximal), poorly differentiated and mucinous, to be associated with tumour-infiltrating lymphocytes (TILs) and to have higher rates of BRAF mutation, whereas microsatellite-stable (MSS) tumours are more frequently left-sided (distal) and have higher rates of KRAS mutation [5].
MSI status has both a prognostic and a predictive role in CRC. MSI-H tumours have better stage-adjusted survival (in stages I-III) when treated with surgery alone and do not derive as much benefit from adjuvant fluorouracil-based chemotherapy as MSS tumours do [4]. In advanced disease, MSI-H tumours are associated with a worse prognosis, which is attributed to their association with activating BRAF mutations [6]. More recently, MSI status has also been shown to predict response to anti-PD1 antibodies, with MMR-deficient tumours exhibiting higher response rates and longer progression-free survival (PFS) than MMR-proficient tumours [7].

Integrative molecular analysis of colorectal cancer
Anatomical factors and common DNA alterations are clearly helpful in identifying subtype characteristics in CRC, but alone they are inadequate to define the boundaries between the different molecular entities that comprise the disease. In recent years, many studies have exploited microarray technology to investigate gene expression profiles (GEPs) in CRC; however, no single signature has proven clinically meaningful, especially for predicting prognosis, and studies have been poorly reproducible owing to the high molecular heterogeneity of this disease.
In 2012, The Cancer Genome Atlas (TCGA) Research Network produced a comprehensive integrative analysis of 224 colorectal cancer tumour samples with paired normal samples, in order to improve our understanding of the biology of this disease and identify potential therapeutic targets [8]. In addition, independent scientific groups attempted to define intrinsic subtypes of CRC using GEPs, in the hope of refining the molecular classification of CRC and facilitating clinical translation [9][10][11][12][13][14]. The findings of all of these independent analyses are discussed below.

The Cancer Genome Atlas (TCGA) comprehensive analysis of colorectal cancer
The comprehensive analysis of CRC undertaken by the TCGA Research Network included tumours whose clinical and pathological characteristics reflected the usual breadth of features seen in CRC patients. Tumours were split into two main groups by mutation rate: hypermutated (16%) and non-hypermutated (84%), which broadly correspond to the previously described MSI and CIN groups. The hypermutated group was further subdivided into tumours whose hypermutation was caused by defective MMR (dMMR), with a mutation rate of 12 to 40 mutations/Mb (approximately 13%), and tumours with an extremely high mutation rate of >40 mutations/Mb (approximately 3%), termed the ultramutated group.
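To make the mutation-rate grouping concrete, the sketch below assigns a tumour to a group from its somatic mutation count. It is a minimal illustration, assuming the thresholds quoted above (12 and 40 mutations/Mb) and an arbitrary choice of how values exactly on a boundary are handled; the function names and example counts are hypothetical and are not part of the TCGA pipeline.

```python
# Minimal sketch: thresholds follow the chapter's description of the TCGA groups
# (12 to 40 mutations/Mb = hypermutated, >40 = ultramutated). Treating exactly 12
# and exactly 40 as "hypermutated" is an assumption; function names are hypothetical.


def mutation_rate_per_mb(n_somatic_mutations: int, covered_megabases: float) -> float:
    """Return somatic mutations per megabase of sequenced (covered) territory."""
    if covered_megabases <= 0:
        raise ValueError("covered_megabases must be positive")
    return n_somatic_mutations / covered_megabases


def mutation_group(rate_per_mb: float) -> str:
    """Assign a TCGA-style mutation-rate group to a tumour."""
    if rate_per_mb > 40:
        return "ultramutated"
    if rate_per_mb >= 12:
        return "hypermutated"
    return "non-hypermutated"


if __name__ == "__main__":
    # Illustrative mutation counts over 30 Mb of covered sequence, not real samples.
    for count in (90, 600, 1500):
        rate = mutation_rate_per_mb(count, 30.0)
        print(f"{rate:.1f} mutations/Mb -> {mutation_group(rate)}")
```

With these toy counts the rates are 3.0, 20.0 and 50.0 mutations/Mb, which fall into the non-hypermutated, hypermutated and ultramutated groups respectively.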
Initially, TCGA researchers considered colon and rectal tumours as separate entities due to their known anatomical and therapeutic differences. However, it was found that similar patterns of genomic alteration (copy number, expression profile, DNA methylation and miRNA changes) were seen in both types of tumours, so they were subsequently analysed together within the non-hypermutated group.
Thirty-two genes were identified as recurrently mutated; after removal of non-expressed genes, the hypermutated and non-hypermutated groups had 15 and 17 recurrently mutated genes, respectively (see Table 1).
The tumour suppressor genes ATM and ARID1A displayed a disproportionately high number of frameshift or nonsense mutations. As expected, KRAS and NRAS mutations were activating oncogenic mutations at codons 12, 13 or 61, and BRAF mutations were the classical activating V600E mutation, whereas most of the other genes carried inactivating mutations [8]. Given the differences in recurrently mutated genes between hypermutated and non-hypermutated cancers, these tumours appear to progress through different sequences of genetic events. Interestingly, recent data published by Jones et al. have identified non-V600 BRAF-mutated advanced CRC as a molecular subtype with distinct characteristics (different from those of BRAF V600E-mutated CRC) and an excellent prognosis [15]. These patients may not require the aggressive chemotherapy that benefits patients with classical BRAF V600E mutations. It is not yet clear whether non-V600 BRAF-mutated cancers share the resistance to anti-EGFR therapies seen in cancers with BRAF V600E mutations, but the higher frequency of concomitant RAS mutations in this subgroup will have to be taken into account.
The TCGA analysis provided further confirmation of the pathways previously known to be deregulated in CRC. The vast majority of tumours in both groups (93% of non-hypermutated and 97% of hypermutated tumours) had deregulated Wnt signalling, predominantly through inactivation of APC. The MAPK and PI3K signalling pathways were also commonly activated. Inactivation of the TGF-β inhibitory pathway was also seen, resulting in increased MYC activity. Almost all of the analysed tumours, irrespective of location or mutation rate, exhibited changes in MYC transcriptional targets, highlighting the important role of MYC in CRC development.
New findings identified by TCGA included recurrent mutations in FAM123B, ARID1A and SOX9 and marked overexpression of the Wnt ligand-receptor gene FZD10. SOX9 is associated with intestinal stem cell differentiation and had not previously been implicated in CRC. It has been shown to facilitate β-catenin degradation [16], and its transcription is suppressed by Wnt signalling, which is activated by extrinsic Wnt ligands. These findings suggest a number of potential therapeutic approaches in CRC, namely Wnt signalling inhibitors and small-molecule β-catenin inhibitors, which have begun to show early promise [17][18][19]. In addition, overexpression of ERBB2 and IGF2, genes involved in regulating cell proliferation, was identified, indicating potential therapeutic opportunities in inhibiting the products of these genes.
mRNA expression profiles of a subset of 189 TCGA samples separated the colorectal tumours into three clusters. One significantly overlapped with CIMP-H tumours and was enriched for hypermutated tumours, thereby representing an MSI/CIMP subgroup. The other two represented a CIN subgroup and an invasive phenotype subgroup.

Intrinsic subtypes of colorectal cancer identified by independent groups
Roepman and colleagues also identified three molecular CRC subtypes, (A) MMR-deficient epithelial, (B) proliferative epithelial and (C) mesenchymal, using unsupervised clustering of whole-genome expression data from 188 CRC tumour samples [9]. These intrinsic subtypes were subsequently validated in a cohort of 543 patients with stage II-III disease. In addition to identifying subtypes with phenotypes matching those identified by TCGA, this study investigated prognosis and chemotherapy benefit in each subtype. The dMMR subtype A (22%) was epithelial-like and displayed a strong MSI phenotype linked to dMMR, with a high mutation rate including activating BRAF mutations. Type A patients had the best prognosis, with minimal benefit from adjuvant 5-FU chemotherapy. The mesenchymal subtype C (16%) tumours exhibited epithelial-to-mesenchymal transition and showed dMMR characteristics. These patients had a poor baseline prognosis and no benefit from adjuvant 5-FU chemotherapy, which is probably linked to their mesenchymal phenotype and low proliferative activity. The proliferative epithelial subtype B (62%) tumours were almost exclusively MSS, BRAF wild type and MMR proficient; these patients had a relatively poor baseline prognosis but derived the most benefit from adjuvant chemotherapy. Because this study focused on stage II and III CRC, further validation of the subtype classification and its clinical relevance in a larger set of stage IV tumours is warranted.
De Sousa E Melo and colleagues likewise identified three similar subtypes using over 1100 CRC tumour samples: chromosomally unstable (subtype A), microsatellite unstable (subtype B) and a third subtype (subtype C), which is largely microsatellite stable and contains relatively more CIMP-H carcinomas but cannot be identified on the basis of characteristic mutations [10]. This third subtype therefore resembles the third subtype described in each of the studies above. It was associated with a very unfavourable prognosis as well as resistance to anti-EGFR targeted therapy. It is thought to be related to sessile-serrated adenomas, as both share a very similar GEP involving upregulation of genes involved in matrix remodelling and epithelial-mesenchymal transition (EMT). This study therefore suggests that sessile-serrated adenomas and tumours belonging to subtype C possess high malignant potential and need to be clinically managed as such [10].
Other groups have used GEPs from large numbers of tumour samples to identify more than three intrinsic subtypes of CRC, and have investigated the biological relevance of these subtypes with regard to treatment response and prognosis. Marisa and colleagues used a large multicentre cohort of tumour samples from patients with stage I-IV CRC, of which 566 fulfilled RNA quality requirements for GEP analysis [11]. These samples were split into a discovery set (n = 443) and a validation set (n = 1029), which also included 906 samples from eight public datasets. Unsupervised hierarchical clustering was applied to gene expression data from the discovery set to identify six molecular subtypes (C1-C6) with distinct clinicopathological features, molecular alterations, enrichment of supervised gene expression signatures and deregulated signalling pathways. In addition to a dMMR subtype (C2), three CIN subtypes were identified (C1, C5 and C6): one with downregulated immune pathways (C1), one with upregulation of the Wnt pathway (C5) and one displaying a normal-like GEP (C6). The remaining two comprised a KRAS mutant subtype (C3) and a cancer stem cell subtype (C4).
As expected, BRAF mutation was associated with the C2 subtype, but it was also frequent in C4, a CIMP-H subtype with poor prognosis. Although TP53 and KRAS mutations were found in all subtypes, the C3 subtype was highly enriched for KRAS-mutant tumours, suggesting a specific role for this mutation in this subtype of CRC. The biological relevance of these six subtypes is highlighted by their differing prognoses, with the C4 and C6 subtypes independently associated with the shortest relapse-free survival (RFS). However, the robustness of this gene signature as a prognostic classification requires further confirmation, as some established prognostic factors in CRC, such as tumour grade and number of nodes examined, were unavailable for a significant proportion of cases and were therefore not included in the analysis.
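Subtype discovery in studies such as that of Marisa and colleagues rests on unsupervised hierarchical clustering of expression profiles. The following Python sketch shows the general idea on a toy matrix: samples are compared with a correlation-based distance (1 minus Pearson's r), and the two closest clusters, measured by average linkage, are merged until a chosen number of clusters remains. The distance, linkage rule, sample names and values are illustrative assumptions, not the settings of any published study.

```python
from itertools import combinations
from statistics import mean

# Toy expression matrix (hypothetical): 5 tumour samples x 4 genes.
TOY_PROFILES = {
    "T1": [8.0, 7.5, 2.0, 1.0],
    "T2": [7.8, 7.9, 2.2, 1.5],
    "T3": [2.0, 1.5, 8.0, 7.0],
    "T4": [1.0, 2.5, 7.5, 8.2],
    "T5": [8.5, 6.9, 1.1, 2.0],
}


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with average linkage on the distance 1 - Pearson r.

    Starts with one cluster per sample and repeatedly merges the pair of
    clusters with the smallest mean pairwise distance until n_clusters remain.
    """
    names = list(profiles)
    dist = {}
    for a, b in combinations(names, 2):
        d = 1.0 - pearson(profiles[a], profiles[b])
        dist[(a, b)] = dist[(b, a)] = d

    def cluster_distance(c1, c2):
        return mean(dist[(a, b)] for a in c1 for b in c2)

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    for cluster in average_linkage(TOY_PROFILES, n_clusters=2):
        print(cluster)
```

On the toy data the two recovered clusters are T1, T2 and T5 versus T3 and T4, mirroring the two opposing expression patterns built into the example. In real analyses the number of clusters is not known in advance and has to be chosen and checked for robustness rather than fixed as here.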
Schlicker et al. performed genome-wide mRNA expression profiling of 62 primary CRC samples and applied an unsupervised iterative clustering approach [12]. Two main groups were identified (type 1, mesenchymal, and type 2, epithelial), which were further split into five subtypes that were validated in independent published datasets comprising over 1600 samples. This subtype stratification was successfully aligned to several CRC cell line panels, and the GEPs defining the subtypes were found to be well represented in these cell lines. Pharmacological response data showed that type 2 cell lines were more sensitive to aurora kinase inhibitors, in keeping with the high expression of aurora kinase A seen in samples of this subtype. Additional data suggested that subtype 1.2 cell lines were most sensitive to Src inhibition and also showed higher sensitivity than subtype 2.1 to inhibition of PI3K pathway proteins (GSK3β, PI3K and TOR) [12].
A further study identified five subtypes, designated A to E [13]. These subtypes showed distinct biological motifs and morphological features as well as differences in prognosis, and were validated in an independent dataset of 720 CRC expression profiles. Subtype C was enriched for both MSI and BRAF mutations, and its characteristics were in keeping with the CIMP-H phenotype and the hypermutated tumours found in the TCGA analysis. This subtype had one of the best outcomes for RFS but the worst survival after relapse (SAR). Once again, KRAS mutations were found in all subtypes, supporting the emerging view that KRAS-mutant CRCs are highly heterogeneous and that the oncogenic role of KRAS varies with the specific mutation and the molecular background of the tumour in which it occurs [20]. Subtypes C and D were associated with the worst overall survival (OS): for subtype D, this was primarily due to early relapse associated with high EMT gene expression and low proliferation-associated gene expression, whereas for subtype C, it was the result of short SAR.
Subtypes B and E highly expressed canonical Wnt signalling target signatures, whereas subtypes A and D, as well as normal samples, expressed low levels of these signatures. This was concordant with the high percentages of β-catenin-positive nuclei seen in subtypes B and E and the correspondingly low percentages seen in subtypes A and D. This analysis supports data suggesting that the colon stem cell signature, when canonical Wnt target genes are silenced, is associated with a higher risk of recurrence (subtype D) [21].
Sadanandam and colleagues analysed GEPs from 1290 CRC samples using consensus-based unsupervised clustering, and correlated the resulting clusters with response to cetuximab using a dataset of 80 patients annotated with therapeutic response [14]. These analyses identified five clinically relevant CRC subtypes, named according to the genes preferentially expressed in each. The transit-amplifying subtype was found to contain two groups that differed in cetuximab sensitivity, and was therefore split into cetuximab-sensitive and cetuximab-resistant sub-subtypes, giving six subgroups in total. The cetuximab-sensitive sub-subtype showed the best response to cetuximab, whereas the cetuximab-resistant sub-subtype showed increased sensitivity to cMET inhibition.
Response to standard chemotherapy with FOLFIRI (folinic acid, 5-FU and irinotecan) was also investigated, and the analyses suggested that stem-like subtype tumours, in both the adjuvant and metastatic settings, and inflammatory-subtype tumours in the adjuvant setting may be best treated with FOLFIRI [14]. The transit-amplifying sub-subtypes and the goblet-like subtype were unlikely to respond to FOLFIRI in the adjuvant setting, potentially sparing some patients the toxicity of futile treatment. These findings warrant further retrospective and prospective validation, particularly as FOLFIRI has not shown a survival benefit in unselected CRC patients in the adjuvant setting.

Outcomes of integrative molecular analysis in CRC
As described above, up to six molecular subtypes of CRC have been identified by these independent groups, but only superficial similarities exist between the studies. The main characteristics of these subtypes are summarised in Table 2. Two subtypes have been repeatedly identified (one enriched for microsatellite instability and one with high expression of mesenchymal genes), but full consistency amongst the others has not been achieved, probably owing to the underlying biological complexity of this cancer and the significant overlap of features between subgroups. Methodological differences in sample processing and analysis have also contributed to these inconsistencies. In addition, most samples in these datasets were derived from primary tumours, so their applicability to advanced disease needs to be considered, as the molecular makeup of metastases may differ from that of primary tumours, particularly in response to the tumour microenvironment and immune cell infiltrate. Together, these factors have limited the usefulness of these subclassification systems in clinical practice.

The consensus molecular subtypes of colorectal cancer
More recently, to resolve the inconsistencies between subclassification systems and to aid clinical translation, the CRC research community formed an international consortium dedicated to large-scale data sharing and analytics [22]. By analysing the independent transcriptome-based classification systems across 18 CRC datasets comprising 4151 patients in total, and applying unsupervised clustering techniques, the consortium proposed four robust consensus molecular subtypes (CMSs) with distinguishing features. Tumours with mixed features (approximately 13%) were thought to represent a transition phenotype or intratumoural heterogeneity. Table 3 summarises the main biological, molecular, clinical and prognostic associations of the four consensus subtypes.
In terms of genomic aberrations, CMS1 samples were hypermutated and encompassed the majority of MSI-H tumours. This group also displayed widespread hypermethylation and a low prevalence of somatic copy number alterations (SCNAs). The CMS2 and CMS4 subgroups displayed higher CIN, with high SCNA counts. CMS3 samples had fewer SCNAs than other CIN tumours, a significant proportion (30%) of hypermutated tumours and intermediate levels of gene hypermethylation [22]. Despite clear enrichment of certain gene mutations within CMS groups, such as high rates of BRAF mutation in CMS1 and of KRAS mutation in CMS3, no single genetic aberration was limited to one subtype, and no subtype was defined by a single molecular event.
Further integrative genomic analysis did not reveal any clear associations either, highlighting the poor genotype-phenotype correlation in this cancer. However, exploration of gene expression data provided insight into the underlying biology of the subtypes. CMS1 samples showed strong immune activation and infiltration with CD4+ T helper cells, CD8+ cytotoxic T cells and natural killer (NK) cells, along with strong activation of immune evasion pathways. CMS2 showed marked upregulation of Wnt and MYC downstream targets and higher expression of the oncogenes EGFR, ERBB2, insulin-like growth factor 2 (IGF-2) and insulin receptor substrate 2 (IRS-2), and of the transcription factor hepatocyte nuclear factor 4α (HNF4α). CMS3 samples were enriched for multiple metabolism signatures, in keeping with the notion that activating KRAS mutations induce prominent metabolic adaptation [23,24]. CMS4 tumours showed upregulation of genes associated with EMT, such as transforming growth factor β (TGF-β) and integrins, as well as of genes associated with stromal invasion.
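One simple way to quantify how closely subtypes from two independent classification systems correspond is the Jaccard index between the sets of samples each subtype contains: the size of their intersection divided by the size of their union. This is an illustrative choice for showing the idea of comparing classifiers, not a description of the consortium's exact pipeline, and the classifier labels and sample identifiers below are hypothetical.

```python
# Hypothetical illustration: labels and sample IDs are invented, and this is not
# the consortium's actual analysis pipeline.


def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def subtype_members(labels: dict) -> dict:
    """Invert {sample: subtype} into {subtype: set of samples}."""
    members = {}
    for sample, subtype in labels.items():
        members.setdefault(subtype, set()).add(sample)
    return members


def concordance_table(labels_x: dict, labels_y: dict) -> dict:
    """Jaccard index for every (subtype_x, subtype_y) pair, using only samples
    labelled by both classifiers."""
    shared = labels_x.keys() & labels_y.keys()
    members_x = subtype_members({s: labels_x[s] for s in shared})
    members_y = subtype_members({s: labels_y[s] for s in shared})
    return {
        (sx, sy): jaccard(ax, ay)
        for sx, ax in members_x.items()
        for sy, ay in members_y.items()
    }


CLASSIFIER_X = {"s1": "X1", "s2": "X1", "s3": "X2", "s4": "X2", "s5": "X3"}
CLASSIFIER_Y = {"s1": "Y1", "s2": "Y1", "s3": "Y2", "s4": "Y2", "s5": "Y2", "s6": "Y1"}

if __name__ == "__main__":
    table = concordance_table(CLASSIFIER_X, CLASSIFIER_Y)
    for pair, score in sorted(table.items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 2))
```

Here X1 and Y1 contain exactly the same samples (index 1.0), whereas X2 and X3 both fall within Y2 and so score lower (about 0.67 and 0.33). Pairs with high overlap across many classifiers are candidates for a shared subtype; the sample s6, labelled by only one classifier, is excluded from the comparison.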

Clinical and prognostic associations of the consensus molecular subtypes
Associations between CMS subgroups, clinical features and prognosis were also investigated. CMS1 tumours were more common in females, more likely to be right-sided and of higher histopathological grade. Conversely, CMS2 tumours were more likely to be left-sided and to present at more advanced stages. CMS4 tumours showed the worst OS and RFS, even after adjustment for BRAF and KRAS mutations and MSI status. CMS1 tumours displayed good survival but very poor SAR, consistent with known data on MSI tumours associated with BRAF V600E mutations. The CMS2 and CMS3 subgroups displayed intermediate survival, but superior survival after relapse was noted in the CMS2 subgroup.
The prognostic associations of the CMS subtypes, and their association with response to biological therapies, have been further explored through retrospective analysis of large clinical trial datasets. A total of 392 KRAS wild-type samples from the CALGB 80405 dataset were analysed on a NanoString platform to determine their CMS classification, which was then correlated with survival [25]. CMS1 tumours treated with bevacizumab had significantly longer OS than those treated with cetuximab, whereas CMS2 tumours treated with bevacizumab showed a trend towards shorter OS than those treated with cetuximab. A meta-analysis of six randomised trials, including the CRYSTAL and FIRE-3 datasets, also confirmed improved PFS and OS with anti-EGFR antibodies in left-sided tumours (which are enriched for CMS2), compared with no significant benefit in right-sided tumours (which are enriched for CMS1) [26]. No survival differences were found between left- and right-sided tumours treated with bevacizumab. This suggests that the sidedness of the primary tumour determines the efficacy of biological therapies, which may be explained by biological differences between tumours arising on different sides of the bowel: left-sided tumours overexpress the EGFR ligands amphiregulin (AREG) and epiregulin (EREG) and display amplifications of markers of cetuximab sensitivity, whereas right-sided tumours show reduced expression of EGFR ligands [27].

Clinical utility of the consensus molecular subtypes
Much hope was placed on the CMS classification system enabling stratification of patients in clinical trials, to validate the prognostic and predictive value of the subgroups and to enable translation into clinical care. Although CMS classification has refined the large 'non-MSI' group of CRC patients and provided a tool for systematic interrogation, some data suggest that critical clinical information that predicts outcome is still not captured by this classification system. For example, a separate analysis of the CALGB 80405 dataset found that sidedness of the primary tumour remained an independent prognostic factor over and above CMS subtype [28].
The association of the CMS subtypes with treatment outcomes, especially in the metastatic setting, still requires further exploration and validation. In their analysis of the NSABP-C07 trial, Kim et al. found that colorectal cancer assigner (CRCA) subtyping defined an oxaliplatin-benefit group more clearly than CMS subtyping did [29]. It is also important to consider the 13% of samples that could not be classified into CMS subtypes, the need to better characterise samples with mixed phenotypes and the clinical implications of this.
Reproducibility is a further challenge, as this classification system requires complex transcriptomic, proteomic and genomic analyses, and its implementation in its current form is not feasible in many centres. Work has already been undertaken to develop a robust and practical classifier based on immunohistochemistry (IHC), which appears promising but requires prospective validation [30].
Overall, the clinical utility and widespread reproducibility of this classification system in CRC remain to be determined, and it is likely that further characterisation will lead to additional subtyping of the four described subtypes in the future.

Conclusions
Much progress has been made in our understanding of the complex underlying biology of CRC that leads to heterogeneous drug responses and outcomes. Comprehensive integrative molecular analysis has led to the identification of molecularly distinct subgroups within this disease, and the consensus molecular subtypes have enabled some refinement of these subgroups. However, the widespread reproducibility and clinical utility of CMS classification still need to be confirmed. Vast amounts of data are being generated from molecular classification systems, and these need to be prospectively integrated into clinical trial design in order to confirm biomarkers of response and resistance and to allow rational combinations of therapies to be explored. The ultimate goal is to streamline biomarker and drug codevelopment and to recruit patients to innovative clinical trials of targeted agents to which they are more likely to respond, based on the underlying molecular makeup of their tumours.
© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Author details
Avani Athauda and Ian Chau* The Royal Marsden NHS Foundation Trust, London and Surrey, United Kingdom *Address all correspondence to: Ian.chau@rmh.nhs.uk