Techniques used for the detection of alternative splicing
In this review, we discuss the merit of splicing isoforms as a source of biomarkers for ovarian cancer with a special focus on features that distinguish splice variants from global gene expression based markers. Key examples demonstrating the usefulness of alternative splicing (AS) as markers of ovarian cancer are described.
Ovarian cancer is a low-incidence cancer with a high mortality rate. The asymptomatic nature of this cancer and the late-stage diagnosis of most tumors are the reasons why surgery and chemotherapy are often ineffective. Accordingly, intensive research aims at increasing overall patient survival and quality of life by providing biomarkers for 1) early detection and 2) prediction of chemotherapy response and/or suggestion of alternative strategies. CA-125 is a glycoprotein that is usually expressed in a variety of epithelial cells, and its serum level rises in advanced ovarian cancer. However, its use as an early detection marker or as a tool to screen the general population has not been approved so far [4,5]. The CA-125 level is helpful in treatment decision-making but does not improve overall survival or quality of life [6,7]. Clearly, there is still a great need for biomarkers, or combinations of biomarkers, that could identify early ovarian cancer lesions with great certainty or increase patients’ survival.
Genome-wide mRNA profiling presents an opportunity to rapidly identify RNA markers. Microarray platforms have been applied on numerous occasions to provide gene expression signatures that correlate with prognosis or indicate chemotherapy response. However, differences in the platforms used to carry out the experiments, the analysis methods and the sample sets make inter-laboratory comparison very difficult and the search for reliable markers complicated. Indeed, a meta-analysis regrouping 829 samples failed to demonstrate the predictive power of 16 individual gene expression signatures. Consequently, very few microarray markers have reached the clinic. In contrast, high-throughput protein signatures based on mass spectrometry platforms appear to show much more overlap in the peaks found by different experimental studies. However, the pace at which protein biomarkers are translated into the clinical setting is relatively slow. Clearly, there is a need for novel methodology to discover ovarian cancer biomarkers that can yield reliable results and produce tests that could be quickly integrated into the normal clinical setting. In this chapter, we discuss the potential of splice variant annotations as a tool for the discovery of ovarian cancer markers, along with the challenges and promises of this hidden mine.
2. Pre-mRNA splicing mechanism and regulation
Transcription of messenger RNA (mRNA) is the first step of converting the DNA code into functional proteins. This process was often seen as a linear cascade of events that includes mRNA capping, splicing, polyadenylation, export to the cytoplasm and translation to produce a single protein. In reality, however, a single pre-mRNA can produce many mRNAs through the process of AS, and this in turn leads to the production of several proteins from a single gene. Splicing is the process by which the protein-coding exons (typically hundreds of nucleotides in length) are joined together after the removal of large non-coding introns (typically thousands of nucleotides in length) to form the coding sequence. In some genes, this process leads to one outcome and is thus named constitutive (Fig. 1), but in most cases it leads to more than one outcome and is thus called alternative (Fig. 2). Both processes are mediated by the spliceosome, a specialized machinery that recognizes consensus RNA sequences [13,17]. The spliceosome component U1snRNP binds the 5’ splice site (5’ss), splicing factor 1 (SF1) binds the branch point site (BPS) adenosine, the U2 auxiliary factor 65 kDa subunit (U2AF65) binds the poly-pyrimidine tract (PPT) and the U2 auxiliary factor 35 kDa subunit (U2AF35) binds the 3’ splice site (3’ss) (Fig. 1A). The last two components are subsequently replaced by U2snRNP and, following complex base-pairing rearrangements and RNA-protein interactions involving hundreds of proteins, the spliced mRNA, the intron by-product and the spliceosome components are released. Chemically speaking, the splicing reaction proceeds in two trans-esterification steps (Fig. 1B). The first step involves the attack by the 2’ hydroxyl of the branch point adenosine on the phosphate at the 5’ss, releasing at the same time the 3’ end of the upstream exon. The second step involves the attack by the newly freed 3’ hydroxyl of the upstream exon on the phosphate at the 3’ss, joining the two exons and liberating the intron in the form of a lariat. This cycle of spliceosome assembly/disassembly is repeated for every intron of a gene on the nascent RNA transcript.
When the splice sites of some exons are weak, or when introns with suboptimal sequences exist, splicing may become less accurate and may depend on the factors that influence the splicing of competing exons, consequently producing mRNA versions with different exon pairs. AS affects the majority of multi-exon genes and is believed to be the principal driver of proteome diversity. As illustrated in Figure 2, two 5’ss can compete for a single 3’ss or, inversely, two 3’ss can compete for a single 5’ss. These types of alternative splicing events (ASEs) are referred to as alternative 5’ (alt5’, Fig. 2A) and alternative 3’ (alt3’, Fig. 2B), respectively. The most frequent type of ASE in humans is the full skipping of an exon (cassette exon, Fig. 2C). Some exons are also skipped as a block (multiple cassette exons, Fig. 2D) or are mutually exclusive (Fig. 2E). AS can also be coupled to other regulatory mechanisms such as polyadenylation (Fig. 2F). In this case, the resulting mRNA exhibits a different 3’ untranslated region (UTR), which is in turn subject to different regulation by small non-coding RNAs (e.g. microRNAs). In about one out of three cases, the AS decision introduces a sequence containing a premature stop codon. In these cases, the resulting mRNA is flagged for degradation by the nonsense-mediated decay machinery, creating an efficient mechanism that controls gene expression post-transcriptionally (Fig. 2G). In some cases, ASEs occur outside the coding region and influence regulatory sequences in the UTR. These different forms of splicing isoforms should not be confused with those generated by alternative transcription start sites, where a single gene might be transcribed from different promoters (Fig. 2H). In this case, and unlike alternative splicing, there is little chance that the isoform will have a different protein sequence unless a new protein-coding exon is added in frame for translation initiation.
ASEs are normally associated with low sequence conservation near the splice site and are instead usually linked to RNA-binding motifs that may enhance or repress exon inclusion [23,24]. Motifs that enhance exon inclusion often recruit splicing factors such as the SR protein family, which in turn interact with the spliceosome via an arginine/serine-rich domain to increase recognition of weak 5’ss and 3’ss (Fig. 3A). On the other hand, splicing motifs that promote exon exclusion bind members of the hnRNP family, which oligomerize along the exon, block U snRNA recruitment or loop out the alternative exon (Fig. 3B to D). Similarly, sequence motifs in introns may bind SR or hnRNP proteins to influence splicing, but in this case SR protein binding results in exon exclusion and hnRNP binding in exon inclusion. This is most likely because hnRNPs define intronic regions whereas SR proteins define exon locations. Usually, these different enhancer and repressor protein families work together to define the final outcome of any ASE (Fig. 3E) [29,30]. One of the most conserved intronic motifs downstream of alternative exons is the UGCAUG motif [31,32], which binds the tissue-specific splicing factor family RBFOX. In general, it is suggested that tissue-specific splicing factors favor exon inclusion when bound to introns downstream of alternative exons and exclusion when bound upstream. This rule is beginning to be appreciated for several splicing factors such as Celf, epithelial splicing regulatory protein, Nova and RBFOX (Fig. 3F).
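The positional rule described above can be illustrated with a minimal Python sketch. It scans the introns flanking an alternative exon for the UGCAUG motif and reports the direction the rule would suggest. The function names, the "ambiguous" and "no motif" labels and the toy sequences are assumptions made for illustration only; real splicing outcomes depend on many cooperating factors, so this is not a predictive tool.

```python
MOTIF = "UGCAUG"  # conserved intronic motif named in the text


def find_motif(sequence, motif=MOTIF):
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    rna = sequence.upper().replace("T", "U")  # accept DNA or RNA letters
    hits = []
    start = rna.find(motif)
    while start != -1:
        hits.append(start)
        start = rna.find(motif, start + 1)
    return hits


def predicted_effect(upstream_intron, downstream_intron):
    """Apply the positional rule: downstream motif -> inclusion, upstream -> exclusion."""
    upstream_hit = bool(find_motif(upstream_intron))
    downstream_hit = bool(find_motif(downstream_intron))
    if downstream_hit and not upstream_hit:
        return "inclusion"
    if upstream_hit and not downstream_hit:
        return "exclusion"
    if upstream_hit and downstream_hit:
        return "ambiguous"  # hypothetical label: the rule alone cannot decide
    return "no motif"


if __name__ == "__main__":
    # Toy intron fragments, invented for this example.
    print(predicted_effect("GUAAGUCCUUAACC", "GUAAGAUGCAUGCC"))
```

Sequences may be given as DNA or RNA; thymine is converted to uracil before the search.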
3. The advantage of alternative splicing as a source of ovarian cancer biomarkers
Analysis of the ovarian cancer proteome using mass spectrometry is undoubtedly the most direct approach for the identification of biomarkers that could be readily implemented in the clinic. However, the difficulty of generating specific antibodies for the large number of potential markers produced by this approach makes marker validation very difficult. In contrast, the validation of nucleic acid markers generated through microarray or deep-sequencing screens is fairly simple and is often achieved by polymerase chain reaction (PCR) [34,37]. Furthermore, the function of these potential markers can easily be verified through the knockdown of gene expression using RNA interference (RNAi) strategies. However, scoring global changes in gene expression as markers for ovarian cancer limits the assay to ~25,000 genes in the genome, while it is estimated that human cells contain more than 100,000 proteins. This limitation is no longer an issue when we consider the expression of specific splice variants, the number of which equals or exceeds the number of cellular proteins. In addition, it is much easier to predict the function of an alternative splice variant than that of a peptide marker. For example, while the role of the well-established marker CA-125 remains unclear after 25 years of research, one can often predict the function of a splicing marker from the protein domain eliminated or included through AS, as is the case for the tyrosine kinase SYK. In this case, exon skipping removes a nuclear localization domain, leading to the accumulation of the protein in the cytoplasm and elegantly explaining the loss of nuclear function associated with cancer. Predicting the impact of AS is particularly attractive for biomarker development when the alternative exon encodes a plasma membrane transmembrane domain or an extracellular protease cleavage site. In these cases, one would be able to predict whether the cancer-associated marker leads to an increase or decrease in the secretion of a membrane-anchored protein, information that is difficult to obtain using global gene expression profiles.
4. The challenges of detecting splicing isoforms
Examples of alternatively spliced genes have been steadily accumulating in the literature for more than 20 years, and the discovery rate has been greatly accelerated by recent technological advances such as transcriptome sequencing. Indeed, while early estimates of the proportion of alternatively spliced genes, based on Northern blots and endpoint RT-PCR, were around 5% of human genes, transcriptome sequencing revealed ASEs in 95% of human genes with multiple introns. Different techniques have different capacities to reveal ASEs (see Table 1), and splice variants remained difficult to detect for many years, which explains why they are not regularly considered as a source of biomarkers by most clinicians.
Table 1
|Technique|Throughput|Advantages|Disadvantages|
|---|---|---|---|
|Northern blot| |Several isoforms can be detected in a single sample|Large amount of RNA needed; restricted by gel resolution|
|Endpoint RT-PCR|Low to medium|Considered the gold standard for validation; results easy to analyze; throughput is enhanced when coupled to capillary electrophoresis|Labour-intensive if polyacrylamide gels are used to separate PCR products; low quantitative range; restricted by gel resolution|
|Real-time PCR|Low to medium|Provides already validated data; large quantitative range; accurate data in fixed tissues|Custom primer design|
|Microarray|Medium|Some arrays are commercially available|Complex analysis; results need PCR validation|
|Next generation sequencing|Medium to high|Independent of genome annotation (discovery of novel splicing isoforms)|High cost prevents the use of biological replicates; long multi-step procedure; results need PCR validation|
Back in the 1980s, splicing isoforms were mainly detected by Northern blot, which separates transcripts by size and estimates relative mRNA abundance using internal controls. However, this method is difficult to adopt in a clinical setting and requires a large amount of RNA (µg), which is difficult to obtain from clinical samples. Later, the advent of reverse transcription and PCR amplification greatly facilitated the detection of splice variants. Splicing isoform amplification is achieved using PCR primers designed to hybridize to the constitutive exons flanking the ASE of interest (Fig. 4A). The products are separated on agarose gels or by capillary gel electrophoresis, and the ratio of the long and short isoforms is quantified and presented as ψ (percent splicing index): the molarity of the long isoform over the sum of the molarities of the long and short isoforms (Fig. 4A). Even though competitive PCR reactions are quantitative only over a narrow range [43,45], endpoint PCR is still the preferred technique to detect splicing isoforms owing to its ease of use and low cost.
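As a worked illustration of the ψ definition above, the following Python sketch computes ψ from the molarities of the two PCR products. The function names are hypothetical, and the mass-to-molarity conversion assumes an average of about 660 g/mol per base pair of double-stranded DNA, a common approximation that is not stated in the text. The conversion matters because a longer product carries more mass per molecule, so comparing raw mass or band intensity would overestimate the long isoform.

```python
def molarity_nM(ng_per_ul, length_bp):
    """Convert a dsDNA mass concentration (ng/ul) to nM.

    Assumes ~660 g/mol per base pair (approximation, not from the text).
    """
    if length_bp <= 0:
        raise ValueError("product length must be positive")
    return ng_per_ul * 1e6 / (660.0 * length_bp)


def percent_splicing_index(long_molarity, short_molarity):
    """Return psi = 100 * long / (long + short), with both inputs as molarities."""
    if long_molarity < 0 or short_molarity < 0:
        raise ValueError("molarities must be non-negative")
    total = long_molarity + short_molarity
    if total == 0:
        raise ValueError("neither isoform was detected")
    return 100.0 * long_molarity / total


if __name__ == "__main__":
    # Invented example: a 300 bp long product and a 200 bp short product.
    long_nM = molarity_nM(3.0, 300)
    short_nM = molarity_nM(2.0, 200)
    print(round(percent_splicing_index(long_nM, short_nM), 1))
```

In this invented example the long product has 1.5 times the mass of the short one, yet the two are equimolar, so ψ is 50%.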
The gold standard for mRNA quantification is real-time PCR, which, unlike standard endpoint PCR, detects the amount of product accumulating after each cycle of amplification and permits accurate comparison of different samples. This type of PCR requires the use of fluorescent probes or dyes that permit detection by specialized sensors. Despite its accuracy, this detection method is rarely used for the detection of splice variants because isoform-specific amplification is difficult to achieve [43,49]. Primers required for the amplification of the short isoform need to bind to a short unique sequence created by the exon-exon junction, which severely restricts the design (Fig. 4B). However, systematic evaluation of isoform-specific design parameters and the availability of new algorithms for primer selection have greatly facilitated the detection of ASEs from any species. Indeed, universal PCR conditions and ease of primer design have brought real-time PCR to the point where it can compete with high-throughput detection methods such as microarrays in terms of ASE coverage.
Microarrays were introduced as a method for genome-wide expression profiling in 1995, but their use to detect splice variants was reported only in 2003. It took 8 years to develop methods that could distinguish between the hybridization patterns of two closely related transcripts and to develop chips with high enough density to accommodate the thousands of splicing isoforms (Fig. 4C). Early attempts to extract splicing patterns from expression microarrays generated high false-positive rates. Therefore, strategies were developed to probe exon-exon junctions (junction arrays). In this case, alternative exons are defined by very low or very high signals emanating from two consecutive splice junctions. Another popular strategy is to use exonic probes in addition to exon-exon junction probes (exon/junction arrays). In every case, the high similarity of exon-exon junctions tends to favor non-specific hybridization, and in some analysis procedures the information is restricted to splicing isoform “detection” rather than true quantification. The most successful quantification of splicing isoforms by microarray was achieved by relying solely on exonic probes [54-56]. However, the success of this method was limited by its dependence on a small set of pre-selected splice variants [53,57]. To allow the discovery of new splicing isoforms, a fourth strategy that considers all putative exons (tiling arrays) was developed. However, the high number of probes required for this method restricted coverage to only a small fraction of the genome. Not surprisingly, these difficulties hampered the application of microarrays to the study of ovarian cancer splicing isoforms. Indeed, to date there is no report of microarray-based profiling of ovarian cancer splice variants.
In theory, the most promising approach for the detection of ovarian cancer splicing isoforms is transcriptome sequencing. Next generation sequencing (NGS) technology provides massively parallel sequencing of nucleotide sequences in miniaturized microsystems. Several platforms are commercially available and their unique technologies are discussed elsewhere [59,60]. The specific application of mRNA quantification through sequencing (RNA-seq) has been demonstrated for different cancer types (e.g. lung and prostate) but not for ovarian cancer thus far. Encouraging progress has recently been made in refining the analytical pipeline to allow accurate quantification of splicing isoforms [37,63,64]. However, the complexity of the analytical pipeline for sequencing data and the cost of the sequencing reads necessary to detect splice variants will slow the rate at which this technique is applied to the discovery of splicing-dependent biomarkers (Fig. 4D). In addition, secondary techniques such as PCR will still be needed to validate the accuracy of the data generated and confirm it in a large number of clinical samples. Indeed, the majority of the AS information in ovarian cancer is derived from PCR-based techniques (see Tables 2 and 3).
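The design constraint described above, a primer that must sit on the exon-exon junction unique to the short isoform, can be sketched in Python as follows. The exon sequences are invented toy data, the 10-nucleotide flank is an arbitrary assumption and the function names are hypothetical. Real primer design also weighs melting temperature, secondary structure and off-target matches, none of which is modelled here.

```python
def junction_window(upstream_exon, downstream_exon, flank=10):
    """Return the last `flank` nt of the upstream exon joined to the
    first `flank` nt of the downstream exon (the junction-spanning target)."""
    if flank <= 0:
        raise ValueError("flank must be positive")
    return upstream_exon[-flank:] + downstream_exon[:flank]


def is_isoform_specific(window, other_isoform):
    """True if the junction window is absent from the competing isoform."""
    return window not in other_isoform


if __name__ == "__main__":
    # Toy exon sequences, invented for this example.
    exon1 = "ATGGCCAAGCTGACCTTCGA"
    cassette = "GGTTCACAGTCCATGAGCAA"
    exon3 = "CTGGATCCGTTAGCAATGCC"

    long_isoform = exon1 + cassette + exon3   # cassette exon included
    short_isoform = exon1 + exon3             # cassette exon skipped

    target = junction_window(exon1, exon3)
    print(target)
    print(target in short_isoform, is_isoform_specific(target, long_isoform))
```

The window exists only where exon 1 is joined directly to exon 3, which is why the usable sequence for a short-isoform primer is so restricted.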
Table 2
|Numerous DNA mutations inactivate BRCA1/2 function (deletion of an essential domain or protein truncation) through aberrant splicing|
|AML1bDel179-242 expression inversely correlates with overall survival|
|The ratio of total to full-length KLF6 expression correlates with ovarian tumor grade|
|p53δ expression is associated with impaired response to first-line chemotherapy|
|*The ratio of Fibulin C/D increases in ovarian tumors|
|*Osteopontin-c is undetectable in normal ovaries and present in ovarian tumors|
Table 3
|This signature distinguishes normal ovaries from ovarian tumors regardless of grade, stage or histotype (serous, mucinous, endometrioid, mixed type)|
|This signature distinguishes normal ovaries from serous high-grade ovarian tumors|
|This cancer epithelial signature (CES) distinguishes normal Fallopian tube epithelium tissues from ovarian epithelial cancer cells|
|This cancer stromal signature (CSS) distinguishes normal tissues from ovarian tumors independent of the epithelial content of the tissue compared|
5. Examples of alternative splicing based ovarian cancer biomarkers
5.1. Gene specific discovery of splicing markers
PCR-based analysis of specific genes associated with ovarian cancer has revealed a number of ovarian cancer-associated splicing events. The most promising of these potential biomarkers for diagnosis, prognosis, chemoresistance and grade are listed in Table 2 and are further described in the text below.
5.2. Splicing markers generated through genome-wide expression profiling
The advent of splicing-sensitive high-throughput techniques opens the door to monitoring a large number of randomly selected ASEs rather than being limited to a few candidate genes (see Table 3). The recent use of high-throughput RT-PCR, which couples PCR reactions in 384-well plates to capillary gel electrophoresis on a 96-well Caliper station, dramatically increased the number of confirmed ovarian cancer-associated splicing events. Initially, exon-exon junctions were systematically analyzed for a set of 600 cancer-related genes in four different pools of normal and cancerous ovarian samples. The resulting ASEs were subsequently validated using an independent set of 21 normal ovaries and 25 ovarian cancer samples, yielding 48 ASE markers. Later, a focus on a collection of 2168 highly curated ASEs (RefSeq NCBI build 36) yielded 288 ASE markers using roughly the same sample set. The relatively high number of ASE markers found, coupled to the fact that several were related to the epithelial-mesenchymal transition, raised the possibility that a large fraction of the discovered events might result from differences in the cell types compared (normal ovaries are largely composed of stromal cells, whereas ovarian tumors have a typical epithelial content of around 75%). This question was addressed when 9 ovarian tumors were microdissected to isolate RNA from stromal cells (tumor microenvironment) and epithelial cancer cells separately. A real-time PCR-based screening strategy coupled to an updated version of RefSeq NCBI build 36 (3313 ASEs) yielded a small but unambiguous set of cancer-specific splicing isoforms, the cancer epithelial signature (CES). Surprisingly, the tumor microenvironment also appears to contain promising splicing isoform RNA markers. Indeed, this cancer stromal signature (CSS) might be able to diagnose early ovarian tumors, as it clusters low malignant potential and low-grade tumors within normal ovaries and Fallopian tube samples, although this study was performed on a small number of tissues.
The possibility that the ovarian tumor microenvironment may be a source of splicing isoform markers raises interesting questions regarding the studies conducted on whole tumors. First, some of the RNA transcripts detected may actually come from cells of the microenvironment. For example, fibulin and fibronectin are two extracellular matrix (ECM) components known to be produced and secreted by stromal cells. Pinpointing the cell type that produces those splicing-deregulated secreted proteins will certainly help to rationalize the complex autocrine and paracrine pathways implicated in the cell-to-cell communication that takes place within and around the ovarian tumor. Second, because AS is a highly tissue-specific process, some of the splicing pattern changes might reflect the different proportions of stromal and epithelial cells in ovarian tumors. Theoretically, those effects would be minimal when ovarian tumors of equivalent epithelial content (typically 50-75%) are compared but maximal when normal ovaries (1% epithelial cells) are used as the normal reference. As a consequence, prognostic markers derived from comparisons between cancer samples should be more reliable than diagnostic markers normalized against normal ovaries.
5.3. Alternative splicing associated protein markers
Interestingly, a number of RNA splicing isoform markers might be amenable to detection at the protein level using isoform-specific antibodies. Indeed, the products of the genes encoding fibronectin 1, fibulin, osteopontin, galectin 9, platelet-derived growth factor A, extracellular sulfatase 2 and slit homolog 2 are all secreted into the extracellular matrix. Even some cytosolic proteins such as utrophin and serine hydroxymethyltransferase 1 were found in patients’ serum. Others are cell surface proteins (amyloid beta A4 protein, stromal interaction molecule 1, CD97, peptidyl-glycine alpha-amidating monooxygenase and chemokine-like factor) harboring an ASE that encodes an extracellular domain. More impressively, the exon encoding the transmembrane domain of betacellulin is preferentially excluded in ovarian tumors, leading to a secreted version of the protein. Thus, in every case, isoform-specific antibodies could theoretically be raised against the cancer-associated isoform to ultimately serve as a diagnostic/prognostic tool, either to detect cancer cells directly or to detect the protein in patients’ fluids.
Conversely, the splicing isoforms of the cell surface receptors Fas and CD44 have mostly been studied at the protein level by either immunohistochemistry (IHC) or ELISA. Fas links extracellular apoptotic signals to the programmed cell death pathway through caspases 8 and 10. Differential usage of exon 6, which encodes the single-pass transmembrane domain, results in a soluble version (sFas) and a membrane-anchored version (mFas). The level of sFas is increased in ovarian tumors of higher grade compared to low grade [91,92] and correlates with a worse prognosis for these patients. Although these studies were performed in small cohorts, they elegantly demonstrated that AS can produce isoforms detectable in patients’ serum.
The glycoprotein CD44 is a cell surface receptor that binds diverse extracellular matrix ligands such as hyaluronic acid, fibronectin, osteopontin, collagen and laminin. The binding of low molecular weight hyaluronan polymers promotes CD44-mediated motility and invasion. CD44 is encoded by a 20-exon gene that exhibits extensive AS of exons 6 to 15 (also called variable exons 1 to 10), which encode part of the extracellular domain. The major isoform present in normal epithelial [94,95] or stromal ovarian cells is the shorter standard isoform, CD44s, which lacks all variable exons. In contrast, a complex pattern of splicing isoforms was detected in cancer tissues, including most ovarian tumors, by means of RT-PCR [94,97,98] or by IHC using isoform-specific antibodies [95,96,99,100]. One of these splicing isoforms, characterized by the inclusion of exon v10, appears to correlate with prognosis and is indicative of improved survival in a multivariate IHC analysis of a 142-patient cohort. However, these findings contrast with the initial study of Schroder, who found no exon v10 expression, although that study relied on a smaller cohort. Intriguingly, inclusion of exon v10 in metastatic tumors was correlated with decreased survival. This apparent discrepancy could be rationalized if exon v10 inclusion is seen as crucial to maintain proper cell adhesion and avoid cell detachment. It remains to be determined whether any of the variable exons of CD44 could serve as a biomarker at the RNA level.
6. Concluding remarks
AS dramatically increases the diversity of protein expression in human cells and therefore greatly increases the number of potential disease markers. However, the complexity of detecting AS and the unclear function of the majority of splice variants have greatly slowed the discovery of AS-based ovarian cancer biomarkers. This trend is likely to change in the next few years with the explosion of whole-transcriptome sequencing efforts and the inevitable identification of splice variants as byproducts of next-generation expression profiles. The real challenge now is to develop techniques allowing the use of splicing markers in the clinic and to prepare pathologists for this new wave. Clearly, a compelling argument is needed to drive this drastic change in clinical practice, and it will most likely come from the success of AS-based screens in rationally predicting secreted proteins that may serve as non-invasive ovarian cancer markers.