Potential lung cancer biomarkers discovered by the use of proteomic tools
With more than 1 million annual deaths, among both females and males, lung cancer is the world's leading cause of cancer-related death (1). The most important risk factor for lung cancer is smoking, with smokers presenting a 10-fold risk increase compared to non-smokers. Lung cancers are usually divided into two categories: small-cell lung cancer (SCLC), representing approximately 15% of cases, and non-small cell lung cancer (NSCLC). The latter represents around 85% of all lung cancer cases and includes the histological sub-types adenocarcinoma, large-cell carcinoma and squamous cell carcinoma (2). The lung cancer 5-year survival rate is one of the lowest, at 10-15%, and treatment depends on the extent of the disease at the time of diagnosis (3). Approximately 30% of patients have early stage lung cancer when diagnosed and those tumours can be surgically removed, 20% have locally and/or regionally advanced tumours and are treated with chemotherapy and radiotherapy, and almost half of the patients have advanced metastatic disease, for which only palliative treatments are available (4). Consequently, there is a pressing need for new screening and early diagnostic techniques that are specific and non-invasive, and also for tools that can predict prognosis, optimize treatments and identify new therapeutic targets. Genomic approaches have been used to that end in recent years. Nonetheless, given the importance of proteins to a cell's phenotype, the role of post-translational modifications, and the poor correlation between mRNA and protein expression levels (5, 6), proteomic analyses may shed light on the pathogenesis of lung cancer.
A variety of techniques such as two dimensional gel electrophoresis (2D-PAGE, 2D-DIGE), protein arrays, protein labelling and tagging (ICAT, iTRAQ, SILAC), are being used in cancer research (7, 8) and have the potential to aid clinical practice as a complement to histopathology, as a selection method for individualized therapy, and in the assessment of drug efficacy, resistance, and toxicity (9).
2. Lung cancer
At the beginning of the 20th century, lung cancer was a rare disease. Nowadays it has the highest incidence and mortality rates in the world, with lifestyle and environmental factors thought to be the major contributors to the development of this disease (10). Epidemiological evidence has shown that two to three decades after a peak in smoking prevalence in a given population, there is a peak in lung cancer deaths, making tobacco smoking the main cause of lung cancer development. This relationship was established in the 1950s and 1960s (10-12). Other causes include environmental tobacco smoke, air pollution, indoor radon, occupational exposure to respiratory carcinogens, asbestos, and fumes from cooking stoves and fires (10). Even though smoking is undeniably the major cause of lung cancer, making it the leading cause of preventable death in the world, it is important to recognize that the majority of smokers will not develop this neoplasia over time, probably because of individual variation in the susceptibility to respiratory carcinogens and the existence of a previous lung disease (13, 14). Tobacco components can induce DNA damage through several mechanisms including gene point mutations, deletions, insertions, recombinations, rearrangements, and chromosomal alterations, which drive the development of the disease (15). Nonetheless, the current classification of lung cancer does not emphasize the importance of specific molecular and genetic alterations that can differentiate between SCLC and NSCLC. This is also true for the NSCLC subtypes adenocarcinoma, large cell carcinoma, and squamous cell carcinoma, which were, until recently, treated similarly regardless of their biological heterogeneity (16). Lung cancer is characterized by genetic instability of the chromosomes, nucleotides, and the transcriptome. These abnormalities usually affect proto-oncogenes, tumour suppressor genes, and DNA repair genes, among others.
Telomerase is silenced in normal cells, but in almost all SCLC and over 80% of NSCLC it is activated, promoting cell immortalization (17). The epidermal growth factor receptor (EGFR) is overexpressed or abnormally activated by mutation in 50-90% of all NSCLC, especially in squamous cell carcinomas, leading to increased cell proliferation and survival through the RAS/RAF/MEK/MAPK and PI3K/AKT pathways (18). Activating mutations of the KRAS gene from the RAS proto-oncogene family are present in 20% of all NSCLC and in 30-50% of lung adenocarcinomas (19). The fusion of the echinoderm microtubule-associated protein-like 4 (EML4) and the anaplastic lymphoma kinase (ALK) genes occurs in approximately 7% of NSCLC and is associated with a persistent mitogenic signal. The EML4-ALK, EGFR, and KRAS mutations are almost always mutually exclusive (19). Tumour suppressor genes are also affected in lung cancer. Mutations in TP53 are the most common genetic alterations found in human cancers and occur in approximately 75% of SCLC and in 50% of NSCLC (17). Alterations in the PI3K/AKT pathway, the CDKN2A/RB1 pathway, VEGF, and epigenetic changes are also present in lung cancer (19). Several drugs, such as tyrosine kinase inhibitors and monoclonal antibodies, have been developed to target these alterations and improve the survival of lung cancer patients, revealing the importance of the molecular characterization of tumours in order to improve detection, diagnosis, treatment and prognosis of lung cancer.
Proteins are crucial operators in the majority of biological systems and a comprehensive knowledge of their expression, modifications, and function in the lung cancer setting, may be more informative than DNA and RNA studies alone. New technologies are being developed that allow the analysis of thousands of cancer cell proteins, possibly generating new therapeutic targets and biomarkers that will have an impact on early detection, therapy and prognostic evaluation of lung cancer patients.
3. Proteomic techniques in lung cancer research
The proteomic technologies being implemented in lung cancer research are mainly based on two-dimensional gel electrophoresis (Figure 1 shows the 2D-PAGE and 2D-DIGE workflows) or on isotope labelling methods such as ICAT, iTRAQ and SILAC, followed by mass spectrometry (MS) analysis.
4. Two-dimensional gel electrophoresis (2D-PAGE)
2D-PAGE is the most widely used proteomic technique for studying the proteome and searching for cancer biomarkers (20, 21). In this methodology, intact proteins are first separated by their isoelectric point (pI) and then according to their molecular weight. This procedure generates protein spots that are excised from the gel and digested into peptides for MS identification. Multidimensional separation of peptides may also be required given that, although the digestion step facilitates the identification process, it increases sample complexity, decreasing the sensitivity and coverage of the technique. Disadvantages of 2D-PAGE include the poor separation of low-abundance proteins and of membrane proteins. The use of fractionation methods or higher protein concentrations for less detectable proteins, and of mild detergents to increase the solubility of membrane proteins, may help to overcome these issues (22, 23). Other problems include the co-migration of different proteins, the migration of a protein carrying different post-translational modifications to different positions, and the difficulty of separating proteins with pI values below 4 or above 9 and very small or very large proteins. Differential gel electrophoresis (2D-DIGE), a modification of 2D-PAGE with fluorescent dyes (Cy3, Cy5 and Cy2), is able to increase reproducibility and throughput and also allows the accurate quantitation of protein expression differences (24). Differential analysis software can recognize the differentially expressed proteins, which can later be digested with trypsin into peptides, generating peptide mass fingerprints (PMF). The absolute masses of these peptides can be measured by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS), a technique that is both relatively easy to use and reasonably sensitive for identifying proteins.
Additionally, other MS techniques, such as electrospray ionization tandem MS (ESI-MS/MS), are capable of providing amino acid sequence information on peptide fragments of the initial protein (25). The liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) workflow has become a standard method to identify proteins from complex biological samples. Also, direct MS analysis of tissue, known as MALDI imaging, has been used to elucidate proteome features characterizing the histological differences between lung adenocarcinoma and squamous-cell carcinoma (26). Another example of a novel way to generate proteomic data is the study of dynamic proteome changes in lung cancer cells (H1299) treated with the cytotoxic drug camptothecin, using large-scale single-protein labelling (27).
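The peptide mass fingerprinting step described above can be illustrated with a minimal Python sketch. It assumes the usual simplified trypsin rule (cleavage after lysine or arginine unless the next residue is proline), standard monoisotopic residue masses, and singly protonated ions as typically observed in MALDI-TOF spectra. The sequence `toy_protein` and the function names are hypothetical and serve only as an illustration; real search engines also handle missed cleavages and modifications, which are omitted here.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one water per peptide (free N- and C-termini)
PROTON = 1.00728   # proton carried by a singly charged ion


def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):          # C-terminal peptide without K/R
        peptides.append(sequence[start:])
    return peptides


def peptide_mh(peptide):
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    return sum(MONO[residue] for residue in peptide) + WATER + PROTON


def mass_fingerprint(sequence):
    """Sorted list of [M+H]+ values for all tryptic peptides."""
    return sorted(round(peptide_mh(p), 4) for p in tryptic_digest(sequence))


if __name__ == "__main__":
    toy_protein = "MKWVTFISLLRAKRPGK"   # hypothetical sequence
    for pep in tryptic_digest(toy_protein):
        print(pep, round(peptide_mh(pep), 4))
```

In practice the measured peak list is compared against fingerprints computed in this way for every protein in a sequence database, and the protein with the most matching masses within a chosen tolerance is reported.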
5. Isotope-labelled mass spectrometry
Isotope-labelling methods, shown in Figure 2, are gel-free procedures that introduce stable isotope tags into proteins either through chemical reactions, using isotope-coded affinity tags (ICAT) (28) or isobaric tags for relative and absolute quantitation (iTRAQ) (29), or through metabolic labelling, using stable isotope labelling by amino acids in cell culture (SILAC) (30).
ICAT is used to analyse pairs of protein samples, such as a treated sample and its control. Extracted proteins from both samples are labelled with a light or heavy ICAT reagent, which reacts with a specific amino acid (cysteine). Samples are then mixed, digested with trypsin, fractionated, and analysed by LC-MS/MS (31). Isotope peak ratios for each peptide determine the differential protein expression. The drawbacks of this technique are that it can only analyse cysteine-containing proteins, compares only two samples at a time, and can identify only 300-400 peptides.
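The cysteine restriction and the use of isotope peak ratios can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the peptide sequences and intensities are hypothetical, the control is assumed to carry the light tag and the treated sample the heavy tag, and the function names are invented for this example.

```python
def cysteine_peptides(peptides):
    """Keep only peptides that an ICAT reagent could label (contain Cys)."""
    return [p for p in peptides if "C" in p]


def heavy_to_light_ratio(light_intensity, heavy_intensity):
    """Ratio of the heavy (treated) to the light (control) isotope peak."""
    if light_intensity <= 0:
        raise ValueError("light intensity must be positive")
    return heavy_intensity / light_intensity


if __name__ == "__main__":
    # Hypothetical tryptic peptides from one protein.
    peptides = ["MK", "LCVLHEK", "AEFVEVTK", "SHCIAEVEK"]
    quantifiable = cysteine_peptides(peptides)
    coverage = len(quantifiable) / len(peptides)
    print(quantifiable, coverage)

    # Hypothetical (light, heavy) peak intensities for one labelled peptide.
    print(heavy_to_light_ratio(1500.0, 4500.0))
```

Peptides without cysteine never enter the quantitation, which is why proteins lacking this residue are invisible to the method.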
iTRAQ is another labelling technique, first developed by Ross and co-workers (32), which uses isobaric tags to label and compare proteins extracted from samples. iTRAQ contains a set of four or eight isobaric reagents and can therefore analyse up to four or eight protein samples at one time. After trypsin digestion, samples are labelled with four or eight (4-plex or 8-plex) independent iTRAQ reagents. During MS/MS fragmentation, the reporter groups of the iTRAQ reagents separate from the peptides and generate small fragments for each sample with mass-to-charge ratios (m/z) of 114, 115, 116, and 117 for 4-plex, plus 113, 118, 119, and 121 for 8-plex. The intensity of each peak correlates with the quantity of each reporter group and thus with the quantity of the peptide. This method allows the analysis of several samples at a time and, given that most peptides are suitable for iTRAQ labelling, it minimizes information loss and allows the identification of proteins with different post-translational modifications. Disadvantages of iTRAQ include the separate and lengthy processing of each sample, which increases the chances of experimental error, and the generation of chemical side products during the labelling process, which can reduce the sensitivity of the method (33).
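A minimal Python sketch of reporter-ion quantitation is given below. It assumes an MS/MS spectrum represented as a list of `(m/z, intensity)` pairs, uses the nominal 4-plex reporter masses given in the text, and applies a hypothetical matching tolerance of 0.2 m/z units. The peak list and function names are illustrative and do not correspond to any vendor software.

```python
REPORTERS_4PLEX = (114, 115, 116, 117)   # nominal reporter m/z values


def reporter_intensities(peaks, reporters=REPORTERS_4PLEX, tol=0.2):
    """Sum the intensity of all peaks within `tol` of each reporter m/z."""
    return {
        r: sum(intensity for mz, intensity in peaks if abs(mz - r) <= tol)
        for r in reporters
    }


def relative_abundance(intensities, reference):
    """Express every channel relative to the chosen reference channel."""
    ref = intensities[reference]
    if ref == 0:
        raise ValueError("reference channel has no signal")
    return {r: value / ref for r, value in intensities.items()}


if __name__ == "__main__":
    # Hypothetical MS/MS peak list: four reporter ions plus one fragment ion.
    spectrum = [(114.11, 1000.0), (115.11, 2000.0), (116.11, 500.0),
                (117.11, 1000.0), (175.12, 300.0)]
    channels = reporter_intensities(spectrum)
    print(relative_abundance(channels, reference=114))
```

Because all samples are measured in the same spectrum, the ratios between channels give the relative quantity of the peptide in each sample directly.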
SILAC, first developed by Mann and co-workers, is based on the metabolic incorporation of “heavy” and “light” forms of amino acids into the proteins of living cultured cells (34). Typically, heavy (13C- or 15N-labelled) arginine or lysine is added to the culture medium of one cell population while the other is supplied with regular amino acids. After several rounds of cell division, these amino acids are incorporated into the newly synthesized proteins. Following trypsin digestion, peptides are analysed by MS; the light and heavy peptides appear as two distinct peaks, and relative quantitation can be performed by comparing their signal intensities. This technique has been widely used for cancer biomarker discovery (35) and for studying cell signalling dynamics (36).
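The relative quantitation step can be sketched as follows. The example assumes that each peptide contributes one `(light, heavy)` intensity pair and summarises a protein by the median log2 heavy/light ratio; the use of the median, the function name and the intensity values are assumptions made for illustration, not a description of a specific software package.

```python
import math
from statistics import median


def protein_log2_ratio(peptide_pairs):
    """Median log2(heavy/light) over all peptide pairs of one protein.

    Pairs with a missing (non-positive) intensity are skipped.
    """
    ratios = [
        math.log2(heavy / light)
        for light, heavy in peptide_pairs
        if light > 0 and heavy > 0
    ]
    if not ratios:
        raise ValueError("no quantifiable peptide pairs")
    return median(ratios)


if __name__ == "__main__":
    # Hypothetical (light, heavy) peak intensities for three peptides.
    pairs = [(100.0, 200.0), (50.0, 100.0), (80.0, 40.0)]
    print(protein_log2_ratio(pairs))
```

A log2 ratio of 1 corresponds to a two-fold higher abundance in the heavy-labelled population, and a ratio of 0 to no change. The median makes the protein-level estimate less sensitive to a single outlying peptide.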
6. Label-free mass spectrometry
Multidimensional Protein Identification Technology (MudPIT) is a generic label-free LC-MS shotgun screening method (36). It separates peptides according to two independent physicochemical properties using liquid chromatography (LC/LC) online with the ion source of a mass spectrometer, allowing the separation and identification of peptides without labelling. The success of this technique depends on the experimental workflow, from protein extraction to sample stability, given that the reproducibility of technical replicates is better than that of experimental replicates. Drawbacks of this method include the fact that not all peptides are equally detectable given the competition between ions, dynamic range limitations and MS sensitivity (37). With time and improvements, label-free MS could be widely used for biomarker discovery and validation.
7. Detection of post-translational modifications (PTMs)
PTMs are the chemical alterations that a protein undergoes after translation. They include proteolytic cleavage, glycosylation, phosphorylation, acetylation, ubiquitination, farnesylation, methylation, sialylation, oxidation, prolyl isomerization and hydroxylation (38). Glycosylation and phosphorylation are two of the most biologically relevant PTMs and appear to be key processes in tumour progression in many types of cancer, including lung cancer (39, 40).
Glycosylation, the process of adding saccharides to proteins, plays a fundamental role in protein stabilization, molecular and cellular recognition, growth and cellular communication, and can also be a part of immune responses and cancer progression (41). The comparative study of the carbohydrate chains of glycoproteins may provide useful information for the diagnosis, prognosis, and immunotherapy of tumours (42). The proteomic analysis of glycoproteins starts with the enrichment of these molecules from a complex protein sample by the use of lectins. This step is followed by the separation of glycoproteins by procedures such as 2D-PAGE and 2D-DIGE coupled with glycoprotein staining methods, for example Pro-Q Emerald 488 glycoprotein stain (43), lectin fluorescence stain (44), and isotope labelling (45). Identification of separated glycoproteins and their glycan structures can be accomplished by chromatographic methods (nano-LC with hydrophilic columns, nano-LC with graphitized carbon packing, anion-exchange chromatography), electromigration approaches (capillary electrophoresis, capillary electrochromatography), capillary LC/MALDI-TOF/TOF MS and tandem MS (MS/MS), and chip-based approaches (46). Although there are some difficulties when analysing lung tumours, one study has identified 34 glycoproteins with significant differences between lung adenocarcinomas and healthy controls. The α1,6-fucosylation levels were increased in the lung cancer group in comparison with the healthy group (47).
Phosphorylation is the addition of a phosphate group to a protein and is a key regulatory mechanism of cellular signalling processes. Phosphoproteomics and the characterization of phosphorylation sites, of which less than 2% are currently known, are some of the most challenging tasks in current proteomic research (48). Phosphorylated proteins can be isolated and identified by immunoaffinity or immunoprecipitation with a specific antibody, chromatofocusing, ion exchange chromatography, and affinity chromatography, such as immobilized metal ion affinity chromatography (IMAC) (49). Separation methods include electrophoresis, 2D-PAGE or 2D-DIGE coupled with phosphoprotein staining (Pro-Q Diamond phosphoprotein gel stain) or isotope labelling (ICAT, SILAC) (50, 51). Phosphoproteins and phosphopeptides are analysed and identified by mass spectrometry-based approaches, such as MALDI-TOF MS, LC-ESI-MS and MS/MS (52). Given that the key regulators of signalling cascades are kinases and phosphatases, lung cancer phosphoproteomics might reveal the correlation between phosphorylation and cancer mechanisms.
8. Samples in lung cancer proteomics
The lung is a heterogeneous organ composed of several highly differentiated cell types (bronchial, alveolar, inflammatory) and vascular structures. Its main function is to perform gas exchange between the atmosphere and the bloodstream. When studying lung cancer with proteomic tools, several different samples can be used: tumour tissue, blood, and pleural effusions, among others (53). Its accessibility makes blood a valuable sample for oncoproteomic studies. Moreover, it contains many circulating molecules secreted by the tumour that can be used as biomarkers. Nonetheless, because of the abundance of plasma proteins, depletion of these proteins is necessary to reveal the presence of less abundant ones. Tumour tissue samples, fresh-frozen or formalin-fixed and paraffin-embedded, are ideal for any oncoproteomic study. However, adjacent normal tissue, inflammatory cells, stromal components, and others might also be present, resulting in contamination with non-tumour-derived proteins. To compensate for tumour heterogeneity, careful analysis of the cell content of each sample and an increase in sample numbers are required to obtain relevant results. The pleura is a thin double-layered tissue that surrounds the lung and is filled with pleural fluid. This liquid is constantly produced and reabsorbed, and its main function is to facilitate respiratory movements and reduce friction between the lungs and the thoracic wall. Pleural effusion is the pathological accumulation of fluid that occurs in inflammatory conditions and lung cancer. In the latter case, pleural effusion is often drained to search for cancer cell infiltration. Its protein composition is similar to that of plasma, but its proximity to tumour cells makes it useful for lung cancer biomarker detection by proteomic techniques.
9. Proteomics in the discovery and validation of lung cancer biomarkers
9.1. Diagnostic biomarkers
To discover a lung cancer diagnostic biomarker, a molecule that is specific and directly correlates with the presence of this disease, the majority of studies perform a comparison between the protein profiles of tumour samples and normal lung tissue. The ideal would be to study the development of the carcinogenic process from normal tissue, to metaplasia, to dysplasia, and finally to invasive cancer, in order to discover early markers of disease before the onset of clinical features.
In response to inflammation, a cancer-enabling characteristic, acute-phase reactant proteins (APRPs) are produced. Recent proteomic studies have shown that the APRPs haptoglobin (Hp) β chain (54), serum amyloid A (SAA) (55), and apolipoprotein A-1 (Apo A-1) (56) are potential lung cancer diagnostic biomarkers. SAA proteins are involved in the transport of cholesterol to the liver, the recruitment of immune cells, and the induction of extracellular matrix degrading enzymes. SAA1 and SAA2, which are synthesised in response to activated monocytes/macrophages, were recently identified, by LC-MS/MS, ELISA and immunohistochemistry analyses, as lung cancer biomarkers given their higher expression levels in blood and tissue from lung cancer patients when compared to healthy subjects and patients with other cancers and respiratory diseases (55). In another related study, serum and pleural effusions from NSCLC patients were compared by 2D-DIGE to those from patients with benign lung diseases. Gelsolin, possibly involved in cancer invasion, metalloproteinase inhibitor 2 (TIMP2), involved in lung parenchyma disorganization, and pigment epithelium derived factor (PEDF), an angiogenesis inhibitor, were among the candidate biomarkers (57). A study by Patz and co-workers tested the diagnostic performance of four lung cancer biomarkers: carcinoembryonic antigen, squamous-cell carcinoma antigen, and two markers discovered by 2D-PAGE and MALDI-MS, retinol binding protein (RBP) and α-1 antitrypsin. It demonstrated that the four markers have inadequate diagnostic power when tested independently but are useful when used in combination (58). A glycoproteomic study revealed plasma kallikrein (KLKB1), pleural effusion periostin, multimerin-2, CD166 and lysosome-associated membrane glycoprotein-2 (LAMP-2) as potential lung cancer biomarkers (59).
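Why a marker panel can outperform its individual members can be illustrated with a toy Python calculation of sensitivity and specificity. The data below are entirely hypothetical, and the "any marker positive" rule is an assumption chosen for simplicity; it is not the statistical method used in the studies cited above.

```python
def sensitivity_specificity(calls, has_cancer):
    """Return (sensitivity, specificity) for boolean test calls."""
    pairs = list(zip(calls, has_cancer))
    tp = sum(1 for call, disease in pairs if call and disease)
    fn = sum(1 for call, disease in pairs if not call and disease)
    tn = sum(1 for call, disease in pairs if not call and not disease)
    fp = sum(1 for call, disease in pairs if call and not disease)
    return tp / (tp + fn), tn / (tn + fp)


def any_positive(*markers):
    """Panel rule (assumed): positive if at least one marker is positive."""
    return [any(values) for values in zip(*markers)]


if __name__ == "__main__":
    # Hypothetical cohort: four cancer patients followed by four controls.
    has_cancer = [True, True, True, True, False, False, False, False]
    marker_a = [True, False, True, False, False, False, True, False]
    marker_b = [False, True, False, False, False, False, False, False]

    print(sensitivity_specificity(marker_a, has_cancer))
    print(sensitivity_specificity(marker_b, has_cancer))
    print(sensitivity_specificity(any_positive(marker_a, marker_b), has_cancer))
```

In this toy cohort the two markers detect different patients, so the panel reaches a higher sensitivity than either marker alone. The trade-off is that an "any positive" rule can only keep or lower specificity, which is why panels need validation in independent cohorts.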
9.2. Prognostic biomarkers
Prognostic biomarkers, those whose expression levels correlate with the natural history of the disease, have the potential to influence survival by identifying high-risk patients and thus improving their management. Prognostic biomarkers in lung cancer have been studied by correlating the expression of a molecule with patient survival. An alternative approach is to compare groups of patients with different clinical stages of disease, based on the assumption that a more advanced tumour is more aggressive and may express proteins that drive the metastatic process. Proteomic studies have aimed at discovering altered protein levels and subsequently validating those differences using immunohistochemistry on archived samples. Using 2D-PAGE, Chen and co-workers associated 11 components of the glycolysis pathway with poor survival in lung adenocarcinoma (39) and also demonstrated their prognostic role in lung cancer at the mRNA level. Nonetheless, the glycolytic enzyme phosphoglycerate kinase 1 was found to limit tumour growth in mice subcutaneously injected with the Lewis lung carcinoma cell line, by promoting antitumour immunity (60). A study using 2D-DIGE, MS, western blot, and immunohistochemistry correlated the up-regulation of annexin A3, a protein associated with cancer metastasis through the promotion of angiogenesis, with advanced clinical stage, lymph node metastasis, increased relapse time, and overall decreased survival in lung adenocarcinoma, indicating that annexin A3 might be a prognostic lung cancer biomarker (61). The involvement of S100A11, a small calcium-binding protein implicated in the prognosis and metastasis of several tumours, has also been evaluated in lung cancer. Comparative proteomic analysis of two NSCLC cell lines, the non-metastatic CL1-0 and the highly metastatic CL1-5, revealed that S100A11 was up-regulated in metastatic CL1-5 cells (62).
Moreover, immunohistochemical analyses in NSCLC tissues showed that the up-regulation of S100A11 was significantly associated with a higher TNM stage and a positive lymph node status, indicating its importance in promoting invasion and metastasis of NSCLC. Altered expression of S100A6 was also implicated in NSCLC progression: elevated levels of this protein were associated with longer survival compared to S100A6-negative cases (63). Cytoskeletal reorganization is a central process regulating cell migration and metastasis and cytokeratins (CKs), a family of cytoskeletal intermediate filaments, have been suggested to play a role in carcinogenesis, by promoting cellular architecture reorganization during tumour development and progression. A 2D-PAGE and MS analysis has revealed that isoforms of CK7, 8, 18, and 19 were found in higher levels in adenocarcinoma samples than in adjacent tissues (64). Specific isoforms of the CKs were associated with unfavourable prognosis, CYFRA21-1 was a more accurate diagnostic marker, and CK18 was a stronger prognostic factor (65). Other cytoskeletal proteins found to be correlated with a poor prognosis in lung adenocarcinoma are non-muscle myosin IIA and vimentin proteins, involved in epithelial-mesenchymal transition, a process at the basis of invasive and metastatic behaviour (66). Phosphohistidine phosphatase (PHP14) was proposed to be another lung cancer prognostic biomarker, regulating cell migration and invasion by cytoskeleton rearrangement. Indeed, it has been shown that PHP14 knockdown in highly metastatic lung cancer cells (CL1-5) inhibited migration and invasion, whereas its over-expression in NCI H1299 cells enhanced these processes (67). 
Calmodulin, a protein implicated in cytoskeletal alterations during cell death, thymosin β4, a regulator of actin polymerization whose over-expression seems to stimulate lung tumour metastasis, and thymosin β10 and cofilin, regulators of actin dynamics, were also identified, and their expression and prognostic role were validated in a cohort of 188 lung cancer cases (68).
9.3. Predictive biomarkers
The discovery of predictive biomarkers, those from which the efficacy of a specific treatment can be foreseen, has been based on studying clinical samples from responding and non-responding patients and then validating the results in selected cohorts. This type of biomarker aims at individualizing therapies in lung cancer but relies on extremely well characterized samples from cohorts of patients receiving a uniform treatment and on closely monitored therapeutic responses. A recent MALDI-TOF-MS study that profiled serum from patients treated with cisplatin-gemcitabine in combination with the proteasome inhibitor bortezomib revealed a 13-peptide signature that was able to distinguish, with high accuracy, sensitivity, and specificity, patients with short and long progression-free survival (69). The epidermal growth factor receptor (EGFR) tyrosine kinase is an important target for the treatment of NSCLC, and EGFR-inhibitor-based therapies have shown promising results. The serum MALDI-MS study conducted by Taguchi and co-workers in NSCLC patients treated with gefitinib and erlotinib revealed an 8-peak profile predictive of outcome (70). This 8-peak signature was launched as a commercial product (VeriStrat®, Biodesix, Broomfield, CO, US) and its clinical relevance is being validated in a randomized phase III clinical trial in which patients with advanced NSCLC progressing after first-line treatment, stratified according to serum MALDI-MS profiling, are randomly allocated to receive either erlotinib or chemotherapy as second-line therapy (PROSE, Proteomics Stratified Erlotinib trial). To the best of our knowledge, this is the only clinical trial investigating the predictive role of a proteomics biomarker in lung cancer patients. A summary of all mentioned biomarkers can be found in Table 1.
|Type of Biomarker||Proteins||Techniques|
|Diagnostic||Hp β chain (54)||LC-ESI-MS/MS, ELISA|
|Diagnostic||SAA (55)||LC-MS/MS, ELISA, IHC|
|Diagnostic||Apo A1 (56)||2D-PAGE, MALDI-TOF|
|Diagnostic||α-1 antitrypsin (58)||2D-PAGE, MALDI-MS|
|Prognostic||Glycolysis pathway (11 components) (39)||2D-PAGE|
|Prognostic||Annexin A3 (61)||2D-DIGE, MS, IHC*|
|Prognostic||S100A11 (62)||2D-PAGE, MALDI-TOF-MS/MS, IHC|
|Prognostic||CK 7, 8, 18 and 19 (64)||2D-PAGE, MS|
|Prognostic||PHP14 (67)||2D-PAGE, ESI-TOF-MS/MS|
|Prognostic||Thymosin β10 (68)|| |
|Predictive||13-peptide signature (69)||MALDI-TOF-MS|
|Predictive||8-peak signature (70)||MALDI-MS|
Proteomic approaches are improving rapidly and the development of high-throughput platforms is showing promising results, as the list of candidate biomarkers for lung cancer is continuously growing. However, these intricate data must be interpreted carefully in order to generate biologically relevant hypotheses. The proteome is highly complex and current tools cannot yet provide a definitive solution for its exploration. In addition, cancer is a multifactorial disease so diverse that a great deal of time and effort will be necessary to define its associated proteome modifications and to translate these into practical clinical applications. In fact, for many of the identified proteins the functional role in lung cancer development is not yet known and solid clinical validation is still lacking. Nonetheless, it is likely that some of these candidate biomarkers will serve to identify new possible therapeutic strategies.