Molecular Biology of Lung Cancer and Future Perspectives for Screening

Lung cancer patients have the highest mortality among patients with solid tumors worldwide, and their prognosis is strictly stage-associated. However, only 15–20% of patients are diagnosed in stage I, since these early tumors are frequently asymptomatic. Early detection of lung cancer, which allows effective therapeutic intervention, is a promising approach to lowering its mortality rate. However, conventional diagnostic methods for lung cancer, such as chest X-ray and CT of the chest, incur high costs and can yield false-positive results. Thus, the discovery of highly sensitive, specific, noninvasive, and cost-effective lung cancer biomarkers, used in combination with conventional approaches such as X-rays, may improve the sensitivity of lung cancer screening. Herein, we summarize the most recent studies on the molecular pathology of lung cancer and discuss the advancements expected in the near future, including potential biomarkers and liquid biopsy approaches for the detection of lung cancer in populations at risk of developing this disease.


Introduction
Despite multimodality treatment strategies including surgery, radiotherapy, chemotherapy, and targeted therapy, lung cancer is still the leading cause of cancer-related death in the world, with the 5-year lung cancer survival rate remaining as low as 15% [1][2][3]. The most common histologies are grouped as non-small cell lung cancer (NSCLC) and account for 80-85% of newly diagnosed cases. Surgery is the standard of care for functionally operable early stage NSCLC and resectable stage IIIA disease and offers a potential for cure. However, only 20% of NSCLC are resectable at diagnosis [4]. Histology and cytology of thoracic biopsies are currently the gold standard for confirming an early diagnosis of lung cancers detected by thoracic imaging. Nevertheless, this approach is costly and often detects false-positive nodules that turn out not to be cancers. Therefore, early detection approaches, especially those directed toward the population at high risk for the development of this disease, remain an unmet clinical need. Several studies have been performed to define the ideal approach, which should be sensitive, specific, reliable, and reproducible for early diagnosis of lung cancer or for prediction of the development of this disease in subjects at risk. This review summarizes the molecular

ALDH family proteins
The cancer stem cell model proposes that tumor progression, drug resistance, metastasis, and relapse after therapy may be driven by a subset of cells within the tumor: the cancer stem cells (CSCs) [17][18][19][20]. Recent evidence suggests that, like other tumors, human lung cancers may also harbor CSC populations. Human alcohol dehydrogenase (ADH) and aldehyde dehydrogenase (ALDH) are the principal enzymes responsible for ethanol metabolism and have heterogeneous tissue distribution. Isoenzymes of ADH participate in bioamine, prostaglandin, and retinoic acid metabolism [21]. The second enzyme, ALDH, belongs to a large family of intracellular enzymes that participate in cellular detoxification, differentiation, and drug resistance through the oxidation of endogenous and exogenous aldehydes to carboxylic acids [22]. The ALDH superfamily currently consists of 19 known putatively functional genes in 11 families and 4 subfamilies with distinct chromosomal locations [23][24][25]. Several studies have explored the biological significance of ALDH in cancers such as head and neck cancer, colon cancer, breast cancer, papillary thyroid carcinoma, and specifically lung cancer, where they have provided supportive evidence for the association between ALDH activity and lung cancer stem cells [26][27][28][29][30][31][32]. ALDH1A1 seems to be co-expressed with other NSCLC stem cell markers such as leucine-rich repeat-containing G-protein-coupled receptor 5 (LGR5) in NSCLC tissues, and their expression is significantly associated with disease stage and poor prognosis [33]. It was reported that ALDH1A1-negative expression in lung cancer patients corresponds to shorter survival compared with ALDH1A1-positive expression and that ALDH1A1 overexpression was associated with a favorable outcome. Moreover, high expression of ALDH1A1 mRNA was found to be correlated with better overall survival (OS) in all NSCLC patients followed for 20 years.
In addition, high expression of ALDH1A1 mRNA was also found to be correlated with better OS in Ade patients but not in SCC patients. These results strongly support an association between ALDH1A1 mRNA expression in NSCLC and better prognosis. However, there are contradictory results indicating that ALDH1 cytoplasmic expression was associated with poor prognosis in several tumors, including NSCLC [34]. Jiang et al. also showed that ALDH1A1 expression was positively correlated with the stage and grade of lung tumors and related to a poor prognosis [34]. A recent meta-analysis showed that increased ALDH1A1 expression is associated with poor OS and disease-free survival in lung cancer patients [35]. Previous studies showed that several other ALDH isoforms are also involved in lung cancer: ALDH3A1 is highly expressed in two types of NSCLC, Ade and SCC, and ALDH3B1 expression was found to be upregulated in a high percentage of human tumors, particularly in lung cancer [36][37][38].

Lung cancer screening
Cancer screening is promising for malignancies with a stage-dependent prognosis, and it aims to reduce morbidity and mortality through detection of cancer at an early stage. In general, screening programs must be subjected to a rigorous risk-benefit assessment that takes into account endpoints such as cancer-related mortality, overall mortality, morbidity, patient-reported outcome, and costs. All screening programs need a transparent system of quality assurance.

Low-dose computed tomography
Several studies on lung cancer screening were conducted, mainly using chest X-rays (CXR) for imaging alongside sputum cytology. The National Lung Screening Trial (NLST) enrolled 53,000 individuals aged 55-74 years with a 30-pack-year smoking history, and participants were randomly assigned to radiography or low-dose CT. The low-dose CT group had a 20% reduction in lung cancer mortality and a 6.7% reduction in all-cause mortality [39]. The International Early Lung Cancer Action Program (I-ELCAP) retrospectively analyzed the outcomes of more than 21,000 patients after the completion of the NLST. Different size thresholds for nodule diameter resulted in different cancer diagnosis rates. Increasing the threshold from 5.0 mm to 6.0, 7.0, 8.0, or 9.0 mm also changed the frequencies of positive results [40]. Compared with the North American trials, European studies performed on a smaller number of individuals at risk of lung cancer showed somewhat inconsistent and less significant results [41][42][43]. Although these studies showed an improved stage distribution in favor of earlier stages, better resectability of the tumors, and also improved survival, an effect on overall mortality could not be demonstrated [39,44,45]. Aside from morbidity and mortality that are not justified in this context, the expenses turn out to be substantial, as thoracic imaging may have to be repeated, which also makes the benefit-risk balance a matter of debate. Despite the progress made in imaging, which has allowed the detection of nodules smaller than 3-4 mm and even the definition of malignant or benign features, cancerous lesions smaller than 1 mm currently cannot be detected by imaging [46]. A major drawback of low-dose CT is the large number of false-positive tests and the diagnosis of indolent tumors, which in turn lead to increased morbidity from unnecessary surgical treatment [47][48][49][50].
Thus, even though imaging can detect early stage, asymptomatic, and operable lung cancer, these approaches are not satisfactory because of high cost, high risk of radiation exposure, and poor sensitivity and specificity.
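The weight of false-positive results in a low-prevalence screening population can be illustrated with Bayes' theorem. The following Python sketch is an illustration only: the function name `predictive_values` and all numeric inputs are hypothetical assumptions and are not taken from the trials cited above.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) of a binary test via Bayes' theorem.

    All three arguments are fractions in [0, 1]. The values used in the
    example below are hypothetical and chosen for illustration only.
    """
    for value in (sensitivity, specificity, prevalence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must lie between 0 and 1")

    # Expected fractions of the screened population in each outcome class.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    positives = true_pos + false_pos
    negatives = true_neg + false_neg
    ppv = true_pos / positives if positives > 0 else float("nan")
    npv = true_neg / negatives if negatives > 0 else float("nan")
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical test: 90% sensitivity, 90% specificity, 1% prevalence.
    ppv, npv = predictive_values(0.90, 0.90, 0.01)
    print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Under these assumed inputs, only about 8% of positive results would correspond to true cancers, which shows how even a fairly specific test can generate mostly false positives when the disease is rare in the screened population.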

Biomarkers for lung cancer detection
The discovery of cancer biomarkers, specific molecules that help to distinguish between normal and cancerous conditions, may help to develop a more effective diagnostic tool for cancer. Body fluids (blood, pleural effusion, etc.) that are in contact with tumors are enriched with proteins shed from cancer cells. Proteins secreted from cancer cells can enter the blood circulation and have the potential to be monitored in plasma/serum. Carcinoembryonic antigen (CEA) is an oncofetal protein not typically expressed in adult tissues. In lung cancer, the CEA levels in blood are elevated and are inversely correlated with the response to cancer therapy. Therefore, this marker is used for the detection of cancer recurrence and the prediction of a poor survival rate. CYFRA 21-1 is a fragment of cytokeratin 19 that is typically associated with epithelial cell cancers including NSCLC. This marker is correlated with disease response and the prognosis of cancer but cannot be used to distinguish cancer patients from patients with respiratory diseases. The sensitivity of CYFRA 21-1 for NSCLC ranges between 23 and 70% [51,52]. Neuron-specific enolase (NSE) is a glycolytic enzyme produced in neuronal cells and cells with neuroendocrine differentiation. SCLC is of neuroendocrine origin, and therefore NSE is found to be elevated in patients' blood [53]. Tumor M2-pyruvate kinase (PKM2) is a dimeric form of the pyruvate kinase isoenzyme type M2 that is increased in various cancers [52,54]. C-reactive protein (CRP) is an acute-phase protein, the levels of which rise in response to inflammatory conditions such as lung cancer. Recent studies nevertheless suggested that CRP could be used as a prognostic biomarker of lung cancer and angiogenesis [55]. Serological markers such as CEA, NSE, and CYFRA 21-1 are used for the monitoring of treatment effects in lung cancer, but their diagnostic value as screening biomarkers is still being debated [56,57].
To date, no useful marker has been identified for the screening of asymptomatic patients. Ideally, a biomarker should have a sensitivity and specificity of 100%, a goal that is almost never achieved. One strategy for potentially increasing both parameters is to combine several biomarkers into a screening marker panel. Several studies with smaller panels encompassing a few markers provided initial evidence that simultaneous analysis of several antigens has a higher potential for separating patients with lung cancer from controls [56]. Combined with other noninvasive methods, this may allow for further refinement of lung cancer screening [58].
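How a marker panel shifts sensitivity and specificity depends on the rule used to combine the individual results. The Python sketch below compares two simple rules under the strong assumption that the markers are conditionally independent given disease status. The function names `or_rule` and `and_rule` and the per-marker values are hypothetical and do not describe any of the cited panels.

```python
from math import prod


def or_rule(markers):
    """Panel is positive if ANY marker is positive.

    `markers` is a list of (sensitivity, specificity) pairs. Assumes the
    markers are conditionally independent given disease status.
    """
    sensitivity = 1.0 - prod(1.0 - sens for sens, _ in markers)
    specificity = prod(spec for _, spec in markers)
    return sensitivity, specificity


def and_rule(markers):
    """Panel is positive only if ALL markers are positive (same assumption)."""
    sensitivity = prod(sens for sens, _ in markers)
    specificity = 1.0 - prod(1.0 - spec for _, spec in markers)
    return sensitivity, specificity


if __name__ == "__main__":
    # Two hypothetical markers, each with 50% sensitivity and 90% specificity.
    panel = [(0.50, 0.90), (0.50, 0.90)]
    print("OR rule :", or_rule(panel))   # sensitivity 0.75, specificity 0.81
    print("AND rule:", and_rule(panel))  # sensitivity 0.25, specificity 0.99
```

With these simple rules, one parameter improves at the expense of the other. Raising both at once generally requires a combined score with an optimized cutoff, and in practice markers are often correlated, so the performance of any panel has to be established empirically.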

Future perspectives

New potential lung cancer biomarkers
Proteomics studies have identified new candidate lung cancer biomarkers that can be tested in the blood (Table 2). The enzyme plasma kallikrein (KLKB1) cleaves Lys-Arg and Arg-Ser bonds in kininogen to release bradykinin and has functions related to blood coagulation. Studies showed that serum levels of its fragmented form were increased in lung cancer samples compared with normal control sera [59,60]. Serum amyloid A (SAA) proteins are a family of apolipoproteins associated with the high-density lipoprotein (HDL) complex that are secreted during the acute phase of inflammation. In particular, isoforms SAA1/2 were detected in Ade patients' sera but not in healthy donors' sera using liquid chromatography/mass spectrometry (LC-MS/MS). This protein was also detected in tissue [59,61]. Haptoglobin (Hp) is a free hemoglobin-binding glycoprotein that inhibits hemoglobin-induced oxidative stress and assists in hemoglobin uptake. It is a tetramer constituted by two α and two β chains. High levels of Hp have been reported in various cancer types including lung cancer. Proteomics analysis showed Hp β chain peptide levels to be threefold higher in lung cancer patients' sera than in control subjects [59,62]. Complement component 9 (C9) protein, a terminal constituent of the membrane attack complex, plays a role in the immune response by forming plasma membrane pores. This protein was identified in sera of patients with SCC by glycoproteomics approaches. Its protein levels were significantly higher in SCC patients than in healthy donors and in patients with other cancer types [59,63]. Insulin-like growth factor-binding protein-2 (IGFBP-2), a member of the insulin-like growth factor-binding protein family, inhibits IGF-mediated growth and development rates. Increased levels of IGFBP-2 have been found in solid tumors and in blood from patients with glioma and colorectal, prostate, and breast cancers, particularly at advanced disease stages.
Recently, the combination of circulating anti-IGFBP-2 autoantibodies and IGFBP-2 showed increased diagnostic sensitivity and specificity for lung cancer compared with IGFBP-2 alone [64]. Peroxiredoxin 1 (PRX1) and peroxiredoxin 2 belong to a family of ubiquitous multifunctional antioxidant proteins. The main function of PRX1 is to eliminate peroxides generated during metabolism. PRX1 is also involved in the inhibition of oncogenes, and its protein levels were found to be higher in human cancer cells and tissues. Recently, PRX1 was also identified in lung cancer patients' plasma by mass spectrometry-based screening technology. Plasma PRX1 levels were increased in patients with lung cancer and also in subjects exposed to asbestos [65]. Endoglin (CD105) is a major cell membrane glycoprotein of the vascular endothelium. The main function of CD105 is to help the binding of endothelial cells to integrins and other receptors, promoting angiogenesis through the activation of endothelial cells. CD105 overexpression was found in the endothelium of vessels in human solid tumors and is closely associated with a poor prognosis and the presence of metastases. Moreover, levels of soluble CD105 (s-endoglin), formed by cleavage of the ectodomain of the membrane receptor, were higher in patients with various types of cancer compared with normal counterparts, and its levels were also associated with metastases. In NSCLC, s-endoglin serum levels were significantly decreased in patients after surgery, confirming its potential use for monitoring and prognosis of lung cancer [59,66]. Progesterone receptor membrane component 1 (Pgrmc1) is a cytochrome b5-related protein induced by carcinogens. In fact, Pgrmc1 levels are elevated in spontaneous ovarian, breast, and lung cancers. Pgrmc1 is known to localize at the endoplasmic reticulum, and it was identified as a sigma-2 receptor, which is induced in cancers. Recently, Pgrmc1 also showed the potential to be a serum biomarker for lung cancer.
It was shown that Pgrmc1 is localized in secretory vesicles and is secreted by lung cancer cells. Moreover, Pgrmc1 levels in the plasma and in the exosome fractions of plasma were significantly increased in lung cancer patients [59,67]. Pro-gastrin-releasing peptide (proGRP, residues 31-98) is a more stable biochemical precursor of gastrin-releasing peptide (GRP), which is specifically produced by SCLC cells owing to their neuroendocrine origin. In a recent report, proGRP levels were increased in SCLC patients compared with patients with other types of lung cancer; its levels are also associated with the progression of the disease [59,68]. Ciz1 is a nuclear matrix protein that promotes the initiation of mammalian DNA replication. Recently, levels of the variant Ciz1 protein (24 nucleotides from the 3′ end of exon 14 are excluded, leading to an in-frame deletion of eight amino acids, 'VEEELCKQ') were found to be significantly increased in the plasma of early stage lung cancer patients compared with those of healthy donors and of patients with other respiratory diseases, suggesting its potential use as a diagnostic lung cancer biomarker. Its sensitivity and specificity for stage I NSCLC were 95 and 74%, respectively [59]. MMP-1 is a collagenase that cleaves collagen types I, II, III, IV, and X at one site in the helical structure and is overexpressed in various cancer cells. High plasma levels of MMP-1 seem to be associated with a lower patient survival rate [59,69]. uPAR is a glycosylphosphatidylinositol (GPI)-anchored glycoprotein and cell surface receptor specific to the urokinase plasminogen activator (uPA). The uPA-catalyzed cleavage of uPAR is a negative feedback loop in which uPA cleaves uPAR, leaving the cleaved form of uPAR attached to the cell surface. It was shown that serum levels in preoperative NSCLC patients are correlated with higher levels of cleaved uPAR and lower survival rate [59,70]. Kuroda et al.
showed that ADAM28, a disintegrin and metalloproteinase 28 overexpressed in NSCLC tissues, was also detectable in the serum of patients and increased with advancing tumor stage [71]. The sensitivity, false-negative rate, and AUC for ADAM28 were even better than those for CEA, suggesting a potential use of this test for diagnosis and monitoring of NSCLC. Recently, ALDH1A1 levels were shown to be elevated in the sera of patients with NSCLC. Combined testing of serum ALDH1A1 and CEA levels significantly increased the screening sensitivity compared with CEA alone [72]. We provided evidence that isoforms other than ALDH1A1 may be secreted into the blood of lung cancer patients, and therefore screening sensitivity may be further enhanced by using an isoform-unspecific ALDH test without apparently affecting specificity [unpublished results]. Our results showed that elevated ALDH serum levels can be detected in the vast majority of patients with early and advanced stage disease, suggesting that serum ALDH should be evaluated as part of a marker panel for noninvasive detection of early lung cancer in a larger cohort of patients at risk. Although identification of proteins is now promising, quantification of proteins in complex mixtures by MS remains challenging, especially in plasma, which carries a large amount of proteins. The use of antibody-based techniques such as ELISA for biomarker measurements could be hindered by a lack of high-quality antibodies. A quantitative approach has evolved that performs targeted analysis of representative peptides by multiple reaction monitoring (MRM) and has been used to evaluate the potential utility of a list of candidate proteins for lung cancer diagnosis.

Liquid biopsy
Current methods employed for the evaluation of cancer genomes require tissue biopsy, either bone marrow biopsy in blood cancers or biopsy of affected nodal/soft tissue. These procedures are invasive and are limited by being representative of only a single site of the disease under evaluation. Thus, single-site tissue biopsy may not truly reflect the entire mutational profile of an individual's disease, and from a logistical perspective, it may not be suited to repeat biopsy over short periods of time. Liquid biopsy seems to be the most promising alternative: in contrast to tissue biopsy, it is noninvasive, can provide a better representation of the cancer genetic profile, and can be easily repeated [73] (Figure 1). A liquid biopsy is a blood sample of about 10-20 ml taken for diagnostic, prognostic, and treatment-prediction purposes. It is a noninvasive approach to screening and early diagnosis of lung cancer. It comprises not only biomarkers but also circulating free DNA (cfDNA) and circulating tumor cells (CTCs). There are two methods for CTC isolation: the indirect CellSearch method and the direct isolation by size of epithelial tumor cells (ISET) filtration method, which is, to date, the most interesting method for early diagnosis and screening of lung cancer using CTCs. CTCs can be cytomorphologically characterized before surgery and can be detected in asymptomatic patients with stage I lung cancer. Moreover, recent studies showed that CTCs can be isolated from patients at high risk of developing lung cancer (smokers with chronic obstructive pulmonary disease) who had no nodules detected by CT and who were found to be positive for Ade at follow-up [74,75]. It was also shown that the initial presence of CTCs had a predictive value of 100% for subsequently developing lung cancer.
As a follow-up to this pilot study, which was performed in a single center and on a restricted number of patients, a multicenter study (named the AIR project) was started, involving 20 French university hospitals and 600 patients with chronic obstructive pulmonary disease who are over 55 years of age and have a smoking history of more than 30 pack-years. This project aims to study patients by means of CTC detection through ISET along with CT. Nucleic acids, as DNA or RNA fragments, can circulate in the plasma either freely or enclosed in vesicles such as exosomes. While cfDNA is universally found in the plasma of healthy people as well as those with benign diseases, it has been observed that patients with malignant disease have higher levels of cfDNA in their plasma [76]. Among RNAs, both coding RNA and noncoding RNA (such as microRNA) can circulate. Recent studies have identified more or less complex plasma microRNA signatures associated with lung cancer. In particular, it was shown that a signature of several plasma microRNAs has a predictive value for lung cancer in a high-risk population [74]. Although liquid biopsy can permit the monitoring of patients on treatment or after treatment for lung cancer, it has some limitations. The major limitations of circulating nucleic acids for early lung cancer detection concern the distinction between free nucleic acids of germline and somatic origin, because of the lysis of circulating hematological cells, and the low amount detectable in very early tumor stages. Moreover, circulating microRNA can be released in other diseases associated with lung cancer, such as cardiovascular diseases, inflammatory disorders, pulmonary fibrosis, and other associated cancers. Another limitation relates to the pre-analytical phase, which is crucial, as the delay between blood sampling and the analytical phase should be as short as possible.
It should also be the same for all the patients involved in the study, for a better comparison of the results, and the blood must be conditioned with a buffer that allows excellent preservation of the material. Moreover, the low amount of biomarkers, cfDNA, or CTCs in the blood of patients with a very small tumor, or one not yet visible by thoracic imaging, requires a highly sensitive technique that is able to isolate enough material to conduct the analysis. Another problem is the lack of standardization of the different pre-analytical and analytical steps, which limits the deployment of liquid biopsy in clinical practice for early diagnosis of lung cancer [77][78][79]. Therefore, it is difficult to obtain a specific result if the pre-analytical phase is not standardized among all the patients involved in the study and if the clinico-biological information about the patient is not known, as other potential comorbidities can emerge.
In addition to variants in cfDNA, aberrant DNA methylation of some novel and known genes was also investigated in the serum of patients with lung cancer by means of quantitative methylation-specific PCR and showed a specificity of 71% [77]. Identification of blood-based noninvasive or minimally invasive detection markers will improve the clinical management of lung cancer. It is noteworthy that such a simple, reliable, and noninvasive blood test could not only aid the early detection of lung cancer but could also be used most effectively to direct imaging modalities with low specificity such as CT. Such a test could therefore have a significant impact on the long-term survival of these individuals. Moreover, these tests can be helpful in monitoring the response to therapy and in identifying new actionable mutations.

Conclusions
This review summarizes the molecular pathology and the conventional methods used for screening of lung cancer, highlighting the advantages and limits of these approaches. We also report recent studies on new circulating biomarkers potentially useful for lung cancer screening. Ideally, a biomarker should have a sensitivity and specificity of 100%, a goal that is almost never achieved. One strategy for potentially increasing both parameters is to combine several biomarkers into a screening marker panel. Combined with other noninvasive methods, this may allow for further refinement of lung cancer screening. Liquid biopsy is a 10 ml blood sample taken for diagnostic, prognostic, and disease monitoring purposes. It comprises not only biomarkers but also circulating cfDNA, RNA, and CTCs. Compared with tissue biopsy, it provides a better representation of the whole cancer genetic profile and may be suited to repeat sampling over short periods of time. Compared to imaging modalities such as X-rays and CT, liquid biopsy represents a more reliable, less invasive, and less expensive method for the detection of lung cancer in populations at risk of developing this disease.