Open access

Introductory Chapter: Application of Bioinformatics Tools in Cancer Prevention, Screening, and Diagnosis

Written By

Ghedira Kais and Yosr Hamdi

Published: 28 September 2022

DOI: 10.5772/intechopen.104794

From the Edited Volume

Cancer Bioinformatics

Edited by Ghedira Kais and Yosr Hamdi

1. Introduction

Cancer is a leading cause of death worldwide, with nearly 10 million deaths in 2020, accounting for one in six deaths. Breast, lung, colorectal, and prostate cancers are the most common types [1]. Around one-third of cancer deaths are due to environmental factors and lifestyle habits, such as tobacco use, high body mass index, alcohol consumption, low fruit and vegetable intake, and lack of physical activity [2]. In addition, about 10% of cancer cases are due to genetic factors, and cancer-causing infections, such as human papillomavirus (HPV) and hepatitis, are responsible for approximately 30% of cancer cases in low- and lower-middle-income countries [3]. Indeed, HPV infection is the main cause of cervical cancer, a cancer that can be cured if detected early and treated effectively [4]. The multifactorial character of the disease, together with the huge amount of data generated over the last decades on its risk factors, has allowed bioinformatics to play an essential role in cancer research and has made oncology a success story in translating omics data, including genomics, transcriptomics, and proteomics data, into clinical settings [5].

2. Use of bioinformatics integrative approaches in oncology

Numerous research groups worldwide have attempted to develop strategies to identify novel diagnostic and prognostic markers for different cancer types based on integrative computational analyses and tools. One of the most powerful computational approaches is meta-analysis, in which multiple studies interrogating a common hypothesis are analyzed together [6]. Several studies have applied meta-analysis methods to cancer microarray data to identify differentially expressed genes (DEGs) between cancer patients and controls. These methods can be applied to identify robust gene expression signatures in a single cancer type and/or to look for common expression patterns across different cancer types. In 2004, Rhodes and co-workers analyzed 40 published cancer microarray data sets, comprising 38 million gene expression measurements from >3700 cancer samples [7]. With the advent of high-throughput sequencing technology, known as next-generation sequencing (NGS), RNA sequencing (RNA-seq) has been used in several aspects of cancer research and therapy, including the discovery of biomarkers, the characterization of cancer heterogeneity and evolution, cancer immunotherapy, and the investigation of drug resistance [8]. Compared with the older Sanger technology, high-throughput sequencing has the advantage of fast sequencing at low cost and with high accuracy. Unlike microarrays, RNA-seq can also detect previously unknown expressed sequences [9]. Gene expression profiling often generates large gene expression signatures that need to be functionally analyzed to identify a handful of genes of interest for experimental validation. Several resources and methods have been developed for the systematic functional analysis of gene expression signatures, including Gene Ontology (GO) [10, 11], KEGG [12], TransPath [13], and GenMAPP [14].
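The functional analysis step mentioned above is often carried out as an over-representation test: for each annotation term, one asks whether the signature contains more of the term's genes than would be expected by chance. The following is a minimal sketch of that idea using a one-sided hypergeometric test. It is not the method of any specific tool cited here; the gene identifiers, term names, and function names (`enrichment_p_value`, `rank_terms`) are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(universe_size, term_size, signature_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of term-annotated genes obtained when drawing
    signature_size genes without replacement from the universe.
    """
    total = comb(universe_size, signature_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, signature_size - k)
        for k in range(overlap, min(term_size, signature_size) + 1)
    )
    return tail / total


def rank_terms(signature, annotations, universe):
    """Return (term, overlap, p-value) tuples, smallest p-value first."""
    signature = set(signature) & universe
    rows = []
    for term, genes in annotations.items():
        genes = set(genes) & universe
        overlap = len(signature & genes)
        p = enrichment_p_value(len(universe), len(genes), len(signature), overlap)
        rows.append((term, overlap, p))
    return sorted(rows, key=lambda row: row[2])


# Hypothetical toy data: 20 background genes and two made-up annotation terms.
universe = {f"g{i}" for i in range(20)}
annotations = {
    "term_A": {"g0", "g1", "g2", "g3", "g4"},
    "term_B": {"g10", "g11", "g12", "g13", "g14"},
}
signature = {"g0", "g1", "g2", "g10"}

ranked = rank_terms(signature, annotations, universe)
for term, overlap, p in ranked:
    print(f"{term}: overlap={overlap}, p={p:.4f}")
```

In practice the universe would be the set of genes measured on the platform, and the resulting p-values would need to be adjusted for the number of terms tested.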
Finally, to better understand complex biological processes, such as cancer initiation and progression, it is important to integrate transcriptomic data in the context of complex molecular networks. This involves mapping the gene expression signature onto interactomes of known and predicted protein-protein interactions in order to identify induced or repressed interactome subnetworks [15].
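As an illustration of the mapping step described above, the sketch below overlays a gene expression signature on a list of protein-protein interactions and keeps only the interactions whose two partners both belong to the signature, then separates induced from repressed subnetworks. The protein names, fold-change values, and function names are hypothetical, and real analyses typically rely on dedicated network libraries and curated interactome resources rather than plain lists.

```python
def induced_subnetwork(edges, signature_genes):
    """Keep interactions whose two partners are both in the signature."""
    return [(a, b) for a, b in edges if a in signature_genes and b in signature_genes]


def split_by_direction(edges, log_fold_change):
    """Separate edges linking two up-regulated genes from edges linking
    two down-regulated genes; mixed edges are left out of both lists."""
    up = [(a, b) for a, b in edges
          if log_fold_change[a] > 0 and log_fold_change[b] > 0]
    down = [(a, b) for a, b in edges
            if log_fold_change[a] < 0 and log_fold_change[b] < 0]
    return up, down


# Hypothetical interactome: each tuple is one protein-protein interaction.
ppi_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5"), ("P5", "P6")]

# Hypothetical signature: log2 fold changes of differentially expressed genes.
log_fc = {"P1": 2.1, "P2": 1.4, "P4": -1.8, "P5": -2.3}

subnetwork = induced_subnetwork(ppi_edges, set(log_fc))
up_edges, down_edges = split_by_direction(subnetwork, log_fc)

print("signature subnetwork:", subnetwork)
print("induced edges:", up_edges)
print("repressed edges:", down_edges)
```

Restricting the network to edges with both endpoints in the signature is a deliberately strict choice; a common relaxation is to also keep first neighbours of signature genes.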

3. Data science in oncology

In the past decade, artificial intelligence (AI), and machine learning (ML) in particular, has grown rapidly in the context of data analysis and computing, allowing applications and platforms to function in an intelligent manner (https://pubmed.ncbi.nlm.nih.gov/34278328/). ML refers to a broad range of learning algorithms that make predictions based on learning from a subset of data [16]. AI has recently altered the landscape of cancer research and medical oncology through both traditional ML algorithms and cutting-edge deep learning (DL) approaches [17]. Indeed, ML algorithms including Random Forest (RF), Gradient Boosting Machine (GBM), and Neural Network (NN) have been used to optimize cancer classification [18]. Furthermore, DL-based algorithms have been widely applied in medical imaging to accurately diagnose breast cancer [19], colorectal cancer [20], lung cancer [21], and others [22]. Moreover, AI systems have been developed to diagnose early gastric cancer (EGC) using 4667 magnifying image-enhanced endoscopy images, including 1950 EGC images from 1042 cases and 2717 noncancerous images from 769 cases [23].
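The notion of learning from a subset of data and predicting on unseen samples can be sketched with a very small classifier. The example below uses a nearest-centroid rule, chosen only because it fits in a few lines of standard-library Python; it is not one of the algorithms named above (RF, GBM, NN), and the expression values, labels, and function names are hypothetical.

```python
from math import dist
from statistics import fmean


def fit_centroids(samples, labels):
    """Learn one mean expression profile (centroid) per class."""
    by_label = {}
    for profile, label in zip(samples, labels):
        by_label.setdefault(label, []).append(profile)
    return {
        label: tuple(fmean(column) for column in zip(*profiles))
        for label, profiles in by_label.items()
    }


def predict(centroids, profile):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], profile))


# Hypothetical training subset: two-gene expression profiles with known labels.
train_x = [(5.0, 1.0), (6.0, 1.5), (1.0, 4.0), (1.5, 5.0)]
train_y = ["tumor", "tumor", "normal", "normal"]

# Hypothetical held-out samples used only for evaluation.
test_x = [(5.5, 0.5), (0.5, 4.5)]
test_y = ["tumor", "normal"]

centroids = fit_centroids(train_x, train_y)
predictions = [predict(centroids, profile) for profile in test_x]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)

print("predictions:", predictions)
print("accuracy:", accuracy)
```

Keeping the evaluation samples out of `fit_centroids` is the essential point: performance measured on the training subset itself would overstate how well the model generalizes.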

4. Tools and databases

Several publicly accessible databases containing cancer-related data and integrating tools for delivering and analyzing that data, as well as specialized databases dedicated to specific cancer types, have been developed during the last decades. The most widely used include the International Cancer Genome Consortium (ICGC) [24] and The Cancer Genome Atlas (TCGA) [25]. A detailed list of publicly available databases and their descriptions has been reported by Pavlopoulou and co-workers [26]. Recently, a novel database, OncoDB, integrating RNA-seq, DNA methylation, and related clinical data from over 10,000 cancer patients in the TCGA study as well as from normal tissues in the GTEx study, has been developed and made freely available [27, 28]. Concerning bioinformatics and computational tools for cancer risk prediction, numerous resources have been developed, including the International Breast Cancer Intervention Study (IBIS) model [29], the Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA) [30], BRCAPRO [31], and the Breast Cancer Surveillance Consortium (BCSC) risk model [32]. Comprehensive lists of web tools and web servers for cancer genomics studies and cancer prognosis analysis have been provided by Yang and co-workers [33] and Zheng and colleagues [34].

5. Precision oncology application

Molecular and genetic profiling of tumors plays an increasingly important role not only in cancer research but also in the clinical management of cancer patients [35]. Multi-omics approaches hold the promise of improving diagnostics, prognostics, and personalized treatment by using highly reproducible and robust bioinformatics methods for complex data management and integration, going from the primary analysis of raw molecular profiling data to the automatic generation of a clinical report and its delivery to the clinical oncologists who make treatment decisions [36]. The initial results of these efforts are promising, but it has also become clear that exploiting the full potential of precision oncology faces many challenges. One major bottleneck is the efficient and precise annotation of variants [37], which requires databases containing well-curated variants as well as their interactions with potential drugs. A second challenge is the rapid development of molecular profiling techniques, each of which calls for new bioinformatics tools, pipelines, and workflows adapted to it [38]. Moreover, multi-omics approaches are providing more insight into dysregulated pathways, increasing the level of confidence in reporting actionable variants when they can be confirmed by RNA, protein, or epigenetic profiling. However, the availability of diverse multi-omics data currently poses new bioinformatics challenges in integrating multiple data sets and identifying potentially efficient treatments [39]. Finally, interpreting the clinical significance of genomic variants and transcriptional changes is a laborious task that cannot be fully automated in a reliable way; it therefore requires a multidisciplinary team to apply clinical interpretation, select relevant variants, and recommend targeted, personalized therapies [40].
That being said, bioinformatics still holds the promise of bridging cancer research and medical applications for better clinical management of patients.

References

  1. Ferlay J, Ervik M, Lam F, Colombet M, Mery L, Piñeros M, et al. Global Cancer Observatory: Cancer Today. Lyon: International Agency for Research on Cancer; 2020
  2. Cancer Prevention Overview (PDQ®)–Patient Version was originally published by the National Cancer Institute
  3. de Martel C, Georges D, Bray F, Ferlay J, Clifford GM. Global burden of cancer attributable to infections in 2018: A worldwide incidence analysis. The Lancet Global Health. 2020;8(2):e180-e190
  4. Burd EM. Human papillomavirus and cervical cancer. Clinical Microbiology Reviews. 2003;16(1):1-17. DOI: 10.1128/CMR.16.1.1-17.2003
  5. Brenner C. Applications of bioinformatics in Cancer. Cancers (Basel). 2019;11(11):1630. DOI: 10.3390/cancers11111630
  6. Rhodes D, Chinnaiyan A. Integrative analysis of the cancer transcriptome. Nature Genetics. 2005;37:S31-S37. DOI: 10.1038/ng1570
  7. Rhodes DR, Yu J, Shanker K, et al. Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proceedings of the National Academy of Sciences of the United States of America. 2004;101(25):9309-9314. DOI: 10.1073/pnas.0401994101
  8. Wang Y, Mashock M, Tong Z, Mu X, Chen H, Zhou X, et al. Changing technologies of RNA sequencing and their applications in clinical oncology. Frontiers in Oncology. 2020;10:447. DOI: 10.3389/fonc.2020.00447
  9. Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays. Genome Research. 2008;18(9):1509-1517. DOI: 10.1101/gr.079558.108
  10. Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, et al. The gene ontology (GO) database and informatics resource. Nucleic Acids Research. 2004;32(Database issue):D258-D261. DOI: 10.1093/nar/gkh036
  11. Draghici S, Khatri P, Bhavsar P, Shah A, Krawetz SA, Tainsky MA. Onto-tools, the toolkit of the modern biologist: Onto-express, onto-compare, onto-design and onto-translate. Nucleic Acids Research. 2003;31(13):3775-3781. DOI: 10.1093/nar/gkg624
  12. Kanehisa M, Furumichi M, Sato Y, Ishiguro-Watanabe M, Tanabe M. KEGG: Integrating viruses and cellular organisms. Nucleic Acids Research. 2021;49(D1):D545-D551. DOI: 10.1093/nar/gkaa970
  13. Krull M, Voss N, Choi C, Pistor S, Potapov A, Wingender E. TRANSPATH: An integrated database on signal transduction and a tool for array analysis. Nucleic Acids Research. 2003;31(1):97-100. DOI: 10.1093/nar/gkg089
  14. Doniger SW, Salomonis N, Dahlquist KD, et al. MAPPFinder: Using gene ontology and GenMAPP to create a global gene-expression profile from microarray data. Genome Biology. 2003;4:R7. DOI: 10.1186/gb-2003-4-1-r7
  15. Erdogan F, Radu TB, Orlova A, Qadree AK, de Araujo ED, Israelian J, et al. JAK-STAT core cancer pathway: An integrative cancer interactome analysis. Journal of Cellular and Molecular Medicine. 2022;26(7):2049-2062. DOI: 10.1111/jcmm.17228. Epub 2022 Mar 1. PMID: 35229974; PMCID: PMC8980946
  16. Choi RY, Coyner AS, Kalpathy-Cramer J, Chiang MF, Campbell JP. Introduction to machine learning, neural networks, and deep learning. Translational Vision Science & Technology. 2020;9(2):14. DOI: 10.1167/tvst.9.2.14
  17. Kourou K, Exarchos KP, Papaloukas C, Sakaloglou P, Exarchos T, Fotiadis DI. Applied machine learning in cancer research: A systematic review for patient diagnosis, classification and prognosis. Computational and Structural Biotechnology Journal. 2021;19:5546-5555. DOI: 10.1016/j.csbj.2021.10.006
  18. Ramroach S, Joshi A, John M. Optimisation of cancer classification by machine learning generates an enriched list of candidate drug targets and biomarkers. Molecular Omics. 2020;16(2):113-125. DOI: 10.1039/c9mo00198k
  19. Shang LW, Ma DY, Fu JJ, Lu YF, Zhao Y, Xu XY, et al. Fluorescence imaging and Raman spectroscopy applied for the accurate diagnosis of breast cancer with deep learning algorithms. Biomedical Optics Express. 2020;11(7):3673-3683. DOI: 10.1364/BOE.394772
  20. Choi K, Choi SJ, Kim ES. Computer-aided Diagonosis for colorectal Cancer using deep learning with visual explanations. Annual International Conference of the IEEE Engineering in Medicine & Biology Society. 2020;2020:1156-1159. DOI: 10.1109/EMBC44109.2020.9176653
  21. Shimazaki A, Ueda D, Choppin A, Yamamoto A, Honjo T, Shimahara Y, et al. Deep learning-based algorithm for lung cancer detection on chest radiographs using the segmentation method. Scientific Reports. 2022;12(1):727. DOI: 10.1038/s41598-021-04667-w
  22. Ma CY, Zhou JY, Xu XT, Guo J, Han MF, Gao YZ, et al. Deep learning-based auto-segmentation of clinical target volumes for radiotherapy treatment of cervical cancer. Journal of Applied Clinical Medical Physics. 2022;23(2):e13470. DOI: 10.1002/acm2.13470
  23. Abe S, Tomizawa Y, Saito Y. Can artificial intelligence be your angel to diagnose early gastric cancer in real clinical practice? Gastrointestinal Endoscopy. 2022;95(4):679-681. DOI: 10.1016/j.gie.2021.12.042
  24. International Cancer Genome Consortium, Hudson TJ, Anderson W, Artez A, Barker AD, et al. International network of cancer genome projects. Nature. 2010;464(7291):993-998. DOI: 10.1038/nature08987
  25. Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, et al. The Cancer genome atlas Pan-Cancer analysis project. Nature Genetics. 2013;45(10):1113-1120. DOI: 10.1038/ng.2764
  26. Pavlopoulou A, Spandidos DA, Michalopoulos I. Human cancer databases (review). Oncology Reports. 2015;33(1):3-18. DOI: 10.3892/or.2014.3579
  27. Tang G, Cho M, Wang X. OncoDB: An interactive online database for analysis of gene expression and viral infection in cancer. Nucleic Acids Research. 2022;50(D1):D1334-D1339. DOI: 10.1093/nar/gkab970
  28. Tang G, Cho M, Wang X. OncoDB: An interactive online database for analysis of gene expression and viral infection in cancer. Nucleic Acids Research. 2022;50(D1):D1334-D1339
  29. Tyrer J, Duffy SW, Cuzick J. A breast cancer prediction model incorporating familial and personal risk factors. Statistics in Medicine. 2004;23(7):1111-1130. DOI: 10.1002/sim.1668. Erratum in: Statistics in Medicine 2005 Jan 15;24(1):156
  30. Lee A, Mavaddat N, Wilcox AN, Cunningham AP, Carver T, Hartley S, et al. BOADICEA: A comprehensive breast cancer risk prediction model incorporating genetic and nongenetic risk factors. Genetics in Medicine. 2019;21(8):1708-1718. DOI: 10.1038/s41436-018-0406-9
  31. Antoniou AC, Hardy R, Walker L, Evans DG, Shenton A, Eeles R, et al. Predicting the likelihood of carrying a BRCA1 or BRCA2 mutation: Validation of BOADICEA, BRCAPRO, IBIS, myriad and the Manchester scoring system using data from UK genetics clinics. Journal of Medical Genetics. 2008;45(7):425-431. DOI: 10.1136/jmg.2007.056556
  32. Shieh Y, Hu D, Ma L, Huntsman S, Gard CC, Leung JW, et al. Breast cancer risk prediction using a clinical risk model and polygenic risk score. Breast Cancer Research and Treatment. 2016;159(3):513-525. DOI: 10.1007/s10549-016-3953-2
  33. Yang Y, Dong X, Xie B, Ding N, Chen J, Li Y, et al. Databases and web tools for cancer genomics study. Genomics Proteomics Bioinformatics. 2015;13(1):46-50. DOI: 10.1016/j.gpb.2015.01.005. [Epub 2015 Feb 21]. Erratum in: Genomics Proteomics Bioinformatics. 2015 Jun;13(3):202-203
  34. Zheng H, Zhang G, Zhang L, et al. Comprehensive review of web servers and bioinformatics tools for Cancer prognosis analysis. Frontiers in Oncology. 2020;10:68. DOI: 10.3389/fonc.2020.00068
  35. Dietel M, Jöhrens K, Laffert MV, Hummel M, Bläker H, Pfitzner BM, et al. A 2015 update on predictive molecular pathology and its role in targeted cancer therapy: A review focussing on clinical relevance. Cancer Gene Therapy. 2015;22(9):417-430. DOI: 10.1038/cgt.2015.39
  36. Orlov YL, Baranova AV, Tatarinova TV. Bioinformatics methods in medical genetics and genomics. International Journal of Molecular Sciences. 2020;21(17):6224. DOI: 10.3390/ijms21176224
  37. Fröhlich H, Balling R, Beerenwinkel N, et al. From hype to reality: Data science enabling personalized medicine. BMC Medicine. 2018;16(1):150. DOI: 10.1186/s12916-018-1122-7
  38. Singer J, Irmisch A, Ruscheweyh HJ, et al. Bioinformatics for precision oncology. Briefings in Bioinformatics. 2019;20(3):778-788. DOI: 10.1093/bib/bbx143
  39. Miller DT, Lee K, Gordon AS, Amendola LM, Adelman K, Bale SJ, et al. ACMG secondary findings working group. Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2021 update: A policy statement of the American College of Medical Genetics and Genomics (ACMG). Genetics in Medicine. 2021;23(8):1391-1398. DOI: 10.1038/s41436-021-01171-4
  40. Qian M, Li Q, Zhang M, et al. Multidisciplinary therapy strategy of precision medicine in clinical practice. Clinical and Translational Medicine. 2020;10(1):116-124. DOI: 10.1002/ctm2.15
