Next-generation sequencing (NGS) technologies represent the next step in the evolution of DNA sequencing, generating thousands to millions of DNA sequences in a short time. The relatively fast emergence and success of NGS in research has revolutionized genomics and medical diagnosis. The traditional model of diagnosis has given way to a precision medicine model, leading to more accurate diagnosis of human diseases and allowing the selection of molecularly targeted drugs for individual treatment. This chapter reviews the main features of NGS (concepts, data analysis, applications, advances and challenges), starting with a brief history of DNA sequencing followed by a comprehensive description of the most widely used NGS platforms. Further topics highlight the application of NGS in routine practice, including variant detection, whole-exome sequencing (WES), whole-genome sequencing (WGS), custom multi-gene panels, RNA-seq and epigenetics. The potential use of NGS in precision medicine is vast, and a better knowledge of this technique is necessary for its effective implementation in the clinical workplace. A centralized chapter describing the main aspects of NGS in the clinic could help beginners, scientists, researchers and health care professionals, as they will be responsible for translating genomic data into genomic medicine.
- precision medicine
Precision medicine is a new way of practising medicine that has been gaining strength in recent years. It is based on the individual characteristics of each patient (genetic, environmental, behavioural) and aims to optimize and customize strategies for prevention, detection and therapy [1, 2]. Molecular knowledge has contributed strongly to the advancement of precision medicine, providing specific strategies for targeted therapies and for the diagnosis of patients with cancer, Mendelian diseases and other conditions. Statistics indicate that traditional clinical practices sometimes lead to poor health outcomes and to a waste of medical resources. It is estimated that about 75 billion US dollars per year (30% of health care expenditure) are spent on unnecessary or ineffective treatments in the USA.
As a result of the Human Genome Project, many molecular tools have been developed that allow medical and scientific groups to improve patient management based on a better understanding of disease biology, providing more specific and accurate prevention and treatment of diseases. Precision medicine redefines the way traditional medicine is practised. There is now considerable investment in prevention using these new technologies, in contrast to the older model of medicine, which relied on treatment once the disease was already evident or irreversible.
In recent years, Sanger sequencing, referred to as a ‘first-generation’ sequencing method, has partly been replaced by ‘next-generation’ sequencing (NGS) methods [4, 5]. NGS allows the identification of biomarkers for early diagnosis as well as for personalized treatments, and its emergence has changed the way clinical research and basic and applied science are done. NGS produces millions of sequences at a lower cost [4, 6]. Among its applications are the resequencing of the human genome and a better genetic understanding of various human diseases. A great challenge will be the interpretation of this large volume of data and its translation into medical applications. One of the major near-term medical impacts of the NGS revolution will be the elucidation of mechanisms of human pathogenesis, leading to improvements in diagnosis and in the selection of treatment and prevention strategies. Thanks to second-generation sequencing technologies, it has become easier to sequence the expressed genes (‘transcriptomes’), known exons (‘exomes’) and complete genomes of patient samples.
This chapter reviews the concepts, applications, advances and limitations of NGS and the history of technological advances leading to its emergence in the era of precision medicine. It starts with a brief history of DNA sequencing, followed by a comprehensive description of the most widely used NGS platforms, their sequencing chemistries and general workflows. Further topics highlight the application of NGS in routine practice, including variant detection, whole-genome sequencing (WGS), whole-exome sequencing (WES) and multi-gene panels. A centralized chapter describing the main NGS features in the clinic could help beginners, scientists, researchers and health care professionals, as they will be responsible for translating genomic data into genomic medicine.
2. From Sanger to NGS sequencing
In 1908, Garrod introduced his concept of ‘the inborn error of metabolism’, which changed the fields of biochemistry, genetics and medicine. His principal contribution was an understanding of the relationship between gene and enzyme, the molecular basis of genetic diseases. Although today this concept is considered outdated because of discoveries such as RNA splicing and RNAi, its development allowed researchers to understand how changes in DNA sequence could cause genetic disease. This finding increased the interest of scientists in the human DNA sequence and its mutations.
The effort to determine the nucleotide sequence of DNA began in the 1960s with several studies that demonstrated new methods based on different strategies [9–13], but it was in 1977 that Sanger developed the ‘chain-termination’ method, which became the most widely used (first-generation) method for sequencing DNA (Figure 1). The method relies on dideoxynucleotides (ddNTPs), analogs of deoxynucleotides (dNTPs) that terminate DNA synthesis, and on the separation of the resulting DNA fragments in a gel. These special nucleotides were radiolabelled, so the sequence could be inferred once the gel autoradiograph was developed. Numerous modifications have since made the method more efficient, robust and sensitive. Among them are the replacement of radiolabelled nucleotides with fluorescently labelled ones, which allowed the sequencing reaction to occur in one tube, the development of the polymerase chain reaction, the separation of DNA fragments by capillary electrophoresis and, later, the development of equipment that allowed more complex genomes to be sequenced. The most famous sequencing project, the Human Genome Project, produced 3 billion sequenced bases in 13 years at an estimated cost of around $2.7 billion. To date, Sanger sequencing is still the gold-standard method in diagnostic tests; although more recent methods have a much higher processing capacity, some of their findings are still confirmed using this method.
The second generation of DNA sequencing can be defined as the era of massively parallel sequencing on a micro scale. The pyrosequencing method developed by Nyrén and colleagues in 1996 was the starting point for this generation. This technique differed substantially from previous ones because it did not use radioactively or fluorescently labelled nucleotides and required no electrophoretic run. The method is based on the action of two enzymes, ATP sulfurylase and luciferase. ATP sulfurylase converts the pyrophosphate released on nucleotide incorporation into ATP, which luciferase then uses as a substrate to produce light. The light signal is proportional to the number of nucleotides incorporated, and the sequence can be determined from the serial addition of nucleotides. Later, this technology was improved and licensed, giving rise to the first ‘second-generation’ instrument, known as 454 (Roche). The improvements included binding the DNA to beads through an adapter and amplifying it in water-in-oil microreactors (emulsion PCR). These changes, together with microplates that compartmentalized the process and high-definition detection systems, dramatically increased the amount of DNA sequenced and defined the second generation. The disadvantage of this technology lies in homopolymer regions, because the signal strength is difficult to interpret when five or more nucleotides are incorporated in a single wash cycle. Other technologies were then developed, such as that used by Illumina, in which the DNA is bound to a flow-cell through adapters and massively parallel amplification, called bridge amplification, produces a cluster from each DNA strand originally bound to the flow-cell. This process generates paired-end sequences, an advantage over other methodologies because they improve mapping accuracy, mainly in repetitive regions or where DNA rearrangements or gene fusions occur.
The Illumina method uses ‘reversible terminator chemistry’, in which a modified fluorescent dNTP reversibly blocks DNA synthesis, so that the addition of each nucleotide can be synchronized and monitored by a charge-coupled device (CCD) sensor. This is currently one of the most accurate sequencing methodologies, with one of the lowest error rates; however, it generally requires a higher DNA concentration. Another methodology, known as SOLiD and developed by Applied Biosystems (now Thermo Fisher Scientific), is based on sequencing by oligonucleotide ligation. It sequences not by synthesis but by ligation of fluorescently labelled oligonucleotides. Each probe is an octamer that contains two known nucleotides at the 3’ end followed by six degenerate nucleotides, with one of four fluorescent labels linked to the 5’ end. After probe annealing and ligation, the fluorescent dye is cleaved and a new probe is ligated. Multiple cycles are performed according to the read length. The extension product from primer (n) is then removed and a second round of sequencing is performed with a primer complementary to the (n-1) position. This method gives good results; however, it is considered slow compared to the others and was therefore replaced by Ion Torrent (Thermo Fisher Scientific) technology. As in 454, DNA bound to a bead is massively amplified by emulsion PCR; detection occurs in picotiter wells using a complementary metal-oxide-semiconductor (CMOS) sensor that registers the pH change caused by the release of H+ ions on nucleotide incorporation. This methodology was the first to use a detection method that does not rely on a light signal. The advantages of this technology are the speed of the process and the low cost of the equipment; however, like 454, it has difficulty resolving homopolymers.
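The homopolymer limitation shared by 454 and Ion Torrent follows from how flow-based signals are decoded, and a minimal sketch may make it concrete. It assumes an idealized flowgram in which each signal is roughly the number of bases incorporated during one nucleotide flow; the function name, the cyclic flow order `TACG` and the example values are illustrative assumptions, not any vendor's actual base caller.

```python
def decode_flowgram(signals, flow_order="TACG"):
    """Convert per-flow signal intensities into a base sequence.

    Assumption: each signal approximates the number of identical
    nucleotides incorporated during that flow, so rounding to the
    nearest integer gives the homopolymer length.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        # Round half up; negative noise is clamped to zero incorporations.
        count = max(0, int(signal + 0.5))
        bases.append(nucleotide * count)
    return "".join(bases)


# Clean signals: T x1, A x0, C x2, G x1, T x0, A x3
print(decode_flowgram([1.0, 0.0, 2.0, 1.0, 0.0, 3.0]))

# A small amount of noise on a long homopolymer changes the call:
print(len(decode_flowgram([5.4])), len(decode_flowgram([5.6])))
```

For a single base, the gap between "0" and "1" is the full signal, whereas for a run of five or six bases the same absolute noise is enough to shift the call by one nucleotide, which is why long homopolymers tend to produce insertion and deletion errors on these platforms.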
The second generation of sequencing was marked by the high capacity of sequencers to generate data in a single run and, consequently, by the development of computational resources, such as bioinformatics tools, to analyse them. The cost of sequencing decreased dramatically at this stage. At the beginning of the first-generation sequencing period (2001), the approximate cost per megabase sequenced was $5292.39, and at the end of this phase (2007) it was $397.09; in the second generation, the cost fell from $102.13 (2008) to only $0.014 (2015), a far more pronounced decline (Figure 1).
There is some debate about which technology marked the beginning of the third generation [24–27]. In this review, we take the defining feature to be single-molecule sequencing (SMS), which does not require DNA amplification. The first technology to use SMS was ‘virtual terminators’, a method very similar to that of Illumina but in which a single DNA molecule is fixed in a flow-cell with 25 channels. The process occurs in cycles in which dNTPs are incorporated and the corresponding fluorescence is captured by a CCD camera. This process generates short reads (25 bp), is considered slow and produces a noisy signal. Despite being the first third-generation sequencing technology, its history was brief because the company behind it, Helicos Biosciences, filed for Chapter 11 bankruptcy. Another technology is ‘single molecule real time’ (SMRT) sequencing, commercialized by Pacific Biosciences. SMRT immobilizes a single molecule in a chamber called a ‘zero-mode waveguide’ (ZMW), where fluorescent nucleotides are incorporated. The ZMW allows the incorporation of each nucleotide to be monitored in real time and without interference from other light signals. The reads are very long (40 kb) and allow modified bases to be detected [29, 30]. Finally, ‘nanopore’ technology consists of passing a molecule of DNA or RNA through a biological or synthetic nanopore. Detection relies on differences in the ion current produced by each nucleotide. The reads are extremely long (500 kb), and the process is very fast and requires no special nucleotides. Oxford Nanopore Technologies (ONT) was the first company to commercialize sequencers using this technology, including a portable version (MinION) that was used to sequence a mixture of bacteriophage, Escherichia coli and Mus musculus DNA on the International Space Station (ISS).
What these technologies have in common is an error rate that is still high, although it is improving as the technology develops. Their main use today is to aid in the assembly of complex regions of the genome where gene fusions, large deletions and insertions and repetitive regions occur. The third generation is expected to further revolutionize precision medicine, enabling sequencing at lower cost and virtually anywhere.
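To make the difference between the two phases explicit, the short calculation below derives the fold decrease in cost per megabase from the four figures quoted above. The dictionary and function names are illustrative; only the dollar values and years come from the text.

```python
# Approximate cost per megabase (USD) for the years quoted in the text.
cost_per_mb = {2001: 5292.39, 2007: 397.09, 2008: 102.13, 2015: 0.014}


def fold_decrease(start_cost, end_cost):
    """Return how many times cheaper sequencing became between two costs."""
    return start_cost / end_cost


first_gen_fold = fold_decrease(cost_per_mb[2001], cost_per_mb[2007])
second_gen_fold = fold_decrease(cost_per_mb[2008], cost_per_mb[2015])

print(f"First generation (2001-2007): ~{first_gen_fold:.1f}-fold cheaper")
print(f"Second generation (2008-2015): ~{second_gen_fold:.0f}-fold cheaper")
```

On these figures, the cost fell roughly 13-fold during the first-generation phase and roughly 7300-fold during the second-generation phase.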
3. Clinical applications
In recent years, NGS has made possible a better understanding of genetic diseases and has become a significant technological advance in the practice of diagnostic and clinical medicine. NGS allows the analysis of multiple regions of the genome in a single reaction and has been shown to be a cost-effective and efficient tool for investigating patients with genetic diseases. Genetic data produced via NGS provide significant benefits to medical practice, including accurate identification of disease biomarkers, detection of inherited disorders and identification of genetic factors that can help predict responses to therapies [32, 33]. However, recommendations on the clinical implementation of NGS are still under discussion, which hampers its use in the genetics clinic. A variety of molecular diagnostic tests use sequencing technology, such as single- and multi-gene panel tests, cell-free DNA for non-invasive prenatal testing, whole-exome sequencing (WES) and whole-genome sequencing (WGS). Because the use of NGS as a diagnostic tool is recent, challenges remain, including when to order a test, for whom to order it and how to interpret and communicate the results to the patient and family. It is therefore necessary to understand the applications, strengths and limitations of the different approaches in order to recognize which is the most suitable for a given case. In the following topics, we emphasize common applications of this technology in clinical practice.
3.1. Multi-gene panels
The traditional approach still holds great value for many disorders. Single-gene testing is indicated when the clinical features of a patient are typical of a particular disorder, the association between the disorder and the specific gene is well established and locus heterogeneity is minimal. However, many genetic conditions are difficult to diagnose, mainly because of clinical variability and genetic locus heterogeneity; examples include cardiomyopathies, epilepsy, congenital muscular dystrophy, X-linked intellectual disability and cancer susceptibility in families with atypical phenotypes. The diagnostic process is exhausting, with clinical assessment followed by sequential laboratory testing that in most cases yields negative results. In cases with unidentified genetic conditions (e.g., developmental delay/cognitive disability and autism spectrum disorders), the diagnosis rate can vary greatly and a multi-gene panel is more appropriate. In cancer diagnosis, for example, Tothill and colleagues illustrate the application of these multi-gene panels by analysing samples from patients with cancers of unknown primary (CUP). The clinical management of patients with CUP is hampered by the absence of a definitive site of origin, and this kind of NGS analysis could help to define new therapeutic options.
In multi-gene panel tests, many genes associated with a specific phenotype are sequenced and analysed concomitantly, decreasing the cost and improving the efficiency of genetic diagnosis. The number and choice of genes evaluated for the same or similar indications may vary significantly among clinical laboratories, and several considerations need to be taken into account for gene inclusion. Most authors believe that only genes with a strong disease association should be included, since clinical evidence makes findings in these genes much easier to interpret. However, some authors consider including genes with overlapping phenotypes for the purpose of differential diagnosis, or all genes that are even remotely associated with the phenotype of interest, with the objective of a better and faster diagnosis. For cancer diagnosis, a multi-gene panel may include high-penetrance genes as well as genes associated with a moderate increase in risk.
The transition from single-gene to multi-gene testing should not compromise the sensitivity of the test to identify variants, particularly in the genes responsible for a significant proportion of the defects (core genes). The sensitivity of NGS depends not only on horizontal coverage (the proportion of the target region that is sequenced) but also on vertical coverage (the number of reads covering each position). Additional genes will increase the chance of reaching a diagnosis, but this should not come at the cost of missing mutations that would previously have been detected by single-gene testing. Sanger sequencing or other available techniques can help to solve this problem by filling in low-coverage and no-coverage regions.
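The distinction between horizontal and vertical coverage can be sketched with a few lines of code. The example reads horizontal coverage as breadth (the fraction of target positions reaching a minimum depth) and vertical coverage as mean depth; the function name, the 20× threshold and the toy depth values are assumptions chosen for illustration, not recommended laboratory settings.

```python
def coverage_summary(depths, min_depth=20):
    """Summarize per-base read depth over a target region.

    Returns (mean_depth, breadth, low_positions), where breadth is the
    fraction of positions with depth >= min_depth and low_positions
    lists the 0-based indices that fall below the threshold.
    min_depth=20 is an illustrative assumption.
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    mean_depth = sum(depths) / len(depths)
    low_positions = [i for i, d in enumerate(depths) if d < min_depth]
    breadth = 1 - len(low_positions) / len(depths)
    return mean_depth, breadth, low_positions


# Toy per-base depths for a ten-position target region
depths = [35, 40, 38, 5, 0, 42, 37, 36, 12, 41]
mean_depth, breadth, low_positions = coverage_summary(depths)
print(f"mean depth: {mean_depth:.1f}x, breadth at 20x: {breadth:.0%}")
print("positions needing follow-up:", low_positions)
```

In this toy region the mean depth looks adequate, yet three positions fall below the threshold; these are the gaps that the text suggests filling with Sanger sequencing or another complementary technique.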
3.2. Whole-genome and whole-exome sequencing
Whole-genome sequencing (also known as WGS, full-genome sequencing, complete genome sequencing or entire genome sequencing) is the process of determining the complete DNA sequence of an organism's genome at a single time. The major benefit of WGS is complete coverage of the genome, including promoters and regulatory regions. In whole-exome sequencing (WES), all coding regions are sequenced at a relatively greater depth. Compared to WGS, the major advantage of WES is a significant cost reduction.
The human genome comprises ~3 × 10⁹ bp of coding and non-coding sequence. About 3 × 10⁷ bp (30 Mb, 1%) of the genome is coding sequence. It is estimated that 85% of disease-causing mutations are located in coding and functional regions of the genome [41, 42]. For this reason, sequencing the complete coding regions (exome) has the power to uncover the causes of a large number of rare, mostly monogenic, genetic disorders as well as predisposing variants in common diseases and cancers. In 2009, Choi and colleagues showed the value of WES in medical practice by making genetic diagnoses of congenital chloride diarrhoea in patients suspected of having Bartter syndrome, a renal salt-wasting disease. WES was conducted on six patients who did not show any mutations in the classic genes for Bartter syndrome. The results revealed a homozygous deletion in the SLC26A3 gene in all patients, providing a molecular diagnosis of congenital chloride diarrhoea that was later confirmed on clinical evaluation. This result was the first to show the value of WES in making a clinical diagnosis, and several similar studies have followed.
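The genome and exome sizes quoted above also explain why WES can be sequenced more deeply at lower cost. The back-of-the-envelope calculation below uses those two sizes; the mean depths of 30× for WGS and 100× for WES are assumptions picked only to illustrate the arithmetic, and the estimate ignores off-target capture and duplicate reads.

```python
GENOME_BP = 3_000_000_000  # ~3 x 10^9 bp, as quoted in the text
EXOME_BP = 30_000_000      # ~3 x 10^7 bp (30 Mb), as quoted in the text


def bases_required(target_bp, mean_depth):
    """Total sequenced bases needed to reach a mean depth over a target."""
    return target_bp * mean_depth


exome_fraction = EXOME_BP / GENOME_BP
wgs_bases = bases_required(GENOME_BP, 30)    # assumed 30x WGS
wes_bases = bases_required(EXOME_BP, 100)    # assumed 100x WES

print(f"exome fraction of genome: {exome_fraction:.0%}")
print(f"WGS at 30x:  {wgs_bases:.1e} bases")
print(f"WES at 100x: {wes_bases:.1e} bases")
print(f"WGS needs {wgs_bases / wes_bases:.0f} times more sequence")
```

Under these assumptions, an exome sequenced more than three times as deeply still requires about 30 times less raw sequence than a genome.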
Certain considerations apply when ordering WES instead of other NGS tools. Although exomes are supposed to cover all the protein-coding regions of the genome, the average coverage on many platforms tends to be between 85 and 95% [32, 44]. This means that a particular gene of interest that is closely linked to the patient’s phenotype may not be covered, completely or partially. The reasons include poorly performing capture probes due to high GC content, sequence homology or repetitive sequences. A targeted approach, such as NGS single- or multi-gene panels, on the other hand, achieves higher or even complete coverage of all the specific genes by filling in the gaps with complementary technologies such as Sanger sequencing or long-range PCR. Besides offering more comprehensive coverage of the ‘known’ phenotype-specific genes, this targeted approach also allows for deeper coverage of these genes compared to WES, which provides greater confidence in the variants detected. However, all NGS tools are still prone to sequencing artefacts, and Sanger sequencing is recommended to confirm the variants detected before returning the results to the patient. In addition, the patient and their family need to be aware of all the nuances related to WES and WGS. It is important to let them know that the test may not yield positive results, and it is crucial to clarify that even a positive result may provide a diagnosis without improving prognosis or treatment.
To request a test that uses WES, one must start by collecting as much information as possible about the patient. It is important to have a detailed family history, the phenotype, the symptoms and, if possible, the inheritance pattern of the suspected disease. With the phenotype and pedigree information, a systematic review of the literature and databases should be performed to guide the clinician on which gene(s) are crucial and must be analysed. In cases of genetic heterogeneity, targeted NGS may be the preferred approach. On the other hand, if the disease mechanism is unknown, WES may be the best choice.
WES can yield approximately 60,000–100,000 genetic variants, which can be classified as pathogenic, benign or of uncertain significance (VUS). With WES, a single pathogenic variant that is probably the cause of the patient phenotype can be detected in about 20–36% of cases. In the remaining cases, multiple candidate variants, or none at all, may be found. If no candidate variants are found, possible reasons include poor coverage, a mutation residing outside the protein-coding region of the gene, a clinical summary with insufficient information or a defect that is not due to a simple nucleotide change in a single gene [49–53].
The outcome of an exome analysis should be evaluated by a multidisciplinary team involved in each patient's case. Physicians, geneticists and other health professionals need to discuss all the clinical and laboratory findings in order to link them with the phenotype, family history and symptoms. It is necessary to review the WES results, the scientific literature and the medical information. If more than one candidate variant is detected, this multidisciplinary team must perform further evaluation(s) to determine which of the variants is causing the phenotype. Finally, if the test results are negative, the reasons for this should be discussed in the report. As this tool becomes more frequently used and more accessible, new pathogenic variants and genetic syndromes are likely to be described and characterized in the near future, so negative results should be reanalysed within a few years.
When a Mendelian disease is suspected, exome sequencing is usually indicated for the detection of rare variants, and samples from the patient and his/her parents may be needed. This is usually the standard setting in cases where Sanger sequencing of the candidate gene gave a negative result, or where multiple genes must be tested for the condition, which would be costly and time consuming. In most cases, the results obtained from WES lead to a molecular diagnosis but do not alter the management, treatment or prognosis [32, 54].
Targeted exome sequencing is becoming increasingly popular in oncology for assessing the full sequence of cancer-related genes. It also facilitates sequencing at a greater depth, and thus the identification of subclonal mutations. Alternatively, rather than sequencing the full exome, it is possible to examine all the genes reported to be related to cancer in general. Although hotspot mutation testing facilitates large-scale sequencing of many samples, it limits the knowledge acquired through sequencing because it restricts the evaluation to small regions in selected genes. Consequently, small, targeted NGS panels increase the possibility of omitting relevant mutations that lie outside the regions evaluated, and so yield less clinical knowledge than WES. WES could provide novel insights into cancer mechanisms; comparing the DNA sequence of cancer cells with that of normal cells could help to reach an in-depth understanding of cancer. Using WES, it is also feasible to check germline and somatic mutations in human cancers.
Approximately 5–10% of cancers are hereditary. WES allows multiple genes to be tested at once and greatly improves the variant detection rate. Many patients with hereditary cancer have tested negative for one specific genetic variant, but with WES it is easier to find causative mutations. In a study of 300 high-risk breast cancer families, previously undetected mutations were found in 52 probands, and the reduced sequencing costs and turnaround time made the approach even more practical in clinics.
To detect familial germline mutations, WGS might be advantageous for WES-negative cases in families with a high chance of carrying a genetic variant. The major technical advantage of WGS is that the specificity is theoretically 100% (average 95–98% in practice, practically without gaps), with uniform coverage of the regions of interest (ROIs) throughout the input material. Thus, the chance of missing disease-causing variants due to technical errors is much lower with WGS [57–59]. The major challenges in applying this tool in routine medical practice are its high cost and the complexity of the data analysis pipeline and of data interpretation. However, in the near future, the costs of NGS should fall, studies on the genetics of non-coding regions should improve and more approaches will be implemented. WGS should then be performed regularly in diagnostics to find causative genetic variants.
With gene panel analysis, about 70–92% of all cases remain negative, depending on the disease. Important genes are likely to be absent from these panels, making WES and WGS more appropriate for identifying genetic variants in cases of familial syndromes. WES and WGS have already been reported to identify several risk genes for various types of cancer, such as the PALB2 and ATM genes in pancreatic cancer, the hereditary pheochromocytoma susceptibility gene MAX and the hereditary colorectal cancer moderate-risk genes POLD1 and POLE.
At present, the use of WES and WGS as a generic test for mutation discovery for every genetic diagnostic question is not yet appropriate, and these tests should be directed to specific patient groups. This limitation is due to the high cost, the need for complex bioinformatics pipelines and large storage capacity and the expected high number of VUS detected.
A transcriptome represents the complete set of RNA molecules from any genome at any time or condition, and RNA plays an essential role in several biological processes, including untranslated RNA species such as microRNAs (miRNAs). RNA sequencing (RNA-seq) consists of an in-depth RNA analysis through NGS technologies and has become the state-of-the-art technique for transcriptomics. A typical RNA-seq experiment consists of experimental design, sample preparation, library construction, sequencing and data analysis. However, because several experimental options are available, careful planning and cost estimation are necessary before starting. The options include the number and type of replicates (technical vs. biological), sequencing platform (e.g. Illumina, Ion Torrent), library preparation method (e.g. rRNA depletion or mRNA enrichment; strand-specific or not; single or paired end), throughput, read length, sequencing depth and coverage. RNA-seq best practices can be found in the chapter ‘RNA-seq: Applications and Best Practices’ of this book.
RNA-seq enables detection of novel genes and isoforms, gene fusions, splice and chimeric variants and genomic alterations, as well as quantification of gene expression. Although RNA-seq outperforms microarrays in transcriptomic analysis, its clinical application is still in its infancy and, for the time being, it will not replace current approaches. RNA-seq is considered a complementary method, used according to the needs and resources available, that assists clinicians in making decisions. In clinical practice, RNA measurement has applications across different areas of human health, such as therapeutic selection, disease diagnosis and treatment.
Clinical diagnosis of infectious disease through RNA-seq is still rare, since reverse transcription quantitative PCR (RT-qPCR) assays remain the most common technique for viral detection and genotyping. In diagnostic virology, NGS can be used to analyse patients with unexplained illness, especially during outbreaks and epidemics [67–70]. Other applications include the identification of novel pathogens [71–74], viral community characterization [75–77], whole viral genome reconstruction [73, 78, 79], antiviral drug resistance [80–83], epidemiology [84–87] and transcriptomics [88–90]. The use of NGS in virology is increasing the knowledge of viral infection dynamics and their correlation with human health and treatment.
In oncology, RNA-based cancer diagnostics is being used by clinical oncologists to define the tumour transcriptome because of its potential to guide treatment and drug therapy. Its applications relate especially to gene expression profiling and to the detection of variants and gene fusions. The pathogenicity of gene fusions in cancer is well known. Most gene fusions are correlated with specific tumour subtypes, representing diagnostic biomarkers and leading to novel therapeutic opportunities and benefits [92–94]. Some pharmacological treatments are already in clinical use. Key somatic DNA mutations can also represent cancer biomarkers and can be identified by transcriptomic mapping [95–98].
Gene expression in cancer is still quantified by non-sequencing methods (e.g. RT-qPCR and microarrays). RNA-seq can measure the expression of tumour antigens or immune checkpoint receptors and ligands after a given treatment, giving some indication of patient drug response [91, 99, 100]. Gene expression signatures can also be used to classify cancer types, which directly affects prognosis and the definition of and response to treatment.
NGS can also be applied to the discovery of circulating tumour RNA (ctRNA). The analysis of ctRNA in plasma is still in its early stages and presents specific challenges. ctRNA degrades faster than circulating tumour DNA (ctDNA) and needs to be purified rapidly or placed in a preservative solution (e.g. TRIzol) and frozen at −80°C, a procedure that is not always accessible to clinical sites. Despite these challenges, ctRNAs are good biomarkers for the early detection of multiple tumour types, such as breast, lung, prostate and colorectal cancers [101–109]. NGS is a more powerful tool for ctRNA detection; however, RT-qPCR remains more practical for clinical diagnostic applications.
An emerging field that has a huge impact on medicine and clinical diagnostics is epigenetics. The term was coined by Conrad Waddington in the 1940s and refers to the study of heritable changes in gene activity and expression that do not involve the DNA sequence itself, that is, a change in phenotype without a change in genotype [111, 112]. Additional information about the history of epigenetics can be found in Ref. . Epigenetic mechanisms represent another layer of gene regulation, and NGS has made it possible to study epigenetic status on a large scale and at single-base resolution, mainly including DNA methylation, histone modification and non-coding RNA (ncRNA)-associated silencing [111, 112].
DNA methylation was the first epigenetic mechanism identified and is the best known and the most frequent in human cancer. It involves the covalent addition of a methyl group to cytosines in CpG (cytosine/guanine) islands [111, 112]. This methylation is maintained by DNA methyltransferases (DNMTs) and plays roles in transcriptional repression of genes, silencing of transposable elements and viral defence. Unmethylated DNA is found in active regions of chromatin, and methylated DNA is found in inactive regions.
Post-translational histone modifications are markers for chromatin activity through acetylation and methylation of conserved lysine residues on the amino-terminal tail domains : acetylation is found in active regions of chromatin, whereas hypoacetylation is found in inactive euchromatic or heterochromatic regions [111, 112]. Enzymes involved in this process include histone deacetylases (HDACs), histone acetylases and histone methyltransferases . These and other post-translational histone modification processes (e.g. phosphorylation) result in distinct histone modification patterns that form a ‘histone code’ .
Since epigenetic mechanisms regulate DNA accessibility, perturbations of the cell epigenetic pattern affect gene expression and can give rise to human diseases, that can be inherited or somatically acquired [111, 112]. Prader-Willi, Angelman and Beckwith-Wiedemann syndromes, for example, are the best characterized congenital imprinting disorders [111, 115, 116].
4. Data analysis
Data analysis is a critical step of NGS tests. It consists of a primary analysis, in which the bases are called and quality scores are generated; a secondary analysis, in which the reads are aligned to the human reference sequence; and a tertiary analysis, which consists of variant calling and annotation. Many databases are useful for variant annotation, such as the 1000 Genomes Project, dbSNP, ClinVar (NCBI), the Leiden Open Variation Database (LOVD), The Cancer Genome Atlas (TCGA) and others. However, the information in these sources can be ambiguous or insufficient. Detected variants should be reported according to the Human Genome Variation Society (HGVS) recommendations, stating the version of the human reference genome and the transcript used for the variant description. The reference coding sequence should preferably come from the RefSeq database.
All pathogenic, likely pathogenic and VUS variants have to be reported. Secondary or incidental findings (IFs) are a significant issue, especially for WES, WGS and multi-gene panels, and whether they are reported depends on local practice.
An in-house database containing all relevant variants identified in the laboratory is an important tool that allows further annotation and greatly streamlines the diagnostic process. Furthermore, an in-house database linking patients and variants helps when a variant is reclassified. In this case, the laboratory is responsible for re-contacting the clinicians of the patients who may be affected by the new status of the variant.
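The quality scores generated in the primary analysis are conventionally expressed on the Phred scale, Q = −10·log10(p), where p is the estimated probability that a base call is wrong. The following minimal sketch assumes the common Phred+33 ASCII encoding of FASTQ quality strings; the function names are illustrative and do not come from any specific pipeline.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    The default offset of 33 assumes Phred+33 encoding.
    """
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Convert a Phred score Q into the estimated base-call error probability."""
    return 10 ** (-q / 10)


def mean_error_probability(quality_string, offset=33):
    """Average the per-base error probabilities of a read.

    The probabilities are averaged, not the Q values, because the
    Phred scale is logarithmic.
    """
    scores = phred_scores(quality_string, offset)
    return sum(error_probability(q) for q in scores) / len(scores)
```

Under this encoding, the character `I` corresponds to Q40 (an error probability of 1 in 10,000) and `5` to Q20 (1 in 100).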
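As an illustration of how such a database can support re-contacting, the sketch below uses Python's built-in `sqlite3` module. The two-table schema, the column names and the `reclassify` helper are hypothetical simplifications introduced here, not a recommended clinical design.

```python
import sqlite3


def create_schema(conn):
    """Create a minimal, hypothetical schema linking patients to variants."""
    conn.executescript(
        """
        CREATE TABLE variants (
            variant_id     INTEGER PRIMARY KEY,
            hgvs           TEXT NOT NULL UNIQUE,  -- HGVS description of the variant
            classification TEXT NOT NULL          -- e.g. 'VUS', 'likely pathogenic'
        );
        CREATE TABLE observations (
            patient_id TEXT NOT NULL,
            clinician  TEXT NOT NULL,
            variant_id INTEGER NOT NULL REFERENCES variants(variant_id)
        );
        """
    )


def reclassify(conn, hgvs, new_classification):
    """Update a variant's classification and list whom to re-contact.

    Returns (patient_id, clinician) pairs for every patient in whom
    the variant was observed, ordered by patient_id.
    """
    conn.execute(
        "UPDATE variants SET classification = ? WHERE hgvs = ?",
        (new_classification, hgvs),
    )
    return conn.execute(
        "SELECT o.patient_id, o.clinician "
        "FROM observations AS o "
        "JOIN variants AS v ON v.variant_id = o.variant_id "
        "WHERE v.hgvs = ? "
        "ORDER BY o.patient_id",
        (hgvs,),
    ).fetchall()
```

Calling `reclassify(conn, hgvs, "likely pathogenic")` on a connection created with `sqlite3.connect(":memory:")` and populated with test rows returns the list of clinicians to notify. A production system would additionally need audit trails, access control and versioned classifications.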
4.1. Sanger sequencing validation
Given the limitations of the technology, in particular the false-positive rate of NGS, a second method, such as Sanger sequencing, is required to confirm any finding of possible clinical significance. The laboratory must be able to guarantee that reported variants are true variants; therefore, the report should state that the variants were confirmed by the Sanger method. NGS technology will likely evolve, and within a few years confirmation might prove unnecessary [34, 39].
In some cases, mainly with large panels, complementing NGS testing with Sanger sequencing is inevitable. This limitation of NGS depends on the platform and on the enrichment method, since a number of strategies are available, each with advantages and disadvantages. Sanger sequencing can also be used to fill regions that fail to amplify because of sequence complexities, such as homology with pseudogenes, highly repetitive regions, GC-rich content or allelic dropout, or regions supported by too few reads to call variants confidently. In practice, however, laboratories can opt to apply different settings for NGS tests. Three kinds of multi-gene panel tests are distinguished: (A) the laboratory states that more than 99% of the region of interest is covered, and all gaps are filled with Sanger sequencing; (B) the laboratory describes which regions are sequenced and fills some specific gaps (core genes) with Sanger sequencing; and (C) no additional Sanger sequencing is offered. It is essential to state, in a disclaimer, the horizontal coverage achieved in the test and the limitations of the test.
The diversity and rapid evolution of NGS technology create many challenges associated with data generation, data manipulation and data storage. Some of the major issues with the analysis, interpretation, reproducibility and accessibility of NGS data are: (A) NGS is still too expensive to be accessible to small laboratories or individuals; (B) data analysis is time-consuming and requires sufficient knowledge of bioinformatics; (C) the short read lengths supported by NGS are one of its major shortcomings and limit its application, especially in de novo sequencing and in the sequencing of highly repetitive regions; (D) data processing, or bioinformatics, is a major bottleneck for the implementation of NGS; (E) routine analysis of NGS data requires multidisciplinary teams; (F) it is critical to standardize the quality metrics for the NGS data generated, including validation and comparison among platforms, data reliability, robustness and reproducibility, and the quality of assemblers; and (G) complete knowledge of the patient's family and personal history is crucial to help define the ideal analysis method, the interpretation of the results obtained, and the post-test counselling and management [124–127].
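Horizontal coverage and the gaps to be filled by Sanger sequencing can be derived from per-base read depths. The sketch below takes horizontal coverage to mean the fraction of target positions sequenced at or above a minimum depth; the default threshold of 20 reads and the function names are assumptions chosen for illustration, and each laboratory would set its own validated threshold.

```python
def horizontal_coverage(depths, min_depth=20):
    """Fraction of target positions covered by at least min_depth reads."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def low_coverage_gaps(depths, min_depth=20, start=1):
    """Return inclusive (first, last) position ranges with depth below min_depth.

    Positions are numbered from `start`, so the ranges can be mapped
    back to target coordinates and used as candidates for Sanger fill-in.
    """
    gaps = []
    gap_start = None
    for pos, depth in enumerate(depths, start=start):
        if depth < min_depth:
            if gap_start is None:
                gap_start = pos  # a new gap opens here
        elif gap_start is not None:
            gaps.append((gap_start, pos - 1))  # the gap closed at the previous base
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, start + len(depths) - 1))  # gap runs to the end
    return gaps
```

For a test of type (A), the laboratory would check that `horizontal_coverage` exceeds 0.99 and fill every range returned by `low_coverage_gaps`; for type (B), only the ranges falling within core genes would be filled.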
Despite some challenges, it is hard not to be optimistic about the future of personalized genome sequencing and its potential impact on patient care and the advancement of knowledge of human biology and disease.
5.1. Regulation on NGS tests
With the advancement of gene-sequencing technologies, numerous opportunities have arisen in genetic diagnostics, preventive medicine and other areas of human health. As a result, several life science companies and clinical laboratories have entered this field, offering equipment and supplies as well as molecular tests based on next-generation (massively parallel) sequencing. However, most manufacturers do not market in vitro diagnostic (IVD) products; in general, their products are classified as research use only (RUO). In practice, this difference in the classification of products and reagents has serious implications for health. Products classified as IVD are regulated and therefore follow technical standards in their production and use, and consequently their efficiency must be guaranteed by the manufacturer. ISO 13485 is often used to ensure the quality of medical products, but regulatory agencies such as the US Food and Drug Administration (FDA) may require further tests to prove that a product is safe and effective before it can be classified as IVD and commercialized on the American market. The same applies to CE-IVD marking in the European Economic Area (EEA). These requirements are part of an effort to ensure that users of these services and devices do not seek unnecessary treatment, delay their treatment or become exposed to inappropriate therapies. For RUO products, none of this can be guaranteed, and the manufacturer is only obliged to replace the product or refund its cost if it performs improperly. Some manufacturers do apply good manufacturing practice standards to the production of RUO equipment and supplies, but they rarely perform tests to prove their efficiency for a particular diagnostic use.
In some cases, because of the need to respond quickly to the market, especially in areas where technological advances outpace regulatory capacity, some agencies allow the use of tests developed by clinical laboratories. Regulation in these cases is much simpler and favours the development of new technologies such as NGS. However, these tests should also be used with caution, and laboratories must prove their accuracy; otherwise, they may carry the same hazards as products classified as RUO. In 2013, the US FDA required the genetic testing company 23andMe to suspend the marketing of its products until it received clearance from the agency. In a letter addressed to one of the company's founders, the agency stated its concern about the use of one of the company's tests and the implications for the health of the patient in the case of false results.
Some of the uses for which PGS (Personal Genome Service) is intended are particularly concerning, such as assessments for BRCA-related genetic risk and drug responses (e.g., warfarin sensitivity, clopidogrel response, and 5-fluorouracil toxicity) because of the potential health consequences that could result from false positive or false negative assessments for high-risk indications such as these. For instance, if the BRCA-related risk assessment for breast or ovarian cancer reports a false positive, it could lead a patient to undergo prophylactic surgery, chemoprevention, intensive screening, or other morbidity-inducing actions, while a false negative could result in a failure to recognize an actual risk that may exist.
This example illustrates the importance of evaluating the analytical characteristics of diagnostic tests as well as the reagents and equipment used to perform them. In 2013, Illumina became the first company to obtain FDA approval for the commercialization of four NGS products. It was the first approval for a system based on NGS technology, one that allows other companies to develop their own tests using this technology. In 2014, SOPHiA Genetics and Vela Diagnostics obtained CE-IVD marking for the first NGS-based products for clinical use.
Since then, the number of products classified as IVD has been increasing; however, the classification of an IVD product depends on local regulations, so a product classified as IVD in one market may not have this classification in others. This is due to regulatory differences between agencies and the different requirements of each market. In any case, the process of classifying these products for clinical use is usually complex and sometimes elaborate, especially in areas such as genomics. Therefore, initiatives are needed to make the approval process simpler and more flexible, so that products become available while the accuracy and usefulness of testing are still ensured.
In 2016, the US FDA issued two draft guidance documents: ‘Use of Standards in FDA's Regulatory Oversight of Next Generation Sequencing (NGS)-Based In Vitro Diagnostics (IVDs) Used for Diagnosing Germline Diseases’ and ‘Use of Public Human Genetic Variant Databases to Support Clinical Validity for Next Generation Sequencing (NGS)-Based In Vitro Diagnostics’. Both are part of an initiative that aims to help new NGS-based tests reach the public with the speed and quality required by the market and the health system.
5.2. Clinical validation
Almost all NGS approaches are still RUO, and validation is necessary before implementation as a diagnostic test. Before clinical utility can be established, a test must demonstrate analytical and clinical validity. Sensitivity, specificity, robustness, limits of detection, reproducibility, accuracy, precision and concordance between test results and clinical diagnosis should be analysed and measured. The test also needs to be evaluated for its effect on patient outcomes and should have a positive impact on patient care [66, 130]. To assist the usage and implementation of NGS in clinical laboratories, some standards and best practice guidelines are already available [38, 39, 44, 131–134]. Several NGS validation studies in clinical laboratories have been published and are rich sources of information [135–138]. Improvements in NGS technologies and data analysis require revalidation before implementation.
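Several of these measures can be computed from a comparison of NGS calls against a reference method. The sketch below uses the standard definitions of sensitivity, specificity, positive predictive value and overall accuracy from counts of true and false positives and negatives; the function name and the counts in the usage example are invented for illustration and are not taken from any validation study.

```python
def validation_metrics(tp, fp, tn, fn):
    """Compute basic concordance metrics from confusion-matrix counts.

    tp/fn: reference-positive sites that the test called / missed.
    tn/fp: reference-negative sites that the test left uncalled / called.
    A metric whose denominator is zero is returned as NaN.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Invented counts, for illustration only.
metrics = validation_metrics(tp=95, fp=2, tn=898, fn=5)
print(metrics)
```

Reproducibility, robustness and limits of detection are not captured by a single confusion matrix; they require repeated runs, varied conditions and dilution series, respectively.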
5.3. Computational infrastructure
The high volume of NGS data generated requires a complex computational infrastructure for processing, analysing and storing the data, including sophisticated data analysis pipelines. Cloud services from providers such as Google, Amazon and Microsoft can be an alternative to an in-house computational infrastructure. More user-friendly bioinformatics software is desirable for non-bioinformaticians; examples include Google Genomics, SOPHiA Genetics, IBM Watson, Illumina BaseSpace, Ion Reporter, Galaxy and CLC Genomics. A variety of data formats are generated during the analysis (e.g. FASTQ, uBAM, BAM/SAM and VCF files), and the laboratory must decide which data to store, since the cost of managing, analysing and storing them is high [124, 130, 146–149].
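Of the formats mentioned, VCF is the smallest and the one closest to the clinical report. As a minimal sketch of its structure, the function below splits one tab-separated VCF data line into the eight fixed columns defined by the VCF specification; the function name and the example line are hypothetical, and a production pipeline would normally rely on an established VCF library rather than hand-written parsing.

```python
VCF_FIXED_COLUMNS = ("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")


def parse_vcf_record(line):
    """Parse one VCF data line into a dict keyed by the eight fixed columns.

    Any per-sample columns after INFO are ignored in this sketch.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(VCF_FIXED_COLUMNS):
        raise ValueError("expected at least 8 tab-separated fields")
    record = dict(zip(VCF_FIXED_COLUMNS, fields))
    record["POS"] = int(record["POS"])
    record["ALT"] = record["ALT"].split(",")  # multiple alternate alleles are comma-separated
    info = {}
    if record["INFO"] != ".":
        for item in record["INFO"].split(";"):
            key, sep, value = item.partition("=")
            info[key] = value if sep else True  # keys without '=' are flags
    record["INFO"] = info
    return record


# Hypothetical record, for illustration only.
example = "chr1\t12345\t.\tA\tG,T\t50\tPASS\tDP=100;DB"
record = parse_vcf_record(example)
```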
5.4. Genomic education
A multidisciplinary team of bioinformaticians, computational biologists, IT technicians, statisticians, molecular biologists, geneticists, genetic counsellors and clinicians is needed, and its members should be properly trained and educated for NGS to be implemented successfully in routine diagnostics. Professionals in related areas, such as lawyers, policy-makers, sales representatives and investors, also need to be trained. Because NGS approaches are constantly updated, continuing education on emerging technologies, software, databases and data analysis pipelines that reflect current practice is necessary. Genomic education also needs to be incorporated into medical school curricula [148, 150].