Genomic analysis of COVID-19 confirmed cases from several countries.
In Wuhan, China, cases of viral pneumonia of uncertain origin were reported in December 2019. The emergency drew global attention, and Chinese multidisciplinary task forces conducted joint efforts to identify the causative agent. Determining not only the causative agent but also the order of nucleic acid bases in biological samples is integral to a wide range of research applications. The techniques used to determine this order are called “sequencing” and are commonly classified into three generations. The first sequencing attempts identified a genetic link between samples isolated in China and previously sequenced coronaviruses. However, clinical and laboratory manifestations and disease severity varied from patient to patient. After the genetic material of the causative agent was successfully sequenced, it was identified as a novel coronavirus, the cause of COVID-19. Here, we review genome sequences of the novel coronavirus from infected patients in different countries, such as India, Bangladesh and Ecuador, compared with China (where the first case was reported). We seek to recognize similarities and differences among these genome sequences and to compare them with other members of the coronavirus family. Such data can support sound decisions that minimize the negative consequences of the outbreak.
- whole genome sequencing (WGS)
- sequencer genotype
A number of viral pneumonia cases of uncertain origin arose in late December 2019 in Wuhan, Hubei Province, China. These cases attracted public attention and were treated as an emergency, and the World Health Organization (WHO) declared the outbreak a Public Health Emergency of International Concern. Thereafter, the Multidisciplinary Task Forces organized by the National Health Commission of the People’s Republic of China undertook collective efforts to define the causative agent. Early on, a group of researchers from the Chinese Academy of Medical Sciences announced their study identifying the causative agent. They conducted a metagenomic study of respiratory tract specimens collected from five patients with pneumonia. After they had isolated the virus and sequenced its genome, the results revealed that it belongs to the genus beta-coronavirus [1, 2].
In more detail, their results showed that the genomes recovered from the specimens shared approximately 79 percent similarity with the Severe Acute Respiratory Syndrome coronavirus (SARS-CoV) genome, approximately 52 percent similarity with the Middle East Respiratory Syndrome coronavirus (MERS-CoV), and approximately 87 percent similarity with two bat-derived SARS-like coronavirus genomes (Zhoushan, 2015). Similar findings were published by a team from the Chinese Center for Disease Control and Prevention. On the basis of this evidence, the isolated virus was proposed to be a novel coronavirus, provisionally named the 2019 novel coronavirus (2019-nCoV), which the WHO quickly recognized as the pathogen responsible for this transmissible illness.
The arrangement of nucleotides within polynucleotide chains encodes the hereditary and biochemical characteristics of life. Determining the order of nucleotides in these biomolecules is a crucial part of a wide range of research applications. Over the last fifty years, many researchers have developed technological approaches to simplify the sequencing of DNA and RNA molecules. During this period the field progressed from sequencing short oligonucleotides to very long molecules, and from struggling to deduce the coding sequence of a single gene to rapid, widely available whole-genome sequencing. In this section we review the successive generations of nucleic acid sequencing, highlighting the major discoveries, the contributions of researchers, and the main characteristics of first-, second- and third-generation sequencing technologies.
2.1 First-generation nucleic acid sequencing
James D. Watson and Francis H.C. Crick discovered the three-dimensional structure of DNA in 1953. Working with crystallographic data provided by Maurice Wilkins and Rosalind Franklin, they proposed a structure that provided a conceptual framework both for DNA replication and for how the information in nucleic acids is ultimately expressed as proteins. However, methods for reading the sequence itself did not yet exist [1, 2]. Initial efforts therefore concentrated on sequencing RNA molecules, which were less complicated to work with than DNA.
Frederick Sanger observed that “knowledge of sequences could contribute much to our understanding of living matter.” In collaboration with other scientists, Sanger developed a technique (1965) based on detecting radiolabeled partial-digestion fragments after two-dimensional fractionation, which allowed scientists to build a steadily growing collection of ribosomal and transfer RNA sequences [4, 5, 6, 7, 8]. In the same year, Robert Holley and colleagues produced, for the first time, the complete nucleotide sequence of alanine transfer RNA isolated from Saccharomyces cerevisiae. Using these techniques, Walter Fiers and colleagues produced, between 1972 and 1976, the first complete protein-coding gene sequence and then the whole genome of bacteriophage MS2 [10, 11]. In the mid-1970s, resolution improved greatly when two-dimensional fractionation was replaced by a single electrophoretic separation through a polyacrylamide gel, a step considered the birth of the first generation [3, 12, 13]. Using this procedure, Sanger and colleagues sequenced the genome of bacteriophage ΦX174 (PhiX) in 1977; PhiX later became a standard positive control in genome sequencing laboratories. In the same year, Sanger’s “chain-termination” or dideoxy method produced a major advance in DNA sequencing technology. Subsequently, tremendous efforts were made to automate DNA sequencing, and the first commercially available machines were developed, eventually making it possible to sequence the genomes of highly complex species [1, 16, 17, 18, 19, 20, 21, 22].
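To make the chain-termination principle concrete, the toy Python sketch below (our illustration, not the original laboratory protocol) assumes an idealized reaction in which a terminating dideoxynucleotide (ddNTP) is incorporated at every possible position in some molecule, and a gel that separates fragments perfectly by single-base length. Each ddNTP reaction, or "lane", then contains fragments ending in that base, and reading all lanes from the shortest to the longest fragment recovers the synthesized sequence. The function names are hypothetical.

```python
# Toy model of Sanger chain-termination (dideoxy) sequencing.
# Assumptions: ideal chemistry (termination at every position occurs in
# some molecule) and perfect single-base separation by fragment length.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template_3_to_5: str) -> str:
    """Strand built by DNA polymerase, complementary to the template.

    The template is written 3'->5', so its complement reads 5'->3'.
    """
    return "".join(COMPLEMENT[base] for base in template_3_to_5.upper())


def termination_fragments(template_3_to_5: str) -> dict:
    """Lengths of the fragments found in each ddNTP reaction ("lane")."""
    strand = synthesized_strand(template_3_to_5)
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(strand, start=1):
        # A chain of this length ends with a dideoxy copy of `base`.
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict) -> str:
    """Read the bands from shortest to longest fragment across all lanes."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = termination_fragments("TACGGA")
    print(lanes)            # fragment lengths per ddNTP lane
    print(read_gel(lanes))  # ATGCCT
```

Reading the four lanes together by fragment length is exactly what the polyacrylamide gel made possible: each band is one base longer than the previous one.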
One of the major drawbacks of first-generation sequencing machines is their short read length, below one kilobase (kb). Scientists tried to overcome this limitation with techniques such as shotgun sequencing, in which DNA is broken into many overlapping fragments that are cloned and sequenced individually and then assembled, on the basis of their overlaps, into a long sequence [23, 24].
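The assembly step of shotgun sequencing can be sketched with a simple greedy strategy that repeatedly merges the pair of reads with the longest suffix-prefix overlap. This is a simplified textbook illustration under stated assumptions: error-free reads from a single strand and a minimum overlap of 3 bases (an arbitrary, hypothetical threshold). Real assemblers use more robust overlap-graph or de Bruijn graph methods.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of `a` that equals a prefix of `b` (at least min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list, min_len: int = 3) -> list:
    """Greedily merge overlapping reads; returns the resulting contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Reads fully contained in another read add no new information.
    reads = [r for r in reads if not any(r != other and r in other for other in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    length = overlap(a, b, min_len)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        if best_len == 0:
            break  # no overlaps left: remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads


if __name__ == "__main__":
    fragments = ["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]
    print(greedy_assemble(fragments))  # ['ATGGCGTACGTTAGC']
```

Three short, overlapping "reads" are thus compiled into one longer sequence, which is the idea that let short first-generation reads cover much longer DNA molecules.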
Many further improvements were made to first-generation sequencing, eventually leading to new dideoxy sequencers such as the ABI PRISM, developed by Applied Biosystems on the basis of Leroy Hood’s research. This sequencer allowed hundreds of samples to be sequenced concurrently and was used to generate the first draft of the Human Genome Project, which was completed years ahead of schedule [25, 26, 27, 28].
2.2 Second-generation nucleic acid sequencing
A luminescent method for measuring pyrophosphate release was the starting point for the second generation of DNA sequencers. It relies on a two-enzyme reaction: ATP sulfurylase converts pyrophosphate into ATP, which then serves as the substrate for luciferase. Because DNA polymerase releases one pyrophosphate for each nucleotide it incorporates, the light generated is proportional to the amount of pyrophosphate and hence to the number of nucleotides incorporated.
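This proportionality is what lets pyrosequencing read a sequence: nucleotides are dispensed one type at a time in a repeating order, and each flow produces light proportional to the number of bases incorporated, so a run of identical bases (a homopolymer) gives a proportionally stronger signal. The idealized, noise-free sketch below assumes a hypothetical dispensation order of T, A, C, G and signals exactly equal to the number of incorporated nucleotides; the function names are illustrative.

```python
FLOW_ORDER = "TACG"  # assumed nucleotide dispensation order (illustrative)


def flowgram(sequence: str, flow_order: str = FLOW_ORDER) -> list:
    """Idealized (base, signal) per flow; signal = nucleotides incorporated."""
    sequence = sequence.upper()
    if not set(sequence) <= set(flow_order):
        raise ValueError("sequence contains bases that are never dispensed")
    signals, pos = [], 0
    while pos < len(sequence):
        for base in flow_order:
            run = 0
            # Polymerase extends through the whole homopolymer in one flow.
            while pos < len(sequence) and sequence[pos] == base:
                run += 1
                pos += 1
            # Light ~ pyrophosphate released ~ number of bases incorporated.
            signals.append((base, float(run)))
    return signals


def base_call(signals: list) -> str:
    """Recover the sequence by rounding each flow's signal to a run length."""
    return "".join(base * round(signal) for base, signal in signals)


if __name__ == "__main__":
    flows = flowgram("TTAGGGC")
    print([signal for _, signal in flows])  # [2.0, 1.0, 0.0, 3.0, 0.0, 0.0, 1.0, 0.0]
    print(base_call(flows))                 # TTAGGGC
```

In practice the signals are noisy, and distinguishing, for example, a run of five identical bases from six is harder than calling a single base, which is why homopolymers are a known weak point of pyrosequencing.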
Pyrosequencing was later licensed to 454 Life Sciences, a biotechnology company founded by Jonathan Rothberg, and grew into the first major commercially successful next-generation sequencing (NGS) technology. The 454 sequencing instruments, later acquired by Roche, represented a paradigm shift: they allowed sequencing reactions to be massively parallelized, considerably increasing the amount of DNA sequenced in a single experiment.
Parallelization increased the yield of sequencing by orders of magnitude, enabling scientists to sequence the complete genome of James Watson, co-discoverer of the DNA structure, far more cheaply and quickly than a comparable effort by the team of DNA sequencing entrepreneur Craig Venter using the Sanger method [31, 32]. The first 454 machine, the GS 20, was later replaced by the 454 GS FLX, which provided not only better-quality data but also more reads, owing to the larger number of wells in its PicoTiterPlate. Indeed, it was the first high-throughput sequencing (HTS) machine broadly accessible to customers. The concept of running massive numbers of parallel sequencing reactions at the micrometer scale, made possible by advances in microfabrication and high-resolution imaging, is what defined the second generation of DNA sequencing [26, 33].
Following the success of 454, several other parallel sequencing methods quickly emerged. Arguably the most important is the Solexa sequencing technique, developed by Solexa, a company later acquired by Illumina. It is based on a process called bridge amplification: adapter-flanked DNA molecules are passed over a flow cell coated with oligonucleotides complementary to the adapters, and solid-phase PCR then generates, from each original attached DNA strand, a localized cluster of clonal copies [34, 35, 36]. The HiSeq followed the standard Genome Analyzer IIx (GAIIx) and offered greater read length and sequencing depth. The MiSeq was introduced later; although it has lower throughput, it is a cheaper instrument with faster turnaround and longer reads [37, 38].
In Ion Torrent sequencing, analogously to 454 sequencing, beads carrying clonal DNA fragment populations produced by emulsion PCR (emPCR) are loaded into a microwell plate, which is then flooded with each nucleotide in turn. However, nucleotide incorporation is detected not through pyrophosphate but through the change in pH caused by the protons (H+ ions) released during polymerization, which allows rapid sequencing with detection in real time [39, 40].
Alongside 454 and Solexa/Illumina, the SOLiD (Sequencing by Oligonucleotide Ligation and Detection) system from Applied Biosystems was the third major option in the early days of second-generation sequencing. As its name indicates, its chemistry is based on oligonucleotide ligation and detection. The SOLiD system became a Life Technologies product after Applied Biosystems merged with Invitrogen [41, 42]. Even though the SOLiD platform could not match the read length and depth of Illumina systems, which made assembly more difficult, it remained a cost-competitive instrument [39, 43].
Another important ligation-based approach was DNA nanoball sequencing, in which sequences are likewise obtained by probe ligation. What is novel is the way the clonal DNA population is generated: instead of bead or bridge amplification, rolling circle amplification is used to produce long DNA chains consisting of repeated units of the template sequence flanked by adapters. These chains then self-assemble into nanoballs, which are attached to a slide for sequencing.
Lastly, an outstanding second-generation system is the one that Jonathan Rothberg created after leaving 454: Ion Torrent (another Life Technologies product), the first so-called post-light sequencing technology, which uses neither fluorescence nor luminescence.
The frequently mentioned “genomics revolution”, driven in large part by these extraordinary improvements in sequencing technology, has dramatically reduced the cost and effort associated with DNA sequencing. The Illumina platform, however, has been the most effective and valuable in recent years, and can therefore probably be considered to have made the strongest impact among second-generation DNA sequencers.
2.3 Third-generation nucleic acid sequencing
There has been considerable debate about how to define the different generations of DNA sequencing technology, particularly the boundary between the second and third generations. However, the suggested distinguishing characteristics of the third generation are single-molecule sequencing (SMS), real-time sequencing, and a clear departure from earlier technologies [46, 47, 48, 49].
The first SMS technology was developed in Stephen Quake’s laboratory and later commercialized by Helicos BioSciences; it worked broadly like Illumina sequencing, but without the bridge amplification step [50, 51]. DNA templates are attached to a planar surface and interrogated with proprietary fluorescent reversible-terminator deoxyribonucleotide triphosphates (dNTPs) called virtual terminators. Although relatively slow and costly, and producing short reads, it was considered the first technology to sequence non-amplified DNA, avoiding the biases and errors that amplification can introduce. Other companies picked up the third-generation baton after Helicos went bankrupt in 2012 [1, 46, 48]. Possibly the most commonly used third-generation technology is the Pacific Biosciences single-molecule real-time (SMRT) platform (PacBio range), which sequences individual molecules in real time. Other beneficial features of the PacBio range, not commonly shared by other commercially available machines, are the production of kinetic data, which enables the detection of modified bases, and the generation of extremely long reads of more than 10 kilobases (kb), well suited to de novo genome assembly [46, 53, 54].
Nanopore sequencing, an offshoot of the large field of using nanopores to detect and quantify all kinds of biological and chemical samples, is perhaps the most eagerly anticipated area of development in third-generation DNA sequencing. Oxford Nanopore Technologies (ONT), the first company to deliver nanopore sequencers, generated considerable excitement with its GridION and MinION platforms [56, 57]. The MinION is a small, mobile-phone-sized USB device that was first released to end users through an early-access program in 2014. Despite the relatively poor quality of the data produced by the GridION and MinION platforms, it is hoped that such sequencers represent a genuinely disruptive DNA sequencing technology, delivering extremely long, unamplified reads more cheaply and quickly than previously possible [55, 57, 59].
To sum up, it is hard to overstate the value of DNA sequencing for biological research: it determines one of the most fundamental features by which living forms can be identified and distinguished from one another. Hence, investigators worldwide have spent countless time and money over the last half century improving DNA sequencing technologies and combining features from different generations of sequencers to create new instruments with outstanding capabilities. The experience gained from all generations of sequencers will offer new perspectives, as lessons learned from earlier generations guide the development of the next.
3. Coronaviridae family: structure and classification
Members of the Coronaviridae family are large, enveloped, positive-sense single-stranded RNA viruses. Their genomes range from 25 to 32 kb, and the virions measure 118–136 nm in diameter [30, 60]. The virion is roughly spherical and carries prominent proteins on its envelope: the large spike (S) protein, which extends 16–21 nm from the viral envelope; the membrane (M) protein, which plays a major role in promoting membrane curvature; the envelope (E) protein, present in low quantities; and hemagglutinin-esterase (HE), as shown in Figure 1 [27, 61].
The Coronaviridae family is divided into two subfamilies: Coronavirinae and Torovirinae. Coronavirinae is divided into four genera, alpha-coronavirus, beta-coronavirus, gamma-coronavirus and delta-coronavirus, whereas Torovirinae has only one genus, Torovirus. In contrast to members of the Coronavirinae subfamily, toroviruses have a helical, doughnut-shaped nucleocapsid. Members of the Coronavirinae subfamily are prevalent among mammals, in which they cause respiratory illnesses ranging from mild to severe, such as those caused by SARS-CoV-1, SARS-CoV-2 and the Middle East respiratory syndrome coronavirus (MERS-CoV), or enteric infections [30, 64].
4. Coronavirinae: severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and coronavirus disease 2019 (COVID-19)
As mentioned before, the Coronavirinae subfamily is divided into four genera: alpha-coronavirus, beta-coronavirus, gamma-coronavirus and delta-coronavirus. The beta-coronavirus genus, in particular, has four lineages (A, B, C, and D). Lineage B corresponds to the subgenus Sarbecovirus, to which SARS-CoV-2 belongs. The novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is the causative agent of coronavirus disease 2019 (COVID-19). Patients with COVID-19 present with respiratory illnesses such as pneumonia and respiratory failure.
5. Country-wise genomic sequencing of SARS-CoV-2
Sequencing the genome of a pathogen helps to establish the identity of the causative microorganism. Continuous sequencing is essential to detect changes that may arise in the viral genome during replication and transmission. This information helps to determine whether the causative agent has changed and whether such changes affect the severity of infection, and consequently supports the right decisions to reduce the impact of the outbreak on many aspects of life.
Cases of pneumonia of unidentified cause were first registered in Wuhan City, Hubei Province, China, in December 2019, and were further evaluated to identify the causative microorganism [64, 65]. The virus was isolated and, after its complete genome had been characterized by next-generation sequencing (NGS), identified and named SARS-CoV-2. Genomic analysis showed that it is an enveloped RNA virus with a genome of 29,903 nucleotides. Phylogenetic analysis placed the virus in the subgenus Sarbecovirus of the genus beta-coronavirus, family Coronaviridae. Moreover, its genome was found to be about 87.5 percent similar to two bat-derived SARS-like CoV strains (bat-SL-CoVZC45 and bat-SL-CoVZXC21), and the virus was related to, but distinct from, the coronaviruses known to infect humans, including the virus responsible for the 2003 SARS outbreak (SARS-CoV-1).
Following the first SARS-CoV-2 report from China, the Government of India reviewed the situation and introduced multi-sectoral initiatives to address this emerging public health problem. These began with border screening at 21 international airports, strengthening of state-level surveillance systems, and preparation of designated hospitals for the management of clinical cases. The first confirmed cases in India (January 2020) were then sequenced by next-generation sequencing, and phylogenetic analysis against other published SARS-CoV-2 sequences in the database was carried out to understand their relationships. The sequences from two of the first three confirmed cases in India showed high identity (about 99.98 percent) with the Wuhan seafood market pneumonia virus (accession number NC_045512). The phylogenetic analysis indicated two distinct introductions into India. Continuous surveillance and review of sequences will therefore be crucial for tracking the genetic evolution and substitution rates of SARS-CoV-2 in affected countries.
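Identity figures such as the 99.98 percent reported for the Indian cases are computed on an alignment of two genomes. The sketch below assumes the sequences have already been aligned by an external tool (alignment itself is not shown) and uses one common convention: columns where both sequences have a gap are ignored, and a gap opposite a base counts as a difference. Tools differ in these conventions, so reported values are not always directly comparable. The helper `approx_differences` is hypothetical and only converts a percentage back into an approximate number of differing positions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of compared alignment columns in which both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # gap in both sequences: column carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def approx_differences(identity_percent: float, length: int) -> int:
    """Approximate number of differing positions implied by an identity value."""
    return round(length * (1 - identity_percent / 100))


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
    print(percent_identity("ACG-T", "ACGAT"))            # 80.0
    print(approx_differences(99.98, 29903))              # 6
```

For instance, if an identity of about 99.98 percent were measured over the full 29,903-nucleotide reference genome, it would correspond to only about six differing positions; the 99.68 percent reported for Ecuador would correspond to roughly 96 under the same assumption. The actual counts depend on the aligned length and gap conventions used in each study.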
In Bangladesh, the first three cases were detected in March 2020. The first complete genome sequence of a Bangladeshi SARS-CoV-2 isolate was subsequently obtained with an Illumina iSeq 100 sequencer in April 2020. The results showed 9 mutations in the genome of this sample compared with the Wuhan reference genome (GenBank accession no. MN908947.3).
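Mutations such as those counted in the Bangladeshi genome are conventionally reported relative to a reference (for example MN908947.3) as reference base, position, and alternative base. The minimal sketch below lists single-nucleotide substitutions between a reference and a sample that are assumed to be already aligned position by position. It skips ambiguous (N) and gap positions and does not handle insertions or deletions, which real variant-calling pipelines treat separately. The sequences in the example are toy data, not real SARS-CoV-2 genomes, and the function name is hypothetical.

```python
def call_substitutions(reference: str, sample: str, skip: str = "N-") -> list:
    """List substitutions as REF + position + ALT (1-based), e.g. 'G3C'."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    for position, (ref, alt) in enumerate(zip(reference.upper(), sample.upper()), start=1):
        if ref in skip or alt in skip:
            continue  # ambiguous base or alignment gap: no call made here
        if ref != alt:
            calls.append(f"{ref}{position}{alt}")
    return calls


if __name__ == "__main__":
    reference = "ACGTACGT"  # toy sequence standing in for a reference genome
    sample = "ACCTACNA"     # toy sample with one ambiguous base (N)
    mutations = call_substitutions(reference, sample)
    print(mutations)        # ['G3C', 'T8A']
    print(len(mutations))   # 2
```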
In Ecuador, the first reported COVID-19 case, a traveler who arrived from the Netherlands in March 2020, and three other confirmed cases were genomically sequenced using the MinION platform (Oxford Nanopore Technologies) and ARTIC network protocols. The results showed that the cases in Ecuador had been introduced from three different European countries. The sequences of the confirmed cases in Ecuador were highly similar (99.68 percent) to the Wuhan reference strain (GenBank accession number MN908947). The information discussed in this section is summarized in Table 1.
Table 1. Summary of country-wise SARS-CoV-2 genome sequencing
|Country||Sequencer generation used for analysis||Sequence similarity or comparison|
|China||Next-Generation Sequencing (NGS)||87.5% to bat-derived SARS-like CoV strains|
|India||Next-Generation Sequencing (NGS)||About 99.98% to the Wuhan strain|
|Bangladesh||Next-Generation Sequencing (NGS), Illumina iSeq 100 sequencer||9 mutations found compared with the Wuhan strain|
|Ecuador||Third-generation sequencing, MinION (Oxford Nanopore Technologies)||99.68% to the Wuhan strain|
Numerous other countries, such as Nepal, Australia, the USA and Turkey, have also sequenced the viral genomes of confirmed COVID-19 cases and compared their results with the Chinese reference sequence.
6. Precautions and control measures taken globally by governmental authorities against the COVID-19 outbreak
Several countries around the world have adopted precautionary and control measures, whether or not they had registered COVID-19 cases. The following actions, as examples, have been taken to prevent the introduction of SARS-CoV-2 into these countries or to limit the spread of the virus:
Stopping domestic and international flights.
Shifting schools and colleges from in-person classes to remote learning and virtual classrooms.
Suspending all social and governmental gatherings and events.
Activating online shopping and home delivery services for shops and markets.
Making the wearing of masks and gloves and the use of hand sanitizers compulsory.
Imposing lockdowns on travel between cities and between countries.
Implementing mass vaccination programs.
In this section, we shed some light on the mass vaccination approach and the logistics needed to implement such a program, and discuss its potential to help end the COVID-19 pandemic. Mass vaccination, using recently marketed COVID-19 vaccines such as those from AstraZeneca, Moderna, BioNTech and Sputnik V, is potentially a crucial public health intervention during outbreaks and pandemics. As previously reported, mass vaccination has been an important countermeasure against many infectious diseases, such as polio and smallpox. Mass vaccination against COVID-19 is therefore an urgent option in the current emergency, given the rapid spread of SARS-CoV-2, and might eventually lead to herd immunity. Such programs start by prioritizing the most vulnerable population groups, such as elderly people and patients who are immunocompromised due to underlying medical conditions.
In terms of logistics, implementing such a program requires substantial effort, starting with the availability of vaccines, locations and vaccinators, and the distinct roles and responsibilities of the private and public sectors working in partnership. On the one hand, the private sector must ensure compliance with CDC guidance and follow cold-chain management rules so that vaccines remain usable on arrival and during storage and delivery. On the other hand, the public sector likewise ensures cold-chain management and adherence to CDC guidance, and also fills gaps in delivery services, particularly for people living in nursing homes, assisted living facilities and similar settings. These facilities are also places where clinical staff work and require vaccines and consultation services to meet logistical requirements. Overall, the public health sector supports private-sector entities by providing consultation, vaccinator training, and pandemic protocol guidance.
In fact, demand for vaccines is almost certain to exceed the available resources, requiring those resources to be mobilized to meet need. COVID-19 vaccine clinics, for example, include not only the usual walk-through clinic sites at doctors’ offices, pharmacies, public health departments and big-box stores, but also college campuses, houses of worship, drive-through sites, community centers, and outdoor camps.
Even though the genetic diversity of SARS-CoV-2 is currently low, combining genetic, clinical, and epidemiological data can be highly effective in developing outbreak management action plans. Genetic information, in particular, helps to track viral introductions into countries, identify the ancestral lineage of the virus, and recognize patterns of spread in the population during the outbreak. In addition, genome sequencing of SARS-CoV-2 allows the rates of substitution (mutation) in the viral genome to be estimated.
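One simple, commonly used first-pass way to estimate a substitution rate from dated genomes is root-to-tip regression: each genome's genetic distance from the root of a phylogenetic tree is regressed against its sampling date. The slope approximates the substitution rate (substitutions per site per year), and the x-intercept gives a rough date for the root. The sketch below fits ordinary least squares on hypothetical, perfectly clock-like data; in practice the distances come from an inferred tree, and dedicated molecular-clock methods are used for formal estimates. The function name is illustrative.

```python
def root_to_tip_regression(dates: list, distances: list) -> tuple:
    """Least-squares fit of distance = intercept + rate * date.

    Returns (rate, root_date): the slope in substitutions/site/year and the
    date at which the fitted line reaches zero divergence.
    """
    n = len(dates)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two paired observations")
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    s_tt = sum((t - mean_t) ** 2 for t in dates)
    if s_tt == 0:
        raise ValueError("sampling dates must not all be identical")
    rate = s_td / s_tt
    if rate <= 0:
        raise ValueError("no positive temporal signal in the data")
    root_date = mean_t - mean_d / rate  # where the fitted line crosses zero
    return rate, root_date


if __name__ == "__main__":
    # Hypothetical sampling dates (decimal years) and root-to-tip distances
    # (substitutions per site); illustrative numbers, not real SARS-CoV-2 data.
    dates = [2020.0, 2020.25, 2020.5, 2020.75]
    distances = [0.0001, 0.00035, 0.0006, 0.00085]
    rate, root_date = root_to_tip_regression(dates, distances)
    print(f"rate = {rate:.2e} subs/site/year, root ~ {root_date:.2f}")
    # rate = 1.00e-03 subs/site/year, root ~ 2019.90
```

With low genetic diversity, as noted above for SARS-CoV-2, the scatter around such a line can be large relative to the slope, so the fit is best read as a check for temporal signal rather than a final estimate.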
Integrating genetic, clinical, and epidemiological information appears to be a vital step in understanding the genotype and phenotype of SARS-CoV-2 and in contributing to the global picture. This combined knowledge not only aids decision-making on precautions and control measures, but also improves understanding of the virus’s virulence, severity, transmissibility and response to treatment, and of the effectiveness of vaccines for disease prevention.
Finally, it is doubtful that the current pandemic will be the last one, and it is therefore important to strengthen the responsiveness of our public health systems and to introduce and enhance ongoing scientific research programs combining preclinical, clinical and epidemiological information.
Conflict of interest
The authors declare no conflict of interest.