Open access peer-reviewed chapter

Utilization from Computational Methods and Omics Data for Antiviral Drug Discovery to Control of SARS-CoV-2

By Ömür Baysal and Ragıp Soner Silme

Submitted: January 2nd 2021; Reviewed: May 9th 2021; Published: June 9th 2021

DOI: 10.5772/intechopen.98319


Abstract

The SARS-CoV-2 pandemic, which threatens world health and the economy, has become a major problem because of its destructive impact. Researchers have seen that conventional medical and immunological methods, built on knowledge gained from previously studied viruses, do not resolve this disease. Advances in computational biology, comprising bioinformatics, simulation, and the resulting databases, together with artificial intelligence, have accelerated and strengthened our ability to make predictions about complex biological systems by comparison. In silico resources and in vivo imaging techniques associated with high-resolution technologies can support the reliable development of methods for detecting novel antiviral drugs and producing diagnostic kits. In the future, the positive effects of these novel techniques and their cost- and time-saving advantages will become apparent. This chapter highlights these approaches and presents the up-to-date knowledge currently used in research and development.


Keywords

  • Computational biology
  • Drug discovery
  • Genomics
  • Omics science
  • SARS-CoV-2

1. Introduction

Coronaviruses (CoVs) are positive-strand RNA viruses belonging to the order Nidovirales, which includes three families: Arteriviridae, Coronaviridae, and Roniviridae [1]. Based on genetic studies, CoVs are classified into four genera: alpha, beta, gamma, and delta CoVs [2]. CoVs are spherical, with a diameter of 80 to 120 nm. Their fundamental structural proteins are envelope (E), membrane (M), nucleocapsid (N), and spike (S) [1]. The RNA genome comprises six to ten open reading frames (ORFs) [3].

New studies will fill knowledge gaps and reveal how the virus is evolving and adapting to new conditions. In recent years, advances in nucleic acid amplification technologies have improved automated DNA sequencing, which, with the help of bioinformatics tools, is used to characterize and classify all kinds of infectious disease agents. The coronaviruses, which are single-stranded RNA viruses, have been classified using molecular tools. For sequence analysis, the genomes were identified using RNA extracted directly from clinical specimens, isolated from nasopharyngeal aspirate or stool, as the template, since the viruses are non-cultivable [4, 5, 6, 7, 8, 9, 10, 11]. The collection of SARS-CoV-2 sequences started in 2020 in the GISAID database. Analyzing viral genomes in this way avoided the mutations that can arise during in vitro viral replication. These data helped us to understand the virus that threatens world health at the genomic and in silico levels, which gave rise to new laboratory experiments. Protein function prediction methods fall mainly into sequence-based and structure-based approaches. Databases and tools that rely on comparing sequences, structural differences, and gene ontology enable accurate annotation of protein function [12]. Given the destructive effect of SARS-CoV-2 on human health and its contagiousness, researchers have focused on finding an effective cure. Antiviral chemotherapy with small molecules, such as nucleoside analogs, can help identify uncharacterized viral genes as targets for antiviral drugs related to the binding of viral glycoproteins to cellular receptors and to viral regulatory proteins. These drugs may block the synthesis and replication of the viral genome that induces the host immune response [13, 14].

Targeting these regions can impair functions that are crucial for the survival of the virus. Polymerase and/or protease assays enable high-throughput screening to identify small-molecule inhibitors and enzymes, which can be beneficial for developing effective antiviral components and synthesizing novel molecules [15]. Understanding viral gene products that contribute to essential functions of the virus could overcome the challenge of developing antiviral drugs, with in vitro assays that consider the molecular mechanisms and biochemical processes involving these gene products. New omics technologies offer novel possibilities for answering questions raised by the unexplored pathogenic behavior of the virus. Bioinformatics promises to generate new knowledge on virus-host interactions that can drive more detailed efforts toward the discovery and development of antiviral therapies [16]. Genomics, proteomics, and related technologies will also benefit molecular virology as techniques and approaches suited to big data.


2. Bioinformatics and computational tools

Progress in genomics and proteomics is encouraging biological studies on the virus. Genomic sequences and bioinformatics are major tools in this field, and the quantity and complexity of raw data have increased tremendously. Significant computational resources are therefore required to manage and manipulate these volumes of data, and researchers in these fields are using these new technologies for future drug discovery projects. Bioinformatics resources (GISAID; NCBI), which are required to analyze the data and to identify and display patterns, help investigators understand the problem and test and confirm their hypotheses in the laboratory, so that they can focus on prioritized compounds or genes [17]. Computational methods applied to SARS-CoV-2 can pave the way for characterizing the virus collected from unique specimens and comparing it with similar genomes identified by sequence similarity. Comprehensive investigations that characterize the viruses and establish a set of well-described genomes compared with each other have been reported [18]. Bioinformatics workflows and tools for SARS-CoV-2 that detect potential drug targets and provide useful knowledge on therapeutic strategies have also recently been described [19]. New bioinformatics tools are commonly applied to these genomes to predict the organization of viral genes, including their coding capacity, and the function of viral proteins [20]. These tools help confirm transcriptional patterns, gene expression, and gene function, which should be studied before in vitro work to save time and labor costs [21].
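As a rough illustration of the genome comparison described above, the following sketch computes the percent identity between a reference sequence and an isolate, and lists the substitutions between them. It assumes that the two sequences have already been aligned to the same length by an external alignment tool; the function name `compare_aligned` and the short example sequences are hypothetical and are not taken from any of the cited workflows.

```python
def compare_aligned(ref: str, query: str):
    """Return (percent identity, substitutions) for two pre-aligned sequences.

    Assumption: both sequences come from the same alignment, so position i
    in `ref` corresponds to position i in `query`. Gaps ('-') and ambiguous
    bases ('N') are skipped rather than counted as mismatches.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    substitutions = []
    for pos, (r, q) in enumerate(zip(ref.upper(), query.upper()), start=1):
        if r in "-N" or q in "-N":
            continue
        compared += 1
        if r == q:
            matches += 1
        else:
            # Notation: reference base, 1-based alignment position, isolate base
            substitutions.append(f"{r}{pos}{q}")
    identity = 100.0 * matches / compared if compared else 0.0
    return identity, substitutions


# Hypothetical toy sequences, for illustration only
identity, subs = compare_aligned("ATGGCC-TAA", "ATGACCGTAA")
print(f"{identity:.1f}% identity, substitutions: {subs}")
```

Positions here are alignment columns, so they match genome coordinates only when the reference contains no gaps.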

Data relevant to the discovery of new drugs include information on the biological function, chemical structure, and biological activity of small molecules, all of which can help in the search for new compounds. Even though this problem is inherently complex, bioinformatics, together with databases, is a useful tool for handling the required volumes of data. Computational methods can suggest aspects of the viral genome that small-molecule inhibitors could target. They may then lead to the identification of small molecules with well-described biological effects, which could be used as probes to follow cellular functions related to the chemical structure, protein structure, biochemical activity, and biological activity of the virus. Data mining of all existing data, as another branch, offers a way to screen inhibitory chemicals with known biochemical activities according to their chemical classes.


3. Impact of omics science and related fields on SARS-CoV-2 research

With advanced techniques and bioinformatics tools, the scientific landscape has changed dramatically in recent years. The huge amount of data generated by omics science plays an important role in understanding the biology of SARS-CoV-2 infections [22]. These resources are essential to scientists studying SARS-CoV-2 infections and provide a map and common reference points for precisely describing viral transcripts and ORFs. Comparing the genomic organization of all SARS-CoV-2 isolates forms a starting point for determining evolutionary relationships in this virus family. An important point that should not be missed is the instability of the nucleotide sequences in the virus genome, which causes a high mutation rate. Viral genome data in GenBank inherently include incompletely annotated regions whose sequences need to be corrected. Tools for annotating viral genomes are available to test gene prediction with precise algorithms and to identify new genes [23]. The annotation process may yield inconsistent findings for different genomes because of the terminology used to describe gene function [24]. Viral genomes need to be updated and re-annotated as additional strains become available, because these strains are important for comparing sequences in a continual annotation process that takes analogous versions released in NCBI into account [25].
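To make the idea of describing ORFs in a viral genome more concrete, the sketch below scans the three forward reading frames of a nucleotide sequence for stretches that begin with ATG and end at the first in-frame stop codon. This is a deliberately simplified assumption about what an ORF is; it is not the algorithm of any annotation tool cited here, and real gene prediction also has to handle features such as overlapping genes and ribosomal frameshifting. The function name `find_orfs` and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30):
    """Return sorted (start, end, frame) tuples for forward-strand ORFs.

    Coordinates are 0-based with an exclusive end, and the stop codon is
    included in the ORF. RNA input is accepted: U is converted to T.
    `min_codons` counts every codon of the ORF, including start and stop.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# Hypothetical toy sequence, for illustration only
toy = "CCAUGAAAUUUGGGUAACC"
for start, end, frame in find_orfs(toy, min_codons=3):
    print(start, end, frame, toy.replace("U", "T")[start:end])
```

Candidate ORFs found this way are only predictions and, as the text notes, still need consistent annotation and experimental confirmation.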

The RNA genomes can be easily sequenced, but predicting with any certainty the role of their sequence and structure during the course of an infection is difficult. In vitro experiments should confirm the different ORFs identified by bioinformatics tools through algorithms based on codon and codon-pair usage, dinucleotide and junction usage, and differences in RNA structure [26]. Microarrays using oligonucleotide probes that hybridize with putative exons and splice junctions could be beneficial for following the expression of predicted transcripts and splice variants in the virus genome [27], and single-cell RNA sequencing analysis of SARS-CoV-2 will help define how the virus uses the organization of the human host cell to regulate and code for all the required biological processes [28]. As different biological assays increasingly support findings on SARS-CoV-2 and its pathogenic behavior, proteomics data on up- and down-regulated expression levels reflect ongoing RNA transcription and can be evaluated in relation to the post-translational processing that contributes to the complexity of protein formation. Proteomics methods also have the potential to follow the modification of viral gene products during infection, which will help to characterize how post-translational modification affects viral replication. Since omics technology alone may not be sufficient to find effective compounds that inhibit viral replication and its invasive negative effects on the human body, we should consider detecting the genomic regions that remain stable, without a high mutation rate, in order to design targeted molecules with inhibitory potential [13]. Genomic screening with specific algorithms that identify conserved motifs and predict protein structure could be an efficient way to understand protein functions [29].

In immunological studies, the two-hybrid (bait and prey) system of the model organism yeast can be used to demonstrate protein–protein interactions between viral and cellular proteins, and potential gene products that cooperate in biological processes can be clarified by constructing protein–protein interaction maps [30].
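The search for stable genomic regions mentioned above can be sketched with a simple per-column conservation measure. The example below computes the Shannon entropy of each column in a set of pre-aligned sequences and reports windows in which every column is fully conserved. Shannon entropy is used here as one common, generic conservation measure; it is an assumption of this sketch and not necessarily the method used in the cited studies, and the function names and toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the A/C/G/T bases in one alignment column.

    Returns None when the column holds no unambiguous base (e.g. all gaps).
    """
    counts = Counter(base for base in column if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return None
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def conserved_windows(alignment, window=20, max_entropy=0.0):
    """Return 0-based start columns of windows whose columns are all conserved.

    Assumption: `alignment` is a list of equal-length, pre-aligned sequences.
    A column counts as conserved when its entropy is <= `max_entropy`.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    entropies = [
        column_entropy([seq[i].upper() for seq in alignment])
        for i in range(length)
    ]
    conserved = [e is not None and e <= max_entropy for e in entropies]
    return [
        start
        for start in range(length - window + 1)
        if all(conserved[start:start + window])
    ]


# Hypothetical toy alignment: only column 4 varies between isolates
toy_alignment = ["ATGCATGC", "ATGCTTGC", "ATGCGTGC"]
print(conserved_windows(toy_alignment, window=3))
```

Raising `max_entropy` above zero tolerates rare variants, which may be more realistic for large collections of isolates.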


4. Drug discovery by means of omics data on SARS-CoV-2

Genomics and proteomics are promising new areas that affect virtually all biological fields through the widespread data and tools provided by databases. The DNA Data Bank of Japan (DDBJ), the GISAID initiative, the National Center for Biotechnology Information (GenBank), and the European Bioinformatics Institute (EMBL-EBI) are important web-based nucleotide databases for researchers. Most functions in micro- and macro-organisms are directed by interactions between proteins and ligands. Hence, in silico techniques that predict the protein complex formed can be remarkably cheaper and quicker than experimental methods. Because of their predictive capability, they guide subsequent targeted experiments before in vitro studies begin. Predicting the binding possibilities of multiple proteins is critical for understanding their biological function in a target organism and for designing drugs that impair biological processes (Figure 1). Many docking solutions generated from a pair of static molecular structures rely on a scoring function that depends on the specific position of each atom, so the modeling is highly sensitive to the specific packing of atoms at the interface [31, 32, 33]. Modeling the protein therefore requires protein dynamics and the correct protein arrangement, with scoring functions that evaluate the features of docking poses using techniques such as molecular dynamics (MD) [34, 35].
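The sensitivity of atom-based scoring to interface packing can be illustrated with a toy calculation. The sketch below sums a 12-6 Lennard-Jones term over protein-ligand atom pairs. The Lennard-Jones form is a standard textbook potential chosen here only for illustration; it is not the scoring function of any docking program cited in this chapter, and the parameter values (`epsilon`, `sigma`, `cutoff`) and coordinates are hypothetical.

```python
import math


def lennard_jones(r, epsilon=0.2, sigma=3.5):
    """12-6 Lennard-Jones energy for two atoms at distance r (in angstroms).

    epsilon (well depth) and sigma (zero-energy distance) are illustrative
    values, not parameters from a real force field.
    """
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 ** 2 - ratio6)


def interface_score(protein_atoms, ligand_atoms, cutoff=8.0):
    """Sum pairwise energies for atom pairs closer than `cutoff` angstroms.

    Atoms are (x, y, z) tuples; lower scores mean more favorable packing.
    """
    score = 0.0
    for p in protein_atoms:
        for l in ligand_atoms:
            r = math.dist(p, l)
            if 0.0 < r <= cutoff:
                score += lennard_jones(r)
    return score


# Hypothetical coordinates: the same ligand atom in two poses < 1 angstrom apart
protein = [(0.0, 0.0, 0.0)]
pose_near_optimum = [(3.93, 0.0, 0.0)]
pose_too_close = [(3.0, 0.0, 0.0)]
print(interface_score(protein, pose_near_optimum))  # negative (favorable)
print(interface_score(protein, pose_too_close))     # positive (steric clash)
```

A shift of less than one angstrom turns a favorable contact into a clash, which is why static structures alone can mislead and why dynamics are considered when poses are scored.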

Figure 1.

The experiment-based approach is activity-based repositioning of original drugs for new pharmacological indications based on experimental assays, which involves protein target-based and cell/organism-based screening in vitro and/or in vivo assays. These studies are followed by cell assays, the animal model approach, and the clinical approach. Illustration was created with BioRender.com by the authors of this chapter.


5. Importance of drug discovery and molecular docking

Docking methods rely on steric complementarity at the protein–protein interface. Such interfaces are observed in the co-crystallized complexes available in the Protein Data Bank (PDB). These complexes, together with the addition of physicochemical and statistics-based properties, have been the major driving force in the development of docking [36, 37].

Homology modeling and protein prediction analysis enable us to test different SARS-CoV-2 proteins with various ligands. Protein-ligand docking servers (Table 1) report the geometric shape complementarity score (GSC score) and the approximate interface area (AI area). Additionally, various software tools for molecular dynamics analysis can be used. Software that calculates and visualizes the interactions in protein-ligand complexes, including the amino acid positions involved and the bond distances, supports molecular docking simulations. Protein docking servers can confirm the results for the protein and ligand [44]. They give insight into all binding preferences of the ligand within the active site of the protein (Figure 2).
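The interaction analysis described above, which reports the residues involved and the corresponding distances, can be sketched as a simple distance filter over atom coordinates. The example assumes that coordinates have already been extracted from a docked complex; the function name `find_contacts`, the 4.0 Å cutoff, and the residue and atom labels are hypothetical placeholders rather than output from any docking server listed in Table 1.

```python
import math


def find_contacts(protein_atoms, ligand_atoms, cutoff=4.0):
    """List protein-ligand atom pairs closer than `cutoff` angstroms.

    protein_atoms: list of (residue_label, (x, y, z))
    ligand_atoms:  list of (atom_name, (x, y, z))
    Returns (residue_label, atom_name, distance) tuples, nearest first.
    The cutoff is an illustrative assumption, not a recommended standard.
    """
    contacts = []
    for residue, p_xyz in protein_atoms:
        for atom, l_xyz in ligand_atoms:
            distance = math.dist(p_xyz, l_xyz)
            if distance <= cutoff:
                contacts.append((residue, atom, round(distance, 2)))
    return sorted(contacts, key=lambda contact: contact[2])


# Hypothetical labels and coordinates, for illustration only
protein = [("RES_A", (0.0, 0.0, 0.0)), ("RES_B", (10.0, 0.0, 0.0))]
ligand = [("O1", (3.0, 0.0, 0.0)), ("N2", (0.0, 5.0, 0.0))]
for residue, atom, distance in find_contacts(protein, ligand):
    print(f"{residue} - {atom}: {distance} angstroms")
```

A geometric filter of this kind only flags candidate contacts; classifying them as hydrogen bonds or hydrophobic interactions would need atom types and angles, which this sketch does not model.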

Software        Country       Year   Reference
AutoDock Vina   USA           2010   [39]
BetaDock        South Korea   2011   [40]
LigDockCSA      South Korea   2011   [41]
PythDock        South Korea   2011   [42]

Table 1.

List of the most commonly used protein-ligand docking software, including updated versions.

Figure 2.

Protein-ligand interaction illustration from Baysal et al. [13]. Arrows indicate the binding possibilities.


6. The effect of circadian rhythm on SARS-CoV-2-infection and host immune response

Many physiological processes within individual cells are known to be influenced by 24-hour circadian rhythms. Even though virology and circadian rhythms seem like different fields of biology, studies have suggested a novel interaction between viral infection and the circadian clock [45]. Edgar et al. showed 10 times higher viral replication in mice infected at the start of their resting phase, which likely corresponds to late evening for humans [45]. This finding confirmed the relationship between the circadian phase and virus infection. These data show that the timing of viral infection can directly affect the advantage that physiological rhythms provide in combating viruses. Notably, a recent study also indicated that the expression of genes with a circadian rhythm in monocytes shifts between two time points representing the active and resting periods in humans, which also indirectly affects SARS-CoV-2 replication [46]. Drug intake should also be timed with the circadian rhythm of the body and the virus in mind to increase the efficacy of inhibitory compounds (Figure 3).

Figure 3.

The cellular clock affects the virus life cycle (1), which is directly or indirectly influenced by the circadian clock of the host cell. These effects trigger multiple steps in the regulation of different pathways (2) against the virus and its replication phases, including particle genesis. In addition, immune responses that combat viral infections and are regulated by circadian clocks could change between day and night (3). Illustration was created with BioRender.com by the authors of this chapter.

Another study also shows that the circadian clock has a central role in coordinating daily physiological processes, including immunity, and that humans are more susceptible to infections at certain times of the day because defense systems function with a daily rhythmic pattern (Figure 3) [47]. In viral diseases, deciphering the complex relationships between circadian timekeeping, host immunity, and host-virus interactions has great potential to unravel the complexity of severe acute respiratory syndrome coronavirus pathogenicity. If infection severity is regulated by the circadian clock, taking the clock into account could improve therapy against the novel pathogen and may also lead to its recession. Therefore, the circadian nature of host responses calls for urgent studies on clock–infection biology in SARS-CoV-2. Even though the pathophysiology of SARS-CoV-2 infection and its severe complications are not well understood, SARS-CoV-2 severity may also depend on the day-night cycle (Figure 3). The battle between virus replication and its neutralization by the host immune system could move in step with the circadian activity phase of the host. Enhancing the activity of circadian immunity factors may help to control virus replication, as circadian clocks provide a competitive advantage to the host against SARS-CoV-2.

Defects in the host circadian clock are known to increase pathogen replication and invasion, which indicates that the severity of infection is influenced by circadian rhythms. All these effects can be followed by omics technologies comprising genomics, transcriptomics, and proteomics data [48, 49]. We stress the importance of combining all omics data to understand every biological process related to SARS-CoV-2 and the host defense mechanism. Consequently, precisely timed treatments aimed at controlling the virus will provide greater success in managing the disease. The Circadian Expression Profiles Database (CircaDB) could be beneficial for evaluating omics data with expression profiles related to the circadian clock [50]. Notably, considering the expression profiles of potential drug targets in relation to the circadian rhythm of the host could provide a new strategy for effective compounds depending on application doses, which may affect the efficiency of tested repurposed or novel compounds. We need to better understand how the circadian clock affects SARS-CoV-2 infection to achieve optimal clinical management of the virus.
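One way to evaluate whether an expression profile follows the circadian clock, and at what time of day it peaks, is a single-component cosinor fit. The sketch below fits y(t) = M + A cos(2π(t − t_peak)/T) to a time series. It assumes samples that are evenly spaced over a whole number of periods (at least three samples per period), in which case the least-squares solution reduces to the simple sums used here. The cosinor model is a generic chronobiology technique and is not taken from CircaDB or the cited studies; the function name and expression values are hypothetical.

```python
import math


def cosinor_fit(times_h, values, period_h=24.0):
    """Fit y = mesor + amplitude * cos(2*pi*(t - peak_time) / period).

    Assumption: sampling is evenly spaced and spans whole periods, so the
    cosine and sine regressors are orthogonal and the least-squares
    estimates reduce to the averages below.
    Returns (mesor, amplitude, peak_time_in_hours).
    """
    n = len(values)
    omega = 2.0 * math.pi / period_h
    mesor = sum(values) / n  # rhythm-adjusted mean level
    beta = 2.0 / n * sum(v * math.cos(omega * t) for t, v in zip(times_h, values))
    gamma = 2.0 / n * sum(v * math.sin(omega * t) for t, v in zip(times_h, values))
    amplitude = math.hypot(beta, gamma)
    peak_time = (math.atan2(gamma, beta) / omega) % period_h
    return mesor, amplitude, peak_time


# Hypothetical expression values sampled every 4 hours over one day
times = [0, 4, 8, 12, 16, 20]
expression = [8.5, 11.5, 13.0, 11.5, 8.5, 7.0]
mesor, amplitude, peak_time = cosinor_fit(times, expression)
print(f"mesor={mesor:.2f}, amplitude={amplitude:.2f}, peak at {peak_time:.1f} h")
```

The estimated peak time of a drug target's expression is the kind of quantity that could inform the timing of treatment; unevenly sampled data would need a general least-squares fit instead.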

Another point that deserves the attention of scientists is that virulent agents causing infections could help shape our genome, and are also responsible for various polymorphic structures involving many loci related to MHC antigen processing [51]. There is also strong evidence that the effect of virus infection depends on the expression of HLA alleles [52], even though their influence is moderate. Accordingly, resistance to pathogens, including viruses, is conferred by several genes that act at different stages of the host-pathogen interaction. For another virus, HIV, at least 250 genes affecting the success of the infection have been demonstrated [53]. Consequently, the lack or impairment of such genes could have a negative or reduced effect on virus invasion. Deficient genes can also lead to increased susceptibility to different virus infections [54].


7. Mutational changes in virus genome and traceability

An inherent property of the virus genome is a higher mutational tendency than in other micro- and macro-organisms. Unlike DNA replication, RNA virus replication generally does not include proofreading, which leaves the genetic material prone to errors during the transcriptional phase [55]. This feature gives rise to a high probability of mutation and, because replication occurs in a short time, to high yields. RNA virus replication involves a complex and dynamic rate of mutant formation at certain genomic sites, which affects the nucleotide sequence and is influenced by environmental factors. The model described for virus evolution explains pathogenic behavior in terms of quasispecies, which have a special population structure with a huge number of variant genomes that arise from mutations [56]. These continually arising high mutation rates change the relative frequencies of variants through replication and selection. This process reflects the adaptation of primitive replicons through mutant distributions, as seen in RNA viruses within their host [56, 57].

This mutational tendency depends on the size of the virus population involved in the infection. Therefore, a large population results in rapid fitness gains in cellular organisms. An important challenge in studies of RNA virus evolution is differentiation based on phenotypic traits associated with ongoing specific mutations. Different mutations may be associated with the biological behavior of the virus and may underlie the expression of phenotypic traits. These effects lead to the formation of restricted types. Findings from epidemiological, functional genomics, and structural studies have shown that RNA viruses tolerate genetic changes, an indispensable characteristic that stems from virus evolution. However, the extinction of a viral infection cannot be estimated from the characteristics of the existing sequence alone, because extinction through lethal mutagenesis is an unpredictable transitional phase of the genetic information. Based on genomic data, the mutation rate of viral sequences can be easily followed, but whether a mutation results in less epidemic and pathogenic behavior cannot be determined without clinical studies and monitoring of the host. Even though omics science provides predictive data on the virus, this alone is not enough unless it is supported by filiation studies on epidemic cases. Mutations that limit the pathogenicity of the virus may provide alternative solutions that occur spontaneously in nature and end viral infections.
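Following mutations from genomic data, as described above, can be sketched as counting how often each substitution appears across a set of isolates relative to a reference. The example assumes that all sequences are pre-aligned to the reference without gaps; the function name `substitution_frequencies` and the toy sequences are hypothetical. As the text stresses, such frequencies say nothing by themselves about epidemic or pathogenic behavior.

```python
from collections import Counter


def substitution_frequencies(reference, isolates):
    """Return {substitution: fraction of isolates carrying it}.

    Assumption: every isolate is aligned to `reference` position by position.
    Substitutions are written as reference base, 1-based position, new base.
    Ambiguous bases and gaps are ignored.
    """
    reference = reference.upper()
    counts = Counter()
    for isolate in isolates:
        if len(isolate) != len(reference):
            raise ValueError("isolates must match the reference length")
        for pos, (r, q) in enumerate(zip(reference, isolate.upper()), start=1):
            if r in "ACGT" and q in "ACGT" and r != q:
                counts[f"{r}{pos}{q}"] += 1
    total = len(isolates)
    return {mutation: n / total for mutation, n in counts.items()}


# Hypothetical toy data: four isolates compared with a four-base reference
frequencies = substitution_frequencies("ATGC", ["ATGC", "ATTC", "CTTC", "ATGC"])
for mutation, fraction in sorted(frequencies.items(), key=lambda kv: -kv[1]):
    print(mutation, f"{fraction:.0%}")
```

Repeating the count for isolates grouped by collection date would show whether a substitution is rising or falling in frequency over time.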


8. Future outlook

According to our current knowledge of SARS-CoV-2, our capabilities for managing the disease efficiently are limited and are not enough to reduce the severity of pathogen invasion, except for protective methods based on vaccination. Even though vaccination seems to be the major option among the possibilities, we are not sure how long the virus will keep its genetic stability without mutation, and mutation may prevent the developed vaccines from being an effective approach in further infection waves. In particular, besides immunological strategies based on vaccination, we now need to determine whether SARS-CoV-2 infection is more severe at certain times of the day, which is directly related to the crosstalk between circadian clocks and viral infections. Drug design and the testing of repurposed compounds with high inhibitory potential against viral replication should be accelerated without wasting time. We are not sure what tomorrow will bring or how other biotic and abiotic factors will affect the pathogenicity and genetics of SARS-CoV-2 and other viruses.


9. Conclusions

Given the advanced techniques available today for following the genetic structure of RNA viruses, we are able to find solutions for combating them. However, we may face novel viruses in the future because of shifts in ecological balance and the negative effects of global warming. We should be prepared for worse epidemic scenarios caused not only by viruses but also by other microorganisms. This kind of devastating event will not be the last if humankind continues to waste the irreplaceable resources of nature and ecological biodiversity. With RNA as their genetic material, viruses have their own unique repair process that emerged as early as 3.5 to 2.5 billion years ago in the crust of the world. The uncovered genomic data give insight into many biological processes and help decipher the dramatic scientific cases that threaten world health and the environment. We believe that the post-genomic era, in which the genomic characterization of whole macro- and micro-organisms has been completed, will allow scientists to harvest the fruits of this work. The genome sequence data that are already available, or soon will be, will offer all the information concerning threats such as SARS-CoV-2. Bioinformatics will have a dramatic impact on improving our understanding of such unclear cases. Omics science and the curated data it yields are expected to contribute significantly to global issues arising from pandemic outbreaks. Research in this field will play a major role and will impact drug discovery and pharmaceutical development, including health care and the environment.

In this chapter, we stress that bioinformatics tools will increase the potential for curing diseases and producing new effective solutions, in addition to accurately correlating clinical parameters with patient responsiveness to therapy. Bioinformatics is used to build global databases in molecular microbiology that enhance the cumulative knowledge from experimental data and metadata about microorganisms. All bioinformatics tools and the data generated by omics science, including data mining, will establish dynamically updated and flexible portals on novel microbial diversity, with biotechnological innovations resulting from our efforts to reach end products.



Acknowledgments

The authors wish to thank Prof. Dr. Nazlı Arda for her valuable support of our studies related to SARS-CoV-2. Moreover, the authors dedicate this chapter to scientists who think outside the box.

Conflict of interest

The authors declare no conflict of interest.

© 2021 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Ömür Baysal and Ragıp Soner Silme (June 9th 2021). Utilization from Computational Methods and Omics Data for Antiviral Drug Discovery to Control of SARS-CoV-2, SARS-CoV-2 Origin and COVID-19 Pandemic Across the Globe, Vijay Kumar, IntechOpen, DOI: 10.5772/intechopen.98319. Available from:
