Open access peer-reviewed chapter

Drug Design for Malaria with Artificial Intelligence (AI)

By Bhaswar Ghosh and Soham Choudhuri

Submitted: December 7th 2020
Reviewed: June 2nd 2021
Published: July 7th 2021

DOI: 10.5772/intechopen.98695



Malaria is a deadly disease caused by Plasmodium parasites. Approximately 210 million people are affected by malaria every year, resulting in about half a million deaths. Among the several species of the parasite, Plasmodium falciparum is the primary cause of severe infection and death. Several antimalarial drugs are available on the market, but Plasmodium parasites have developed resistance to many of them over the years. This poses a serious threat to the efficacy of treatment, and the continued discovery of new drugs is necessary to tackle the situation, especially given the failure to design an effective vaccine. Researchers are now trying to design new antimalarial drugs using AI technologies, which can substantially reduce the time and cost required by classical drug discovery programs. In this chapter, we provide a comprehensive overview and a road map of several AI-based computational techniques that can be implemented in a malaria drug discovery program. Because classical computers have limited computing power, researchers are also trying to harness quantum machine learning to speed up the drug discovery process.


  • Malaria
  • Plasmodium falciparum
  • machine learning
  • drug design
  • Quantum machine learning
  • Topological data analysis

1. Introduction

Malaria is an infectious and dreadful disease caused by Plasmodium parasites. The parasite is transmitted to humans through the bites of infected mosquitoes. Around 210 million people are infected with malaria every year, resulting in 440,000 deaths, especially among children under the age of five [1]. Plasmodium falciparum is the primary cause of severe infection and death in most cases. Numerous drugs are available on the market, such as quinine, mepacrine, chloroquine, mefloquine, halofantrine, and artemisinin and its derivatives. Unfortunately, malaria parasites, especially P. falciparum, have over time developed resistance to many, if not all, of these drugs, posing a serious threat to their efficacy [2]. As a result, the continued discovery of new drugs against malaria has become essential to mitigate this threat, and during the last decade various malaria drug design programs have been initiated all over the world.

Drug discovery is a time-consuming and complex process that can be broadly divided into four main phases: (i) target selection and validation, (ii) screening and optimization of lead compounds, (iii) preclinical studies, and (iv) clinical trials. First, the targets associated with a specific disease need to be identified. This requires an evaluation of cellular and genetic targets, genomics and proteomics analysis, and predictive bioinformatics. The next step is hit identification: candidate compounds are identified from molecule libraries using combinatorial chemistry, high-throughput screening, and virtual screening. Next, in vivo pharmacokinetic and toxicity studies are performed in animals. Once the preclinical tests have been completed successfully, clinical trials are conducted in humans, in three phases.

Phase I is a safety test of the drug in a small number of people; in Phase II, the dose needed to eliminate the infection is determined in a small number of patients; and finally, Phase III precisely quantifies the efficacy of the drug in a large number of patients. Once the safety and efficacy of a drug candidate are confirmed in the clinical phases, agencies such as the FDA review the compound for approval and commercialization. The total cost of a conventional drug development pipeline is estimated at USD 2.6 billion, and a complete traditional workflow can take more than 12 years.


2. Malaria disease overview

Malaria has a wide impact across tropical and subtropical regions. Sub-Saharan Africa has the most malaria cases, and there are also significant numbers of cases in India, Brazil, Afghanistan, Sri Lanka, Thailand, Indonesia, Vietnam, Cambodia, and China [3, 4]. In many temperate areas, such as Western Europe and the United States, public health measures and economic development have succeeded in eliminating malaria, apart from occasional cases imported through international travel. Malaria is transmitted by female Anopheles mosquitoes. There are approximately 400 species of Anopheles in the world, of which 30 are responsible for transmitting malaria. Plasmodium species can also infect other animals, including birds. Four species of malaria parasite, namely Plasmodium falciparum, Plasmodium vivax, Plasmodium ovale and Plasmodium malariae, can infect humans under natural conditions, and Plasmodium falciparum is the major killer among them. In August 1897, Ronald Ross first reported that the parasites can infect female mosquitoes, and he further showed that the parasite completes its development cycle in the female mosquito, which is the primary route by which infection spreads from an infected patient to a healthy individual.

2.1 Life cycle

Malaria infection begins when an infected Anopheles mosquito bites a person and injects Plasmodium parasites into the bloodstream in the form of sporozoites. The sporozoites travel to the liver and multiply asexually in liver cells for the next 7 to 10 days without causing any symptoms. In animal models, the parasites are released from the liver cells as merozoites packed inside vesicles, which travel through the heart to the lungs and finally settle in the lung capillaries.

The vesicles eventually disintegrate, allowing the merozoites to progress to the blood stage of development. Merozoites enter red blood cells (erythrocytes) and multiply until the cells burst [5], releasing new merozoites that invade further erythrocytes. This cycle repeats each time the parasites invade blood cells, causing fever. Some parasites inside infected blood cells leave the asexual multiplication cycle [6]. Instead of replicating, these parasites develop into sexual forms, called gametocytes, which circulate in the bloodstream. When an Anopheles mosquito bites an infected human, it ingests the gametocytes, which develop further into mature sex cells called gametes [7]. Fertilized female gametes develop into motile ookinetes, which form oocysts on the outer surface of the mosquito’s midgut. Thousands of active sporozoites develop within each oocyst; eventually the oocyst bursts, releasing sporozoites into the mosquito’s body cavity, from which they migrate to its salivary glands [8]. When the mosquito bites another person and injects the sporozoites, the cycle of human infection begins again (Figure 1).

Figure 1.

Malaria parasite life cycle. This picture is downloaded from (

2.2 Drug design for malaria using deep learning and other in silico techniques

Over the last few years, several reports have been published on drug design for malaria using machine learning or other computational methods. Arash et al. [9] devised a deep learning technique (DeepMalaria) that uses a graph-based model and SMILES strings to predict antimalarial inhibitory compounds. A few studies describe in silico approaches to analysing anti-plasmodial compounds. Monika Samant et al. [10] developed a protein–protein interaction network of Plasmodium falciparum and its human host by integrating experimental data with computational predictions of interactions using the interolog method. Manila et al. [11] studied inhibitors of plasmepsin II, an aspartic protease encoded by the malaria parasite that is essential for the degradation of host haemoglobin. They studied the structure of the target protein and searched for a suitable molecule with a high binding affinity for it. Although there are a few studies on in silico drug target identification and drug design for malaria, no drug designed by these computational methods has yet been successfully brought into real-life use. One of the main reasons is that these computational methods only optimize physical or chemical properties and cannot predict whether the designed drug is biologically relevant for malaria. In the next sections, we provide a road map for harnessing recent developments in genomics and AI techniques that can be used in drug design programs (Table 1).

Approach | Reference
--- | ---
DeepMalaria (a graph-based model using SMILES) | [9]
Analysis of malaria inhibitors | [10]
Prediction of antimalarial drug targets | [12]
Inhibitor design against the malaria parasite | [11]

Table 1.

Different approaches for in silico antimalarial drug discovery.

Note: These are the most significant papers in in silico malaria research, and DeepMalaria is the only one among them in which the authors used a deep neural network for malaria drug design.
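DeepMalaria [9] is described here only as a graph-based model built from SMILES, so the sketch below does not reproduce its architecture. It illustrates, under stated assumptions, the neighbour-aggregation (message-passing) step that graph convolutional models apply to a molecule: atoms are nodes, bonds are edges, and each round mixes an atom's features with those of its bonded neighbours before a sum-pooled readout. The atom vocabulary, the fixed weight matrix (which a real model would learn) and the hand-written N-C-C-O fragment are hypothetical, and no SMILES parsing is attempted.

```python
from typing import Dict, List, Tuple

# Hypothetical atom vocabulary used for one-hot node features.
ATOM_TYPES = ["C", "N", "O", "Cl"]

# Hypothetical fixed weights; in a trained graph network these are learned.
WEIGHTS = [[0.5 if i == j else 0.1 for j in range(len(ATOM_TYPES))]
           for i in range(len(ATOM_TYPES))]


def one_hot(symbol: str) -> List[float]:
    """Encode an atom symbol as a one-hot vector over ATOM_TYPES."""
    return [1.0 if symbol == t else 0.0 for t in ATOM_TYPES]


def graph_fingerprint(atoms: List[str], bonds: List[Tuple[int, int]],
                      weights: List[List[float]] = WEIGHTS,
                      rounds: int = 2) -> List[float]:
    """Message passing over a molecular graph followed by sum pooling."""
    neighbours: Dict[int, List[int]] = {i: [] for i in range(len(atoms))}
    for a, b in bonds:  # bonds are undirected edges between atom indices
        neighbours[a].append(b)
        neighbours[b].append(a)

    h = [one_hot(s) for s in atoms]
    for _ in range(rounds):
        new_h = []
        for v in range(len(atoms)):
            # Aggregate the atom's own features with those of its bonded neighbours.
            agg = list(h[v])
            for u in neighbours[v]:
                agg = [x + y for x, y in zip(agg, h[u])]
            # Linear transform followed by a ReLU non-linearity.
            new_h.append([max(0.0, sum(w * x for w, x in zip(row, agg)))
                          for row in weights])
        h = new_h
    # Sum pooling gives a fixed-length vector independent of atom ordering.
    return [sum(column) for column in zip(*h)]


# Heavy-atom graph of a hypothetical N-C-C-O fragment, listed in two orders.
fp_forward = graph_fingerprint(["N", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
fp_reversed = graph_fingerprint(["O", "C", "C", "N"], [(0, 1), (1, 2), (2, 3)])
print(fp_forward)
print(fp_reversed)
```

Because the readout sums over atoms, listing the same fragment in reverse order yields the same vector (up to floating-point rounding); a trained model would pass such a vector to a classifier that predicts inhibitory activity.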


3. Drug target selection using genomics data

The first step in a drug discovery project is to identify a potential target. Transcriptomics data can be used effectively for target identification: differential gene expression analysis reveals differences in gene expression between normal and diseased states. For malaria parasites, the PlasmoDB database [13] provides a large variety of transcriptomics datasets covering different stages of the life cycle and different times after the parasite infects red blood cells (RBCs). However, bulk RNA-sequencing data cannot capture cell-to-cell variability within a population, so some essential features may remain undetected. For example, a recent single-cell RNA-seq experiment was able to distinguish between sexually and asexually committed schizonts within a population of parasites [14]. RNA sequencing (RNA-seq) makes it possible to measure RNA levels on a genome-wide scale, giving insight into cellular processes and illuminating the specifics of many important molecular events such as alternative splicing, gene fusion, single-nucleotide variation, and differential gene expression; single-cell RNA-seq (scRNA-seq) extends this to the transcriptomes of individual cells. scRNA-seq is generally used to examine transcriptional similarities and differences within a cell population. RNA sequencing technologies continue to advance and provide new ideas for understanding biological processes. Early findings revealed previously unrecognised levels of heterogeneity in embryonic and immune cell populations [15, 16], so the analysis of heterogeneity remains a core reason to embark on scRNA-seq studies. Similarly, assessments of transcriptional variation between individual cells have been used to identify rare cell populations that would otherwise go undetected in pooled-cell analysis [17], such as malignant cells within a tumor mass [18] or hyper-responsive immune cells within an otherwise homogeneous group [19].
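As a minimal sketch of differential expression between two conditions, the code below computes a log2 fold change and Welch's t statistic for each gene and ranks the genes that pass a fold-change cut-off. The gene names, stage labels and expression values are hypothetical; on real PlasmoDB or scRNA-seq data, dedicated count-based tools with multiple-testing correction would normally be used instead.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Tuple


def log2_fold_change(a: List[float], b: List[float], pseudocount: float = 1.0) -> float:
    """log2 of mean expression in stage b over stage a; the pseudocount avoids log(0)."""
    return math.log2((mean(b) + pseudocount) / (mean(a) + pseudocount))


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t statistic for two samples with unequal variances."""
    return (mean(b) - mean(a)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_genes(stage_a: Dict[str, List[float]], stage_b: Dict[str, List[float]],
               min_abs_lfc: float = 1.0) -> List[Tuple[str, float, float]]:
    """Keep genes with |log2FC| >= min_abs_lfc, ranked by |t| (largest first)."""
    hits = []
    for gene in stage_a:
        lfc = log2_fold_change(stage_a[gene], stage_b[gene])
        if abs(lfc) >= min_abs_lfc:
            hits.append((gene, lfc, welch_t(stage_a[gene], stage_b[gene])))
    return sorted(hits, key=lambda hit: abs(hit[2]), reverse=True)


# Hypothetical normalised expression values, three replicates per stage.
ring_stage = {"gene_1": [10, 12, 11], "gene_2": [50, 55, 52], "gene_3": [5, 6, 4]}
gametocyte = {"gene_1": [80, 85, 90], "gene_2": [51, 53, 54], "gene_3": [1, 0, 1]}

for gene, lfc, t in rank_genes(ring_stage, gametocyte):
    print(f"{gene}: log2FC={lfc:.2f}, t={t:.2f}")
```

With these invented values, gene_1 ranks first (up in the second stage), gene_3 second (down), and gene_2 is filtered out by the fold-change cut-off.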
The scRNA-seq technique is also ideal for examining cells that are each essentially unique, such as individual T lymphocytes expressing highly diverse T-cell receptors [20], brain neurons [21], or early-stage embryo cells [22]. scRNA-seq is also increasingly used to trace lineage and developmental relationships between heterogeneous yet related cellular states, for example in embryonic development, cancer, myoblast and lung epithelium differentiation, and lymphocyte fate diversification [22, 23, 24, 25, 26, 27]. Beyond resolving cellular heterogeneity, scRNA-seq can provide important information on fundamental characteristics of gene expression, including monoallelic gene expression [15, 28, 29], splicing patterns [30], and noise during transcriptional responses [30, 31, 32]. Importantly, studying gene co-expression patterns at the single-cell level could enable the identification of co-regulated gene modules and even the inference of the gene-regulatory networks underlying functional heterogeneity and cell-type specification [32, 33]. Depending on how the mRNA data are generated, much more information can be extracted, such as how many genes can be detected, whether a particular gene of interest is expressed, and whether there has been differential splicing. Several single-cell RNA-seq experiments on Plasmodium parasites performed in the last couple of years have provided a much more sophisticated way to characterise gene expression at different stages of the life cycle [14, 34, 35, 36]. Both supervised and unsupervised learning methods can be used to identify differentially expressed genes at different life-cycle stages [37, 38]. This information can then be fed into a gene ontology analysis or a genome-scale metabolic model to identify the functions of the genes and, subsequently, potential drug targets.
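Unsupervised learning can separate cell states without labels. The sketch below runs plain k-means (with a deterministic farthest-point start) on hypothetical log-expression values of three marker genes, illustrating how, for instance, two committed parasite subpopulations could fall into separate clusters. The marker values are invented, and real scRNA-seq pipelines normalise counts and reduce dimensionality before clustering, often with graph-based methods rather than k-means.

```python
import math
from typing import List, Sequence, Tuple


def kmeans(cells: Sequence[Sequence[float]], k: int,
           n_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [list(cells[0])]
    while len(centroids) < k:
        # Next centroid: the cell farthest from all centroids chosen so far.
        farthest = max(cells, key=lambda c: min(math.dist(c, m) for m in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(cells)
    for _ in range(n_iter):
        # Assignment step: each cell joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(c, centroids[j])) for c in cells]
        # Update step: move each centroid to the mean of its member cells.
        updated = []
        for j in range(k):
            members = [c for c, lab in zip(cells, labels) if lab == j]
            updated.append([sum(x) / len(members) for x in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:  # converged
            break
        centroids = updated
    return labels, centroids


# Hypothetical log-expression of three marker genes in six single cells.
cells = [
    [5.0, 0.1, 0.2], [4.8, 0.3, 0.1], [5.2, 0.2, 0.0],  # high marker 1
    [0.1, 0.2, 4.9], [0.3, 0.1, 5.1], [0.2, 0.0, 5.3],  # high marker 3
]
labels, _ = kmeans(cells, k=2)
print(labels)
```

Once cells are grouped this way, the per-cluster expression profiles can be compared with a differential expression test to find marker genes of each state.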


4. Drug discovery with AI

The classical approach to drug discovery is a time-consuming and complex process. It takes almost 12 years to discover a drug, with costs soaring to billions of dollars. Many pharmaceutical companies work on drug discovery, but about 90 percent of drug discovery programs fail because of limitations in both the computational and the clinical phases. Drug discovery can be divided into four major steps: (1) target selection and validation; (2) compound screening and lead optimization; (3) preclinical studies; and (4) clinical trials. The biopharmaceutical industry is turning to computational approaches to accelerate drug discovery, to reduce research and development expenses by lowering failure rates in clinical trials, and ultimately to produce better medicines. Different machine learning approaches help to identify drug targets, find suitable molecules in compound libraries, suggest chemical modifications, and so on. In the following subsections, we discuss how computational approaches can help at each step of the drug discovery process.

4.1 Primary drug screening with AI

4.1.1 Image processing and usage of AI to sort and classify cells

AI performs well at classifying images that contain various objects or features [39, 40]. Dimension reduction techniques such as principal component analysis (PCA) can be used to reduce the number of image features, after which AI-based techniques can classify the cells [41]. A least squares support vector machine (LS-SVM), a support vector machine variant that can be used for both classification and regression, was reported to give the highest classification accuracy (95.34%). Modern image-activated cell sorting (IACS) instruments measure the optical, electrical, and mechanical properties of cells for highly versatile and scalable automated cell sorting; these instruments use neural network algorithms for decision-making together with high-speed digital image processing. AI has also recently been used to interpret computerized electrocardiography (ECG), where it plays a significant role in the diagnosis and clinical treatment workflow.
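The PCA-then-classify workflow can be sketched as follows: the first principal component (PC1) is found by power iteration on the feature covariance matrix, each cell is projected onto it, and a nearest-centroid rule assigns the class. The nearest-centroid classifier is a deliberately simple stand-in for the LS-SVM mentioned above, and the four-feature image descriptors and labels are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def first_principal_component(X: Sequence[Sequence[float]],
                              n_iter: int = 500) -> Tuple[List[float], List[float]]:
    """Return (unit PC1 direction, feature means) via power iteration on the covariance."""
    n, d = len(X), len(X[0])
    means = [sum(col) / n for col in zip(*X)]
    Xc = [[x - m for x, m in zip(row, means)] for row in X]
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(0)  # random start, so it is unlikely to be orthogonal to PC1
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, means


def project(x: Sequence[float], v: List[float], means: List[float]) -> float:
    """Score of one sample along the principal component."""
    return sum((xi - mi) * vi for xi, mi, vi in zip(x, means, v))


def fit_nearest_centroid(scores: List[float], labels: List[str]) -> Dict[str, float]:
    groups: Dict[str, List[float]] = {}
    for s, lab in zip(scores, labels):
        groups.setdefault(lab, []).append(s)
    return {lab: sum(vals) / len(vals) for lab, vals in groups.items()}


def predict(centroids: Dict[str, float], score: float) -> str:
    return min(centroids, key=lambda lab: abs(score - centroids[lab]))


# Hypothetical 4-feature image descriptors (e.g. intensity/texture) per cell.
train_X = [[0.90, 0.80, 0.10, 0.20], [0.80, 0.90, 0.20, 0.10], [0.85, 0.75, 0.15, 0.20],
           [0.10, 0.20, 0.90, 0.80], [0.20, 0.10, 0.80, 0.90], [0.15, 0.20, 0.85, 0.75]]
train_y = ["infected"] * 3 + ["uninfected"] * 3

pc1, mu = first_principal_component(train_X)
model = fit_nearest_centroid([project(x, pc1, mu) for x in train_X], train_y)
print(predict(model, project([0.88, 0.82, 0.12, 0.18], pc1, mu)))
```

The sign of PC1 is arbitrary, but the nearest-centroid rule does not depend on it, since both class centroids flip together.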

4.2 Secondary drug screening with AI

4.2.1 Physical properties predictions

For drug design, bioavailability, bioactivity and toxicity are key defining characteristics of a compound. The partition coefficient (logP) and the melting point both affect a drug molecule’s bioavailability: the melting point is related to how easily the drug dissolves in water, whereas logP quantifies the relative solubility of the compound in a lipophilic phase (conventionally n-octanol) versus water and is used to estimate cellular drug absorption. Examples of molecular representations used by AI drug design algorithms that take these properties into account include molecular fingerprints, SMILES (simplified molecular-input line-entry system) strings, potential energy measurements (e.g., from ab initio calculations), molecular graphs with varying weights for atoms or bonds, Coulomb matrices, molecular fragments or bonds, and 3D atomic coordinates [42]. These inputs can be used to train deep neural networks (DNNs) and can be processed by different DNNs at different stages, including generative and predictive stages, which can be coupled through reinforcement learning (RL). In a typical setup, the generative DNN is trained on SMILES strings to generate chemically feasible SMILES strings, while the predictive DNN is trained to predict molecular properties. The two stages are first trained separately using supervised learning; when they are then trained jointly, the generator can be biased toward desired compounds by rewarding or penalising specific predicted properties.
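How melting point and logP combine into a solubility estimate can be illustrated with Yalkowsky's General Solubility Equation (GSE), an empirical relationship that is not discussed in this chapter and is shown here only as an example of a simple physical-property model: log S = 0.5 - 0.01 (MP - 25) - logP, with S in mol/L, MP in °C, and the melting-point term dropped for compounds that are liquid at 25 °C. The two compounds are hypothetical.

```python
def gse_log_solubility(melting_point_c: float, log_p: float) -> float:
    """General Solubility Equation: log10 of aqueous solubility in mol/L.

    log S = 0.5 - 0.01 * (MP - 25) - logP, where the melting-point term is
    set to 0 for compounds that are liquid at 25 °C.
    """
    mp_term = 0.01 * max(melting_point_c - 25.0, 0.0)
    return 0.5 - mp_term - log_p


# Hypothetical compounds with the same logP but different melting points.
print(gse_log_solubility(125.0, 2.0))  # high-melting solid
print(gse_log_solubility(10.0, 2.0))   # liquid at room temperature
```

For the same logP, the high-melting solid is predicted to be ten times less soluble (log S of -2.5 versus -1.5), which is why melting point enters bioavailability estimates alongside logP.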

4.2.2 Predictions of bioactivity and toxicity

The toxicity and bioactivity profiles are significant properties of a compound. Matched molecular pair (MMP) analysis can be used to explore how local changes to a drug molecule affect its molecular properties and bioactivity [43]; MMP is used to study quantitative structure–activity relationships (QSAR) [43]. Machine learning techniques previously applied without MMP, such as random forests (RF), gradient boosting machines (GBMs) and DNNs, are combined with MMP to capture new transformations, fragments and modifications of the static core. For predicting compound activity, DNNs outperform RF and GBMs. Owing to the rapid growth of public databases (such as ChEMBL and PubChem) containing large amounts of structure–activity relationship (SAR) data, MMP with ML has been used to predict many bioactivity-related properties, such as oral exposure, the distribution coefficient (logD) [44, 45], intrinsic clearance, absorption, distribution, metabolism and excretion (ADME), and mode of action. Several methods for predicting the bioactivity of drug candidates have been developed recently. For example, some researchers used a graph convolutional network to encode discrete chemical structures into a continuous latent vector space (LVS). The LVS enables gradient-based optimization of molecules in that space, allowing predictions of differential binding affinity and other binding properties. The DeepTox algorithm is an important tool for toxicity prediction [46].
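A matched molecular pair is a pair of compounds that share a static core and differ by a single substituent. The sketch below computes a transformation-level MMP summary, assuming the compounds have already been fragmented into (core, R-group) form and carry hypothetical pIC50 values: it groups compounds by core, enumerates all pairs within each core, and averages the activity change for each substituent swap. Real MMP tools derive the fragmentation from chemical structures themselves.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean
from typing import Dict, List, Tuple

Compound = Tuple[str, str, str, float]  # (id, core, R-group, activity)


def mmp_transformations(compounds: List[Compound]) -> Dict[Tuple[str, str], Tuple[float, int]]:
    """Mean activity change and pair count for each R-group swap (from_R, to_R)."""
    by_core: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for _, core, r_group, activity in compounds:
        by_core[core].append((r_group, activity))

    deltas: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for members in by_core.values():
        # Every pair sharing a core but differing at one position is a matched pair.
        for (r1, a1), (r2, a2) in combinations(members, 2):
            if r1 == r2:
                continue
            deltas[(r1, r2)].append(a2 - a1)  # transformation r1 -> r2
            deltas[(r2, r1)].append(a1 - a2)  # and its reverse
    return {t: (mean(d), len(d)) for t, d in deltas.items()}


# Hypothetical compounds on two cores with pIC50-style activities.
compounds = [
    ("cpd1", "core_A", "H", 6.0),
    ("cpd2", "core_A", "Cl", 6.8),
    ("cpd3", "core_B", "H", 5.5),
    ("cpd4", "core_B", "Cl", 6.1),
    ("cpd5", "core_B", "F", 5.7),
]
summary = mmp_transformations(compounds)
print(summary[("H", "Cl")])
```

With these invented values, replacing H by Cl raises activity by about 0.7 units on average across the two cores; such transformation statistics are what an ML model combined with MMP learns to generalise.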

4.3 AI in drug design

4.3.1 Prediction of target proteins 3D structure

The 3D structure of a target protein’s ligand-binding site is usually used to design new drug molecules [47, 48]. For this reason, researchers have used homology modelling and de novo protein design in the past [49, 50, 51]. With the development of AI-based approaches, the 3D structure of a target protein can be predicted more accurately. The AI tool AlphaFold performed remarkably well in the Critical Assessment of protein Structure Prediction (CASP) competition: using only primary protein sequences, it produced the most accurate prediction for 25 of the 43 test proteins, whereas the second-placed team did so for only 3 of the 43. AlphaFold is based on DNNs trained to predict properties of proteins from their primary sequences: it predicts the distances between pairs of amino acids and the angles between the chemical bonds that connect them. These two predictions are then combined into a score that estimates how accurate a proposed 3D model of the protein is, and AlphaFold uses these scoring functions to explore the protein structure landscape and find structures that match its predictions.
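The idea of scoring candidate structures against predicted inter-residue distances and then optimising them can be sketched with a toy example. AlphaFold's actual potential is built from predicted distance distributions and torsion angles; here, as a simplifying assumption, the score is the sum of squared differences between model and target distances over all residue pairs, minimised by plain gradient descent from random coordinates. The four-residue target distances are hypothetical and are derived from reference coordinates so that a consistent structure exists.

```python
import math
import random
from typing import List

Coords = List[List[float]]


def distance_score(X: Coords, D: List[List[float]]) -> float:
    """Sum over residue pairs of squared (model distance - target distance)."""
    n = len(X)
    return sum((math.dist(X[i], X[j]) - D[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def score_gradient(X: Coords, D: List[List[float]]) -> Coords:
    """Analytic gradient of distance_score with respect to every coordinate."""
    n = len(X)
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(X[i], X[j])
            coef = 2.0 * (d - D[i][j]) / d
            for k in range(3):
                grad[i][k] += coef * (X[i][k] - X[j][k])
    return grad


def fold(D: List[List[float]], steps: int = 1000, lr: float = 0.01, seed: int = 0) -> Coords:
    """Gradient descent from random coordinates toward the target distances."""
    rng = random.Random(seed)
    X = [[rng.uniform(-2.0, 2.0) for _ in range(3)] for _ in range(len(D))]
    for _ in range(steps):
        g = score_gradient(X, D)
        X = [[x - lr * gx for x, gx in zip(xi, gi)] for xi, gi in zip(X, g)]
    return X


# Hypothetical "predicted" distances (in Å) between four residues, derived
# here from reference coordinates so that an exact solution exists.
reference = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [0.0, 3.8, 3.8]]
target = [[math.dist(a, b) for b in reference] for a in reference]
initial = fold(target, steps=0)  # same random start, no optimisation
final = fold(target)
print(distance_score(initial, target), distance_score(final, target))
```

Distances are invariant to rotation and reflection, so the recovered coordinates can match the reference only up to such a transformation; real pipelines add torsion terms and physical restraints to resolve this.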

4.3.2 Predicting drug-protein interactions

Quantum mechanics (QM) is a very effective tool for predicting protein–ligand (drug) interactions [52, 53]. QM methods account for quantum effects in the simulated system at atomic resolution, giving substantially higher precision than conventional molecular mechanics (MM) methods. However, QM-based methods are far more time-consuming than MM methods, which only use simple energy functions based on atomic coordinates [54, 55]. AI methods therefore offer a trade-off between the accuracy of QM and the favorable time cost of MM models [56]: AI models have been trained to reproduce QM energies from atomic coordinates, and they can outperform MM methods in calculation speed. In particular, deep learning has been used to predict the potential energies of small molecules, replacing computationally demanding quantum chemistry calculations with a fast ML method [56]. More broadly, AI is mainly used for atomistic simulations and predictions of electronic properties. Potential energies from density functional theory (DFT) calculations have been computed for large data sets and used to train DNNs. For example, in a study of two million elpasolite crystals, the accuracy of an ML model improved with increasing training set size, reaching 0.1 eV/atom for DFT formation energies when trained on 10,000 structures; the model was then used to screen different compositional choices for different properties [57]. AI-based computational tools for drug discovery are listed in Table 2.
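The observation that ML accuracy improves with training-set size can be reproduced in miniature. In the sketch below, a Morse potential stands in for an expensive QM energy of a diatomic molecule (an assumption purely for illustration), and kernel ridge regression with a Gaussian kernel learns energy as a function of bond length. The kernel width, regularisation strength and bond-length range are arbitrary choices, not values from the cited study.

```python
import math
from typing import List


def morse_energy(r: float, depth: float = 1.0, a: float = 1.5, r0: float = 1.0) -> float:
    """Toy stand-in for an expensive QM energy: Morse potential of a diatomic."""
    return depth * (1.0 - math.exp(-a * (r - r0))) ** 2 - depth


def rbf(x: float, y: float, sigma: float = 0.25) -> float:
    """Gaussian (RBF) kernel between two bond lengths."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_krr(xs: List[float], ys: List[float], lam: float = 1e-6) -> List[float]:
    """Kernel ridge regression weights from (K + lam * I) alpha = y."""
    K = [[rbf(xi, xj) + (lam if i == j else 0.0) for j, xj in enumerate(xs)]
         for i, xi in enumerate(xs)]
    return solve(K, ys)


def predict(xs: List[float], alpha: List[float], x: float) -> float:
    return sum(a * rbf(xt, x) for a, xt in zip(alpha, xs))


def holdout_mae(n_train: int, lo: float = 0.8, hi: float = 3.0, n_test: int = 50) -> float:
    """Train on n_train evenly spaced bond lengths; mean absolute error on held-out ones."""
    xs = [lo + (hi - lo) * i / (n_train - 1) for i in range(n_train)]
    alpha = fit_krr(xs, [morse_energy(x) for x in xs])
    test_points = [lo + (hi - lo) * (i + 0.5) / n_test for i in range(n_test)]
    return sum(abs(predict(xs, alpha, x) - morse_energy(x)) for x in test_points) / n_test


for n in (3, 5, 10, 20):
    print(n, round(holdout_mae(n), 4))
```

The mean absolute error on held-out bond lengths shrinks as more reference calculations are added, mirroring on a tiny scale the learning-curve behaviour reported for the elpasolite study.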

Tool name | Description | Reference
--- | --- | ---
AlphaFold | Protein 3D structure prediction |
Chemputer | A more structured format for documenting chemical synthesis procedures | [58]
DeepChem | A Python-based AI platform for various drug discovery prediction tasks | [59]
DeepNeuralNet-QSAR | Molecular activity prediction | [60]
DeepTox | Toxicity prediction | [46]
DeltaVina | A scoring function for rescoring protein–ligand binding affinity | [61]
Hit Dexter | ML models for predicting molecules likely to respond in biochemical assays | [62]
Neural Graph Fingerprints | Property prediction for novel molecules | [63]
NNScore | A neural-network-based scoring function for protein–ligand interactions | [64]
ODDT | A robust cheminformatics and molecular modelling toolkit | [65]
ORGANIC | A powerful molecular generation method for building molecules with desired characteristics | [66]
PotentialNet | Ligand-binding affinity prediction based on a convolutional neural network (CNN) | [67]
PPB2 | Polypharmacology prediction | [68]
QML | A Python toolkit for quantum ML |
REINVENT | Molecular de novo design using a recurrent neural network (RNN) and reinforcement learning (RL) | [69]
SCScore | A score for assessing a molecule’s synthetic complexity | [70]
SIEVE-Score | An improved structure-based virtual screening method using interaction-energy-based learning | [71]

Table 2.

List of AI-Based Computational Tools for Drug Discovery.

4.4 Possibility of drug design using topological data analysis

If we have a dataset describing compounds in terms of their structural features, toxicity, and binding affinity to the target, we can use topological data analysis (TDA), a mathematical technique that studies the shape of high-dimensional data, to build a similarity network of the compounds. TDA lets us visualize the compound library as a two-dimensional network in which compounds (located in nodes) are linked by edges indicating their degree of mutual similarity. As a result, compounds with similar properties appear close together in the network, whereas compounds with very different properties appear far apart. This network makes it simple to define subgroups or families of related compounds, which could then be used to choose the best compound.
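One widely used TDA construction for such networks is the Mapper algorithm. The sketch below is a minimal version, assuming each compound is already a numeric descriptor vector and a single property serves as the lens (filter) function: the lens range is covered by overlapping intervals, compounds in each interval are clustered by single linkage, each cluster becomes a node, and nodes that share compounds are joined by an edge. The two-series dataset, number of intervals, overlap and distance threshold are hypothetical.

```python
import math
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set, Tuple


def cluster(points: Sequence[Sequence[float]], idx: List[int], eps: float) -> List[FrozenSet[int]]:
    """Single-linkage clustering: connected components of the eps-neighbour graph."""
    clusters: List[FrozenSet[int]] = []
    seen: Set[int] = set()
    for start in idx:
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            i = stack.pop()
            component.add(i)
            for j in idx:
                if j not in seen and math.dist(points[i], points[j]) <= eps:
                    seen.add(j)
                    stack.append(j)
        clusters.append(frozenset(component))
    return clusters


def mapper(points, lens: List[float], n_intervals: int = 3, overlap: float = 0.3,
           eps: float = 1.5) -> Tuple[List[FrozenSet[int]], Set[Tuple[int, int]]]:
    """Mapper graph: cluster within overlapping lens intervals; link clusters sharing points."""
    lo, hi = min(lens), max(lens)
    width = (hi - lo) / n_intervals
    nodes: List[FrozenSet[int]] = []
    for k in range(n_intervals):
        start = lo + k * width - overlap * width
        stop = lo + (k + 1) * width + overlap * width
        members = [i for i, value in enumerate(lens) if start <= value <= stop]
        nodes.extend(cluster(points, members, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges


# Hypothetical 2D descriptors for ten compounds: two chemical series (0-4 and 5-9).
compounds = [(float(x), 0.0) for x in range(5)] + [(float(x), 10.0) for x in range(5)]
lens = [x for x, _ in compounds]  # filter function, e.g. a predicted property
nodes, edges = mapper(compounds, lens)
print(len(nodes), sorted(edges))
```

The result is two separate chains of three nodes each, one per chemical series, which is the kind of family structure the network is meant to reveal.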

4.5 Possibility of drug design using quantum machine learning

Quantum computing (QC) algorithms have been proposed as a way to speed up classical machine learning algorithms. Classical computers are not powerful enough to analyze the behavior of all the atoms in a molecule: even the most powerful supercomputers today can simulate only relatively simple molecules, which significantly limits their ability to predict interactions between complex molecules and atoms. Classical computing (CC) and big data analysis methods can screen molecules for desired properties, and researchers use molecular simulators to simulate the interactions between the electrons of each atom to test how they would react with each other in the real world. However, owing to the relatively limited processing capacity of CC, computationally designed drugs fail to deliver the desired results in 90 percent of cases during the first phase of clinical trials, causing losses of billions of dollars every year. When Google announced a successful quantum computing experiment that achieved what is called “quantum supremacy” (performing calculations that are too complex for CC), it was hailed around the world as a key moment for QC. It is also an important development for drug discovery: QC could change the drug discovery game entirely by providing enough computational power to enable molecular analysis on a scale that is unimaginable with CC. Additionally, quantum machine learning (QML), which uses quantum algorithms to perform learning tasks [72], could further augment the computational tools of drug discovery programs. Classical machine learning has already shown significant promise in drug discovery, and QML allows scientists to translate classical ML algorithms into quantum circuits that can run efficiently on powerful quantum computers. Quantum computers will become more efficient in the future as quantum technologies become less error-prone.
Scientists from the University of Warwick, the University of Luxembourg, and the University of Berlin recently created a deep learning algorithm that can predict the quantum states of molecules faster than before. “Solve the fundamental equation of quantum mechanics in conventional ways requires high-performance computing resources and computational months,” according to a laboratory news report. The team said that “the new AI algorithm developed can provide accurate predictions in seconds on a laptop or cellphone.” Researchers from Pfizer have made similar progress, using a modeling technique called crystal structure prediction (CSP) to map the 3D structures of molecules, calculations that usually take months to solve. Nevertheless, there is a long way to go before we can claim a practical quantum advantage. Even so, QC and QML hold extraordinary promise for the pharmaceutical business; once refined, they will pave the way for new drugs as well as safer and cheaper health products for patients in the future.
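To make "translating an ML model into a quantum circuit" concrete, here is a classically simulated one-qubit variational classifier, assuming a data-encoding rotation RY(x) followed by a trainable rotation RY(theta), with the expectation value of Pauli-Z as the model output. Gradients come from the parameter-shift rule, which evaluates the same circuit at theta ± pi/2, as would be done on quantum hardware. The four labelled data points are hypothetical, and a statevector simulation of one qubit offers no quantum advantage; it only illustrates the workflow.

```python
import math
from typing import List, Tuple


def ry(angle: float) -> List[List[float]]:
    """Single-qubit Y-rotation gate (a real-valued 2x2 matrix)."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, -s], [s, c]]


def apply(gate: List[List[float]], state: List[float]) -> List[float]:
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]


def expect_z(x: float, theta: float) -> float:
    """<Z> after encoding x with RY(x) and applying trainable RY(theta) to |0>."""
    state = [1.0, 0.0]               # |0>
    state = apply(ry(x), state)      # data-encoding layer
    state = apply(ry(theta), state)  # trainable layer
    return state[0] ** 2 - state[1] ** 2  # P(0) - P(1); amplitudes are real here


def parameter_shift_grad(x: float, theta: float) -> float:
    """Exact d<Z>/dtheta from two shifted circuit evaluations."""
    return (expect_z(x, theta + math.pi / 2) - expect_z(x, theta - math.pi / 2)) / 2


def loss(data: List[Tuple[float, int]], theta: float) -> float:
    """Mean squared error between <Z> and the +1/-1 labels."""
    return sum((expect_z(x, theta) - y) ** 2 for x, y in data) / len(data)


def train(data: List[Tuple[float, int]], theta: float = 0.0,
          lr: float = 0.1, steps: int = 200) -> float:
    for _ in range(steps):
        grad = sum(2 * (expect_z(x, theta) - y) * parameter_shift_grad(x, theta)
                   for x, y in data) / len(data)
        theta -= lr * grad
    return theta


# Hypothetical one-dimensional features with labels +1 / -1.
data = [(2.0, 1), (2.2, 1), (-1.0, -1), (-0.8, -1)]
theta = train(data)
print(round(theta, 3), [1 if expect_z(x, theta) > 0 else -1 for x, _ in data])
```

On real hardware, each expectation value would be estimated from a finite number of measurement shots, so the gradients would be noisy rather than exact as in this simulation.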


5. Conclusions

Many pharmaceutical companies are using AI tools in the drug discovery process. Cost and time remain big challenges in drug discovery programs. A standard high-throughput screening library usually contains about 1 million compounds, each typically costing 50–100 USD, so an initial screening phase could cost tens of millions of dollars (50–100 million USD) and take months to complete. AI and ML techniques help to optimize lead compounds: AI can find lead compounds in only a few days, whereas the classical approach can take several years. AI helps to predict the bioactivity, toxicity, physical properties and structure of potential drugs. A few companies, such as Merck and Novartis, are using AI technologies to design drugs. Because classical computers have limited computing power, researchers are trying to build quantum computers or to run quantum machine learning algorithms on classical computers to speed up computation. Since malaria is one of the major health burdens in the developing world, AI-based drug design programs will be immensely helpful in supporting the WHO’s goal of reducing malaria cases by 90 percent by 2030. The limited efficacy of vaccination strategies places the burden on the continuous discovery of new drugs. Through this review, we strongly suggest that AI-based drug programs would substantially help in tackling this debilitating disease, saving human lives in less time and at lower cost.
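The screening-cost estimate follows from multiplying the figures quoted in the text, about 1 million compounds at 50–100 USD each. A trivial sketch, with both inputs taken from the text and the function name chosen for illustration:

```python
from typing import Tuple


def screening_cost_usd(n_compounds: int = 1_000_000,
                       cost_range_usd: Tuple[int, int] = (50, 100)) -> Tuple[int, int]:
    """Low and high total cost of a screening library, in USD."""
    low, high = cost_range_usd
    return n_compounds * low, n_compounds * high


print(screening_cost_usd())
```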



Authors thank Department of Biotechnology (No. BT/RLF/Re-entry/32/2017), Government of India for funding this project.


© 2021 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference


Bhaswar Ghosh and Soham Choudhuri (July 7th 2021). Drug Design for Malaria with Artificial Intelligence (AI), Plasmodium Species and Drug Resistance, Rajeev K. Tyagi, IntechOpen, DOI: 10.5772/intechopen.98695. Available from:
