Open access peer-reviewed chapter

The Impact of Bioinformatics on Vaccine Design and Development

By Ribas‐Aparicio Rosa María, Castelán‐Vega Juan Arturo, Jiménez‐Alberto Alicia, Monterrubio‐López Gloria Paulina and Aparicio‐Ozores Gerardo

Submitted: November 8th 2016 | Reviewed: April 18th 2017 | Published: September 6th 2017

DOI: 10.5772/intechopen.69273



Vaccines are the pharmaceutical products that offer the best cost‐benefit ratio in the prevention or treatment of diseases. Because vaccines are pharmaceutical products, their development and production are costly and take years. Several approaches have been applied to reduce the time and cost of vaccine development, mainly focusing on the selection of appropriate antigens or antigenic structures, carriers, and adjuvants. One of these approaches is the incorporation of bioinformatics methods and analyses into vaccine development. This chapter provides an overview of the application of bioinformatics strategies in vaccine design and development and gives successful examples of vaccines whose development has been advanced by bioinformatics. Reverse vaccinology, immunoinformatics, and structural vaccinology are described and discussed in relation to the design and development of specific vaccines against infectious diseases caused by bacteria, viruses, and parasites, including some emerging or re‐emerging infectious diseases, as well as therapeutic vaccines against cancer, allergies, and substance abuse that have been facilitated and improved by bioinformatics tools or are under development based on bioinformatics strategies.


  • reverse vaccinology
  • immunoinformatics
  • structural vaccinology
  • computational strategies
  • vaccine

1. Introduction

The success of vaccination is reflected in its worldwide impact on human and veterinary health and on life expectancy. It has been asserted that vaccination, like clean water, has had a major effect on mortality reduction and population growth [1, 2]. In addition to the invaluable role of traditional vaccines in preventing diseases, remarkable scientific and technological progress has been made since the last century in improving these vaccines and generating new ones. This progress has been made possible by the fusion of computational technologies with recombinant DNA technology, the fast growth of biological and genomic information in databases, and the possibility of rapid, massive sequencing of complete genomes [3–5]. These advances have helped expand the concept and application of vaccines beyond their traditional immunoprophylactic function of preventing infectious diseases: vaccines can also serve as therapeutic products capable of modifying the course of a disease and even curing it [3]. Vaccines are the pharmaceutical products that offer the best cost‐benefit ratio in the prevention or treatment of diseases. Because vaccines are pharmaceutical products, their development and production are costly and take years. Several approaches have been applied to reduce the time and cost of their development, mainly focusing on the selection of appropriate antigens or antigenic structures, carriers, and adjuvants [6]. One of these approaches is the incorporation of bioinformatics methods and analyses into vaccine development. At present, there are many alternative strategies to design and develop effective and safe new‐generation vaccines based on bioinformatics approaches, namely reverse vaccinology, immunoinformatics, and structural vaccinology [7]. 
This chapter provides an overview of the application of bioinformatics strategies in vaccine design and development and gives successful examples of vaccines whose development has been advanced by bioinformatics.

2. Reverse vaccinology

Reverse vaccinology is a methodology that uses bioinformatics tools to identify structures from bacteria, viruses, parasites, cancer cells, or allergens that could induce an immune response capable of protecting against a specific disease [7].

This approach has many advantages over traditional vaccinology: it reduces the time and cost of vaccine development; it narrows down the number of proteins to be studied, facilitating the selection process; it can identify antigens that are present in small amounts or expressed only at certain stages, which would hinder or prevent their purification; and it allows the study of noncultivable or hazardous microorganisms [3].

An important requirement for using this methodology is the availability of genomic information for the pathogen under study; in some instances, the human or animal host genome must also be known (e.g., for DNA vaccines and therapeutic vaccines). Once the genome sequence is obtained, it is possible to identify all the proteins that could be expressed. For this purpose, several software programs identify all the open reading frames (ORFs), which are the sequences that encode most proteins [8–10].
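As a rough illustration of the ORF identification step, the following sketch scans both strands of a DNA sequence in all three reading frames for stretches that begin with ATG and end at the first in‐frame stop codon. It assumes the standard genetic code and an arbitrary minimum length of 100 codons; the gene finders actually used in reverse vaccinology also model features such as alternative start codons and codon usage, which this sketch ignores.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(genome, min_codons=100):
    """Return (strand, frame, start, orf) tuples for ATG...stop ORFs.

    Both strands and all three frames per strand are scanned. Each ORF
    starts at the first ATG after the previous in-frame stop, so nested
    ATGs are not reported separately. min_codons counts the codons before
    the stop codon. Coordinates are 0-based on the scanned strand.
    """
    genome = genome.upper()
    orfs = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, seq[start:i + 3]))
                    start = None
    return orfs
```

Each reported ORF would then be translated to build the predicted proteome that the later filtering steps analyze.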

The next step in reverse vaccinology is to determine several antigenic and physicochemical properties that have been associated with good antigens. These characteristics must be analyzed for each protein in the proteome under study, using different bioinformatics approaches, to select the protein(s) with the best properties for testing in in vitro and in vivo assays that demonstrate their safety and immunogenicity. With the best vaccine candidates, different types of vaccines can be designed and developed, for example, subunit, recombinant, and nucleic acid vaccines [11].

Reverse vaccinology was first applied to Neisseria meningitidis, to obtain a new subunit vaccine based on the bioinformatics study of this microorganism's genome [12]. Since then, this technology has been used to study pathogenic agents, including eukaryotic organisms and vector‐borne pathogens [13], and to design vaccines not only for humans but also for animals [5]. Most of the new vaccines against infectious diseases developed with this technology are currently in preclinical studies or clinical trials. However, in some instances a vaccine candidate obtained by this technology may fail as a vaccine antigen, because it is identified solely through probabilistic computational predictions, and other factors can interfere when the antigen is administered to a whole organism. In addition, vaccine candidates identified by this technology are restricted to proteins or lipoproteins, because candidates are identified from genome‐encoded sequences; reverse vaccinology cannot identify carbohydrate or lipid antigens [3, 14].

Some of the important properties used to identify good vaccine candidates are described below:

2.1. Protein cellular localization

Proteins can be located in the cytoplasm or in the cell membrane, or they can be secreted out of the cell and become extracellular. Molecules located on the cell membrane or extracellularly are better antigens because they are more exposed to host cells, specifically to immune cells; thus, they are more likely to generate a protective response [15]. In addition to software that predicts subcellular localization, there are protein databases that provide this information, such as LOCATE, LocDB, and eSLDB.

2.2. Adhesin properties

In an infectious process, the microorganism first contacts host cells through adhesins, so molecules with adhesin properties are vaccine candidates [16]. The probability that a protein is an adhesin is calculated from the frequencies of amino acids, dipeptides, and homopolymers in the protein, and from the physicochemical characteristics of its amino acids: acidic, basic, neutral, hydrophilic, or hydrophobic. Some programs analyze all of these characteristics and compare them with those of experimentally proven adhesins [17].
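The compositional features named above can be computed directly from a sequence. The sketch below builds amino acid, dipeptide, residue‐class, and homopolymer features of the kind that adhesin predictors compare against known adhesins. The residue classes are a simplified grouping chosen for illustration (an assumption, not the grouping of any particular predictor), and the trained classifier that would consume these features is not shown.

```python
from collections import Counter
from itertools import groupby, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Simplified residue classes (illustrative assumption).
RESIDUE_CLASSES = {
    "acidic": set("DE"),
    "basic": set("KRH"),
    "hydrophobic": set("AVLIMFWC"),
    "polar_neutral": set("STNQGPY"),
}


def composition_features(seq):
    """Return a dict of compositional features for one protein sequence."""
    seq = seq.upper()
    n = len(seq)
    aa_counts = Counter(seq)
    dipeptide_counts = Counter(seq[i:i + 2] for i in range(n - 1))

    features = {}
    # Amino acid frequencies (20 features).
    for aa in AMINO_ACIDS:
        features["aa_" + aa] = aa_counts[aa] / n
    # Dipeptide frequencies (400 features).
    for a, b in product(AMINO_ACIDS, repeat=2):
        features["dp_" + a + b] = dipeptide_counts[a + b] / (n - 1) if n > 1 else 0.0
    # Fraction of residues in each physicochemical class.
    for name, members in RESIDUE_CLASSES.items():
        features["class_" + name] = sum(aa_counts[aa] for aa in members) / n
    # Length of the longest homopolymer run (e.g. "QQQQ" gives 4).
    features["max_homopolymer"] = max(len(list(run)) for _, run in groupby(seq))
    return features
```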

2.3. Antigenicity

Sequences of antigens known to induce good immune responses in vivo and in vitro are compared with each sequence of the proteome under study to search for similarities, on the premise that two proteins with similar sequences are likely to have comparable antigenic effects. In addition, alignment‐independent antigenicity predictions exist that are based on the physicochemical properties of amino acids [18].
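One widely used alignment‐independent approach (for example, in VaxiJen) converts each sequence into a fixed‐length vector through auto‐ and cross‐covariance of amino acid property scales, so that proteins of different lengths can be compared by a classifier. The sketch below computes only the auto‐covariance term for a single property. It substitutes the Kyte-Doolittle hydropathy scale for the multi‐property descriptors such tools use, so its values are illustrative only.

```python
# Kyte-Doolittle hydropathy values, used here as a stand-in property scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(seq, scale=KYTE_DOOLITTLE, max_lag=8):
    """Return [AC(1), ..., AC(max_lag)] with AC(lag) = sum z_i * z_(i+lag) / (n - lag).

    The output length depends only on max_lag, not on the protein length.
    """
    z = [scale[aa] for aa in seq.upper()]
    n = len(z)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    return [sum(z[i] * z[i + lag] for i in range(n - lag)) / (n - lag)
            for lag in range(1, max_lag + 1)]
```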

2.4. Similarity

It is important to study the similarity of the sequences under study both to molecules of the host that will receive the vaccine and to molecules of related etiological agents. High similarity can have two different effects. Similarity to host molecules is undesirable, because the antigen could cause autoimmune reactions; similarity to molecules of other etiological agents, on the other hand, could allow the vaccine to induce cross‐protection [19]. In the case of a cancer vaccine, it is important to select molecules present in cancer cells but absent from healthy cells. Similarity analysis can also be used to search for molecules with the same function, giving an indication of antigenicity and virulence [20]. These predictions are important because a vaccine must above all be harmless; if a protein is predicted to be antigenic but also toxic, it is better not to use it.
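A crude way to flag possible similarity to host proteins is to look for exact peptide stretches that a candidate shares with them. The sketch below does this with k‐mers; k = 9 is an assumption (a typical MHC class I epitope length), and in practice similarity is assessed with alignment tools such as BLAST, which also detect inexact matches that this check misses.

```python
def kmers(seq, k):
    """Set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_with_host(candidate, host_proteins, k=9):
    """Return (sorted shared k-mers, fraction of candidate k-mers shared)."""
    host = set()
    for protein in host_proteins:
        host |= kmers(protein, k)
    cand = kmers(candidate, k)
    shared = sorted(cand & host)
    fraction = len(shared) / len(cand) if cand else 0.0
    return shared, fraction
```

A candidate with a high shared fraction would be deprioritized because of the autoimmunity risk; the same function run against a related pathogen's proteome would instead hint at possible cross‐protection.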

2.5. Transmembrane helix

A transmembrane helix is a protein segment of 17–25 amino acids that forms an α‐helix spanning the cell membrane. Vaccine candidates are usually expressed in biological systems different from their original source; in that case, a protein with a transmembrane helix may adopt an altered three‐dimensional (3D) structure or be difficult to purify, because of differences in membrane structure [21]. A low number of transmembrane helices is therefore a major criterion for selecting a vaccine candidate.
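The classic sequence‐only way to spot candidate transmembrane helices is a hydropathy plot: average a hydrophobicity scale over a sliding window roughly the length of a membrane‐spanning helix and flag windows above a cutoff. The sketch uses the Kyte-Doolittle scale with the 19‐residue window and 1.6 cutoff suggested in the original hydropathy work; dedicated predictors (e.g., HMM‐based TMHMM) are considerably more accurate, and merging overlapping windows, as done here, overestimates helix boundaries.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def predict_tm_segments(seq, window=19, threshold=1.6):
    """Return merged (start, end) regions, 0-based and end-exclusive,
    whose window-averaged hydropathy exceeds the threshold."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    segments = []
    for i in range(len(values) - window + 1):
        if sum(values[i:i + window]) / window > threshold:
            end = i + window
            if segments and i <= segments[-1][1]:
                # Overlapping high-scoring windows are merged into one region.
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((i, end))
    return segments
```

The number of returned segments gives a rough transmembrane helix count for the screening step.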

Depending on the etiology of the disease under study, the main properties to identify are protein cellular localization, adhesin properties, antigenicity, lack of homology with human proteins (to avoid inducing a potential autoimmune response), and few or no transmembrane helices. Each of these properties can be analyzed with several computer programs, and bioinformatics tools can then screen and select vaccine candidates ranked by their feature values.
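The screening logic described above can be sketched as a filter followed by a ranking. In this minimal sketch every field is a hypothetical output of an upstream predictor, and the thresholds (an antigenicity score of at least 0.4 and at most one transmembrane helix) are illustrative assumptions rather than the rules of NERVE, Vaxign, or any other specific tool.

```python
from dataclasses import dataclass


@dataclass
class ProteinFeatures:
    # All fields are hypothetical outputs of upstream predictors.
    name: str
    localization: str      # e.g. "extracellular", "outer_membrane", "cytoplasm"
    adhesin_prob: float    # predicted probability of being an adhesin (0-1)
    antigenicity: float    # antigenicity score from some predictor
    host_similar: bool     # True if similar to a host protein
    tm_helices: int        # predicted number of transmembrane helices


EXPOSED = {"extracellular", "outer_membrane", "cell_wall"}


def select_candidates(proteins, max_tm=1, min_antigenicity=0.4):
    """Drop unsuitable proteins, then rank survivors (best first)."""
    kept = [p for p in proteins
            if p.localization in EXPOSED
            and not p.host_similar
            and p.tm_helices <= max_tm
            and p.antigenicity >= min_antigenicity]
    # Rank by antigenicity, breaking ties by adhesin probability.
    return sorted(kept, key=lambda p: (p.antigenicity, p.adhesin_prob), reverse=True)
```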

There are Websites and downloadable software that can be useful for a particular reverse vaccinology analysis, for example, NERVE, Vaxign, Jenner‐predict server, and Vacceed. In some cases, the proteome of interest can be uploaded; in others, the organism must be selected from a specific database, and some characteristics of the agent and the host are required for the analysis. In addition, there are databases with previously identified vaccine candidates or with complete information about vaccines, for example, VIOLIN and MycobacRV (Table 1).

Table 1.

Main characteristics considered for vaccine candidate selection by reverse vaccinology.

3. Immunoinformatics

The immune response can be classified as cellular or humoral, and the response a vaccine is expected to induce depends on the disease. If a vaccine that induces a cellular response is needed, for example, a tuberculosis vaccine [22] or a vaccine against the parasitic disease leishmaniasis [23], the software must search for antigens whose peptides can be presented by major histocompatibility complex (MHC) molecules and recognized by T lymphocytes [4]. Software for this purpose includes TEpredict, CTLPred, nHLAPred, ProPred‐I, MAPPP, SVMHC, GPS‐MBA, PREDIVAC, NetMHC, NetCTL, MHC2 Pred, IEDB, BIMAS, POPI, Epitopemap, iVAX, FRED2, Rankpep, PickPocket, KISS, and MHC2MIL. Their Websites offer several search options: a specific species, MHC class I or II, or even the specific allele(s) to be used for the prediction. These tools use different algorithms, and some of them analyze the genome of the organism under study to identify new, probable MHC molecules.

On the other hand, if a humoral response is required, the software needs to identify B‐cell epitopes, for example, in the case of influenza virus or HIV [24, 25]. Some software specifically searches for sequential B‐cell epitopes, including BCPREDS, BepiPred, BEpro (PEPITO), ABCpred, Bcepred, IgPred, and BCEP. In addition, some Websites use the 3D structure of a protein to predict conformational B‐cell epitopes, including CEP, SEPPA, and DiscoTope.

These programs are trained on previously identified epitopes and nonepitopes so that they can score new proteins and predict whether a given region is an epitope. Different techniques are used for this machine learning: position‐specific scoring matrices (PSSMs), support vector machines (SVMs), hidden Markov models (HMMs), and artificial neural networks (ANNs). Each technique has different advantages and accuracy levels [26].
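Of the techniques listed, the PSSM is the simplest to illustrate: a peptide's score is the sum of per‐position log‐odds values for its residues, and a protein is scanned by scoring every 9‐mer. In the sketch below the matrix values are invented; only positions 2 and 9 are weighted, loosely mimicking the P2/P9 anchor preference described for HLA‐A*02:01. Real matrices are derived from measured binding data and assign a value to every residue at every position.

```python
# Toy PSSM: 0-based position -> {residue: log-odds score}. Invented values.
TOY_PSSM = {
    1: {"L": 2.0, "M": 1.5, "I": 1.0},  # anchor at peptide position 2
    8: {"V": 2.0, "L": 1.5, "I": 1.0},  # anchor at peptide position 9
}


def score_peptide(peptide, pssm=TOY_PSSM, mismatch=-1.0):
    """Sum the log-odds scores at the positions covered by the matrix."""
    return sum(pssm[pos].get(aa, mismatch)
               for pos, aa in enumerate(peptide) if pos in pssm)


def scan_protein(seq, length=9, top=3):
    """Score every window and return the top (score, start, peptide) tuples."""
    scored = [(score_peptide(seq[i:i + length]), i, seq[i:i + length])
              for i in range(len(seq) - length + 1)]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:top]
```

SVM, HMM, and ANN predictors replace the additive score with a trained model but are usually applied with the same sliding‐window scan.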

This analysis requires the “immunome” of an organism, which includes all of the genes and proteins of the cells that take part in its immune response. The study of all of the reactions involved in the immune response is known as “immunomics”; because it is specific to each organism, it is important to perform the study with information from the recipient organism. There have been many advances in immunomics using molecular biology and other high‐throughput techniques to understand the mechanisms of the immune system [27].

When immunomics and bioinformatics merged, a new discipline called immunoinformatics was created, with the purpose of analyzing all of the information on an organism’s immunomics and predicting immune responses against specific molecules [28]. Websites already exist that offer databases of antigens, with their epitopes identified in several organisms, and other immunological information, for example, IEDB, SIFPEITHI, IMGT, MHCBN, AntiJen, Dana‐Farber Repository, and AgAbDb.

Once an antigen that induces the expected response has been identified, immunoinformatics can predict whether a region of that antigen, which is usually a protein, can generate a strong stimulus by itself. If a protein contains such an epitope, the epitope can be used in a subunit vaccine and combined with epitopes from other organisms to generate a polyvalent vaccine, reducing the cost of the formulation. The epitopes can be synthesized chemically or obtained with molecular biology tools. This makes the vaccine safer, not only in its formulation but also in its production process, because there is no risk of the presence of infectious organisms [29].
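When epitopes are combined into one polyvalent construct, in silico designs commonly join them with short spacer sequences. The sketch below assembles such a construct. The AAY and GPGPG linkers are choices frequently seen in published in silico multi‐epitope designs and are used here as assumptions, and the epitope strings in the test are hypothetical placeholders.

```python
def assemble_construct(ctl_epitopes, htl_epitopes,
                       ctl_linker="AAY", htl_linker="GPGPG"):
    """Join epitopes into a single multi-epitope sequence (hypothetical design).

    CTL (MHC class I) epitopes are joined with ctl_linker, HTL (MHC class II)
    epitopes with htl_linker, and the two blocks are joined with htl_linker.
    """
    blocks = []
    if ctl_epitopes:
        blocks.append(ctl_linker.join(ctl_epitopes))
    if htl_epitopes:
        blocks.append(htl_linker.join(htl_epitopes))
    return htl_linker.join(blocks)
```

The assembled sequence would still need to be rechecked, for example, for new epitopes or allergenic stretches created at the junctions.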

To locate epitopes, proteins are analyzed to identify hydrophilic regions. The tertiary structure of a protein depends on the interactions between its amino acids and the surrounding medium: hydrophilic residues tend to be exposed on the surface, whereas hydrophobic residues tend to be buried in the core. When the protein interacts with immune cells, contact is therefore more likely to occur with hydrophilic regions, which is where epitopes tend to be located [28].
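A classic implementation of this idea is the Hopp-Woods method: average a hydrophilicity scale over a six‐residue window and take the most hydrophilic window as a likely antigenic determinant. The sketch below follows that scheme with the Hopp-Woods scale; it shares the limitation, discussed in the structural vaccinology section, that sequence hydrophilicity alone cannot tell whether a region is actually exposed in the folded protein.

```python
# Hopp-Woods hydrophilicity scale.
HOPP_WOODS = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0, "Q": 0.2, "E": 3.0,
    "G": 0.0, "H": -0.5, "I": -1.8, "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5,
    "P": 0.0, "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}


def hydrophilicity_profile(seq, window=6):
    """Mean hydrophilicity of every window of the given length."""
    values = [HOPP_WOODS[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def most_hydrophilic_segment(seq, window=6):
    """Return (segment, mean score) for the window with the highest mean."""
    profile = hydrophilicity_profile(seq, window)
    best = max(range(len(profile)), key=profile.__getitem__)
    return seq[best:best + window], profile[best]
```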

An additional step can be the prediction of the stability of peptide binding to MHC, because some epitopes bind with greater strength and affinity, making activation of the immune system more likely. For this purpose, software such as NetMHCstab, which uses artificial neural networks, has been developed [30].

In the case of cancer vaccines, B‐cell antigens that can help eliminate cancer cells have been developed. Additionally, antibodies against regulatory T cells have been found to aid tumor regression [9, 31]. These findings open the way to searching for epitopes that could be used in vaccines, allowing better and faster elimination of the disease. For an allergy vaccine, other predictors, such as Allermatch and AlgPred, can be used to identify proteins with potential allergenicity.

Other software developers have addressed the simulation of the complete immune response against specific antigens, for example, C‐ImmSim. This software uses different algorithms for each step and, at the end, produces graphic representations of each cell type that give an idea of whether the response is sufficient to protect against a disease [32]. However, the overall picture remains limited, because this analysis involves the interaction of many cells and molecules and, in many cases, we do not yet know how they interact with each other in a specific disease.

4. Structural vaccinology

Structural vaccinology focuses on the conformational features that make macromolecules, mainly proteins, good candidate antigens. This approach to vaccine design has been used mainly to select or design peptide‐based vaccines or cross‐reactive antigens capable of generating immunity against antigenically divergent pathogens. The initial stage of the bioinformatics analysis is usually linear epitope prediction, with hydrophilicity as the main characteristic for locating epitopes. However, relying on these predictions alone to decide whether a sequence is immunogenic is risky. For example, the predicted epitopes could be sterically hindered by nearby amino acids or, if a peptide vaccine is being developed, the isolated peptide could adopt a conformation that differs from the one it has within the whole protein, yielding different conformational epitopes. In fact, available structures of monoclonal antibodies (mAbs) complexed with proteins have shown that, in most cases, mAbs recognize conformational rather than linear epitopes [33].

Many epitope‐based vaccines attempt to elicit an antibody‐mediated immune response that could neutralize the activity of toxins or pathogen receptors. Currently, many bioinformatics programs predict protein epitopes, but most of them rely only on the hydrophobicity or hydrophilicity of amino acids. The main drawbacks of this approach are that many predicted epitopes are buried within the protein and thus inaccessible to antibodies, and that the predicted epitopes are linear, leaving out conformational epitopes. In these cases, structural information can help select epitopes that are exposed to the solvent and close to functional sites of the target protein, such as catalytic pockets or receptor binding pockets, or detect conformational epitopes on the protein surface. Structural information is used to map antigenic epitopes, to detect conformational features that could affect immunogenicity, such as protein structural stability or the solvent exposure of candidate peptides, and to select antigenic regions shared by proteins of different pathogens that would not otherwise be evident (e.g., from multiple alignments or epitope mapping alone). A common approach to vaccine development is therefore to perform several bioinformatics analyses at both the sequence and the structure level. For example, Cornick et al. [34] developed universal vaccine candidates against serotype 1 Streptococcus pneumoniae on the basis of epitope prediction and structure modeling.
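Combining sequence‐based predictions with structural information can be as simple as discarding predicted epitopes whose residues are mostly buried. In the sketch below the per‐residue relative solvent accessibility (RSA, from 0 for buried to 1 for fully exposed) is a hypothetical input, for example accessibility computed from a 3D structure and normalized by each residue's maximum; the 0.25 exposure cutoff and the majority rule are illustrative assumptions.

```python
def exposed_epitopes(epitopes, rsa, threshold=0.25, min_fraction=0.5):
    """Keep (start, end) segments whose residues are mostly solvent exposed.

    epitopes: 0-based, end-exclusive segments from a sequence-based predictor.
    rsa: per-residue relative solvent accessibility values in [0, 1].
    """
    kept = []
    for start, end in epitopes:
        segment = rsa[start:end]
        exposed = sum(1 for v in segment if v >= threshold)
        if segment and exposed / len(segment) >= min_fraction:
            kept.append((start, end))
    return kept
```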

Protein flexibility can lead to vaccine failure, because large conformational variations can prevent recognition by cell receptors or antibodies; for example, the failure of vaccines against HIV has been attributed to the high flexibility of the globular head of gp120 [33, 35]. This is a particular concern with peptides, which are usually more flexible and disordered than when they are part of a complete protein. Flexibility can be predicted from amino acid sequences (through structural alphabets) or from a 3D structure. High‐performance bioinformatics tools such as molecular dynamics (MD) simulations can be used to predict the stability of proteins or peptides [36]. MD can guide the choice of a peptide length that confers stability and the introduction of stabilizing mutations or chemical modifications that minimize flexibility, thereby yielding better vaccine candidates than simple peptides.
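A standard way to read flexibility from an MD trajectory is the root‐mean‐square fluctuation (RMSF) of each atom or residue around its average position. The sketch computes RMSF from a list of frames and flags positions above a cutoff; it assumes the frames have already been superposed onto a common reference (the fitting step is omitted), and the cutoff is an arbitrary illustrative value.

```python
import math


def rmsf(frames):
    """Per-position RMSF from frames of (x, y, z) coordinates.

    frames: list of frames; each frame lists one (x, y, z) tuple per atom
    or residue, in the same order. Frames are assumed to be pre-superposed.
    """
    n_frames = len(frames)
    result = []
    for a in range(len(frames[0])):
        mean = [sum(f[a][k] for f in frames) / n_frames for k in range(3)]
        msd = sum(sum((f[a][k] - mean[k]) ** 2 for k in range(3))
                  for f in frames) / n_frames
        result.append(math.sqrt(msd))
    return result


def flexible_positions(values, cutoff):
    """Indices whose fluctuation exceeds the cutoff (candidates for stabilization)."""
    return [i for i, v in enumerate(values) if v > cutoff]
```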

Molecular docking is another bioinformatics tool that can be used to select and design target antigens. It predicts how two molecules (protein‐protein or protein‐ligand) form a complex, searching for the arrangement with the best shape complementarity and the lowest binding energy. In structural vaccinology, molecular docking can be used to predict the binding of epitopes to antibodies or to MHC molecules. Candidate antigens can be evaluated by the binding energy of the complex, and mutations can even be introduced to improve binding while maintaining the specificity of the immune response [37].
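Docking programs search many relative orientations (poses) of two molecules and rank them with a scoring function. The sketch below shows only a toy physics‐style score for rigid poses, a Lennard-Jones 12‐6 term plus a Coulomb term with a distance‐dependent dielectric (4r), and picks the lowest‐energy pose from a given list. The single epsilon/sigma pair, the 8 Å cutoff, and the point‐charge representation are simplifying assumptions; this is not the scoring function of PatchDock or any other docking program.

```python
import math

# Approximate Coulomb constant for kcal/mol with distances in angstroms
# and charges in units of the elementary charge.
COULOMB_K = 332.0


def pair_energy(r, q1, q2, epsilon=0.2, sigma=3.5):
    """Lennard-Jones 12-6 term plus Coulomb term with dielectric 4r."""
    lj = 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    coulomb = COULOMB_K * q1 * q2 / (4.0 * r * r)
    return lj + coulomb


def interaction_energy(receptor, ligand, cutoff=8.0):
    """Sum pair energies over atoms given as (x, y, z, charge). Lower is better."""
    total = 0.0
    for x1, y1, z1, q1 in receptor:
        for x2, y2, z2, q2 in ligand:
            r = math.dist((x1, y1, z1), (x2, y2, z2))
            if r < cutoff:
                total += pair_energy(r, q1, q2)
    return total


def best_pose(receptor, poses):
    """Return the candidate ligand pose with the lowest interaction energy."""
    return min(poses, key=lambda pose: interaction_energy(receptor, pose))
```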

Alam et al. [38], in a preliminary report, designed peptides as vaccine candidates against the Zika virus. They predicted MHC‐I restricted epitopes, and then performed docking of these peptides with human leukocyte antigen (HLA) receptors to confirm their predictions. Toxicity analyses included allergenicity prediction. Another study proposed a multivalent vaccine with fused peptides against Staphylococcus aureus. Again, epitope prediction was followed by peptide structure prediction, docking with TLR2, molecular dynamics simulations to assess the stability of the complexes, and finally, allergenicity prediction [39].

Care should be taken while designing peptide‐based vaccines because the resulting peptide could be toxic or allergenic. Several bioinformatics studies perform toxicity or allergenicity prediction on peptide candidates to rule out adverse effects in the resulting candidate vaccine [38, 39].

Bioinformatics analyses have been performed to improve the functionality of antibodies. One study modified the Fc portion of antibodies to increase binding of proteins to the antibodies’ Fc. This approach is relevant to improve the functionality of designed antibodies, to study immune response evasion by some pathogens, and in biotechnology to purify antibodies or proteins [37].

One goal of bioinformatics is to detect epitopes that antibodies can recognize, but modeling antibody‐antigen complexes has been difficult because of the mobility of the loops in the antibody Fab region [40]. One way to overcome this drawback is the strategy presented by Koivuniemi et al., which combined homology modeling of the antigen and antibody structures, docking, and molecular dynamics simulations [41] (Figure 1).

Figure 1.

Path to antigen selection and validation. Databanks are created with experimental data from pathogens, which can originate in the lab or be gathered from databases. Protein or nucleic acid sequences can be aligned to detect conservation and strain or species coverage. Three‐dimensional (3D) structure information can be obtained from databases or inferred through bioinformatics analysis. Several predictions, such as epitope prediction or amino acid conservation, can be mapped onto the structure. Molecular docking tools can be used to model the interaction between two or more molecules (e.g., antigens with antibodies or cell receptors). Finally, the stability of these interactions can be assessed through energy calculations or molecular dynamics simulations.

5. Special cases: vaccines against infectious and noninfectious diseases

5.1. Vaccines against infectious diseases

5.1.1. Tuberculosis

Tuberculosis is an infectious disease caused by Mycobacterium tuberculosis, the most virulent and transmissible bacterium of its genus; however, this microorganism is difficult to study because of its demanding culture requirements and slow growth. The number of new cases worldwide reached 10.4 million [42]. This high incidence is due to several factors, one of the most important being the limited effectiveness of the vaccine currently in use, BCG. This is why many research groups are investigating new vaccines that can improve protection against this disease, and one of the tools they use is reverse vaccinology [10].

One strategy for vaccine design is to identify structures present only in M. tuberculosis and absent from Mycobacterium bovis BCG [43]. In addition, the vaccine candidates studied had the characteristics described previously: no homology with human proteins, adhesin properties [44], secretion or membrane localization [45, 46] with few transmembrane helices, and expression in the latent or active state of the microorganism [47]. Because the protective response sought is cellular, immunoinformatics analyses have focused on T‐cell epitopes [22, 48–50].

Several candidates and epitopes have been found with different software. Some of them have been expressed and tested in vitro and in vivo, demonstrating their immunogenicity and protective effect. Notable among these are ESAT‐6 and the PE and PPE protein families [51], as well as the Ag85 protein family, which elicited a better immune response than the BCG vaccine in an animal model [43].

5.1.2. Influenza

The design of influenza vaccines is challenging because of the antigenic plasticity of influenza viruses. Influenza viruses evade the immune response through antigenic drift and antigenic shift [52], making a long‐lasting immune response very difficult to achieve. Current influenza vaccines contain hemagglutinin (HA) and neuraminidase (NA) as the main antigenic components, usually from one type‐B strain, one H1 subtype strain, and one H3 subtype strain [53, 54]. Predicting the composition of the following year's vaccines relies on epidemiological data, although evolutionary models can help predict antigenic drift, improving vaccine design [55].

Influenza HA recognizes cell receptors and mediates membrane fusion between the virus and the target cell. The globular head of HA contains the receptor binding site and the majority of the antigenic sites; consequently, this region is also the most variable. The stem region contains the fusion peptide and, although it previously was not considered a target for vaccine development, the discovery of neutralizing antibodies aimed at this region revealed its potential in vaccine design [52, 56]. Several conserved regions have been described in the stem region of HA [57], which make a universal vaccine a possibility. It has been found that neutralizing antibodies can bind to intact trimers, confirming the possibility of a universal vaccine aimed at the HA stem. In fact, engineered HA stem antigens have been shown to elicit immune responses against heterosubtypic challenge models and serve as a proof‐of‐concept that these vaccines work [58].

Given the extensive sharing, and hence availability, of influenza viral protein sequences, open databases such as OpenFluDB [59] and the Influenza Research Database [60] help in designing influenza vaccines. EpiCombFlu is a database that helps define conserved epitopes across influenza strains that can be combined to maximize strain coverage. Analysis of these sequences has led to the identification of conserved motifs among influenza strains that can be targets in vaccine or inhibitor design [61].
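Choosing a few conserved epitopes that together cover as many strains as possible is an instance of the set‐cover problem. The sketch below uses the common greedy heuristic: repeatedly pick the epitope that covers the most strains not yet covered. The input mapping from epitopes to the strains in which they are conserved is hypothetical, and this is not presented as the algorithm implemented in EpiCombFlu.

```python
def greedy_epitope_set(coverage, max_epitopes=3):
    """Greedily pick epitopes that maximize the number of covered strains.

    coverage: dict mapping epitope -> set of strain IDs in which it is conserved.
    Returns (chosen epitopes in selection order, set of covered strains).
    """
    chosen, covered = [], set()
    remaining = dict(coverage)
    while remaining and len(chosen) < max_epitopes:
        # Epitope adding the most not-yet-covered strains.
        epitope, strains = max(remaining.items(),
                               key=lambda kv: len(kv[1] - covered))
        gain = strains - covered
        if not gain:
            break  # no remaining epitope adds coverage
        chosen.append(epitope)
        covered |= gain
        del remaining[epitope]
    return chosen, covered
```

The greedy choice is not guaranteed to be optimal, but it is fast and is a standard baseline for this kind of coverage problem.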

5.1.3. Chikungunya fever

For chikungunya virus (CHIKV), some vaccine candidates are in clinical trials, but no vaccine has been licensed to date. Efforts include the development of inactivated virus, live attenuated virus (LAV), and virus‐like particle (VLP) vaccines. In preclinical studies, LAV and VLP vaccines have been promising, but in clinical trials they have shown inadequate immunogenicity and residual virulence, for example, the risk of chronic rheumatism observed with LAV [62]. Nevertheless, because vaccines should induce high levels of neutralizing antibodies, ideally after a single dose, LAVs remain good candidates, and attenuation strategies are of central importance for them.

Because the CHIKV E2 glycoprotein is thought to interact with cellular receptors and has been shown to elicit neutralizing antibodies that protect mice against lethal challenge [63], it has been studied extensively. Kam et al. [64] mapped its epitope‐containing sequences using antibodies from experimentally infected macaques. Their results revealed that one of the four recognized regions mapped onto the surface of E2 and that most epitopes clustered in the middle of the protein; they also suggested that the changes in antibody recognition of E2 over the course of the disease in experimentally infected macaques may be due to the spatial positions of the B‐cell epitopes on the native E1/E2 glycoprotein complex. As part of the study, the authors performed computational modeling with E2 structural data retrieved from the PDB and visualized the results with UCSF Chimera.

In designing an LAV for CHIKV, Gardner et al. [65] considered three known facts: substitutions that introduce positively charged residues in E2 and confer enhanced, heparan sulfate (HS)‐dependent infectivity in vitro are common among cell culture‐passaged strains of some CHIKV‐related viruses; these mutations can be selected within only a few serial passages in vitro; and viruses whose in vitro infectivity is enhanced by artificial HS attachment/entry are typically attenuated or avirulent in vivo. For CHIKV, in an LAV candidate attenuated by serial passage in MRC‐5 fibroblasts whose infectivity was highly dependent on ionic interaction with HS, the authors predicted an amino acid substitution at E2 position 82; this mutation was later shown to attenuate two CHIKV strains in vivo. On this basis [59], E2 mutations that confer HS‐dependent infectivity were selected by serial passage of wild‐type CHIKV‐LR in different cell types in vitro. The authors then introduced these mutations individually into CHIKV and identified a panel of E2 mutations that confer reduced virulence in a murine model. In this work, computational modeling played an important role, because it helped explain how single amino acid mutations alter the electrostatic profile of the E2 glycoprotein, increasing the net positive charge in two exposed regions.
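The electrostatic reasoning above can be illustrated, very crudely, at the sequence level by counting formal side‐chain charges in a region before and after a substitution. The sketch below does this for a hypothetical six‐residue region (not the E2 sequence). It treats histidine as neutral and ignores terminal charges, pKa shifts, and 3D geometry; real electrostatic profiles such as those used in this kind of study are computed on a 3D structure, for example with a Poisson-Boltzmann solver.

```python
# Formal side-chain charges at roughly neutral pH. Histidine is treated as
# neutral and terminal charges are ignored (simplifying assumptions).
SIDE_CHAIN_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(seq):
    """Approximate net side-chain charge of a sequence region."""
    return sum(SIDE_CHAIN_CHARGE.get(aa, 0) for aa in seq.upper())


def substitute(seq, position, new_residue):
    """Return seq with the residue at 1-based position replaced."""
    i = position - 1
    if not 0 <= i < len(seq):
        raise IndexError("position out of range")
    return seq[:i] + new_residue + seq[i + 1:]
```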

5.1.4. Zika virus disease

Zika virus (ZIKV), a positive‐sense single‐stranded RNA virus transmitted by mosquito bites, is currently spreading worldwide, and no commercial vaccine is available. Several candidates are undergoing preclinical and clinical studies; the platforms being investigated include inactivated, subunit/peptide, DNA‐based, live‐attenuated, and vectored vaccines. Multiple bioinformatics strategies are being exploited as essential tools in the design of a vaccine against this pathogen, and most studies involve in silico predictions to find the best epitopes. Dikhit et al. [66] found nine promiscuous, highly conserved class I‐restricted epitopes among the capsid 1, envelope, NS2A, NS4B, and NS5 viral proteins. They then modeled the tertiary structure of the selected epitopes with PEPstr and finally calculated their docking to HLA molecules with PatchDock.

Dar et al. [67] used ProPred1 to predict antigenic epitopes for HLA class I and ProPred to predict 48 antigenic epitopes for HLA class II. These authors found that 21% of the MHC class I binding epitopes were in the NS5 viral protein, followed by the envelope protein (17%). For MHC class II, NS5 contained 19% of the predicted epitopes, and the envelope, NS1, and NS2 each contained 17%. Additionally, they obtained an antigenicity score for each predicted epitope with the VaxiJen 2.0 tool. Ashfaq and Ahmed also used ProPred1 and ProPred but focused on the envelope protein, finding two highly antigenic candidates among the T‐cell epitopes. They also performed molecular docking to study the interactions of B‐cell epitopes with HLA‐B7 [68].

In another bioinformatics‐based study, Mirza et al. [69] predicted antigenic B‐cell epitopes (IEDB) and CTL epitopes (NetCTL 1.2 server). Using in silico methods, they determined surface accessibility, surface flexibility, and hydrophilicity, performed homology modeling (MODELLER ver. 9.12, CHARMM, WhatIF, PROCHECK, Verify 3D), and carried out structure‐based epitope prediction for the E protein, NS3, and NS5. They performed molecular docking of the ZIKV‐E protein with HLA‐A0201, the ZIKV‐NS3 protein with HLA‐B2705, and the ZIKV‐NS5 protein with HLA‐C0801 (PatchDock rigid‐body docking server, FireDock server). Finally, they investigated the stability of the docked peptide‐MHC I complexes by molecular dynamics (MD) simulations (AMBER 12 simulation package) [69].
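NetCTL integrates predictions of MHC class I binding, proteasomal C‐terminal cleavage, and TAP transport efficiency into one weighted score. The sketch below shows that combination with hypothetical candidates and scores. The weights (0.15, 0.05) and threshold (0.75) are our understanding of the server defaults and should be treated as assumptions to check against the server documentation.

```python
from dataclasses import dataclass
from typing import List

# Assumed NetCTL-style weights and threshold (verify against the server docs).
W_CLEAVAGE = 0.15
W_TAP = 0.05
THRESHOLD = 0.75


@dataclass
class CtlPrediction:
    peptide: str     # hypothetical candidate label
    mhc: float       # predicted MHC class I binding score
    cleavage: float  # predicted proteasomal C-terminal cleavage score
    tap: float       # predicted TAP transport efficiency


def combined_score(p: CtlPrediction) -> float:
    """Weighted sum of the three component predictions."""
    return p.mhc + W_CLEAVAGE * p.cleavage + W_TAP * p.tap


def ctl_epitopes(preds: List[CtlPrediction]) -> List[CtlPrediction]:
    """Candidates at or above the threshold, best combined score first."""
    hits = [p for p in preds if combined_score(p) >= THRESHOLD]
    return sorted(hits, key=combined_score, reverse=True)


preds = [
    CtlPrediction("cand1", 0.70, 0.90, 1.2),
    CtlPrediction("cand2", 0.55, 0.80, 2.0),
    CtlPrediction("cand3", 0.40, 0.97, 2.5),
]
print([p.peptide for p in ctl_epitopes(preds)])
```

The MHC binding term dominates. Cleavage and TAP scores only tip borderline candidates such as `cand2` over the threshold.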

An important aspect in the design of a vaccine is the study of the virus’s molecular biology, its proteome, and its genotypes. Sun et al. reported such data, to our knowledge for the first time, using new computational methods to annotate mature peptide proteins, genotypes, and recombination events for all ZIKV genomes [70]. To aid in the development of vaccines and therapeutic drugs, Gupta et al. created ZikaVR, an integrative multi‐omics platform that contains genomic, proteomic, and therapeutic information about the Zika virus [71].

5.2. Vaccines against noninfectious diseases

5.2.1. Vaccines to treat addictions

In the search for vaccines against drug abuse, cocaine, nicotine, and methamphetamines are some of the main targets. However, to our knowledge, no such vaccine has yet been approved by the US Food and Drug Administration (FDA). Development of these products has been hindered by the need to combine drug haptens with a carrier protein and an adjuvant to elicit antibody levels high enough to interfere with the transport of the drug to the central nervous system (CNS), and thus with its effects [72].

Kimishima et al. have explored tetanus toxoid (TT), the bacterial flagellin FliC, alum, and CpG (cytosine‐phosphate‐guanine oligodeoxynucleotide) in the development of an anticocaine vaccine. TT is used as a carrier. FliC also acts as a carrier protein, and it has additionally been shown to stimulate toll‐like receptor 5 (TLR5), signaling through myeloid differentiation factor 88 (MyD88). This results in a TH2 response with predominant production of IgG1 and no cytotoxic T lymphocytes (CTL). CpG motifs (a B‐class oligodeoxynucleotide [ODN]) can be used as TLR9 activators to promote a TH1‐type immune response, stimulating B‐cell responses that produce IgG2a as well as generating CTL [73].

Lockner et al., in a first attempt, conjugated GNE (a cocaine hapten) to a recombinant FliC. They used in silico modeling and computational analysis of the recombinant protein to ensure its structural integrity and the conservation of its binding to TLR5. With Modeler, they built a homology model of the recombinant flagellin and examined the number of lysines per domain and their relative solvent accessibility with and without GNE haptens present. Their computational results agreed with subsequent experiments in a TLR5 reporter assay: the modified flagellin still activated TLR5 when the hapten density was <10 GNE per FliC. Finally, the authors showed that cocaine‐flagellin conjugates induced anticocaine antibodies in mice in a dose‐dependent manner, and that the response improved with the adjuvant alum [73, 74].
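Counting lysines per domain and their relative solvent accessibility (RSA = absolute accessible surface area / maximum accessible surface area of that residue type) can be sketched as follows. This is a hypothetical illustration, not the authors' analysis. It assumes, as is common for protein‐hapten conjugates, that haptens attach through lysine side chains. The per‐residue ASA values and domain labels are invented, and both the maximum lysine ASA (236 Å²) and the 0.25 exposure cutoff are assumed reference values, since published scales differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

MAX_ASA_LYS = 236.0  # assumed reference maximum ASA of Lys (Å²); published scales differ
EXPOSED_RSA = 0.25   # assumed RSA cutoff above which a residue is called exposed

# Hypothetical (residue number, domain, absolute ASA in Å²) for each lysine,
# e.g. as computed from a homology model with a tool such as DSSP.
lysines: List[Tuple[int, str, float]] = [
    (12, "D0", 40.0), (35, "D0", 150.0), (60, "D1", 120.0),
    (88, "D1", 30.0), (210, "D2", 200.0), (240, "D3", 95.0),
]


def lysines_per_domain(residues: List[Tuple[int, str, float]]) -> Dict[str, Tuple[int, int]]:
    """Map each domain to (total lysines, solvent-exposed lysines)."""
    summary = defaultdict(lambda: [0, 0])
    for _resnum, domain, asa in residues:
        summary[domain][0] += 1
        if asa / MAX_ASA_LYS >= EXPOSED_RSA:
            summary[domain][1] += 1
    return {d: (total, exposed) for d, (total, exposed) in summary.items()}


print(lysines_per_domain(lysines))
```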

As they had observed in prior experiments conjugating GNE (a cocaine hapten) to FliC, TLR5 activation was attenuated at higher hapten densities (i.e., above ∼10 GNE per flagellin). Consequently, they engineered a mutant flagellin gene (mFliC) intended to protect the TLR5 binding interface against covalent modification with the bulky GNE hapten. This could preserve the ability of the modified flagellin to activate TLR5 regardless of hapten density. In mFliC, the 10 lysine residues within the D0 and D1 domains of wild‐type FliC (as well as one additional lysine residue previously introduced through cloning) were mutated to arginine [73]. Again, bioinformatics was needed to assess the secondary structure and MHC‐II binding predictions for FliC and mFliC, using the PSIPRED method and the external software from the IEDB, respectively [74].
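The lysine‐to‐arginine design amounts to a conservative substitution applied only within selected domains. The sketch below applies it to a toy sequence. The sequence and the domain boundaries are hypothetical and are not the real FliC coordinates.

```python
from typing import List, Tuple


def lys_to_arg(seq: str, regions: List[Tuple[int, int]]) -> str:
    """Replace every K with R inside the given 1-based, inclusive regions."""
    residues = list(seq)
    for start, end in regions:
        for i in range(start - 1, end):
            if residues[i] == "K":
                residues[i] = "R"
    return "".join(residues)


# Hypothetical toy sequence and "D0/D1" boundaries (not real FliC coordinates).
seq = "MAKVNKLEKSGDKQTR"
d0_d1 = [(1, 6), (10, 13)]
mutant = lys_to_arg(seq, d0_d1)
print(mutant)  # MARVNRLEKSGDRQTR
```

Lysines outside the chosen regions (position 9 here) are left in place, so hapten conjugation is redirected away from the protected domains rather than abolished.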

The computational results for MHC‐II binding and hapten presentation indicated that the FliC conjugate performed better than the mFliC conjugate, and these results correlated indirectly with those obtained by enzyme‐linked immunosorbent assays (ELISA) and radioimmunoassays (RIA). Two formulations, GNE‐FliC + CpG and GNE‐TT + CpG, were then compared through a hyperlocomotion test and analysis of cocaine in blood. GNE‐TT + CpG showed the best efficacy, whereas FliC and mFliC performed poorly as carrier proteins, so the authors proposed investigating monomeric FliC instead of the polymeric form used [74].

5.2.2. Allergies

Allergies are another area in which vaccine research, in the form of specific immunotherapy (SIT), is warranted because of the association of allergy with asthma and anaphylaxis. Some common allergies are caused by cat, peanut, and cockroach allergens. SIT is effective, but it is sometimes associated with IgE‐dependent adverse events. In allergies, computational approaches have been applied to find T‐cell epitopes that target allergen‐specific T cells, thus improving the safety of immunotherapy.

In 2011, Worm et al. performed a clinical study in which the ToleroMune cat vaccine (short synthetic peptide sequences from the major cat allergen Fel d 1) was administered to 66 subjects with cat allergy. The authors characterized each peptide‐MHC interaction using physical binding assays and analyzed these interactions in silico with the Immune Epitope Database (IEDB). In vitro, the individual peptides and the vaccine were at least 1000‐fold less able than the native allergen to induce basophil histamine release, which is associated with adverse effects. Whether administered intradermally (i.d.) or subcutaneously (s.c.), the vaccine caused no serious adverse events (SAEs) during the study, and no subject withdrew because of an adverse event. Thus, the vaccine was safe and well tolerated [75].

Another example of research to improve safety is the work of Pascal et al. on peanut allergy, whose symptoms range from mild oropharyngeal pruritus to life‐threatening anaphylaxis and which considerably compromises patients’ quality of life. Ara h 1, Ara h 2, and Ara h 3 are the three major peanut allergens. However, IgE antibodies to Ara h 2 correlate most closely with clinical reactivity, and in vitro, Ara h 2 and its homologue Ara h 6 are more potent inducers of basophil degranulation than Ara h 1 and Ara h 3. Conventional s.c. immunotherapy with crude peanut extract carries a high risk of anaphylaxis, and peptides have been successful in desensitizing patients with cat and bee venom allergies. An alternative is therefore to use peptide fragments that retain immunogenicity but are too short to cross‐link allergen‐specific IgE on mast cells and basophils. In addition to proliferation assays using peripheral blood mononuclear cells (PBMCs) from peanut‐allergic children and Ara h 2 peptides, Pascal and colleagues predicted epitopes using the artificial neural network‐based alignment (NN‐align) method NetMHCIIpan‐2.0. To our knowledge, this was the first such prediction in a food allergy. Their objective was to analyze additional theoretical peptides not included in the proliferation assays. The in vitro and in silico strategies yielded consistent results, which allowed the authors to select peptide candidates for the development of a peanut allergy vaccine [76].
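Peptide panels of this kind are usually built as overlapping windows that tile the whole allergen, so that every potential T‐cell epitope appears in at least one short peptide. The sketch below generates such a panel. The window length (15) and step (5) are illustrative assumptions and not the values used by Pascal et al., and the placeholder string only marks positions (it is not Ara h 2).

```python
from typing import List, Tuple


def overlapping_peptides(seq: str, length: int = 15, step: int = 5) -> List[Tuple[int, str]]:
    """Return (1-based start, peptide) windows covering seq.

    The last window is shifted to end at the C-terminus so that no residue
    is left uncovered."""
    if len(seq) <= length:
        return [(1, seq)]
    starts = list(range(0, len(seq) - length + 1, step))
    if starts[-1] + length < len(seq):
        starts.append(len(seq) - length)
    return [(s + 1, seq[s:s + length]) for s in starts]


# Placeholder sequence: letters only mark positions (not Ara h 2).
toy = "ABCDEFGHIJKLMNOPQRSTUVW"
peps = overlapping_peptides(toy)
for start, pep in peps:
    print(start, pep)
```

Each peptide in such a panel could then be submitted to an MHC class II predictor and checked against proliferation data, as the in vitro/in silico comparison above describes.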

Regarding cockroach allergy, several studies have used in silico prediction of B‐cell, T‐cell, and IgE‐binding epitopes as a first step toward a vaccine formulation. Chen et al., Yang et al., and Tong et al. belong to a research group that studied this allergy using in vitro and in silico approaches. The allergens analyzed were Per a 6 and Bla g (found in Periplaneta americana and Blattella germanica, respectively) [77–79].
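When several epitope predictors are combined, a simple approach is a per‐residue vote. With three tools, requiring agreement of at least two corresponds to a 67% consensus, and requiring all three corresponds to 100%. The sketch below merges residues that pass the vote into contiguous regions. The tool outputs are hypothetical strings ("E" = predicted epitope residue), and the function name is illustrative.

```python
from typing import List, Tuple


def consensus_regions(calls: List[List[bool]], min_votes: int = 2) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) regions in which at least
    min_votes tools (min_votes >= 1) predict the residue as part of an epitope."""
    votes = [sum(column) for column in zip(*calls)]
    regions, start = [], None
    for i, v in enumerate(votes + [0]):  # trailing 0 closes a region at the C-terminus
        if v >= min_votes and start is None:
            start = i
        elif v < min_votes and start is not None:
            regions.append((start + 1, i))
            start = None
    return regions


# Hypothetical per-residue outputs of three predictors ("E" = epitope residue).
tool_outputs = ["--EEEE----EE", "---EEEE---E-", "--EE-----EEE"]
calls = [[c == "E" for c in out] for out in tool_outputs]

print(consensus_regions(calls))               # agreement of >= 2 of 3 tools (67%)
print(consensus_regions(calls, min_votes=3))  # agreement of all 3 tools (100%)
```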

Chen et al. employed three immunoinformatics tools: the Protean™ system (DNAStar, Inc., Madison, WI, USA), the bioinformatics predicted antigenic peptides (BPAP) system, and the BepiPred 1.0 server, which utilizes four properties (hydrophilicity, flexibility, accessibility, and antigenicity) as parameters for the prediction of B‐cell epitopes. After building a consensus of the three tools, the authors selected as final potential epitope regions for vaccine development those whose consensus result was 67 or 100%, that is, regions predicted by two or all three tools. Additionally, using the NN‐align method NetMHCIIpan‐2.0 for HLA‐DR alleles and NetMHCII‐2.2 for HLA‐DQ alleles, they identified strong and weak binders [77]. In 2016, Yang et al. and Tong et al. used the same strategy to predict B‐ and T‐cell peptides of Per a 9 and Per a 10, two major allergens as assessed by ELISA. To obtain substantial quantities of these allergens for functional studies, they cloned and expressed them in an Escherichia coli system [78, 79].

5.2.3. Cancer

Mutated antigens (neoantigens) expressed by cancer cells are absent during thymic education, so the T cells that recognize them escape negative selection. Neoantigens are also not present in healthy tissue, which makes them ideal targets for therapeutic vaccination. In addition, advances in next‐generation sequencing (NGS) permit the sequencing of genomes, exomes, or transcriptomes within hours. Researchers can therefore investigate the mutanome (the tens to hundreds of somatic nonsynonymous mutations of a tumor) to select specific targets for recognition by cytotoxic and helper T cells with antitumor activity. The complexity of some experimental tools, such as mass spectrometry, limits their usefulness for target selection in a clinical setting where personalized therapy is needed. Because it is not possible to analyze all of the mutations experimentally, bioinformatics has become important for selecting and prioritizing targets [80].

An example of the success of in silico prediction is the study of Castle et al. By applying thresholds for predicted MHC class II binding and mRNA expression levels, without further validation by immunogenicity testing, the authors were able to enrich for immunogenic MHC class II‐restricted epitopes. They obtained efficient and sustained control of advanced tumors in mice [81].
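The two‐threshold filter described above can be sketched as follows. Mutation labels, scores, and expression values are hypothetical. The default cutoffs (percentile rank ≤ 5, expression ≥ 1) are illustrative assumptions and are not the thresholds published by Castle et al.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    mutation: str      # hypothetical gene:substitution label
    mhc2_rank: float   # predicted MHC-II binding percentile rank (lower = stronger)
    expression: float  # expression of the mutated transcript (e.g. RPKM)


def prioritize(cands: List[Candidate], max_rank: float = 5.0,
               min_expression: float = 1.0) -> List[Candidate]:
    """Keep candidates passing both thresholds, strongest predicted binders first.

    The default thresholds are illustrative assumptions, not published cutoffs."""
    kept = [c for c in cands if c.mhc2_rank <= max_rank and c.expression >= min_expression]
    return sorted(kept, key=lambda c: c.mhc2_rank)


cands = [
    Candidate("GENE_A:A123V", 0.8, 25.0),
    Candidate("GENE_B:R45C", 12.0, 80.0),  # weak predicted binder
    Candidate("GENE_C:G12S", 1.5, 0.2),    # barely expressed
    Candidate("GENE_D:S77F", 3.0, 10.0),
]
print([c.mutation for c in prioritize(cands)])
```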

Although some successful in vitro and preclinical studies began with computational approaches, most algorithms predict the binding affinity of peptides for MHC molecules. Affinity may not correlate well with immunogenicity, and a predicted binder may not actually be generated and presented. Moreover, some immunogenic ligands may escape detection. In general, in silico prediction of ligands for MHC class II is also less accurate than for MHC class I molecules. The immunogenicity of predicted peptides has been reported to correlate better with peptide‐MHC complex stability. For this reason, biochemical methods have been proposed to reduce the number of in silico predicted MHC ligands and to generate data that help train prediction algorithms and validate peptide binding predictions. These methods include peptide rebinding (referred to as iTopia), peptide rescuing, and refolding for MHC I peptide binding validation, and peptide‐driven refolding for MHC II [82].

Another approach to circumvent the limitations of MHC binding prediction is molecular docking, a structure‐based method that has been tested on both peptide‐MHC class I and class II complexes. This method can be applied to previously predicted peptides and is expected to improve prediction accuracy in identifying the best MHC class I and II binders. One search for breast cancer vaccine candidates followed this strategy. Discontinuous B‐cell epitope peptides were first predicted using PEPOP. The 3D structure of the epitope‐based peptides was then modeled with the PEP‐FOLD server, and their theoretical physicochemical properties were computed with the ProtParam algorithm. Finally, molecular docking was performed with Genetic Optimization for Ligand Docking (GOLD) 5.4, using .pdb files of two class I and seven class II MHC‐peptide complexes from the Protein Data Bank. After virtual screening, the docking results for one predicted peptide agreed with previous experimental results (its immunogenicity had been confirmed in in vivo studies). The authors therefore proposed molecular docking as an additional technique to improve the selection of peptide candidates for cancer vaccines [83].
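One of the physicochemical properties ProtParam reports is GRAVY (grand average of hydropathy): the sum of Kyte‐Doolittle hydropathy values of all residues divided by the sequence length. The minimal sketch below computes it. The input peptides are arbitrary examples, and other ProtParam outputs (instability index, pI, and so on) are not reproduced.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    residues = peptide.upper()
    if not residues:
        raise ValueError("empty peptide")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


for pep in ("AAAA", "IR", "KDEQ"):
    print(pep, round(gravy(pep), 3))
```

Positive GRAVY values indicate a hydrophobic peptide and negative values a hydrophilic one, which is one of the quick checks applied to candidate peptides before docking.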


The authors are grateful for the support of this work from Instituto Politécnico Nacional (IPN) Grants SIP‐IPN 20171932, 20172085, 20171766, and 20171992. RMR‐A and GA‐O thank EDD‐IPN; JAC‐V and AJ‐A thank EDI‐IPN. All authors are grateful to COFAA‐IPN. Authors thank Margaret Brunner for English editing.

How to cite and reference


Ribas‐Aparicio Rosa María, Castelán‐Vega Juan Arturo, Jiménez‐ Alberto Alicia, Monterrubio‐López Gloria Paulina and Aparicio‐ Ozores Gerardo (September 6th 2017). The Impact of Bioinformatics on Vaccine Design and Development, Vaccines, Farhat Afrin, Hassan Hemeg and Hani Ozbak, IntechOpen, DOI: 10.5772/intechopen.69273. Available from:
