Free online tools and databases. Adapted from Norman et al., 2019.
In the past few years, improvements in computational approaches have provided faster and less expensive routes to the identification, development, and optimization of monoclonal antibodies (mAbs). In silico methods, such as homology modeling to predict antibody structures, identification of epitope-paratope interactions, and molecular docking, are useful to generate 3D structures of antibody–antigen complexes. These methods help identify the key residues involved in the antigen–antibody complex and enable modifications to enhance antibody binding affinity. Recent advances in computational tools for redesigning antibodies are significant resources to improve antibody biophysical properties, such as binding affinity, solubility, and stability, decreasing the timeframe and costs of antibody engineering. The immunobiological market grows continuously with new molecules, both natural and in new molecular formats, such as bispecific antibodies, Fc-antibody fusion proteins, and mAb fragments, requiring novel methods for design, screening, and analysis. Algorithms and software place in silico techniques on the innovation frontier.
- antibody structure modeling
- computational analyses
- epitope prediction
- paratope prediction
- molecular docking
The development of new therapeutic antibodies is a multitask challenge. The approval of OKT3 (1986), the first therapeutic mAb, opened the perspective of using this class of product in many other antibody-based therapies. The concept of “Magic Bullets” alone, however, was not enough to ensure safety and efficacy, resulting in many preclinical or clinical trial failures. The need for antibody humanization soon became evident: humanization mitigated immunogenicity but, in many cases, at the cost of decreased affinity. An alternative to circumvent this issue relied on back mutations. However, how should such mutations be chosen?
Methodologies, including in silico methods, have emerged to optimize newly discovered antibodies, either in their affinity to the target or in other properties. Computational capacity has grown exponentially over the past few decades, providing an equally exponential advance in computational drug optimization techniques. The growth of public databases, together with big-data approaches, has transformed the internet into the most profitable “laboratory,” with free reagents (i.e., data), low-risk experiments (i.e., in silico assays), and time-saving results. Considering all these aspects together is a simple way to understand the strengths of webtools.
In this chapter, we present different aspects of in silico methodologies for prospecting and characterizing mAbs. For didactic reasons, we start with homology modeling of molecules (Section 2), followed by the prediction of epitopes and paratopes, molecular docking, and affinity maturation (Section 3), and finally the improvement of biophysical and biological properties (Section 4). We aim to present free tools currently available, highlighting their features and applications, so that readers can find the most appropriate way to solve their problems. Although many tools and their applications are shown, we call attention to their sequence of use and to the refinement inherent to the particular questions to be answered. All the tools presented here were available online and free of charge for academic use at the closing of this edition.
2. Antibody structure modeling
The ability of antibodies to recognize a diverse set of antigens is acquired by V(D)J recombination and affinity maturation. These two mechanisms contribute to a large number of possible unique antibody sequences, around 10¹¹–10¹⁵ [1, 2, 3].
Protein structures are strongly related to specificity and function, and knowledge of them is crucial for antibody analysis. Although many crystal structures are available in the Protein Data Bank, this number is small (around 6700) compared to the number of possible sequences. Computational modeling is a feasible method for predicting antibody structures and allows us to evaluate antibody properties and to understand antibody–antigen interactions.
The first step in antibody modeling is alignment with the germline sequence and V(D)J classification. The International Immunogenetics Information System (IMGT) is the central database of germline antibody sequences. Some webtools, such as IgBlast and IMGT/V-QUEST, use this database to align and classify the annotated sequence (Table 1). Since differences in the variable domains are responsible for the structural and functional diversity of antibodies, most structure prediction methods are based on Fv modeling (Table 1). Framework regions are sequences with highly conserved structures, making it easier to generate their models from template structures. The CDRs of the light chain (CDRs L1–3) and CDRs 1 and 2 of the heavy chain (CDRs H1–2) are relatively conserved in structure, so their conformations can be predicted from their amino acid sequences. A set of canonical structures allows us to predict the conformation of each loop. Recent studies have classified the non-H3 CDR loops by type and length and identified 72 clusters. The CDR H3 loop is usually longer (5–26 amino acids) than the others and presents a highly diverse structure. The CDR H3 loop also influences the VL-VH orientation and, consequently, the antibody–antigen interaction [9, 10]. For these reasons, the major challenge in antibody modeling is to achieve accuracy in CDR H3 loop structure prediction. Usually, the primary sequence of CDR H3 is not enough to predict the loop conformation.
|Antibody structural modeling|Link|Ref.|
|Kotai Antibody Builder|||
|Linear B-cell epitope|||
|Epitope Conservancy Analysis|||
|Epitope Cluster Analysis|||
|Biophysical properties of mAbs|Link|Ref.|
|Prediction of glycosylation spots|Link|Ref.|
Some additional information, such as the position of key residues, seems to be necessary. CDR H3 can be divided into two regions, the torso and the head. The head usually presents a standard hairpin structure. The torso, which is the region closer to the framework, can be predicted by comparison with similar antibodies whose crystal structures are available in databases. Some software, such as Rosetta, provides a platform to predict antibody structure, which first models each CDR and the framework based on very similar antibodies and then generates CDR H3 conformations by assembling small peptide fragments. Other software, such as SPHINX, uses an ab initio modeling algorithm to predict the CDR H3 conformation. Software that performs antibody structure modeling, including CDR H3 modeling, is listed in Table 1.
Antibody modeling is an essential step for most of the procedures discussed below, and the researcher must proceed according to the necessary refinement.
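As a rough illustration of the germline-assignment step that opens the modeling workflow, the sketch below ranks candidate germline sequences by percent identity to a query. It is not how IgBlast or IMGT/V-QUEST work internally; it assumes the sequences are already aligned to the same length, and the names `toyV1`, `toyV2`, and the 10-residue fragments are illustrative placeholders, not entries from a germline database.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def closest_germline(query: str, germlines: dict) -> tuple:
    """Return (name, identity) of the germline most similar to the query."""
    scored = {name: percent_identity(query, seq) for name, seq in germlines.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Illustrative placeholders only (hypothetical names, short fragments).
toy_germlines = {"toyV1": "EVQLVESGGG", "toyV2": "QVQLQESGPG"}
query = "EVQLVESGGA"
print(closest_germline(query, toy_germlines))
```

In practice the alignment itself is the hard part, which is why dedicated tools are used; the identity ranking only shows what "closest germline" means once an alignment exists.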
3. Antibody-antigen complex: methods of paratope and epitope prediction; molecular docking
In the past few years, improvements in computational approaches have provided faster and less expensive routes to the identification, development, and optimization of monoclonal antibodies (mAbs). One of the leading goals of the rational development of antibodies is the identification of epitope-paratope interactions. 3D structures of the antibody–antigen complex obtained by X-ray crystallography are the gold standard for binding site information; however, these experimental methods can be costly and time-consuming, and the structures can be difficult to obtain. Thus, computational methods offer a rapid alternative throughout antibody discovery.
In silico methods, such as homology modeling, molecular docking, and interface prediction, can be used to generate 3D models of antibody–antigen complexes and to predict critical residues involved in antigen binding. Once the antibody–antigen contact residues are known, they can be computationally mutated to screen for residues that could increase antibody specificity and affinity against the target, if desired. Computational techniques to perform such a process fall into those that predict the paratope, the epitope, or the entire antibody–antigen complex (Figure 1).
3.1 Paratope prediction
A paratope comprises the antibody amino acid residues in direct contact with the antigen. Antigen binding typically involves residues in the CDRs, and about 80% of the amino acids constituting the paratope are in the CDRs. However, only a third of the CDR residues participate in antigen binding. Besides the residues in the CDRs, some framework residues are also involved in antigen binding, which makes it important to identify the paratope residues precisely.
Several computational methods exist to predict paratopes (Table 1). The online tool Paratome indicates the antigen-binding regions given the amino acid sequence or 3D structure. It identifies consensus structural elements commonly involved in antigen binding by aligning a set of all known antigen–antibody complexes in the PDB. Paratome can also identify positions in the framework region that might contribute to antigen recognition [14, 16]. However, this tool does not provide information on the specific residues directly involved in binding, which is relevant to antibody engineering experiments such as in silico affinity maturation.
Statistical approaches such as Antibody i-Patch utilize structural information of both the antibody and the antigen to generate a paratope prediction. This software assigns a score to each residue indicating how likely it is to be in contact with a given antigen. A higher score implies that the residue is more likely to form part of the paratope, information that is useful in guiding mutations in the artificial affinity maturation process. Because the tool considers the structure of both antibody and antigen, it should return more bespoke results, with more accurate antigen-binding residue predictions.
Recently, machine learning approaches have outperformed Paratome and Antibody i-Patch. PRediction Of AntiBody Contacts (proABC) is a random forest machine-learning method that uses the antibody sequence (eliminating the need for a 3D structure), the hypervariable loop canonical forms and lengths, and the germline family as features to predict which residues of an antibody are involved in recognizing its cognate antigen. The prediction includes the nature of the contacts, distinguishing between hydrogen bonds, hydrophobic interactions, and other non-bonded interactions. proABC-2 is an update of the original random-forest paratope predictor; it uses the same set of features but is based on a deep learning framework, generating improved predictions and, as a consequence, increasing the success rate and quality of the docked models.
Parapred was the first algorithm based on modern deep learning for paratope prediction. This method requires as input only the amino acid sequence of a CDR and four adjacent residues, without a full sequence, homology model, crystal structure, or antigen information. Its predictions improve the speed and accuracy of a rigid docking algorithm. AG-Fast-Parapred is an extension that outperforms Parapred and, for the first time, provides antigen information to a deep paratope predictor.
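To make the "CDR plus adjacent residues" input concrete, the following minimal sketch cuts a CDR out of a chain sequence together with its flanking residues. It only prepares the input string and does not run any predictor. Reading "four adjacent residues" as two on each side is our assumption, so `flank` is left as a parameter; the function name, the chain, and the CDR boundaries are hypothetical.

```python
def cdr_with_flanks(chain: str, start: int, end: int, flank: int = 2) -> str:
    """Return the CDR chain[start:end] extended by `flank` residues per side.

    Indices are 0-based with `end` exclusive; flanks are clipped at the
    chain termini. flank=2 (four extra residues in total) is an assumption.
    """
    if not (0 <= start < end <= len(chain)):
        raise ValueError("CDR boundaries must lie inside the chain")
    lo = max(0, start - flank)
    hi = min(len(chain), end + flank)
    return chain[lo:hi]


# Hypothetical chain and boundaries, for illustration only.
chain = "AAAACDEFGHIKKKK"
print(cdr_with_flanks(chain, start=4, end=11))  # CDR "CDEFGHI" plus flanks
```

The CDR boundaries themselves depend on the numbering scheme used, so they are taken here as given.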
Computational predictors of paratopes can provide valuable information to guide the modeling of antibody–antigen complexes. They will enable the accurate identification of residues that are the most important in determining the antibody’s activity, leaving other residue positions as potential mutation sites, open to exploring other molecular characteristics by engineering.
3.2 Epitope prediction
The epitope is the antigen region in contact with the antibody in an antibody–antigen complex. Accurate identification of an epitope is a substantial step in characterizing the function of an antibody; it helps predict possible cross-reactivity and understand the mechanism of action of the antibody. The gold standard to determine the epitope and the paratope is the 3D structure of the antibody–antigen complex solved by X-ray crystallography. In addition, peptide arrays, peptide ELISAs, phage display, expressed fragments, partial proteolysis, mass spectrometry, and mutagenesis analyses are experimental methods applied to identify antibody epitopes. However, those assays can be expensive and time-consuming, and their outcome is uncertain.
Computational methods serve as an alternative to identify antibody epitopes (Table 1). Methods for computational B-cell epitope prediction can be categorized into sequence-based and structure-based methods. The former focus on identifying contiguous stretches of the primary amino acid sequence to predict linear epitopes, whereas the latter take the 3D structure of the antigen into account to predict conformational epitopes. The first in silico B-cell epitope prediction methods focused on amino acid properties within a sequence, such as hydrophobicity, hydrophilicity, or antigenicity [26, 27, 28, 29, 30]. They aim to identify propensities and patterns of a set of residues on the antigen capable of binding to an antibody. However, many epitope prediction methods lack information on the cognate antibody, which limits their practical use because the predicted epitopes are generic. Therefore, antibody-specific epitope prediction methods later replaced these approaches.
The first antibody-specific epitope prediction method was suggested in 2007 by Rapberger et al. Some approaches, such as ASEP, BEPAR, ABEpar, and PEASE, are antibody-specific epitope prediction methods that do not require the antibody structure. The PEASE (Predicting Antibody-Specific Epitopes) method is based on a machine-learning model and uses the sequence of the antibody in the absence of structural information. It evaluates a pair score for all combinations of residues from the antibody CDRs and residues from the surface-exposed region of the antigen. The predictions are provided both at the residue level and as patches on the antigen structure, using antibody–antigen contact preferences and other properties computed from the antibody sequence and the antigen structure or sequence. EpiPred is an antibody-specific epitope prediction method that identifies the epitope region on the antigen by combining conformational matching of the antibody–antigen structures with a specific antibody–antigen score. Patches on the antigen structure are ranked according to how likely they are to be the epitope. This method aims to generate epitope predictions specific to a given antibody to facilitate docking.
The most recent approaches, such as MabTope and the method suggested by Jespersen et al., are docking-based epitope prediction methods. The MabTope methodology integrates a docking-based prediction method with experimental steps. MabTope involves three phases: first, docking the antibody on its target to generate possible conformations of the antigen–antibody complex (docking poses); second, ranking these docking poses and designing peptides predicted to be part of the epitope; and last, experimental validation based on these peptides. The method suggested by Jespersen et al. combines geometric and physicochemical features correlated in paratope-epitope interactions with statistical and machine learning algorithms. This method can identify the cognate antigen target for a given antibody, as well as the antibody target for a given antigen.
Several B-cell epitope databases have been developed over the last decades, compiling validated information on experimentally annotated B-cell epitopes. The Immune Epitope Database (IEDB) is a multifaceted database that includes epitope sequence and structure, source antigen, the organism from which the epitope is derived, and details of the experiments describing recognition of the epitope. IEDB provides tools to predict linear B-cell epitopes based on sequence characteristics of the antigen [41, 42], and also to predict B-cell epitopes from protein structure, using methods based on solvent-accessible surfaces, such as DiscoTope [31, 43] and ElliPro. The database Epitome compiles a collection of antibody–antigen complex structures, describes the residues (on the antigen and the antibody CDRs) involved in the interactions, and provides information concerning specific structural characteristics of the binding regions.
The epitope information from B-cell epitope databases can be used to evaluate existing epitope prediction methods and to develop new and better prediction algorithms. The identification or prediction of epitopes can also serve as input for more sophisticated computational antibody design methods, such as antibody–antigen docking.
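The early, property-based style of linear epitope prediction described above can be sketched as a sliding-window average over an amino acid scale. The example below uses the Kyte-Doolittle hydropathy scale and flags strongly hydrophilic windows as candidate surface stretches. The window size of 7 and the cutoff of -1.5 are illustrative assumptions, and the sketch is not the algorithm of any specific tool cited in this chapter.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(seq: str, window: int = 7) -> list:
    """Mean hydropathy of every window of length `window` along `seq`."""
    if window > len(seq):
        return []
    values = [KD[aa] for aa in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def hydrophilic_windows(seq: str, window: int = 7, cutoff: float = -1.5) -> list:
    """Return (start_index, peptide, mean) for windows at or below `cutoff`.

    Both `window` and `cutoff` are illustrative choices, not validated ones.
    """
    return [(i, seq[i:i + window], round(mean, 2))
            for i, mean in enumerate(window_means(seq, window))
            if mean <= cutoff]


# Made-up sequence: a charged stretch flanked by leucines.
for hit in hydrophilic_windows("LLLLLLLKDEKRDELLLLLLL"):
    print(hit)
```

As the text notes, such predictions are generic: nothing in the calculation depends on the antibody of interest.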
3.3 Antibody–antigen docking
The paratope and epitope prediction methods can offer useful information on antibody–antigen recognition by identifying a subset of residues involved in antigen–antibody interface formation. However, they do not provide information about the specific pairwise relations between the residues on the antibody and the antigen. This issue can be addressed by antibody–antigen docking, a specialized application of the broader field of molecular docking.
Molecular docking tools (Table 1) predict the best binding interface of two interacting proteins. Different docking algorithms have been developed over the years to predict the 3D structure of biological complexes, and they typically involve two steps: sampling and scoring. In the sampling step, the conformational space is surveyed for thousands of possible complex conformations (‘decoys’); in the scoring step, the decoys are ranked using scoring functions to identify the models that are closest to the native conformation (lowest energy structure). Docking methods are classified by the sampling strategy applied during the simulation. Global docking algorithms do not consider any previous information about the binding interfaces and perform an exhaustive search of the interaction space. Local or integrative docking approaches, on the other hand, use available experimental data or predicted information about the binding interface to drive the sampling during docking.
There are three types of docking: rigid-body docking, partially flexible docking, and flexible docking [48, 49]. Most protein–protein docking algorithms perform rigid-body docking, in which both binding partners are kept rigid, preventing the exploration of conformational degrees of freedom during binding. These methods are based on the fast Fourier transform search algorithm and are usually applied when the structures are complementary. Examples of rigid-body docking software are ClusPro, ZDOCK, and PatchDock. Unlike ZDOCK and PatchDock, ClusPro offers antibody-specific docking. In partially flexible docking, the antibody remains rigid, while the antigen is flexible. One of the docking tools that applies this concept is AutoDock. AutoDockFR also allows partial flexibility of the antibody. However, removing the conformational limitations can improve binding site identification since, in most situations, protein flexibility is a crucial factor to be considered. Flexible docking therefore treats both interacting molecules as flexible structures. FLIPDock, SwarmDock, SnugDock, and HADDOCK [60, 61, 62] are examples of this approach. SnugDock and HADDOCK allow some flexibility of the side chains and the backbone during a refinement stage. SnugDock is the first antibody-specific docking method to apply flexibility to the target antibody, resulting in flexible binding interfaces, which can compensate for errors caused by homology modeling.
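The two-step logic of sampling and scoring can be illustrated with a deliberately simplified toy: 2D point sets stand in for the two proteins, random rigid-body poses are sampled, and each decoy receives an energy-like score that rewards contacts and penalizes clashes. Every name, cutoff, and coordinate below is a hypothetical choice made for illustration; real docking programs use 3D structures and far richer scoring functions.

```python
import math
import random


def transform(points, angle, dx, dy):
    """Rotate 2D points by `angle` (radians), then translate by (dx, dy)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]


def score(receptor, ligand, contact=1.5, clash=0.8, clash_penalty=10.0):
    """Toy energy: -1 per contact pair, +clash_penalty per clashing pair."""
    energy = 0.0
    for rx, ry in receptor:
        for lx, ly in ligand:
            d = math.hypot(rx - lx, ry - ly)
            if d < clash:
                energy += clash_penalty
            elif d <= contact:
                energy -= 1.0
    return energy


def dock(receptor, ligand, n_decoys=2000, seed=0, box=5.0):
    """Global search: sample random rigid-body poses, then rank by score."""
    rng = random.Random(seed)
    decoys = []
    for _ in range(n_decoys):  # sampling step
        pose = (rng.uniform(0.0, 2.0 * math.pi),
                rng.uniform(-box, box),
                rng.uniform(-box, box))
        decoys.append((score(receptor, transform(ligand, *pose)), pose))
    decoys.sort(key=lambda decoy: decoy[0])  # scoring step: lowest first
    return decoys


# Hypothetical coordinates for illustration.
receptor = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ligand = [(0.0, 0.0), (1.0, 0.0)]
best_score, best_pose = dock(receptor, ligand, n_decoys=500)[0]
print(best_score, best_pose)
```

A local or integrative variant would differ only in the sampling step, for example by restricting the translations to the neighborhood of a predicted epitope instead of the whole box.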
The docking approaches depend on the 3D structures of the components. For antibodies, modeling methods can generate reasonably accurate structures [63, 64, 65]. Since these methods cannot compete with the reliability of crystallography-derived structures, the performances of docking methods are continuously evaluated by the Critical Assessment of Predicted Interactions (CAPRI) experiment [66, 67].
Although there are many successful cases of predicting protein–protein complexes, docking of antibody–antigen complexes is still challenging [68, 69, 70] due to the inherent properties of their interfaces [71, 72]. As methods for predicting antibody–antigen interactions improve, we expect that the results of paratope prediction, epitope prediction, and antibody–antigen docking will offer a valuable, fast, and economical source of reliable information on which to base rational antibody design decisions (Figure 1).
3.4 In silico affinity maturation
Recent advances in computational prediction of the 3D structure of an antibody–antigen complex stimulated the development of in silico methods for redesigning antibodies to improve their biophysical properties, such as binding affinity. These computational methods can screen a large number of variants in a virtual library, in a short timeframe and a cost-effective manner, and select the one most optimized, based on a better understanding of antibody–antigen interactions and structural analysis through different algorithms.
The availability of crystal structures of antibody–antigen complexes is an essential factor in achieving computational antibody affinity maturation. However, when the crystal structures of the complexes are not available, as seen above, many modeling programs can predict the 3D structure of the antibody–antigen complex. When molecular docking is used for this purpose, it is possible to identify residues involved in intermolecular interactions and select candidate residues that can be mutated to improve antibody affinity [68, 71, 72, 73, 74].
The prediction of binding affinities usually relies on energy functions, such as physics-based force fields or knowledge-based statistical potentials derived from structural databases, to estimate changes in the free energy of an antibody–antigen complex, with a focus on finding the global minimum energy conformation. Some algorithms and methods identify the lowest energy of two-body interactions through changes made in the amino acid sequence or in the rotameric state of an amino acid [76, 77]. Computational tools, such as molecular dynamics, simulate the dynamic behavior of antibody structures and provide alternative candidates that can be evaluated by further experimental assessment [49, 78]. Also, some tools can identify hotspot residues on protein interfaces, for which mutation to alanine strongly attenuates binding, and calculate the change in the binding energy of the protein complex upon mutation [79, 80, 81]. These platforms are useful to study the effect of a particular amino acid on the binding affinity of an antibody–antigen complex.
Computational affinity maturation usually focuses on residues in the CDRs. However, as discussed in previous sections of this chapter, some residues in the framework can also play a role in binding affinity and maintain the canonical conformations of antibodies. Although some mutations in noninteracting regions have resulted in improved binding affinity [82, 83], strategies that modify the CDRs to increase antibody affinity predominate. Some examples of in silico affinity maturation of antibodies performed comprehensive computational CDR mutagenesis targeting all residues in the CDRs or in CDR H3 [84, 85, 86, 87]. There are also examples of monitoring all interactions between the 3D structure of an antibody and its cognate target to determine the most relevant CDR residues in binding by considering their stabilizing energies, inter- and intramolecular distances, bond formation or breakage, and overall complex stability.
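The link between a computed change in binding free energy and the affinity measured experimentally follows from ΔG = RT ln(Kd), so that ΔΔG = ΔG(mutant) − ΔG(wild type) = RT ln(Kd,mut / Kd,wt). The sketch below applies this relation and flags alanine-scan hotspots. The residue labels, the ΔΔG values, and the 1.0 kcal/mol hotspot threshold are hypothetical, chosen only for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_fold_change(ddg: float, temp: float = 298.15) -> float:
    """Fold change Kd_mut / Kd_wt implied by a binding ddG in kcal/mol.

    Positive ddG (weaker binding) gives a fold change above 1.
    """
    return math.exp(ddg / (R_KCAL * temp))


def ddg_from_kd(kd_wt: float, kd_mut: float, temp: float = 298.15) -> float:
    """Binding ddG in kcal/mol from wild-type and mutant Kd (same units)."""
    return R_KCAL * temp * math.log(kd_mut / kd_wt)


def hotspots(alanine_scan: dict, threshold: float = 1.0) -> list:
    """Residues whose mutation to alanine costs at least `threshold` kcal/mol.

    The threshold is an assumption; published cutoffs vary.
    """
    return sorted(res for res, ddg in alanine_scan.items() if ddg >= threshold)


# Hypothetical alanine-scan results (kcal/mol), for illustration only.
scan = {"H:Y33": 2.1, "H:S52": 0.2, "L:W91": 1.4}
print(hotspots(scan))
print(round(kd_fold_change(1.0), 1))  # fold loss in affinity for +1 kcal/mol
```

For affinity maturation the sign is reversed: a candidate mutation with a negative predicted ΔΔG corresponds to a fold change below 1, that is, a lower Kd and tighter binding.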
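Comprehensive CDR mutagenesis amounts to enumerating a virtual library of point mutants before any energy calculation is run. A minimal sketch of that enumeration step is shown below; the CDR sequence and the function name are hypothetical, and scoring each variant would require one of the energy functions discussed above.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_point_mutants(seq: str, positions):
    """Yield (label, mutant_sequence) for every single substitution.

    `positions` are 0-based indices into `seq`; labels use 1-based
    numbering, e.g. 'A1C' means Ala at position 1 replaced by Cys.
    """
    for pos in positions:
        wild_type = seq[pos]
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield f"{wild_type}{pos + 1}{aa}", seq[:pos] + aa + seq[pos + 1:]


# Hypothetical 4-residue CDR stretch, for illustration only.
cdr = "ARDY"
library = list(single_point_mutants(cdr, range(len(cdr))))
print(len(library))   # 4 positions x 19 substitutions
print(library[:3])
```

The library grows as 19 times the number of positions for single mutants, and combinatorially for multiple mutants, which is why exhaustive scans are usually restricted to the CDRs or to CDR H3.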
These techniques still present deviations from the experimental data; however, they demonstrate that in some scenarios, computational approaches alone can be used for affinity maturation, decreasing the timeframe and costs of antibody engineering.
4. Analyses of mAbs’ properties (solubility, stability, aggregation, chemical degradation, glycosylation)
In vitro antibody affinity maturation is frequently a destabilizing process, requiring compensatory modifications to preserve the thermodynamic stability of mAbs. Emerging in silico tools are significant resources to promote the balance between affinity and stability during antibody engineering (Table 1). Before proceeding to the resources available to deal with destabilization, we should mention two types of stability in antibodies: physical and chemical. The physical stability of a protein is related to conformational changes and also to its colloidal stability. Concerning conformational changes, we consider the free energy difference (ΔG) between the unfolded and folded states of the protein; the folded state should have lower free energy than the unfolded state (G_folded < G_unfolded). One of the in silico methods used to investigate folded- and unfolded-state energies, molecular dynamics, was mentioned earlier.
Among the numerous in silico tools for predicting conformational stability, DeepDDG proved to be quite efficient compared to eight other methods (Table 1). DeepDDG is a machine learning method trained on 5444 experimental data points. This tool calculates the energy difference between the mutated protein and its native state. This calculation allows us to observe whether the proposed mutations, for example, those intended to improve affinity, destabilize the structure. Experimentally, the conformational stability of a protein is assessed indirectly through its melting temperature (Tm), which can be measured by different techniques, such as differential scanning calorimetry (DSC), differential scanning fluorometry (DSF), and circular dichroism (CD). The changes between the folded and unfolded states can be reversible, unlike the process known as aggregation, which is related to colloidal stability.
Although aggregation is different from solubility, the solubility of a molecule is usually calculated for aggregation prediction. In computational chemistry, aggregation and solubility are commonly treated as the same parameter. The aggregation tendency of some mAbs, which could impair their efficacy, might be prevented through the analysis of aggregation-prone regions (APRs). APR analyses rely on hydrophobicity scales and residue charge annotations. Among several predictors of solubility and APRs for proteins, it is possible to highlight two endeavors successfully applied to antibodies. Wang et al. combined tools to predict APRs in commercial mAbs. They found similar aggregation-prone motifs among commercial and non-commercial antibodies, without correlation with 3D structures.
In 2011, Agrawal et al. compared several aggregation prediction tools, demonstrating their usefulness in drug discovery and development, especially when screening a large number of molecules by fast and low-cost in silico assays. Recently, Raybould et al. launched the Therapeutic Antibody Profiler (TAP), a web application that compares candidate sequences with natural antibody sequences, as natural antibodies are assumed to display favorable biophysical properties. TAP notably depends on existing data from clinical-stage antibody therapeutics (CST), so its robustness depends directly on the growth and quality of the CST dataset. One modern and elegant approach drove the development of AggreRATE-Disc, a machine learning-based tool that can predict, within the sequences, mutations that can promote or mitigate aggregation. Although in silico tools can highlight sequences with aggregation issues, they do not substitute for experimental assays; however, they can be used to reduce the number of tests required. These tools and databases help the screening steps during the discovery and development of new therapeutic drugs.
Regarding the chemical stability of antibodies, it is possible to mention degradation by chemical modification of amino acids, such as asparagine (Asn) deamidation, aspartate (Asp) isomerization, methionine (Met) oxidation, and lysine (Lys) glycation [96, 97]. IgGs are commonly N-glycosylated at the Asn297 residue in each Fc-CH2 domain. These Fc N-glycans are associated with the correct folding, stability, aggregation, immunogenicity, and serum half-life of mAbs. Conformational changes in the CH2 portion of the antibody, mediated by multiple hydrophobic and polar non-covalent interactions, modulate the binding preferences of the Fc for C1q and FcγRs. There are no webtools specific to mAb glycosylation. Still, some web platforms (Table 1) designed to predict glycosylation sites on human protein sequences could also be useful for mAbs. IgGs have a conserved N-glycan site; consequently, attention is needed during engineering, which could accidentally create or remove a glycosylation site and interfere with the chemical stability of the mAb. In other instances, the glycosylation site is intentionally removed.
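The requirement that the folded state has lower free energy than the unfolded state can be made quantitative with a two-state model: with ΔG_fold = G_folded − G_unfolded, the equilibrium constant is K = [folded]/[unfolded] = exp(−ΔG_fold/RT) and the folded fraction is K/(1 + K). The sketch below assumes this two-state model, which is a simplification for a multi-domain protein such as an antibody; the function name and the example energies are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fraction_folded(dg_fold: float, temp: float = 298.15) -> float:
    """Folded fraction for a two-state protein.

    dg_fold = G_folded - G_unfolded in kcal/mol; negative values mean the
    folded state is favored. Intended for physically realistic magnitudes.
    """
    return 1.0 / (1.0 + math.exp(dg_fold / (R_KCAL * temp)))


# Illustrative values: a stable protein, then the same protein after a
# hypothetical mutation that costs 3 kcal/mol of folding stability.
for dg in (-5.0, -2.0, 0.0):
    print(dg, round(fraction_folded(dg), 4))
```

The example shows why a destabilizing mutation of only a few kcal/mol matters: the folded population responds exponentially to the free energy difference.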
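As a crude illustration of how hydrophobicity and charge information can point to aggregation-prone stretches, the sketch below reports uninterrupted runs of hydrophobic, uncharged residues. It is a simple heuristic written for this example and does not reproduce any of the predictors cited above; the residue set, the minimum run length of five, and the test sequence are all assumptions.

```python
import re

# Assumed set of hydrophobic, uncharged residues (aliphatic and aromatic).
HYDROPHOBIC = "VILFWYM"


def hydrophobic_runs(seq: str, min_len: int = 5) -> list:
    """Return (start_index, segment) for runs of >= min_len hydrophobic residues.

    Indices are 0-based. Any charged or polar residue breaks a run.
    """
    pattern = re.compile(f"[{HYDROPHOBIC}]{{{min_len},}}")
    return [(m.start(), m.group()) for m in pattern.finditer(seq)]


# Made-up sequence for illustration.
print(hydrophobic_runs("GSKVVIFLYDDK"))
```

Because the scan works on the linear sequence only, a flagged stretch may still be buried in the folded structure, so hits are candidates for closer inspection rather than predictions of aggregation.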
To evaluate possible glycosylation spots, NetNGlyc 1.0 predicts N-glycosylation sites in human proteins using a neural network trained to distinguish between acceptor and non-acceptor residue sequences. N-GlyDE is a two-stage N-glycan prediction tool trained on human proteome datasets. In the first stage, an algorithm generates a score that separates N-glycosylated proteins from non-N-linked glycoproteins. In the second stage, the prediction uses a support vector machine to evaluate whether each asparagine-Xaa-serine/threonine sequence (where Xaa is any residue except proline) can be glycosylated. Further, GlycoSiteAlign is a tool that aligns amino acid sequences with respect to their glycosylation sites using the GlyConnect databank. This tool can be useful to compare a large number of mAb sequences derived from different clones or expression conditions.
In the linear amino acid sequence of an antibody, it is possible to find numerous regions prone to modification. However, one must note that many of these regions may be buried due to the conformation of the molecule. Therefore, a conformational study is essential to highlight the residues liable to chemical change. Chemical stability prediction is generally based on statistical analysis derived from experiments or databases available in the literature, although some computational methods are being used [96, 102, 103, 104, 105, 106, 107, 108]. Statistics-based methods depend on data from previous experiments and provide valuable information about the behavior of proteins, being excellent guides during the development of new antibodies.
Currently, there are tools to predict the most varied protein characteristics. Many of them are free for academic purposes (Table 1). A difficulty still faced during the development of an antibody lies in the complexity of details and in how one parameter influences another. For example, modifications to improve binding affinity may interfere with the stability of the molecule or even generate or remove a glycosylation spot. In the same way, a structural change for stability can impair binding affinity. In silico methods have evolved considerably, allowing analyses to be carried out more quickly and at a lower cost than traditional experimental methods.
Advances in bioinformatics allow us to outline different strategies in the discovery of new therapeutic antibodies. There has been significant progress in online tools in recent years, and the techniques will probably continue to be refined, bringing more accurate and reliable results.
Online platforms can have long wait and execution times. The use of those platforms requires a good internet connection and also a robust computer for the analysis and treatment of the generated data.
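The Asn-Xaa-Ser/Thr rule mentioned above, with Xaa different from proline, is simple enough to check directly on a sequence. The sketch below only locates candidate sequons; it does not predict whether a site is actually glycosylated, which is what the trained predictors add. The function name and the test sequence are illustrative.

```python
import re

# Asn followed by any residue except Pro, then Ser or Thr. The lookahead
# keeps overlapping sequons (e.g. N-N-T-S) from hiding one another.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(seq: str) -> list:
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X != P)."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


# Made-up sequence: N-G-T, N-P-S (excluded), N-V-S, and N-A-T.
print(find_sequons("AANGTQNPSANVSNAT"))
```

Running the same scan on the parental and the engineered sequence and comparing the two lists is a quick way to notice a sequon that was accidentally created or removed by a mutation.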
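The two-step reasoning above, scanning the sequence for modification-prone residues and then discarding the buried ones, can be sketched as follows. The motif list (Asn followed by Gly or Ser for deamidation, Asp followed by Gly or Ser for isomerization, any Met for oxidation) is a commonly used rule of thumb that we assume here; it is not taken from this chapter's references. The set of exposed positions is hypothetical and would in practice come from a solvent-accessibility calculation on a structure or model.

```python
import re

# Assumed rule-of-thumb motifs; lookaheads report only the liable residue.
LIABILITY_MOTIFS = {
    "Asn deamidation": r"N(?=[GS])",
    "Asp isomerization": r"D(?=[GS])",
    "Met oxidation": r"M",
}


def find_liabilities(seq: str, exposed=None) -> list:
    """Return sorted (position, liability) pairs, positions 1-based.

    If `exposed` (a set of 1-based positions) is given, only residues in
    that set are reported; buried residues are skipped.
    """
    hits = []
    for name, pattern in LIABILITY_MOTIFS.items():
        for m in re.finditer(pattern, seq):
            position = m.start() + 1
            if exposed is None or position in exposed:
                hits.append((position, name))
    return sorted(hits)


# Made-up sequence and hypothetical exposure set, for illustration only.
seq = "ANGKDGMS"
print(find_liabilities(seq))
print(find_liabilities(seq, exposed={2, 7}))
```

Keeping the exposure filter separate from the motif scan mirrors the text: the sequence scan is cheap and over-inclusive, while the structural information decides which hits deserve attention.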
Bioinformatics is a notably promising field, and indeed, has a prominent place on the innovation frontier.
FAPESP (2015/15611-0, 2016/08782-6, 2019/10724-2), CNPq (307636/2016-0). ORCID Tania M. Manieri (0000-0003-1152-7425), ORCID Carolina G. Magalhães (0000-0001-7099-060X), ORCID Daniela Y. Takata (0000-0001-6369-1775), ORCID João V. Batalha-Carvalho (0000-0002-1526-6915), ORCID Ana M. Moro (0000-0002-0650-7764).