Free online tools and databases. Adapted from Norman et al., 2019.
In the past few years, improvements in computational approaches have provided faster and less expensive routes to the identification, development, and optimization of monoclonal antibodies (mAbs). In silico methods, such as homology modeling to predict antibody structures, identification of epitope-paratope interactions, and molecular docking, are useful for generating 3D structures of antibody–antigen complexes. These structures help identify the key residues involved in the antigen–antibody complex and enable modifications that enhance antibody binding affinity. Recent advances in computational tools for redesigning antibodies are significant resources for improving biophysical properties such as binding affinity, solubility, and stability, while decreasing the timeframe and costs of antibody engineering. The immunobiological market grows continuously with new molecules, both natural and in new molecular formats such as bispecific antibodies, Fc-antibody fusion proteins, and mAb fragments, which require novel methods for design, screening, and analysis. Algorithms and software place in silico techniques on the innovation frontier.
- antibody structure modeling
- computational analyses
- epitope prediction
- paratope prediction
- molecular docking
The development of new therapeutic antibodies is a multi-task challenge. The approval of OKT3 (1986), the first therapeutic mAb, opened the prospect of using this class of product in many other antibody-based therapies. The concept of “Magic Bullets” alone, however, was not enough to ensure safety and efficacy, and many candidates failed in preclinical or clinical trials. The need to humanize antibodies soon became evident; humanization mitigated their immunogenicity but, in many cases, at the cost of decreased affinity. One way to circumvent this issue relies on back mutations. But how should such mutations be chosen?
Methodologies have emerged to optimize newly discovered antibodies, either in their affinity to the target or other properties, including
In this chapter, we present different aspects of
2. Antibody structure modeling
The ability of antibodies to recognize a diverse set of antigens is acquired through V(D)J recombination and affinity maturation. Together, these two mechanisms give rise to a very large number of possible unique antibody sequences, around 10^11 to 10^15 [1, 2, 3].
Protein structure is strongly related to specificity and function, and knowing it is crucial for analyzing an antibody. Although many crystal structures are available in the Protein Data Bank, this number is small (around 6700) compared to the number of possible sequences. Computational modeling is a feasible method for predicting antibody structures, and it allows us to evaluate antibody properties and to understand antibody–antigen interactions.
The first step in antibody modeling is the alignment of the sequence with germline sequences and its V(D)J classification. The International Immunogenetics Information System (IMGT) is the central database of germline antibody sequences. Some web tools, such as IgBlast and IMGT/V-QUEST, use this database to align and classify the annotated sequence (Table 1). Since differences in the variable domains are responsible for the structural and functional diversity of antibodies, most structure prediction methods are based on Fv modeling (Table 1). Framework regions are sequences with highly conserved structures, which makes it easier to generate their models from template structures. The light-chain CDRs (CDRs L1–3) and heavy-chain CDRs 1 and 2 (CDRs H1–2) are relatively conserved structurally, so their conformations can be predicted from their amino acid sequences. A set of canonical structures allows us to predict the conformation of each of these loops. Recent studies have classified the non-H3 CDR loops by type and length and identified 72 clusters. The CDR H3 loop is usually longer (5–26 amino acids) than the others and has a highly diverse structure. It also influences the VL-VH orientation and, consequently, the antibody–antigen interaction [63, 64]. For these reasons, the major challenge in antibody modeling is to predict the CDR H3 loop structure accurately. Usually, the primary sequence of CDR H3 alone is not enough to predict the loop conformation.
Additional information, such as the position of key residues, seems to be necessary. CDR H3 can be divided into two regions, the torso and the head. The head usually presents a standard hairpin structure. The torso, which is the region closer to the framework, can be predicted by comparison with similar antibodies whose crystal structures are available in databases. Some software, such as Rosetta, has a platform for predicting antibody structure, which first models each CDR and framework based on very similar antibodies and then generates the CDR H3 conformations by assembling small peptide fragments. Software such as SPHINX uses
Antibody modeling is an essential step for most of the procedures discussed below, and the level of model refinement should match what the downstream analysis requires.
3. Antibody-antigen complex: methods of paratope and epitope prediction; molecular docking
In the past few years, improvements in computational approaches have provided faster and less expensive routes to the identification, development, and optimization of monoclonal antibodies (mAbs). One of the leading goals of the rational development of antibodies is the identification of epitope-paratope interactions. 3D structures of the antibody–antigen complex obtained by X-ray crystallography are the gold standard for binding-site information; however, these experimental methods can be costly and time-consuming, and the resulting structures are scarce. Computational methods therefore offer a rapid alternative in antibody discovery.
3.1 Paratope prediction
The paratope comprises the antibody amino acid residues in direct contact with the antigen. Since antigen binding typically involves residues in the CDRs, about 80% of the amino acids constituting the paratope are located there. However, only a third of the CDR residues participate in antigen binding. Besides residues in the CDRs, some framework residues are also involved in antigen binding, which makes precise identification of the paratope residues relevant.
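When a structure of the complex is available, the definition above ("residues in direct contact with the antigen") is often made operational with a distance cutoff. The following is a minimal sketch of that idea, not the method of any tool cited here. The 4.5 Å cutoff, the residue labels, and the toy coordinates are assumptions chosen for illustration only.

```python
from math import dist


def contact_residues(antibody_atoms, antigen_atoms, cutoff=4.5):
    """Return the antibody residues having any atom within `cutoff` (in Å)
    of any antigen atom.

    antibody_atoms: iterable of (residue_label, (x, y, z)) pairs
    antigen_atoms:  iterable of (x, y, z) tuples
    The cutoff value is an illustrative assumption, not a fixed standard.
    """
    antigen_atoms = list(antigen_atoms)
    paratope = set()
    for residue, coord in antibody_atoms:
        if residue in paratope:
            continue  # residue already counted via another atom
        if any(dist(coord, ag) <= cutoff for ag in antigen_atoms):
            paratope.add(residue)
    return sorted(paratope)


# Hypothetical toy coordinates (not taken from a real structure).
antibody_atoms = [
    ("H:Y33", (0.0, 0.0, 0.0)),
    ("H:Y33", (1.5, 0.0, 0.0)),
    ("H:S52", (10.0, 0.0, 0.0)),
    ("L:W91", (0.0, 3.0, 0.0)),
]
antigen_atoms = [(0.0, 0.0, 4.0), (2.0, 2.0, 3.0)]

print(contact_residues(antibody_atoms, antigen_atoms))  # ['H:Y33', 'L:W91']
```

In practice the coordinates would come from a parsed structure file, and the same function applied with the roles of antibody and antigen swapped yields the epitope residues.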
Several computational methods exist to predict paratopes (Table 1). The
Statistical approaches such as Antibody i-Patch use structural information from both the antibody and the antigen to generate a paratope prediction. The software assigns each residue a score indicating how likely it is to be in contact with a given antigen. A higher score implies that the residue is more likely to form part of the paratope, information that is useful for guiding mutations in artificial affinity maturation. Because the tool considers the structures of both antibody and antigen, it should return predictions tailored to the specific pair and therefore more accurate antigen-binding residue predictions.
Recently, machine learning approaches have outperformed Paratome and Antibody i-Patch. PRediction Of AntiBody Contacts (proABC) is a random forest algorithm that uses the antibody sequence (eliminating the need for a 3D structure), the canonical forms and lengths of the hypervariable loops, and the germline family as features to predict which residues of an antibody are involved in recognizing its cognate antigen. The prediction includes the nature of the contacts, distinguishing between hydrogen bonds, hydrophobic, and other non-bonded interactions. proABC-2 is an update of the original random-forest paratope predictor that uses the same set of features within a deep learning framework, generating improved predictions and, as a consequence, increasing the success rate and quality of the docked models.
Parapred was the first algorithm based on modern deep learning for paratope prediction. The method requires as input only the amino acid sequence of a CDR and four adjacent residues, with no need for a full sequence, homology model, crystal structure, or antigen information. Its predictions improve the speed and accuracy of a rigid docking algorithm. AG-Fast-Parapred outperforms Parapred and, for the first time, incorporates antigen information into a deep paratope predictor.
Computational predictors of paratopes can provide valuable information to guide the modeling of antibody–antigen complexes. They will enable the accurate identification of residues that are the most important in determining the antibody’s activity, leaving other residue positions as potential mutation sites, open to exploring other molecular characteristics by engineering.
3.2 Epitope prediction
The epitope is the antigen region in contact with the antibody in an antibody–antigen complex. Accurate identification of an epitope is a substantial step in characterizing the function of an antibody; it helps predict possible cross-reactivity and understand the antibody’s mechanism of action. The gold standard for determining the epitope and the paratope is the 3D structure of the antibody–antigen complex obtained by X-ray crystallography. In addition, peptide arrays, peptide ELISAs, phage display, expressed fragments, partial proteolysis, mass spectrometry, and mutagenesis analyses are experimental methods applied to identify antibody epitopes. However, those assays can be expensive and time-consuming, and their outcome is uncertain.
Computational methods serve as an alternative for identifying antibody epitopes (Table 1). Methods for computational B-cell epitope prediction can be categorized into sequence-based and structure-based methods. The former focus on identifying contiguous stretches of the primary amino acid sequence to predict linear epitopes; the latter take the 3D structure of the antigen into account to predict conformational epitopes. The first
The first antibody-specific epitope prediction method was suggested in 2007 by Rapberger et al. Some approaches, such as ASEP, BEPAR, ABEpar, and PEASE, are antibody-specific epitope prediction methods that do not require the antibody structure. The PEASE (Predicting Antibody-Specific Epitopes) method is based on a machine-learning model and uses the sequence of the antibody in the absence of structural information. It evaluates a pair score for all combinations of residues from the antibody CDRs and residues from the surface-exposed region of the antigen. The predictions are provided both at the residue level and as patches on the antigen structure, using antibody–antigen contact preferences and other properties computed from the antibody sequence and the antigen structure or sequence. EpiPred is an antibody-specific epitope prediction method that identifies the epitope region on the antigen by combining conformational matching of the antibody–antigen structures with a specific antibody–antigen score. Patches on the antigen structure are ranked according to how likely they are to be the epitope. This method aims to generate epitope predictions specific to a given antibody in order to facilitate docking.
The most recent approaches, such as MabTope and the method suggested by Jespersen et al., are docking-based epitope prediction methods. The MabTope methodology integrates a docking-based prediction method with experimental steps and involves three phases: first, docking the antibody on its target to generate possible conformations of the antigen–antibody complex (docking poses); second, ranking these docking poses and designing peptides predicted to be part of the epitope; and last, experimental validation based on these peptides. The method suggested by Jespersen et al. combines geometric and physicochemical features that are correlated in paratope-epitope interactions with statistical and machine learning algorithms. This method can identify the cognate antigen target for a given antibody as well as the antibody target for a given antigen.
Several B-cell epitope databases have been developed over the last decades, compiling validated information on experimentally annotated B-cell epitopes. The Immune Epitope Database (IEDB) is a multifaceted database that includes epitope sequence and structure, source antigen, the organism from which the epitope is derived, and details of the experiments describing recognition of the epitope. IEDB provides tools to predict linear B-cell epitopes based on sequence characteristics of the antigen [27, 28], and also to predict B-cell epitopes from protein structure using methods based on solvent-accessible surfaces, such as DiscoTope [29, 30] and ElliPro. The Epitome database compiles a collection of antibody–antigen complex structures, describes the residues (on the antigen and the antibody CDRs) involved in the interactions, and provides information on specific structural characteristics of the binding regions.
The epitope information in B-cell epitope databases can be used to evaluate existing epitope prediction methods and to develop new and better prediction algorithms. The identification or prediction of epitopes can also serve as input for more sophisticated computational antibody design methods, such as antibody–antigen docking.
3.3 Antibody–antigen docking
The paratope and epitope prediction methods can offer useful information on antibody–antigen recognition by identifying a subset of residues involved in the formation of the antigen–antibody interface. However, they do not provide information about the specific pairwise relations between residues on the antibody and on the antigen. This issue can be addressed by antibody–antigen docking, a specialized application of the broader field of molecular docking.
Molecular docking tools (Table 1) predict the best binding interface of two interacting proteins. Different docking algorithms have been developed over the years to predict the 3D structure of biological complexes, and they typically involve two steps: sampling and scoring. In the sampling step, the conformational space is surveyed to generate thousands of possible complex conformations (‘decoys’); in the scoring step, the decoys are ranked with scoring functions to identify the models that are closest to the native conformation (the lowest-energy structure). Docking methods are classified according to the sampling strategy applied during the simulation. Global docking algorithms do not use any prior information about the binding interfaces and perform an exhaustive search of the interaction space. Local or integrative docking approaches, on the other hand, use available experimental data or predicted information about the binding interface to drive the sampling during docking.
There are three types of docking: rigid-body docking, partial flexible docking, and flexible docking [88, 89]. Most protein–protein docking algorithms perform rigid-body docking, in which both binding partners are kept rigid, so conformational degrees of freedom are not explored during binding. These methods are based on the fast Fourier transform search algorithm and are usually applied when the structures are complementary. Examples of rigid-body docking software are ClusPro, ZDOCK, and PatchDock. ClusPro offers an antibody-specific docking mode, unlike ZDOCK and PatchDock. In partial flexible docking, the antibody remains rigid while the antigen is flexible. One docking tool that applies this concept is AutoDock. AutoDockFR also allows partial flexibility of the antibody. However, removing the conformational limitations can improve binding site identification since, in most situations, protein flexibility is a crucial factor to be considered. Flexible docking therefore treats both interacting molecules as flexible structures. FLIPDock, SwarmDock, SnugDock, and HADDOCK [92, 93, 94] are examples of this approach. SnugDock and HADDOCK allow some flexibility of side chains and the backbone during a refinement stage. SnugDock is the first antibody-specific docking method to apply flexibility to the target antibody, resulting in flexible binding interfaces that can compensate for errors caused by homology modeling.
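The two-step logic of sampling and scoring can be sketched with a deliberately simplified toy model. This is not the algorithm of any docking program named in this chapter. The point-based "proteins", the translation-only sampling (real rigid-body docking also samples rotations), and the contact/clash scoring weights are all assumptions made to keep the example short.

```python
import random
from math import dist


def score(receptor, ligand, contact=4.0, clash=2.0):
    """Toy scoring function (lower is better): reward atom pairs within
    contact distance and penalise pairs that overlap sterically."""
    total = 0.0
    for r in receptor:
        for l in ligand:
            d = dist(r, l)
            if d < clash:
                total += 10.0   # steric clash penalty (assumed weight)
            elif d <= contact:
                total -= 1.0    # favourable contact (assumed weight)
    return total


def sample_poses(ligand, n, box, rng):
    """Sampling step: generate n decoys by random rigid translations."""
    poses = []
    for _ in range(n):
        shift = [rng.uniform(-box, box) for _ in range(3)]
        poses.append([tuple(c + s for c, s in zip(atom, shift)) for atom in ligand])
    return poses


def dock(receptor, ligand, n=2000, top=5, box=10.0, seed=0):
    """Global search: sample decoys, then rank them by the scoring function."""
    rng = random.Random(seed)
    decoys = sample_poses(ligand, n, box, rng)
    scored = sorted(((score(receptor, pose), pose) for pose in decoys),
                    key=lambda item: item[0])
    return scored[:top]


# Hypothetical point sets standing in for two rigid molecules.
receptor = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
ligand = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
for value, pose in dock(receptor, ligand, n=500, top=3):
    print(round(value, 1), [tuple(round(c, 1) for c in atom) for atom in pose])
```

A local or integrative variant would differ only in the sampling step, for example by restricting the translations to a box centred on predicted paratope and epitope residues.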
The docking approaches depend on the 3D structures of the components. For antibodies, modeling methods can generate reasonably accurate structures [10, 95, 96]. Since these methods cannot compete with the reliability of crystallography-derived structures, the performances of docking methods are continuously evaluated by the Critical Assessment of Predicted Interactions (CAPRI) experiment [97, 98].
Although there are many successful cases of protein–protein complex prediction, docking of antibody–antigen complexes is still challenging [99, 100, 101] due to the inherent properties of their interfaces [102, 103]. As methods for predicting antibody–antigen interactions improve, we expect that paratope prediction, epitope prediction, and antibody–antigen docking will offer a valuable, fast, and economical alternative for obtaining reliable information on which to base rational antibody design decisions (Figure 1).
In silico affinity maturation
Recent advances in computational prediction of the 3D structure of an antibody–antigen complex stimulated the development of
The availability of crystal structures of antibody–antigen complexes is an essential factor in achieving computational antibody affinity maturation. However, when crystal structures of the complexes are not available, as seen above, many modeling programs can predict the 3D structure of the antibody–antigen complex. When molecular docking is used for this purpose, it is possible to identify residues involved in intermolecular interactions and to select candidate residues that can be mutated to improve antibody affinity [99, 102, 103, 104, 105].
The prediction of binding affinities usually relies on energy functions, such as physics-based force fields or knowledge-based statistical potentials derived from structural databases, to estimate changes in the free energy of an antibody–antigen complex, with a focus on finding the global minimum energy conformation. Some algorithms and methods identify the lowest energy function of two-body interactions through changes made in the amino acid sequence or in the rotameric state of an amino acid [107, 108]. Computational tools such as molecular dynamics simulate the dynamic behavior of antibody structures and provide alternative candidates that can be evaluated by further experimental assessment [89, 109]. In addition, some tools can identify hotspot residues on protein interfaces, for which mutation to alanine strongly attenuates binding, and calculate the change in the binding energy of the protein complex upon mutation [110, 111, 112]. These platforms are useful for studying the effect of a particular amino acid on the binding affinity of an antibody–antigen complex.
Computational affinity maturation usually focuses on residues in the CDRs. However, as noted in previous sections of this chapter, some residues in the framework can also play a role in binding affinity and in maintaining the canonical conformations of antibodies. Although some mutations in noninteracting regions have resulted in improved binding affinity [113, 114], strategies that modify the CDRs to increase antibody affinity are highlighted here. Some examples of
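The change in binding energy upon mutation mentioned above can be related to measured or predicted dissociation constants through the standard thermodynamic relation ΔG = RT ln(Kd), so that ΔΔG = RT ln(Kd,mut / Kd,wt). The sketch below applies this relation to a hypothetical alanine scan. The Kd values, the residue labels, and the 1.0 kcal/mol hotspot threshold are illustrative assumptions, not data or conventions from the cited tools.

```python
from math import log

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_binding(kd_wt, kd_mut, temperature=298.15):
    """ΔΔG of binding in kcal/mol from two dissociation constants (same units).

    Positive values mean the mutation weakens binding.
    """
    if kd_wt <= 0 or kd_mut <= 0:
        raise ValueError("dissociation constants must be positive")
    return R_KCAL * temperature * log(kd_mut / kd_wt)


def hotspots(kd_wt, alanine_scan, threshold=1.0):
    """Return residues whose alanine mutant loses at least `threshold`
    kcal/mol of binding energy (threshold is an assumed cutoff)."""
    return sorted(
        residue
        for residue, kd_mut in alanine_scan.items()
        if ddg_binding(kd_wt, kd_mut) >= threshold
    )


# Hypothetical Kd values in mol/L for a wild type and three Ala mutants.
kd_wild_type = 1e-9
scan = {"H:Y33A": 5e-8, "H:S52A": 1.2e-9, "L:W91A": 1e-8}
print(round(ddg_binding(1e-9, 1e-8), 2))  # 1.36 (tenfold weaker binding)
print(hotspots(kd_wild_type, scan))       # ['H:Y33A', 'L:W91A']
```

The same bookkeeping works in the opposite direction for affinity maturation, where candidate mutations with negative ΔΔG are the ones worth testing experimentally.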
These techniques still present deviations from the experimental data; however, they demonstrate that in some scenarios, computational approaches alone can be used for affinity maturation, decreasing the timeframe and costs of antibody engineering.
4. Analyses of mAbs’ properties (solubility, stability, aggregation, chemical degradation, glycosylation)
Among the numerous
Although aggregation is different from solubility, the solubility of a molecule is usually calculated for aggregation prediction. In computational chemistry, aggregation and solubility are commonly treated as the same parameter. The aggregation tendency of some mAbs, which could impair their efficacy, might be prevented through analysis of aggregation-prone regions (APRs). APR analyses rely on hydrophobicity scales and annotations of residue charge. Among several predictors of solubility and APRs for proteins, two endeavors successfully applied to antibodies can be highlighted. Wang et al. combined tools to predict APRs in commercial mAbs. They found similar aggregation-prone motifs among commercial and non-commercial antibodies, without correlation with 3D structures.
In 2011, Agrawal et al. compared several aggregation prediction tools, demonstrating their usefulness in drug discovery and development, especially when screening a large number of molecules by fast and low cost
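As an illustration of how a hydrophobicity scale can flag candidate APRs, the sketch below averages the Kyte-Doolittle hydropathy index over a sliding window and reports windows above a threshold. This is a bare-bones example, not the procedure of any predictor discussed here: it ignores residue charge and structure, and the window length, the threshold, and the example sequence are assumptions.

```python
# Kyte-Doolittle hydropathy index (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(sequence, window=5, threshold=2.0):
    """Return (start, segment, mean_hydropathy) for every window whose
    mean Kyte-Doolittle score is at least `threshold`.

    `start` is a 0-based index; window and threshold are assumed defaults.
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if mean >= threshold:
            hits.append((start, segment, mean))
    return hits


# Hypothetical sequence with one hydrophobic stretch.
for start, segment, mean in hydrophobic_windows("DKDEKVIVILDKDEK"):
    print(start, segment, round(mean, 2))
# 4 KVIVI 2.7
# 5 VIVIL 4.24
# 6 IVILD 2.7
```

Overlapping hits such as these would normally be merged into a single candidate region before any mutation is considered.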
Regarding the chemical stability of antibodies, degradation can occur through chemical modification of amino acids, such as asparagine (Asn) deamidation, aspartate (Asp) isomerization, methionine (Met) oxidation, and lysine (Lys) glycation [124, 125]. IgGs are commonly N-glycosylated at residue Asn297 in each Fc-CH2 domain. These Fc N-glycans are associated with the correct folding, stability, aggregation, immunogenicity, and serum half-life of mAbs. Through multiple hydrophobic and polar non-covalent interactions, they affect the conformation of the CH2 domain, which modulates Fc binding to C1q and FcɣRs. There are no web tools specific to mAb glycosylation. Still, some web platforms (Table 1) designed to predict glycosylation sites in human protein sequences could also be useful for mAbs. Because IgGs have a conserved N-glycan site, attention is needed during engineering, which could accidentally create or remove a glycosylation site and interfere with the chemical stability of the mAb. In other instances, the glycosylation site is removed intentionally.
To evaluate possible glycosylation sites, NetNGlyc 1.0 predicts N-glycosylation sites in human proteins using a neural network trained to distinguish between acceptor and non-acceptor sequences. N-GlyDE is a two-stage N-glycan prediction tool trained on human proteome datasets. In the first stage, an algorithm generates a score that discriminates N-glycosylated proteins from non-N-linked glycoproteins. In the second stage, a support vector machine evaluates whether each asparagine-Xaa-serine/threonine sequence (where Xaa is any residue except proline) can be glycosylated. Further, GlycoSiteAlign is a tool that aligns amino acid sequences with respect to their glycosylation sites using the GlyConnect database. This tool can be useful for comparing a large number of mAb sequences derived from different clones or expression conditions.
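The asparagine-Xaa-serine/threonine pattern described above is easy to screen for before and after an engineering step. The sketch below only locates the sequence motif with a regular expression. It is not a substitute for the trained predictors named in this section, since the presence of the motif does not mean that the site is actually glycosylated, and the example sequences are hypothetical.

```python
import re

# N-X-S/T sequon with X != Pro; the lookahead allows overlapping matches.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence):
    """Return (position, motif) pairs for each N-X-S/T sequon,
    where position is the 1-based index of the asparagine."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence)]


def sequon_changes(parent, variant):
    """Compare two sequences and report sequons gained or lost by the variant."""
    before, after = set(find_sequons(parent)), set(find_sequons(variant))
    return {"gained": sorted(after - before), "lost": sorted(before - after)}


# Hypothetical peptide: NGS and NKT match, NPT is excluded by the proline.
print(find_sequons("AANGSANPTNKT"))            # [(3, 'NGS'), (10, 'NKT')]
# A single substitution (A -> S at position 5) creates a new motif.
print(sequon_changes("AANGAANPT", "AANGSANPT"))  # {'gained': [(3, 'NGS')], 'lost': []}
```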
In a linear amino acid sequence of an antibody, it is possible to find numerous regions prone to modification. However, one must note that many of these regions may be buried due to the molecule conformation. Therefore, a conformational study is essential to highlight the residues liable to the chemical change. Chemical stability is generally based on statistical analysis derived from experiments or databases available in the literature, although some computational methods are being used [124, 127, 128, 129, 130, 131, 132, 133]. Statistics-based methods depend on data from previous experiments and provide valuable information about the behavior of proteins, being excellent guides during the development of new antibodies.
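The distinction between sequence-level liabilities and those that are actually exposed can be sketched as a motif scan followed by a filter on solvent-exposed positions. The motif table below (NG for Asn deamidation, DG for Asp isomerization, M for Met oxidation) is a commonly used simplification and an assumption of this example, as are the example sequence and the set of exposed positions, which in practice would come from a structure or a model.

```python
# Assumed, simplified liability motifs; real risk depends on local context.
LIABILITY_MOTIFS = {
    "NG": "Asn deamidation",
    "DG": "Asp isomerization",
    "M": "Met oxidation",
}


def find_liabilities(sequence, exposed=None):
    """Return sorted (position, motif, label) tuples for each motif occurrence.

    position is the 1-based index of the first residue of the motif.
    If `exposed` (a set of 1-based positions) is given, buried hits are dropped.
    """
    hits = []
    for motif, label in LIABILITY_MOTIFS.items():
        start = sequence.find(motif)
        while start != -1:
            position = start + 1
            if exposed is None or position in exposed:
                hits.append((position, motif, label))
            start = sequence.find(motif, start + 1)
    return sorted(hits)


# Hypothetical fragment; positions 2 and 5 are assumed to be solvent exposed.
fragment = "ANGMDGK"
print(find_liabilities(fragment))
# [(2, 'NG', 'Asn deamidation'), (4, 'M', 'Met oxidation'), (5, 'DG', 'Asp isomerization')]
print(find_liabilities(fragment, exposed={2, 5}))
# [(2, 'NG', 'Asn deamidation'), (5, 'DG', 'Asp isomerization')]
```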
Currently, there are tools to predict a wide variety of protein characteristics. Many of them are free for academic purposes (Table 1). A difficulty still faced during the development of an antibody lies in the complexity of the details and in how one parameter influences another. For example, modifications that improve binding affinity may interfere with the stability of the molecule or even create or remove a glycosylation site. In the same way, a structural change made for stability can impair binding affinity. There has been an immeasurable evolution of
Advances in bioinformatics allow us to outline different strategies for the discovery of new therapeutic antibodies. There has been significant progress in online tools in recent years, and further refinement of the techniques will probably bring more accurate and reliable results.
Online platforms can have long waiting and execution times. Using them requires a good internet connection and a robust computer for analyzing and processing the generated data.
Bioinformatics is a notably promising field, and indeed, has a prominent place on the innovation frontier.
FAPESP (2015/15611-0, 2016/08782-6, 2019/10724-2), CNPq (307636/2016-0). ORCID Tania M. Manieri (0000-0003-1152-7425), ORCID Carolina G. Magalhães (0000-0001-7099-060X), ORCID Daniela Y. Takata (0000-0001-6369-1775), ORCID João V. Batalha-Carvalho (0000-0002-1526-6915), ORCID Ana M. Moro (0000-0002-0650-7764).