Recombinant human antibody technology has been the cornerstone of the rise of biologics in the pharmaceutical industry. The introduction of various display technologies such as phage, yeast, bacterial, ribosome, mRNA, DNA and mammalian cell surface display has improved antibody generation programs. The ability to generate recombinant antibodies from available human antibody libraries using in vitro display methods paves the way to select recombinant human antibodies against almost every antigen. The libraries are a close representation of the B-cell response elicited by the natural immune system. The introduction of various methods to fine-tune antibody affinities has made recombinant antibody technology highly sought after. Advanced in vitro strategies make it possible to engineer specific characteristics of each antibody by design. This chapter focuses on the technologies commonly applied in antibody display to engineer improved affinities.
- naïve, immune and synthetic antibody libraries
- in vitro display systems
- affinity maturation
- de novo antibody gene synthesis
- genome editing techniques
The introduction of recombinant antibody technology has revolutionized and improved the way antibodies are generated for various applications in research, diagnosis and therapy [1, 2, 3, 4]. Antibodies have been the cornerstone of many biomedical advances owing to their high specificity and affinity for target antigens. The key characteristic that makes antibodies highly sought after is the defined specificity of the complementarity determining regions (CDRs) of the variable domains against a specific target. This specificity is programmed in vivo by a series of molecular mechanisms such as V-D-J recombination of the heavy chain, V-J recombination of the light chain and somatic hypermutation [6, 7]. After the primary immune response, the VHDHJH and VLJL exons are randomly mutated, mainly in the CDRs, by somatic hypermutation, leading to high affinity antibodies (see the article by Oliver Backhaus in this book). These molecular processes have a profound effect on the genotype, as the gene rearrangements bring about multiple gene segment combinations. Additional diversity arises through the incorporation of additional nucleotides at the junctions between the V, D and J gene segments of the heavy chain and between the V and J gene segments of the light chain. These variations at the genotypic level have a direct influence on the phenotypic variation seen in the target specificity and affinity of the generated antibody [8, 9]. Figure 1 shows the correlation between the genotypic variations and the phenotypic nature of the generated antibodies.
The introduction of recombinant DNA technology and display technologies has allowed recombinant antibodies to be generated at a rapid pace. This is evident from the increase in recombinant antibodies going into clinical trials in the last 3 years [10, 11]. The general concept of display-oriented techniques for antibody generation relies on the ability to harness the natural or synthetic diversity of an antibody library. As with most recombinant DNA approaches, it became possible to customize or modify the genotype at the level of a single base or amino acid. This opened many new avenues in the field of recombinant antibody technology for modifying and customizing the characteristics of the phenotype. The advent of display technologies allowed specific phenotypes to be isolated selectively and retrieved together with their respective genotypic information. This means that it became possible to replicate the in vivo antibody generation and maturation process in vitro. Antibody display technologies combined with affinity maturation strategies have had a monumental impact on the isolation and identification of high affinity antibodies and on the way antibodies are made today.
1.1. Display technologies
The first display technology applied to the generation of recombinant antibodies was phage display. Although the technology was initially designed to display polypeptides, the robust nature of the method meant that larger proteins could also be displayed by bacteriophages [15, 16]. This allowed antibody fragments to be presented on the surface of phage particles for selection. Phage display takes advantage of the natural replication cycle of bacteriophages by fusing the antibody gene to the gene of a phage coat protein. This design allows the co-expression and translocation of the antibody-fused coat protein during the phage packaging process, so that the antibody proteins are displayed on the surface of mature phage particles. More importantly, it establishes a physical linkage between the genotype and the phenotype.
Since the introduction of phage display, other display systems have been developed. These include yeast display, bacterial cell surface display, ribosome display, mRNA display, DNA display and mammalian cell surface display. Figure 1 shows the alternative display systems used for antibody presentation. Yeast display has an additional advantage over phage display when dealing with mammalian proteins, because the display mechanism relies on eukaryotic machinery. In this system, the antibody gene is fused to the Aga2p agglutinin subunit found on the surface of yeast cells [18, 19]. In a similar fashion, bacterial cell display functions by displaying antibodies on the surface of Gram-negative or Gram-positive cells as a fusion to flagella or outer membrane proteins [12, 16].
Ribosome display is a cell-free display approach in which polysomes are stalled on mRNA templates and the nascent antibody protein remains in complex with the ribosomes. The ribosomes are stalled by removing the stop codon, and a C-terminal peptide spacer is required to ensure proper folding of the protein. This is critical because steric hindrance caused by the ribosomal tunnel can alter the folding of the protein, leading to lower display efficiency. A somewhat related method to ribosome display is mRNA display. In mRNA display systems, the template and the protein are covalently linked via puromycin. Puromycin, which mimics aminoacyl-tRNA, is attached to a DNA primer affixed to the mRNA template. This allows puromycin to become covalently attached to the nascent antibody protein through the peptidyl transferase activity of the ribosome [12, 22].
Mammalian display approaches utilize mammalian host cells such as HEK293T and Chinese hamster ovary (CHO) cells to present a library of antibodies on the cell surface for selection. The approach adapts a concept similar to that of yeast cell display. It capitalizes on the transient expression of antibodies, in which antibody-encoding DNA introduced into the cells persists over days and consistently expresses antibodies. Transient expression systems are commonly used for single-round selections from immune repertoires. A stable expression system with integration of DNA into the host genome is inefficient because incorporation of multiple genes per cell makes libraries difficult to resolve. A suitable method to allow multi-round selection is the use of stable episomal vectors derived from viruses. Virus-based vectors are used to infect mammalian cells to display the antibodies for selection, making this approach suitable for large libraries. A major advantage of mammalian cell systems is the ability to screen using full-length IgG. DNA display for the screening of peptides and proteins was originally based on streptavidin-fused peptides or proteins linked to their encoding DNA via biotin in emulsion compartments. However, this method has not often been used for recombinant antibody selection, although some promising results with Fabs and, more recently, with bispecific diabody fragments have been reported [28, 29].
Other alternative DNA display systems are the cis-activity based (CIS) and covalent display technology (CDT) systems. CIS display uses the ability of the bacterial replication initiator protein RepA to act in cis, meaning that RepA binds the DNA from which it was expressed. This activity is largely dependent on the presence of two non-coding regions located 3′ of the repA sequence. The actual mechanism is unknown but is believed to involve stalling of RNA polymerase during transcription at the CIS element, allowing the nascent RepA protein to attach non-covalently to its binding site on the template. The covalent display technology (CDT) exploits the properties of the replication initiator protein from E. coli bacteriophage P2. A pool of DNA encoding antibody molecules is generated as a fusion to the P2A coding sequence. The DNA pool is then transcribed and translated using cell-free expression systems. The cis-activity of the P2A protein allows the DNA molecule to be covalently tagged with its own gene product [16, 17]. Ultimately, the recurring feature of all display systems is that translation of the antibody genes produces a collection of antibody molecules that are physically linked to their encoding genes for selection.
Taking advantage of the different display systems, many different forms of libraries can be represented for antibody generation. No single method is clearly the best for antibody generation. All the display approaches highlighted are useful in different circumstances, and each has unique features that make it more suitable for a particular set of antigens. Ultimately, all the display systems are capable of isolating and identifying recombinant human monoclonal antibodies using a library of antibody genes. The variation of the antibody sequences in the antibody gene repertoire (the diversity) has a significant impact on the quality of the antibodies generated. The antibody repertoire presented on the various display platforms is in essence the basic antibody response produced by the immune system. The multi-level process of antibody gene generation and maturation of the V-D-J gene segments ultimately dictates the antibody characteristics passed on to the display systems for recombinant antibody generation. This is evident as antibody V-D-J gene segments function as the basic building blocks of antibodies and influence the characteristics of the antibodies in an antibody gene repertoire. Therefore, an understanding of the processes involved in antibody gene repertoire generation is vital for designing engineering strategies for antibodies with improved affinities.
1.2. Generation of human antibody repertoires
The human antibody repertoire represents a diverse collection of immunoglobulin gene segments that encode the heavy (VH) and light chain (VL) domains, forming a unique set of antigen-binding sites [35, 36]. The heavy chain (HC) locus, located on chromosome 14, comprises the VH, D, JH and CH gene segments. The kappa light chain (LC) locus is found on chromosome 2 with the Vκ, Jκ and Cκ gene segments. The lambda LC locus with the Vλ, Jλ and Cλ gene segments is found on chromosome 22.
The generation of a natural antibody repertoire is attributed to several natural mechanisms such as somatic recombination, that is, the rearrangement of gene segments to form a single unique antibody gene sequence. The V(D)J recombination process that takes place during B-cell development allows for combinatorial rearrangements of the V (variable), D (diversity) and J (joining) gene segments of the heavy chain, resulting in numerous possible combinations [35, 39, 40] (see also Backhaus O. in this book). A similar process, the VL-JL rearrangement, occurs at the light chain locus (see also Backhaus O. in this book). This process is regulated by the lymphocyte-specific RAG1 and RAG2 endonucleases, which cleave DNA at the recombination signal sequences (RSSs), resulting in blunt signal ends and hairpin coding ends. The ends are later joined by the classical non-homologous end-joining (cNHEJ) pathway to ensure genomic stability [40, 41]. The outcome of recombination is an ordered V-D-J and VL-JL gene assembly that encodes the antibody binding site (variable region). Antibody diversity is further enhanced by junctional diversification, characterized by variability at the junctions due to insertions and/or deletions of a few nucleotides during the fusion of segments [38, 40].
An individual is expected to have at least 10⁸ antibody-producing B-cell clones that are responsive to unique antigens. This natural repertoire is known as the naïve or primary repertoire; its B-cells express cell surface IgM and have not undergone specialization by antigen encounter. The antigen-binding site of an antibody consists of the surrounding framework regions and the complementarity determining regions (CDRs) CDR1, CDR2 and CDR3. The CDR3 region is particularly important for antibody-antigen specificity. The V(D)J and VJ rearrangements of the antibody gene segments and somatic mutations give rise to a greater diversity of binding to various antigens [35, 36].
Upon encountering new antigens, naïve B-cells are stimulated to become activated B-cells, which undergo proliferation and differentiation. B-cell proliferation is also known as clonal expansion, in which a B-cell clone specific to an antigen is selected and expanded in large numbers. This process takes place in germinal centers within secondary lymphoid organs such as the lymph nodes and spleen. The differentiation process generally involves somatic hypermutation (SHM) and class switch recombination (CSR). Somatic hypermutation introduces extensive point mutations in the variable (V) region gene, such as single base substitutions, insertions and deletions. Consequently, the V region exon is further diversified, resulting in altered affinities against the target antigen [39, 41]. Class switch recombination replaces the constant region (CH) gene of the HC, resulting in class switching from IgM to IgE, IgA and IgG. The isotype determines how antigen captured by the immunoglobulin is eliminated and where the antibody accumulates [37, 44, 45]. The combination of both mechanisms offers improved antibody diversity and enables the selection of high affinity antibody-producing cells against a particular antigen. This process of affinity improvement is known as affinity maturation of antibodies.
An essential element that mediates both SHM and CSR is the activation-induced cytidine deaminase (AID). AID is a protein exclusively expressed in activated B-cells in germinal centers, but its exact function and mechanism in SHM and CSR are not fully understood. However, several studies have shown that AID is capable of RNA editing and DNA deamination. AID deaminates cytidine residues to uracil residues on single-stranded DNA (ssDNA) at preferred “hotspots,” described as the DGYW motif. Such motifs favor mutation and are ubiquitous throughout the genome. The genome maintenance machinery attempts to correct the “deamination error” through the base excision repair and mismatch repair pathways, thereby producing mutations and double-stranded breaks [41, 45, 48].
The natural diversification processes have allowed highly diverse antibody repertoires to be generated. This natural phenomenon is the basis of the unique ability of the immune system to counter any foreign infection. The ability to replicate or represent the in vivo repertoire in the laboratory is the basis of recombinant antibody technology. This is achieved, for example, by the construction of antibody phage libraries. The robust nature of combinatorial technologies, coupled with biopanning processes, has enabled easy selection of monoclonal antibodies from highly diverse naïve, immune and synthetic repertoires.
Phage display enables the sorting and handling of large antibody libraries. Antibody phage libraries consist of a random collection of antibody variable genes presented as fusions to phage coat proteins. The antibody fragments can be expressed as fusion proteins on the surface of phages without affecting phage infectivity. Moreover, the displayed antibody molecules retain their antigen-binding capabilities. However, the challenge in generating high affinity antibodies is closely related to the quality of the library generated. Even so, the advancement of recombinant DNA technologies has allowed downstream affinity maturation processes to be carried out to improve antibody affinities post-selection [52, 53].
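The DGYW hotspot motif mentioned above can be located in a sequence with a simple pattern search. The following is a minimal sketch, assuming the standard IUPAC nucleotide codes (D = A/G/T, Y = C/T, W = A/T); the function name and the example sequence are hypothetical and not taken from a real V gene, and only the given strand is scanned.

```python
import re

# IUPAC nucleotide codes needed for the DGYW motif (assumed standard meanings).
IUPAC = {"D": "[AGT]", "G": "G", "Y": "[CT]", "W": "[AT]"}


def find_dgyw_hotspots(seq):
    """Return 0-based start positions of DGYW motifs on the given strand."""
    pattern = "".join(IUPAC[code] for code in "DGYW")
    # A lookahead is used so that overlapping motifs are all reported.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]


# Hypothetical example sequence for illustration only.
example = "CCAGCTACGGTTAGCA"
hotspots = find_dgyw_hotspots(example)
print(hotspots)
```

Scanning the reverse complement as well would be needed to cover hotspots on the opposite strand; that step is left out here to keep the sketch short.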
2. Antibody libraries
An antibody library is essentially a physical collection of diverse antibody genes represented in a single pool. The binding site of an antibody molecule is formed by two domains, the variable domain of the heavy chain (HC) and that of the light chain (LC), which either preferentially or concomitantly contribute to the binding affinity of the antibody for the target antigen. Therefore, in order to replicate the diverse repertoire of antibodies afforded by the immune system, a random combinatorial mix of both the HC and LC repertoires is required. The source of the antibody repertoire has a profound influence on the type of antibody library being constructed. For example, if the variable antibody genes are amplified from immune patients, the immune responses of different individuals in different states of health and disease will have a definite impact on the diversity of the generated antibody repertoire. The diversity of naïve antibody repertoires will be reflected by random variations in the genetic information of the clones generated in the library. This brings to light the classification of antibody libraries, which is essentially defined by the origin of the antibody repertoire. There are generally three classes of antibody libraries applied for antibody display: naïve, immune and synthetic.
2.1. Naïve antibody libraries
The natural collection of immunoglobulins for antibody library generation is obtained from circulating B-cells in primary and secondary lymphoid tissues and blood. Naïve libraries are constructed from IgM mRNA of B-cells from healthy, non-immunized donors, isolated from peripheral blood lymphocytes, spleen, tonsils and bone marrow. In some cases, the repertoire can also be retrieved from animal sources, resulting in antibodies of different origins. The diversity offered by a naïve repertoire is undeniably vast, as the antibody fragments are PCR amplified randomly from the antibody cDNA of non-antigen-stimulated B-cells as well as of B-cells that have remained in the immune system following earlier infections [58, 59]. A single naïve library (also known as a single pot library, generated from several donors) can be used to generate antibodies against all types of antigens, including peptides, toxins and self-antigens (typically important in the area of cancer and autoimmune disease therapeutics). Examples include antibodies generated against red cell antigens, haptens and tumor necrosis factor (TNF). The clonal diversity exhibited by B-cells enables the generation of a range of antibodies against a wide variety of antigens. A naïve repertoire mainly yields polyreactive antibodies of modest affinity. Due to the polyreactive nature of a naïve library, it is important to generate a larger library to increase the success rate of obtaining high affinity antibodies against multiple antigens by successive rounds of selection. The main advantage of a naïve library is the ability to screen for antibodies against any antigen. This comes with a major drawback, namely that the antibodies are of lower affinity than those from immunized donors [56, 60]. However, this issue can be addressed by in vitro affinity maturation to yield high affinity antibodies against a specific antigen.
Other shortcomings that can affect library quality are inconsistent levels of variable antibody gene expression and the limited diversity exhibited by IgM, as well as increased chances of cross-reactivity. One method to improve library quality is to randomize the CDR regions of the variable genes while maintaining the original frameworks of the naïve library. This results in further diversification and modification, yielding a semi-synthetic library.
2.2. Immune antibody libraries
The source of antibody genes for immune library generation is mainly IgG mRNA from disease-infected individuals or cancer patients. This may include patients with acute infections, patients in the recovery stage or patients who have recovered from a particular disease or infection. In addition, cancer-derived material can also be used as a source. The unique characteristic of an immune repertoire is that the sample material is obtained from activated B-cells, in which affinity maturation has taken place during antigen encounter. Thus, it is easier to obtain high affinity binders specific to an antigen from immune libraries than from naïve libraries, owing to the biased nature of the repertoire after exposure to the antigen. An immune library need not be as large as a naïve library; it can also be applied to other targets but may not be suitable for self-antigens [56, 62]. The obvious limitation of immune libraries is the restricted possibility of generating them from human donors against arbitrary targets. Therefore, the application of immune libraries from humans is mainly confined to disease-infected individuals or cancer patients. The biased nature of the library repertoire also means that the library is mainly useful against the antigen used for immunization. Therefore, new libraries are required when dealing with targets from different diseases. However, it is also possible for immune libraries to successfully enrich antibodies against targets unrelated to the disease of origin. This indirectly indicates the influence of B-cell memory during immune responses, which provides an extended breadth of protection for individuals.
2.3. Synthetic antibody libraries
The main difference between naïve or immune libraries and synthetic libraries is the source of the repertoire used to build the library. While both naïve and immune libraries are amplified from a natural source, synthetic libraries are designed in silico and the repertoire is generated under controlled conditions [49, 56]. The artificial repertoire is generated from the diversity afforded by randomization of the CDRs using synthetic approaches. The basic design of most synthetic libraries is the randomization of various CDRs using degenerate oligonucleotides. The freedom afforded by synthetic libraries lies in the possibility of pre-defining the framework design and the degree of randomization of the CDRs. The design pattern is generated from bioinformatics analysis of existing experimental data on antibody epitopes, antigen-antibody interactions, affinity maturation designs, variable gene segment recombination and structural predictions of variable regions, to yield desirable synthetic repertoires. These studies provide insights into amino acid predominance and variability in the hypervariable regions. The hypervariable regions (CDR loops) have been shown to exhibit some amino acid biases. In particular, certain residues (G, P, S, N, H, L and Y) are predominantly found in CDR loops and are associated with improved antigen binding. Designing CDR sequences that mimic the natural diversity can help avoid the selection of low affinity binders. The two major synthetic antibody libraries available, HuCAL® and n-CoDeR®, are based on two separate platforms. Their models are discussed further as case studies.
2.3.1. Case study of synthetic antibody libraries: HuCAL®
A novel concept of synthetic human library construction, named Human Combinatorial Antibody Library (HuCAL), uses more than one framework sequence to construct the library. The HuCAL construction is based on modular consensus frameworks, consisting of seven VH and seven VL consensus sequences that represent the major germline families, yielding 49 possible combinations of master genes. The master genes are designed such that different frameworks promote different structural diversity of human antibodies, while unfavorable residues that cause protein aggregation are removed. Furthermore, HuCAL is characterized by unique restriction sites flanking all CDRs of the antibodies as well as by the use of phage display and unique expression vectors. This allows seamless conversion between different antibody formats, for instance scFv and Fab [66, 67].
In HuCAL, the CDR3 regions are designed to exhibit the natural amino acid composition and distribution as well as length variation at each position for each framework. The CDR is synthesized using trinucleotide mixtures (TRIM technology), which eliminates stop codons and redundant amino acid residues in order to optimize CDR design for downstream production of the encoded antibodies. TRIM technology uses trinucleotide phosphoramidites to add three bases at a time to a growing single strand of synthetic DNA. The addition of three bases at a time allows specific codons to be designed and pre-determined. In addition to codon optimization for E. coli, this makes improved accuracy of antibody design possible. It ultimately improves the functional library size of HuCAL as well as its diversity, by yielding a higher number of correctly assembled clones devoid of frameshifts, stop codons and deletions.
Different versions of the HuCAL library have been constructed over the years, each with different characteristics. The initial HuCAL focused on scFv library construction using 49 master genes, resulting in high expression levels of HuCAL-scFv antibodies (2 × 10⁹ clones) and affinities in the nanomolar range for several antigens tested, such as haptens, DNA, peptides and proteins. HuCAL GOLD® is a synthetic Fab library generated by diversifying six CDRs so that they mimic the natural diversity. Antibodies generated from this library are able to achieve affinities in the picomolar range when tested on different target molecules. The latest optimized version, HuCAL PLATINUM®, has a more advantageous design focusing on the optimization of CDR3 sequences in the modular sequence in order to yield antibodies with improved folding and enhanced binding. The optimization includes avoiding N-glycosylation sites and unproductive sequences to maximize the sequence space and availability. In addition, the library is improved to enhance antibody expression in both bacterial and mammalian expression systems. Nucleotide sequence optimization was carried out extensively during library construction, so that Fab fragments and IgG formats can be expressed optimally in bacterial and mammalian systems, respectively. The resulting library offers higher diversity than the HuCAL GOLD® library [64, 70].
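The figure of 49 master genes follows directly from pairing each of the seven VH consensus frameworks with each of the seven VL consensus frameworks. A minimal sketch of this enumeration is given below; the framework labels are placeholders chosen for illustration and are not the names used in the HuCAL design.

```python
from itertools import product

# Placeholder labels for the seven VH and seven VL consensus frameworks.
vh_frameworks = [f"VH{i}" for i in range(1, 8)]
vl_frameworks = [f"VL{i}" for i in range(1, 8)]

# Every VH framework is paired with every VL framework.
master_genes = list(product(vh_frameworks, vl_frameworks))

print(len(master_genes))  # 7 x 7 pairings
```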
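The benefit of eliminating stop codons can be illustrated by comparing degenerate codon schemes with a defined trinucleotide set. The sketch below is an illustration under stated assumptions: NNN and NNK are common degenerate schemes introduced here only for comparison, the small trinucleotide set is hypothetical, and the standard stop codons TAA, TAG and TGA are assumed. It computes the fraction of stop codons per randomized position and the probability that a CDR with a given number of randomized positions is free of stops.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}
# IUPAC codes used here: N = any base, K = G or T.
IUPAC = {"N": "ACGT", "K": "GT"}


def expand(scheme):
    """List all codons encoded by a degenerate scheme such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC.get(b, b) for b in scheme))]


def stop_fraction(codons):
    """Fraction of codons in the set that are stop codons."""
    return sum(c in STOP_CODONS for c in codons) / len(codons)


def stop_free_probability(fraction, positions):
    """Probability that `positions` independently randomized codons contain no stop."""
    return (1 - fraction) ** positions


nnn = expand("NNN")
nnk = expand("NNK")
# Hypothetical defined trinucleotide set, one chosen codon per amino acid.
trinucleotides = ["GGT", "TCT", "TAC", "AAC", "CTG"]

for name, codons in [("NNN", nnn), ("NNK", nnk), ("trinucleotide set", trinucleotides)]:
    f = stop_fraction(codons)
    print(name, len(codons), f, stop_free_probability(f, 10))
```

Because a defined trinucleotide set contains only the chosen codons, its stop fraction is zero regardless of how many positions are randomized, whereas the stop-free probability of a degenerate scheme falls as more positions are randomized.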
2.3.2. Case study of synthetic antibody libraries: n-CoDeR®
The principle of the n-CoDeR® library is based on the recombination of a single framework with multiple CDRs from non-immunized donors to generate functional diversity. This approach allows the retrieval of CDR loops from immunoglobulin genes of different germline origins. All CDR loops are recombined into one single VH-VL scaffold while maintaining the reactivity and functionality of the antibody fragments. The underlying concept of constructing the n-CoDeR® library is the amplification of the desired CDR loops from immunoglobulin cDNA, followed by overlap extension and assembly to place the CDRs into the single framework. The use of CDR loops originating from the human immune system is considered advantageous because the sequences have undergone in vivo processing and are thus regarded as proof-read, with their functionality confirmed. The resulting genetic diversity of this library is very large (2 × 10⁹) and has the potential to yield diversities equaling that of the human immune system.
This library appears to be a suitable candidate for therapeutic and diagnostic applications as it can generate functional antibody fragments against many types of antigens. Initially, the approach of using a single framework to present various types of CDR loops seemed risky due to its limited capacity. It later proved successful, with the isolation of antibodies specific to various types of antigens reaching affinities in the sub-nanomolar range. Another benefit afforded by this approach is the ability to select a single framework with desirable characteristics and properties, ensuring that the antibodies generated can be produced and folded properly. Antibodies derived from the n-CoDeR® library are potentially advantageous for therapeutic purposes as they display a lower number of T-cell epitopes than normal antibodies. This indicates that self-reactivity is circumvented and immunogenicity issues are reduced.
3. Affinity maturation strategies for recombinant antibodies
Recombinant antibodies obtained via combinatorial library technology from naïve or synthetic libraries have the advantage of increased diversity as a result of the large repertoire of antibody genes. However, antibodies isolated from combinatorial libraries against their respective targets may not always exhibit the desired specificity and affinity. Increasing the affinity of an antibody is important to enhance its pharmacokinetics, efficacy and safety profile by improving its binding strength and function. Such optimization can be achieved by either in vitro or in vivo affinity maturation strategies.
3.1. In vitro approaches toward affinity maturation of antibodies
There are several strategies that have been used to perform in vitro affinity maturation to improve recombinant antibody molecules. Mutagenesis is widely employed to introduce mutations into antibody sequences. Sequences of antibody are diversified by random mutations via methods such as error-prone PCR or through site-directed mutations, where mutations are assigned to specific positions in CDRs or framework regions as well as mutational hot spots by using PCR and degenerate primers . In addition, de novo synthesis of DNA offers the most straightforward modification procedure to further diversify the antibody sequences as a whole.
Random mutagenesis is a non-systematic mutagenesis method that can be performed in the absence of information regarding the importance of structures and residues that contribute to antigen-antibody binding as well as affinity maturation of antibody . The method introduces point mutations into antibody genes in a random fashion. The mechanisms involves: (1) transitions, where a purine or pyrimidine is substituted by another purine or pyrimidine, (2) transversions, where a purine is substituted by a pyrimidine, or vice versa, (3) deletions of one or more nucleotides from the gene sequence, (4) insertions of one or more nucleotides into a gene sequence, (5) inversions where double-stranded DNA segments of two base pairs or longer is rotated at 180° .
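The distinction between transitions and transversions can be made concrete with a short classification routine. This is a minimal sketch, assuming equal-length DNA sequences over the alphabet A, C, G and T so that only substitutions (not insertions, deletions or inversions) are considered; the function names are hypothetical.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref, alt):
    """Classify a single base change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "none"
    # A transition stays within one chemical class (purine or pyrimidine).
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def classify_point_mutations(wild_type, mutant):
    """List (position, ref, alt, type) for every substituted position."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have equal length (substitutions only)")
    return [
        (i, w, m, classify_substitution(w, m))
        for i, (w, m) in enumerate(zip(wild_type, mutant))
        if w.upper() != m.upper()
    ]


# Hypothetical wild-type and mutant fragments for illustration.
print(classify_point_mutations("ACGT", "GCTT"))
```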
Error-prone PCR is a universal method used for the introduction of random mutations by capitalizing on the natural error rate of a low fidelity DNA polymerase, for example Taq polymerase that lacks 3′ to 5′ proofreading activity. Several parameters during PCR amplification govern the error rate of DNA polymerases in order to create ideal mismatches in the amplified product. The manipulation of the enzyme’s fidelity can be performed by varying several parameters like: (1) concentration of Taq DNA polymerase, (2) concentration of divalent cations (Mn2+ and Mg2+), (3) concentration of deoxyribonucleoside triphosphates (dNTPs), (4) polymerase extension time and the (5) number of PCR cycles [79, 80]. Upon amplification, the product must be ligated to a suitable plasmid and an additional step is required to recover the transformants that consist of the mutations. Error-prone PCR is a robust technique, whereby it can only introduce limited amount of base substitutions into the gene sequence. Therefore it is very useful to identify amino acid positions that are associated with function, affinity and specificity of antibodies for the method to be applied on . The resulting libraries consist of a large amount of A to G and T to C transitions, thus causing high GC content amplification bias. This limitation can be circumvented by the addition of unbalanced ratios of nucleotides to reduce the amplification bias. A commercial DNA polymerase, Mutazyme® was introduced for error-prone PCR with reduced mutational bias which overcomes the issue of preferential nucleotide base selection by Taq DNA polymerase during amplification . Error-prone PCR has been performed across the entire coding region to promote enhanced binders by the introduction of additional interacting residues between ligands and targets, altering the three dimensional structure of the target contact regions or promoting the thermal stability of ligands . 
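The effect of polymerase error rate on a mutant library can be explored with a toy simulation. The sketch below is a deliberately simplified model and not a description of real polymerase behavior: it assumes a uniform per-base substitution probability, no mutational bias, a single copying step and no insertions or deletions. The function names, template and error rate are hypothetical.

```python
import random

BASES = "ACGT"


def error_prone_copy(template, error_rate, rng):
    """Copy a DNA template, substituting each base with probability error_rate."""
    copy = []
    for base in template:
        if rng.random() < error_rate:
            # Replace with one of the three other bases, chosen uniformly.
            copy.append(rng.choice([b for b in BASES if b != base]))
        else:
            copy.append(base)
    return "".join(copy)


def mutant_library(template, error_rate, size, seed=0):
    """Generate `size` independently mutated copies of the template."""
    rng = random.Random(seed)
    return [error_prone_copy(template, error_rate, rng) for _ in range(size)]


# Hypothetical template and error rate for illustration.
template = "ATGGCCCAGGTGCAGCTG"
library = mutant_library(template, error_rate=0.05, size=5)
for variant in library:
    n_mut = sum(a != b for a, b in zip(template, variant))
    print(variant, n_mut)
```

A transition bias such as the one described for Taq polymerase could be modeled by weighting the choice of replacement base, which is omitted here for brevity.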
This method is suitable for ribosome, mRNA and DNA display, in which a PCR amplification step is required after each round of selection. Additional mutations can be introduced into potential binders during this step and characterized in the following round of selection. The approach was successfully used in combination with DNA shuffling for the selection and affinity maturation of an anti-fluorescein scFv, which achieved an affinity of 100 fM from a 10^7-member yeast display library. Another variant of error-prone PCR applies isothermal rolling circle amplification for gene diversification. It amplifies a circular DNA template by a rolling circle mechanism, generating single-stranded DNA composed of multiple tandem repeats. To generate a randomly mutated sequence library, a wild-type sequence can be introduced into a plasmid followed by isothermal rolling circle amplification under error-prone conditions [78, 83].
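To illustrate how the per-base error rate and the number of mutagenic rounds shape a library, error-prone amplification can be approximated in silico. The following is only a toy sketch: the function names, the `transition_bias` parameter and its default value are assumptions made for illustration and are not measured properties of Taq polymerase or Mutazyme®. Each clone is simply passed through `cycles` independent rounds of point substitution.

```python
import random

TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
BASES = "ACGT"


def mutate(seq, error_rate, transition_bias=0.7, rng=None):
    """Return a copy of seq (upper-case ACGT) with random point substitutions.

    error_rate      -- probability that a given base is substituted
    transition_bias -- hypothetical fraction of substitutions that are transitions
    """
    rng = rng or random.Random()
    out = []
    for base in seq:
        if rng.random() < error_rate:
            if rng.random() < transition_bias:
                out.append(TRANSITION[base])
            else:
                # The two remaining bases are the possible transversions.
                choices = [b for b in BASES if b not in (base, TRANSITION[base])]
                out.append(rng.choice(choices))
        else:
            out.append(base)
    return "".join(out)


def error_prone_library(template, n_clones, error_rate, cycles=1, seed=0):
    """Generate n_clones variants, each mutated over `cycles` successive rounds."""
    rng = random.Random(seed)
    library = []
    for _ in range(n_clones):
        clone = template
        for _ in range(cycles):
            clone = mutate(clone, error_rate, rng=rng)
        library.append(clone)
    return library
```

Raising `cycles` or `error_rate` increases the average number of substitutions per clone, which mirrors the way additional mutations accumulate when the amplification step is repeated after every selection round.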
Recombination provides another approach for gene modification and diversification. Mutational rearrangements are highly advantageous for identifying beneficial combinations of mutations that are otherwise absent in nature. Chain shuffling serves as a “mix and match” system to increase gene repertoires: one of the two antibody chains (heavy or light) is paired with a repertoire of partner chains to generate a secondary library from which higher affinity antibodies can be selected. Domain shuffling is a useful affinity maturation tool for antibodies as it mimics the in vivo SHM process. DNA shuffling, in contrast, generates chimeric libraries through random fragmentation of a pool of similar genes; reassembly of the fragments results in template switching, which gives rise to sequence diversity. Application of DNA shuffling to different antibody genes leads to the exchange of CDRs and frameworks. Affinity maturation can be achieved by taking a single variable heavy chain or light chain domain from a known binder and combining it with a diverse array of light chain or heavy chain domains, respectively.
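Both recombination strategies can be sketched in a few lines. The helper names are hypothetical, chains are represented as plain strings, and the DNA shuffling function is a deliberate simplification: it assumes parental genes of equal length and models template switching as a change of parent at randomly chosen crossover points rather than simulating fragmentation and reassembly.

```python
import random


def chain_shuffle(fixed_chain, partner_repertoire, fixed_is_heavy=True):
    """Pair one fixed chain with every partner chain to form a secondary library."""
    library = []
    for partner in partner_repertoire:
        if fixed_is_heavy:
            library.append({"VH": fixed_chain, "VL": partner})
        else:
            library.append({"VH": partner, "VL": fixed_chain})
    return library


def shuffle_genes(parents, n_crossovers, rng):
    """Build one chimera from equal-length parental genes.

    A parent is chosen at random for each segment between crossover points,
    which mimics template switching during reassembly.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("this sketch assumes parents of equal length")
    points = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0] + points + [length]
    segments = [rng.choice(parents)[a:b] for a, b in zip(bounds, bounds[1:])]
    return "".join(segments)
```

Calling `chain_shuffle("VH_parent", light_chains)` yields one VH/VL pair per light chain in the repertoire, so the size of the secondary library equals the size of the partner repertoire.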
Site-directed mutagenesis involves in vitro gene modifications that are targeted at a specific genetic locus or segment of a DNA sequence to study the sequence-structure-function relationship of a gene candidate. Site-saturation mutagenesis goes one step further and substitutes the residue at a specific site with all possible amino acid residues. Hence, the importance of a specific amino acid residue for the function of an antibody can be elucidated through this focused mutagenesis method [78, 86]. This can be applied to the stability engineering of antibodies by determining the influence of different amino acids at strategic positions along the antibody structure. Site-directed mutagenesis can be performed through several different approaches. The availability of restriction endonucleases and DNA ligases allows easy incorporation of mutagenic sequences into templates to construct recombinant DNAs. The rapid development of oligonucleotide synthesis has also contributed to oligonucleotide-mediated mutagenesis. Here, the oligonucleotide is complementary to the template DNA except for internal mismatches that direct single or multiple point mutations. For instance, a mutagenic primer is annealed to a single-stranded DNA template, extended with the Klenow fragment of DNA polymerase I and ligated with T4 DNA ligase. When the resulting heteroduplex DNA is transfected into competent E. coli, a mixture of mutant and wild-type DNA is produced. Kunkel mutagenesis uses a circular, single-stranded DNA (ssDNA) template in which uracil has been incorporated. The ssDNA is annealed to the mutagenic primers and extended to generate double-stranded DNA (dsDNA) that carries the mutation. The dsDNA is then transfected into E. coli, where the bacterial repair mechanisms remove the parental (uracil-containing) strand so that the recombinant clones predominate and propagate [88, 89].
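Saturating a single position with all possible residues is commonly done with a degenerate codon in the mutagenic oligonucleotide. As an illustration, and as an assumption not taken from the cited protocols, the sketch below enumerates the codons encoded by the degenerate triplet NNK (N = A/C/G/T, K = G/T) and translates them with the standard genetic code. The helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

# Subset of the IUPAC nucleotide ambiguity codes used in this example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}


def expand_degenerate_codon(codon: str) -> list:
    """List every unambiguous codon encoded by a degenerate triplet."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in codon.upper()))]


def codon_coverage(codon: str) -> dict:
    """Count how many expanded codons encode each amino acid ('*' = stop)."""
    counts = {}
    for c in expand_degenerate_codon(codon):
        aa = CODON_TABLE[c]
        counts[aa] = counts.get(aa, 0) + 1
    return counts
```

`codon_coverage("NNK")` covers 32 codons that encode all 20 amino acids and a single stop codon (TAG), which is why this triplet is a popular compromise between full coverage and library size.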
Mutagenesis on a single-stranded DNA (ssDNA) template is labor intensive because the template requires subcloning and ssDNA rescue. Therefore, several commercial kits are available that utilize double-stranded DNA (dsDNA) as the template for site-directed mutagenesis with mutagenic primers. The QuikChange™ system (Stratagene) uses a pair of complementary oligonucleotides (forward and reverse) that contain the desired mutations to amplify the whole plasmid with a high-fidelity polymerase, followed by removal of the parental DNA using DpnI endonuclease. The GeneTailor™ system (Invitrogen) is somewhat similar to QuikChange™; however, it requires a DNA methylase to methylate the DNA template prior to amplification. The GeneEditor™ system requires multiple transformations and ampicillin resistance cloning vectors for selecting the mutants that have undergone mutagenesis [88, 90]. A major convenience of the GeneTailor™ and QuikChange™ systems is the ability to carry out the mutagenesis without requiring additional vectors, host strains or restriction sites.
Another variant of site-directed mutagenesis is a PCR-driven method termed overlap extension PCR. This technique employs PCR to generate modified genes from cloned DNA in just a few simple steps. The segments of a target gene are amplified from a DNA template using two flanking master primers and two internal primers. The internal primers contain the desired mutation and overlapping nucleotide sequences. Two rounds of PCR are carried out. In the first round, each segment of the target gene is amplified with its respective pair of primers, creating two gene fragments that share an overlapping sequence at the 3′ end. Subsequently, these double-stranded fragments are denatured and annealed, resulting in two heteroduplexes in which each strand carries the mutated site. DNA polymerase then extends the overlapping ends of the heteroduplexes. A second PCR with the two flanking master primers amplifies the entire modified gene [90, 91]. This method was recently employed by Kitzman et al. to perform massively parallel single amino acid mutagenesis, coupled with microarray-based DNA synthesis technology. This is particularly useful for assessing and screening variants in libraries.
The increased understanding of molecular biology and specific functions of molecular biology enzymes has allowed the introduction of different approaches for mutagenesis. The combination of the different function of various enzymes has been utilized successfully to carry out directed evolution of antibody genes. Lambda exonuclease in nature functions to assist the repair of dsDNA breaks of viral DNA. It is a highly processive 5′→3′ dsDNA exonuclease which selectively degrades the phosphorylated chain of a duplex DNA to yield mononucleotides and ssDNA. A strategy that takes advantage of this feature of lambda exonuclease was applied for antibody gene mutagenesis. The formed ssDNA template will function as the template for in vitro antibody gene recombination. The ssDNA template is then hybridized with degenerate oligonucleotides and treated with Klenow Fragment to generate dsDNA templates of the hybridized products. This will result in a final dsDNA template that has incorporated the diversification introduced by the degenerate oligos at a specific site of the antibody gene. The method was successfully applied to carry out chain shuffling and mutagenesis of antibody clones . However, a major bottleneck with these methods is their inability to allow directed mutagenesis with codon specificity.
The diversification of the antibody repertoire can also be realized by in vitro somatic hypermutation using the AID enzyme . The AID enzyme is classified in the APOBEC family of cytidine deaminases that is able to catalyze the deamination of cytidine residues to uridine residues in vitro only on ssDNA, giving rise to thymine residues at the end of the replication events .
Typically, the cytidines are targeted at the mutational hotspot motif RGYW and AGY (R = A/G, Y = C/T, W = A/T). This motif is also the preferred region for mutations during in vivo somatic hypermutation [96, 97]. Reports revealed that in vitro cytidine deamination occurs in an orientation-dependent fashion, relying on the transcription to gain access to both template and non-template strands of DNA. Nevertheless, it is capable of introducing mutations into DNA and therefore applicable for gene diversification [98, 99]. AID-mediated mutagenesis serves as a useful method to enhance antibody affinities through sequence diversity by introducing point mutations, such as single amino acid substitutions or indels (insertions, deletions) specifically on the antibody CDR sequences. Nucleotide transversions and duplications are among the most complicated to design into a library but possible with the AID-mediated mutagenesis approach . The AID enzyme is capable of generating indels which are localized in CDR regions, while affinity maturation through somatic mutation further improves the antibody binding and specificity . In vitro expression of the AID enzyme is sufficient to initiate indels, hypermutations inside the CDRs and clonal expansion that is comparable to in vivo events for antibody evolution.
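Candidate AID hotspots can be located in an antibody gene by scanning for the RGYW motif. The sketch below is illustrative only: the function name is hypothetical, and the reverse-complement motif WRCY is included as an assumption so that hotspots on the opposite strand are reported as well. A lookahead pattern is used so that overlapping motifs are not missed.

```python
import re

# R = A/G, Y = C/T, W = A/T; lookahead keeps overlapping matches.
RGYW = re.compile(r"(?=([AG]G[CT][AT]))")
WRCY = re.compile(r"(?=([AT][AG]C[CT]))")  # reverse complement of RGYW


def find_hotspots(seq: str) -> list:
    """Return sorted (position, matched_sequence, motif_name) tuples (0-based)."""
    seq = seq.upper()
    hits = [(m.start(), m.group(1), "RGYW") for m in RGYW.finditer(seq)]
    hits += [(m.start(), m.group(1), "WRCY") for m in WRCY.finditer(seq)]
    return sorted(hits)
```

Note that the palindromic tetramer AGCT matches both patterns, so it is reported once for each strand.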
Studies have been carried out to analyze the amino acid diversity in germline and mature antibody sequences. It was found that the number of germline hotspots decreases in high affinity antibodies, suggesting that hotspot-based somatic mutations occurred during in vivo affinity maturation. In vitro randomization of these short CDR regions, which to some extent mimics natural in vivo SHM, generates sequence diversity and results in in vitro affinity maturation. These hotspots are embedded in the codons of amino acids that are directly or indirectly involved in interactions with antigens. They can serve as mutation targets in the human genome, allowing mutagenesis to occur with the aid of general mutators such as the AID enzyme or other trans-acting hypermutation factors.
The diversity associated with the utilization of various sequences either in the CDR or framework is directly related to the affinity of the clones generated [5, 103]. The continuous development in molecular technologies has allowed the introduction of various approaches for gene modification. The design of the framework regions in the antibody gene also plays a contributing role in the improvement of the antibody affinity. This is due to the influence of the framework genes on the stability, solubility and affinity of the antibody . The framework regions mainly in the neighboring regions of the CDRs have been known to also contribute to the binding characteristic of antibody clones .
3.2. De novo synthesis of antibody genes
Direct gene synthesis of modified sequences, or de novo synthesis of DNA, is ideal for creating desired gene sequences based on iterative and comprehensive analyses with the aid of high-throughput sequencing technology. There are several reasons why de novo synthesis of DNA is preferred in many instances. Firstly, engineering new functions normally requires extensive modifications to the genetic sequence, which are easier to achieve by de novo synthesis. Secondly, de novo gene synthesis allows the specific design of DNA constructs. It makes it possible to study the influence of newly designed sequences on particular functions of the corresponding expressed recombinant antibodies, with the aim of improving or modifying phenotypic features of the antibodies.
Lastly, targeted sequences from natural constructs are sometimes hard to access, therefore synthesis provides a more efficient alternative to retrieve the targeted sequences . Currently, oligos are generated or synthesized automatically, employing solid-phase phosphoramidite chemistry. The principle behind phosphoramidite-based oligo synthesis encompasses a total of four key steps (deprotection, coupling, capping and oxidation) to add one base at a time to a growing oligo chain attached to a solid support.
Synthesis takes place individually in small columns. The purified oligos are then subjected to quality assessments. The automated process can generate up to 100 nmol of oligos at a time with low error rates in the region of one base error in 200 nucleotides [105, 106].
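The practical consequence of an error rate of about one in 200 nucleotides can be estimated with a simple calculation. The sketch below assumes, as a simplification, that errors occur independently at every position, so that the expected fraction of error-free molecules of length n is (1 - p)^n. The function name and the example lengths are hypothetical.

```python
def error_free_fraction(length: int, error_rate: float = 1 / 200) -> float:
    """Expected fraction of molecules of `length` nt that carry no error.

    Assumes independent errors with probability `error_rate` per nucleotide.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie between 0 and 1")
    return (1.0 - error_rate) ** length


if __name__ == "__main__":
    # Example lengths: short primer, long oligo, 200-mer, scFv-sized gene.
    for n in (20, 60, 200, 750):
        print(f"{n:4d} nt: {error_free_fraction(n):.3f} expected error-free")
```

Under this assumption only roughly a third of 200-nt products would be error-free, which illustrates why longer constructs depend on sequence verification or error-removal steps.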
Besides conventional gene synthesis from column-synthesized oligos, array-based oligos can be used for gene synthesis as well. Array-based synthesis has the advantage of high throughput. On the polymer array support by Affymetrix, oligos are synthesized chemically using photolabile protecting groups and photolithography. The photolithographic mask directs UV light onto the solid substrate to selectively deprotect and activate the 5′-hydroxyl group of the growing chain so that free nucleotides can be incorporated. The mask is designed to expose the targeted sites on the microarray, where incorporation of nucleotides occurs, while shielding the non-targeted sites. The oligo fragments are synthesized directly on the support surface and can be recovered as a heterogeneous pool of sequences. Today, several technologies have eliminated the need for the masking technique. Ink-jet-based printing developed by Agilent allows picolitres of free nucleotides and activator to be spotted on targeted sites on one array. NimbleGen Systems uses a programmable micromirror device to activate specific sites on the array. Furthermore, CustomArray (CombiMatrix) utilizes semiconductor-based electrochemical acid production to deprotect desired nucleosides [107, 108].
Nevertheless, the NimbleGen and CustomArray oligo synthesis techniques suffer from high error rates when generating longer oligos and multiple oligo strands in parallel. This is due to side reactions such as depurination and inefficient addition of nucleotides, which result in unwanted substitution and indel (insertion/deletion) errors and greatly affect the overall quality of the synthesized product. Therefore, purification steps utilizing polyacrylamide gel electrophoresis and high performance liquid chromatography are essential to remove erroneous sequences after synthesis of the intended DNA sequences.
The oligo fragments obtained from conventional or array-based synthesis are then used as raw substrates to construct larger synthetic fragments (usually a few hundred base pairs), a process known as gene synthesis. In the ligation-based approach, complementary overlapping fragments are joined enzymatically by thermostable DNA ligases, producing larger DNA fragments under high stringency. Another approach, known as polymerase cycling assembly (PCA), utilizes a polymerase to elongate the overlapping oligo fragments into double-stranded fragments. Ligation-based synthesis offers higher stringency, so erroneous sequences are less likely to be assembled, but the oligo synthesis is costly because longer fragments have to be synthesized. Longer oligonucleotides allow better annealing and fewer steps than shorter oligonucleotides. The final step sometimes involves an additional polymerase chain reaction (PCR) amplification to yield more material for cloning. In contrast, PCA-based methods are more cost-effective as they rely on short overlapping oligo fragments (15–25 nt) for gene synthesis. However, this approach leads to higher error rates due to the lack of error elimination during hybridization. In addition, targeted diversity can be introduced at the regions where the overlapping oligo fragments hybridize.
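The way a gene is divided into overlapping oligos for assembly can be sketched as follows. This is a minimal illustration under stated assumptions, not a validated design tool: the function names and the default `oligo_len` and `overlap` values are hypothetical, oligos alternate between top and bottom strand, and no melting temperature or secondary structure criteria are applied. The second function reassembles the oligos in silico to check that the tiling reproduces the original gene.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_pca_oligos(gene: str, oligo_len: int = 60, overlap: int = 20) -> list:
    """Tile `gene` with overlapping oligos that alternate between strands."""
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos, start, index = [], 0, 0
    while True:
        fragment = gene[start:start + oligo_len]
        # Even-numbered oligos are top strand, odd-numbered ones bottom strand.
        oligos.append(fragment if index % 2 == 0 else revcomp(fragment))
        if start + oligo_len >= len(gene):
            break
        start += step
        index += 1
    return oligos


def assemble(oligos: list, overlap: int = 20) -> str:
    """Reassemble the tiled oligos in silico and verify every overlap."""
    top = [o if i % 2 == 0 else revcomp(o) for i, o in enumerate(oligos)]
    gene = top[0]
    for fragment in top[1:]:
        if gene[-overlap:] != fragment[:overlap]:
            raise ValueError("adjacent oligos do not overlap as expected")
        gene += fragment[overlap:]
    return gene
```

In this layout, diversity could be introduced by replacing one of the oligos with a degenerate version, provided the overlap regions used for hybridization are left unchanged.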
Although the concentrations of individual oligos on the array are quite low and insufficient for priming, and the error rates of the oligo pools are higher than those of column-based methods, there are successful examples that overcome these challenges. This is done with the use of programmable DNA microchips carrying an array of oligonucleotides, combined with a selection step. To increase the concentration of oligonucleotides for gene synthesis, the oligo fragments must be amplified before assembly. Sequence errors can be detected by hybridizing the synthesized, cleaved oligonucleotides to complementary oligonucleotides spotted on a second array. Finally, the error-free fragments are assembled into full-length sequences. However, this method is not feasible for assembling a large pool of oligos because the high diversity carries a risk of cross-hybridization. Another approach used selective oligonucleotide pool amplification directed by predesigned barcodes to generate the particular DNA fragments required to make a full gene; the barcodes are digested prior to full gene assembly. Recently, this approach was applied to construct a few scFv gene libraries with degenerate oligonucleotides synthesized on two DNA microchips in parallel. The CDR regions of the humanized anti-ErbB2 antibody (HuA21) were diversified via a small perturbation mutagenesis method, and the libraries were validated by deep sequencing on the Illumina platform. Finally, the mutant candidates were screened by phage display to select for high affinity binders [115, 116].
gBlocks gene fragments are ready-to-use, short-to-medium length synthesized DNA fragments that contain the desired gene modifications. gBlocks are dsDNA blocks that undergo controlled synthesis, allowing various applications in antibody and protein engineering. The main applications are gene construction and editing. gBlocks are constructed using gene fragment libraries (pools of short DNA fragments that comprise 18 consecutive N bases or K (G,T) bases). The synthesized product is then subjected to quality control tests such as capillary electrophoresis (fragment length) and mass spectrometry (sequence composition) to verify the final product and reduce potential errors. For gene editing, gBlocks can introduce modifications such as deletions or insertions in relatively short stretches of DNA. Primers are designed to target the region of the gene that is to be edited. Subsequently, the region is cleaved and replaced by the gBlock. This method has allowed the design and generation of antibody libraries [117, 118].
3.3. In vivo approaches towards affinity maturation of antibodies
Bacterial mutator strains, such as Escherichia coli mutants, have been shown to introduce random mutations, such as single-base substitutions, at higher rates than wild-type strains. The mutator strains are characterized by the absence of several DNA repair pathways, resulting in a high mutation rate. Affinity maturation via this approach involves two key steps. Firstly, antibody genes are transformed into and replicated in E. coli mutator cells to introduce random mutations. Next, the mutated antibody clones are screened using display technologies to identify the highest affinity binders. The affinity maturation process requires several rounds of mutation, selection and amplification in order to obtain high affinity mutants. Phage display technology is best coupled with E. coli mutator cells for in vivo mutation of antibody fragments due to the ease of application with phage and phagemid vectors. An E. coli mutator strain such as E. coli mutD5-FIT carries the mutD mutation, the F′ factor for fd phage infection and the supE mutation. This mutant is able to express phage-displayed antibody fragments, where antibody genes are fused to the N-terminus of the gene III protein and are subsequently packaged to form a mature virus particle. Alternatively, another E. coli mutator strain, XL1-Red, carries mutD, mutT and mutS, but is devoid of the F′ episome and therefore cannot be used for phage infection. However, this F′-deficient mutator cell can be converted to an F′ mutator strain by mating with other E. coli strains carrying the F′ episome [120, 121]. The choice of bacterial mutator strain is largely governed by the downstream selection strategy and requires rational consideration. A human antibody fragment that targets the hapten 2-phenyl-5-oxazolone (phOx) was affinity matured 100-fold using the E. coli mutD5 strain; the mutations were located mainly in the CDR loops and less in the framework regions, and improved the binding affinity of the antibody to the target.
In vivo affinity maturation can also be performed via AID-mediated mutagenesis. It fits well with the robust mammalian cell display techniques for selection and affinity maturation of functional antibody clones. As an example, the cDNA of an anti-human complement protein C5 antibody is transfected into HEK293 cells together with the AID enzyme expression plasmid to initiate in vitro SHM. The antibody clones are identified and isolated by fluorescence-activated cell sorting (FACS) with fluorescence-labeled antigen, followed by iterative rounds of SHM to yield high affinity antibodies. In mammalian, yeast and bacterial cell surface display, it is essential to select and isolate the cells that are able to produce functional antibodies. Cell-based sorting, such as high-throughput flow cytometry, allows large numbers of cells to be screened per minute and analyzed according to size, granularity and binding to fluorescence-labeled antigens. The primary goal of antibody engineering is the large-scale production of human monoclonal antibodies; hence, it is crucial to implement selection in antibody-expressing mammalian cells. Mammalian display allows the screening of full-length antibodies expressed in eukaryotic cells with correct glycosylation [18, 123].
The advancement of genome editing technologies offers a new approach to create sequence diversity. Cells can intrinsically repair DNA damage by joining the two ends together or by filling the gap with a similar sequence. However, cells can also repair the break by using a new piece of DNA that carries the desired mutation. This is the basis of today's in vivo genome editing technologies. Extensive functional genomics studies have provided the insights required to introduce targeted DNA double-strand breaks (DSBs). DSBs can induce genome editing via homologous recombination (HR) in the presence of an exogenous homology repair template, or via the error-prone non-homologous end-joining (NHEJ) repair pathway in the absence of a repair template. These two pathways are versatile and allow precise genome modification. To date, there are four major classes of engineered DNA binding proteins used to target DSBs: meganucleases, zinc finger (ZF) nucleases, transcription activator-like effectors (TALEs) and the RNA-guided DNA endonuclease Cas9.
Meganucleases are derived from microbial mobile genetic elements and integrate nuclease and DNA binding domains. ZF nucleases are based on eukaryotic transcription factors containing Cys2His2 DNA binding domains, each bound to a single atom of zinc; the binding domain resembles a set of three fingers, with each finger contacting 3 nucleotides of DNA. TALEs are produced naturally by Xanthomonas sp. and consist of DNA binding domains with 30–35 tandem repeats, with each repeat recognizing a single nucleotide of DNA. The specificity of TALEs is governed by two amino acids known as the repeat-hypervariable diresidues. Both ZF nucleases and TALEs require the FokI nuclease to direct nucleolytic activity towards the genomic locus to be modified. Recently, the RNA-guided DNA endonuclease Cas9 has received much more attention due to its simplicity and versatility for in vivo genome engineering. It is derived from the type II bacterial adaptive immune system. CRISPR-Cas9-mediated immunity is explained in detail in Yang et al. The hallmark of this system is the short guide RNA that recognizes the target DNA through base pairing, forming an RNA-DNA complex. Subsequently, Cas9 creates a DSB in the target sequence [129, 130], and a designed DNA fragment can be specifically incorporated.
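Because target recognition by Cas9 relies on base pairing with the guide RNA, candidate target sites can be listed directly from the sequence. The sketch below rests on assumptions that are not stated in the text above: it uses the commonly described requirement of Streptococcus pyogenes Cas9 for a 20-nt protospacer immediately followed by a 5′-NGG-3′ protospacer adjacent motif (PAM), it scans the top strand only, and it does not evaluate off-target risk. The function name is hypothetical.

```python
import re

# 20-nt protospacer followed by an NGG PAM; lookahead finds overlapping sites.
PROTOSPACER_PAM = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def find_guide_sites(seq: str) -> list:
    """Return (position, protospacer, PAM) tuples for the top strand (0-based)."""
    seq = seq.upper()
    return [(m.start(), m.group(1), m.group(2)) for m in PROTOSPACER_PAM.finditer(seq)]
```

To cover both strands, the same function can be applied to the reverse complement of the sequence, with positions converted back to top-strand coordinates.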
While other approaches have their own limitations, the robustness of the CRISPR-Cas9 system opens up direct endogenous genome editing in virtually any organism of choice. Meganucleases lack target specificity, which is why they are not widely employed. ZF domains have a tendency to crosslink with neighboring protein domains or complexes, resulting in lower binding efficiency towards DNA targets. Although each TALE repeat requires only one nucleotide for binding to the target, the synthesis of novel TALEs is costly due to their repetitive sequences. Nevertheless, these enzymes are constructed in a customizable fashion to cater for the needs of genome editing, and can be programmed for multiplex gene targeting. Several model organisms have been tested with these genome editing technologies, such as zebrafish, rats, mice, Drosophila and C. elegans. Delivery methods for introducing these programmed enzymes include microinjection of stem cells with mRNA encoding the enzymes or direct transfection of a plasmid containing the enzyme cDNA into HEK293 cells [128, 131].
Naïve and synthetic human antibody repertoires are a very valuable source for the selection of antibodies against nearly any antigen. The role display technologies play in the quest to generate monoclonal antibodies from these libraries is obvious with the increasing number of antibody lead candidates going into clinical trials.
Affinity maturation of selected binders is now possible by expressing for example the AID enzyme during selection of antibodies using antibody mammalian cell surface display or by using a pool of microchip-synthesized CDRs incorporated into an antibody framework. Selection of naïve and synthetic recombinant antibodies combined with in vitro and in vivo affinity maturation techniques will have a profound effect on the generation of high affinity diagnostic and therapeutic human antibodies.
The authors would like to acknowledge the support from the Malaysian Ministry of Higher Education under the Higher Institution Centre of Excellence (HICoE) Grant (Grant no. 311/CIPPM/44001005).