Site-Directed Mutagenesis as Applied to Biocatalysts

We constructed a small library by site-directed mutagenesis to explore the effects of replacing H222 with D, Q and E. All mutants showed a greater amount of methyl-glucoside than did the wild-type enzyme, as a result of a change in the alcoholysis/hydrolysis ratio. Mutant H222Q showed an increase in the alcoholysis


Introduction
Enzymes are biological catalysts responsible for supporting almost all of the chemical reactions in living organisms. Their activities, specificities and selectivities make them attractive as biocatalysts for a wide variety of industries, including agrochemicals, detergents, starch, textiles, personal care, pulp and paper, food processing, and animal feed. The chemo-, enantio- and regioselectivities of biological catalysts are hallmarks that make them especially attractive for use in the synthesis of fine chemicals and pharmaceutical intermediates. They are a viable alternative to chemical synthesis, which is usually characterized by low yield and the accumulation of undesirable secondary products. The imminent decrease in the use of fossil fuels has turned attention to new enzyme-based developments for the production of biofuels (e.g., biodiesel) that use renewable raw materials (Cherry & Fidantsef, 2003).
Nevertheless, natural enzymes are often not optimal for use in industrial conditions. It is usually necessary to change the conditions of the process or, most commonly, to alter one or more of the properties of the enzyme. Desirable changes in the enzyme are those that affect substrate specificity, expression level, solubility, activity, selectivity, or stability, particularly thermal stability. Other desired effects could include tolerance to organic solvents or to extreme pH values (Hibbert et al., 2005; Turner, 2009).
Protein engineering usually involves the modification of amino acid sequences at the DNA sequence level by means of chemical or genetic techniques. The resultant protein is then tested for novel, optimal or improved physical and/or catalytic properties (Ulmer, 1983). There are two different basic approaches to engineer proteins, although it is common to combine both approaches for better results.
a. Rational design. Mutations are introduced at specific places in the protein-encoding gene. Positions to mutagenize are chosen on the basis of known or presumed relationships among the sequence, structure, function and/or catalytic mechanism of the protein.
Recently, computational predictive algorithms have been developed and used to preselect promising target sites (Bolon & Mayo, 2001; Kaplan & DeGrado, 2004; Kuhlman et al., 2003; Pavelka et al., 2009; Zanghellini et al., 2006). However, a deep knowledge of the structure and energy functions is required in order to predict the changes required to modify some parameter of the enzyme. This is especially true if one wishes to change the reaction mechanism.
b. Directed evolution. This approach involves repeated cycles of random mutagenesis of the gene and/or recombination with variants of the gene to create a library of genes with slightly different sequences. The enzyme variants thus obtained are submitted to genetic selection or to high-throughput screening to identify those variants with improvements in the desired property (Stemmer, 1994a; Zhao et al., 1998). Directed evolution has been demonstrated to be a very powerful technique, especially for increasing stability (Ladenstein & Antranikian, 1998; Song & Rhee, 2000; Uchiyama et al., 2000; Zhao & Arnold, 1999) or for changing the specificity of an enzyme (Castle et al., 2004; Cohen et al., 2004; Christians et al., 1999; Joerger et al., 2003; Jurgens et al., 2000; Levy & Ellington, 2001; Matsumura & Ellington, 2001; Sakamoto et al., 2001; Song et al., 2002; Zhang et al., 1997). It is a particularly useful approach, since no structural or mechanistic information is required. In many cases, changes that contribute to the improved properties are far from the active site and would not have been targeted by a rational strategy.
Both techniques have strengths, but they also have limitations. For this reason, rational design is often combined with directed evolution. It may be desirable to tune some properties of a designed protein (Savile et al., 2010; Siegel et al., 2010), or randomization may be directed to specific regions of a protein that were identified in the design process.
This review summarizes the most common strategies used to identify possible targets for site-directed mutagenesis to enhance biocatalysis. We include sequence- and structure-based strategies for generating enzymes with desired properties. To illustrate a number of the points discussed above, special attention is paid to the site-directed mutagenesis of glycosyl hydrolases. We also use the modification of alpha amylases as a case study. We describe the sequence-based mutagenesis approach that was used to change the transglycosylation/hydrolysis ratio of alpha amylases. Residues involved in the hydrophobicity and electrostatic environment of the active site were identified by sequence and structural alignments with other glycosyltransferases. As a result, certain residues were targeted for mutagenesis. We also used a multiple sequence alignment and structural information in an approach to reduce the hydrolytic activity of the alpha amylase from Thermotoga maritima, while increasing its alcoholytic activity. Unlike the wild-type parent, the modified enzyme was able to synthesize alkyl-glucosides.

Approaches for selecting targets to mutagenize
Any biochemical, structural, or protein sequence information may be useful for identifying residues that may influence a desired enzyme property. The information may indicate changes that increase or decrease the overall fitness of the enzyme.
A common approach is to focus on regions or positions that may be directly related to the catalytic property. For example, amino acid residues that alter substrate specificity or selectivity are commonly non-conserved residues. They are often in close contact with catalytic residues in or near the active site, cofactors or substrates (Paramesvaran et al., 2009; Park et al., 2005). Another approach is the identification of sequence motifs that are thought to have been conserved during evolution (Saravanan et al., 2008). In contrast, residues thought to be involved in thermostabilization are spread throughout the entire sequence. Each such residue is thought to make a small contribution to thermostability, but the additive effect can be significant. For this reason, random mutagenesis is a powerful tool for achieving protein stabilization. Nevertheless, some features that are known to contribute to protein stability can be implemented by site-directed mutagenesis strategies. These include the introduction of additional disulfide bridges (Mansfeld et al., 1997); a decrease of loop entropy by replacement of some amino acid residues with P or by the shortening of loops (Nagi & Regan, 1997); a change of α-helix propensity by mutations that replace G residues (low α-helix propensity) with A residues (high α-helix propensity); or the introduction of salt bridges to increase electrostatic interactions in the protein (Kumar et al., 2000; Lehmann & Wyss, 2001; Spector et al., 2000).

Alanine scanning
Alanine scanning is a method used to determine the contribution of the side-chains of specific residues in a protein. Substitution of residues with alanine removes all side chain atoms past the β-carbon, without introducing additional conformational changes into the protein backbone. Although mutagenesis by alanine scanning can be a laborious method (because each alanine-mutated protein must be constructed, expressed and analyzed separately), it has nevertheless been useful for the study of interactions at protein-protein interfaces or for the identification of residues involved in substrate recognition (Gibbs & Zoller, 1991), protein stability (Blaber et al., 1995), or binding (Ashkenazi et al., 1990;Cunningham & Wells, 1989).
Alternatives to conventional alanine scanning are computational methods for modeling alanine-scanning mutants. This approach has proven to be useful for predicting active-site residues important for activity (Funke et al., 2005) and to identify amino acid residues important in protein-protein interactions (Kortemme et al., 2004).
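The bookkeeping behind an alanine scan can be sketched in a few lines of Python. This is only an illustration of how the list of single mutants is enumerated; the function name, the toy sequence and the 1-based numbering are assumptions made for the example and do not come from any of the cited studies.

```python
def alanine_scan(sequence, positions=None):
    """Enumerate single alanine mutants of a protein sequence.

    positions are 1-based residue numbers; by default every residue is scanned.
    Residues that are already alanine are skipped, since there is nothing to replace.
    Returns a list of (mutation_label, mutant_sequence) tuples.
    """
    sequence = sequence.upper()
    if positions is None:
        positions = range(1, len(sequence) + 1)
    mutants = []
    for pos in positions:
        wild_type = sequence[pos - 1]
        if wild_type == "A":
            continue
        mutant = sequence[:pos - 1] + "A" + sequence[pos:]
        mutants.append((f"{wild_type}{pos}A", mutant))
    return mutants


# Toy five-residue sequence (hypothetical), scanning positions 2 to 5.
for label, mutant in alanine_scan("MHDAE", positions=[2, 3, 4, 5]):
    print(label, mutant)
```

Each entry in the resulting list corresponds to one construct that would have to be made, expressed and assayed separately, which is why the experimental method is laborious.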

Protein sequence alignment
Nature has had the opportunity to explore the protein sequence space through millions of years of evolution. Genetic drift is thought to be the driving force that is responsible for the sequence diversity observed today. However, residues that are indispensable to function and/or stability have been maintained by selective pressure. Multiple sequence alignments are useful tools for identifying positions that are unchangeable in a protein. They will also identify those regions with the plasticity to allow multiple changes. Briefly, when residues with a common evolutionary origin or having structural or functional equivalence are arranged so that the highly conserved residues are aligned, their alignment serves as an anchor for the alignment of the sequences in a set. Analysis of position-specific residue usage (residue profiles) gives information about amino acid conservation or variability at each position. When a multiple sequence alignment is combined with phylogenetic information, it is possible to explore ancestral relationships among groups of homologous protein sequences. It is also useful for identifying important amino acids that probably cannot be modified.
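As an illustration of a residue profile, the following Python sketch counts residue usage in each column of an alignment and scores variability with Shannon entropy. Entropy is one common choice of conservation measure and is used here as an assumption; the toy alignment and the function names are hypothetical, and gap characters, if present, would simply be counted as another symbol.

```python
import math
from collections import Counter


def column_profiles(alignment):
    """Return one Counter of residue counts per alignment column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [Counter(seq[i] for seq in alignment) for i in range(length)]


def shannon_entropy(counts):
    """Shannon entropy in bits of a residue count table.

    0 means the column is fully conserved; larger values mean more variability.
    """
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Toy alignment of four hypothetical homologs.
toy_alignment = ["MKDLE", "MKELE", "MRDLE", "MKDIE"]
for position, counts in enumerate(column_profiles(toy_alignment), start=1):
    print(position, dict(counts), round(shannon_entropy(counts), 2))
```

Columns with zero entropy are candidates for positions that cannot be changed, whereas high-entropy columns mark regions with the plasticity to accept substitutions.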

Correlating amino acid sequence patterns to specific properties
An approach for identifying residues that may be functionally relevant is to correlate an enzyme property with the amino acid patterns observed in a multiple sequence alignment. For example, comparison of more stable proteins with less stable ones is a strategy for identifying possible thermostabilizing residues (Ditursi et al., 2006;Gromiha et al., 1999;Kumar & Nussinov, 2001;Perl et al., 2000).
Sequence patterns can also be used to identify the determinants of specificity. Good examples are the attempts to change cofactor specificity in dehydrogenases to NAD+, since NAD+ is considerably less energetically demanding for the cell to make than is NADP+ (Flores & Ellington, 2005; Kristan et al., 2007; Rodríguez-Zavala, 2008; Rosell et al., 2003).
Even distant mutations can significantly affect the properties of an active site. They may alter slightly the geometry, electrostatic properties or dynamics of amino acids in the active site. Distant residues that are important for their interactions with the active site may be seen as conserved in a multiple sequence alignment. Multiple sequence alignments have also revealed that some residues are infrequent in a sequence; but nevertheless are frequently adjacent. Such cluster-forming residues have probably coevolved. These "protein sectors" are often critical for specific functional roles, including substrate binding, stability, allosteric regulation or catalytic activity (Halabi et al., 2009).

Consensus sequence
The method of using a consensus sequence is based on the assumption that, in an amino acid sequence alignment of homologous proteins, the consensus amino acid at a given position contributes more to the stability or the function of the protein than does a nonconsensus residue. This assumption is based on the belief that a consensus sequence may closely mimic the sequence of an ancestral protein. One hypothesis posits that many proteins were originally thermophilic or hyperthermophilic (Di Giulio, 2003). Under this premise, the consensus sequence has been used to improve the thermostability of several enzymes by mutating several residues towards the consensus sequence obtained from a multiple sequence alignment. In some cases, the complete consensus sequence was reconstructed (Lehmann et al., 2000; Sullivan et al., 2011). In others, point mutations were used to identify the residues that increased stability, and these were then combined to increase the thermostability of the protein (Maxwell & Davidson, 1998; Nikolova et al., 1998; Yamashiro et al., 2010).
Similarly, an enzyme property has been improved by mutagenizing the codon for a residue to codons for those amino acids that appear frequently in natural enzymes at the equivalent position. Evolution probably selected these residues, so they are unlikely to perturb the folding or the function of the protein. In contrast, absent and rarely occurring residues are the ones that are probably not allowed. Their rareness suggests that they may be deleterious to the protein. This approach was used to improve the activity and enantioselectivity of an esterase from Pseudomonas fluorescens (PFE). The amino acid distribution at four positions near the active site of PFE, previously reported to influence the enantioselectivity of the enzyme, was determined by a structure-guided multiple-sequence alignment of 171 esterases generated by the 3DM database (Kuipers et al., 2010). A library was created by site-directed mutagenesis of the coding regions for the four active site positions in PFE. Substitutions were limited to frequently occurring residues. Almost all mutants in the library showed significantly improved activity towards a commonly used esterase substrate. Moreover, one mutant had its specific activity enhanced 240-fold relative to that of the wild-type enzyme. The mutant also exhibited substantially higher enantioselectivity in the hydrolysis of 3-phenyl butyric acid p-nitrophenyl ester (E=80) compared to that of the almost nonselective wild-type enzyme (E=3.2).
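The two ideas in this section, mutating towards the consensus and restricting substitutions to frequently occurring residues, can be sketched as follows. This is a minimal illustration under stated assumptions: the target sequence is assumed to be already aligned to the family without gaps, the 20% frequency cutoff is arbitrary, and the sequences and function names are hypothetical rather than taken from the PFE study or the 3DM database.

```python
from collections import Counter


def consensus_and_allowed(alignment, min_fraction=0.2):
    """For each column, return (consensus residue, residues seen in at least min_fraction of sequences)."""
    n_seqs = len(alignment)
    result = []
    for column in zip(*alignment):
        counts = Counter(column)
        # Ties are broken alphabetically so the output is deterministic.
        consensus = max(sorted(counts), key=counts.get)
        frequent = sorted(r for r, n in counts.items() if n / n_seqs >= min_fraction)
        result.append((consensus, frequent))
    return result


def back_to_consensus(target, alignment):
    """List substitutions (for example 'R2K') that move the target towards the consensus."""
    profile = consensus_and_allowed(alignment)
    return [
        f"{residue}{position}{consensus}"
        for position, (residue, (consensus, _)) in enumerate(zip(target, profile), start=1)
        if residue != consensus
    ]


family = ["MKDLE", "MKELE", "MRDLE", "MKDIE"]  # toy alignment, hypothetical
print(back_to_consensus("MRELE", family))
print(consensus_and_allowed(family, min_fraction=0.25)[1])
```

The first call proposes candidate stabilizing substitutions; the second lists which residues would be allowed at position 2 if a library were limited to residues that occur in at least a quarter of the family.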

Design of ancestral proteins
As mentioned above, one hypothesis suggests that ancestral proteins were able to withstand the harsh conditions prevalent on earth at that time (Di Giulio, 2001). In addition to their thermostability, ancestral enzymes may have been promiscuous with respect to substrates. The evolution theory of proteins holds that current proteins evolved from low-specificity ancestral proteins. Because of their low specificity, the ancestral proteins evolved to become more efficient at using specific substrates. Thus, the reconstruction of ancestral sequences from multiple sequence alignments and phylogenetic trees may provide the opportunity to change enzyme specificity. Several methods based on this approach have been reported. Some of them are given below.
In the Ancestral Library method, all residues located close to or within the enzyme's active site are mutated to residues predicted by phylogenetic analysis and ancestral inference. The substitutions are those residues found in the hypothetical proteins at various nodes and branches of the evolutionary trajectories of a given enzyme family. They do not reflect the entire diversity seen in existing family members (Alcolombri et al., 2011).
Alcolombri and coworkers (2011) used serum paraoxonases and cytosolic sulfotransferases (SULTs) as models. In order to promote changes in substrate specificity, they constructed ancestral libraries of enzymes. Their mutagenesis was directed to residues near or within the active site of an enzyme. From a phylogenetic tree, the most probable ancestral sequences were obtained for all nodes. Using these sequences as templates and the three-dimensional structure of the enzyme, residues in and near the active site were located. The ancestral residues were identified, and the relevant altered enzymes constituted a library of mutants. After activity screening, several variants with different activities and specificities were identified. Some mutants had up to 50-fold higher activity than the activity of the starting enzyme.
REAP (Reconstructing Evolutionary Adaptive Paths) analysis uses phylogeny to identify mutations in gene sequences that are thought to have emerged from a common universal ancestor during functional divergence. The findings are used to generate focused and functionally enriched enzyme libraries (Chen et al., 2010).
REAP was implemented to identify differences in the sequences of promiscuous viral polymerases and non-viral polymerases. The differences may be responsible for the functional divergence without loss of catalytic activity. Sequence alignments and a phylogenetic tree of 719 polymerases were constructed. Ancestral protein sequences were inferred from the collection of sequences at the nodes of the tree. REAP identified sites that may have changed during the separation of viral and non-viral polymerases. In one example, mutations of the residues identified by REAP analysis for the DNA polymerase of Thermus aquaticus yielded 8 mutants that showed a change in the substrate specificity for unnatural dNTPs (Chen et al., 2010).
Similarly, the Evolutionary Trace method correlates evolutionary variations within a gene of interest with divergence in the phylogenetic tree of that sequence family. This method has been shown to reveal the functional importance of residues (Lichtarge et al., 1996;Lichtarge et al., 2003).

SCA (Statistical Coupling Analysis)
Well-separated mutations can significantly affect the activity, specificity, or enantioselectivity of an enzyme by slightly altering the geometry, electrostatic properties or dynamics of amino acids in the active site. Moreover, it is known that physically contiguous residues form "protein sectors" that can be critical for specific functional roles, including substrate binding, stability, allosteric regulation and catalytic activity (Halabi et al., 2009).
Statistical Coupling Analysis (SCA) is a method that estimates the co-evolution between pairs of amino acids in the multiple sequence alignment of a protein family. SCA shows that proteins can be divided into "protein sectors." In several different proteins, the sectors correspond to amino acids that are physically contiguous. These amino acids often underlie various aspects of function, allosteric regulation, binding, catalytic specificity, and/or fold stability. An application of SCA revealed networks of small subsets of residues that link distant functional sites and cooperate in allosteric communication (Suel et al., 2003).
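The idea of detecting co-evolving positions can be illustrated with a much simpler statistic than the one used in SCA. The Python sketch below computes the plain mutual information between pairs of alignment columns; it is a stand-in chosen for brevity, not the SCA algorithm of Suel et al. (2003) or Halabi et al. (2009), and the toy alignment, threshold and function names are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_a, col_b):
    """Mutual information in bits between two alignment columns of equal length."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b), written with raw counts to limit rounding error
        mi += p_ab * math.log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def coupled_pairs(alignment, threshold=0.5):
    """Return (position_i, position_j, MI) for column pairs with MI >= threshold, strongest first."""
    columns = list(zip(*alignment))
    pairs = []
    for i, j in combinations(range(len(columns)), 2):
        mi = mutual_information(columns[i], columns[j])
        if mi >= threshold:
            pairs.append((i + 1, j + 1, mi))
    return sorted(pairs, key=lambda item: item[2], reverse=True)


# Toy alignment in which positions 2 and 3 always change together.
toy_alignment = ["AKDL", "AKDL", "AREL", "AREL"]
print(coupled_pairs(toy_alignment))
```

Fully conserved columns carry no mutual information, so only positions that vary together are reported. Real analyses need many sequences and corrections for phylogenetic bias that this sketch omits.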

Structure-based mutagenesis
The evolution of proteins involves mutations of single residues, insertions, deletions (Pascarella & Argos, 1992), gene duplications, fusions, exon duplications and shuffling (Grishin, 2001). Such changes, which accumulate over time, make the identification of sequence similarities very difficult. However, structure is more conserved than sequence and can be used as evidence of homology among proteins. Comparative analyses of protein sequences and structures are important approaches for the identification of structural, evolutionary and functional relationships between proteins.
The rapidly growing number of protein structures in the Protein Data Bank (PDB) and advances in homology modeling are of great value for generating structural alignments. In general, these methods provide a measure of structural similarity between proteins. They also generate an alignment that defines the residues that have structurally equivalent positions in the proteins being compared. Homology modeling can be done even when no sequence similarity is detected. Based on structural alignments, it is possible to identify residues in direct contact with the substrate or near the active-site cavity. A more complex analysis can even look into enzyme locations that are far from the active site, but are part of a network of interactions that hold the active site together. The residues at these locations can be targeted for mutagenesis. The HotSpot Wizard server combines information from extensive sequence and structure database searches with functional data to create a mutability map for a target protein. This approach was validated by comparing "hot spot" predictions with mutations extracted from the literature (Pavelka et al., 2009).

Site-directed Saturation Mutagenesis (SDSM)
The Site-directed Saturation Mutagenesis (SDSM) approach consists of using all 20 amino acids at a position in a protein. Based on structural knowledge, it may be sufficient to target the active-site residues (Park et al., 2005;Schmitzer et al., 2004;Wilming et al., 2002;Woodyer et al., 2003). SDSM can be used to complement error-prone PCR. One of the limitations of using error-prone PCR to generate variants of a protein is that the sequence exploration is limited to an average of seven amino acid substitutions per residue. Once positions that seem to be important for improving a property of a protein are identified by error-prone PCR, SDSM will tune the optimization by testing all 20 amino acids in those positions. Multiple positions can be mutagenized simultaneously by SDSM if only a few positions are being explored. However, the number of variants increases exponentially with the number of positions being explored. Therefore, if more than 3 positions are being randomized, it is better to carry out successive targeted randomizations at the positions.
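The exponential growth in library size can be made concrete with a short calculation. The sketch below assumes that each position is randomized with an NNK degenerate codon (32 codons covering all 20 amino acids), which is a common choice but is not stated in the text, and it estimates the screening effort with the usual approximation that every variant is equally likely, so that the expected coverage after picking T clones from V variants is 1 - exp(-T/V). The function names are hypothetical.

```python
import math


def library_sizes(n_positions, codons_per_position=32, amino_acids=20):
    """Return (protein variants, DNA variants) for simultaneous saturation of n positions."""
    return amino_acids ** n_positions, codons_per_position ** n_positions


def clones_for_coverage(n_variants, coverage=0.95):
    """Clones to screen so that the expected fraction of variants sampled reaches `coverage`.

    Assumes all variants are equally probable: T = -V * ln(1 - coverage).
    """
    return math.ceil(-n_variants * math.log(1.0 - coverage))


for n in range(1, 5):
    proteins, codons = library_sizes(n)
    print(n, proteins, codons, clones_for_coverage(codons))
```

Under these assumptions the screening effort is roughly three times the number of codon combinations, so each additional randomized position multiplies the work by about 32. This is the arithmetic behind the recommendation to randomize more than three positions in successive rounds rather than simultaneously.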

Combinatorial Active-site Saturation Test (CAST)
The Combinatorial Active-site Saturation Test (CAST) was developed to increase the enantioselectivity and/or the substrate specificity of enzymes. The basis of the method is the generation of small libraries of mutant enzymes that are easy to screen for activity. The mutants are produced by simultaneous randomization of sets of two or three spatially close amino acids, whose side chains form part of the substrate-binding pocket.
Application of this methodology allowed the expansion of the substrate specificity of Pseudomonas aeruginosa lipase (PAL). Based on the crystal structure, pairs of amino acids surrounding the binding pocket were defined; and the corresponding libraries were created separately by simultaneous saturation mutagenesis at each pair. The libraries were screened for activity with different substrates. The best-performing variants were selected (Reetz et al., 2005). Further optimization was achieved by iterative cycles of CASTing (Reetz et al., 2006a). Mutants that enhanced a given catalytic property were selected. The residue positions thought to be responsible were organized into groups of two or three. Each group was randomized by saturation mutagenesis to create libraries that were subsequently screened. The best hit of those libraries was used as the template for the next round of mutagenesis. Variability at the other sites was introduced by another round of saturation mutagenesis. The process was continued until the desired degree of catalyst improvement had been achieved. Iterative screening has been applied to enhance very different catalytic properties, including thermostability (Reetz et al., 2006b), substrate acceptance and enantioselectivity (Clouthier et al., 2006;Reetz et al., 2006a).

B-factor iterative test
The B-factor iterative test is used to modify enzyme thermostability by increasing rigidity at sites to help prevent unfolding. The selection of target residues is made on the basis of crystallographic B-factor data. This value reflects the degree to which the measured electron density for a particular atom spreads out. It is strongly influenced by thermal fluctuations and the mobility of the atom. Residues with the highest B factors have high flexibility. Appropriate mutations lead to enhanced rigidity and, therefore, to higher thermostability. Target sites are chosen as sites for iterative saturation mutagenesis (ISM), in which each of the target residues is mutagenized to saturation. The best mutant of the first screening is then used as the template for a second round of saturation mutagenesis at one of the other selected sites. The cycle is repeated in an iterative manner (Reetz et al., 2006a).
This method has been used to enhance the thermostability and the tolerance to organic solvents of the mesophilic lipase (LipA) from Bacillus subtilis. After several rounds of ISM at residues with high B-factors, a mutant was obtained with the inactivation temperature increased from 48 °C to 93 °C and with improved robustness towards organic solvents, without affecting the activity of the enzyme (Reetz et al., 2006b; Reetz et al., 2010). Similarly, Jochens and coworkers increased the Tm of the esterase from Pseudomonas fluorescens by 9 °C using smart libraries guided by the B-factor. Conversely, by guiding ISM at residues displaying the lowest B-factors, Reetz and coworkers (Reetz et al., 2009) were able to create a lipase from Pseudomonas aeruginosa with a Tm value decreased from 71.6 °C to 35.6 °C, without affecting the catalytic profile of the enzyme.
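The selection of target residues from crystallographic data can be sketched as a ranking of residues by average B-factor. The Python example below reads ATOM records using the fixed-column PDB layout (residue name in columns 18-20, chain in column 22, residue number in columns 23-26, B-factor in columns 61-66). Averaging over all atoms of a residue is an assumption made here for simplicity, and the helper that builds toy records, the function names and the B values are hypothetical.

```python
from collections import defaultdict


def rank_residues_by_b_factor(pdb_lines, top=5):
    """Average the B-factor over the atoms of each residue and return the `top` highest."""
    b_values = defaultdict(list)
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        res_name = line[17:20].strip()
        chain = line[21]
        res_seq = int(line[22:26])
        b_factor = float(line[60:66])
        b_values[(chain, res_seq, res_name)].append(b_factor)
    averages = [(key, sum(values) / len(values)) for key, values in b_values.items()]
    averages.sort(key=lambda item: item[1], reverse=True)
    return averages[:top]


def toy_atom_line(serial, atom, res_name, chain, res_seq, b_factor):
    """Build a minimal ATOM record with dummy coordinates (for the toy example only)."""
    return (f"ATOM  {serial:5d} {atom:<4s} {res_name:>3s} {chain}{res_seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b_factor:6.2f}")


toy_records = [
    toy_atom_line(1, "N", "GLY", "A", 10, 40.0),
    toy_atom_line(2, "CA", "GLY", "A", 10, 50.0),
    toy_atom_line(3, "N", "ALA", "A", 11, 10.0),
    toy_atom_line(4, "CA", "ALA", "A", 11, 20.0),
    toy_atom_line(5, "N", "SER", "A", 12, 30.0),
]
for residue, mean_b in rank_residues_by_b_factor(toy_records, top=2):
    print(residue, mean_b)
```

The highest-ranked residues would be the candidate sites for saturation mutagenesis when the goal is increased rigidity; sorting in the opposite direction would give the low B-factor sites used in the destabilization experiment described above.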

Site-directed homologous recombination
Most proteins are only marginally stable. For this reason, the accumulation of a few mutations is sometimes sufficient to destabilize a protein. The introduction of variability by recombination with the structural gene for a homolog is often less perturbing for folding than is mutagenesis to introduce point mutations. The reason is that some amino acid changes introduced by recombination have already been selected by Nature to give a particular structure and function. Because recombination is more conservative than mutagenesis, several research groups have tried to introduce variability in a sequence by constructing chimeras with genes for homologous proteins. This can be done either randomly (Crameri et al., 1998;Minshull & Stemmer, 1999) or in a site-directed fashion (Landwehr et al., 2007;Li et al., 2007;Pantazes et al., 2007). By substituting a homologous segment, some interactions may be perturbed; and the protein might not be functional. The less perturbing sites for chimeragenesis are thus identified, and a library of recombinants is then constructed by recombining DNA fragments from different homologous genes. The resulting library is screened for a specific property. The properties have included thermostability, activity towards different natural and non-natural substrates, and/or specificity. The power of this technique, compared to random recombination strategies, is that the libraries constructed have a high percentage of folded proteins, thus making it easier to find interesting variants.

Site-directed loop exchange in proteins
With the same idea as the site-directed chimeragenesis described above, site-directed loop exchange is based on introducing variability only in the binding and/or catalytic loops; the rest of the structure is perturbed very little. The basis of this strategy is that a good portion of catalytic and molecular recognition sites are in loops, while the rest of the structure is the scaffold that maintains the residues and the network of interactions in place to create an environment suitable for catalysis. By exchanging loops, the sequence can be different from that of the parental protein, but the stability and the folding of the protein are maintained. The technique has been widely used, especially for antibodies. It has been recognized that the binding specificity of antibodies relies on the loops of the variable regions (Clark et al., 2009). More recently, import of loops from natural enzymes has been carried out to explore novel activities in a given scaffold (Park et al., 2006). In our laboratory, we have developed a strategy that allows systematic loop exchange from eight different proteins with a TIM barrel fold into a TIM barrel scaffold. (A TIM fold is characterized by a barrel formed by eight parallel β-strands surrounded by seven or eight α-helices. The loops that join the β-strands to the α-helices at the top of the barrel form the active site in proteins sharing this fold.) We demonstrated that the libraries generated had a high percentage of properly folded proteins (Ochoa-Leyva et al., 2009).

Glycosyl hydrolases
Glycosyl hydrolases (also called glycosidases) constitute a widespread group of enzymes that catalyze the hydrolysis of the glycosidic bond. Glycosyl hydrolases can hydrolyze the glycosidic linkage between two or more carbohydrates or between a carbohydrate and a non-carbohydrate moiety to release smaller sugars. They are classified as exo- and endoglycosidases, depending on their ability to cleave a substrate at the end (the nonreducing end) or in the middle of an oligosaccharide or polysaccharide chain, respectively. Glycosyl hydrolases have been classified into more than 100 families based on amino acid sequence similarities (Davies & Henrissat, 1995; Henrissat, 1991; Henrissat & Bairoch, 1993). This classification system, available on the CAZy (CArbohydrate-Active EnZymes) web site (Cantarel et al., 2009), allows reliable prediction of evolutionary relationships, mechanism (retaining/inverting), active site residues and possible substrates. It is even a reasonable tool for newly sequenced enzymes for which function has not yet been biochemically demonstrated.

Reactions catalyzed by glycosyl hydrolases
In most cases, the hydrolysis of the glycosidic bond is catalyzed by two amino acid residues: a general acid (proton donor) and a nucleophile/base. Depending on the spatial positions of these catalytic residues, hydrolysis occurs via overall retention or overall inversion of the anomeric configuration (Davies & Henrissat, 1995; McCarter & Withers, 1994; Sinnott, 1990).

Inverting glycosyl hydrolases
Inverting enzymes act by a single-step, acid/base-catalyzed mechanism. Two residues, typically glutamic or aspartic acids located 6-11 Å apart, act as acid and base. The leaving group is directly displaced by the nucleophilic water with a single inversion at the anomeric centre (Fig. 1).
Fig. 1. Inversion hydrolysis mechanism of glycosyl hydrolases.

Retaining glycosyl hydrolases
Retaining glycosidases act through a double-displacement mechanism (each step resulting in inversion at the anomeric centre) involving a covalent glycosyl-enzyme intermediate (Fig. 2). The reaction is catalyzed with acid/base and nucleophilic assistance provided by two amino acid side chains, typically glutamate or aspartate, located 5.5 Å apart. In the first step (glycosylation), one residue plays the role of a nucleophile. It attacks the anomeric centre to displace the aglycon and form a glycosyl-enzyme intermediate. At the same time, the other residue functions as an acid catalyst and protonates the glycosidic oxygen as the bond cleaves. In the second step (deglycosylation), the now deprotonated acid/base carboxylate functions as a base to activate the incoming nucleophile (water, saccharide or alcohol), to which the glycosyl moiety is transferred from the enzyme intermediate to give the product.
Fig. 2. Retaining mechanism of glycosyl hydrolases. In a first reaction (a), glycosidic bond breakage of the donor saccharide is carried out and a glycosyl-enzyme complex is formed. In (b), the incoming acceptor molecule (water) is activated to promote the release of sugar. If the incoming nucleophile is different from water, the enzyme carries out a transfer reaction, called transglycosylation if the incoming molecule is an oligosaccharide (c), or alcoholysis if the incoming nucleophile is an alcohol (d).

Industrial uses of glycosyl hydrolases
In nature, glycosyl hydrolases catalyze the degradation of diverse glycosylated polymers, like starch, glycogen, cellulose and hemicellulose. They also participate in anti-bacterial defense strategies (e.g., lysozyme), in pathogenesis mechanisms (e.g., viral neuraminidases) and in normal cellular functions. Glycosyl hydrolases are also of great importance to industry. For example, in the food industry, enzymes like invertase and amylase are employed for the manufacture of invert sugar or maltodextrins; in the paper and pulp industry, xylanases are used to remove hemicelluloses from paper pulp; cellulases are widely used in the textile industry and in laundry detergents; and recently, cellulases and xylanases have been used in the conversion of lignocellulosic biomass into forms suitable for biofuel production.
However, in most cases, glycosyl hydrolases are not optimal in industrial conditions. It is often necessary to alter their stabilities, catalytic activities and/or substrate specificities by protein engineering methods. One of the most frequently altered enzyme properties is thermostability. It can be a limiting factor in the selection of enzymes for industrial applications because of the elevated temperatures or the extreme pH of many biotechnological processes. The stability of an enzyme can be improved by site-directed mutagenesis (Ben Mabrouk et al., 2011; Ghollasi et al., 2010; Leemhuis et al., 2004; Liu et al., 2008; Yin et al., 2011). One of the most exhaustive efforts was made by Palackal and coworkers (Palackal et al., 2004), who used saturation mutagenesis for each of the 189 amino acid residues of a xylanase. They generated a library of modified enzymes, each altered at a single position. This library was then screened for variants with increased thermostability, and nine single amino acid changes that contribute to increased stability were identified. These nine single substitutions were then combinatorially assembled to generate all 512 possible variants. Another round of screening identified eleven enzymes with melting temperatures up to 35°C higher than that of the wild-type enzyme.
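The size of this combinatorial library follows from simple counting: each of the nine substitutions is either present or absent, which gives 2^9 = 512 combinations, the unmodified parent included. The following Python sketch enumerates them. The labels S1 to S9 are placeholders and not the substitutions actually reported by Palackal et al. (2004); the function name is likewise only illustrative.

```python
from itertools import product

# Placeholder labels for nine stabilizing single substitutions.
# These are NOT the real mutations from the xylanase study.
substitutions = [f"S{i}" for i in range(1, 10)]


def enumerate_variants(subs):
    """Return every combination of the given substitutions as a tuple.

    Each substitution is either kept or left out, so the result has
    2 ** len(subs) entries, including the empty tuple (the parent enzyme).
    """
    variants = []
    for mask in product([False, True], repeat=len(subs)):
        variants.append(tuple(s for s, keep in zip(subs, mask) if keep))
    return variants


variants = enumerate_variants(substitutions)
print(len(variants))  # 512
```

In practice such a library is built at the DNA level and screened experimentally; the enumeration only shows why nine positions give a library small enough to screen exhaustively.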
Another enzyme property that is desirable to modify is the optimum pH. For example, in soybean β-amylase, the hydrogen bond networks around the catalytic base residue (E380) of the enzyme were removed by point mutations, raising the optimal pH from 5.4 to the more neutral pH range of between 6 and 6.6 (Hirata et al., 2004a;Hirata et al., 2004b).
In vivo, glycosidases catalyze the hydrolysis of glycosidic linkages. However, in vitro, they can be used as synthetic catalysts to form glycosidic bonds. This can be accomplished by reverse hydrolysis (the thermodynamically controlled approach) or by transglycosylation (the kinetically controlled approach; Fig. 2c and d). Transglycosylation by glycosidases has been employed to synthesize unusual glycosides that are difficult to obtain by other methods. Several site-directed mutagenesis strategies have been used to increase the transglycosylation activity of glycosidases or to change their substrate specificity. For example, the β-glycosidase from Sulfolobus solfataricus has been rationally modified to accept a wider range of substrates in transglycosylation reactions. Site-directed mutagenesis was used to alter two key residues involved in substrate recognition to provide access to many different glycoside linkages, including the especially problematic β-mannosyl and β-xylosyl linkages (Hancock et al., 2005). We will focus our discussion on the protein engineering work on α-amylases carried out by us and others.

Alpha-amylases
α-Amylases (EC number 3.2.1.1) belong to family 13 of the glycosyl hydrolases. They catalyze the hydrolysis of internal α-1,4-glycosidic linkages of starch, liberating poly- and oligosaccharide chains of varying lengths. They are found in both eubacteria and eukaryotes. They have a large number of different substrate specificities, as well as huge variations in both temperature and pH optima (Vihinen & Mantsala, 1989).
α-Amylases are the starting enzymes in the industry of modification and conversion of starch. This is because of their capacity to catalyze reactions under environmentally friendly conditions and without the addition of expensive activated sugars (Buchholz & Seibel, 2008).
In the sugar-producing industry (Nielsen & Borchert, 2000), bacterial and fungal α-amylases of family GH13, particularly those of Bacillus species, play a vital role in the starch liquefaction process. Starch from wheat, maize and tapioca is hydrolyzed to oligosaccharides by the thermostable α-amylase from Bacillus licheniformis. The oligosaccharides are then saccharified to glucose by glucoamylase (Crabb & Shetty, 1999). According to the degree of hydrolysis of starch, α-amylases are divided into two categories: (1) saccharifying α-amylases, which hydrolyze 50 to 60% of the saccharide bonds, and (2) liquefying α-amylases, which hydrolyze about 30 to 40% of them (Fukumoto & Okada, 1963). The enzyme commonly used in the industrial process is the α-amylase from Bacillus licheniformis. It has the great advantage of being thermostable. This enzyme thus allows the fast hydrolysis of starch at the high temperatures required to dissolve it, with the consequent decrease in viscosity, before the temperature is lowered for the addition of the next enzyme in the process. Some of the disadvantages of using different enzymes are that the pH or temperature conditions may need to be adjusted during the process, with a consequent increase in time and costs and the introduction of salts (buffers) that have to be removed from the final product. Thus, several research groups, including ours, have tried to engineer α-amylases to change their product profiles (i.e., to make them more saccharifying), to increase their optimal temperatures and to widen their pH spectra.
All α-amylases consist of three domains, called A, B and C. Domain A contains the catalytic residues and has four conserved sequence regions (numbered I-IV) (Mackay et al., 1985; Nakajima et al., 1986; Rogers, 1985), which have been postulated to be essential for the function of α-amylase. Among α-amylase sequences, the four regions align and are spaced at similar intervals along the proteins. These regions presumably form the active site cleft, the substrate-binding site, and the site for binding the stabilizing calcium ion. Domain B forms a large part of the substrate-binding cleft, and it is presumed to be important for the substrate specificity differences observed among α-amylases (MacGregor, 1988). It is the least conserved domain among α-amylases (Guzman-Maldonado & Paredes-Lopez, 1995). Finally, domain C constitutes the C-terminal part of the sequence and seems to be involved in substrate binding. All known α-amylases contain a conserved calcium ion located at the interface between domains A and B (Boel et al., 1990; Kadziola et al., 1998; Machius et al., 1998; Machius et al., 1995). The calcium ion is known to be essential for enzyme stability (Vallee et al., 1959).
Depending on the enzyme, the active site cleft can accommodate from four to ten glucose units, each one bound by the amino acid residues that constitute the binding subsite for that glucose unit. Subsites are numbered according to their position relative to the scissile bond. In α-amylases there are two or three subsites on the reducing side of the scissile bond (subsites +1, +2 and +3). The number of subsites on the non-reducing side of the scissile bond varies between two and eleven (subsites -1, -2, ... -11) (Brzozowski et al., 2000; Davies et al., 1997; MacGregor, 1988). The number of subsites and their affinities are among the factors that determine the final product profiles of α-amylases (Kandra et al., 2002; Matsui et al., 1992a).
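The subsite numbering convention can be made concrete with a short Python sketch. It assumes only the convention described above: negative subsites lie on the non-reducing side, positive subsites on the reducing side, there is no subsite 0, and cleavage occurs between subsites -1 and +1. The function names and the example binding mode are illustrative and do not describe any particular enzyme.

```python
def subsite_labels(n_minus, n_plus):
    """List subsite labels from the non-reducing to the reducing end.

    n_minus: number of subsites on the non-reducing side (-n ... -1)
    n_plus:  number of subsites on the reducing side (+1 ... +m)
    """
    if n_minus < 1 or n_plus < 1:
        raise ValueError("need at least one subsite on each side of the scissile bond")
    minus = [f"-{i}" for i in range(n_minus, 0, -1)]
    plus = [f"+{i}" for i in range(1, n_plus + 1)]
    return minus + plus


def cleavage_products(chain_length, units_on_minus_side):
    """Fragment lengths after cleavage between subsites -1 and +1.

    Returns (non-reducing-end fragment, reducing-end fragment) for a linear
    oligosaccharide with `units_on_minus_side` glucose units bound on the
    non-reducing side of the scissile bond.
    """
    if not 0 < units_on_minus_side < chain_length:
        raise ValueError("the scissile bond must lie inside the chain")
    return units_on_minus_side, chain_length - units_on_minus_side


print(subsite_labels(3, 2))     # ['-3', '-2', '-1', '+1', '+2']
print(cleavage_products(7, 5))  # (5, 2)
```

For example, a maltoheptaose (G7) bound with five units on the non-reducing side would be split into G5 and G2, which illustrates how the number and affinity of the subsites shape the product profile.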

Transglycosylation reactions in alpha-amylases
Like other retaining glycosidases, α-amylases, particularly saccharifying amylases, can also catalyze transfer reactions, which result from the use of molecules other than water (e.g., carbohydrates or alcohols) as glucosyl acceptors (Fig. 2c and d, respectively). When a high-molecular-weight alcohol is used as an acceptor, the products are alkyl-glucosides. These molecules are highly surface-active, a property with important applications in several industries. Although various retaining glycosidases, like β-galactosidase (Moreno-Beltran et al., 1999; Svensson, 1994), β-xylosidase (Shinoyama et al., 1988), β-fructofuranosidase (Rodríguez et al., 1996; Straathof et al., 1988), and β-glucosidase (Chahid et al., 1992; Vulfson et al., 1990), have been used in alcoholysis reactions, the use of a readily available substrate, like starch, gives α-amylases great potential in the catalysis of this type of reaction.
We found a correlation between the efficiency of hydrolysis and the capacity of the enzymes to carry out transglycosylation reactions. A plausible hypothesis is that those enzymes that are able to transglycosylate can recycle the intermediate-size oligosaccharides produced during hydrolysis to generate longer ones that are better substrates. This would result in a more saccharifying pattern at equilibrium. Transglycosylation activity is not reported in the bacillary α-amylases used in the starch-processing industry. We decided to introduce this activity by engineering the liquefying α-amylases from Bacillus stearothermophilus (Saab-Rincon et al., 1999) and Bacillus licheniformis (Rivera et al., 2003). We tried to identify residues that could be responsible for transferase activity. Kuriki and coworkers (Kuriki et al., 1996) suggested three residues that are likely to be responsible for controlling the water activity in the active site of neopullulanase, a natural transferase from Bacillus stearothermophilus. When one of these residues (Y377) was mutated to a non-polar residue, the transglycosylation reaction was favored because of a change in the transglycosylation/hydrolysis ratio. We carried out a multiple sequence alignment of α-amylases and cyclodextrin glycosyltransferases (CGTases) and identified a residue (A289 in the Bacillus stearothermophilus α-amylase) that is analogous to Y377 in the Bacillus stearothermophilus neopullulanase. The Bacillus stearothermophilus α-amylase is a liquefying enzyme unable to carry out transglycosylation reactions (Fig. 3). We used site-directed mutagenesis to change the A at residue 289 to Y and F, which are present in natural transferases, like neopullulanases and CGTases. The two mutants that were generated were able to carry out the transfer reaction not only to other saccharides (Fig. 4) but also to alcohols, like methanol, to produce methyl-glucosides.
The A289Y mutant was more efficient at catalyzing transfer reactions than was A289F (Fig. 5) (Saab-Rincon et al., 1999). Apparently the hydrophobic nature of the mutant residues and the electrostatic interactions that may affect the geometry of the side chains in the active site are important for the transglycosylation reaction. In contrast, when the same mutations were introduced at the equivalent position (V286) in the α-amylase from Bacillus licheniformis, the V286Y mutant showed an increase in hydrolytic activity, whereas the V286F mutant had a higher transglycosylation/hydrolysis ratio (Rivera et al., 2003).
The product profiles obtained with wild-type (WT) and mutant (A289F and A289Y) Bacillus stearothermophilus α-amylases are compared at 0, 1 and 5 hours of reaction. We used as standards a mixture of oligosaccharides (1) and methyl-glucoside (2). Although the wild-type enzyme and the A289F and A289Y mutants showed similar hydrolysis and transglycosylation patterns, the mutants showed products between the glucose and methyl-glucoside standards that could be attributed to alcoholysis reactions. Presumably, those spots for which there are no molecular weight markers correspond to alkyl-oligosaccharides.
In contrast to the bacterial liquefying α-amylases from B. licheniformis and B. stearothermophilus, several fungal amylases, like those from Aspergillus niger and Aspergillus oryzae, have the ability to carry out alcoholysis reactions. These two fungal amylases are saccharifying enzymes that produce maltose, maltotriose and some glucose. Santamaria and coworkers demonstrated that these enzymes were able to carry out alcoholysis reactions in the presence of methanol and starch as substrate, even at high methanol (20%) and starch (15%) concentrations. Although the alcoholysis reaction was reported in the α-amylase from A. oryzae using aryl-maltoside and either methanol, ethanol or butanol as substrates (Matsubara, 1961), the alcoholysis reaction with starch as substrate is less efficient. However, the direct use of these enzymes for the production of alkyl-glucosides is precluded by the high temperature required for starch solubilization. The use of a thermophilic saccharifying α-amylase would be attractive, not only in the development of alcoholysis reactions, but also in the starch-processing industry. Liebl et al. (1997) described an extracellular α-amylase (AmyA) produced by the hyperthermophilic bacterium Thermotoga maritima MSB8. The enzyme is a saccharifying amylase with an optimum temperature of 85°C. It can hydrolyze internal α-1,4-glycosidic bonds in various α-glucans, such as starch, amylose, amylopectin and glycogen, to yield mainly glucose and maltose as final products. Because AmyA has the advantage of being a saccharifying enzyme in a stable scaffold, we explored its properties in the transglycosylation and alcoholysis reactions (Damián-Almazo et al., 2008; Damian-Almazo et al., 2008; Moreno et al., 2010). In addition to the characterization reported by Liebl et al., we found that AmyA is capable of using small oligosaccharides (G2 to G7) as substrates for transglycosylation reactions at short reaction times. This was followed by hydrolysis to yield glucose and maltose as final products. The ability of AmyA to use maltose as a substrate is unusual, as most α-amylases are not capable of using maltose to transfer glucosyl units to other oligosaccharides. Moreover, in the presence of various substrates, AmyA is able to form neotrehalose, a non-reducing disaccharide composed of two glucose molecules joined by an α-1,β-1 linkage. It uses 6% maltose as a substrate. Like other saccharifying enzymes, AmyA is capable of transferring glycosyl units to methanol and butanol to produce alkyl-glucosides. When compared to other saccharifying α-amylases, AmyA has a high transfer capacity. The enzyme generates 7.5 mg/ml of methyl-glucoside (Moreno et al., 2010), almost three times the maximum amount found for the A. niger α-amylase and almost eight times the maximum amount found for the A. oryzae α-amylase.
To increase the alcoholytic activity present in AmyA, we constructed a structural homology model based on the structure of the α-amylase from A. oryzae. The low sequence identity between these enzymes precluded the use of the automatic modeler function in the Swiss Prot server (Sali et al., 1995; Sanchez & Sali, 1997). Therefore, the sequence alignment of the proteins had to be manually adjusted using as anchors the four highly conserved regions of the α-amylases, as shown in Fig. 3. Once a model was generated, the inhibitor molecule acarbose was placed in the active site using the coordinates of the A. oryzae α-amylase (PDB code 7TAA). A close-up of the active site model (Fig. 6) supported our hypothesis that the presence of an aromatic residue at the position equivalent to Y377 in neopullulanase is related to both the transglycosylation activity of the enzyme and a saccharifying profile. We identified other residues in subsite +1 that are involved in the transglycosylation activity of other glycosyl hydrolases (Kim et al., 2000; Leemhuis et al., 2004; van der Veen et al., 2001). One of these (H222) is part of the second highly conserved region among glycosyl hydrolases and has also been implicated in calcium ion coordination. In the AmyA model, this residue points toward the sugar moiety at subsite +1. Mutagenesis of the equivalent residue in other amylases has been shown to change transferase activity. In the case of the B. stearothermophilus α-amylase, the replacement of the equivalent H238 with aspartic acid generated an enzyme with a reduced hydrolysis rate and a modified final product profile (Vihinen & Mantsala, 1990). We constructed a small library by site-directed mutagenesis to explore the effects of replacing H222 with D, Q and E. All mutants produced a greater amount of methyl-glucoside than did the wild-type enzyme, as a result of a change in the alcoholysis/hydrolysis ratio. Mutant H222Q showed a higher alcoholysis/hydrolysis ratio as a consequence of both an increase in alcoholysis events and a reduction in hydrolytic activity of almost 30%. The same change was observed in mutants H222D and H222E. The instability of these mutants toward alcohols decreased the final yield of alkyl-glucoside, as shown in Fig. 7.
Fig. 6. The inhibitor acarbose (red) is surrounded by catalytic residues D218 and E258 (blue) and various mutated residues (green). The F277 residue, equivalent to Y377 of neopullulanase, is shown in orange.
Fig. 7. Alcoholysis reaction yields of the wild-type α-amylase from Thermotoga maritima and some of the mutants generated. Quantification of alcoholysis reactions generated by wild-type α-amylase from Thermotoga maritima and the H222 residue mutants. (A) Alcoholysis and hydrolysis events from 6% starch and 20% methanol obtained with 20 U/ml of the enzymes shown; (B) alcoholysis/hydrolysis ratios and methyl-glucoside yields.
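The effect of a change in hydrolytic activity on the alcoholysis/hydrolysis ratio is plain arithmetic and can be sketched in Python. The numbers below are hypothetical values in arbitrary units, chosen only to show the direction of the effect; they are not measurements from Fig. 7, and the function name is illustrative.

```python
def alcoholysis_hydrolysis_ratio(alcoholysis_events, hydrolysis_events):
    """Ratio of alcoholysis events to hydrolysis events (same units for both)."""
    if hydrolysis_events <= 0:
        raise ValueError("hydrolysis_events must be positive")
    return alcoholysis_events / hydrolysis_events


# Hypothetical wild-type values (arbitrary units, NOT experimental data).
wt_alcoholysis = 10.0
wt_hydrolysis = 100.0

wt_ratio = alcoholysis_hydrolysis_ratio(wt_alcoholysis, wt_hydrolysis)

# A mutant with 30% less hydrolysis and unchanged alcoholysis:
# the ratio rises even though no extra alcoholysis takes place.
mutant_ratio = alcoholysis_hydrolysis_ratio(wt_alcoholysis, wt_hydrolysis * 0.7)

print(round(wt_ratio, 3))      # 0.1
print(round(mutant_ratio, 3))  # 0.143
```

A reduction in hydrolysis alone therefore raises the ratio; any additional gain in alcoholysis events raises it further.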
The comparison of liquefying and saccharifying α-amylases, neopullulanases, CGTases and maltogenic amylases through a multiple sequence alignment (Fig. 3) has also made possible the identification of other residues potentially involved in the transglycosylation activity (Fig. 6). In the CGTase from Bacillus circulans, residue F260 has been identified as part of a switch between the transglycosylation and hydrolysis reactions (van der Veen et al., 2001). Mutants formed by changing the equivalent residue in wild-type AmyA (F260) to W and G, and the H222Q mutant, showed opposite behaviors. In the presence of soluble starch as substrate, mutants H222Q and F260G leave higher amounts of high-molecular-weight oligosaccharides, while the wild-type enzyme and mutant F260W show a higher proportion of glucose. These differences were seen as changes in the transglycosylation/hydrolysis ratios. In the double mutant H222Q-F260W, the more transglycosidic pattern of H222Q was recessive, thus eliminating or reducing the presence of longer oligosaccharides (Damián-Almazo et al., 2008).
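Mutant names such as H222Q or F260W follow the usual convention of wild-type residue, position, and replacement residue in one-letter code. The Python sketch below shows how such codes can be parsed and applied to a protein sequence, with a check that the stated wild-type residue is really present at that position. The five-residue sequence is a toy example and not the AmyA sequence; all function names are illustrative.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter replacement.
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(code):
    """Split a code like 'H222Q' into ('H', 222, 'Q')."""
    match = _MUTATION.match(code)
    if match is None:
        raise ValueError(f"not a point-mutation code: {code!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_mutations(sequence, codes):
    """Return a copy of `sequence` carrying the given point mutations.

    Positions are 1-based. Each code is checked against the original
    sequence so that a wrong wild-type residue is caught early.
    """
    residues = list(sequence)
    for code in codes:
        wild_type, position, replacement = parse_mutation(code)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{code}: position outside the sequence")
        if sequence[position - 1] != wild_type:
            raise ValueError(f"{code}: found {sequence[position - 1]} at position {position}")
        residues[position - 1] = replacement
    return "".join(residues)


toy_sequence = "MKHFA"  # toy sequence, NOT AmyA
print(parse_mutation("H222Q"))                       # ('H', 222, 'Q')
print(apply_mutations(toy_sequence, ["H3Q", "F4W"]))  # MKQWA
```

A double mutant such as H222Q-F260W corresponds to passing both codes in one call, as in the toy example with H3Q and F4W.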

Conclusions
Site-directed mutagenesis is a powerful tool for both the study of protein function and the design of novel proteins. Using several approaches to identify phylogenetically conserved residues or residues involved in binding, it has been possible to modify the properties of enzymes that have industrial and biotechnological applications. To increase the transglycosylation reactions carried out by α-amylases, we have applied site-directed mutagenesis to residues close to the active site. Based on multiple sequence alignments of natural transferases, like CGTases, we identified conserved residues involved in the transferase reactions of fungal and bacterial α-amylases. Changes to these residues in α-amylases that originally were unable to perform transglycosylation reactions altered the product profiles and increased the transglycosylation/hydrolysis ratios. Furthermore, it was possible to increase the alcoholysis reactions in the α-amylase from Thermotoga maritima, which was already capable of carrying out this kind of reaction at a low level.