Until the mid-1970s, the role of biochemical engineers was to design effective processes for producing industrially important proteins and to operate bioreactors under optimal conditions. Many microorganisms were isolated from soil, rivers, and seawater, and their genes were randomly mutated with mutagenic agents such as 1-methyl-3-nitro-1-nitrosoguanidine to enhance productivity; for a long time, this was the best available breeding procedure. Gene manipulation technology, born in the 1970s, was a groundbreaking invention that drastically changed the world of biotechnology. The polymerase chain reaction (PCR) and genome analysis with high-performance DNA sequencers, which emerged after gene manipulation technology, accelerated this progress further. A huge number of genes have since been cloned from isolated microorganisms, and many excellent vector plasmids and promoters have been developed.
More recently, the interest of biochemical engineers has shifted from breeding by gene manipulation to the use of newly discovered biocatalysts and of “directed evolution” technology. The idea of directed evolution was proposed by Arnold et al., who suggested that artificial evolution, analogous to the natural evolution that occurs slowly in the natural world, can be carried out rapidly using biotechnology. Many studies have been conducted on the basis of this concept, and directed evolution has become an important target for biochemical engineers. In the future, such artificial proteins may bring about the next wave in biotechnology. Against this background, this chapter introduces recent advances in artificial enzymes. Because biochemical engineers have other important targets as well, I recommend referring to the other chapters of this book.
2. Construction of artificial proteins
Figure 1 shows the scheme of the screening process for an artificial protein or enzyme using directed evolution technology. A single screening cycle consists of three steps: (1) construction of DNA (or RNA) variants, (2) construction of a display library in which each protein molecule is linked to the DNA (or RNA) variant that encodes it, and (3) screening for the target DNA variants. These steps are repeated several times to obtain an optimal enzyme (or protein). Each step is discussed in the following sections.
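The three-step cycle can be illustrated with a toy simulation. This is a minimal sketch, not an experimental protocol: the fitness function (similarity to a hypothetical "optimal" sequence), the mutation rate, the library size, and the function names `fitness`, `mutate`, and `evolve` are all illustrative assumptions. In a real experiment, step (3) is an activity or binding assay rather than a computed score.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fitness(seq: str, target: str) -> int:
    """Toy score: number of positions matching a hypothetical optimal sequence."""
    return sum(a == b for a, b in zip(seq, target))


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Step (1): create a variant by random amino-acid substitutions at a per-residue rate."""
    return "".join(
        rng.choice([x for x in AMINO_ACIDS if x != aa]) if rng.random() < rate else aa
        for aa in seq
    )


def evolve(parent: str, target: str, rounds: int = 20, library_size: int = 200,
           rate: float = 0.05, seed: int = 0) -> list[int]:
    """Repeat the mutate -> display -> screen cycle; return the best score after each round."""
    rng = random.Random(seed)
    history = []
    for _ in range(rounds):
        # Steps (1) and (2): build a library of variants of the current parent.
        library = [mutate(parent, rate, rng) for _ in range(library_size)]
        library.append(parent)  # keep the parent so the best score never decreases
        # Step (3): screen the library and carry the best variant into the next round.
        parent = max(library, key=lambda s: fitness(s, target))
        history.append(fitness(parent, target))
    return history


if __name__ == "__main__":
    rng = random.Random(42)
    target = "".join(rng.choice(AMINO_ACIDS) for _ in range(50))  # hypothetical optimum
    start = "".join(rng.choice(AMINO_ACIDS) for _ in range(50))   # hypothetical wild type
    print(evolve(start, target))
```

Keeping the parent in each library mimics the common practice of using the best clone of one round as the template for the next, so the score can stall but never fall.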
2.1. Computational design of artificial protein
Structural design by computational simulation provides important information for determining the optimal protein. Tertiary structures are composed of many types of structural elements, such as folds and motifs. Because of these elements, two proteins can have similar tertiary structures when their amino acid sequences share more than 30% homology (sequence identity), and structural similarity increases with increasing homology, according to the DBAli database. Moreover, homology-based design approaches such as consensus protein design, phylogeny-based design, and design combined with molecular dynamics simulation have been developed recently. Therefore, the tertiary structure of an unknown protein can be predicted from a known homologous protein.
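As a rough illustration of the 30% rule of thumb, percent identity between two already-aligned sequences can be computed as below. This sketch assumes the alignment was produced beforehand (for example, with a standard alignment tool) with `-` marking gaps, and it counts identity only over columns where neither sequence has a gap; other conventions for the denominator exist, and the helper names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def likely_similar_fold(aligned_a: str, aligned_b: str, threshold: float = 30.0) -> bool:
    """Apply the rule of thumb that >30% identity suggests a similar tertiary structure."""
    return percent_identity(aligned_a, aligned_b) > threshold


print(percent_identity("MKV-LSAG", "MKVELTAG"))
print(likely_similar_fold("MKV-LSAG", "MKVELTAG"))
```

The threshold is only a heuristic: proteins below it can still share a fold, which is why structure-based and simulation-based methods are used alongside sequence comparison.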
In the twentieth century, precise prediction of protein tertiary structure was difficult because computers were not fast enough. However, computing speed has improved drastically over the last 10 years, and with recently developed supercomputers such as the K computer, complicated structural analyses of proteins can be completed in a much shorter time. Alongside this progress in computer technology, high-performance DNA sequencers have entered a new generation. The HiSeq 2000 DNA sequencer, whose flow cells carry a huge number of template DNA molecules, can determine gigabases of DNA sequence in a single run. Owing to this breakthrough in sequencing technology, genome data for many organisms are being added to databases at high speed. Data on protein structure have also been enriched. The Protein Data Bank (PDB) is a repository of experimentally determined structures, such as those obtained by X-ray crystallography and NMR, which are important for predicting the 3D structures of proteins. The number of entries in the PDB has increased rapidly in the past several years. Accordingly, computational analysis can now yield sufficiently accurate 3D structures of proteins and is one of the most prominent tools for designing artificial proteins.
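PDB entries are distributed in a fixed-column text format, which makes simple structural checks easy to script. The sketch below, a minimal example rather than a full parser, reads the Cα atoms from `ATOM` records using the standard column positions of the legacy PDB format and computes distances between consecutive Cα atoms (about 3.8 Å along a continuous chain). It ignores alternate locations, insertion codes, and the newer mmCIF format; the function names are hypothetical.

```python
import math


def parse_ca_atoms(pdb_text: str) -> list[tuple[str, int, tuple[float, float, float]]]:
    """Extract (chain, residue number, (x, y, z)) for C-alpha atoms from PDB-format text.

    Slices are 0-based Python equivalents of the 1-based PDB columns:
    atom name 13-16, chain 22, residue number 23-26, x 31-38, y 39-46, z 47-54.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, HEADER, REMARK, etc.
        if line[12:16].strip() != "CA":
            continue
        chain = line[21]
        res_seq = int(line[22:26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((chain, res_seq, xyz))
    return atoms


def ca_ca_distances(atoms: list[tuple[str, int, tuple[float, float, float]]]) -> list[float]:
    """Distances between C-alpha atoms that are consecutive in the parsed list."""
    return [math.dist(a[2], b[2]) for a, b in zip(atoms, atoms[1:])]
```

Distances far from 3.8 Å usually indicate chain breaks or missing residues. For real work, an established library such as Biopython's `Bio.PDB` module handles the edge cases this sketch ignores.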
2.2. Site-directed mutagenesis and random mutagenesis
Mutations at the positions identified by computational structure simulation are generally introduced by site-directed mutagenesis or by saturation mutagenesis (site-directed mutagenesis that introduces all possible amino acids at hotspot positions). For instance, the activity of α-amylase was enhanced 16.7-fold by site-directed mutagenesis, and the thermostability of transglutaminase at 60°C was enhanced twelvefold by saturation mutagenesis. Truncation has also been used as a mutation strategy based on computational design results. Enzymes often contain domains that are unnecessary for the target function, and the properties of proteins are often improved by removing them. For example, enzymes truncated at the N- and C-termini, selected from a randomly truncated library, showed high thermostability and activity. The most important advantage of site-directed mutagenesis is that improved proteins can be selected rapidly, because the library size required for screening is much smaller than in random mutagenesis.
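The smaller library size of site-directed and saturation mutagenesis can be quantified. A common design, assumed here but not stated in the text, uses NNK degenerate codons (32 codons encoding all 20 amino acids) at each randomized site, and the number of transformants to screen is often estimated as T = −V ln(1 − F), where V is the number of possible codon combinations and F the desired probability of sampling any particular one. The sketch assumes all codons are equally represented; the function names are hypothetical.

```python
import math


def nnk_library_size(n_sites: int) -> int:
    """Codon combinations when each randomized site uses an NNK codon (32 codons per site)."""
    return 32 ** n_sites


def clones_for_coverage(library_size: int, coverage: float = 0.95) -> int:
    """Transformants to screen so that any particular variant is sampled with
    probability `coverage`, using the approximation T = -V * ln(1 - F)."""
    return math.ceil(-library_size * math.log(1.0 - coverage))


for sites in (1, 2, 3):
    v = nnk_library_size(sites)
    print(f"{sites} NNK site(s): {v} combinations, ~{clones_for_coverage(v)} clones for 95% coverage")
```

For one NNK site this gives 96 clones at 95% coverage, but the requirement grows about 32-fold with each additional site, which is why saturation mutagenesis is usually restricted to a few hotspots.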
Random mutagenesis is also a powerful tool for obtaining novel enzymes, and several procedures for it have been proposed, such as error-prone PCR (ep-PCR), DNA shuffling, and the staggered extension process (StEP). In ep-PCR, random mutations are introduced by PCR with an error-prone polymerase in the presence of a high Mn²⁺ concentration and/or unbalanced dNTP concentrations. For example, a subtilisin E variant obtained by ep-PCR showed 256-fold higher activity in 60% DMF. In DNA shuffling, many DNA variants carrying mutations at different sites are fragmented with deoxyribonuclease, and the fragments are reassembled by PCR in which they anneal to and prime one another, recombining the mutations. StEP is an improved version of DNA shuffling in which recombination is achieved by repeated cycles of very short extension in PCR with many DNA variants as templates, so that the growing strands switch between templates. Because these methods generate sequence changes that cannot be predicted in advance, random mutagenesis is well suited to creating proteins with properties not yet known.
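The two kinds of random mutagenesis described above, point mutation (ep-PCR) and recombination of existing variants (DNA shuffling, StEP), can be contrasted in a simplified sequence-level simulation. This is a sketch under strong assumptions: substitutions are uniform and unbiased (real ep-PCR has a characteristic mutational bias), parents are aligned and of equal length, and recombination is modeled as random template switching at a fixed per-position probability. All function names are hypothetical.

```python
import random

BASES = "ACGT"


def error_prone_copy(dna: str, error_rate: float, rng: random.Random) -> str:
    """Mimic one ep-PCR copy: each base is substituted with probability `error_rate`."""
    return "".join(
        rng.choice([b for b in BASES if b != base]) if rng.random() < error_rate else base
        for base in dna
    )


def template_switch_recombine(parents: list[str], switch_prob: float,
                              rng: random.Random) -> str:
    """Mimic recombination in DNA shuffling/StEP: the growing strand copies one parent
    and, at each position, may jump to a randomly chosen parent template."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned and of equal length")
    current = rng.randrange(len(parents))
    child = []
    for i in range(length):
        if rng.random() < switch_prob:
            current = rng.randrange(len(parents))  # template switch
        child.append(parents[current][i])
    return "".join(child)


rng = random.Random(0)
wild_type = "ATGGCTAAAGGTCTGACC"
mutants = [error_prone_copy(wild_type, 0.05, rng) for _ in range(3)]
print(mutants)
print(template_switch_recombine(mutants, 0.2, rng))
```

The contrast is that ep-PCR creates new mutations, whereas shuffling and StEP mainly combine mutations that already exist in the parent pool, so the two are often used one after the other.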
2.3. Novel proteins based on metagenome analysis
Soil and seawater are abundant sources of microorganisms. According to a study of 16S rRNA gene sequences from soil microorganisms, 99% of microorganisms cannot grow under normal culture conditions, suggesting that prominent biocatalysts may be present among microorganisms that cannot be cultured in media.
The metagenome is the entire set of genomes contained in a soil or water sample, and metagenomics, which deals with the metagenome, is another powerful tool for obtaining proteins beyond our imagination [17, 18]. A metagenomic library is constructed as follows: the DNA of all microorganisms in a target sample is purified without culturing, digested to appropriate lengths with a restriction enzyme, and ligated into suitable vectors. Metagenomic libraries of microorganisms living in special, harsh environments are especially useful because their enzymes can have extremely prominent activities, even though most of these microorganisms are hard to culture under normal conditions. Recently, such screening has been performed on samples from environments such as hypersaline soda lake sediments and hot environments [20, 21], and metagenome data are accumulating day by day.
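How large a metagenomic library must be can be estimated with the Clarke-Carbon formula, N = ln(1 − P) / ln(1 − f), where f is the ratio of insert size to total genome size and P is the desired probability that a given sequence is represented. The sketch below is illustrative: treating the whole community as a single "genome" assumes all members are equally abundant, so rare organisms would need far more clones. The insert size and community size in the example are hypothetical.

```python
import math


def clones_needed(insert_bp: float, genome_bp: float, probability: float = 0.99) -> int:
    """Clarke-Carbon estimate of library size: N = ln(1 - P) / ln(1 - f), f = insert/genome."""
    fraction = insert_bp / genome_bp
    # log1p(-f) computes ln(1 - f) accurately when f is very small.
    return math.ceil(math.log1p(-probability) / math.log1p(-fraction))


# Hypothetical example: 40 kb inserts from a community of
# 1,000 equally abundant genomes averaging 4 Mb each.
print(clones_needed(40_000, 1_000 * 4_000_000))
```

In this hypothetical case the estimate is roughly 4.6 × 10⁵ clones, which illustrates why sequence-based or high-throughput functional screening is needed to exploit metagenomic libraries.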
2.4. Display and screening to obtain important artificial proteins
2.4.1. In vitro display
Construction of display libraries and screening are the most important steps in directed evolution technology. Both in vitro and in vivo displays have been proposed [23, 24]; in vitro displays use a cell-free system. Cell-free systems contain the components for protein synthesis, such as ribosomes, energy-regeneration substrates, amino acids, and cofactors. Crude extracts prepared from wheat germ or insect cells [25, 26] and the PURE system, a mixture of components separately purified from Escherichia coli cells, can be used as cell-free systems. Proteins can fold into their normal tertiary structures when appropriate chaperones are added.
In an in vitro display, each protein is linked to the DNA or RNA variant that encodes it, and various types of such displays have been proposed. In RNA display, RNA variants in which puromycin is attached via a linker molecule to the 3′ terminus of the mRNA are used. Puromycin stops protein synthesis by binding to the C-terminus of the nascent protein, so that the protein is produced linked to its mRNA. In ribosome display, RNA variants lacking a stop codon are used. Because there is no stop codon, the ribosome is not released, and a complex composed of mRNA, protein, and ribosome is produced. In DNA display, biotin-labeled DNA variants in which a streptavidin gene is fused to the terminus of the target gene are used, and the proteins are synthesized in a water-in-oil (W/O) emulsion; each streptavidin-tagged protein binds to the biotin-labeled DNA in the same droplet. In liposome display, single liposome vesicles containing the cell-free synthesis system are used. An advantage of liposome display is that membrane proteins can be displayed in the lipid bilayer at the liposome surface.
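In DNA display and liposome display, the link between a protein and its gene depends on each compartment containing only one DNA variant. Assuming DNA molecules are distributed among droplets or liposomes at random (a Poisson distribution, which is an assumption not stated in the text), the fraction of compartments with a single genotype can be estimated as follows; the function names and mean loading values are illustrative.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k molecules in a compartment with the given mean loading."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def loading_summary(mean_dna_per_compartment: float) -> dict[str, float]:
    """Fractions of compartments holding 0, 1, or >=2 DNA molecules under Poisson loading."""
    p0 = poisson_pmf(0, mean_dna_per_compartment)
    p1 = poisson_pmf(1, mean_dna_per_compartment)
    occupied = 1.0 - p0
    return {
        "empty": p0,
        "single": p1,
        "multiple": 1.0 - p0 - p1,
        # Among compartments containing DNA, the share with exactly one genotype,
        # i.e. where the protein is unambiguously linked to its own gene.
        "single_among_occupied": p1 / occupied if occupied > 0 else 0.0,
    }


for mean in (0.3, 1.0, 2.0):
    print(mean, loading_summary(mean))
```

Lowering the mean loading below one molecule per compartment increases the share of single-genotype compartments among occupied ones, at the cost of many empty compartments.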
Target proteins must then be screened from the displays. When screening is based on binding affinity, proteins are selected according to the strength of their binding to a target molecule immobilized on a plate or on microbeads. When screening is based on enzyme activity, the SIMPLEX method using microplates can be used to select from DNA, RNA, and ribosome displays. Emulsion-based display and liposome display are more useful for high-throughput screening because fluorescence-activated cell sorting (FACS) or confocal fluorescence coincidence analysis (CFCA) can be used. An advantage of in vitro display using a cell-free system is that library construction and microscale automation are easy.
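The advantage of fluorescence-based sorting for emulsion and liposome displays can be illustrated by simulating a sort gate that keeps only the brightest compartments. This is a toy model with hypothetical parameters: active variants are assumed to produce a fluorogenic signal with a higher mean than inactive ones, with Gaussian noise, and the gate keeps a fixed top fraction of events. Real FACS gating is configured on the instrument, and the function names here are hypothetical.

```python
import random


def gate_brightest(signals: list[float], keep_fraction: float) -> list[int]:
    """Indices of the brightest `keep_fraction` of events, mimicking a sort gate."""
    n_keep = max(1, round(len(signals) * keep_fraction))
    ranked = sorted(range(len(signals)), key=lambda i: signals[i], reverse=True)
    return ranked[:n_keep]


def simulate_sort(n: int = 10_000, active_fraction: float = 0.01,
                  keep_fraction: float = 0.01, seed: int = 0) -> tuple[float, float]:
    """Return the fraction of active variants before and after one sort (toy model)."""
    rng = random.Random(seed)
    n_active = round(n * active_fraction)
    is_active = [i < n_active for i in range(n)]
    # Hypothetical signal model: active compartments are brighter on average.
    signals = [rng.gauss(10.0 if active else 1.0, 1.0) for active in is_active]
    kept = gate_brightest(signals, keep_fraction)
    before = n_active / n
    after = sum(is_active[i] for i in kept) / len(kept)
    return before, after


print(simulate_sort())
```

When the signals of active and inactive variants overlap more than in this idealized model, a single sort enriches rather than purifies, and several sorting rounds are combined with re-amplification of the recovered DNA.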
2.4.2. In vivo displays
In vivo systems using E. coli cells are also used to screen proteins. The most common procedure is to construct a gene library of E. coli transformants by transforming E. coli cells with plasmids containing DNA variants. Phage display and display using virus-like particles have also been proposed. Phage display, which has been widely used to display antibody V regions, displays the target protein on the surface of a bacteriophage as a fusion with a phage coat protein. Relatively large proteins can be displayed by fusion to the N-terminus of the g3p protein of M13 phage or to the C-terminus of the g10 protein of T7 phage. Recently, several groups have developed displays that use nucleocapsids as artificial viruses [33, 34]. A non-viral cage of Aquifex aeolicus lumazine synthase (AaLS) forms spontaneously, and mRNA can be encapsulated in it.
Target proteins are selected from in vivo displays as follows. For selection based on binding strength, phage display is used. Biopanning is the general method for selection from phage display libraries: phages bound to immobilized target molecules are recovered and used to infect E. coli cells for amplification. For selection based on enzyme activity, E. coli transformant libraries are screened on agar plates containing a screening medium, the general method for screening microorganisms. Other methods, such as the IAN-PCR method, which is based on amino acid sequence, have also been proposed. Because screening with E. coli libraries is slow, this procedure is not suited to selection from very large libraries, but it is well suited to small libraries, such as site-directed mutant libraries designed by computer simulation or metagenomic libraries.
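Why a few rounds of biopanning are usually enough can be seen from a simple enrichment model. In this sketch, which assumes hypothetical recovery probabilities and that amplification in E. coli does not change the ratio of binders to non-binders, the binder fraction f changes each round to f·r_b / (f·r_b + (1 − f)·r_n), where r_b and r_n are the recovery probabilities of binding and non-binding phages. The function name is hypothetical.

```python
def panning_rounds(binder_fraction: float, binder_recovery: float,
                   background_recovery: float, rounds: int) -> list[float]:
    """Fraction of binders in the phage pool after each round of biopanning.

    Each round, binders are retained with probability `binder_recovery` and
    non-binders with `background_recovery`; amplification is assumed to
    preserve the ratio between the two populations.
    """
    fractions = []
    f = binder_fraction
    for _ in range(rounds):
        retained_binders = f * binder_recovery
        retained_background = (1.0 - f) * background_recovery
        f = retained_binders / (retained_binders + retained_background)
        fractions.append(f)
    return fractions


# Hypothetical values: 1 binder per million phages, 10% vs 0.01% recovery.
print(panning_rounds(1e-6, 0.1, 1e-4, 4))
```

With a per-round enrichment of r_b/r_n = 1000, a binder initially present at one in a million dominates the pool after about three rounds.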
3. Targets of directed evolution technology
Directed evolution has become a major trend in the twenty-first century, and interesting research subjects have been proposed. One such subject is, of course, the development of super-proteins that do not exist in the natural world and can catalyze novel reactions; studies on the Kemp elimination were pioneering examples. No natural enzyme catalyzing the Kemp elimination had been discovered. Röthlisberger et al. produced eight novel enzymes catalyzing the Kemp elimination by computational design and site-directed mutagenesis. Similarly, Hilvert et al. performed site-directed mutagenesis and screened the variants with a droplet-based microfluidic screening platform; the resulting artificial aldolases showed high activity, although the original protein showed little activity. Tryptophan synthase and cytochrome P450 variants obtained by directed evolution could catalyze novel reactions. In addition to directed evolution, metagenomics is a powerful tool for discovering such unknown enzymes [39, 40]. For instance, more than 300 previously unknown nitrilases were discovered from a metagenomic library, whereas fewer than 20 nitrilases had been discovered in many years of earlier study.
Directed evolution technology can also be applied to improve metabolic pathways. For example, 1-propanol is a promising biofuel, but no adequate producer strains had been discovered. Atsumi et al. therefore constructed a recombinant E. coli strain expressing a series of genes for 1-propanol production and improved the citramalate synthase gene by error-prone PCR and DNA shuffling. The productivities of both 1-propanol and 1-butanol were enhanced in the improved strain. Arnold et al. improved alcohol dehydrogenase and ketol-acid reductoisomerase by directed evolution to enhance the productivity of 1-propanol; the resulting enzymes could use NADH instead of NADPH as a coenzyme. Directed evolution has also been applied to regulatory elements to improve metabolic pathways: promoters, operon connections, and enhancer sequences have been improved in this way.
Another interesting research subject is understanding the evolution of life. One approach is to determine the genome length that a microorganism requires. Gibson et al. constructed an artificial genome consisting only of chemically synthesized DNA and introduced it into microorganisms from which the original genome had been removed. A subsequent study constructed a minimal bacterial genome: this genome, JCVI-syn3.0, was only 531 kb (473 genes), and the resulting microorganism could grow and retained several characteristics of the parent microorganism.
Another approach to understanding the evolution of life is to generate artificial cells using a cell-free system. Szostak et al. synthesized random RNAs (10¹⁵ molecules of 90 nucleotides) and obtained a ribozyme with sufficient RNA ligase activity after only 10 rounds of selection with error-prone PCR. Noireaux et al. constructed cell-sized synthetic vesicles (artificial cells) containing the components for transcription and translation. Yomo et al. also produced novel artificial cells in which RNAs can undergo artificial evolution by themselves. Introducing unnatural components into cells is another approach: unnatural base pairs and more than 100 unnatural amino acids have been synthesized, and they have been used to introduce unnatural components site-specifically into proteins. In the future, evolutionary RNA engineering may lend support to the RNA world hypothesis.
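The scale of the sequence space in the Szostak experiment can be estimated with a short calculation. Assuming all 90 positions are randomized with equal use of the four nucleotides (the text gives only the pool size and length), the pool samples only a tiny fraction of possible sequences:

```latex
N_{\text{space}} = 4^{90} = 2^{180} \approx 1.5 \times 10^{54},
\qquad
\frac{N_{\text{pool}}}{N_{\text{space}}} \approx \frac{10^{15}}{1.5 \times 10^{54}} \approx 6.5 \times 10^{-40}
```

This illustrates why functional molecules are obtained through repeated rounds of selection and mutagenesis, which explore the neighborhood of active sequences, rather than by exhaustive search.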
Enzymes are useful for industrial chemical reactions, and the role of biochemical engineers has been to study their industrial use. However, most naturally occurring enzymes lack sufficient activity for industrial use, and bioreactors have not been as successful as expected. Directed evolution technology was proposed to overcome these problems, and many artificial enzymes have since been produced with it. The technology may be advanced further by genome-editing tools such as TALEN and CRISPR/Cas9. Artificial microorganisms whose genomes are directly improved with CRISPR/Cas9 may be more suitable than recombinant E. coli for use as bioreactors. Biochemical engineers are therefore likely to play an increasingly important role in the development of bioreactors.