Consensus sequence of six classes of importin α-dependent NLS
The complete decoding of the human genome in 2003 raised expectations for targeted drug design and personalized medical care based on genetic information, and also for gene therapy of disease. To reach these goals, it is necessary to develop new technologies capable of identifying drug-target genes from enormous volumes of genome information, validating their biological functions, and comprehensively analyzing protein functions and interactions that have hitherto been studied individually. We first developed an mRNA display method, termed the in vitro virus (IVV) method [1-3], and a C-terminal protein labeling method [4,5] that provided the required technology (Fig. 1).
The IVV method, originally developed for evolutionary protein engineering based on in vitro translation systems, was subsequently applied to the analysis of various protein interactions. In the IVV method, the genotype molecule (mRNA) is linked to the phenotype molecule (protein) through puromycin in a cell-free translation system. We have developed this method into a stable, efficient, and high-throughput technique that allows simple selection without any requirement for post-translational work, unlike previous systems. An additional technological advance is the elimination of the need to express and purify bait proteins for downstream protein-protein interaction studies. Our totally in vitro cell-free cotranslation system provides a simpler solution that is suitable for high-throughput, genome-wide analysis, as baits are synthesized within each reaction. Moreover, as cotranslation of bait and prey proteins is favorable for the formation of multi-protein complexes, this approach offers a better chance of obtaining a more comprehensive data set, including both direct and indirect interactions, in a single experiment. Here, we provide an overview of this system and discuss its advantages for analyses of various protein interactions, including protein-protein, DNA(RNA)-protein, peptide-protein, drug-protein, and antigen-antibody interactions.
The antibiotic puromycin, which is an analogue of the 3' end of tyrosyl-tRNA, acts in both prokaryotes and eukaryotes as an inhibitor of peptidyl transferase. It has two modes of inhibitory action. The first is by acting as an acceptor substrate that attacks peptidyl-tRNA (donor substrate) in the P site to form a nascent peptide. The second is by competing with aminoacyl-tRNA for binding to the A' site, which is the binding site of the 3' end of aminoacyl-tRNA within the peptidyl transferase active site. It has been reported that the polypeptides released by puromycin are not full-length proteins. Similarly, it has been shown that growing peptide chains on ribosomes are transferred to the α-amino group of puromycin, which interrupts the normal reaction of peptide bond formation. Therefore, these conventional studies suggested that puromycin is a non-specific inhibitor of protein synthesis as a result of competition with aminoacyl-tRNA. However, since most of the studies on puromycin were performed at relatively high concentrations, the behavior of puromycin at lower concentrations was still an open question at that time. It had been reported that full-length protein that fails to be released from ribosomes at the final stage of protein synthesis requires treatment with puromycin or release factors (RFs) to be released. These results led us to hypothesize that puromycin at very low concentrations, which would not effectively compete with aminoacyl-tRNA, might act as a non-inhibitor and bond specifically to full-length protein at the stop codon. Indeed, using ³²P-labeled rCpPuro (2'-ribocytidylyl-(3'→5')-puromycin) in an E. coli S30 extract cell-free translation system, we confirmed that puromycin and its derivatives bond only to full-length protein at very low concentrations (such as 0.04 μM), where they are "non-inhibitors" of protein synthesis.
Our results provided the first evidence that, at very low concentrations, puromycin bonds specifically to the full-length protein at the stop codon during termination of protein synthesis. In other words, puromycin at sufficiently low concentrations only has the opportunity to be bonded to proteins at a stop codon, where it does not need to compete with aminoacyl-tRNA. Since termination is a relatively slow step involving a translational pause in eukaryotes and E. coli, it is possible that puromycin even at very low concentrations can bind to the A' site and compete with RFs to release the full-length protein from ribosomes. Accordingly, under these conditions, puromycin can be incorporated specifically at the C-terminus of the full-length protein. This concept is the basis of so-called puromycin technology [2,3]. The combination of this puromycin technology with in vitro translation has yielded novel methods, such as IVV and C-terminal labeling techniques, for protein selection and screening [15,16], fluorescence labeling [2,5], and affinity purification (pull-down assay), as well as protein chips for proteomics [5,17], and superior methods for evolutionary protein engineering.
2. Application of the IVV method for analyzing various protein interactions
The IVV method is applicable for analyzing protein–protein [15,16,18,19], DNA(RNA)–protein [20-22], peptide–protein [23-27], drug–protein [28,29] and antigen–antibody [30,31] interactions. As a result, we are able to explore protein complexes, transcription factors, RNA-binding proteins, bioactive peptides, drug-target proteins, and antibodies (antibodies will be discussed later) (Fig. 2). The selection cycle consists of preparation of the library and its conjugation with a puromycin-bearing polyethylene glycol (PEG) spacer, cell-free translation to form IVVs, interaction of bead-attached bait with the prey IVV library to form complexes, IVV selection with the bait, and RT-PCR to amplify the mRNA tags. Selection rounds are repeated until sufficient enrichment is obtained, followed by cloning and sequencing to decode the selected protein sequences.
2.1. DNA-protein interaction
The specific interactions between cis-regulatory DNA elements and transcription factors are critical components of transcriptional regulatory networks [32,33]. The whole genome and complete cDNA sequences contain large numbers of transcription factors and their binding DNA sequences, and thus comprehensive analysis of DNA-transcription factor interactions is expected to provide a deep understanding of the mechanisms of cell proliferation, developmental processes in tissue morphogenesis, and disease. Currently, combined use of the chromatin immunoprecipitation (ChIP) assay with DNA microarrays (ChIP-chip) has become the most widely used high-throughput method for discovering cis-regulatory DNA elements for a transcription factor. In contrast, development of high-throughput methods for discovering transcription factors for a cis-regulatory DNA element remains at an early stage. Although the yeast one-hybrid method and phage display are attractive candidates, these methods are not easily scalable because of the use of living cells. In addition, as over-expression of transcription factors often affects cellular metabolism, such transcription factors are difficult to screen. To circumvent these difficulties, we focused on a totally in vitro mRNA display technology, the IVV method [1-3,15,16], for the discovery of DNA–protein interactions.
Comprehensive analysis of DNA–protein interactions is important for mapping transcriptional regulatory networks at a genome-wide level. We employed the IVV method for in vitro selection of DNA-binding protein heterodimeric complexes. Under improved selection conditions using a TPA-responsive element (TRE) as bait DNA, the known interactors c-fos and c-jun were simultaneously enriched about 100-fold from a model library (a 1:1:20 000 mixture of c-fos, c-jun and gst genes) after one round of selection. Furthermore, almost all of the AP-1 family genes, including c-jun, c-fos, junD, junB, atf2 and b-atf, were successfully selected from an IVV library constructed from mouse brain poly A+ RNA after six rounds of selection. These results indicate that the IVV selection system can identify a variety of DNA-binding protein complexes in a single experiment. Since almost all transcription factors form hetero-oligomeric complexes to bind with their target DNA, this method should be most useful for searching for DNA-binding transcription factor complexes.
2.2. Peptide–protein interaction
Peptides are powerful tools for disrupting protein-protein interactions because the large interacting surfaces and the high specificity of these peptides lead to fewer adverse side effects when they are used as pharmaceutical agents. As previously reported, several peptides that inhibit the MDM2-p53 interaction have been identified from randomized peptide libraries using phage display. Hu et al. identified a 12-amino-acid (aa) peptide (LTFEHYWAQLTS), DI, that could inhibit not only the MDM2-p53 interaction, but also the MDMX-p53 interaction more effectively than Nutlin-3, a small-molecule inhibitor of the MDM2-p53 interaction. MDMX, an MDM2 homologue, is highly expressed in tumors, and it binds to and negatively regulates p53. Furthermore, DI expressed with recombinant adenovirus as a thioredoxin-fused protein could activate the p53 pathway both in vitro and in vivo. However, DI was not sufficiently optimized because it was selected by phage display from a 12-mer random library (4.1 × 10¹⁵ possible members) with an actual size of ∼10⁸, which did not cover all of the possible sequences.
To overcome this problem, we performed in vitro selection of MDM2-binding peptides from random peptide libraries using the IVV method [1-3,15]. This system, based on cell-free translation, is a potent method for screening large peptide libraries (10¹³ unique members) and is able to cover all of the possible sequences in a 10-mer random library. We applied the IVV method to identify a highly optimized peptide that could disrupt the MDM2-p53 complex from a random library containing all of the possible sequences by dividing the selection process into two stages. We also verified that a selected peptide could inhibit the MDM2-p53 interaction in living cells and block tumor cell growth.
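The library-coverage argument above can be checked with simple arithmetic. The following is a minimal sketch, assuming a fully random library drawn from the 20 standard amino acids and assuming every library member is unique (an idealization, so the result is an upper bound); the function names are illustrative and not part of the original work.

```python
# Illustrative arithmetic only: function names are hypothetical helpers.
AMINO_ACIDS = 20  # assumption: fully random positions over the 20 standard amino acids


def sequence_space(length: int) -> int:
    """Number of distinct peptides of the given length."""
    return AMINO_ACIDS ** length


def max_coverage(library_size: float, length: int) -> float:
    """Upper bound on the fraction of sequence space a library can contain.

    Assumes every library member is unique, which real libraries only approximate.
    """
    return min(1.0, library_size / sequence_space(length))


if __name__ == "__main__":
    # 12-mer space versus a phage library of about 1e8 members
    print(f"12-mer space: {sequence_space(12):.3e}")
    print(f"phage coverage bound: {max_coverage(1e8, 12):.2e}")
    # 10-mer space versus an IVV library of about 1e13 members
    print(f"10-mer space: {sequence_space(10):.3e}")
    print(f"IVV coverage bound: {max_coverage(1e13, 10):.2f}")
```

Under these assumptions, a 10⁸-member library samples only a tiny fraction of the 12-mer space, whereas a 10¹³-member library is of the same order of magnitude as the full 10-mer space.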
We identified an optimal peptide named MIP that inhibited the MDM2-p53 and MDMX-p53 interactions 29- and 13-fold more effectively than DI, respectively (Fig. 3). Adenovirus-mediated expression of MIP fused to the thioredoxin scaffold protein in living cells caused stabilization of p53 through its interaction with MDM2, resulting in activation of the p53 pathway. Furthermore, expression of MIP also inhibited tumor cell proliferation in a p53-dependent manner more potently than did DI. These results show that two-stage peptide selection by the IVV method [1-3,15] is useful for the rapid identification of potent peptides that target oncoproteins.
Bcl-XL, an antiapoptotic member of the Bcl-2 family, is a mitochondrial protein that inhibits activation of Bax and Bak, which commit the cell to apoptosis, and it therefore represents a potential target for drug discovery. Peptides have potential as therapeutic molecules because they can be designed to engage a larger portion of the target protein with higher specificity. We selected 16-mer peptides that interact with Bcl-XL from random and degenerate peptide libraries using the IVV method. The selected peptides have sequence similarity with the Bcl-2 family BH3 domains, and one of them has higher affinity (IC50 = 0.9 μM) than Bak BH3 (IC50 = 11.8 μM) for Bcl-XL in vitro. We also found that GFP fusions of the selected peptides specifically interact with Bcl-XL, localize in mitochondria, and induce cell death. Further, a chimeric molecule, in which the BH3 domain of Bak protein was replaced with a selected peptide, retained the ability to bind specifically to Bcl-XL. These results demonstrate that this selected peptide specifically antagonizes the function of Bcl-XL and overcomes the effects of Bcl-XL in intact cells. Thus, the IVV method is a powerful technique to identify peptide inhibitors with high affinity and specificity for disease-related proteins.
The importin α/β pathway mediates nuclear import of proteins containing the classical nuclear localization signals (NLSs). Although the consensus sequences of the classical NLSs have been defined, there are still many NLSs that do not match the consensus rule and many nonfunctional sequences that match the consensus. By screening random peptide libraries using the IVV method, we selected peptides bound by importin α and identified six classes of NLSs, including three novel classes, that specifically bind to distinct binding pockets of importin α (Table 1). Two noncanonical classes (class 3 and class 4) specifically bound to the minor binding pocket of importin α, whereas the classical monopartite NLSs (class 1 and class 2) bound to the major binding pocket. Using a newly developed universal green fluorescent protein expression system, we found that these NLS classes, including plant-specific class 5 NLSs and bipartite NLSs, fundamentally require regions outside the core basic residues for their activity and have specific residues or patterns that confer distinct activities between yeast, plants, and mammals. Furthermore, amino acid replacement analyses revealed that the consensus basic patterns of the classical NLSs are not essential for activity, and more unconventional patterns, including redox-sensitive NLSs, were generated. These results explain the causes of NLS diversity. The defined consensus patterns and properties of importin α-dependent NLSs provide useful information for identifying NLSs [25-27].
2.3. Drug-protein interaction
Despite the introduction of newly developed drugs, such as lenalidomide and bortezomib, multiple myeloma is still difficult to treat and patients have a poor prognosis. In order to find novel drugs that are effective for multiple myeloma, we tested the antitumor activity of 29 phthalimide derivatives against several multiple myeloma cell lines. Among these derivatives, 2-(2,6-diisopropylphenyl)-5-amino-1H-isoindole-1,3-dione (TC11) was found to be a potent inhibitor of tumor cell proliferation and an inducer of apoptosis via activation of caspase-3, -8 and -9. This compound also showed in vivo activity against multiple myeloma cell line KMS34 tumor xenografts in ICR/SCID mice. To identify TC11-binding proteins, we used the IVV method [1-3,15]. We first prepared a cDNA library derived from KMS34 cells, because our data suggested that KMS34 cells were the most sensitive to TC11. As bait, biotinylated TC11 was immobilized on a microfluidic chip and TC11-binding proteins were selected. Although the 4-amino group of TC11, which was experimentally inferred to be critical for the activity, was biotinylated via a linker, the biotinylation hardly affected the antitumor activity. Among 11 candidate TC11-binding proteins identified by the IVV method after 4 rounds of selection, we focused on the nucleolar phosphoprotein nucleophosmin (NPM). Sequencing revealed that three selected NPM clones, designated 1–183 NPM, encoded the 183 NH2-terminal amino acids of NPM, which include the oligomerization domain and a part of the histone-binding domain. The enrichment efficiency of the NPM clones was confirmed by RT-PCR to be 10⁴-fold after 4 rounds of selection. NPM is a multifunctional protein involved in both tumorigenesis and tumor suppression; for example, it regulates cell proliferation and centrosome duplication and stabilizes the oncoprotein Myc and the tumor-suppressor protein p53. Therefore, we hypothesized that NPM is involved in TC11-induced apoptosis of tumor cells.
Immunofluorescence and NPM-knockdown studies in HeLa cells suggested that TC11 inhibits centrosomal clustering by inhibiting the centrosomal-regulatory function of NPM, thereby inducing multipolar mitotic cells, which undergo apoptosis. NPM may become a novel target for development of antitumor drugs active against multiple myeloma.
We screened 46 novel anilinoquinazoline derivatives for the ability to inhibit proliferation of a panel of human cancer cell lines. Among them, Q15 showed potent in vitro growth-inhibitory activity towards cancer cell lines derived from colorectal cancer, lung cancer and multiple myeloma. It also showed in vivo antitumor activity towards multiple myeloma KMS34 tumor xenografts in ICR/SCID mice. Unlike the known anilinoquinazoline derivative gefitinib, Q15 did not inhibit cytokine-mediated intracellular tyrosine phosphorylation. To elucidate the mechanism through which Q15 inhibits proliferation of tumor cells, we set out to identify Q15-binding proteins by means of the IVV method [1-3,15]. We prepared a cDNA library derived from total RNA of human colon carcinoma SW480 cells, because, like other tumor cells, SW480 cells were sensitive to Q15. Proteins that bind to biotinylated Q15 immobilized on beads were selected using the IVV method. From the library obtained after 5 rounds of selection, we analyzed the DNA sequences of 100 clones. Among them, we obtained six clones of a fragment of the Luzp5/NCAPG2 gene encoding hCAP-G2 (residues 262-476), which contains the HEAT (Huntingtin, elongation factor 3, a subunit of protein phosphatase 2A, TOR lipid kinase) repeat domain. Although three other clones were obtained redundantly, they were confirmed to be false-positive clones by means of a binding assay (data not shown). hCAP-G2 is a subunit of the condensin II complex [45,46], which is regarded as a key player in mitotic chromosome condensation. An immunofluorescence study indicated that Q15 compromises normal segregation of chromosomes, and therefore might induce apoptosis. Thus, our results indicate that hCAP-G2 is a novel therapeutic target for development of drugs active against currently intractable neoplasms.
3. Application of the IVV method for comprehensive interactome network analyses
Interactome networks are essential for complete systems-level descriptions of cells. Large-scale protein-protein interactions (PPIs) are integral in the analysis of topological and dynamic features of interactome networks. We introduce large-scale interactome network analyses using a combination of the IVV method and a biorobot for 50 human transcription factors (TFs).
Comprehensive analysis of PPIs is an important task in the fields of proteomics, functional genomics and systems biology. PPIs are usually analyzed by means of biochemical methods such as pull-down assay and co-immunoprecipitation, yeast two-hybrid (Y2H) assay, and phage display. Recently, the combined use of mass spectrometry (MS) with an affinity tag has made biochemical methods more comprehensive and reliable. However, the testable interaction conditions are restricted by the properties of the biological sources. The Y2H assay is one of the major tools used in the discovery and characterization of PPIs. However, the results of Y2H analyses often include many false positives due to auto-activating bait or prey fusion proteins, and interactions of proteins that are toxic to yeast cells cannot be examined. Phage display, the most widely used display technology, is an effective alternative, because the interactions between libraries and target proteins occur in vitro, allowing optimal conditions to be used for many different target proteins. However, the detectability of very low copy number proteins by phage display is still limited, because phage libraries are produced in living bacteria. Totally in vitro display technologies such as ribosome display, the IVV method [1-3,15] and DNA display can circumvent the above difficulties, because they do not need living cells. As a model bait protein, we chose the basic leucine zipper (bZIP) domain of Jun protein, an important transcription factor, to screen Jun interactors from a mouse brain cDNA library. By performing iterative affinity selection and sequence analyses, we selected 16 novel Jun-associated protein candidates in addition to four known interactors. By means of real-time PCR and pull-down assay, 10 of the 16 newly discovered candidates were confirmed to be direct interactors with Jun in vitro.
Furthermore, interaction of 6 of the 10 proteins with Jun was observed in cultured cells by means of co-immunoprecipitation and observation of subcellular localization. These results demonstrate that this in vitro display technology is effective for the discovery of novel protein–protein interactions and can contribute to the comprehensive mapping of protein–protein interactions.
Furthermore, in a single experiment using bait Fos, more than 10 interactors, including not only direct, but also indirect interactions, were enriched. Further, previously unidentified proteins containing novel leucine zipper (L-ZIP) motifs with minimal binding sites identified by sequence alignment as functional elements were detected as a result of using a randomly primed cDNA library. Thus, we consider that this simple IVV selection system based on cell-free cotranslation could be applicable to high-throughput and comprehensive analysis of PPI and complexes in large-scale settings involving parallel bait proteins.
Interactome networks are essential for complete systems-level descriptions of cells. Large-scale PPIs are integral in the analysis of topological and dynamic features of interactome networks. Several attempts to collect large-scale PPI data have been initiated using various model organisms [56-61] and were subsequently conducted in humans [62-64]. Traditionally, protein interaction data are collected using high-throughput in vivo expression tools based on the yeast two-hybrid (Y2H) and tandem affinity purification-mass spectrometry (TAP-MS) methods. Experiments of this nature have provided large-scale PPI data, but they have only generated information on interacting partners, without considering binding domains in detail. In the field of systems biology, a further understanding of cellular networks will require more complete data sets describing the underlying physical interactions between cellular components. Thus, it is important to identify not only the binding partners, but also the interacting domains at the amino acid level. In fact, the idea of mapping the interacting regions (IRs) involved in a PPI has been previously suggested in connection with several large-scale screens. Our IVV method of analyzing PPIs [15,16] is well suited for large-scale, high-throughput mRNA display of the domain-based interactome using a randomly primed cDNA library, and we were able to achieve the first large-scale mapping of human IR data at the domain level for TF-related protein complexes. Functional domains were easily extracted based on the identified sequences using a randomly primed prey library as a non-biased representation. Bait mRNA templates were prepared in vitro, and the large-scale IVV method was performed using a biorobot that can simultaneously execute up to 96 selections. Fifty human TF-related proteins were used as bait, and a human brain cDNA library was used as prey. A modified high-throughput version of IVV selection was employed.
Integration of large-scale PPI data with other data sets, such as 3D structural information and expression data, is necessary to identify the possible functions of interaction networks. Large-scale IR data sets are expected to reflect functional domains and to indicate the biological roles of the network without the need to integrate additional data. We confirmed the reliability and accuracy of our data by performing pull-down assays and by examining the overlap between our results and known PPI domains using Pfam search. The core data set (966 IRs; 943 PPIs) displayed a verification rate of 70%. Analysis of the IR data set revealed the existence of IRs that interact with multiple partners (Fig. 4). Furthermore, these IRs were preferentially associated with intrinsic disorder. This finding supports the hypothesis that intrinsically disordered regions play a major role in the dynamics and diversity of TF networks through their ability to structurally adapt to and bind with multiple partners. Accordingly, this domain-based interaction resource represents an important step in refining protein interactions and networks at the domain level and in associating network analysis with biological structure and function.
4. Ultrahigh enrichment of antibodies by the IVV method on a microfluidic chip
Rapid preparation of monoclonal antibodies with high affinity and specificity is required in diverse fields from fundamental molecular and cellular biology to drug discovery and diagnosis. In addition to classical hybridoma technology, in vitro antibody-display technologies [30,53,72-75] are powerful approaches for isolating single chain Fv (scFv) antibodies from recombinant antibody libraries. However, these display techniques require several rounds of affinity selection (typically, the library size is 10⁷ to 10¹², while the enrichment efficiency is 10- to 10³-fold per round). Recently, microfluidic systems have been developed for high-throughput protein analysis, since they offer the advantages of very low sample volumes, rapid analysis, and automated recovery of captured analytes for further characterization. However, there have been few attempts to combine microfluidic systems with in vitro antibody-display technologies so far. We showed that a microfluidic system can be combined with the IVV method [1-3,15] and employed for scFv selection from naïve and randomized scFv libraries with ultrahigh efficiency of 10⁶- to 10⁸-fold per round.
4.1. In vitro selection of antibodies from a naïve scFv library
The IVV selection of scFv was performed on a Biacore microfluidic chip. Since the diversity of the mouse scFv library prepared from mouse spleen poly A+ RNAs is estimated to be 10⁶ to 10⁸, while the IVV method allows screening of ~10¹² molecules, we also introduced random point mutations into the scFv library. We chose p53 (human tumor suppressor protein) and MDM2 (human murine double minute) proteins as model antigens, which were immobilized on the Biacore sensor chip. The selection experiment was performed on the microfluidic chip, and selected scFv genes were amplified by reverse transcription (RT)-PCR and identified by cloning and sequencing. Unexpectedly, the recovered anti-p53 and anti-MDM2 scFv sequences converged on a single sequence and two sequences, respectively, after only two rounds of selection. These clones showed high affinity, but also low antigen-specificity, in pull-down assays, and so we examined the clones obtained after a single round of selection in each case. When the binding activities of 29 (anti-p53) and 20 (anti-MDM2) clones with distinct sequences were examined by means of pull-down assays, P1-93 and M1-19 showed high specificity against the respective antigens among p53, MDM2 and BSA (Fig. 5A). The amino-acid sequences of P1-93 and M1-19 are shown in Fig. 5B. In competitive ELISA, both clones dose-dependently inhibited the ELISA signal (Fig. 5C), and Scatchard plots revealed that the KDs of P1-93 and M1-19 were 22 nM and 5.9 nM, respectively. The KDs of P1-93 and M1-19 were also determined by surface plasmon resonance (SPR) as 12 nM and 4.3 nM, respectively (Fig. 5D). The values obtained by the two different methods are similar.
4.2. In vitro evolution of scFv
Further, we performed in vitro evolution of scFv with higher affinity against MDM2 from a randomly mutated M1-19 scFv library. We applied on-rate or off-rate selection as a selection pressure for in vitro affinity maturation with the Biacore instrument: the on-rate selection was performed by controlling the flow rate, and the off-rate selection was carried out by using a prolonged washing process on the sensor chip. After one round of selection, the recovered scFv genes were cloned and sequenced, and the KDs were evaluated by competitive ELISA (Figs. 6A and 6D). Among 22 distinct clones, we obtained four mutants with higher affinity for MDM2 (KD = 0.7-3.8 nM) than the progenitor M1-19. The strongest binder, M1-19a, was confirmed by SPR to have a higher on-rate and lower off-rate than M1-19 (Fig. 6B) and was also confirmed by Western blotting to recognize only the antigen protein MDM2 in crude cell lysates (Fig. 6C). These results indicated that the selected scFv had high enough affinity and specificity for practical use. Although the mutations of the selected scFvs were distributed throughout the sequences and no "consensus" mutations were identified, the mutation Y100bH within VH CDR3 may contribute to the improved affinity and specificity, because this region is usually important for binding with antigens.
4.3. Ultrahigh efficiency of protein selection
Surprisingly, our results indicated that positive clone(s) were efficiently enriched through only one or two rounds of selection from a large library containing ~10¹² molecules, implying ultrahigh efficiency of the method. To estimate the enrichment efficiency, we performed model experiments using a mixture of two kinds of scFv genes. The P1-93 (anti-p53) or M1-19 (anti-MDM2) gene was mixed with an anti-fluorescein scFv gene ('Flu' as a negative control) at a ratio of 1:10², 1:10⁴, 1:10⁶ or 1:10⁸, and subjected to one round of IVV selection on the sensor chip. The selection of the 1:10⁶ mixture of P1-93:Flu genes and the 1:10⁸ mixture of M1-19:Flu genes each resulted in a roughly 1:1 final gene ratio (Fig. 7A), indicating enrichment efficiencies of 10⁶- and 10⁸-fold per round, respectively. Furthermore, we confirmed that not only protein-protein (antigen-antibody) interactions, but also protein-DNA and protein-drug interactions were selected by our method with high enrichment efficiencies of >10⁶-fold (Figs. 7B and 7C). Since the enrichment efficiencies of these model experiments with a conventional agarose resin were only 10- to 10³-fold per round, the enrichment efficiency was improved 10³- to 10⁵-fold over previous methods. Furthermore, we confirmed that the IVV system using a photocleavable linker between mRNA and protein is useful for in vitro selection of epitope peptides, recombinant antibodies, and drug-receptor interactions.
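The enrichment efficiencies quoted above follow from comparing the target:background gene ratio before and after selection. A minimal sketch of that calculation is given below; the function name and the use of exact fractions are choices made for this illustration, and the "roughly 1:1" final ratio is idealized as exactly 1:1.

```python
from fractions import Fraction


def enrichment_factor(initial_target: int, initial_background: int,
                      final_target: int, final_background: int) -> Fraction:
    """Fold-enrichment of target over background after one selection round.

    Defined here as (final target:background ratio) / (initial ratio).
    Hypothetical helper for illustration only.
    """
    initial_ratio = Fraction(initial_target, initial_background)
    final_ratio = Fraction(final_target, final_background)
    return final_ratio / initial_ratio


if __name__ == "__main__":
    # P1-93:Flu mixed at 1:10^6, recovered at about 1:1 (idealized)
    print(float(enrichment_factor(1, 10**6, 1, 1)))
    # M1-19:Flu mixed at 1:10^8, recovered at about 1:1 (idealized)
    print(float(enrichment_factor(1, 10**8, 1, 1)))
```

In practice the final ratio is estimated from RT-PCR band intensities or clone counts, so the computed factor is an order-of-magnitude estimate rather than an exact value.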
Although Biacore instruments have so far been utilized mainly to analyze biomolecular interactions by SPR, a few researchers have used this approach to fish for affinity targets from a randomized DNA library, phage-displayed protein libraries [79,80], or a ribosome-displayed antibody library. However, the enrichment efficiency in these applications was not high. Why, then, was ultrahigh efficiency achieved in the present protein selection by the IVV method? In an IVV, the protein is a relatively small object pendant from its encoding RNA moiety, which is about ten times larger. Thus, nonspecific adsorption of RNA on solid surfaces is potentially significant. The matrix of the Biacore sensor chip consists of carboxymethylated dextran covalently attached to a gold surface and binds nucleic acid molecules poorly, since both materials are negatively charged. In contrast, phage display and ribosome display involve large protein moieties (coat proteins or ribosome), so the use of the sensor chip may not improve the enrichment efficiency in these cases.
It should be noted that the ultrahigh enrichment efficiency made it difficult to set the number of selection rounds at a level that is appropriate to remove all non-binders as well as to pick all binders with various affinities from a library. If the number of selection rounds is too small, many negative sequences will be cloned; on the other hand, excess rounds of selection will yield only a single sequence with the highest affinity. In this study, we obtained 20-30 different sequences, including P1-93 and M1-19, with high antigen-specificity after a single round of selection, while we obtained only one or two negative sequences with high affinity but low antigen-specificity from the 10⁶ to 10⁸ library after two (probably excess) rounds of selection (>10¹²-fold).
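The trade-off between per-round enrichment and the number of rounds can be illustrated with a simple idealized model. The sketch below assumes that enrichment is constant from round to round and multiplicative, and that a single binder must be raised from one copy in the library to parity with the background; real selections deviate from this because of background binding and competition among binders. The function name is hypothetical.

```python
def rounds_needed(library_size: int, enrichment_per_round: int) -> int:
    """Smallest number of rounds n such that enrichment_per_round**n >= library_size.

    Idealized model: constant, multiplicative enrichment per round.
    """
    if enrichment_per_round <= 1:
        raise ValueError("enrichment_per_round must be greater than 1")
    rounds = 0
    cumulative = 1
    while cumulative < library_size:
        cumulative *= enrichment_per_round
        rounds += 1
    return rounds


if __name__ == "__main__":
    library = 10**12
    for efficiency in (10, 10**3, 10**6, 10**8):
        print(f"{efficiency:>9}-fold per round: {rounds_needed(library, efficiency)} rounds")
```

Under these assumptions, a 10¹²-member library needs 4 to 12 rounds at 10- to 10³-fold per round but only 2 rounds at 10⁶- to 10⁸-fold per round, which is consistent with the observation that a second round was probably already excessive for a 10⁶ to 10⁸ library.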
In summary, we achieved ultrahigh efficiencies (10^6- to 10^8-fold per round) of protein selection by the IVV method with the microfluidic system. We obtained scFvs with high affinity and specificity from a naïve library by IVV selection for the first time. It took only three days to perform each selection experiment, including activity evaluation by ELISA. Although preparation of target materials of high quality is required, we anticipate this simple method to be a starting point for a versatile system to facilitate high-throughput preparation of monoclonal antibodies for analysis of proteome expression and detection of biomarkers, high-throughput analysis of protein-protein, protein-DNA and protein-drug interactions in proteomic and therapeutic fields, and rapid evolution of novel artificial proteins from large randomized libraries that often require ten or more rounds of selection.
5. Highly sensitive, high-throughput cDNA tiling arrays for detecting protein interactions selected by the IVV method
The most serious bottleneck in the IVV method has been in the final decoding step to identify the selected protein sequences. This step is usually achieved by cloning in bacteria and DNA sequencing using Sanger sequencers, but the following difficulties arise: 1) Only a limited number of clones can be analyzed, and thus positive candidates whose contents in the selected library are less than a threshold determined by the number of analyzed clones are lost as false negatives. 2) Positive sequences with low contents in a library can be enriched by iterative rounds of affinity selection, but lower-affinity binders compete with higher-affinity binders and therefore drop out of the screening. 3) DNA fragments which are injurious to cloning hosts, e.g., cytotoxic sequences, may be lost. 4) Cloning and sequencing of a huge number of copies of selected sequences is redundant, cost-ineffective, and time-consuming.
A DNA microarray is an efficient substitute for the cloning and sequencing processes to overcome the above limitations (Fig. 8). The combined use of a tiling array representing ORF sequences with the IVV method would provide a completely in vitro platform for highly sensitive and parallel analysis of protein interactions. It should be possible to detect enrichment of cDNA fragments of selected candidates even with low contents or low affinity. However, for the analyses, the tiling arrays should be custom-designed specifically for the IVV screening, because the DNA fragments from the IVV method have unique characteristics, e.g., short length and functional region-concentrated distribution. In this chapter, we introduce a highly sensitive, high-throughput protein interaction analysis procedure combining the IVV method with tiling arrays.
First, we designed a custom oligo DNA microarray as follows: 1) Oligonucleotide probes of 50-mer in length were used. This is the preferred length for microarray probes, because shorter probes result in low sensitivity and longer probes produce non-specific signals. 2) There should be no gaps between the probes. A contiguous linear series of data is required to recognize a signal peak in the algorithm for tiling array analysis, as described below, so the probes must be densely arranged. 3) mRNA sequences were employed for the tiling array. Only coding regions are required for the purpose of protein-interaction analysis, so other genomic sequences, e.g., introns, control regions and non-coding RNAs, were not employed.
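The gap-free design rule can be illustrated with a minimal Python sketch that tiles 50-mer probes along an mRNA sequence. The function name `tile_probes` and the `step` parameter (spacing between probe start positions) are hypothetical; the text specifies only the probe length and the absence of gaps, so the step value here is an assumption for illustration.

```python
def tile_probes(mrna, probe_len=50, step=5):
    """Return (start, probe) pairs covering `mrna` without gaps.

    `step` is an assumed spacing between probe starts. Any step no larger
    than `probe_len` leaves no uncovered bases between adjacent probes.
    """
    if step < 1 or step > probe_len:
        raise ValueError("step must be between 1 and probe_len to avoid gaps")
    last_start = len(mrna) - probe_len
    if last_start < 0:
        return []  # sequence shorter than one probe
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # extra probe so the 3' end is covered
    return [(s, mrna[s:s + probe_len]) for s in starts]


# Toy 120-nt sequence tiled with a coarse step, for readability.
toy_mrna = "ACGT" * 30
probes = tile_probes(toy_mrna, probe_len=50, step=20)
for start, probe in probes:
    print(start, probe)
```

A smaller step gives denser data points along each mRNA, which is what a peak-finding algorithm working on a contiguous series needs.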
Second, we also improved the method for labeling of cDNA samples. Usually, double-stranded DNA samples for a tiling array analysis are labeled by using random primers. However, cDNA fragments selected from a randomly fragmented cDNA library seem to be too short for efficient labeling by random priming. Indeed, in a test analysis with a tiling array using the random priming labeling method, we failed to detect any of the previously detected positive controls. Therefore, we employed another labeling procedure, in which sense-strand-labeled RNAs were produced by one-step in vitro transcription using an SP6 promoter attached to cDNA fragments from IVV screening.
Third, we developed a detection algorithm for specific signal peaks from raw data. After iterative rounds of IVV screening, the resulting cDNA libraries in the presence and absence of bait protein, called the bait (+) and bait (-) libraries, respectively, were labeled with fluorescent dyes using the above method and hybridized separately. The ratios of the signal intensities from the experiments in the presence and absence of bait were calculated. Next, we searched for signal peaks in the data using the “windowed threshold detection” algorithm (Fig. 9). This algorithm looks for at least four data points that are above a threshold value within a window. These points are grouped together and reported as a peak. We used the following parameters in the algorithm: peak window size, 300 bp; percent of peak threshold, 20% of maximum data in each mRNA sequence. The value of each peak was the maximum value of the data points in that peak. Only reproducible peaks in the duplicated data were collected as candidates.
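The peak search can be sketched in Python as follows. This is one plausible reading of the description, not the original implementation: the function name `find_peaks`, the rule for merging overlapping windows, and the toy data are all assumptions. The parameters stated in the text (300-bp window, at least four points, threshold at 20% of the per-mRNA maximum) are used as defaults.

```python
def find_peaks(positions, ratios, window=300, min_points=4, threshold_fraction=0.20):
    """Windowed threshold detection for one mRNA sequence (illustrative sketch).

    positions: probe positions (bp) along the mRNA
    ratios:    bait(+)/bait(-) signal ratios at those positions
    Returns a list of (start_bp, end_bp, peak_value) tuples.
    """
    if not positions:
        return []
    threshold = threshold_fraction * max(ratios)
    # Keep only data points above the per-sequence threshold, in positional order.
    hits = sorted((p, r) for p, r in zip(positions, ratios) if r > threshold)

    spans = []  # index ranges [first, last] into `hits`, one per peak
    j = 0
    for i in range(len(hits)):
        j = max(j, i)
        # Extend j to the last hit lying within `window` bp of hit i.
        while j + 1 < len(hits) and hits[j + 1][0] - hits[i][0] <= window:
            j += 1
        if j - i + 1 >= min_points:
            if spans and i <= spans[-1][1]:
                spans[-1][1] = max(spans[-1][1], j)  # assumed: merge overlapping windows
            else:
                spans.append([i, j])

    # The value of each peak is the maximum ratio among its data points.
    return [
        (hits[a][0], hits[b][0], max(r for _, r in hits[a:b + 1]))
        for a, b in spans
    ]


# Toy data: flat background, one four-point peak, and one isolated spike.
toy_positions = list(range(0, 1000, 50))
toy_ratios = [1.0] * len(toy_positions)
for pos, value in [(200, 6.0), (250, 10.0), (300, 8.0), (350, 7.0), (800, 9.0)]:
    toy_ratios[toy_positions.index(pos)] = value

print(find_peaks(toy_positions, toy_ratios))
```

In this toy case the isolated spike at 800 bp exceeds the threshold but is not reported, because fewer than four above-threshold points fall within its window. Requiring the same peak to appear in duplicate data sets would be a separate filtering step applied to the output.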
As an actual model study, we performed protein-protein interaction screening for mouse Jun protein, a transcription factor containing a bZIP domain, using the combined IVV and tiling array method. For this study, we constructed a novel custom microarray containing ~1,600 ORF sequences of known and predicted mouse transcription-regulatory factors (334,372 oligonucleotides) [16,87,88] to analyze cDNA fragments from IVV screening for Jun-interactors, and named it the Transcription-Factor Tiling (TFT) array. From the 5th-round DNA library of the IVV screening in the presence and absence of a bait Jun protein, we obtained labeled RNAs and hybridized them onto the TFT array.
Positive signal peaks were collected using the windowed threshold detection algorithm; the total number of peaks was 647 on 545 mRNA sequences (some of the mRNA sequences included multiple peaks). An example is shown in Fig. 10. To distinguish between true positives and false positives, specific enrichment of the selected candidate was validated by real-time PCR. Among the top 10 percent of the peaks (64 regions), specific enrichment of 35 peaks was confirmed in the screening (white bars in Fig. 11A). The data indicate that the appropriate threshold for distinguishing between true positives and noise in the microarray signal is a signal ratio of 3 to 4. The 35 candidates identified in the present study include all of the 20 Jun-interactors identified in our previous studies using conventional cloning and sequencing [16,88]. Furthermore, the 35 candidates include eight well-known Jun-associated proteins, which is double the number in the previous study, in which four known Jun-interactors were obtained (white bars of Fig. 11B) [16,88,89]. In other words, 15 proteins including four known Jun-interactors were newly detected using the TFT arrays.
Finally, we used an in vitro pull-down assay and the surface plasmon resonance method to confirm the physical association of the 11 newly discovered candidates with Jun. As a result, ten of the 11 tested candidates exhibited specific interaction with the bZIP domain of Jun. Although most of the tested interactions seem to be very weak, we considered them to be true positives, because all of the candidates except one contain leucine-heptad repeats in the selected regions, and such repeats are an important motif for heterodimerization with Jun.
Previous studies and our survey revealed that the cDNA library used in this screening contained 29 known Jun-interactors [89,90]. Of these proteins, four (14%) and eight (28%) were detected by conventional sequencing and by the TFT array method, respectively. Thus, the TFT array method provides a remarkable increase in the number of identified interactors and this confirms the value of our new methodology as a screening tool for protein interactions. While the coverage was increased considerably, the accuracy did not decrease. Specifically, the number of false positives did not increase: the rates of confirmation of proteins by in vitro pull-down assays in the previous and present studies were 75% and 74%, respectively. Undetected remaining interactors were considered to be false negatives. Mismatching of the selection conditions, e.g., salts, detergents, and pH, or the bait construct, e.g., length, region, and tags, might inhibit these interactions.
For quantitative analysis, the abundance ratios of 35 specifically selected candidates in the initial and screened cDNA libraries were determined by real-time PCR, and the enrichment rates (abundance ratio in the 5th round library per that in the initial library) were also calculated [88,89]. The abundance of the 15 newly found candidates (excluding four cases) was less than the theoretical threshold determined from the results of our previous study (an analysis of 451 clones). In order to detect the least abundant candidate (1.3 x 10^-4% of the screened cDNA library) by cloning and sequencing, it would have been necessary to analyze at least 1.0 x 10^6 clones. These results indicate that our new method is more sensitive, higher-throughput and more cost-effective than the previous method.
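The clone-number estimate can be checked with a short Python calculation. This is a sketch under stated assumptions: the detection threshold for 451 clones is taken as one clone in 451, clones are treated as independent random draws from the library, and the 95% confidence level is an illustrative choice that does not appear in the original analysis.

```python
import math


def clones_for_one_expected_hit(fraction):
    """Number of clones at which one copy of the target is expected on average."""
    return 1.0 / fraction


def clones_for_detection(fraction, confidence=0.95):
    """Clones needed to see the target at least once with the given probability,
    assuming independent random sampling (an assumption of this sketch)."""
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-fraction))


# Least abundant candidate: 1.3 x 10^-4 % of the screened library.
least_abundant = 1.3e-4 / 100.0

# Assumed threshold of the previous study: one clone out of 451 analyzed.
threshold_451 = 1.0 / 451

expected = clones_for_one_expected_hit(least_abundant)
n95 = clones_for_detection(least_abundant)

print(f"Detection threshold with 451 clones: {threshold_451 * 100:.2f}% of the library")
print(f"Clones for one expected hit: {expected:.2e}")
print(f"Clones for 95% chance of at least one hit: {n95:.2e}")
```

Both estimates are of the order of 10^6 clones, in line with the figure quoted above and far beyond the 451 clones analyzed previously.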
From the standpoint of the detection sensitivity, the combined use of the IVV method with tiling arrays provides an extremely sensitive method for protein-interaction analysis, because even a very weakly expressed target could be detected in this study. In the cDNA library before IVV screening, the content of fragments of the selected region of the least abundant known Jun-binder was 1.2 x 10^-7%. If one mRNA molecule existed per cell, the content of a fragment of the gene would be about 1.2 x 10^-5 to 5.9 x 10^-5% (we employed reported parameters for this calculation). Thus, the content of the least abundant mRNA in the initial library corresponds to about one molecule per 20 to 100 cells. This suggests that this gene is expressed at a very low level in a cell type that is a minor component of the tested tissue. It is noteworthy that targets expressed at such low levels can be detected without the need for a cell purification procedure, e.g., collection of somatic stem cells by flow cytometry. The high sensitivity of our method may allow access to targets which would be hard to analyze with other existing tools, such as the TAP method.
In summary, we have applied tiling array technology, which has previously been used for ChIP-chip assays and transcriptome analyses, to protein-interaction analysis with the IVV method. Compared with previous results obtained with cloning and sequencing, the use of the tiling array greatly increased sensitivity. This method can detect targets expressed at extremely low levels. This highly sensitive and reliable method has the potential to be used widely, because the tiling array approach can easily be extended to a genome-wide scale, even though the search space is limited in tiled sequences.
We have developed an mRNA display technology, named the in vitro virus (IVV) method, as a stable and efficient tool for analyzing various protein functions. The IVV method is applicable for exploring protein complexes, transcription factors, RNA-binding proteins, bioactive peptides, drug-target proteins and antibodies, as well as in vitro protein evolution from random-sequence and block-shuffling libraries. We further developed a large-scale and high-throughput IVV screening system utilizing a biorobot, microfluidic tip, and tiling array. Here we reviewed applications of the IVV method for protein functional analyses.