Consensus sequence of six classes of importin α-dependent NLS
The complete decoding of the human genome in 2003 raised expectations for targeted drug design and personalized medical care based on genetic information and also for gene therapy of disease. To reach these goals, it is necessary to develop new technologies capable of identifying drug-target genes from enormous volumes of genome information, validating the biological functions, and comprehensively analyzing protein functions and interactions that have hitherto been studied individually. We first developed an mRNA display, termed the
The IVV method, originally developed for evolutionary protein engineering based on
The antibiotic puromycin, which is an analogue of the 3' end of Tyr-tRNA^Tyr, acts in both prokaryotes and eukaryotes as an inhibitor of peptidyl transferase. It has two modes of inhibitory action. The first is by acting as an acceptor substrate that attacks peptidyl-tRNA (donor substrate) in the P site to form a nascent peptide. The second is by competing with aminoacyl-tRNA for binding to the A' site, which is the binding site of the 3' end of aminoacyl-tRNA within the peptidyl transferase active site. It has been reported that the polypeptides released by puromycin are not full-length proteins. Similarly, it has been shown that growing peptide chains on ribosomes are transferred to the α-amino group of puromycin, which interrupts the normal reaction of peptide bond formation. Therefore, these conventional studies suggested that puromycin is a non-specific inhibitor of protein synthesis as a result of competition with aminoacyl-tRNA. However, since most of the studies on puromycin were performed at relatively high concentrations, the behavior of puromycin at lower concentrations was still an open question at that time. It had been reported that full-length protein that fails to be released from ribosomes at the final stage of protein synthesis requires treatment with puromycin or release factors (RFs) to be released. These results led us to hypothesize that puromycin at very low concentrations, which would not effectively compete with aminoacyl-tRNA, might act as a non-inhibitor and bond specifically to full-length protein at the stop codon. Indeed, we confirmed that puromycin and its derivatives bond only to full-length protein at very low concentrations (such as 0.04 μM), where they are “non-inhibitors” of protein synthesis, by using 32P-labeled rCpPuro (2'-ribocytidylyl-(3'→5')-puromycin) in an
2. Application of the IVV method for analyzing various protein interactions
The IVV method is applicable for analyzing protein–protein [15,16,18,19], DNA(RNA)–protein [20-22], peptide–protein [23-27], drug–protein [28,29] and antigen–antibody [30,31] interactions. As a result, we are able to explore protein complexes, transcription factors, RNA-binding proteins, bioactive peptides, drug-target proteins, and antibodies (antibodies will be discussed later) (Fig. 2).
Fig. 2. [...] library and conjugation with a polyethylene glycol (PEG) spacer with puromycin, cell-free translation to form IVV and interaction of bait attached to beads and prey IVV library to form complexes, IVV selection with bait, and RT-PCR to amplify mRNA tags. Selection rounds are repeated until sufficient enrichment is obtained, followed by cloning and sequencing, and protein sequences are decoded.
2.1. DNA-protein interaction
The specific interactions between cis-regulatory DNA elements and transcription factors are critical components of transcriptional regulatory networks [32,33]. The whole genome and complete cDNA sequences contain large numbers of transcription factors and their binding DNA sequences, and thus comprehensive analysis of DNA-transcription factor interactions is expected to provide a deep understanding of the mechanisms of cell proliferation, developmental processes in tissue morphogenesis and disease. Currently, combined use of chromatin immunoprecipitation (ChIP) assay with DNA microarrays (ChIP-chip) has become the most widely used high-throughput method for discovering cis-regulatory DNA elements for a transcription factor. In contrast, development of high-throughput methods for discovering transcription factors for a cis-regulatory DNA element remains at an early stage. Although the yeast one-hybrid method and phage display are attractive candidates, these methods are not easily scalable because of the use of living cells. In addition, as over-expression of transcription factors often affects cellular metabolism, such transcription factors are difficult to screen. In order to circumvent these difficulties, we focused on a totally
Comprehensive analysis of DNA–protein interactions is important for mapping transcriptional regulatory networks at a genome-wide level. We employed the IVV method for
2.2. Peptide–protein interaction
Peptides are powerful tools for disrupting protein-protein interactions because the large interacting surfaces and the high specificity of these peptides lead to fewer adverse side effects when they are used as pharmaceutical agents. As previously reported, several peptides that inhibit the MDM2-p53 interaction have been identified from randomized peptide libraries using phage display. Hu et al. identified a 12-amino-acid (aa) peptide (LTFEHYWAQLTS), DI, that could inhibit not only the MDM2-p53 interaction, but also the MDMX-p53 interaction more effectively than Nutlin-3, a small-molecule inhibitor of the MDM2-p53 interaction. MDMX, an MDM2 homologue, is highly expressed in tumors, and it binds to and negatively regulates p53. Furthermore, DI expressed with recombinant adenovirus as a thioredoxin-fused protein could activate the p53 pathway both
To overcome this problem, we performed
We identified an optimal peptide named MIP that inhibited the MDM2-p53 and MDMX-p53 interactions 29- and 13-fold more effectively than DI, respectively (Fig. 3). Adenovirus-mediated expression of MIP fused to the thioredoxin scaffold protein in living cells caused stabilization of p53 through its interaction with MDM2, resulting in activation of the p53 pathway. Furthermore, expression of MIP also inhibited tumor cell proliferation in a p53-dependent manner more potently than did DI. These results show that two-stage peptide selection by the IVV method [1-3,15] is useful for the rapid identification of potent peptides that target oncoproteins.
Bcl-XL, an antiapoptotic member of the Bcl-2 family, is a mitochondrial protein that inhibits activation of Bax and Bak, which commit the cell to apoptosis, and it therefore represents a potential target for drug discovery. Peptides have potential as therapeutic molecules because they can be designed to engage a larger portion of the target protein with higher specificity. We selected 16-mer peptides that interact with Bcl-XL from random and degenerate peptide libraries using the IVV method. The selected peptides have sequence similarity with the Bcl-2 family BH3 domains, and one of them has higher affinity (IC50 = 0.9 μM) than Bak BH3 (IC50 = 11.8 μM) for Bcl-XL.
The importin α/β pathway mediates nuclear import of proteins containing the classical nuclear localization signals (NLSs). Although the consensus sequences of the classical NLSs have been defined, there are still many NLSs that do not match the consensus rule and many nonfunctional sequences that match the consensus. We identified six different NLS classes that specifically bind to distinct binding pockets of importin α. By screening random peptide libraries using the IVV method, we selected peptides bound by importin α and identified six classes of NLSs, including three novel classes (Table 1). Two noncanonical classes (class 3 and class 4) specifically bound to the minor binding pocket of importin α, whereas the classical monopartite NLSs (class 1 and class 2) bound to the major binding pocket. Using a newly developed universal green fluorescent protein expression system, we found that these NLS classes, including plant-specific class 5 NLSs and bipartite NLSs, fundamentally require regions outside the core basic residues for their activity and have specific residues or patterns that confer distinct activities among yeast, plants, and mammals. Furthermore, amino acid replacement analyses revealed that the consensus basic patterns of the classical NLSs are not essential for activity, and more unconventional patterns, including redox-sensitive NLSs, were generated. These results explain the causes of NLS diversity. The defined consensus patterns and properties of importin α-dependent NLSs provide useful information for identifying NLSs [25-27].
2.3. Drug-protein interaction
Despite the introduction of newly developed drugs, such as lenalidomide and bortezomib, multiple myeloma is still difficult to treat and patients have a poor prognosis. In order to find novel drugs that are effective for multiple myeloma, we tested the antitumor activity of 29 phthalimide derivatives against several multiple myeloma cell lines. Among these derivatives, 2-(2,6-diisopropylphenyl)-5-amino-1H-isoindole-1,3-dione (TC11) was found to be a potent inhibitor of tumor cell proliferation and an inducer of apoptosis
We screened 46 novel anilinoquinazoline derivatives for activity to inhibit proliferation of a panel of human cancer cell lines. Among them, Q15 showed potent
3. Application of the IVV method for comprehensive interactome network analyses
Interactome networks are essential for complete systems-level descriptions of cells. Large-scale protein-protein interactions (PPIs) are integral in the analysis of topological and dynamic features of interactome networks. We introduce large-scale interactome network analyses using a combination of the IVV method and a biorobot for 50 human transcription factors (TFs).
Comprehensive analysis of PPIs is an important task in the fields of proteomics, functional genomics and systems biology. PPIs are usually analyzed by means of biochemical methods such as pull-down assay and co-immunoprecipitation, yeast two-hybrid (Y2H) assay and phage display. Recently, the combined use of mass spectrometry (MS) with an affinity tag has made biochemical methods more comprehensive and reliable. However, the testable interaction conditions are restricted by the properties of the biological sources. The Y2H assay is one of the major tools used in the discovery and characterization of PPIs. However, the results of Y2H analyses often include many false positives due to auto-activating bait or prey fusion proteins, and interactions of proteins that are toxic to yeast cells cannot be examined. Phage display, the most widely used display technology, is an effective alternative, because the interactions between libraries and target proteins occur
Furthermore, in a single experiment using bait Fos, more than 10 interactors, including not only direct, but also indirect interactions, were enriched. Further, previously unidentified proteins containing novel leucine zipper (L-ZIP) motifs with minimal binding sites identified by sequence alignment as functional elements were detected as a result of using a randomly primed cDNA library. Thus, we consider that this simple IVV selection system based on cell-free cotranslation could be applicable to high-throughput and comprehensive analysis of PPI and complexes in large-scale settings involving parallel bait proteins.
Interactome networks are essential for complete systems-level descriptions of cells. Large-scale PPIs are integral in the analysis of topological and dynamic features of interactome networks. Several attempts to collect large-scale PPI data have been initiated using various model organisms [56-61] and were subsequently conducted in humans [62-64]. Traditionally, protein interaction data are collected using high-throughput
4. Ultrahigh enrichment of antibodies by the IVV method on a microfluidic chip
Rapid preparation of monoclonal antibodies with high affinity and specificity is required in diverse fields from fundamental molecular and cellular biology to drug discovery and diagnosis. In addition to classical hybridoma technology,
4.1. In vitro selection of antibodies from a naïve scFv library
The IVV selection of scFv was performed on a Biacore microfluidic chip. Since the diversity of the mouse scFv library prepared from mouse spleen poly A+ RNAs is estimated to be 10^6~10^8, while the IVV method allows screening of ~10^12 molecules, we also introduced random point mutations into the scFv library. We chose p53 (human tumor suppressor protein) and MDM2 (human murine double minute) proteins as model antigens that were immobilized on the Biacore sensor chip. The selection experiment was performed on the microfluidic chip, and selected scFv genes were amplified by reverse transcription (RT)-PCR and identified by cloning and sequencing. Unexpectedly, the recovered anti-p53 and anti-MDM2 scFv sequences converged on a single sequence and two sequences, respectively, after only two rounds of selection. These clones showed high affinity, but also low antigen-specificity, in pull-down assays, and so we examined the clones obtained after a single round of selection in each case. When the binding activities of 29 (anti-p53) and 20 (anti-MDM2) clones with distinct sequences were examined by means of pull-down assays, P1-93 and M1-19 showed high specificity against the respective antigens among p53, MDM2 and BSA (Fig. 5A). The amino-acid sequences of P1-93 and M1-19 are shown in Fig. 5B. In competitive ELISA, both clones dose-dependently inhibited the ELISA signal (Fig. 5C), and Scatchard plots revealed that the KDs of P1-93 and M1-19 were 22 nM and 5.9 nM, respectively. The KDs of P1-93 and M1-19 were also determined by surface plasmon resonance (SPR) as 12 nM and 4.3 nM, respectively (Fig. 5D). The values obtained by the two different methods are similar.
4.2. In vitro evolution of scFv
Further, we performed
4.3. Ultrahigh efficiency of protein selection
Surprisingly, our results indicated that positive clone(s) were efficiently enriched through only one or two rounds of selection from a large library containing ~10^12 molecules, implying ultrahigh efficiency of the method. To estimate the enrichment efficiency, we performed model experiments using a mixture of two kinds of scFv genes. The P1-93 (anti-p53) or M1-19 (anti-MDM2) gene was mixed with an anti-fluorescein scFv gene ('Flu' as a negative control) at a ratio of 1:10^2, 1:10^4, 1:10^6 or 1:10^8, and subjected to one round of the IVV selection on the sensor chip. The selection of the 1:10^6 mixture of P1-93:Flu genes and the 1:10^8 mixture of M1-19:Flu genes each resulted in a roughly 1:1 final gene ratio (Fig. 7A), indicating enrichment efficiencies of 10^6- and 10^8-fold per round, respectively. Furthermore, we confirmed that not only protein-protein (antigen-antibody) interactions, but also protein-DNA and protein-drug interactions were selected by our method with high enrichment efficiencies of >10^6-fold (Figs. 7B and 7C). Since the enrichment efficiencies of these model experiments with a usual agarose resin were only 10~10^3-fold per round, the enrichment efficiency was improved 10^3~10^5-fold over previous methods. Furthermore, we confirmed that the IVV system using a photocleavable linker between mRNA and protein is useful for
Although Biacore instruments have so far been utilized mainly to analyze biomolecular interactions by SPR, a few researchers have used this approach to fish for affinity targets from a randomized DNA library, phage-displayed protein libraries [79,80], or a ribosome-displayed antibody library. However, the enrichment efficiency in these applications was not high. Why, then, was ultrahigh efficiency achieved in the present protein selection by the IVV method? The IVV is a relatively small object pendant from its encoding RNA moiety, which is about ten times larger. Thus, nonspecific adsorption of RNA on solid surfaces is potentially significant. The matrix of the Biacore sensor chip consists of carboxymethylated dextran covalently attached to a gold surface and poorly binds nucleic acid molecules, since both materials are negatively charged. In contrast, phage display and ribosome display involve large protein moieties (coat proteins or ribosome), so the use of the sensor chip may not improve the enrichment efficiency in these cases.
It should be noted that the ultrahigh enrichment efficiency made it difficult to set the number of selection rounds at a level that is appropriate to remove all non-binders as well as to pick all binders with various affinities from a library. If the number of selection rounds is too small, many negative sequences will be cloned; on the other hand, excess rounds of selection will yield only a single sequence with the highest affinity. In this study, we obtained 20-30 different sequences, including P1-93 and M1-19, with high antigen-specificity after a single round of selection, while we obtained only one or two negative sequences with high affinity but low antigen-specificity from the 10^6~10^8 library after two (probably excess) rounds of selection (>10^12-fold).
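The enrichment figures quoted in this section follow from simple ratio arithmetic, which the short sketch below makes explicit. It is only an illustration: the function names are hypothetical, and the cumulative estimate assumes the same fold enrichment in every round, which the text does not claim.

```python
import math


def enrichment_fold(input_ratio: float, output_ratio: float) -> float:
    """Fold enrichment of a target gene over a control gene in one round.

    Both arguments are target:control gene ratios written as fractions,
    e.g. a 1:10^6 mixture is 1e-6 and a roughly 1:1 outcome is 1.0.
    """
    if input_ratio <= 0 or output_ratio <= 0:
        raise ValueError("ratios must be positive")
    return output_ratio / input_ratio


def cumulative_fold(per_round_fold: float, rounds: int) -> float:
    """Idealized cumulative enrichment (assumption: identical fold per round)."""
    return per_round_fold ** rounds


# Model experiments from the text: 1:10^6 (P1-93:Flu) and 1:10^8 (M1-19:Flu)
# mixtures each ended at roughly 1:1 after a single round.
p53_fold = enrichment_fold(1e-6, 1.0)
mdm2_fold = enrichment_fold(1e-8, 1.0)

for name, fold in (("P1-93", p53_fold), ("M1-19", mdm2_fold)):
    print(f"{name}: about 10^{math.log10(fold):.0f}-fold per round")

# Two rounds at 10^6-fold each already exceed the ~10^12 library size.
print(f"two rounds: about 10^{math.log10(cumulative_fold(p53_fold, 2)):.0f}-fold")
```

Under this idealization, two rounds at 10^6-fold per round reach the 10^12-fold level mentioned above, which is consistent with the observation that a second round collapsed the pool to one or two sequences.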
In summary, we achieved ultrahigh efficiencies (10^6~10^8-fold per round) of protein selection by the IVV method with the microfluidic system. We obtained scFvs with high affinity and specificity from a naïve library by IVV selection for the first time. It took only three days to perform each selection experiment, including activity evaluation by ELISA. Although preparation of target materials of high quality is required, we anticipate this simple method to be a starting point for a versatile system to facilitate high-throughput preparation of monoclonal antibodies for analysis of proteome expression and detection of biomarkers, high-throughput analysis of protein-protein, protein-DNA and protein-drug interactions in proteomic and therapeutic fields, and rapid evolution of novel artificial proteins from large randomized libraries that often require ten or more rounds of selection.
5. Highly sensitive, high-throughput cDNA tiling arrays for detecting protein interactions selected by the IVV method
The most serious bottleneck in the IVV method has been in the final decoding step to identify the selected protein sequences. This step is usually achieved by cloning in bacteria and DNA sequencing using Sanger sequencers, but the following difficulties arise: 1) Only a limited number of clones can be analyzed, and thus positive candidates whose contents in the selected library are less than a threshold determined by the number of analyzed clones are lost as false negatives. 2) Positive sequences with low contents in a library can be enriched by iterative rounds of affinity selection, but lower-affinity binders compete with higher-affinity binders and therefore drop out of the screening. 3) DNA fragments which are injurious to cloning hosts, e.g., cytotoxic sequences, may be lost. 4) Cloning and sequencing of a huge number of copies of selected sequences is redundant, cost-ineffective, and time-consuming.
A DNA microarray is an efficient substitute for the cloning and sequencing processes to overcome the above limitations (Fig. 8). The combined use of a tiling array representing ORF sequences with the IVV method would provide a completely
First, we designed a custom oligo DNA microarray as follows: 1) Oligonucleotide probes of 50-mer in length were used. This is the preferred length for microarray probes, because shorter probes result in low sensitivity and longer probes produce non-specific signals. 2) There should be no gaps between the probes. A contiguous linear series of data is required to recognize a signal peak in the algorithm for tiling array analysis, as described below, so the probes must be densely arranged. 3) mRNA sequences were employed for the tiling array. Only coding regions are required for the purpose of protein-interaction analysis, so other genomic sequences, e.g., introns, control regions and non-coding RNAs, were not employed.
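The design rules above (50-mer probes, no gaps, mRNA sequences only) can be sketched as a small probe-tiling routine. This is a minimal illustration rather than the procedure actually used: the function name and the `step` parameter (the offset between neighbouring probe starts) are assumptions, since the text specifies only the probe length and the absence of gaps.

```python
from typing import List, Tuple


def tile_probes(mrna: str, probe_len: int = 50, step: int = 10) -> List[Tuple[int, str]]:
    """Return (start, probe) pairs that cover an mRNA sequence without gaps.

    probe_len follows the 50-mer design in the text; step is a hypothetical
    tiling offset. Any step <= probe_len leaves no uncovered bases.
    """
    if step < 1 or step > probe_len:
        raise ValueError("step must be between 1 and probe_len to avoid gaps")
    if len(mrna) < probe_len:
        return []  # sequence too short to carry a full-length probe
    last_start = len(mrna) - probe_len
    starts = list(range(0, last_start + 1, step))
    # Add one probe flush with the 3' end so the tail is always covered.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s, mrna[s:s + probe_len]) for s in starts]


if __name__ == "__main__":
    toy_mrna = "ACGT" * 40  # 160-nt toy sequence, not a real transcript
    probes = tile_probes(toy_mrna)
    print(len(probes), "probes; first starts at", probes[0][0],
          "and last ends at", probes[-1][0] + 50)
```

A smaller `step` gives denser data points per transcript at the cost of more oligonucleotides on the array, which matters for the peak-detection step that needs a contiguous series of signals.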
Second, we also improved the method for labeling of cDNA samples. Usually, double-stranded DNA samples for a tiling array analysis are labeled by using random primers. However, cDNA fragments selected from a randomly fragmented cDNA library seem to be too short for efficient labeling by random priming. Indeed, in a test analysis with a tiling array using the random priming labeling method, we failed to detect any of the previously detected positive controls. Therefore we employed another labeling procedure, in which sense-strand-labeled RNAs were produced by one-step
Third, we developed a detection algorithm for specific signal peaks from raw data. After iterative rounds of IVV screening, the resulting cDNA libraries in the presence and absence of bait protein, called bait (+) and bait (-) library, respectively, are labeled with fluorescence dyes using the above method, and hybridized separately. The ratios of the signal intensities from the experiments in the presence and absence of bait were calculated. Next, we searched for signal peaks in the data using the “windowed threshold detection” algorithm (Fig. 9). This algorithm looks for at least four data points that are above a threshold value within a window. These points were grouped together and presented as a peak. We used the following parameters in the algorithm: peak window size, 300 bp; percent of peak threshold, 20% of maximum data in each mRNA sequence. The value of each peak was the maximum value of the data points in that peak. Only reproducible peaks in the duplicated data were collected as candidates.
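One possible reading of the windowed threshold detection step is sketched below. The parameters (300-bp window, threshold at 20% of the maximum ratio per mRNA, at least four points) come from the text; the greedy left-to-right grouping, the function names, and the overlap test used for reproducibility are assumptions made for illustration and may differ from the actual implementation.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, float]  # (probe position on the mRNA in bp, bait(+)/bait(-) ratio)


def detect_peaks(points: List[Point], window: int = 300,
                 percent: float = 0.20, min_points: int = 4) -> List[Dict[str, float]]:
    """Group above-threshold data points of one mRNA into peaks."""
    if not points:
        return []
    points = sorted(points)
    threshold = percent * max(ratio for _, ratio in points)
    above = [(pos, ratio) for pos, ratio in points if ratio > threshold]
    peaks = []
    i = 0
    while i < len(above):
        start = above[i][0]
        j = i
        # Collect every above-threshold point within one window of the start.
        while j < len(above) and above[j][0] - start <= window:
            j += 1
        group = above[i:j]
        if len(group) >= min_points:
            peaks.append({"start": group[0][0], "end": group[-1][0],
                          "value": max(ratio for _, ratio in group)})
            i = j  # continue after this peak
        else:
            i += 1  # too few points; try the next start position
    return peaks


def reproducible(peaks_a: List[Dict[str, float]],
                 peaks_b: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Keep peaks from replicate A that overlap a peak in replicate B (assumed criterion)."""
    return [a for a in peaks_a
            if any(a["start"] <= b["end"] and b["start"] <= a["end"] for b in peaks_b)]


if __name__ == "__main__":
    # Toy data: probes every 10 bp, background ratio 1.0, one enriched region.
    data = [(pos, 10.0 if 400 <= pos <= 450 else 1.0) for pos in range(0, 1000, 10)]
    print(detect_peaks(data))
```

Because the threshold is relative to the maximum within each mRNA, every transcript with at least four strong neighbouring probes yields a peak, which is why a separate validation step such as real-time PCR is still needed to separate true positives from noise.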
As an actual model study, we performed protein-protein interaction screening for mouse Jun protein, a transcription factor containing a bZIP domain, using the combined IVV and tiling array method. For this study, we constructed a novel custom microarray containing ~1,600 ORF sequences of known and predicted mouse transcription-regulatory factors (334,372 oligonucleotides) [16,87,88] to analyze cDNA fragments from IVV screening for Jun-interactors, and named it the Transcription-Factor Tiling (TFT) array. From the 5th-round DNA library of the IVV screening in the presence and absence of a bait Jun protein, we obtained labeled RNAs and hybridized them onto the TFT array.
Positive signal peaks were collected using the windowed threshold detection algorithm; the total number of peaks was 647 on 545 mRNA sequences (some of the mRNA sequences included multiple peaks). An example is shown in Fig. 10. To distinguish between true positives and false positives, specific enrichment of the selected candidate was validated by real-time PCR. Among the top 10 percent of the peaks (64 regions), specific enrichment of 35 peaks was confirmed in the screening (white bars in Fig. 11A). The data indicate that the appropriate threshold for distinguishing between true positives and noise in the microarray signal is a signal ratio of 3~4. The 35 candidates identified in the present study include all of the 20 Jun-interactors identified in our previous studies using conventional cloning and sequencing [16,88]. Furthermore, the 35 candidates include eight well-known Jun-associated proteins, which is double the number in the previous study, in which four known Jun-interactors were obtained (white bars of Fig. 11B) [16,88,89]. In other words, 15 proteins including four known Jun-interactors were newly detected using the TFT arrays.
Finally, we used
Previous studies and our survey revealed that the cDNA library used in this screening contained 29 known Jun-interactors [89,90]. Of these proteins, four (14%) and eight (28%) were detected by conventional sequencing and by the TFT array method, respectively. Thus, the TFT array method provides a remarkable increase in the number of identified interactors and this confirms the value of our new methodology as a screening tool for protein interactions. While the coverage was increased considerably, the accuracy did not decrease. Specifically, the number of false positives did not increase: the rates of confirmation of proteins by
For quantitative analysis, the abundance ratios of 35 specifically selected candidates in the initial and screened cDNA libraries were determined by real-time PCR, and the enrichment rates (abundance ratio in the 5th round library per that in the initial library) were also calculated [88,89]. The abundance of the 15 newly found candidates (excluding four cases) was less than the theoretical threshold determined from the results of our previous study (an analysis of 451 clones). In order to detect the least abundant candidate (1.3 × 10^-4% of the screened cDNA library) by cloning and sequencing, it would have been necessary to analyze at least 1.0 × 10^6 clones. These results indicate that our new method is more sensitive, higher-throughput and more cost-effective than the previous method.
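The order of magnitude of the clone count quoted above can be checked with a simple sampling estimate. The sketch below is an assumption-laden illustration: the text does not say how the 1.0 × 10^6 figure or the 451-clone threshold was derived, so the "one clone in N" threshold and the binomial sampling model used here are only one plausible reading, and all function names are hypothetical.

```python
import math


def detection_threshold(n_clones: int) -> float:
    """Smallest library fraction expected to yield one clone among n_clones (assumed model)."""
    return 1.0 / n_clones


def clones_for_one_expected_hit(fraction: float) -> int:
    """Clones to sequence so that the expected number of hits reaches one."""
    return math.ceil(1.0 / fraction)


def clones_for_probability(fraction: float, p: float = 0.95) -> int:
    """Clones needed to observe at least one hit with probability p (binomial sampling)."""
    return math.ceil(math.log(1.0 - p) / math.log1p(-fraction))


# Least abundant candidate: 1.3 x 10^-4 % of the screened library.
least_abundant = 1.3e-4 / 100.0

print(f"threshold for 451 clones: {detection_threshold(451) * 100:.2f}% of the library")
print(f"one expected hit: about {clones_for_one_expected_hit(least_abundant):.1e} clones")
print(f"95% chance of a hit: about {clones_for_probability(least_abundant):.1e} clones")
```

Both estimates land in the range of roughly 10^6 clones, in line with the figure given in the text, whereas a 451-clone analysis can only be expected to recover candidates that make up a few tenths of a percent of the library or more.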
From the standpoint of the detection sensitivity, the combined use of the IVV method with tiling arrays provides an extremely sensitive method for protein-interaction analysis, because even a very weakly expressed target could be detected in this study. In the cDNA library before IVV screening, the content of fragments of the selected region of the least abundant known Jun-binder was 1.2 × 10^-7%. If one mRNA molecule existed per cell, the content of a fragment of the gene would be about 1.2 × 10^-5 to 5.9 × 10^-5% (we employed reported parameters for this calculation). Thus, the content of the least abundant mRNA in the initial library corresponds to about one molecule per 20 to 100 cells. This suggests that this gene is expressed at a very low level in a cell type that is a minor component of the tested tissue. It is noteworthy that targets expressed at such low levels can be detected without the need for a cell purification procedure, e.g., collection of somatic stem cells by flow cytometry. The high sensitivity of our method may allow access to targets which would be hard to analyze with other existing tools, such as the TAP method.
In summary, we have applied tiling array technology, which has previously been used for ChIP-chip assays and transcriptome analyses, to protein-interaction analysis with the IVV method. Compared with previous results obtained with cloning and sequencing, the use of the tiling array greatly increased sensitivity. This method can detect targets expressed at extremely low levels. This highly sensitive and reliable method has the potential to be used widely, because the tiling array approach can easily be extended to a genome-wide scale, even though the search space is limited to the tiled sequences.
We have developed an mRNA display technology, named the