Prediction and Experimental Detection of Structural and Functional Motifs in Intrinsically Unfolded Proteins


Introduction
Intrinsically unstructured proteins (IUPs) and proteins with intrinsically unstructured regions (IURs) have quickly attracted interest within the biological community because of their significant presence in the human genome and their potential links to major pathologies such as cancer, neurodegeneration and diabetes (Tompa, 2005; Tompa & Fuxreiter, 2008). The terms IUPs and IURs designate proteins or protein regions intrinsically devoid of a well-defined tertiary structure. The concept was introduced a few years ago in the scientific literature as a brand new idea, representing a family of proteins thought to have been previously ignored or unappreciated (Dunker et al., 2001; Wright & Dyson, 1999). However, as in many other examples in science, the concept of IUPs is far from new. In the 1970s, it was universally accepted that what were known as 'biologically active peptides' had no intrinsic structure, often being too short to have a proper hydrophobic core. Such peptides could, however, fold into a definite conformation upon interaction with a partner or receptor, thus having all the features of modern IUPs (Boesch et al., 1978). Hormones and opioid peptides are two of the best-studied examples. The concept of intrinsic disorder and/or flexibility has now been extended to proteins and has deeply transformed our perception of the importance of protein dynamics, as opposed to the static picture introduced by years of crystallographic studies. Even more important, accepting the existence of IUPs introduces a paradigm in which function can be directly linked to structural disorder rather than to a defined structure.
IUPs have been classified into two broad categories. In the first family, the function of the IUP is achieved through binding to one or several partner molecules in a structurally adaptive process, which enables an exceptional plasticity in cellular responses. These proteins do not form a structure by themselves and are functionally inactive in the absence of a partner, but structure can be induced upon recognition of another molecule. When bound to a substrate, they are able to acquire a structure and become rigid, according to an induced-fit mechanism or to what has recently been generalized in the concept of 'conformational fuzziness' (Wright & Dyson, 2009). Macromolecular association rates have in fact been shown to be greatly enhanced by a relatively non-specific association enabled by flexible recognition segments. Molecular recognition occurring in this way has been described by the 'fly-casting' (Shoemaker et al., 2000) or 'protein fishing' (Evans & Owen, 2002) mechanisms. Examples of proteins belonging to this family are those bearing RNA-binding motifs, which acquire a structure only upon interaction with RNA.
In the second family, IUPs work as entropic chains, exploiting their ability to fluctuate over ensembles of structural states with similar conformational energies. In this way they either generate force against structural changes or influence the orientation/localization of attached domains (Dunker et al., 2002). Accordingly, they are active in their unstructured form and play the role of flexible linkers that allow other portions of the protein to move like 'a dog on a leash' and ultimately to interact with other partners. A classical example is the IUR of the muscle protein titin called PEVK (Labeit & Kolmerer, 1995). This region confers some of the passive elastic properties of the titin filament, providing the stiffness required in muscle contraction to keep the sarcomere in register (Greaser et al., 2000).
These unique features are exploited in many biological processes, which explains the multiplicity of functions in which IUPs are involved (Dunker et al., 2002). Protein disorder prevails, for instance, in signaling, regulatory and cancer-associated proteins. The functional importance of protein disorder is also underlined by its dominant presence in proteins associated with signal transduction, cell-cycle regulation, gene expression and chaperone activities (Dunker et al., 2002; Iakoucheva et al., 2002; Tompa & Csermely, 2004; Uversky, 2002; Ward et al., 2004b). Because of their susceptibility to degradation, IUPs have also been linked to the ubiquitin (Ub)/proteasome pathway (Csizmok et al., 2007).
In this chapter, we discuss the problems related to the prediction, production and characterization of IUPs/IURs. We take as a representative case study ataxin-1, a human protein of biological and clinical interest that is related to the neurodegenerative disease spinocerebellar ataxia type 1. Using ataxin-1, we retrace how the application of different bioinformatics tools has helped to shed light on the structure and functions of the protein since its first identification. We also provide an update on the physico-chemical methods used to translate the sequence information into structural and functional models of the protein and its interactome. This hands-on example may provide a valuable paradigm of how correct identification of linear motifs and IURs can provide the key to understanding protein function.

Prediction methods of IUPs
While it has become increasingly easy to appreciate the importance of IUPs/IURs, their prediction and experimental characterization remain somewhat problematic. Since Romero et al. (1997) first indicated that the lack of a defined protein tertiary structure is predictable from the primary sequence alone, several different prediction methods have been developed (reviewed in Radivojac et al., 2007) (Table 1). They are based on different definitions of IURs and detect different indicators such as hydrophobicity, sequence composition and secondary structure content. These programs, however, are not always entirely reliable. Most weaknesses arise from the intrinsic difference between the conceptual and the operational definition of IUP/IUR. As mentioned above, conceptually there are two classes of IUP/IUR. The first does not form a structure by itself, but this can be induced by a partner. Proteins belonging to the second class can perform their function in three ways: (1) through the newly acquired structure, (2) by inducing structural changes in their partners and modulating the function of these (Ferron et al., 2006).
The operational definition used by bioinformatic software is typically based on the likelihood of the protein/peptide forming a structure under certain (often poorly defined) cellular environments, with little or no information on how the protein/peptide functions or what partner the IUP/IUR may have. Intrinsically unstructured proteins are, for instance, characterized by a low content of bulky hydrophobic amino acids and a high percentage of polar and charged amino acids. As a consequence, they do not contain enough hydrophobic residues to build the hydrophobic core that is typical of stably folded globular proteins. Another symptomatic indication of an IUP is the presence of low-complexity motifs, i.e. sequences with over-representation of just a few residues (e.g. polyglutamine stretches, arginine-glycine (RG) repeats observed in RNA-binding proteins or arginine-serine (RS) repeats observed in splicing factors). Of course, while low-complexity sequences are a strong indication of disorder, the converse is not necessarily true: not all disordered proteins have low-complexity sequences.
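The two sequence indicators described above (amino acid composition and low complexity) can be illustrated with a minimal Python sketch. The residue classes, the window size and the entropy cutoff are illustrative assumptions and are not taken from any published predictor; the function names and the toy sequence are hypothetical.

```python
import math
from collections import Counter

# Illustrative residue classes (an assumption, not a published scale).
BULKY_HYDROPHOBIC = set("ILVFWYM")
POLAR_OR_CHARGED = set("DEKRHQNST")


def composition(seq):
    """Return (fraction bulky hydrophobic, fraction polar or charged)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    hydrophobic = sum(r in BULKY_HYDROPHOBIC for r in seq) / n
    polar = sum(r in POLAR_OR_CHARGED for r in seq) / n
    return hydrophobic, polar


def shannon_entropy(window):
    """Shannon entropy (bits) of the residue distribution in a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq, size=12, max_entropy=1.5):
    """0-based start positions of windows whose entropy is at or below the cutoff."""
    return [i for i in range(len(seq) - size + 1)
            if shannon_entropy(seq[i:i + size]) <= max_entropy]


# Hypothetical toy sequence: a short hydrophobic stretch, a polyQ run, RG repeats.
toy = "MKVLAAGIVF" + "Q" * 15 + "RGRGRGRG"
print(composition(toy))
print(low_complexity_windows(toy))
```

A homopolymeric window has zero entropy and a strict two-residue repeat has an entropy of one bit, so both fall below the assumed cutoff. A window with a diverse composition would not be flagged, whether or not it is disordered, in line with the caveat above.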
We can expect that, as these predictions become more robust, we shall be better able to distinguish between the different IUPs/IURs and obtain functional clues. For the time being, this software should be handled with care. For instance, it is important to use more than one approach on the same sequence and to compare the resulting scores.
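As a sketch of how scores from several approaches might be compared on the same sequence, the following Python fragment makes a per-residue majority call. It assumes that each predictor returns one score per residue and that a predictor-specific cutoff separates order from disorder; the predictor names, scores and cutoffs are invented for illustration and do not correspond to the output of any real program.

```python
def consensus_disorder(scores_by_predictor, cutoffs, min_votes=2):
    """Per-residue consensus: True where at least min_votes predictors
    score the residue at or above their own cutoff."""
    lengths = {len(scores) for scores in scores_by_predictor.values()}
    if len(lengths) != 1:
        raise ValueError("all predictors must score the same sequence")
    n = lengths.pop()
    calls = []
    for i in range(n):
        votes = sum(scores[i] >= cutoffs[name]
                    for name, scores in scores_by_predictor.items())
        calls.append(votes >= min_votes)
    return calls


# Hypothetical scores for a three-residue fragment from three predictors.
scores = {
    "predictor_a": [0.9, 0.2, 0.6],
    "predictor_b": [0.8, 0.1, 0.4],
    "predictor_c": [0.7, 0.6, 0.3],
}
cutoffs = {"predictor_a": 0.5, "predictor_b": 0.5, "predictor_c": 0.5}
print(consensus_disorder(scores, cutoffs))
```

Residues on which the predictors disagree are the ones that deserve closer inspection, so in practice it may be more informative to keep the vote counts than the binary call.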

Detection of linear functional motifs
It is difficult to overestimate the importance of the concept of structural motifs or modules in the world of globular proteins (Konagurthu & Lesk, 2010; Schaeffer & Daggett, 2010). A similar role is now played by linear motifs in the world of IUPs. Eukaryotic linear motifs (ELMs) are short stretches of eukaryotic protein sequence (typically 3-10 amino acids long) with which a molecular function is associated. These segments provide regulatory functions independently of protein tertiary structure. While also potentially present in stably folded proteins, they acquire a particular importance in IUPs, where they cannot be shielded by structure. It is in fact found experimentally that short functional sites, which are frequently involved in regulatory processes, must reside in non-structured or non-globular regions or, when within globular domains, in flexible, highly exposed loops (Gibson, 2009).
Examples of linear motifs are phosphorylation sites, nuclear localization signals or signalling sequences. A useful tool for the prediction of linear motifs is the ELM database (http://elm.eu.org/), developed through a collaborative effort between EMBL and the University of Rome (Diella et al., 2008; Puntervoll et al., 2003). A sequence of interest can be screened against this database to quickly suggest the position of ELMs.
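Screening a sequence for linear motifs amounts, in its simplest form, to pattern matching. The Python sketch below scans a sequence with a regular expression derived from the 'RxxSxP' conservation of the 14-3-3 binding motif quoted later in this chapter. The pattern is a simplification written for illustration and is not the expression stored in the ELM database; the motif name and the toy sequence are hypothetical.

```python
import re

# Simplified pattern (assumption): 'x' in RxxSxP is taken as any residue.
MOTIFS = {"14-3-3_RxxSxP": "R..S.P"}


def scan_motifs(seq, motifs=MOTIFS):
    """Return (motif name, start, end, matched text) with 1-based inclusive positions."""
    hits = []
    for name, pattern in motifs.items():
        # A lookahead with a capture group also reports overlapping matches.
        for m in re.finditer(f"(?=({pattern}))", seq):
            start = m.start() + 1
            hits.append((name, start, start + len(m.group(1)) - 1, m.group(1)))
    return hits


toy = "MKRAASAPGG"  # hypothetical sequence containing one match
print(scan_motifs(toy))
```

Patterns this short match many sequences by chance, which is why hits are more credible when they fall in predicted disordered or exposed regions, as discussed above.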

When can we be sure that a protein is an IUP?
Many of us have been confronted with natural or recombinant proteins that, after purification, turn out to be unfolded even when they are expressed in a soluble form. Are these bona fide IUPs? It is sometimes difficult to distinguish between IUPs and orphan complex proteins, yet there is a profound difference between the two families. IUPs have no intrinsic capability to adopt a definite structure, at least in the absence of a partner. Their conformational state is also independent of the way they have been purified. In contrast, orphan complex proteins are not intrinsically devoid of a 3D structure: they are able to fold, but their tertiary fold is only marginally stable (Sjekloća et al., 2011). This is often due to the absence of a partner that stabilizes their fold. These proteins may be produced as unfolded proteins, as mixed folded/unfolded populations or as species highly prone to aggregation. Their misbehaviour can in principle be reduced or neutralized by finding more suitable conditions or less drastic purification protocols (e.g. concentration is often a problematic step). The only guidance for distinguishing between the two families may be sequence analysis coupled with comparison with homologues. If the motif is observed in other proteins where it adopts a stable fold, it is difficult to believe that it is intrinsically unfolded. In such a case, we suggest taking more care in protein production and attempting to identify the partners able to provide stabilization. One way to resolve the ambiguity when the protein of interest is purified as an unfolded monomer is to record an NMR spectrum directly on the cell lysate, if sufficient overexpression can be achieved. This circumvents the purification step and 'shows' the structural properties of the protein independently of human intervention.

Experimental methods to study IUPs
How can we study the structure of intrinsically unstructured molecules? The question sounds like an oxymoron. The problem closely resembles the difficulties encountered in the study of denatured states of ordered proteins. It goes without saying that IUPs cannot be crystallized in isolation because of their flexibility. Solution studies are therefore more appropriate for characterizing their structural states. The ultimate goal of these studies cannot, however, be to obtain a single structure, but rather to describe the protein as an ensemble of rapidly interconverting alternative structures characterized by differing backbone torsion angles.
Among the biophysical methods able to provide structural and dynamic information, two techniques have probably given the most interesting results over the last few years: nuclear magnetic resonance (NMR) and small-angle X-ray scattering (SAXS). Despite the disadvantage of very poor chemical shift dispersion (particularly for proton and aliphatic carbon resonances) caused by conformational averaging, NMR is particularly powerful thanks to its ability to measure several independent observables. These include secondary chemical shifts, residual dipolar couplings (RDCs), hydrogen bonds, torsion angles and long-range NOEs upon spin labelling (for an exhaustive review see Mittag & Forman-Kay, 2007). Among them, detection of residual secondary chemical shifts is probably the simplest qualitative way to detect complete or partial local disorder of a chain. Most of the other NMR observables can be exploited in a more quantitative way: since the early implementation of ensemble-averaged NOE distance restraints (Bonvin & Brunger, 1996), a variety of restraining algorithms, including simultaneous time and ensemble averaging (Fennen et al., 1995), have been developed and used to describe native, transition, intermediate and unfolded states (Clore & Schwieters, 2004a, 2004b; Kuszewski et al., 1996; Vendruscolo & Paci, 2003; Vendruscolo & Dobson, 2005). A modern approach that takes protein flexibility into account is to impose penalizing forces if the distances calculated at a given time as averages across an ensemble of simulated molecules (the 'replica ensemble') do not match the experimental ones. It is interesting to note that RDCs provide a particularly powerful way to assess protein structures using an absolute reference system, despite the original scepticism towards applying these measurements to IUPs and IURs, which pose fundamental problems in the way structural averaging should be treated. To address the problem, new methods based on RDC measurements have been developed to provide detailed information on protein dynamics also in cases of conformational averaging, for instance based on analytical deconvolution, Gaussian axial fluctuation methods and restrained molecular dynamics simulations. Most of these methods have, however, been used to assess dynamics of relatively small amplitude. Their application to flexible and conformationally interconverting molecules such as IUPs and IURs remains to be fully established.
SAXS, which makes it possible to measure molecular dimensions and to describe the overall shape of the ensemble, is in many ways complementary to NMR.
A key advance in the quantitative characterization of flexible proteins in solution by SAXS was achieved with the implementation of the ensemble optimization method (EOM) (Bernado et al., 2007). In this approach, flexibility is taken into account by postulating the coexistence of different conformational states of the protein that contribute to the experimental scattering pattern. The conformers are selected by genetic algorithms from a pool containing a large number of randomly generated models covering the protein configurational space. The EOM-selected models are then analysed by quantitative statistical criteria, which were also developed to determine the optimal number of conformers necessary to represent the ensemble. When possible, the quality of the analysis is increased by simultaneous fitting of multiple scattering patterns from deletion mutants, a procedure somewhat equivalent to improving the quality of a sequence alignment by introducing information about sequence homologues. The EOM protocol has now been validated on several examples of completely or partially unfolded proteins and on multidomain proteins interconnected by flexible linkers, and has proved to be a robust and helpful approach to the study of IUPs.
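The idea of selecting a sub-ensemble whose averaged profile reproduces the experimental curve can be sketched in a few lines of Python. This is not the EOM implementation: the genetic algorithm is replaced here by a simple random-swap search to keep the example short, the scattering profiles are assumed to be precomputed lists of intensities on a common grid, and all names and numbers are hypothetical.

```python
import random


def ensemble_average(profiles, members):
    """Point-by-point average of the profiles of the selected conformers."""
    n_points = len(profiles[0])
    return [sum(profiles[m][j] for m in members) / len(members)
            for j in range(n_points)]


def chi2(model, experiment, sigma):
    """Reduced chi-square-like discrepancy between model and experiment."""
    return sum(((m - e) / s) ** 2
               for m, e, s in zip(model, experiment, sigma)) / len(experiment)


def select_ensemble(profiles, experiment, sigma, size=3, steps=2000, seed=0):
    """Random-swap search (a stand-in for a genetic algorithm): replace one
    member at a time and keep the swap only if the fit improves.
    The same conformer may end up selected more than once."""
    rng = random.Random(seed)
    members = rng.sample(range(len(profiles)), size)
    best = chi2(ensemble_average(profiles, members), experiment, sigma)
    for _ in range(steps):
        trial = list(members)
        trial[rng.randrange(size)] = rng.randrange(len(profiles))
        score = chi2(ensemble_average(profiles, trial), experiment, sigma)
        if score < best:
            members, best = trial, score
    return sorted(members), best


# Hypothetical pool of five two-point "profiles" and a toy target curve.
pool = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
print(select_ensemble(pool, [2.0, 3.0], [1.0, 1.0], size=2))
```

A greedy search of this kind can stall in local minima on realistic data, which is one reason why population-based optimizers such as genetic algorithms are used in practice.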

Ataxin-1 as a case study
To discuss the problems related to the prediction and characterization of IURs, we chose as a representative case study ataxin-1, a human protein of biological and clinical interest that is related to the neurodegenerative disease spinocerebellar ataxia type 1 (SCA1). This hereditary pathology is dominant and belongs to a small family of neurodegenerative diseases linked to protein aggregation and misfolding, all caused by anomalous expansion of polyglutamine (polyQ) tracts in the gene coding region (Orr & Zoghbi, 2007) (Figure 1). Other members of the polyQ pathology family are Huntington's chorea, Machado-Joseph disease and other spinocerebellar ataxias. Elongation of polyQ tracts is the result of a gene polymorphism in which unstable consecutive CAG triplets may become expanded during DNA replication. The mutated proteins form intracellular aggregates, which are thought to be toxic for the cell and to lead to cell death through a gain-of-function rather than a loss-of-function mechanism. While it is generally accepted that polyQ expansion is the leading factor in triggering pathology, it has become increasingly clear that protein context is an important element that modulates protein behaviour and pathology. Despite best efforts, all the polyQ pathologies are currently incurable. One route to developing a specific treatment is the identification of the function(s) of the native proteins. In this endeavour, bioinformatics analysis of the protein sequence becomes essential for predicting structure and a great help in suggesting a function for these proteins. The sequences of the polyQ disease family members differ widely in length and in the position of the polyQ tract. They are also not homologous and share only a rather loose common feature: they all seem to contain large IURs, which are sometimes interrupted by readily identifiable globular domains. As such, they constitute an excellent example for discussing the problems inherent in detecting and studying IUPs/IURs.
In the following sections we present a detailed analysis of the ataxin-1 primary sequence and show how this study suggested working hypotheses that could then be tested experimentally using structural and cellular approaches. We discuss the identification of two major potential IURs in ataxin-1, which contain short linear motifs important for phosphorylation, aggregation and protein-protein interactions that are directly related to pathogenesis (Chen et al., 2003; de Chiara et al., 2009; Emamian et al., 2003; Jorgensen et al., 2009; Klement et al., 1998). We pay particular attention to the prediction and characterization of the structural and functional features of the different protein regions and to the identification of ELMs of crucial importance for both the normal and the anomalous behaviour of the protein.

The polyQ tract
The N-terminus of the human protein is characterized by a highly polymorphic, almost uninterrupted polyQ stretch, which ranges from 4 to ~39 Qs in the normal population and is expanded to ~40-83 Qs in SCA1-affected individuals (Zoghbi & Orr, 2008). Pathology typically develops when the repeat length exceeds a threshold of 35-45 glutamines (Genis et al., 1995; Jayaraman et al., 2009; Orr et al., 1993). Expansion of this region is not shared with other species, suggesting an evolutionary gain associated only with humans. The polyQ tract of ataxin-1 starts at residue 197. The structure of polyQ stretches in solution has been shown experimentally, both by CD and by NMR spectroscopy, to be a random coil when in a non-aggregated form (Masino et al., 2002). This is at variance with predictions by SMART, which proposes a helical coiled-coil for the same region (amino acids 193-230). The discrepancy should, however, be ascribed to a bias in SMART for poly-amino acids. Interestingly, we now know that expansion of the polyQ tract in ataxin-1 is a necessary but not sufficient condition for triggering disease: two other elements, a nuclear localization signal (NLS) and phosphorylation of S776, both located at the C-terminus and discussed in a section below, have also been proved to be required.
As for the other polyQ proteins, the native function of the polyQ tracts is unknown, although their presence has mostly been detected in proteins associated with transcriptional regulation activity. Indeed, the transcriptional regulator poly-Q binding protein-1 (PQBP1) has been found to bind ataxin-1 in a polyQ length-dependent manner, suggesting that PQBP1 and mutant ataxin-1 may act cooperatively to repress transcription and induce cell death (Okazawa et al., 2002). Direct evidence to support this hypothesis is now necessary.
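Locating polyQ tracts in a sequence and measuring their length is straightforward to automate. The Python sketch below reports every uninterrupted run of glutamines above a minimum length; the minimum length is an arbitrary assumption and the toy sequence is hypothetical, not the ataxin-1 sequence. Since the tract is described above as only 'almost uninterrupted', a realistic tool would also need a rule for merging runs separated by short interruptions, which is not attempted here.

```python
import re


def polyq_runs(seq, min_len=4):
    """Return (start, end, length) for each run of at least min_len
    consecutive glutamines, with 1-based inclusive positions."""
    pattern = r"Q{%d,}" % min_len
    return [(m.start() + 1, m.end(), m.end() - m.start())
            for m in re.finditer(pattern, seq.upper())]


# Hypothetical toy sequence: a 10-Q run, a short interruption, a 5-Q run.
toy = "MA" + "Q" * 10 + "HQH" + "Q" * 5 + "LP"
print(polyq_runs(toy))
```

The reported lengths can then be compared with the repeat ranges quoted above for normal and SCA1-affected individuals.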

Prediction of significant ELMs in the N-terminal IUR
Although ataxin-1 has been reported to shuttle between the nucleus and the cytoplasm, there is a large body of evidence showing that the protein is predominantly located in the nucleus. Restricting the search for candidate short linear motifs and post-translational modifications within the N-terminal IUR to those significant for nuclear localization, several ELMs have been predicted using the ELM resource (Diella et al., 2008; Puntervoll et al., 2003) (Table 2).
A plethora of potential phosphorylation sites are predicted by the ELM resource in the N-terminal IUR of ataxin-1, among which only Ser239 (Vierra-Green et al., 2005) and Ser254 have been experimentally verified (Dephoure et al., 2008). Putative phosphorylation-dependent protein-protein interaction motifs are predicted in the N-terminus of ataxin-1. Among these are several forkhead-associated (FHA) domain type-1 and -2 and adaptor protein 14-3-3 ligand motifs, suggesting that the N-terminus of ataxin-1 may play a significant role in the assembly of the protein interactome.

Table 2. (Diella et al., 2008; Puntervoll et al., 2003). Among all the predicted phosphorylation sites, only those that have been confirmed in vivo and support the prediction of other phosphopeptide motifs have been included. a Phosphorylation site and kinase experimentally confirmed in cerebellum (Jorgensen et al., 2009). b Phosphorylation site confirmed, kinase not identified (Dephoure et al., 2008; Vierra-Green et al., 2005).

Self association region (SAR)
At the junction between the N-terminal IUR and the AXH domain, a region of ca. 100 aa (495-605) was identified in a yeast two-hybrid system as responsible for protein self-association in the cell (SAR) (Burright et al., 1997). The SAR shares 39 aa with the N-terminus of the AXH domain (567-689), which has been shown to be dimeric in solution (Chen et al., 2004; de Chiara et al., 2003). Therefore, this region seems to account for dimerization of full-length ataxin-1. Interestingly, according to PONDR (VL3 predictor) and IUPRED, both based on the analysis of the local amino acid composition, the full region ~440-700, which includes the SAR and the AXH domain, is predicted to be a potentially folded domain (Figure 3).
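The overlap between annotated regions is easy to verify from their boundaries. The following minimal Python sketch, using a hypothetical helper function, computes the number of residues shared by two inclusive residue ranges and reproduces the 39 aa quoted above for the SAR and the AXH domain.

```python
def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) ranges."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


SAR = (495, 605)  # self-association region, boundaries as given in the text
AXH = (567, 689)  # AXH domain, boundaries as given in the text
print(overlap(SAR, AXH))  # 605 - 567 + 1 = 39
```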

The structure of the ataxin-1 AXH domain
Soon after gene identification (Banfi et al., 1994), analysis of the ataxin-1 sequence and secondary structure prediction based on a multiple alignment of the protein from different species led to the discovery of a new small, putative, independently folded domain (ca. 130 aa) with predicted predominantly beta structure (de Chiara et al., 2003). The domain, subsequently named AXH (for Ataxin-1 Homology domain), did not show detectable homology with any other known folding unit (SMART accession number SM00536; http://smart.embl-heidelberg.de/) (Letunic et al., 2009; Schultz et al., 1998). A few years later, homology between the ataxin-1 AXH and a region of an unrelated protein, the transcription factor HBP1, was detected (Mushegian et al., 1997). The two proteins share ~28% identity and ~54% similarity, with the HBP1 AXH domain showing an insertion loop of ca. 10 amino acids between secondary structure elements with respect to the ataxin-1 domain (de Chiara et al., 2003) (Figure 4). The structure of the ataxin-1 AXH, as solved by X-ray crystallography, consists of a non-canonical oligonucleotide- and oligosaccharide-binding (OB) fold (Chen et al., 2004; Murzin, 1993) (Figure 5). The AXH appears as a constitutive asymmetric dimer, which crystallizes as an asymmetric dimer of dimers. Each monomer displays a common structure in the C-terminal part (residues 610-685), recognizable as the OB-fold, which superposes with an average root mean square deviation of 0.90 ± 0.06 Å on the backbone atoms. Conversely, approximately the first 30 N-terminal amino acids show appreciable main chain differences between the two monomers in the dimer, with the same stretch of amino acids adopting alternative secondary structures in the two cases. In this respect, the AXH domain represents an interesting example of a chameleon protein, that is, a protein that adopts different folds in different environments. Interestingly, the structural differences observed in ataxin-1 are not induced by different experimental conditions or by the presence of ligands. Instead, they are present in the context of the same protein.
When comparing the structure of the HBP1 domain, which is monomeric (de Chiara et al., 2003; de Chiara et al., 2005a), with any of the ataxin-1 X-ray monomers, only the C-terminal part, representing the core of the OB-fold, is structurally superposable, while the N-terminus adopts a different topology despite maintaining the same secondary structure elements along the sequence (Figure 6). Only a structure-based comparison allowed us to realign the sequences and to correctly reposition the HBP1 long-loop insertion, originally set between beta-3 and beta-4, between helix-1 and beta-3 (Figure 7). These findings support the possibility that the AXH motif is intrinsically able to adopt different topologies.

(Chen et al., 2004; de Chiara et al., 2005a). SS_ATX1 and SS_HBP1: experimental secondary structure of ataxin-1 and HBP1.

www.intechopen.com Selected Works in Bioinformatics

The function of the ataxin-1 AXH domain
Further studies on the role of the OB-fold of the AXH domain have shown that this region is designed to mediate interactions with both nucleic acids and proteins. The crystal structure of the ataxin-1 AXH allowed us to rationalize previous reports on the ability of the AXH to bind RNA homopolymers in vitro with the same nucleotide preference as full-length ataxin-1 (de Chiara et al., 2003; Yue et al., 2001). In addition to binding RNA directly through the AXH domain, the protein was found to co-localize with RNA also when the domain was deleted, suggesting an involvement of other RNA-binding proteins in the ataxin-1 interactome (de Chiara et al., 2005b). Recent findings on the ability of ataxin-1 to interact with splicing factors through a short motif located C-terminally to the AXH domain opened the intriguing possibility that the protein may be involved in pre-mRNA processing at the level of the splicing machinery (de Chiara et al., 2009; Lim et al., 2008). However, no RNA targets have yet been identified, and more research is needed to address the question of whether the protein plays a role in RNA metabolism and/or nuclear RNA export, as also suggested by co-localization with the mRNA export factor TAP/NXF1 (Irwin et al., 2005).
As for the ability of the AXH domain to mediate protein-protein interactions, several binding partners with transcriptional activity have been identified whose interaction with ataxin-1 is abolished when the AXH domain is deleted: the silencing mediator of retinoid and thyroid hormone receptors SMRT/SMRTER (Tsai et al., 2004), the repressor Capicua (Lam et al., 2006), and the transcription factors Senseless/Gfi-1 (Tsuda et al., 2005) and Sp1 (Goold et al., 2007). A potential role for ataxin-1 in transcriptional regulation was suggested at a very early stage by the homology with HBP1 (Sampson et al., 2001; Tevosian et al., 1997). A general read-out assay for repression of transcription (de Chiara et al., 2005b) confirmed that the AXH domain represses transcription when tethered to DNA, similarly to what was observed for full-length ataxin-1 (Tsai et al., 2004). However, cross-linking experiments showed that there is no direct binding between DNA and the AXH domain, indicating that the interaction is mediated by other co-transcriptional regulators (de Chiara et al., 2005b), as later confirmed experimentally (Bolger et al., 2007; Goold et al., 2007; Lam et al., 2006; Serra et al., 2006; Tsuda et al., 2005).

The C-terminal IUR
The region downstream of the AXH domain up to the C-terminal end of the protein (amino acids 690-816) is an example of possible conflict between the results of different predictors. Disopred2, PONDR, IUPRED and DisEMBL predict the C-terminus to be an almost completely disordered region (Figure 3). According to GlobPlot (http://globplot.embl.de), which is based on the Russell/Linding scale (Linding et al., 2003b), the same region is instead predicted to be a potential globular domain. There is also no agreement with the prediction from the SMART server, which identifies the AXH as the only folded region in the protein (Figure 2). While still awaiting systematic experimental validation, we can already comment on these results in the light of our findings.

Prediction of ELMs in the C-terminal IUR: A three-way molecular switch in ataxin-1 C-terminus
Consistent with the presence of disorder, several linear motifs were predicted in the protein C-terminus. Among these, three overlapping linear motifs identified downstream of the AXH were experimentally verified: a nuclear localization signal (NLS) (771-774) (Klement et al., 1998), a 14-3-3 binding motif (774-778; key conservation RxxSxP) (Chen et al., 2003) and a UHM ligand motif (ULM) (771-776) (de Chiara et al., 2009), present in proteins associated with splicing. These motifs represent a three-way molecular switch, which plays an important role both in the function of the native protein and in pathogenesis. In addition to the expansion of the polyQ tract, nuclear localization is a strict requirement for the development of the pathology. Expanded ataxin-1 with a mutated NLS fails to enter the nucleus and does not cause aggregation, the typical phenotypic hallmark of the SCA1 pathology (Klement et al., 1998). Further to polyQ expansion and nuclear localization, phosphorylation of S776 has been identified as a necessary condition for the development of SCA1 (Emamian et al., 2003). Phosphorylation of S776 has been confirmed to occur in vivo (Emamian et al., 2003; Jorgensen et al., 2009) and is required for recognition of ataxin-1 by the protein 14-3-3, a molecular adaptor that modulates, in a phosphorylation-dependent manner, the function of different proteins in their specific context (Chen et al., 2003). Mutation of S776 to alanine in expanded ataxin-1, despite not affecting nuclear localization, prevented the development of the SCA1 phenotype (Emamian et al., 2003).
Recently, a UHM ligand motif (ULM) predicted by the ELM server in the C-terminus of the ataxin-1 sequence has been experimentally validated and characterized (de Chiara et al., 2009). The ULM was first identified in the splicing factors SF1 and SAP155 and shown to bind the UHM domain of U2AF65 (Corsini et al., 2007). The ataxin-1 ULM (771-776) strongly overlaps with the 14-3-3 ligand motif (774-778). However, whilst phosphorylation of S776 is crucial for recognition by 14-3-3, it only marginally affects the interaction with U2AF65, increasing the dissociation constant only about threefold. Because the Kd of the S776-phosphorylated ataxin-1 ULM for 14-3-3 (ζ isoform) is two orders of magnitude smaller than that for U2AF65 (0.4 µM versus 36 µM), it was possible to conclude that, when S776 is phosphorylated, 14-3-3 is able to displace U2AF65. Under these conditions, the 14-3-3-bound expanded ataxin-1 is prone to aggregation. The S776A mutation, which hampers the interaction with 14-3-3, still allows the interaction with the UHM domain of U2AF65 and potentially with other splicing factors. These interactions, probably because of the large size of the spliceosome complex, may play a protective role and prevent aggregation. Our findings allowed us to conclude that phosphorylation of S776 provides the switch that regulates binding of ataxin-1 to the protein 14-3-3 and to components of the spliceosome, and suggest that pathology develops when aggregation competes with native interactions (Figure 8). This example also shows how investigation of the native function of the polyQ proteins has provided valuable hints for understanding the molecular mechanisms of pathogenesis.
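The displacement argument can be made semi-quantitative with a simple competitive binding model. The Python sketch below assumes a single binding site on the phosphorylated motif, two ligands competing for it, and free ligand concentrations approximately equal to the total ones (ligand in excess). The dissociation constants are the two values quoted above, taken here in micromolar units (an assumption), while the ligand concentrations are arbitrary illustrative numbers and not measured cellular values.

```python
def site_occupancy(conc, kd):
    """Fractional occupancy of one site by each competing ligand.

    conc, kd: dicts mapping ligand name to free concentration and to
    dissociation constant, in the same units. For ligand i the occupancy
    is (c_i / Kd_i) / (1 + sum_j c_j / Kd_j).
    """
    weights = {name: conc[name] / kd[name] for name in conc}
    denom = 1.0 + sum(weights.values())
    return {name: w / denom for name, w in weights.items()}


kd = {"14-3-3": 0.4, "U2AF65": 36.0}     # values quoted in the text, assumed µM
conc = {"14-3-3": 10.0, "U2AF65": 10.0}  # hypothetical equal concentrations
occupancy = site_occupancy(conc, kd)
print(occupancy)
print(occupancy["14-3-3"] / occupancy["U2AF65"])
```

At equal concentrations the ratio of the two occupancies equals the ratio of the dissociation constants (36/0.4 = 90), which is the sense in which 14-3-3 outcompetes U2AF65 for the phosphorylated motif under this simplified model.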

Conclusions
We have discussed here the concept of IUPs and IURs and shown how their detailed bioinformatics analysis can assist structural and functional studies, using ataxin-1 as a paradigmatic example. Ataxin-1, like other members of the polyQ pathology family, is mostly composed of IURs. Very little is still known about this protein despite its involvement in human neurodegeneration, yet this knowledge is essential for designing specific therapeutic interventions. Identification of both the structured region (the AXH domain) and the unstructured linear functional motifs has played a key role in advancing our knowledge of ataxin-1 function in the cell. More advanced information will undoubtedly come from experimental analysis of long stretches of the protein, if the formidable challenges of their recombinant production in a pure and stable form can be overcome. The example reported here also shows how new approaches to the identification and study of IUPs could help advance the field.

Fig. 2. Architecture of non-expanded and expanded ataxin-1. The polyQ tract is shown in magenta and the AXH domain in cyan. The positions of the nuclear localization signal (NLS) and of a phosphorylation site important for protein interactions and for pathology are also indicated (de Chiara et al., 2009).

Fig. 4. Sequence alignment of the AXH domain from ataxin-1 (ATX1), the ataxin-1 paralogue BOAT (Brother Of ATaxin-1) and HBP1 from different species. The alignment was prepared with ClustalX (version 2) (Larkin et al., 2007) and is based exclusively on sequence similarity. The secondary structure of the ataxin-1 AXH domain as predicted by Jpred 3 (Cole et al., 2008) is shown at the top for reference.

Fig. 5. X-ray structure of the AXH domain of ataxin-1 (PDB entry 1OA8) (Chen et al., 2004). The monomers forming the dimer of dimers observed in the structure are shown alternately in dark and light blue. Detailed analysis shows that they are not related by symmetry. An even bigger surprise was revealed by structure determination of the AXH domain of HBP1. Whilst on the pure basis of sequence homology it would have been reasonable to

Fig. 6. Comparison between the structures of the AXH domains of ataxin-1 (monomer A) (left) (PDB entry 1OA8) (Chen et al., 2004) and HBP1 (right) (PDB entry 1V06) (de Chiara et al., 2005a). The N-terminal regions of the two monomers show the same elements of secondary structure arranged in a different topology.

Fig. 8. A model of the role played by phosphorylation of S776 in the modulation of expanded ataxin-1 aggregation.

Table 1. List of predictors of protein disorder used for ataxin-1 in this study. The table, which illustrates the features of the different methods, is adapted from Ferron et al.
partners, or (3) through formation of protein complexes with partners. This class of IUP/IUR does not function in its unstructured form, i.e., the unstructured form is inactive. The second class performs its cellular function without forming a structure: the unstructured form is the functional one.
Phosphorylation of these two serine residues supports the prediction by ELM of two candidate Class IV WW domain interaction motifs present in the regions 236-241 and 251-256, which mediate phosphorylation-dependent interactions. In addition to the WW domain motifs, other [...] the region 703-786 is [...]