Contributions of Structure Comparison Methods to the Protein Structure Prediction Field

David Piedra; Marco d'Abramo; Xavier de la Cruz

doi:10.5772/21052

Author Information

David Piedra*
- IBMB-CSIC, Spain
Marco d'Abramo
- CNIO, Spain
Xavier de la Cruz
- IBMB-CSIC, Spain
- ICREA, Spain

*Address all correspondence to:

1. Introduction

Since their development, structure comparison methods have advanced our understanding of protein structure and evolution (Greene et al, 2007; Hasegawa & Holm, 2009), supported the development of structural genomics projects (Pearl et al, 2005), and improved protein function annotation (D. A. Lee et al, 2010), thus becoming an essential tool in structural bioinformatics. In recent years, their application range has grown to include the protein structure prediction field, where they are used, among other things, to evaluate overall prediction quality (Jauch et al, 2007; Venclovas et al, 2001; Vincent et al, 2005; G. Wang et al, 2005) and to identify a protein’s fold from low-resolution models (Bonneau et al, 2002; de la Cruz et al, 2002). In this chapter, after briefly reviewing some of these applications, we show how structure comparison methods can also be used for local quality assessment of low-resolution models and how this information can help refine and improve them.

Quality assessment is becoming an important research topic in structural bioinformatics for two reasons. First, model quality determines the applicability of structure predictions (Cozzetto et al, 2007). Second, prediction technology, from template-based (comparative modeling and threading) to de novo methods, is now easily available, so its potential end-users are no longer only specialized structural bioinformaticians. Quality assessment methods have been routinely used for many years in structural biology to evaluate experimental models. These methods focus on several features of the protein structure (see Laskowski et al, 1998, and Kleywegt, 2000, and references therein). Because a number of quality issues are common to both experimental and predicted models, the use of these methods has been naturally extended to the evaluation of structure predictions. For example, in homology modeling, a widely used structure prediction technique, evaluation of models with PROCHECK (Laskowski et al, 1993), WHAT-CHECK (Hooft et al, 1997), PROSA (Sippl, 1993), and others (see Marti-Renom et al, 2000, and references therein) is part of the standard prediction protocol. WHATIF (Vriend, 1990) and PROSA (Sippl, 1993) have also been used in the CASP experiment to assess comparative models (Venclovas, 2001; Williams et al, 2001).

Some quality assessment problems are unique to the structure prediction field, given the specific characteristics of computational models, and have led to the development of methods aimed at: the recognition of near-native predictions from a set of decoys (Jones & Thornton, 1996; Lazaridis & Karplus, 2000; Sippl, 1995); the identification of a target’s protein family (Bonneau et al, 2002; de la Cruz et al, 2002); overall quality assessment of predictions (Archie et al, 2009; Benkert et al, 2009; Cheng et al, 2009; Larsson et al, 2009; Lundstrom et al, 2001; McGuffin, 2009; Mereghetti et al, 2008; Wallner & Elofsson, 2003; 2005; Z. Wang et al, 2009; Zhou & Skolnick, 2008); and, more recently, residue-level quality assessment (Benkert et al, 2009; Cheng et al, 2009; Larsson et al, 2009; McGuffin, 2009; Wallner & Elofsson, 2006; 2007; Z. Wang et al, 2009). However, in spite of these promising efforts, quality assessment of protein structure predictions remains an open issue (Cozzetto et al, 2009).

Here we focus on the problem of local quality assessment, which consists of identifying correctly modeled regions in predicted structures (Wallner & Elofsson, 2006; 2007), or, as stated by Wallner and Elofsson (2007): “The real value of local quality prediction is when the method is able to distinguish between high and low quality regions.” In many cases, global and local quality estimates are produced simultaneously (Benkert et al, 2009; Cheng et al, 2009; Larsson et al, 2009; McGuffin, 2009). However, in this chapter we separate these two issues by assuming that a structure prediction with the native fold of the corresponding protein is available, irrespective of its quality. From a structural point of view this is a natural requirement, as a correct local feature in an otherwise wrong structure can hardly be understood, particularly if, like a β-strand (Chou et al, 1983), it is stabilized by long-range interactions. From a practical point of view, successfully identifying correct parts within incorrect models may still lead to costly errors. For example, a correctly modeled binding site identified within a structurally incorrect context should not be used for drug design: it would surely have incorrect dynamics; the long-range terms of the interaction potential, like electrostatics, would be meaningless; false neighboring residues could create unwanted steric clashes with the substrate, thus hampering its docking; or, on the contrary, the absence of the true neighbors could lead to unrealistic docking solutions. In the remainder of the chapter we describe how structure comparison methods can be applied to obtain local quality estimates for low-resolution models and how these estimates can be used to improve model quality.

2. A simple protocol for local quality assessment with structure comparison methods

As mentioned before, an important goal in local quality assessment (Wallner & Elofsson, 2006; 2007) is to partition the residues of a structure prediction into two quality classes: high and low. This can be done by combining several predictions; however, in the last two rounds of the CASP experiment (a large, blind prediction experiment performed every two years; Kryshtafovych et al, 2009), evaluators of the Quality Assessment category stressed that methods aimed at assessing single predictions are needed (Cozzetto et al, 2007; Cozzetto et al, 2009). These methods are particularly important for users who generate their protein models with de novo prediction tools, which are still computationally costly (Jauch et al, 2007), particularly for large proteins.

Here we describe a single-molecule approach, based on structure comparison methods, that partitions model residues into two sets, of high and low quality respectively. In this approach (Fig. 1), the user’s model of the target is first structurally aligned with a homolog of the target. This alignment, which constitutes the core of the procedure, is then used to separate the target’s residues into two groups: aligned and unaligned. The main assumption of this approach is that aligned residues are of higher quality than the average. The validity of this assumption is tested in the next section. In section 3 we discuss the conditions that determine and limit the applicability and usefulness of the method.
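The partitioning step can be sketched in a few lines of Python. This is only an illustration: the function name `partition_by_alignment`, the representation of the alignment as (model index, homolog index) pairs, and the example data are assumptions made here, not the output format of any particular structure comparison program.

```python
from typing import List, Optional, Set, Tuple


def partition_by_alignment(
    n_residues: int,
    aligned_pairs: List[Tuple[int, Optional[int]]],
) -> Tuple[Set[int], Set[int]]:
    """Split model residue indices into aligned and unaligned sets.

    aligned_pairs holds (model_index, homolog_index) tuples; homolog_index
    is None when the model residue faces a gap in the structural alignment.
    """
    aligned = {m for m, h in aligned_pairs if h is not None}
    unaligned = set(range(n_residues)) - aligned
    return aligned, unaligned


# Hypothetical 8-residue model: residues 0-2 and 5-6 are matched to the homolog.
pairs = [(0, 10), (1, 11), (2, 12), (3, None), (4, None),
         (5, 15), (6, 16), (7, None)]
high, low = partition_by_alignment(8, pairs)
print(sorted(high))  # [0, 1, 2, 5, 6]  (assumed higher quality)
print(sorted(low))   # [3, 4, 7]        (assumed lower quality)
```

In practice the pairs would be parsed from the output of the structure comparison program used for the model-homolog alignment.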

Figure 1.
Use of structure comparison methods for local quality assessment of structure predictions

2.1. Performance of structure comparison methods in local quality assessment

To show that structurally aligned residues are usually of higher quality we used a set of de novo predictions with medium/low to very low resolution, obtained with Rosetta (Simons et al, 1997). Although several de novo prediction programs have shown promising results in the CASP experiment (Jauch et al, 2007; Vincent et al, 2005), we used Rosetta predictions because: (i) Rosetta is a well-known de novo prediction program that has ranked first in successive CASP rounds (Jauch et al, 2007; Vincent et al, 2005); (ii) many Rosetta predictions are available at the server of Baker’s group, thus allowing a consistent test of our approach, with predictions from the same source; and (iii) the program is available to interested users (http://www.rosettacommons.org/software/).

We downloaded the protein structure predictions from the server of Baker’s laboratory (http://depts.washington.edu/bakerpg/drupal/). This set consisted of 999 de novo models generated with Rosetta (Simons et al, 1997) for each of 85 proteins, i.e. a total of 84915 models. Application of the protocol studied here (Fig. 1) requires that the structure of a homolog of the target is available, and that the predictions used have the fold of the target. The first condition was enforced by keeping only those proteins with a CATH relative at the T-level (Pearl et al, 2005), i.e. homologs with the same fold as the target protein, regardless of their sequence similarity. The second condition was required to focus only on the local quality assessment problem, and was implemented by excluding those models not having the fold of their target protein. Technically, this meant that we only kept those models structurally similar to any member of the target’s structural family: that is, those models giving a score higher than 5.25 in the model-homolog structure comparison done with MAMMOTH (Ortiz et al, 2002), for at least one homolog of the target protein. This step was computationally costly, as it involved 7,493,499 structure comparisons, and could only be carried out using MAMMOTH (Ortiz et al, 2002); the 5.25 score threshold was taken from MAMMOTH’s article (Ortiz et al, 2002). The final dataset consisted of 68 target proteins and a total of 17180 models.
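The fold filter described above amounts to a simple rule: a model is kept if its score against at least one homolog exceeds the threshold. A minimal sketch follows; the dictionary layout, model names and score values are hypothetical, and only the 5.25 threshold and the "at least one homolog" rule come from the text.

```python
from typing import Dict, List

MAMMOTH_THRESHOLD = 5.25  # score cut-off quoted in the text


def has_target_fold(scores_vs_homologs: List[float],
                    threshold: float = MAMMOTH_THRESHOLD) -> bool:
    """True if the model scores above the threshold for at least one homolog."""
    return any(score > threshold for score in scores_vs_homologs)


def filter_models(scores: Dict[str, List[float]]) -> List[str]:
    """Keep only the models that pass the fold filter."""
    return [name for name, s in scores.items() if has_target_fold(s)]


# Hypothetical scores of three models against the homologs of their target.
scores = {
    "model_001": [3.1, 5.9],   # passes: 5.9 > 5.25
    "model_002": [4.0, 5.25],  # fails: the score must be strictly higher
    "model_003": [6.2],        # passes
}
print(filter_models(scores))  # ['model_001', 'model_003']

# Size of the initial set: 999 models for each of 85 proteins.
print(999 * 85)  # 84915
```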

The properties of the selected target residues (STR; to avoid meaningless results we only considered STR sets larger than 20 residues) were characterized with four parameters: two structure-based and two sequence-based. The former were used to check whether STR really were of better quality, by comparing their parameter values with those obtained for the set of all the target residues (ATR), i.e. the whole model structure. Note that: (i) STR and ATR sets are made up of residues from the target protein, and more precisely STR is a subset of ATR; and (ii) three possible STR sets were produced, because we checked our procedure using three structure comparison methods (MAMMOTH (Ortiz et al, 2002), SSAP (Orengo & Taylor, 1990) and LGA (Zemla, 2003)). The sequence-based properties were used to describe how STR spread along the sequence of the target, which helps to assess the usefulness of the protocol. Below we provide a brief description of each parameter, together with the results obtained from its use.

2.1.1. Structural quality: rmsd

Rmsd (Kabsch, 1976) is a quality measure widely employed to assess structure models: it is the root-mean-square distance between model atoms and their equivalents in the native structure, computed after optimal superposition of the two structures. Smaller rmsd values correspond to higher quality predictions.
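As an illustration of the definition, the sketch below computes the rmsd between two sets of equivalent atoms. It assumes that the coordinates have already been optimally superimposed (for instance with the Kabsch procedure, which is not implemented here); the function name and the toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """Root-mean-square distance between equivalent, pre-superimposed atoms."""
    if len(model) != len(native) or not model:
        raise ValueError("coordinate sets must be non-empty and of equal size")
    squared = sum(
        (a - b) ** 2
        for atom_m, atom_n in zip(model, native)
        for a, b in zip(atom_m, atom_n)
    )
    return math.sqrt(squared / len(model))


# Toy example with two atoms, each displaced by 5 Å from its native position.
model = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
native = [(3.0, 4.0, 0.0), (1.0, 1.0, 6.0)]
print(rmsd(model, native))  # 5.0
```

Restricting both coordinate lists to the aligned residues gives the rmsd of the selected subset, while using all residues gives the whole-model value.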

In Fig. 2 we see the STR and ATR rmsd distributions. Regardless of the structure comparison method used (MAMMOTH (Ortiz et al, 2002), SSAP(Orengo & Taylor, 1990) and LGA(Zemla, 2003) in blue, yellow and red, respectively), STR distributions are shifted towards lower rmsd values relative to ATR distributions (in grey). This confirms the starting assumption: it shows that model residues structurally aligned to the protein’s homolog usually have a higher structural quality. A consensus alignment (in black), which combined the results of the three structure comparison methods, gave better results at the price of including fewer residues; for this reason we excluded the consensus approach from subsequent analyses.
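One simple way to build such a consensus is to keep only the residues aligned by all three methods. The sketch below takes that reading, which is an assumption made here for illustration: the text does not specify how the three alignments were combined, and the residue sets are hypothetical.

```python
from typing import Set


def consensus_residues(*aligned_sets: Set[int]) -> Set[int]:
    """Residues present in every aligned set (intersection-based consensus)."""
    if not aligned_sets:
        return set()
    return set.intersection(*aligned_sets)


# Hypothetical aligned residue sets from three structure comparison methods.
mammoth = {1, 2, 3, 4, 8, 9, 10}
ssap = {2, 3, 4, 5, 9, 10, 11}
lga = {2, 3, 4, 9, 10, 12}
print(sorted(consensus_residues(mammoth, ssap, lga)))  # [2, 3, 4, 9, 10]
```

Under this reading the consensus set can never be larger than the smallest individual set, which is consistent with the reduced number of residues mentioned above.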

An interesting feature of STR rmsd distributions was that their maxima were between 3.5 Å and 6.5 Å, that most individual values fell between 3 Å and 8 Å, and that nearly all were below 10 Å. To further explore this issue, we plotted the values of rmsd for STR against ATR (Fig. 3, grey boxes). In accordance with the histogram results, STR rmsd tended to be smaller than ATR rmsd. We distinguished two regions in the graph: in the first region (ATR rmsd between 0 Å and 6-8 Å) there was a roughly linear relationship between ATR and STR rmsds; however, for ATR rmsd values beyond 8 Å, STR rmsd reached a plateau. This plateau is at the origin of the thresholds observed in the histograms (Fig. 2), and confirms that structure alignments can be used to identify subsets of model residues with better rmsd than the rest.

Figure 2.
Quality of structurally aligned regions vs. whole model, rmsd frequency histogram.

As a performance reference we used the PROSA program (Sippl, 1993) (white boxes), which provides a residue-by-residue, energy-based quality assessment and is a single-model method, and is therefore comparable to the approach presented here. PROSA was executed with default parameters, and we took as high quality residues those having energies below zero. In Fig. 3 we see that for good models, i.e. those with low ATR rmsd values, PROSA results (in white) were as good as those obtained with structure comparison methods (in grey). However, as models became poorer, PROSA results became worse, particularly after structure comparison methods reached their plateau. This indicates that, when dealing with poor predictions, the use of structure alignments can improve or complement other quality assessment methods.
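The selection rule used for the reference method (residues with energy below zero are taken as high quality) can be written as follows. The per-residue energies are hypothetical values supplied as a plain list; parsing actual PROSA output is not shown and its format is not assumed here.

```python
from typing import List


def high_quality_by_energy(energies: List[float]) -> List[int]:
    """Indices of residues whose per-residue energy is below zero."""
    return [i for i, energy in enumerate(energies) if energy < 0.0]


# Hypothetical per-residue energies for a six-residue fragment.
energies = [-0.8, -0.1, 0.4, 1.2, -0.3, 0.0]
print(high_quality_by_energy(energies))  # [0, 1, 4]
```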

Figure 3.
Quality of structurally aligned (obtained with MAMMOTH (Ortiz et al, 2002)) regions vs. whole model, rmsd of selected residues vs. all-residues. Grey: structure comparison-based protocol (Fig. 1); white: PROSA(Sippl, 1993) results.

2.1.2. Structural quality: GDT_TS

GDT_TS is a quality measure routinely used by evaluator teams in the CASP community experiment (Jauch et al, 2007; Vincent et al, 2005): it is equal to the average of the percentages of model residues at less than 1 Å, 2 Å, 4 Å and 8 Å from their location in the correct structure. It was computed following the procedure described by Zemla (2003), using Cα atoms to compute residue-residue distances. GDT_TS varies between 0 and 100, with values approaching 100 as models become better.
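The averaging in this definition can be illustrated with a short sketch. It is a simplification: it takes a single list of Cα-Cα distances from one fixed superposition, whereas the procedure of Zemla (2003) searches over superpositions; the function name and the distances are hypothetical.

```python
from typing import Sequence


def gdt_ts(distances: Sequence[float],
           cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Average percentage of residues closer than each cut-off (in Å)."""
    if not distances:
        raise ValueError("at least one residue distance is required")
    # Count, for every cut-off, the residues lying below it.
    total = sum(sum(1 for d in distances if d < c) for c in cutoffs)
    return 100.0 * total / (len(distances) * len(cutoffs))


# Hypothetical model-native distances (Å) for five residues.
distances = [0.5, 1.5, 3.0, 6.0, 10.0]
# 20 %, 40 %, 60 % and 80 % of residues fall below 1, 2, 4 and 8 Å.
print(gdt_ts(distances))  # 50.0
```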

We found that STR GDT_TS was in general better than ATR GDT_TS (Fig. 4); this was particularly true when the latter was below 40-50. Overall, this shows that STR is enriched in good quality sub-structures relative to ATR, particularly for poor models.

Consistency with rmsd analysis was observed when comparing the performance of structure comparison-based quality assessment (in grey) with that of PROSA (in white): for good models (GDT_TS values above 60-70) both approaches had a similar behavior; however, as model quality decreased, use of structure alignments showed increasingly better performance than PROSA at pinpointing correct substructures.

Figure 4.
Quality of structurally aligned (obtained with MAMMOTH (Ortiz et al, 2002)) regions vs. whole model, GDT_TS of the selected residues vs. all-residues GDT_TS. Grey: structure comparison-based protocol (Fig. 1); white: PROSA(Sippl, 1993) results.

2.1.3. Distribution of high quality residues along the protein sequence

Usually, STR do not form a continuous block; they tend to scatter along the sequence. The nature of this distribution is of interest for some applications of quality assessment methods (like model refinement), for which STR sets may be of little value if the residues involved are either too close in sequence or contain too many orphan residues.

To characterize the distribution of STR along the sequence we used two measures: the maximum distance (MD) between STR runs and the normalized size distribution of STR runs (SAS). Both are based on the fact that, for a given model, STR sets are made up of residue runs of varying size. MD corresponds to the largest sequence distance between STR runs (i.e. the number of residues between the rightmost and leftmost STR runs), divided by the whole sequence length. MD values near 1 indicate that STR runs are spread over the whole protein, while smaller values point to a tighter residue clustering. SAS corresponds to the size distribution of all runs constituting STR sets, again normalized by the whole sequence length. SAS gives a view of how the sequence coverage is achieved: by large sequence chunks, by small residue clusters, or by a mixture of both. When the alignment is made up of small, evenly distributed residue clusters, the SAS values will approach zero.
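Both measures can be sketched from a list of aligned residue positions. The code below adopts one reading of the definitions, stated here as an assumption: MD is taken as the sequence span from the first residue of the leftmost run to the last residue of the rightmost run, divided by sequence length, and SAS as the list of run lengths divided by sequence length. Function names and the example are hypothetical.

```python
from typing import Iterable, List, Tuple


def find_runs(indices: Iterable[int]) -> List[Tuple[int, int]]:
    """Group residue indices into (start, end) runs of consecutive positions."""
    runs: List[List[int]] = []
    for i in sorted(set(indices)):
        if runs and i == runs[-1][1] + 1:
            runs[-1][1] = i          # extend the current run
        else:
            runs.append([i, i])      # open a new run
    return [(start, end) for start, end in runs]


def md(indices: Iterable[int], seq_len: int) -> float:
    """Normalized span between the leftmost and rightmost runs."""
    runs = find_runs(indices)
    if not runs:
        return 0.0
    return (runs[-1][1] - runs[0][0] + 1) / seq_len


def sas(indices: Iterable[int], seq_len: int) -> List[float]:
    """Run sizes normalized by sequence length."""
    return [(end - start + 1) / seq_len for start, end in find_runs(indices)]


# Hypothetical 20-residue target with three aligned runs: 1-4, 8 and 12-17.
aligned = [1, 2, 3, 4, 8, 12, 13, 14, 15, 16, 17]
print(find_runs(aligned))  # [(1, 4), (8, 8), (12, 17)]
print(md(aligned, 20))     # 0.85
print(sas(aligned, 20))    # [0.2, 0.05, 0.3]
```

In this toy case the single-residue run at position 8 is an orphan residue, and the largest run covers 30 % of the sequence.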

Our results showed that MD values were most frequent above 0.5, and more than 50 % of them were higher than 0.8 (Fig. 5). The three structure comparison methods showed similar distributions, although that of LGA was shifted slightly closer to 1. This indicates that STR spread over a substantial part of the predicted protein.

Figure 5.
Frequency distribution of the selected residues along the target sequence: normalized maximum distance between STR runs (unitless parameter).

Results for SAS (Fig. 6) showed that while ~50 % of STR formed clusters smaller than 10 % of the whole sequence (i.e. SAS values below 0.1), the remaining residues were grouped in medium to large stretches. This means that for a 100-residue protein, clusters of more than 10 residues (which is roughly the size of an average α-helix) are frequent. In addition, in 95 % of the cases, the largest run of adjacent residues was above 30 % of the target length.

The picture arising from MD and SAS distributions is that STR usually extend over the protein length. Although STR sets are made up of somewhat heterogeneous runs, they do not contain too many orphan residues, as they include one large run (the size of a supersecondary structure motif, or larger) and several smaller runs (the size of secondary structure elements).

Figure 6.
Frequency distribution of the selected residues along the target sequence: normalized size distribution of STR runs (SAS; unitless parameter).

3. Applicability range of structure comparison methods in local quality assessment

The approach described here is easy to use and has few computational requirements; however, it cannot be arbitrarily applied to any model or in any prediction scenario. In this section we describe its limits regarding prediction methods, target proteins and the nature of the protein model.

3.1. Prediction methods

As long as the target protein has a homolog of known structure, model-homolog structure alignments can be computed and the quality assessment protocol (Fig. 1) can be applied, regardless of the prediction method that produced the model. However, the approach presented here reaches its maximum utility when models are obtained with de novo structure prediction methods (methods originally devised to work using only the target’s sequence and a physico-chemical/statistical potential, irrespective of the availability of homologs). This may seem somewhat contradictory, as one might think that the existence of homologs of the target favors the use of comparative modeling methods instead of de novo methods. However, this is not the case: while de novo methods were initially developed with the de novo scenario in mind (only sequence information is available for the target protein), this situation is changing rapidly. Indeed, when prediction problems become difficult, or a given method gives an unclear answer, using more than one technique is considered a good approach within the prediction community, as the evaluators of the de novo section in the CASP6 experiment explain (Vincent et al, 2005): “Many predicting groups now use both de novo and homology modeling/fold recognition techniques to predict structures in all categories”. In addition, it has been shown that de novo methods can compete with template-based methods in the prediction of difficult targets (Jauch et al, 2007; Raman et al, 2009; Vincent et al, 2005). In this situation, which implies the existence of homologs of the target, our method can be used to score the local quality of de novo predictions.

In addition, a completely new field of application for de novo methods has been unveiled by the growing interest in knowing the structure of alternative splicing isoforms (C. Lee & Wang, 2005). Due to the very localized nature of sequence changes (Talavera et al, 2007), structure prediction of alternative splicing variants seems a trivial exercise in comparative modeling. However, template-based methods fail to reproduce the structure changes introduced by alternative splicing (Davletov & Jimenez, 2004). De novo approaches, with their ability to combine first principles with deep conformational searches, are ideal candidates to tackle this problem; in this case, the availability of the structure of only one isoform would allow the application of our method.

3.2. Target proteins

Proteins to which our approach can be applied must have a homolog of known structure. The number of these proteins is increasing due to: (i) the progress of structural genomics projects (Todd et al, 2005) (this will increase the number of both easy/medium and hard targets); (ii) the growing number of alternative splicing variants of unknown structure (C. Lee & Wang, 2005).

3.3. Protein models

The approach proposed (Fig. 1) is a local, not a global, quality assessment method and should only be applied to models that have the native fold of the target (see above). Present de novo methods still cannot consistently produce models with a native-like fold (Moult et al, 2009). Therefore, researchers must ascertain that the model’s fold is correct (irrespective of its resolution). This can be done using global quality assessment methods like PROSA (Sippl, 1993) or Elofsson’s suite of programs (Wallner & Elofsson, 2007), among others.

4. Applications

Once available, local quality information can be used for different purposes. For example, it may help to identify those parts of a theoretical model that are more reliable for mutant design, or to interpret the results of mutagenesis experiments; it may also be used in in silico docking experiments involving de novo models, to decide which parts of the models should be employed preferentially. One of the most promising applications of quality assessment methods is the refinement of low-resolution models (Wallner & Elofsson, 2007). In this section we illustrate how the results of the procedure described here can be used for this purpose.

Among the possible options available for model refinement, we propose to use the alignment resulting from the structural superimposition between a de novo model and the target’s homolog (Fig. 1) as input to a comparative modeling program. We applied this strategy to 15 proteins (five from each of the three main CATH structural classes: alpha, beta and alpha/beta) from our initial dataset. These 15 proteins contributed a total of 2693 de novo models that resulted in 8033 model-homolog alignments (obtained with MAMMOTH (Ortiz et al, 2002)). These alignments were subsequently used as input to the standard homology modeling program MODELLER (Marti-Renom et al, 2000), which was run with default parameters. For the aligned regions we found (Fig. 7) that most of the refined models had lower model-native rmsd than the starting de novo models, i.e. they were closer to the native structure. A similar, although milder, trend was also observed when considering the whole set of protein residues (i.e. aligned as well as unaligned residues) (Fig. 8). These results show that this simple, computationally cheap model refinement protocol, which relies on structure comparison-based local quality analysis, helps to refine low-resolution de novo models to an accuracy determined by the closest homolog of the target.
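The comparison shown in the refinement plots (points below the diagonal correspond to improved models) can be summarized numerically as the fraction of models whose rmsd decreases after refinement. The sketch below illustrates this with hypothetical rmsd values; it does not reproduce the chapter's data, and the function name is an assumption.

```python
from typing import Sequence


def fraction_improved(original_rmsd: Sequence[float],
                      refined_rmsd: Sequence[float]) -> float:
    """Fraction of models whose model-native rmsd is lower after refinement."""
    if len(original_rmsd) != len(refined_rmsd) or not original_rmsd:
        raise ValueError("rmsd lists must be non-empty and of equal size")
    improved = sum(
        1 for before, after in zip(original_rmsd, refined_rmsd)
        if after < before
    )
    return improved / len(original_rmsd)


# Hypothetical model-native rmsd values (Å) before and after refinement.
original = [8.0, 6.5, 10.2, 4.0]
refined = [6.1, 6.9, 7.5, 3.8]
print(fraction_improved(original, refined))  # 0.75
```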

Figure 7.
Model refinement using structure comparison-based local quality assessment: rmsd of refined models vs. rmsd of original de novo models, subset of aligned residues. Points below the dotted line correspond to refinement-improved models.

Figure 8.
Model refinement using structure comparison-based local quality assessment: rmsd of refined models vs. rmsd of original de novo models, all protein residues. Points below the dotted line correspond to refinement-improved models.

5. Conclusions

In this chapter we have described and tested a protocol for local quality assessment of low-resolution predictions based on the use of structure comparison methods. The testing was carried out with de novo predictions, and the results showed that structure comparison methods allow the partitioning of the model’s residues into two sets, of high and low quality respectively. This result holds even when only remote homologs of the target protein are available. The simplicity of the approach leaves room for future improvements and fruitful combination with other quality assessment methods. Two conditions determine the application range of this approach: the target protein must have at least one homolog of known structure, and models reproducing the fold of the target are required. However, results indicating that we may be near a full coverage of the protein fold space, together with advances in overall quality scoring, indicate that these two problems are likely to become minor issues in the near future. Finally, our procedure suggests a simple refinement strategy, based on the use of comparative modeling programs, that may be used to improve low-resolution de novo models.

Acknowledgments

This work is dedicated to the memory of Angel Ramírez Ortíz, leading bioinformatician and designer of the MAMMOTH program for structure comparison. The authors wish to thank the CATH team for their support. Xavier de la Cruz acknowledges funding from the Spanish government (Grants BIO2006-15557 and BFU2009-11527). David Piedra acknowledges financial support from the Government of Catalonia and the Spanish Ministerio de Educación y Ciencia.

References

1. Archie, J.G.; Paluszewski, M. & Karplus, K. (2009) Applying Undertaker to quality assessment. Proteins Vol.77, Suppl 9, pp. 191-195, ISSN 0887-3585
2. Benkert, P.; Tosatto, S.C. & Schwede, T. (2009) Global and local model quality estimation at CASP8 using the scoring functions QMEAN and QMEANclust. Proteins Vol.77, Suppl 9, pp. 173-180, ISSN 0887-3585
3. Bonneau, R.; Strauss, C.E.; Rohl, C.A.; Chivian, D.; Bradley, P.; Malmstrom, L.; Robertson, T. & Baker, D. (2002) De novo prediction of three-dimensional structures for major protein families. J Mol Biol Vol.322, No.1, pp. 65-78, ISSN 0022-2836
4. Cheng, J.; Wang, Z.; Tegge, A.N. & Eickholt, J. (2009) Prediction of global and local quality of CASP8 models by MULTICOM series. Proteins Vol.77, Suppl 9, pp. 181-184, ISSN 0887-3585
5. Chou, K.C.; Nemethy, G. & Scheraga, H.A. (1983) Role of interchain interactions in the stabilization of the right-handed twist of beta-sheets. J Mol Biol Vol.168, No.2, pp. 389-407, ISSN 0022-2836
6. Cozzetto, D.; Kryshtafovych, A.; Ceriani, M. & Tramontano, A. (2007) Assessment of predictions in the model quality assessment category. Proteins Vol.69, Suppl 8, pp. 175-183, ISSN 0887-3585
7. Cozzetto, D.; Kryshtafovych, A. & Tramontano, A. (2009) Evaluation of CASP8 model quality predictions. Proteins Vol.77, Suppl 9, pp. 157-166, ISSN 0887-3585
8. Davletov, B. & Jimenez, J.L. (2004) Sculpting a domain by splicing. Nat Struct Mol Biol Vol.11, No.1, pp. 4-5
9. de la Cruz, X.; Sillitoe, I. & Orengo, C. (2002) Use of structure comparison methods for the refinement of protein structure predictions. I. Identifying the structural family of a protein from low-resolution models. Proteins Vol.46, No.1, pp. 72-84, ISSN 0887-3585
10. Greene, L.H.; Lewis, T.E.; Addou, S.; Cuff, A.; Dallman, T.; Dibley, M.; Redfern, O.; Pearl, F.; Nambudiry, R.; Reid, A.; Sillitoe, I.; Yeats, C.; Thornton, J.M. & Orengo, C.A. (2007) The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic Acids Res Vol.35, Database issue, pp. D291-D297, ISSN 0305-1048
11. Hasegawa, H. & Holm, L. (2009) Advances and pitfalls of protein structural alignment. Curr Opin Struct Biol Vol.19, No.3, pp. 341-348, ISSN 0959-440X
12. Hooft, R.W.; Sander, C. & Vriend, G. (1997) Objectively judging the quality of a protein structure from a Ramachandran plot. Comput Appl Biosci Vol.13, No.4, pp. 425-430, ISSN 0266-7061
13. Jauch, R.; Yeo, H.C.; Kolatkar, P.R. & Clarke, N.D. (2007) Assessment of CASP7 structure predictions for template free targets. Proteins Vol.69, Suppl 8, pp. 57-67, ISSN 0887-3585
14. Jones, D.T. & Thornton, J.M. (1996) Potential energy functions for threading. Curr Opin Struct Biol Vol.6, No.2, pp. 210-216, ISSN 0959-440X
15. Kabsch, W. (1976) A solution for the best rotation to relate two sets of vectors. Acta Crystallogr A Vol.A32, pp. 922-923
16. Kleywegt, G.J. (2000) Validation of protein crystal structures. Acta Crystallogr D Biol Crystallogr Vol.56, Pt 3, pp. 249-265, ISSN 1399-0047
17. Kryshtafovych, A.; Fidelis, K. & Moult, J. (2009) CASP8 results in context of previous experiments. Proteins Vol.77, Suppl 9, pp. 217-228, ISSN 0887-3585
18. Larsson, P.; Skwark, M.J.; Wallner, B. & Elofsson, A. (2009) Assessment of global and local model quality in CASP8 using Pcons and ProQ. Proteins Vol.77, Suppl 9, pp. 167-172, ISSN 0887-3585
19. Laskowski, R.A.; MacArthur, M.W.; Moss, D.S. & Thornton, J.M. (1993) PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr Vol.26, pp. 283-291, ISSN 1600-5767
20. Laskowski, R.A.; MacArthur, M.W. & Thornton, J.M. (1998) Validation of protein models derived from experiment. Curr Opin Struct Biol Vol.8, No.5, pp. 631-639, ISSN 0959-440X
21. Lazaridis, T. & Karplus, M. (2000) Effective energy functions for protein structure prediction. Curr Opin Struct Biol Vol.10, No.2, pp. 139-145, ISSN 0959-440X
22. Lee, C. & Wang, Q. (2005) Bioinformatics analysis of alternative splicing. Brief Bioinform Vol.6, No.1, pp. 23-33
23. Lee, D.A.; Rentzsch, R. & Orengo, C. (2010) GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains. Nucleic Acids Res Vol.38, No.3, pp. 720-737, ISSN 0305-1048
24. Lundstrom, J.; Rychlewski, L.; Bujnicki, J. & Elofsson, A. (2001) Pcons: a neural-network-based consensus predictor that improves fold recognition. Protein Sci Vol.10, No.11, pp. 2354-2362
25. Marti-Renom, M.A.; Stuart, A.C.; Fiser, A.; Sanchez, R.; Melo, F. & Sali, A. (2000) Comparative protein structure modeling of genes and genomes. Annu Rev Biophys Biomol Struct Vol.29, pp. 291-325, ISSN 1056-8700
26. McGuffin, L.J. (2009) Prediction of global and local model quality in CASP8 using the ModFOLD server. Proteins Vol.77, Suppl 9, pp. 185-190, ISSN 0887-3585
27. Mereghetti, P.; Ganadu, M.L.; Papaleo, E.; Fantucci, P. & De Gioia, L. (2008) Validation of protein models by a neural network approach. BMC Bioinformatics Vol.9, p. 66, ISSN 1471-2105
28. Moult, J.; Fidelis, K.; Kryshtafovych, A.; Rost, B. & Tramontano, A. (2009) Critical assessment of methods of protein structure prediction - Round VIII. Proteins Vol.77, Suppl 9, pp. 1-4, ISSN 0887-3585
29. Orengo, C.A. & Taylor, W.R. (1990) A rapid method of protein structure alignment. J Theor Biol Vol.147, No.4, pp. 517-551
30. Ortiz, A.R.; Strauss, C.E. & Olmea, O. (2002) MAMMOTH (matching molecular models obtained from theory): an automated method for model comparison. Protein Sci Vol.11, No.11, pp. 2606-2621
31. Pearl, F.; Todd, A.; Sillitoe, I.; Dibley, M.; Redfern, O.; Lewis, T.; Bennett, C.; Marsden, R.; Grant, A.; Lee, D.; Akpor, A.; Maibaum, M.; Harrison, A.; Dallman, T.; Reeves, G.; Diboun, I.; Addou, S.; Lise, S.; Johnston, C.; Sillero, A.; Thornton, J. & Orengo, C. (2005) The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucleic Acids Res Vol.33, Database issue, pp. D247-D251
32. Raman, S.; Vernon, R.; Thompson, J.; Tyka, M.; Sadreyev, R.; Pei, J.; Kim, D.; Kellogg, E.; Di Maio, F.; Lange, O.; Kinch, L.; Sheffler, W.; Kim, B.H.; Das, R.; Grishin, N.V. & Baker, D. (2009) Structure prediction for CASP8 with all-atom refinement using Rosetta. Proteins Vol.77, Suppl 9, pp. 89-99, ISSN 0887-3585
33. Simons, K.T.; Kooperberg, C.; Huang, E. & Baker, D. (1997) Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol Vol.268, No.1, pp. 209-225
34. Sippl, M.J. (1993) Recognition of errors in three-dimensional structures of proteins. Proteins Vol.17, No.4, pp. 355-362, ISSN 0887-3585
35. Sippl, M.J. (1995) Knowledge-based potentials for proteins. Curr Opin Struct Biol Vol.5, No.2, pp. 229-235, ISSN 0959-440X
36. Talavera, D.; Vogel, C.; Orozco, M.; Teichmann, S.A. & de la Cruz, X. (2007) The (in)dependence of alternative splicing and gene duplication. PLoS Comput Biol Vol.3, No.3, e33
37. Todd, A.E.; Marsden, R.L.; Thornton, J.M. & Orengo, C.A. (2005) Progress of structural genomics initiatives: an analysis of solved target structures. J Mol Biol Vol.348, No.5, pp. 1235-1260, ISSN 0022-2836
38. Venclovas, C. (2001) Comparative modeling of CASP4 target proteins: combining results of sequence search with three-dimensional structure assessment. Proteins Suppl 5, pp. 47-54, ISSN 0887-3585
39. Venclovas, C.; Zemla, A.; Fidelis, K. & Moult, J. (2001) Comparison of performance in successive CASP experiments. Proteins Suppl 5, pp. 163-170, ISSN 0887-3585
40. Vincent, J.J.; Tai, C.H.; Sathyanarayana, B.K. & Lee, B. (2005) Assessment of CASP6 predictions for new and nearly new fold targets. Proteins Vol.61, Suppl 7, pp. 67-83, ISSN 0887-3585
41. Vriend, G. (1990) WHAT IF: a molecular modeling and drug design program. J Mol Graph Vol.8, No.1, pp. 52-56, ISSN 0263-7855
42. Wallner, B. & Elofsson, A. (2003) Can correct protein models be identified? Protein Sci Vol.12, No.5, pp. 1073-1086, ISSN 0961-8368
43. Wallner, B. & Elofsson, A. (2005) Pcons5: combining consensus, structural evaluation and fold recognition scores. Bioinformatics Vol.21, No.23, pp. 4248-4254, ISSN 1367-4803
44. Wallner, B. & Elofsson, A. (2006) Identification of correct regions in protein models using structural, alignment, and consensus information. Protein Sci Vol.15, No.4, pp. 900-913, ISSN 0961-8368
45. Wallner, B. & Elofsson, A. (2007) Prediction of global and local model quality in CASP7 using Pcons and ProQ. Proteins Vol.69, Suppl 8, pp. 184-193, ISSN 0887-3585
46. Wang, G.; Jin, Y. & Dunbrack, R.L. (2005) Assessment of fold recognition predictions in CASP6. Proteins Vol.61, Suppl 7, pp. 46-66, ISSN 0887-3585
47. Wang, Z.; Tegge, A.N. & Cheng, J. (2009) Evaluating the absolute quality of a single protein model using structural features and support vector machines. Proteins Vol.75, No.3, pp. 638-647, ISSN 0887-3585
48. Williams, M.G.; Shirai, H.; Shi, J.; Nagendra, H.G.; Mueller, J.; Mizuguchi, K.; Miguel, R.N.; Lovell, S.C.; Innis, C.A.; Deane, C.M.; Chen, L.; Campillo, N.; Burke, D.F.; Blundell, T.L. & de Bakker, P.I. (2001) Sequence-structure homology recognition by iterative alignment refinement and comparative modeling. Proteins Suppl 5, pp. 92-97, ISSN 0887-3585
49. Zemla, A. (2003) LGA: A method for finding 3D similarities in protein structures. Nucleic Acids Res Vol.31, No.13, pp. 3370-3374
50. Zhou, H. & Skolnick, J. (2008) Protein model quality assessment prediction by combining fragment comparisons and a consensus C(alpha) contact potential. Proteins Vol.71, No.3, pp. 1211-1218, ISSN 0887-3585

Computational Biology and Applied Bioinformatics
