Length-dependent prediction of protein intrinsic disorder
  • Authors: Kang Peng (1)
    Predrag Radivojac (2)
    Slobodan Vucetic (1)
    A Keith Dunker (3)
    Zoran Obradovic (1)
  • Journal: BMC Bioinformatics
  • Publication year: 2006
  • Publication date: December 2006
  • Volume: 7
  • Issue: 1
  • Full-text size: 1343KB
  • Affiliations:
    1. Center for Information Science and Technology, Temple University, Philadelphia, PA, 19122, USA
    2. School of Informatics, Indiana University, Bloomington, IN, 47408, USA
    3. Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
  • ISSN: 1471-2105
Abstract
Background: Because intrinsically disordered proteins and protein regions are functionally important, predicting intrinsic protein disorder from amino acid sequence has become an area of active research, as evidenced by the 6th experiment on Critical Assessment of Techniques for Protein Structure Prediction (CASP6). Since the initial work by Romero et al. (Identifying disordered regions in proteins from amino acid sequences, IEEE Int. Conf. Neural Netw., 1997), our group has developed several predictors optimized for long disordered regions (>30 residues) with prediction accuracy exceeding 85%. However, these predictors are less successful on short disordered regions (≤30 residues). A probable cause is that the amino acid compositions and sequence properties of disordered regions depend on region length.
Results: We proposed two new predictor models, VSL2-M1 and VSL2-M2, to address this length-dependency problem in the prediction of intrinsic protein disorder. These two predictors are similar to the original VSL1 predictor used in the CASP6 experiment. In both models, two specialized predictors were first built and optimized for short (≤30 residues) and long (>30 residues) disordered regions, respectively. A meta predictor was then trained to integrate the specialized predictors into the final predictor model. In 10-fold cross-validation, the VSL2 predictors achieved well-balanced prediction accuracies of 81% on both short and long disordered regions. Comparisons on the VSL2 training dataset via 10-fold cross-validation and on a blind-test set of unrelated recent PDB chains indicated that the VSL2 predictors were significantly more accurate than several existing predictors of intrinsic protein disorder.
Conclusion: The VSL2 predictors are applicable to disordered regions of any length and can accurately identify the short disordered regions that are often misclassified by our previous disorder predictors. The success of the VSL2 predictors further confirmed the previously observed differences in amino acid compositions and sequence properties between short and long disordered regions, and justified our approach of modelling short and long disordered regions separately. The VSL2 predictors are freely accessible for non-commercial use at http://www.ist.temple.edu/disprot/predictorVSL2.php
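
The two-level design described in the abstract (specialized short-region and long-region predictors whose outputs are integrated by a meta predictor) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the 'D'/'O' per-residue label encoding, the function names, and in particular the soft-weighting rule in `combine`, which is only one plausible way a meta predictor could integrate two score profiles. Only the 30-residue threshold comes from the text.

```python
from typing import List, Tuple

# From the abstract: long regions are >30 residues, short regions are <=30.
LONG_THRESHOLD = 30


def disordered_regions(labels: str) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans of consecutive 'D' residues.

    Assumed encoding: 'D' = disordered residue, any other character = ordered.
    """
    regions = []
    start = None
    for i, ch in enumerate(labels):
        if ch == "D" and start is None:
            start = i                      # a disordered run begins
        elif ch != "D" and start is not None:
            regions.append((start, i))     # the run ended at i - 1
            start = None
    if start is not None:                  # run extends to the sequence end
        regions.append((start, len(labels)))
    return regions


def split_by_length(labels: str):
    """Partition disordered regions into (short, long) lists by length."""
    short, long_ = [], []
    for start, end in disordered_regions(labels):
        if end - start > LONG_THRESHOLD:
            long_.append((start, end))
        else:
            short.append((start, end))
    return short, long_


def combine(p_short: List[float], p_long: List[float],
            p_meta: List[float]) -> List[float]:
    """Blend two per-residue score profiles (hypothetical integration rule).

    p_meta[i] is treated as the weight given to the long-region predictor
    at residue i; the short-region predictor receives 1 - p_meta[i].
    """
    if not (len(p_short) == len(p_long) == len(p_meta)):
        raise ValueError("all score profiles must have the same length")
    return [m * l + (1.0 - m) * s
            for s, l, m in zip(p_short, p_long, p_meta)]
```

With this sketch, `split_by_length` could be used to build separate training sets for the two specialized predictors, and `combine` to merge their outputs once a meta predictor has produced per-residue weights. A soft blend of this kind avoids a hard switch between the two predictors at any single residue, which is one reason it is a common choice for mixtures of specialized models.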
