Predicting and analyzing DNA-binding domains using a systematic approach to identifying a set of informative physicochemical and biochemical properties
  • Authors: Hui-Lin Huang (1) (2)
    I-Che Lin (1)
    Yi-Fan Liou (2)
    Chia-Ta Tsai (2)
    Kai-Ti Hsu (2)
    Wen-Lin Huang (3)
    Shinn-Jang Ho (4)
    Shinn-Ying Ho (1) (2)
  • Journal: BMC Bioinformatics
  • Publication year: 2011
  • Publication date: December 2011
  • Volume: 12
  • Issue: 1-supp
  • Full-text size: 1410KB
  • Affiliations:
    1. Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
    2. Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
    3. Department of Multimedia Entertainment Science, Asia Pacific Institute of Creativity, Miaoli, Taiwan
    4. Department of Automation Engineering, National Formosa University, 632, Yunlin, Taiwan
  • ISSN:1471-2105
Abstract
Background: Existing methods for predicting DNA-binding proteins use features derived from physicochemical properties to design support vector machine (SVM) based classifiers. The selection of physicochemical properties and the determination of their corresponding feature vectors generally rely on known properties of the binding mechanism and on the designer's experience. This is problematic because some different physicochemical properties have similar vectors representing the 20 amino acids, while some closely related physicochemical properties have dissimilar vectors.

Results: This study proposes a systematic approach, named Auto-IDPCPs, that automatically identifies a set of physicochemical and biochemical properties in the AAindex database for designing SVM-based classifiers to predict and analyze DNA-binding domains/proteins. Auto-IDPCPs consists of 1) clustering 531 amino acid indices in AAindex into 20 clusters using a fuzzy c-means algorithm, 2) using IBCGA, an efficient optimization method based on a genetic algorithm, to select an informative feature set of size m to represent sequences, and 3) analyzing the selected features to identify related physicochemical properties that may affect the binding mechanism of DNA-binding domains/proteins. Auto-IDPCPs identified m = 22 features of properties belonging to five clusters for predicting DNA-binding domains, with a five-fold cross-validation accuracy of 87.12%, compared with 86.62% for the existing method PSSM-400. For predicting DNA-binding sequences, an accuracy of 75.50% was obtained using m = 28 features, whereas PSSM-400 achieved 74.22%. On an independent test data set of DNA-binding domains, Auto-IDPCPs and PSSM-400 achieved accuracies of 80.73% and 82.81%, respectively. Typical physicochemical properties discovered include hydrophobicity, secondary structure, charge, solvent accessibility, polarity, flexibility, normalized van der Waals volume, and pK (pK-C, pK-N, pK-COOH and pK-a(RCOOH)).

Conclusions: Auto-IDPCPs would help designers investigate informative physicochemical and biochemical properties by considering prediction accuracy and analysis of the binding mechanism simultaneously. The approach may also be applicable to predicting and analyzing other protein functions from sequences.
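
The clustering step named in the Results (grouping amino acid indices with a fuzzy c-means algorithm) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function name `fuzzy_c_means`, the fuzzifier value of 2.0, the stopping tolerance, and the toy vectors are all assumptions made for illustration. In the study each clustered item would be one AAindex entry, a vector of 20 values (one per amino acid), and the number of clusters would be 20; the toy data below uses short made-up 4-value vectors and 2 clusters.

```python
import random


def fuzzy_c_means(points, n_clusters, fuzzifier=2.0, n_iter=100, tol=1e-6, seed=0):
    """Cluster `points` (list of equal-length float lists) with fuzzy c-means.

    Returns (centers, memberships); memberships[i][k] is the degree to which
    point i belongs to cluster k, and each row sums to 1.
    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Start from random memberships, normalized so each row sums to 1.
    memberships = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        memberships.append([u / total for u in row])

    centers = []
    exponent = 2.0 / (fuzzifier - 1.0)
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership^fuzzifier.
        centers = []
        for k in range(n_clusters):
            weights = [memberships[i][k] ** fuzzifier for i in range(n)]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Membership update from Euclidean distances to the centers.
        max_change = 0.0
        for i, p in enumerate(points):
            dists = [
                sum((p[d] - c[d]) ** 2 for d in range(dim)) ** 0.5
                for c in centers
            ]
            if min(dists) == 0.0:
                # Point coincides with a center: full membership there.
                hit = dists.index(0.0)
                new_row = [1.0 if k == hit else 0.0 for k in range(n_clusters)]
            else:
                new_row = [
                    1.0 / sum((dists[k] / dj) ** exponent for dj in dists)
                    for k in range(n_clusters)
                ]
            change = max(abs(a - b) for a, b in zip(new_row, memberships[i]))
            max_change = max(max_change, change)
            memberships[i] = new_row

        if max_change < tol:
            break

    return centers, memberships


# Toy stand-ins for amino acid indices (made-up values, 4 entries each).
toy_indices = [
    [0.0, 0.1, 0.0, 0.2],
    [0.1, 0.0, 0.1, 0.1],
    [0.0, 0.2, 0.1, 0.0],
    [5.0, 5.1, 4.9, 5.0],
    [5.1, 5.0, 5.0, 4.9],
    [4.9, 5.0, 5.1, 5.1],
]
centers, memberships = fuzzy_c_means(toy_indices, n_clusters=2)
# Assign each toy index to the cluster where its membership is highest.
hard_labels = [row.index(max(row)) for row in memberships]
```

Because memberships are graded rather than hard, an index that sits between two groups keeps a share in both, which may be relevant to the problem the Background describes, where differently named properties can have similar vectors.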
