Protein remote homology detection by combining Chou’s distance-pair pseudo amino acid composition and principal component analysis
  • Authors: Bin Liu; Junjie Chen; Xiaolong Wang
  • Keywords: Protein remote homology; Pseudo amino acid composition; Support vector machine; Principal component analysis
  • Journal: Molecular Genetics and Genomics
  • Publication year: 2015
  • Publication date: October 2015
  • Volume: 290
  • Issue: 5
  • Pages: 1919-1931
  • Full-text size: 959 KB
  • Author affiliations: Bin Liu (1) (2) (3)
    Junjie Chen (1)
    Xiaolong Wang (1) (2)

    1. School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, 518055, Guangdong, People’s Republic of China
    2. Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, 518055, Guangdong, People’s Republic of China
    3. Gordon Life Science Institute, Belmont, MA, 02478, USA
  • Journal category: Biomedical and Life Sciences
  • Journal subjects: Life Sciences
    Cell Biology
    Biochemistry
    Microbial Genetics and Genomics
  • Publisher: Springer Berlin / Heidelberg
  • ISSN: 1617-4623
Abstract
Protein remote homology detection is an important task in computational proteomics, with value for both basic research and practical applications. SVM-based discriminative methods currently show superior performance. However, existing feature vectors still cannot suitably represent protein sequences, and they often lack an interpretable model for analyzing characteristic features. Previous studies showed that sequence-order effects and physicochemical properties are important for representing protein sequences, but how to use this information to construct predictors remains a challenging problem. To incorporate sequence-order information and physicochemical properties into the prediction, this study proposes a method called disPseAAC, in which the feature vector is constructed by combining the occurrences of amino acid pairs within Chou’s pseudo amino acid composition (PseAAC) approach. Predictive performance and computational cost are further improved by applying principal component analysis. Various experiments are conducted on a benchmark dataset. The results show that disPseAAC achieves an ROC score of 0.922, outperforming some existing state-of-the-art methods. Furthermore, the learnt model can easily be analyzed in terms of discriminative features, and the computational cost of the proposed method is much lower than that of other profile-based methods.
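The abstract does not give the exact disPseAAC formulation, so the following is only a minimal sketch of the general idea it describes: counting amino acid pairs separated by a fixed distance along the sequence, then reducing the resulting vectors with principal component analysis. The function names, the `max_distance` parameter, and the normalization are assumptions made for illustration; the physicochemical terms and the SVM classifier mentioned in the abstract are omitted.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def distance_pair_features(seq, max_distance=2):
    """Hypothetical helper: amino acid composition followed by normalized
    counts of residue pairs separated by d = 1..max_distance positions.

    Illustrative only; this is not the published disPseAAC definition.
    """
    n = len(AMINO_ACIDS)

    # Block 0: plain amino acid composition (20 values).
    composition = np.zeros(n)
    for aa in seq:
        if aa in INDEX:
            composition[INDEX[aa]] += 1
    blocks = [composition / max(composition.sum(), 1)]

    # Blocks 1..max_distance: 20 x 20 pair frequencies at each distance.
    for d in range(1, max_distance + 1):
        counts = np.zeros((n, n))
        for a, b in zip(seq, seq[d:]):
            if a in INDEX and b in INDEX:
                counts[INDEX[a], INDEX[b]] += 1
        blocks.append(counts.ravel() / max(counts.sum(), 1))

    return np.concatenate(blocks)


def pca_project(X, n_components):
    """Project the rows of X onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


# Toy sequences, made up for this example.
sequences = ["ACDEFGHIKL", "ACACACACAC", "MNPQRSTVWY", "LKIHGFEDCA"]
X = np.vstack([distance_pair_features(s) for s in sequences])
X_reduced = pca_project(X, n_components=2)
```

With `max_distance=2` each sequence maps to a vector of length 20 + 2 × 400 = 820, which `pca_project` reduces to two components here. In a full pipeline the reduced vectors would then be passed to a classifier such as an SVM.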