Domain enhanced lookup time accelerated BLAST
  • Authors: Grzegorz M Boratyn (1)
    Alejandro A Schäffer (1)
    Richa Agarwala (1)
    Stephen F Altschul (1)
    David J Lipman (1)
    Thomas L Madden (1)
  • Journal: Biology Direct
  • Publication year: 2012
  • Publication date: December 2012
  • Volume: 7
  • Issue: 1
  • Full-text size: 322KB
  • Author affiliations:
    1. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD, 20894, USA
Abstract

Background: BLAST is a commonly used software package for comparing a query sequence to a database of known sequences; in this study, we focus on protein sequences. Position-specific-iterated BLAST (PSI-BLAST) iteratively searches a protein sequence database, using the matches in round i to construct a position-specific score matrix (PSSM) for searching the database in round i + 1. Biegert and Söding developed Context-sensitive BLAST (CS-BLAST), which combines information from searching the sequence database with information derived from a library of short protein profiles to achieve better homology detection than PSI-BLAST, which builds its PSSMs from scratch.

Results: We describe a new method, called domain enhanced lookup time accelerated BLAST (DELTA-BLAST), which searches a database of pre-constructed PSSMs before searching a protein-sequence database, to yield better homology detection. For its PSSMs, DELTA-BLAST employs a subset of NCBI’s Conserved Domain Database (CDD). On a test set derived from ASTRAL, with one round of searching, DELTA-BLAST achieves a ROC5000 of 0.270 vs. 0.116 for CS-BLAST. The performance advantage diminishes in iterated searches, but DELTA-BLAST continues to achieve better ROC scores than CS-BLAST.

Conclusions: DELTA-BLAST is a useful program for the detection of remote protein homologs. It is available under the “Protein BLAST” link at http://blast.ncbi.nlm.nih.gov.

Reviewers: This article was reviewed by Arcady Mushegian, Nick V. Grishin, and Frank Eisenhaber.
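The ROC5000 figures quoted in the Results are ROC_n scores with n = 5000. As a rough illustration only, the sketch below computes a ROC_n score from a ranked list of hits, using the commonly cited definition ROC_n = (1 / (n·T)) · Σ t_i, where t_i is the number of true positives ranked above the i-th false positive and T is the total number of true positives. The function name `roc_n`, its parameters, and the padding rule for lists with fewer than n false positives are assumptions made for this example; the abstract does not say how the authors pooled or truncated their hit lists.

```python
from typing import Sequence


def roc_n(labels: Sequence[bool], n: int, total_true: int) -> float:
    """Compute a ROC_n score for a ranked hit list (illustrative sketch).

    labels     -- hits ordered best-first; True marks a true positive,
                  False marks a false positive.
    n          -- number of false positives at which to truncate.
    total_true -- total number of true positives that could be found (T).
    """
    if n <= 0 or total_true <= 0:
        raise ValueError("n and total_true must be positive")

    true_seen = 0   # true positives encountered so far
    false_seen = 0  # false positives encountered so far
    area = 0        # running sum of t_i

    for is_true in labels:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen  # t_i for this false positive
            if false_seen == n:
                break

    # Assumption: if the list holds fewer than n false positives, the
    # missing ones are treated as ranked below every hit in the list.
    area += (n - false_seen) * true_seen

    return area / (n * total_true)


if __name__ == "__main__":
    ranked = [True, True, False, True, False]
    print(roc_n(ranked, n=2, total_true=4))
```

A score of 1.0 means every true positive outranks the first n false positives; lower values mean false positives appear earlier in the ranking.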
