New finite-size correction for local alignment score distributions
  • Authors: Yonil Park (1)
    Sergey Sheetlin (1)
    Ning Ma (1)
    Thomas L Madden (1)
    John L Spouge (1)
  • Journal: BMC Research Notes
  • Publication year: 2012
  • Publication date: December 2012
  • Volume: 5
  • Issue: 1
  • Author affiliation:
    1. National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
Abstract
Background: Local alignment programs often calculate the probability that a match occurred by chance. This calculation may require a “finite-size correction” to the sequence lengths, because an alignment that starts near the end of either sequence may run out of sequence before it achieves a significant score.

Findings: We present an improved finite-size correction that considers the distribution of sequence lengths rather than simply their means. This approach improves sensitivity and avoids substituting an ad hoc length for short sequences, a substitution that can underestimate the significance of a match. Using a test set derived from ASTRAL, we show improved ROC scores, especially for shorter sequences.

Conclusions: The new finite-size correction improves the calculation of probabilities for local alignments. It is now used in the BLAST+ package and at the NCBI BLAST web site (http://blast.ncbi.nlm.nih.gov).
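To make the idea concrete, here is a minimal sketch contrasting a mean-only length correction with one that averages over a distribution of correction lengths. It uses the standard Karlin-Altschul E-value E = K m' n' exp(-λS) and P = 1 - exp(-E). Everything else is an assumption for illustration, not the paper's exact formula: the correction length L is taken to be Gaussian with hypothetical mean `mu` and standard deviation `sigma` (treated as fixed inputs rather than score-dependent), the two sequences are corrected independently, and the "ad hoc length" for short sequences is modelled as a hypothetical floor of 1/K. The function names are likewise hypothetical.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_positive_part(m: float, mu: float, sigma: float) -> float:
    """E[max(m - L, 0)] for L ~ Normal(mu, sigma^2) (assumed Gaussian).

    With sigma == 0 this reduces to the mean-only value max(m - mu, 0).
    """
    if sigma <= 0.0:
        return max(m - mu, 0.0)
    z = (m - mu) / sigma
    value = (m - mu) * normal_cdf(z) + sigma * normal_pdf(z)
    return max(value, 0.0)  # guard against tiny negative rounding error


def evalue_mean_correction(score, m, n, lam, K, ell):
    """Karlin-Altschul E-value K*m'*n'*exp(-lam*score) with a fixed
    correction ell subtracted from each length; lengths that become too
    short are replaced by a hypothetical floor of 1/K."""
    m_eff = max(m - ell, 1.0 / K)
    n_eff = max(n - ell, 1.0 / K)
    return K * m_eff * n_eff * math.exp(-lam * score)


def evalue_distribution_correction(score, m, n, lam, K, mu, sigma):
    """Same E-value, but each effective length is the expected positive
    part of (length - L), assuming independent Gaussian corrections L."""
    m_eff = expected_positive_part(m, mu, sigma)
    n_eff = expected_positive_part(n, mu, sigma)
    return K * m_eff * n_eff * math.exp(-lam * score)


def pvalue(evalue: float) -> float:
    """P-value 1 - exp(-E), computed stably for small E."""
    return -math.expm1(-evalue)


if __name__ == "__main__":
    # Illustrative parameters only: lambda and K as commonly quoted for
    # BLOSUM62 with gap costs 11/1; mu and sigma are made up.
    lam, K = 0.267, 0.041
    for m in (30.0, 300.0):
        old = evalue_mean_correction(40.0, m, 300.0, lam, K, ell=40.0)
        new = evalue_distribution_correction(
            40.0, m, 300.0, lam, K, mu=40.0, sigma=15.0)
        print(f"m={m:.0f}: mean-based E={old:.3g}, "
              f"distribution-based E={new:.3g}, P={pvalue(new):.3g}")
```

With these made-up numbers, the short query (m = 30) has a mean-corrected length below zero, so the mean-based version falls back to the 1/K floor, while the distribution-based length stays small; its E-value comes out roughly an order of magnitude smaller, i.e. the match looks more significant. For m = 300 the two corrections nearly agree, since the Gaussian tail contributes almost nothing when the sequence is much longer than the typical correction.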
