基于频率的Read Mapping种子选择算法
详细信息    查看全文 | 推荐本文 |
  • 英文篇名:Frequency-Based Seed Selection Algorithm for Read Mapping
  • 作者:马海涛 ; 祁实 ; 于长永 ; 赵宇海
  • 英文作者:MA Hai-tao;QI Shi;YU Chang-yong;ZHAO Yu-hai;School of Computer Science & Engineering,Northeastern University;
  • 关键词:种子频率 ; 读取映射 ; 平均种子 ; 频率合并 ; 最优种子
  • 英文关键词:seed frequency;;read-mapping;;average seed;;frequency merge;;optimal seed
  • 中文刊名:DBDX
  • 英文刊名:Journal of Northeastern University(Natural Science)
  • 机构:东北大学计算机科学与工程学院;
  • 出版日期:2019-05-15
  • 出版单位:东北大学学报(自然科学版)
  • 年:2019
  • 期:v.40;No.344
  • 基金:国家自然科学基金资助项目(61772124,61332014,61401080,61402087);; 河北省自然科学基金资助项目(F2015501049);; 河北省教育厅项目(QN2014339);; 中央高校基本科研业务费专项资金资助项目(N150402002)
  • 语种:中文;
  • 页:DBDX201905001
  • 页数:5
  • CN:05
  • ISSN:21-1344/T
  • 分类号:4-8
摘要
选择具有最低频率的最优种子是一个复杂的计算问题,往往需要很长时间.提出了一种read的基于频率的合并种子选择算法(FMSS),该算法能够高效地选择接近最优的种子集合,可用于改善现有映射工具的性能.实验对比了平均种子选择方法和当前最优的种子选择策略(OSS,optimal seed solver),结果显示FMSS算法能够用很少的时间代价给出接近OSS的最优种子集合,这表明FMSS算法可集成到现有映射工具中用于处理更大规模的read mapping问题.
        The selection of the optimal seed( that is,the seed with the lowest frequency) is a complex calculation problem,which often takes a long time. A frequency-based merge seed selection( FMSS) algorithm is proposed,which can efficiently select the suboptimal set of seeds and improve the performance of existing mapping tools. In the experiment,FMSS was compared with the average seed selection method and the optimal seed solver( OSS). Experimental results show that FMSS can select the optimal set of seeds close to OSS,and the time cost of FMSS is far lower than that of the OSS algorithm. The FMSS algorithm is more suitable for seed selection in terms of time cost and seed selection quality.
引文
[1] Paolo F,Giovanni M,Veli M,et al. Compressed representations of sequences and full-text indexes[J]. ACM Transactions on Algorithms,2007,3(2):1-25.
    [2] Xin H Y,Nahar S,Zhu R,et al. Optimal seed solver:optimizing seed selection in read mapping[J]. Bioinformatics,2016,32:1632-1642.
    [3] Flicek P,Birney E. Sense from sequence reads:methods for alignment and assembly[J]. Nature Methods,2009,6(1):6-12.
    [4] Bin M,John T,Ming L. Patternhunter:faster and more ensitive homology search[J]. Bioinformatics,2002,18:440-445.
    [5] Xin H Y,Lee D,Hormozdiari F,et al. Accelerating read mapping with Fast HASH[J]. BMC Genomics,2013,14(1):1-13.
    [6] Santiago M,Michael S,Roderic G,et al. The GEM mapper:fast,accurate and versatile alignment by filtration[J]. Nature Methods,2012,9(12):1185-1187.
    [7] Suzuki S,Kakuta M,Ishida T,et al. Faster sequence homology searches by clustering subsequences[J]. Bioinformatics,2015,31:1183-1190.
    [8] Hao W,Jeffrey X,Can L. String similarity search:a Hashbased approach[J]. Transactions on Knowledge&Data Engineering,2018,99:170-184.
    [9] Heng L,Richard D. Fast and accurate read alignment with burrows-wheeler transform[J]. Bioinformatics,2009,25:1754-1760.
    [10] Ahmadi A,Behm A,Honnalli N,et al. Hobbes:optimized gram-based methods for efficient read alignment[J]. Nucleic Acids Research,2012,40(6):41.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700