摘要
选择具有最低频率的最优种子是一个复杂的计算问题,往往需要很长时间.提出了一种read的基于频率的合并种子选择算法(FMSS),该算法能够高效地选择接近最优的种子集合,可用于改善现有映射工具的性能.实验对比了平均种子选择方法和当前最优的种子选择策略(OSS,optimal seed solver),结果显示FMSS算法能够用很少的时间代价给出接近OSS的最优种子集合,这表明FMSS算法可集成到现有映射工具中用于处理更大规模的read mapping问题.
The selection of the optimal seed( that is,the seed with the lowest frequency) is a complex calculation problem,which often takes a long time. A frequency-based merge seed selection( FMSS) algorithm is proposed,which can efficiently select the suboptimal set of seeds and improve the performance of existing mapping tools. In the experiment,FMSS was compared with the average seed selection method and the optimal seed solver( OSS). Experimental results show that FMSS can select the optimal set of seeds close to OSS,and the time cost of FMSS is far lower than that of the OSS algorithm. The FMSS algorithm is more suitable for seed selection in terms of time cost and seed selection quality.
引文
[1] Paolo F,Giovanni M,Veli M,et al. Compressed representations of sequences and full-text indexes[J]. ACM Transactions on Algorithms,2007,3(2):1-25.
[2] Xin H Y,Nahar S,Zhu R,et al. Optimal seed solver:optimizing seed selection in read mapping[J]. Bioinformatics,2016,32:1632-1642.
[3] Flicek P,Birney E. Sense from sequence reads:methods for alignment and assembly[J]. Nature Methods,2009,6(1):6-12.
[4] Bin M,John T,Ming L. Patternhunter:faster and more ensitive homology search[J]. Bioinformatics,2002,18:440-445.
[5] Xin H Y,Lee D,Hormozdiari F,et al. Accelerating read mapping with Fast HASH[J]. BMC Genomics,2013,14(1):1-13.
[6] Santiago M,Michael S,Roderic G,et al. The GEM mapper:fast,accurate and versatile alignment by filtration[J]. Nature Methods,2012,9(12):1185-1187.
[7] Suzuki S,Kakuta M,Ishida T,et al. Faster sequence homology searches by clustering subsequences[J]. Bioinformatics,2015,31:1183-1190.
[8] Hao W,Jeffrey X,Can L. String similarity search:a Hashbased approach[J]. Transactions on Knowledge&Data Engineering,2018,99:170-184.
[9] Heng L,Richard D. Fast and accurate read alignment with burrows-wheeler transform[J]. Bioinformatics,2009,25:1754-1760.
[10] Ahmadi A,Behm A,Honnalli N,et al. Hobbes:optimized gram-based methods for efficient read alignment[J]. Nucleic Acids Research,2012,40(6):41.