Haplotyping a Diploid Single Individual with a Fast and Accurate Enumeration Algorithm
详细信息    查看全文
  • 关键词:Sequence analysis ; Haplotype ; Minimum error correction (MEC) ; Algorithm ; Enumeration
  • 刊名:Lecture Notes in Computer Science
  • 出版年:2016
  • 出版时间:2016
  • 年:2016
  • 卷:9771
  • 期:1
  • 页码:399-411
  • 全文大小:465 KB
  • 参考文献:1.Bafna, V., Istrail, S., Lancia, G., Rizzi, R.: Polynomial and APX-hard cases of the individual haplotyping problem. Theoret. Comput. Sci. 335, 109–125 (2005)MathSciNet CrossRef MATH
    2.Geraci, F.: A comparison of several algorithms for the single individual SNP haplotyping reconstruction problem. Bioinformatics 26(18), 2217–2225 (2010)CrossRef
    3.Stephens, J.C., Schneider, J.A., Tanguay, D.A., Choi, J., Acharya, T., Stanley, S.E., Jiang, R., Messer, C.J., Chew, A., Han, J.H., Duan, J., Carr, J.L., Lee, M.S., Koshy, B., Kumar, A.M., Zhang, G., Newell, W.R., Windemuth, A., Xu, C., Kalbfleisch, T.S., Shaner, S.L., Arnold, K., Schulz, V., Drysdale, C.M., Nandabalan, K., Judson, R.S., Ruano, G., Vovis, G.F.: Haplotype variation and linkage disequilibrium in 313 human genes. Science 293, 489–493 (2001)CrossRef
    4.Wu, J.L., Liang, B.B.: A fast and accurate algorithm for diploid individual haplotype reconstruction. J. Bioinform. Comput. Biol. 11(4), 1350010 (2013)MathSciNet CrossRef
    5.Clark, A.G.: Inference of haplotypes from PCR-amplified samples of diploid populations. Mol. Biol. Evol. 7(2), 111–122 (1990)
    6.Gusfield, D.: Inference of haplotypes from samples of diploid populations: complexity and algorithms. J. Comput. Biol. 8(3), 305–324 (2001)MathSciNet CrossRef
    7.O’Neil, S.T., Emrich, S.J.: Haplotype and minimum-chimerism consensus determination using short sequence data. BMC Genom. 13(Suppl. 2), S4 (2012)CrossRef
    8.Lancia, G., Bafna, V., Istrail, S., Lippert, R., Schwartz, R.: SNPs problems, complexity, and algorithms. In: Meyer auf der Heide, F. (ed.) ESA 2001. LNCS, vol. 2161, pp. 182–193. Springer, Heidelberg (2001)CrossRef
    9.Lippert, R., Schwartza, R., Lancia, G., Istrail, S.: Algorithmic strategies for the SNPs haplotype assembly problem. Brief. Bioinform. 3(1), 23–31 (2002)CrossRef
    10.Xie, M.Z., Chen, J.E., Wang, J.X.: Research on parameterized algorithms of the individual haplotyping problem. J. Bioinform. Comput. Biol. 5(3), 795–816 (2007)MathSciNet CrossRef
    11.Xie, M.Z., Wang, J.X.: An improved (and practical) parameterized algorithm for the individual haplotyping problem MFR with mate-pairs. Algorithmica 52, 250–266 (2008)MathSciNet CrossRef MATH
    12.Cilibrasi, R., Iersel, L.V., Kelk, S., Tromp, J.: The complexity of the single individual SNP haplotyping problem. Algorithmica 49(1), 13–36 (2007)MathSciNet CrossRef MATH
    13.Wang, R.S., Wu, L.Y., Li, Z.P., Zhang, X.S.: Haplotype reconstruction from SNP fragments by minimum error correction. Bioinformatics 21(10), 2456–2462 (2005)CrossRef
    14.He, D., Choi, A., Pipatsrisawat, K., Darwiche, A., Eskin, E.: Optimal algorithms for haplotype assembly from whole-genome sequence data. Bioinformatics 26(12), i183 (2010)CrossRef
    15.Panconesi, A., Sozio, M.: Fast Hare: a fast heuristic for single individual SNP haplotype reconstruction. In: Jonassen, I., Kim, J. (eds.) WABI 2004. LNCS (LNBI), vol. 3240, pp. 266–277. Springer, Heidelberg (2004)CrossRef
    16.Wang, Y., Wang, E., Wang, R.S.: A clustering algorithm based on two distance functions for MEC model. Comput. Biol. Chem. 31(2), 148–150 (2007)CrossRef MATH
    17.Genovese, L.M., Geraci, F., Pellegrini, M.: SpeedHap: an accurate heuristic for the single individual SNP haplotyping problem with many gaps, high reading error rate and low coverage. IEEE/ACM Trans. Comput. Biol. Bioinform. 5(4), 492–502 (2008)CrossRef
    18.Levy, S., Sutton, G., Ng, P.C., Feuk, L., Halpern, A.L., Walenz, B.P., Axelrod, N., Huang, J., Kirkness, E.F., Denisov, G., Lin, Y., MacDonald, J.R., Pang, A.W., Shago, M., Stockwell, T.B., Tsiamouri, A., Bafna, V., Bansal, V., Kravitz, S.A., Busam, D.A., Beeson, K.Y., McIntosh, T.C., Remington, K.A., Abril, J.F., Gill, J., Borman, J., Rogers, Y.H., Frazier, M.E., Scherer, S.W., Strausberg, R.L., Venter, J.C.: The diploid genome sequence of an individual human. PLoS Biol. 5(10), 2113–2144 (2007)CrossRef
    19.Bansal, V., Bafna, V.: HapCUT: an efficient and accurate algorithm for the haplotype assembly problem. Bioinformatics 24(16), i153–i159 (2008)CrossRef
    20.Chen, Z., Fu, B., Schweller, R., Yang, B., Zhao, Z., Zhu, B.: Linear time probabilistic algorithms for the singular haplotype reconstruction problem from SNP fragments. J. Comput. Biol. 15(5), 535–546 (2008)MathSciNet CrossRef
    21.Aguiar, D., Istrail, S.: Haplotype assembly in polyploidy genomes and identical by descent shared tracts. Bioinformatics 29(13), i352–i360 (2013)CrossRef
    22.Mazrouee, S., Wang, W.: FastHap: fast and accurate single individual haplotype reconstruction using fuzzy conflict graphs. Bioinformatics 30(17), i371–i378 (2014)CrossRef
    23.Myers, G.: A dataset generator for whole genome shotgun sequencing. In: Lengauer, T., Schneider, R., Bork, P., et al. (eds.) ISMB 1999, pp. 202–210. AAAI Press, California (1999)
    24.Richter, D.C., Ott, F., Auch, A.F., Schmid, R., Huson, D.H.: MetaSim—a sequencing simulator for genomics and metagenomics. PLoS ONE 3(10), e3373 (2008)CrossRef
  • 作者单位:Xixi Chen (17)
    Jingli Wu (16) (17)
    Longyu Li (17)

    17. College of Computer Science and Information Technology, Guangxi Normal University, Guilin, 541004, China
    16. Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, 541004, China
  • 丛书名:Intelligent Computing Theories and Application
  • ISBN:978-3-319-42291-6
  • 刊物类别:Computer Science
  • 刊物主题:Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • 出版者:Springer Berlin / Heidelberg
  • ISSN:1611-3349
  • 卷排序:9771
文摘
The minimum error correction (MEC) model is one of the important computational models for determining haplotype information from sequencing data, i.e., single individual single nucleotide polymorphism (SNP) haplotyping, haplotype reconstruction or haplotype assembly. Due to the NP-hardness of the model, a fast and accurate enumeration algorithm is proposed for solving it. The presented algorithm reconstructs the SNP sites of a pair of haplotypes one after another. It enumerates two kinds of SNP values, i.e., (0 1)T and (1 0)T, for the SNP site being reconstructed, and chooses the one with more support coming from the SNP fragments that are covering the corresponding SNP site. The experimental comparisons were conducted among the presented algorithm, the FAHR, the Fast Hare and the DGS algorithms. The results prove that our algorithm can get higher reconstruction rate than the other three algorithms.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700