Machine-learning-aided precise prediction of deletions with next-generation sequencing
Abstract
When detecting deletions in complex human genomes, split-read approaches that use short reads from next-generation sequencing still face a dilemma: either the false discovery rate is high or the sensitivity is low. To address this problem, an integrated strategy is proposed. It combines the underlying principles of the three mainstream methods (read-pair approaches, split-read approaches and read-depth analysis) with modern machine learning algorithms, using feature extraction as the bridge between them. Compared with state-of-the-art split-read methods for deletion detection at both low and high sequence coverage, the machine-learning-aided strategy balances sensitivity against false discovery rate automatically and yields a call set that is both more sensitive and more precise, at single-base-pair resolution. Users therefore no longer need to rely on prior experience to fix a trade-off in advance, nor to adjust parameters repeatedly. More broadly, modern machine learning models can play an important role in structural variation prediction.
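The idea of using feature extraction as a bridge can be illustrated with a small sketch. The abstract does not say which features or which learning algorithm the authors use, so everything below is an assumption made for illustration: the `Candidate` fields, the three features (one per evidence type), the toy training data, and the choice of a plain logistic regression as a stand-in for "modern machine learning algorithms".

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical raw evidence for one candidate deletion."""
    discordant_pairs: int  # read-pair signal: pairs with an unusually large insert size
    split_reads: int       # split-read signal: reads whose alignment breaks at the breakpoint
    depth_inside: float    # mean read depth inside the candidate interval
    depth_flank: float     # mean read depth in the flanking regions


def extract_features(c):
    """Map the three evidence types to one numeric feature vector."""
    depth_ratio = c.depth_inside / c.depth_flank if c.depth_flank > 0 else 1.0
    return [
        math.log1p(c.discordant_pairs),  # read-pair feature
        math.log1p(c.split_reads),       # split-read feature
        1.0 - depth_ratio,               # read-depth feature: relative depth drop
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that a feature vector belongs to a true deletion."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit a logistic regression by batch gradient descent (stand-in model)."""
    weights, bias = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy labelled candidates (invented for illustration): 1 = true deletion, 0 = false call.
training = [
    (Candidate(8, 6, 5.0, 30.0), 1),
    (Candidate(5, 4, 14.0, 28.0), 1),
    (Candidate(10, 9, 2.0, 31.0), 1),
    (Candidate(0, 1, 29.0, 30.0), 0),
    (Candidate(1, 0, 27.0, 28.0), 0),
    (Candidate(0, 0, 33.0, 30.0), 0),
]
X = [extract_features(c) for c, _ in training]
y = [label for _, label in training]
weights, bias = train_logistic(X, y)

# Score an unseen candidate; the decision threshold is learned, not hand-tuned.
score = predict_proba(weights, bias, extract_features(Candidate(7, 5, 6.0, 30.0)))
```

In this sketch the classifier, rather than a manually chosen cutoff on any single evidence type, decides how the signals are weighed against each other, which is the sense in which the trade-off between sensitivity and false discovery rate no longer has to be set beforehand.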
