A Fast-Graph Approach to Modeling Similarity of Whole Genomes.
Details
  • Author: Breland, Adrienne E.
  • Degree: Doctoral
  • Year: 2011
  • Advisors: Harris, Frederick C. (advisor); Schlauch, Karen A. (advisor); Nicolescu, Monica (committee member); Gunes, Mehmet (committee member); Cushman, John (committee member)
  • Institution: University of Nevada
  • Department: Computer Science
  • ISBN: 9781124866529
  • CBH: 3472740
  • Country: USA
  • Language: English
  • FileSize: 3888434
  • Pages: 134
Abstract
As increasing numbers of closely related genomic sequences become available, the need to develop methods for detecting fine differences among them also grows apparent. Several calls have been made for improved algorithms to exploit the wealth of pathogenic viral and bacterial sequence data that are rapidly becoming available to researchers. The first stage of our research addresses the computational limitations associated with whole-genome comparisons of large numbers of subspecies sequences. We investigate the potential for the use of fast, word-based comparative measures to approximate computationally expensive, full-alignment comparison methods. Recent advances in next-generation sequencing are providing a number of large whole-genome sequence datasets stemming from globally distributed disease occurrences. This offers an unprecedented opportunity for epidemiological studies and the development of computationally efficient, robust tools for such studies. In the second stage of our research, we present an approach that enables a quick, effective, and robust epidemiological analysis of large whole-genome datasets. We then apply our method to a complex dataset of over 4,200 globally sampled Influenza A virus isolates from multiple host types, subtypes, and years. These sequences are compared using an alignment-free method that runs in linear time. These comparisons enable us to build 2-dimensional graphs that represent the relationships between sequences, where sequences are viewed as vertices and high-degree sequence similarity as edges. These graphs prove useful, as they are able to model potential disease transmission paths when applied to viral sequences. Mixing patterns are then used to study the occurrence and patterns of edges between different types of sequence groups, such as the host type and year of collection, to better understand the potential of genotypic transfer between sequence groups.
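
The pipeline outlined in the abstract (word-based comparison, a similarity graph, then mixing patterns over sequence groups) can be illustrated with a minimal Python sketch. The abstract does not name the specific word-based measure or the edge criterion, so everything concrete here is an assumption made for illustration: cosine similarity over k-mer counts, the word length `k=3`, the cutoff `threshold=0.9`, the function names, and the toy sequences and host labels. It should not be read as the dissertation's actual method.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kmer_counts(seq, k=3):
    """Count overlapping words of length k in a single pass over the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors (assumed measure)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_graph(seqs, k=3, threshold=0.9):
    """Return edges (u, v) for every pair whose similarity meets the cutoff.

    Vertices are sequence names; the threshold value is hypothetical.
    """
    profiles = {name: kmer_counts(s, k) for name, s in seqs.items()}
    edges = set()
    for u, v in combinations(sorted(profiles), 2):
        if cosine_similarity(profiles[u], profiles[v]) >= threshold:
            edges.add((u, v))
    return edges


def mixing_counts(edges, group_of):
    """Count edges by the unordered pair of groups their endpoints belong to."""
    counts = Counter()
    for u, v in edges:
        counts[tuple(sorted((group_of[u], group_of[v])))] += 1
    return counts


# Toy data, invented for illustration only (not real isolates).
seqs = {
    "a": "ACGTACGTACGT",
    "b": "ACGTACGTACGA",
    "c": "TTTTGGGGCCCC",
    "d": "TTTTGGGGCCCA",
}
hosts = {"a": "avian", "b": "swine", "c": "human", "d": "human"}

edges = similarity_graph(seqs)
print(sorted(edges))
print(mixing_counts(edges, hosts))
```

In this sketch each pairwise comparison is linear in the lengths of the two sequences, since the word counts are built in one pass, while the number of comparisons still grows quadratically with the number of sequences. Edges whose endpoints fall in different groups (here, different host labels) are the ones a mixing-pattern analysis would examine as candidates for transfer between groups.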
