SearchTree: Mining robust phylogenetic trees.
Details
  • Author: Deepak, Akshay
  • Degree: M.S.
  • Year: 2010
  • Advisor: Fernandez-Baca, David (advisor); Eulenstein, Oliver (committee member); Huang, Xiaoqiu (committee member)
  • Institution: Iowa State University
  • Department: Computer Science
  • ISBN: 9781109776171
  • CBH: 1476290
  • Country: USA
  • Language: English
  • FileSize: 875626
  • Pages: 58
Abstract
Phylogenetic trees that provide high-quality information while covering a large number of species are essential for comparative biology. It is widely accepted that, with currently available resources, we are far from assembling one completely sampled phylogenetic tree for all life (or one based on a very large subset of species); hence the need for an interim solution. Here we describe SearchTree, a software tool that lets users query efficiently on an arbitrary user taxon list and returns high-scoring matches from approximately one billion phylogenetic trees being constructed from molecular sequence data in GenBank. The core of SearchTree has two parts. The first is a pre-computed collection of phylogenetic species trees from GenBank sequence data, consisting of approximately 10,000,000 data sets with 100 bootstrap trees each, for a total of around 1 billion trees. The goal here is to ensure high coverage (i.e., each taxon occurring in many trees). The second part is the search-retrieval process. Its goal is to quickly retrieve the clusters and the subsequent trees from the large data set described above, maximizing the scoring function for the resulting set of trees while keeping computational resources within a limit. The two parts were handled separately because of their complexity; here we focus on the second part. The complete pre-computed data set of phylogenetic trees will be around 500 GB. SearchTree achieves fast response times through a combination of techniques from information retrieval, notably inverted indexing, and from computational phylogenetics, especially for constructing consensus trees. The use of the Redwood cluster, an advanced hardware configuration specifically tuned for this kind of work, has further improved the query times by 100
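
The inverted-indexing idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not SearchTree's implementation: the cluster IDs, the toy taxon sets, the function names `build_inverted_index` and `search`, and the overlap-count score are all assumptions made for illustration, since the abstract does not specify the actual scoring function or data layout.

```python
from collections import defaultdict


def build_inverted_index(clusters):
    """Map each taxon name to the set of cluster IDs whose data set contains it."""
    index = defaultdict(set)
    for cluster_id, taxa in clusters.items():
        for taxon in taxa:
            index[taxon].add(cluster_id)
    return index


def search(index, query_taxa, top_k=3):
    """Rank clusters by the number of query taxa they contain.

    The overlap count is a hypothetical stand-in for a real scoring function.
    Ties are broken by cluster ID so the ordering is deterministic.
    """
    overlap = defaultdict(int)
    for taxon in set(query_taxa):
        # Only clusters listed under a query taxon are ever touched.
        for cluster_id in index.get(taxon, ()):
            overlap[cluster_id] += 1
    ranked = sorted(overlap.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Toy data: hypothetical clusters, each standing for one pre-computed data set.
clusters = {
    "c1": {"Homo sapiens", "Pan troglodytes", "Gorilla gorilla"},
    "c2": {"Homo sapiens", "Mus musculus"},
    "c3": {"Danio rerio", "Mus musculus"},
}
index = build_inverted_index(clusters)
results = search(index, ["Homo sapiens", "Pan troglodytes", "Mus musculus"])
print(results)
```

The point of the structure is that a query only visits the posting lists of the taxa it names, rather than scanning every cluster, which is what makes lookups over a very large collection feasible.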
