Large scale data management for the sciences.
Details
  • Author:Malik, Tanu
  • Degree:Doctorate
  • Year:2008
  • Advisor:Burns, Randal C.
  • Institution:The Johns Hopkins University
  • Major:Computer Science
  • ISBN:9780549312109
  • CBH:3288499
  • Country:USA
  • Language:English
  • FileSize:5510929
  • Pages:150
Abstract
Traditional enterprises and novel scientific applications are accumulating petabyte-scale datasets, which makes the need for large-scale data management more pressing than ever. The geographic distribution of these datasets, combined with complex demands on the data, makes large-scale data management challenging. This is especially true for sciences that model complex physical and biological phenomena using data from multiple sources.

This dissertation addresses two critical problems in the management of scientific datasets: combining a large number of diverse data sources to execute scientific queries, and executing data-intensive scientific queries efficiently in terms of both network and I/O. As a first step towards scientific data management, this thesis describes the design and specification of SkyQuery, a system that seamlessly federates data from several petabyte-size, autonomous, and heterogeneous astronomy databases scattered worldwide. Using SkyQuery, scientists can write declarative queries that compare and merge multiple astronomical datasets. For efficient query execution and scalability, we propose Bypass-Yield Caching, a novel caching framework for database systems that dramatically reduces the network bandwidth requirements of data-intensive federations such as SkyQuery, making them good network citizens. Our description of the bypass-yield cache includes novel cache evaluation metrics and several innovative algorithms. Distributed applications such as the bypass-yield cache often rely on a priori knowledge of query cardinalities to make cache optimization decisions. In this context, we present a black-box approach to cardinality estimation that is suitable for distributed applications.

All our techniques are general in that they can be adapted to other scientific domains, such as the life and earth sciences, where similar data management problems abound. The success of SkyQuery and its adoption by the National Virtual Observatory (NVO) is an example of data management systems enabling scientific endeavors.
      
