Sequence and structure similarity search in biological and XML databases.

Details

Author: Aghili, S. Alireza
Degree: Doctoral
Year: 2005
Advisor: Agrawal, Divyakant
Institution: University of California
Major: Computer Science
ISBN: 0542476614
CBH: 3202701
Country: USA
Language: English
FileSize: 1041705
Pages: 170

Abstract

The unprecedented growth of the Internet and of biological databases has introduced challenging and complex data formats, and has thereby furnished unique collaborative venues for scientists of various disciplines. Such complex databases include (1) XML (eXtensible Markup Language) databases, (2) DNA and protein sequence and structure databases, (3) microarray gene expression data, (4) biomedical images, and (5) sensor data stream and time series databases. Given a source query pattern and a target database, a similarity search (range query or top-k) seeks to identify those records of the database that match the given query. The problem of similarity search in biological and textual databases has received substantial attention in the past decade. Numerous filtration and indexing techniques have been proposed to address scalability issues and mitigate the curse of dimensionality. However, complex applications demand special customization based on the inherent and underlying dynamics of the data. In this work, we study the integration of various transformation and shape summarization techniques on biological sequence and protein structure data, as well as path encoding in tree-structured XML data, for more efficient similarity search query processing.
