Spatial approximate string search.
详细信息   
  • 作者:Yao ; Bin.
  • 学历:Doctor
  • 年:2011
  • 导师:Li, Feifei,eadvisor
  • 毕业院校:The Florida State University
  • ISBN:9781124983448
  • CBH:3483669
  • Country:USA
  • 语种:English
  • FileSize:1486990
  • Pages:124
文摘
Many applications are featured with both text and location information, which leads to a novel type of search: spatial approximate string search Sas). The Sas is gaining attention from the database community only recently. A large variety of problems remain open. Spatial and text data have been independently studied for decades. Spatial data have many unique features that are drastically different from text data. As a result, most of the existing techniques for string processing are either inapplicable or inefficient when adapted to spatial databases. We have investigated four issues in the general area of Sas. They are: i) Spatial approximate string search in Euclidean space Esas); ii) Spatial approximate string search on road networks Rsas); iii) Selectivity Estimation for Esas Range Queries; iv) Multi-Approximate-Keyword Routing query on road networks. For efficiently answering spatial approximate string queries in Euclidean space, we propose a novel index structure, MhR-tree, which is based on the R-tree augmented with the min-wise signature and the linear hashing technique. The min-wise signature for an index node u keeps a concise representation of the union of q-grams from strings under the subtree of u. We analyze the pruning functionality of such signatures based on set resemblance between the query string and the q-grams from the sub-trees of index nodes. For the Rsas, we propose a novel method, RsasSol , which is superior to the base line algorithm in practice. The RsasSol is a combination technique of inverted list and reference points. Extensive experiments on large real data sets demonstrate the efficiency and effectiveness of our methods. We also discuss how to estimate range query selectivity in Euclidean space since the query selectivity estimation is always important for the query optimization. Based on the spatial uniformity principle and the minimum number of neighborhoods principle for strings, we define a partitioning metric to facilitate the construction of buckets. We present a novel adaptive algorithm for finding balanced partitions using both the spatial and string information stored in the R-tree, which works well in practice. Lastly, we go further beyond the basic kNN and range queries for Sas and consider an interesting application of Sas on the road network: find the shortest path with keywords constraints on the road network. We propose both exact and approximate solutions for this issue. For small set of query keywords m ≤ 6), our exact solutions can quickly find the top-k shortest pathes. For large set of query keywords, our approximate solutions can still return the answer on real time with good approximate quality.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700