基于R-树多维索引结构的优化研究与应用
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
随着计算机技术的研究和发展,空间数据库在计算机图形学,地理信息系统和多媒体数据库等各个领域都有广泛的应用,空间数据库的研究越来越受到人们的重视。传统的关系数据库虽然能够支持空间数据的存储,但却无法支持对其高效的访问,这是因为空间数据的多维特性与关系数据库中的一般索引不相适应。一般索引只适合对一维数据进行索引,因为其索引项是一维线性且严格有序的。空间数据的多维特性在任何方向上并不存在优先级问题,因此需要研究特殊的高效多维索引以适应多维特性的空间数据。多维索引由此应运而生,多维索引主要依靠空间对象之间的邻接性对数据进行组织,它的索引项通常是多维空间下的点或区域。
     由于空间数据本身的复杂性,以及目前对海量空间数据快速查询的要求日益提高,多维索引作为空间数据库中的重要组成部分,可以加快对空间对象的检索。因此,如何建立更有效的多维索引结构一直是空间数据库领域最现实、最紧迫、也是最前沿的研究课题之一。
     在论述了多维索引技术的相关概念以及多维索引技术的发展历程后,本文在研究R-树等有代表性且效率较高的多维索引技术的基础上,主要围绕两个问题进行了研究和取得的相应成果:
     第一:本文针对R*-树多维索引在强制重插算法上的不足,提出了一种新的强制重插算法用以改进R*-树多维索引结构。研究实验表明:改进后的R*-树与传统的R*-树相比在索引空间利用率,动态创建索引,索引检索等方面具有更高的性能。
     第二:R-树静态生成技术的Hilbert R-树算法在构建R-树的过程中容易造成结点之间的重叠。针对这一问题,本文提出了一种改进的静态生成算法。该算法具有存储利用率高,而且查询效率高的优点。
Along with the research and development of computer technology, spatial database is used in so many domains as the computer graphics, the geographic information system,the multimedia database and so on. The research of spatial database has become more and more important. Although relational database can store spatial objects's data, but it does not support efficient access to spatial objects. The multi-dimensional aspect of spatial data makes incompatible with nomal index in relational database. Search key in normal index is one-dimention-based, and is only suitable for indexing one dimentional data. Each dimention of spatial data is not in precedence to other, so special index is needed to support spatial data, which leads to the introduction of multi-dimensional index. Multi-dimensional index utilizes some kind of spatial relationship to organize data entries, with each key value seen as a point or region in a multi-dimensional space.
     Due to the high complexity of spatial objects and spatial queries and also due to extremely large spatial data volumes,currently spatial database is faced with urgent requirements on large data storage,effcient access method and fast query processing. Multi-dimensional index is a indispensable part of spatial database, and can dramatically improve efficiency when doing a spatial query. So, how to design proper multi-dimensional index structure is one of the most realistic, pressing and leading scope in spatial database.
     After addressing the related concepts of multi-dimensional index technologies and reviewing the development of multi-dimensional index technologies, this paper analyses the typical and efficient multi-dimensional index technologies especial as R-tree in detail. This thesis puts forward research concerning the following two issues:
     Firstly, Aiming at the shortcoming of Forced Reinsert algorithm in R*-Tree multi-dimensional index, the thesis introduces a new Force Reinsert algorithm in order to optimize R*-Tree multi-dimensional index structure. The experiment results show that improved R*-Tree is better performance than the original R*-Tree in index space utility, index dynamic construction and index retrieval.
     Secondly,In the aspect of buck-loading R-tree algorithm,the Hilbert R-Tree algorithm always makes lots of overlap between tree nodes.To solve this problem,this paper propose a new bulk-loading algorithm for the R-Tree,which called new Hilbert R-tree. This new algorithm has the merit of high space utilization and perfect performance in searching.
引文
[1] Antonin Guttman, R-trees: a dynamic index structure for spatial searching[C],USA, Proc. ACM SIGMOD, 1984,pp.47-57.
    [2] T. Sellis, N. Roussopoulos, C. Faloutsos. The R+-tree: A dynamic index for mutidimensional objects[C]. In Proceedings of the Thirteenth International Conference on Very Large Databases (VLDB), 1987,pp 507--518.
    [3] BECKMANN N,et al.R*-Tree:An efficient and robust access method for points and rectangles[C]. USA,Proc.ACM SIGMOD, 1990,pp.322-331.
    [4] Ibrahim Kamel, Christos Faloutsos . Hilbert R-tree: An Improved R-tree using Fractals[C]. Proceedings of the 20th International Conference on Very Large Data Bases. 1994,pp. 500– 509.
    [5] V.Gaede and O.Gunther, multidimentional access methods[C] Computing Surveys, 1998,pp.217- 231.
    [6]郭薇,郭箐,胡志勇空间数据库索引技术[M]上海,上海交通大学出版社,2006,pp.105-117
    [7] Oracle Corporation.Oracle Spatial User's Guide and Reference 10g Release10.2 http://www.oracle.com/global/cn/documentation/10g/database/oracle_spatial_best_practices_twp.pdf .2005-10
    [8] IBM Corporaion. Informix Spatial DataBlade Overview http://www-01.ibm.com/software/data/informix/blades/spatial. 2006-9
    [9] Sun Corporation. Mysql User's Guide and Reference 5.1 (open source database) http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html 2008-1
    [10] ESRI's Reseach Report of Spatial Database Engine http://proceedings.esri.com/library/userconf/proc96/to100/pap094/p94.htm. 1995
    [11] N. Roussopoulos,S.Kelley, and F. Vincent, Nearest neighbor queries [C],USA, Proc.ACM SIGMOD, 1995,pp. 71-79.
    [12] Katayama N,Satoh S.The SR-Tree:An index structure for high-dimensional nearest neighbor queries[C],USA, Proc.ACM SIGMOD, 1997,pp.369-380.
    [13] Donghui Zhang, Tian Xia. A novel improvement to the R*-tree spatial index using gain/loss metrics[C],USA,Proceedings of the 12th annual ACM international workshop on Geographic information systems Washington DC, 2004,pp. 204– 213.
    [14] Gisli R. Hjaltason,Hanan Samet. Distance browsing in spatial databases[C].ACM Transactions on Database Systems NY, USA , 1999 Volume 24,Issue 2 ,pp. 265 - 318.
    [15] Pentland A, Picard R W, and Sclaroff S. Photobook: Content-Based Manipulation of Image Databases[J]. Int. Journal of Computer Vision,1996,18(3): pp.233-254.
    [16]谈晓军,涂建光.一种自适应的两阶段R-树批生成算法[J].武汉大学学报(信息科学版).2003.vol.28 pp.31-38.
    [17]郑玉明,廖湖声,陈镇虎.空间数据库引擎的R-树索引[J].计算机工程,2004.vol 30 pp.38-39.
    [18] Finlayson G D. Color in perspective[J]. IEEE Trans. Pattern Analysis and Machine Intelligence, 1996,18(10):pp.1034-1038.
    [19] Forsyth D A. A novel algorithm for color constancy[J]. International Journal of Computer Vision, 1990,5(1):pp.5-36.
    [20] Mojsilovic A, Kovacevic J,Hu J et al. Matching and retrieval based on the vocabulary and grammar of color patterns[J]. IEEE Trans. Image Processing, 2000,9(1):pp.38-54.
    [21] Swain M J and Ballard B H. Color indexing[J]. International Journal of Computer vision,1991 7(1):pp.11-32.
    [22]蒋子阳,周志强等基于改进R-树的空间索引技术研究[J]工程地球物理学报2007,4(6) pp.637-643.
    [23]卢炎生,向祥兵等CQR-tree空间数据库索引结构及实现算法[J]计算机工程与科学2006,28(6) pp.108-111.
    [24]张奋,潘梅生,邹北骥基于SR-树的空间对象最近邻查询[J]计算机工程与应用2007,43(4) pp.173-175.
    [25]张奋,肖政宏基于SR-树的空间对象反最近邻查询技术研究[J]西华大学学报2007,26(3) pp.44-47.
    [26]龚俊,柯胜男,鲍曙明一种全新的R-树节点选择算法[J]计算机应用研究2008,25(10) pp.46-49.
    [27]Chang T and Kuo C.-C J. Texture analysis and classification with tree-structured wavelet transform[C]. IEEE Trans Image Proc. 1993,2(4), pp.429-441.
    [28] Papadias D,Theodoridis Y, and Sellis T. The Retrieval of Direction Relations using R-trees Database and Expert Systems Applications[C]. DEXA’94 Athens Greece 1994,pp.173-182.
    [29] Ohanian P and Dubes R C. Performance evaluation for four classes of texture feature[J]. Pattern Recogniton. 1992,25(8), pp.819-833.
    [30] T.Brinkhoff, H.Kriegel, R.Schneider Multi-step Processing of Spatial join[C] In Proc ACM SIGMOD Symp Conf on the Management of Data 1994, pp.197-208..
    [31] Roussopoulos N Leifker D.Drect. Spatial Search on Pictorial Database Using Packed R-trees[C].ACM SIGMOD Int.Conf.on Management of Data.Austin,Texas,1995, pp:17~31.
    [32] Orenstein J A Spatial query processing in an object-oriented database system[C] Proc ACM SIGMOD Int Conf on Management of Data Washington DC 1996 pp.181-190.
    [33] Kwon,D.,Lee,S.,Lee,S.Indexing the Current Positions of Moving Objects Using the Lazy Update R-tree[C].in Proceedings of the Third International Conference on Mobile Data Management.Singapore.2002.pp.113-120
    [34] Bins J,Draper B A. Feature Selection from Huge Feature Sets[C]. In Proc. of the 8th IEEE International conference on Computer Vision,2001,pp.159-165.
    [35] Porkaew,K.,Lazaridis,I.,Mehrotra,S.Querying Mobile Objects in Spatio-Temporal Databases[C].in Proceedings of 7th International Symposium on Advances in Spatial andTemporal Databases.Redondo Beach,CA,USA.2001.pp.59-78.
    [36] Arnold W M, Marcel W, Simone S, et al. Content-based image retrieval at the end of the early years[J]. IEEE Trans. On PAMI,2000,22(12):pp.1349-1379.
    [37] Lee et al.Information embedding based on user’s relevance feedback for image retrieval[C].In Proc SPIE Symposium on Voice,Video and Data Communications,Multi-media Storage and Archiving Systems IV, 1999,pp.20-22.
    [37] Dimitris Papadias.Hill Climbing Algorithms for Content-Based Retrieval of Similar Configurations[C].the ACM Conference on Information Retrieval(SIGIR), Athens,2000,pp.24-28.
    [38] Ganeshanandam S & Krzanowski W J,On selecting variables and assessing their performance in linear discriminant analysis[J]. Australian journal of statistics 1989.31(3):pp.433-447.
    [39]边肇棋,张学工.模式识别(第二版)[M].北京,清华大学出版社. 2000.
    [40] Theodoridis S & Koutroumbas K. Pattern recognition[M].Elsevier, 2003.
    [41] Dougherty E R et al. Genomic signal processing: the salient issue[J]. EURASIP Journal on applied signal processing 2004,1:pp.146-153.
    [42] Kim S et a1. Strong feature sets from small samples[J]. Journal of computational biology, 2002,9: pp.127-146.
    [43] Jain A K, Duin P, and Mao J. Statistical pattern recognition: A review[J]. IEEE Trans. on PAAMI, 2000, 22(1):pp.4-37.
    [44] Baudat G & Anouar F. Feature vector selection and projection using kernels[J].Neurocomputing,2003,pp.21-38.
    [45] Dash M & Liu H. Consistency-based search in feature selection[J]. Artificial intelligence ,2003,pp.155-176.
    [46] Delltling M & Buhlmann P. Finding predictive gene groups from microarray data[J]. Journal of multivariate analysis, 2004,pp.106-131.
    [47]陆锋,周成虎.一种基于Hilbert排列码的GIS空间索引方法[J].计算机辅助设计与图形学学报, 2001,13 (5),pp.424-429.
    [48]张明波,陆锋等R-树家族的演变和发展[J]计算机学报, 2005,28(3) pp.289-299.
    [49]董道国,梁刘红,薛向阳VAR-Tree一种新的高维数据索引结构[J].计算机研究与发展, 2005,42(1):pp.10-17
    [50] Wu P,Manjunath B S,Shin H D. Dimensionality Reduction for Image Retrieval[C]. In Proc. IEEE International Conference on Image Processing(ICIP 2000),Vancouver, Canada, 2002,3 pp.726-729.
    [51] Lee T. , Sukho. OMT. Overlap minimizing top-down bulk loading algorithm for R-tree[J]. Advanced Information Systems Engineering, 2003,7 pp.69-72
    [52] Chen Li,Choubey R.,Rundensteiner E. Merging R-trees : Efficient strategies for local bulk insertion[J]. Geoinformatica , 2002,6(1) pp.7-34.
    [53] Lee Taewon, Moon Bongki, Lee Sukho. Bulk insertion for R-tree by seeded clustering[C]. In : Proceedings of DEXA,Prague,Czech Republic,2003,pp.129-138
    [54] Corral A.,Vassilakopoulos M.,Manolopoulos Y. The impact of buffering on closest pairs queries using R-trees[C]. In Proceedings of the 5th ADBIS, Vilnius,Lit huania,2001,pp.41-54.
    [55] Korn F.,Mut hukrishnan S.Influence sets based on reverse nearest neighbor queries[C]. Proceedings of SIGMOD , Dallas ,Texas,USA,2000,pp.201-212.
    [56] Rui Ding,Xiaofeng Meng. A Quadtree Based Dynamic Attribute Index Structure and Query Process [C]. In Proceedings of Computer Networks and Mobile Computing.2001, pp.446-451.
    [57]郑坤,朱良峰,吴信才等. 3D GIS空间索引技术研究[J].地理与地理信息科学,2006,22(4),pp 35-39.
    [58]朱庆,龚俊.一种改进的真三维R-树空间索引方法[J].武汉大学学报(信息科学版), 2006,31(4),pp 340~343.
    [59]邝颖杰,刘燕.基于改良NB树的内存高维索引算法[J].农业网络信息, 2007,4,pp59~62.
    [60]姜素芳,陈天滋.多路R-树连接的加权处理[J].计算机工程与应用, 2006,31,pp 174-178.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700