基于主存的高维空间连接及查询算法研究

英文题名：Research of High-Dimensional Space Join and Query Algorithms Based on Main-Memory
作者：刘艳
论文级别：博士
学科专业名称：计算机应用技术
中文关键词：主存高维连接 ; 相似连接 ; kNN查询 ; kNN连接 ; RkNN连接
英文关键词：main-memory high-dimensional join ; similarity join ; k nearest
英文关键词：neighbor query ; k nearest neighbor join ; reverse k nearest neighbor join
学位年度：2011
导师：郝忠孝
学科代码：081203
学位授予单位：哈尔滨理工大学
论文提交日期：2011-12-01

摘要

许多新出现的数据库应用,如CAD、多媒体、医学图像、时间序列、分子生物学和科学数据库,将它们的数据表示为多/高维特征向量。每个特征向量由d个值组成,可以解释为d维空间中的坐标以及一些相关内容的数据。通常使用两个特征向量之间的距离表示原始高维对象之间的(不)相似性。在许多应用中,需要结合两个数据集中的点对,该操作本质上是高维连接(高维数据空间中各种相似连接的统称)。
     到目前为止,对高维连接的大多数研究主要集中在基于磁盘的大量数据的高维连接。目前计算机可用主存的容量越来越大且价格逐渐低廉,以及对高维连接的有效处理的需求表明,一大类问题的高维连接能够在主存中处理。△-tree是一种新提出的多层索引,能够加速主存环境中的高维查询,已被证实优于其它主存索引。因此,本文以△-tree为基础,设计了几种类型的高维连接及查询算法,以满足不同应用的需求。本文的主要工作可概括为以下5点：
     (1)研究了高效的主存索引△-tree及相似连接的特性,分析了△-tree中节点的修剪策略。充分利用△-tree的特性,为单个数据集提出了主存自相似连接算法△-tree-Join。充分利用聚类重合的特性,以△-tree中各节点的中心为聚类中心,设计了主存索弓△-tree-S,基于△-tree-S采用自顶向下的模式为两个不同的数据集提出了主存相似连接算法△-tree-Join*,在这两个算法中使用较少的维数计算节点之间的距离及数据点与节点之间的距离,通过该距离过滤掉不必要的节点和数据点,减少计算量,从根本上提高主存相似连接的效率,实验结果表明这两种连接算法均具有较高的连接效率。
     (2)分析了kNN查询的搜索半径未知的特点,对主存中的高维k最近邻(kNN)查询算法进行了研究。基于△-tree提出了三种用于高维数据的主存kNN查询算法,通过性能分析和实验证明它们比已有的算法具有更高的查询效率,其中以第三种查询算法BU_DF_knn_Search效率最高。该算法首先通过隶属节点的定义确定查询点在△-tree中的隶属叶子节点,并在该节点中进行kNN搜索,利用较优的kNN候选尽快缩小修剪距离,然后按自底向上的顺序对以隶属叶子节点的各祖先节点为根的子树进行深度优先遍历。
     (3)分析了kNN连接的性质,对主存中的kNN连接算法进行了研究。根据需求提出了主存kNN连接的索引结构△-tree-R,该索引对生成的每个节点进行了编码。基于△-tree-R和△-tree-S提出了主存kNN连接算法,该算法利用△-tree-R中的编码在△-tree-S中进行解码,快速定位位置对应节点,利用其中的近优kNN候选,并采用自底向上和深度优先搜索相结合的遍历策略,快速缩小kNN修剪距离。实验结果显示A-tree-KNN-Join是一种有效的主存kNN连接算法。
     (4)分析了RkNN查询的解决方法及高维RkNN查询的特点,得出选择自修剪方法解决高维RkNN查询的结论。基于△-tree-R提出了△-Rdknn-tree索引结构,通过在该索引结构上进行△-tree-KNN-Join的自连接算法,预处理数据集中点的kNN距离信息,并将这些距离扩展到索引△-Rdknn-tree的各层节点中,基于该索引给出了主存反向k最近邻(RkNN)查询算法。
     (5)研究了主存环境中面向集合的RkNN查询即主存RkNN连接问题。为进行RkNN连接的两个数据集分别构建索引△-Rdknn-tree和△-tree,利用△-Rdknn-tree中各节点具有的RyNN搜索距离信息,基于相似连接算法△-tree-Join*设计了主存RkNN连接算法,通过性能分析证明该算法是有效的。
     本课题提出的所有主存查询和连接算法均经理论和实验证明,具有实用价值,为主存中高维数据的各种查询及连接算法的研究提供了理论和实践基础。
Many emerging database applications such as CAD, multimedia, medical image, time series, molecular biology and scientific databases represent their data as multi/high-dimensional feature vectors. Each feature vector consists of d values which can be interpreted as coordinates in a d-dimensional space plus some associated content data. The distance between two feature vectors is commonly used to measure the degree of (dis-)similarity between the original high-dimensional objects (with regard to the feature represented). There is increasing need to combine similar tuples (points) of two datasets for many applications.The operation of generating all pairs is in essence high-dimensional join (high-dimensional join is a general designation of all kinds of similarity join).
     So far most of researches focus on the execution of high-dimensional joins over large amounts of disk based data. The increasing sizes and lower price of main memory available on current computers, and the need for efficient processing of high-dimensional joins suggest that high-dimensional joins for a large class of problems can be processed in main memory. Δ-tree is a novel multi-level index structure, it can speed up the high-dimensional query in main memory environment and has been proven to be an efficient index method. Therefore, in order to meet different application needs, several kinds of high-dimensional joins and queries were designed based on index structure Δ-tree. The main work of this topic can be summarized as follows:
     Firstly, the efficient main-memory index Δ-tree and the characteristics of similarity join were studied, the pruning strategies of the nodes in Δ-tree were analyzed. Made full use of the properties of Δ-tree, A novel main-memory similarity self-join algorithm Δ-tree-Join was presented for a single dataset. Used the properties of cluster overlap and taked the center of nodes in Δ-tree as the center of clusters, main-memory index structure Δ-tree-S was designed.A main- memory similarity join algorithm called Δ-tree-Join*adopted the top-down join scheme for combining two different database sets was presented based on A-tree-S. The two algorithms computed the distances between clusters and between point and cluster with fewer number of dimensions, so as to filter unnecessary nodes or points, reduce computations and improve main-memory join efficiency fundamentally. Experimental results indicate that these two join algorithms have high join efficiency.
     Secondly, the feature that the searching radius of kNN query is unknown apriori was analyzed, the algorithm of high-dimensional kNN(k Nearest Neighbor) query in main-memory was studied. Three kinds of main-memory kNN query algorithms based on A-tree for high dimensional data were developed. Preformace analysis and experiments prove that these three algorithms improve query efficiency compare to existing algorithm, among them the third query algorithm named BU_DF_knn_Search has the best efficiency. This algorithm determines the membership leaf node of query point in A-tree through the definition of membership node firstly, and searches kNN in this node, narrows the pruning distance quickly using near optimal kNN candidates, then traverses the subtrees used the ancestors of membership leaf nodes as root nodes by depth first manner from bottom to top.
     Thirdly, the properties of kNN join was analyzed, the algorithm of kNN join in main-memory was studied. According to requirements index structure A-tree-R for main-memory kNN join was proposed, every generated node in this index was coded. A main-memory kNN join algorithm was presented based on Δ-tree-R and Δ-tree-S, this algorithm decodes in Δ-tree-S by using the code of A-tree-R, fast locates positional correspondence nodes, uses the near optimal kNN candidates in them and adopts the traversal order of combining bottom up and depth first search, narrows kNN pruning distances rapidly. Experiments results illustrate that Δ-tree-KNN-Join is an efficient kNN-join method in main memory.
     Fourthly, the solutions of RkNN (Reverse k Nearest Neighbor) query and the characteristics of high-dimensional RkNN query were analyzed, the conclusion that choose self-pruning approaches to process high-dimensional RkNN query was drawed. An index structure called Δ-Rdknn-tree was proposed based on Δ-tree-R, precomputed kNN distances of points in the dataset by main-memory kNN self-join algorithm Δ-tree-KNN-Join based on this index and propagated these distances to higher level index nodes. Main-memory RkNN query algorithm based on this index was proposed.
     Fifthly, the problem of set-oriented RkNN queries in main-memory enviroment namely main-memory RkNN join was studied. Constructed indexes Δ-Rdknn-tree and Δ-tree for the two datasets that were going to join, used the RkNN searching distance information of nodes in Δ-Rdknn-tree,main-memory RkNN join algorithm was designed based on the similarity join algorithm Δ-tree-Join*, performance analysis was conducted for the effectiveness of this algorithm.
     All the main-memory query and join algorithms in this paper were certificated by theories and experiments, have practical value and lay the theoretical and practical foundation of the research for all kinds of high-dimensional queries and joins algorithms in main memory.

引文

[1]JAGADISH H V. A Retrieval Technique for Similar Shapes[C]//CLIFFORD J, KING R, eds. Proc. of the ACM SIGMOD International Conference on Management of Data(SIGMOD), Denver, Colorado,1991:208-217.
    [2]SAMET H. Applications of Spatial Data Structures:Computer Graphics, Image Processing, and GIS[M]. Boston, MA, USA:Addison-Wesley, Reading,1990.
    [3]KRIEGEL H P, SEIDL T. Approximation-Based Similarity Search for 3-D Surface Segments[J]. GeoInformatica Journal,1998,2(2):113-147.
    [4]KORN F, SIDIROPOULOS N, FALOUTSOS C, et al. Fast Nearest Neighbor Search in Medical Image Databases[C]//VIJAYARAMAN T M, BUCHMANN A P, MOHAN C, et al, eds. Proc. of the 22th International Conference on Very Large Data Bases (VLDB), Mumbai (Bombay), India, 1996:215-226.
    [5]AGRAWAL R, LIN K, SAWHNEY H, et al. Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases[C]// DAYAL U, GRAY P M D, NISHIO S, eds. Proc. of the 21th International Conference on Very Large Data Bases (VLDB), Zurich, Switzerland, 1995:490-501.
    [6]BOHM C, BRAUNMULLER B, BREUNIG M M, et al. High Performance Clustering Based on the Similarity Join[C]//Proc. of the 9th International Conference on Information and Knowledge Management (CIKM), McLean, VA, USA,2000:298-305.
    [7]GUHA S, RASTOGI R, SHIM K. CURE:An Efficient Clustering Algorithm for Large Databases[C]//HAAS L M, TIWARY A, eds. Proc. of the ACM SIGMOD International Conference on Management of Data(SIGMOD), Seattle, Washington, USA,1998:73-84.
    [8]KNORR E M, NG R T. Algorithms for Mining Distance-Based Outliers in Large Datasets[C]//GUPTA A, SHMUELI O, WIDOM J, eds. Proc. of the 24th International Conference on Very Large Databases (VLDB), New York City, New York, USA,1998:392-403.
    [9]KOPERSKI K, HAN J. Discovery of Spatial Association Rules in Geographic Information Databases[C]//EGENHOFER M J, HERRING J R, eds. Proc. of the 4th International Symposium on Advances in Spatial Databases(SSD), Portland, Maine, USA,1995:47-66.
    [10]SANDER J, ESTER M, KRIEGEL H P, et al. Density-Based Clustering in Spatial Databases:the Algorithm GDBSCAN and its Applications[J]. Data Mining and Knowledge Discovery,1998,2(2):169-194.
    [11]HATTORI K, TORII Y. Effective Algorithms for the Nearest Neighbor Method in the Clustering Problem[J]. Pattern Recognition,1993,26(5):741-746.
    [12]SIBSON R. SLINK:An Optimally Efficient Algorithm for the Single-Link Cluster Method[J]. The Computer Journal,1973,16(1):30-34.
    [13]ANKERST M, BREUNIG M M, KRIEGEL H-P, et al. OPTICS:Ordering Points to Identify the Clustering Structure[C]//DELIS A, FALOUTSOS C, GHANDEHARIZADEH S, eds. Proc. of the ACM SIGMOD International Conference on Management of Data(SIGMOD), Philadelphia, Pennsylvania, USA,1999,28(2):49-60.
    [14]KNORR E M, NG R T. Finding Aggregate Proximity Relationships and Commonalities in Spatial Data Mining[J]. IEEE Transactions on Knowledge and Data Engineering,1996,8(6):884-897.
    [15]FAYYAD U M, PIATETSKY-SHAPIRO G, SMYTH P. From Data Mining to Knowledge Discovery:An Overview[C]//FAYYAD U M, PIATETSKY-SHAPIRO G, SMYTH P, et al, eds. Advances in Knowledge Discovery and Data Mining. Menlo Park, CA, USA:AAI Press/The MIT Press,1996:1-34.
    [16]BRACHMANN R, ANAND T. The Process of Knowledge Discovery in Databases[C]//FAYYAD U M, PIATETSKY-SHAPIRO G, SMYTH P, et al, eds. Advances in Knowledge Discovery and Data Mining. Menlo Park. CA, USA:AAI Press/The MIT Press,1996:37-57.
    [17]BOHM C, KREBS F. The k-nearest Neighbor Join:Turbo Charging the KDD Process[J]. Knowledge and Information Systems (KAIS),2004,6(6):728-749.
    [18]BREUNIG M M, KRIEGEL H P, KROGER P, et al. Data Bubbles:Quality Preserving Performance Boosting for Hierarchical Clustering[C]//AREF W G, ed. Proc. of the ACM SIGMOD International Conference on Management of Data (SIGMOD), Santa Barbara, CA, USA,2001:79-90.
    [19]BOHANNON P, MCLLROY P, RASTOGI R. Main-Memory Index Structures with Fixed-Size Partial Keys[C]//AREF W G, ed. Proc. of the ACM SIGMOD International Conference on Management of Data(SIGMOD), Santa Barbara, CA, USA,2001:163-174.
    [20]CHEN S, GIBBONS P B, MOWRY T C. Improving Index Performance through Prefetching[C]//AREF W G, ed. Proc. of the ACM SIGMOD International Conference on Management of Data(SIGMOD), Santa Barbara, CA, USA,2001:235-246.
    [21]CUI BIN, OOI B C, SU J W, et al. Main Memory Indexing:The Case for BD-Tree[J]. IEEE Transactions on Knowledge and Data Engineering,2004, 16(7):870-874.
    [22]KIM K, CHA S K, KWON K. Optimizing Multidimensional Index Trees for Main Memory Access[C]//AREF W G, ed. Proc. of the ACM SIGMOD International Conference on Management of Data(SIGMOD), Santa Barbara, CA, USA,2001:139-150.
    [23]RAO JUN, ROSS K A. Making B+-Trees Cache Conscious in Main Memory[C]//CHEN WEIDONG, NAUGHTON J F, BERNSTEIN P A, eds. Proc. of the ACM SIGMOD International Conference on Management of Data(SIGMOD), Dallas, Texas, USA,2000:475-486.
    [24]YU XIAOHUI, DONG JUNFENG. Indexing High-Dimensional Data for Main-Memory Similarity Search[J]. Information Systems,2010,35(7):825-843.
    [25]NIBLACK C W, BARBER R J, EQUITZ W R, et al. The Qbic Project: Querying Images by Content Using Color, Texture and Shape[C]// NIBLACK W, ed. Proc. of the Storage and Retrieval for Image and Video Databases(SPIE), San Jose, CA, USA,1993:173-187.
    [26]NARASIMHALU A D, CHRISTODOULAKIS S. Multimedia Information Systems:the Unfolding of A Reality[J]. IEEE Computer,1991,24(10):6-8.
    [27]AGRAWAL R, FALOUTSOS C, SWAMI A N. Efficient Similarity Search in Sequence Databases[C]//LOMET D B, ed. Proc. of the 4th International Conference on Foundations of Data Organization and Algorithms(FODO), Chicago, Illinois, USA,1993:69-84.
    [28]VASSILIADIS D. The Input-State Space Approach to the Prediction of Auroral Geomagnetic Activity from Solar Wind Variables[C]//JOSELYN J et al, eds. Proc. of the International Workshop on Artifcial Intelligence(AI) Applications in Solar Terrestrial Physics, Lund, Sweden,1993:145-151.
    [29]ORENSTEIN J A. Strategies for Optimizing the Use of Redundancy in Spatial Databases[C]//BUCHMANN A, GUNTHER O, SMITH T R, et al, eds. Proc. of the 1st Symposium on Design and Implementation of Large Spatial Databases (SSD), Santa Barbara, CA,1989:115-134.
    [30]RAMAKRISHNAN R, GEHRKE J. Database Management Systems[M]. Second Edition. New York, NY:McGraw-Hill,2000.
    [31]Lo M L, RAVISHANKAR C V. Spatial Hash Joins[C]//JAGADISH H V, MUMICK I S, eds. Proc. of the ACM SIGMOD International Conference on Management of Data(SIGMOD), Montreal, Quebec, Canada,1996:247-258.
    [32]PATEL J M, DEWITT D J. Partition Based Spatial-Merge Join[C]// JAGADISH H V, MUMICK I S, eds. Proc. of the ACM SIGMOD International Conference on Management of Data(SIGMOD), Montreal, Quebec, Canada,1996:259-270.
    [33]BLASGEN M W, ESWARAN K P. Storage and Access in Relational Databases[J]. IBM Systems Journal,1977,16(4):362-377.
    [34]GRAEFE G. Sort-Merge-Join:An Idea Whose Time Has(h) Passed?[C]// Proc. of the 10th IEEE International Conference on Data Engineering (ICDE), Houston, Texas,1994:406-417.
    [35]LU H, TAN K L. On Sort-Merge Algorithm for Band Joins[J]. IEEE Transactions on Knowledge and Data Engineering,1995,7(3):508-510.
    [36]ORENSTEIN J A. Spatial Query Processing in An Object-Oriented Database System[C]//CARLO Z, ed. Proc. of the ACM SIGMOD International Conference on Management of Data(SIGMOD), Washington, D.C., 1986:326-226.
    [37]ARGE L, PROCOPIUC O, RAMASWAMY S, et al. Scalable Sweeping-Based Spatial Join[C]//GUPTA A, SHMUELI O, WIDOM J, eds. Proc. of the 24rd International Conference on Very Large Databases (VLDB), New York City, New York, USA,1998:570-581.
    [38]GURRET C, RIGAUX P. The Sort/Sweep Algorithm:A New Method for R-Tree Based Spatial Joins[C]//GUNTHER O, LENZ H J, eds. Proc. of the 12th International Conference on Scientific and Statistical Database Management (SSDBM), Berlin, Germany,2000:153-165.
    [39]KOUDAS N, SEVCIK C. Size Separation Spatial Join[C].//PECKHAM J, ed.Proc. of the ACM SIGMOD International Conference on Management of Data(SIGMOD), Tucson, Arizona, USA,1997:324-335.
    [40]KOUDAS N, SEVCIK C. High Dimensional Similarity Joins:Algorithms and Performance Evaluation[C]//Proc. of the 14th International Conference on Data Engineering (ICDE), Orlando, Florida, USA,1998:466-475.
    [41]BOHM C, KRIEGEL H P. A Cost Model and Index Architecture for the Similarity Join[C]//Proc. of the 17th International Conference on Data Engineering (ICDE), Heidelberg, Germany,2001:411-420.
    [42]DITTRICH J P, SEEGER B. GESS:A Scalable Similarity-Join Algorithm for Mining Large Data Sets in High Dimensional Spaces[C]//Proc. of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.2001:47-56.
    [43]BOHM C, BRAUNMULLER B, KREBS F, et al. Epsilon Grid Order:An Algorithm for the Similarity Join on Massive High-Dimensional Data[C]// AREF W G, ed. Proc. of the ACM SIGMOD International Conference on Management of Data(SIGMOD), Santa Barbara, CA, USA,2001:379-388.
    [44]KALASHNIKOV D V, PRABHAKAR S. Similarity Join for Low-and High-Dimensional Data[C]//Proc. of the 8th International Conference on Database Systems for Advanced Applications (DASFAA), Kyoto, Japan,2003:7-16.
    [45]BRINKHOFF T, KRIEGEL H P, SEEGER B. Efficient Processing of Spatial Joins Using R-trees[C]//BUNEMAN P, JAJODIA S, eds. Proc. of the ACM SIGMOD International Conference on Management of Data(SIGMOD), Washington, D.C.,1993:237-246.
    [46]HUANG Y W, JING N, RUNDENSTEINER E A. Spatial Joins Using R-trees:Breadth-First Traversal with Global Optimizations[C]//JARKE M, CAREY M J, DITTRICH K R, et al, eds. Proc. of the 23rd International Conference on Very Large Databases (VLDB), Athens, Greece,1997:396-405.
    [47]SHIM K, SRIKANT R, AGRAWAL R. High-Dimensional Similarity Joins[C]//GRAY W A, LARSON P A, eds. Proc. of the 13th International Conference on Data Engineering (ICDE), Birmingham U.K.,1997:301-311.
    [48]SHAFER J C, AGRAWAL R. Parallel Algorithms for High-Dimensional Similarity Joins for Data Mining Applications[C]//JARKE M, CAREY M J, DITTRICH K R, et al, eds. Proc. of the 23rd International Conference on Very Large Databases (VLDB), Athens, Greece,1997:176-185.
    [49]BERCHTOLD S, BOHM C, KEIM D, et al. A Cost Model for Nearest Neighbor Search in High-Dimensional Data Space[C]//Proc. of the 16th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems(PODS), Tucson, Arizona,1997:78-86.
    [50]LO M L, RAVISHANKAR C V. Spatial Joins Using Seeded Trees[C]// SNODGRASS R T, WINSLETT M, eds. Proc. of the 1994 ACM SIGMOD International Conference on Management of Data(SIGMOD), Minneapolis, Minnesota,1994:209-220.
    [51]KAHVECI T, LANG C, SINGH A K. Joining Massive High-Dimensional Datasets[C]//DAYAL U, RAMAMRITHAM K, VIJAYARAMAN T M, eds. Proc. of the 19th International Conference on Data Engineering(ICDE), Bangalore, India,2003:265-276.
    [52]YANG ZHENGWU, YANG GUOQIANG. A Near-Optimal Similarity Join Algorithm and Performance Evaluation[J]. Information Sciences,2004, 167(1-4):87-108.
    [53]KRIEGEL H P, KUNATH P, PFEIFLE M, et al. Probabilistic Similarity Join on Uncertain Data[C]//LEE M L, TAN K L, WUWONGSE V, eds. Proc. of the 11th International Conference on Database Systems for Advanced Applications(DASFAA), Singapore,2006:295-309.
    [54]TOK W H, BRESSAN S, LEE M L. Progressive High-Dimensional Similarity Join[C]//R WAGNER, N REVELL, G PERNUL, eds. Proc. of the 18th International Conference on Database and Expert Systems Applications(DEXA), Regensburg, Germany,2007:233-242.
    [55]JACOX E H, SAMET H. Metric Space Similarity Joins[J]. ACM Transactions on Database Systems (TODS),2008,33(2):1-38.
    [56]LIEBERMAN M D, SANKARANARAYANAN J, SAMET H. A Fast Similarity Join Algorithm Using Graphics Processing Units[C]//Proc. of the 24th International Conference on Data Engineering(ICDE), Cancun, Mexico, 2008:1111-1120.
    [57]BOHM C, NOLL R, PLANT C. Index-supported Similarity Join on Graphics Processors[C]//FREYTAG J C, RUF T, LEHNER W, et al, eds. Proc. of the Datenbanksysteme in Business, Technologie und Web(BTW),13. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), Munster, Germany,2009:57-66.
    [58]KNUTH D E. The Art of Computer Programming, volume Ⅰ:Fundamental Algorithms, Chapter 2[M]. Second Edition. Reading, Massachusetts: Addison-Wesley,1973:10-119.
    [59]GARCIA V, DEBREUVE E, NIELSEN F, et al. K-nearest neighbor search: Fast GPU-based Implementations and Application to High-Dimensional Feature Matching[C]//Proc. of the IEEE International Conference on Image Processing (ICIP), Hong Kong, China,2010.
    [60]FENG JUN, WU LINYAN, ZHU YUELONG, et al. Continuous k-Nearest Neighbor Search under Mobile Environment[C]//DONG GUOZHU, LIN XUEMIN, WANG WEI, et al, eds. Proc. of the 9th Joint Asia-Pacific Web Conference on Advances in Data and Web Management(APWeb) and the 8th International Conference on Web-Age Information Management(WAIM), HuangShan, China,2007:566-573.
    [61]郝忠孝.时空数据库查询与推理[M].北京：科学出版社.2010.
    [62]孙冬璞,郝忠孝.局部范围受限的多类型最近邻查询[J].计算机研究与发展,2009,46(6)：1036-1042.
    [63]孙冬璞,郝忠孝.基于Voronoi图的组最近邻查询[J].计算机研究与发展,2010,47(7)：1244-1251.
    [64]孙冬璞,郝忠孝.时空数据库变体最近邻查询问题探讨[J].计算机工程与应用,2010,46(14)：12-16.
    [65]廖巍,熊伟,王钧等.可伸缩的增量连续k近邻查询处理[J].软件学报,2007,18(2)：268-278.
    [66]郭小发.空间对象的连续可视最近邻查询处理研究[D].浙江省：浙江大学,2008：12-17.
    [67]CHAVEZ E, NAVARRO G, BAEZA-YATES R, et al. Searching in Metric Spaces[J]. ACM Computing Surveys,2001,33(3):273-321.
    [68]THOMAS IAN A, ZHANG LIJUAN. Persistent Clustered Main Memory Index for Accelerating k-NN Queries on High Dimensional Datasets[J]. Multimedia Tools and Applications.2008,38(2):253-270.
    [69]THOMASIAN A, LI YUE, ZHANG LIJUAN. Optimal Subspace Dimensionality for K-Nearest-Neighbor Queries On Clustered and Dimensionality Reduced Datasets with SVD[J]. Multimedia Tools and Applications.2008,40(2):241-259.
    [70]徐红波,郝忠孝.基于Hilbert曲线的近似k-最近邻查询算法[J].计算机工程.2008,34(12)：47-49
    [71]ZHUANG YI, WU FEI. Fast Answering k-Nearest-Neighbor Queries over Large Image Databases Using Dual Distance Transformation[C]//CHAM T J, CAI JIANFEI, DORAI C, et al, eds. Proc. the 13th International Multimedia Modeling Conference on Advances in Multimedia Modeling(MMM Part I), Singapore,2007:408-417.
    [72]ZHUANG YI, ZHUANG YUETING, WU FEI. Composite Distance Transformation for Indexing and k-Nearest-Neighbor Searching in High-Dimensional Spaces[J]. Journal of Computer Science and Technology,2007, 22(2):208-217.
    [73]ZHUANG YI, HU HUA, LI XIAOJUN, et al. Optimal K-Nearest-Neighbor Query in Data Grid[C]//LI QING, FENG LING, PEI JIAN, et al, eds. Proc. of the Joint International Conferences Advances in Data and Web Management(APWeb/WAIM), Suzhou, China,2009:568-575.
    [74]LIANG JUNJIE, FENG YUCAI. Indexing the Bit-Code and Distance for Fast KNN Search in High-Dimensional Spaces[J]. Journal of Zhejiang University Science A.2007,8(6):857-863.
    [75]梁俊杰,王长磊.利用分区和距离实现高维空间快速KNN查询[J].计算机研究与发展,2007,44(11)：1980-1985.
    [761梁俊杰，冯玉才.LBD：基于局部位码比较的高维空间KNN搜索算法[J].计算机科学,2007,34(06)：145-149.
    [77]周国亮,陈红,王珊.基于MMOLAP的What-if分析[J].计算机工程,2009,35(21)：23-25.
    [78]SUN LIMEI, SONG BAOYAN, YU YAXIN, et al. Cache-Conscious Index Mechanism for Main-Memory Databases[J]. Wuhan University Journal of Natural Sciences,2006, 11(1):309-312.
    [79]CUI BIN,001 B C. SU J W, et al. Contorting High Dimensional Data for Efficient Main Memory Processing[C]//HALEVY A Y, IVES Z G, DOAN A, eds. Proc. of the ACM SIGMOD International Conference on Management of Data(SIGMOD), San Diego, California, USA,2003:479-490.
    [80]梁俊杰,冯玉才.CSA-Tree:一种改进的高维主存索引树[J].计算机学报,2007,30(3)：415-423.
    [81]冯玉才,梁俊杰,曹忠升.基于主存的优化高维索引树[J].计算机研究与发展,2006,43(z3)：189-194.
    [82]BREUNIG M M, KRIEGEL H P, NG R T, et al. Lof:Identifying Density-Based Local Outliers[C]//CHEN WEIDONG, NAUGHTON J F, BERNSTEIN P A, eds. Proc. of the ACM SIGMOD International Conference on Management of Data(SIGMOD), Dallas, Texas, USA, 2000:93-104.
    [83]KOUDAS N, SEVCIK K C. High Dimensional Similarity Joins:Algorithms and Performance Evaluation[J]. IEEE Transactions on Knowledge and Data Engineering,2000,12(1):3-18.
    [84]BOHM C, KREBS F. Supporting KDD Applications by the k-Nearest Neighbor Join[C]//MARIK V, RETSCHITZEGGER W, STEPANKOVA O, eds. Proc. of the 14th International Conference on Database and Expert Systems Applications(DEXA), Prague, Czech Republic,2003:504-516.
    [85]XIA CHENYI, LU HONGJUN, OOI B C, et al. Gorder:An Efficient Method for KNN Join Processing[C]//NASCIMENTO M A, OZSU M T, KOSSMANN D, et al, eds. Proc. of the 30th International Conference on Very Large Data Bases(VLDB), Toronto, Canada,2004:756-767.
    [86]YU CUI, CUI BIN, WANG SHUGUANG, et al. Efficient Index-Based KNN Join Processing for High-Dimensional Data[J]. Information and Software Technology.2007,49(4):332-344.
    [87]何洪辉,王丽珍,周丽华.Pgi-distance:一种高效的并行KNN-Join处理方法[J].计算机研究与发展,2007,44(10)：1774-1781.
    [88]EMRICH T, GRAF F, KRIEGEL H P, et al. Optimizing All-Nearest-Neighbor Queries with Trigonometric Pruning[C]//GERTZ M, LUDASCHER B, eds. Proc. of the 22nd International Conference on Scientific and Statistical Database Management(SSDBM), Heidelberg, Germany,2010:501-518.
    [89]WANG JIJIE, LIN LEI, HUANG TING, et al. Efficient K-Nearest Neighbor Join Algorithms for High Dimensional Sparse Data[J]. The Computing Research Repository (CoRR),2010, abs/1011.2807.
    [90]YU CUI, ZHANG RUI,HUANG YAOCHUN, et al. High-Dimensional Knn Joins with Incremental Updates[J]. GeoInformatica,2010,14(1):55-82.
    [91]KORN F, MUTHUKRISHNAN S. Influence Sets Based on Reverse Nearest Neighbor Queries[C]//CHEN WEIDONG, NAUGHTON J F, BERNSTEIN P A, eds. Proc. of the ACM SIGMOD International Conference on Management of Data(SIGMOD), Dallas, Texas, USA,2000:201-212.
    [92]SINGH A, FERHATOSMANOGLU H, TOSUN A S. High Dimensional Reverse Nearest Neighbor Queries[C]//Proc. of the 2003 ACM CIKM International Conference on Information and Knowledge Management, New Orleans, Louisiana, USA,2003:91-98.
    [93]YANG CONGJUN, LIN K I. An Index Structure for Efficient Reverse Nearest Neighbor Queries[C]//Proc. of the 17th International Conference on Data Engineering(ICDE), Heidelberg, Germany,2001:485-492.
    [94]STANOI I, AGRAWAL D, ABBADI A E. Reverse Nearest Neighbor Queries for Dynamic Databases[C]//GUNOPULOS D, RASTOGI R, eds. Proc. of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery(DMKD), Dallas, Texas, USA,2000:44-53.
    [95]TAO YUFEI, PAPADIAS D, LIAN XIANG. Reverse kNN Search in Arbitrary Dimensionality[C]//NASCIMENTO M A, OZSU M T, KOSSMANN D, et al, eds. Proc. of the 30th International Conference on Very Large Data Bases(VLDB), Toronto, Canada,2004:744-755.
    [96]KANG J M, MOKBEL M F. SHEKHAR S, et al. Continuous Evaluation of Monochromatic and Bichromatic Reverse Nearest Neighbors[C]//Proc. of the 23rd International Conference on Data Engineering(ICDE), Istanbul, Turkey,2007:806-815.
    [97]STANOI I, AGRAWAL D, Abbadi A E. Reverse Nearest Neighbor Queries for Dynamic Databases[C]//GUNOPULOS D, RASTOGI R, eds. Proc of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery(DMKD), Dallas, Texas, USA,2000:44-53.
    [98]YIU M L, PAPADIAS D, MAMOULIS N, et al. Reverse Nearest Neighbors in Large Graphs[C]//Proc. of the 21st International Conference on Data Engineering(ICDE), Tokyo, Japan,2005:186-187.
    [99]李松,郝忠孝.基于Voronoi图的反向最近邻查询方法研究[J].哈尔滨工程大学学报,2008,29(3)：261-265.
    [100]李松,郝忠孝.移动对象的动态反向最近邻查询技术[J].计算机工程,2008,34(10)：40-42.
    [101]EMRICH T, KRIEGEL H P, KROGER P, et al. Incremental Reverse Nearest Neighbor Ranking in Vector Spaces[C]//MAMOULIS N, SEIDL T, PEDERSEN T B, et al, eds. Proc. of the 11th International Symposium on Spatial and Temporal Databases (SSTD), Aalborg, Denmark,2009:265-282.
    [102]LIAN XIANG, CHEN LEI. Efficient Processing of Probabilistic Reverse Nearest Neighbor Queries over Uncertain Data[J]. Very Large Data Bases Journal(VLDBJ),2009,18(3):787-808.
    [103]何云斌.一种基于VARdnnTree的反向最近邻查询方法[J].计算机应用研究,2010,27(5)：1816-1819.
    [104]YIU M L, MAMOULIS N. Reverse Nearest Neighbors Search in Ad-hoc Subspaces[C]//LIU LING, REUTER A, WHANG K Y, et al, eds. Proc. of the 22nd International Conference on Data Engineering(ICDE), Atlanta, GA, USA,2006:412-426.
    [105]TAO YUFEI, PAPADIAS D, LIAN XIANG, et al. Multidimensional Reverse KNN Search[J]. International Journal on Very Large Data Bases(VLDBJ),2007,16(3):293-316.
    [106]SHARIFZADEH M, SHAHABI C. VoR-Tree:R-trees with Voronoi Diagrams for Efficient Processing of Spatial Nearest Neighbor Queries[C]// Proc. of the 36th International Conference on Very Large Data Bases(VLDB), Singapore,2010:1231-1242.
    [107]ACHTERT E, BOHM C, KROGER P, et al. Efficient Reverse k-Nearest Neighbor Search in Arbitrary Metric Spaces[C]//CHAUDHURI S, HRISTIDIS V, POLYZOTIS N, eds. Proc. of the ACM SIGMOD International Conference on Management of Data(SIGMOD), Chicago, Illinois, USA,2006:515-526.
    [108]XIA CHENYI, HSU WYNNE, LEE M L, et al. BORDER:Efficient Computation of Boundary Points[J]. IEEE Transactions on Knowledge and Data Engineering,2006,18(3):289-303.
    [109]JOLLIFFE I T. Principal Component Analysis[M]. New York:Springer-Verlag,1986
    [110]GAO YUNJUN, ZHENG BAIHUA, CHEN GENCAI, et al. On Efficient Mutual Nearest Neighbor Query Processing in Spatial Databases[J]. Data & Knowledge Engineering,2009,68(8):705-727.
    [111]廖巍,吴晓平,胡卫,等.基于最短路径的道路网络k近邻查询处理[J].计算机科学,2010,37(11)：180-183.
    [112]CUI BIN, OOI B C, SU J W, et al. Indexing High-Dimensional Data for Efficient In-Memory Similarity Search[J]. IEEE Transactions on Knowledge and Data Engineering,2005,17(3):339-353.
    [113]BOHM C. Powerful Database Support for High Performance Data Mining[D]. Germany, Munchen:Ludwig-Maximilians-Universitat Munchen, 2001:17-22.
    [114]JIN H, OOI B C, SHEN H T, et al. An Adaptive and Efficient Dimensionality Reduction Algorithm for High-dimensional Indexing[C]// DAYAL U, RAMAMRITHAM K, VIJAYARAMAN T M, eds. Proc. of the 19th International Conference on Data Engineering(ICDE), Bangalore, India, 2003:87-98.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700