数据流分析关键技术研究
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
随着硬件、网络与通信技术的飞速发展和现实应用需求的持续推动,数据流(Data Stream)作为一种新的数据类型,在诸如金融分析、网络数据管理、移动对象跟踪、通信网监控和传感器网络数据处理等众多领域有着广泛的应用。传统的数据库查询处理技术通常只适合处理存储在磁盘或内存等介质中的静态数据,难以直接应用到无限、连续、快速、“单遍扫描”的数据流中,因而,数据流应用对数据管理与分析提出了更高的要求。如何从海量流数据中快速提取有价值的信息已成为数据库及相关研究领域面临的一个重大挑战。
     数据流相关研究已经引起了学术界和工业界的广泛关注,现有的研究可大致分为数据流管理和数据流分析两个方面。本文在总结和分析国内外已有研究工作的成果与不足的基础上,针对数据流分析中的四个重要问题:离群点检测、Skyline计算、子序列匹配和高效索引结构,展开深入研究,主要工作包括:
     1.在分布式数据流离群点检测方面。在比较和分析现有离群点度量的基础上,结合核密度估计技术扩展了基于距离和基于密度的离群点定义。针对分布式数据流离群点检测中面临如何提高全局离群点检测率和降低网络通信开销的两大问题,以常见的星型网络拓扑模型为基础,提出了一种高检测率、低通信开销的分布式数据流离群点检测算法—DisOutlierStreams。采用非参数核密度估计技术快速计算出当前滑动窗口内流数据的概率密度函数,结合指数衰减策略处理数据流的动态演化性,通过散度技术(Divergence Technology)在检测率可控的前提下较大地减少了协调结点与其子结点之间的通信开销。在算法的具体实现上,充分发挥了Matlab软件强大的符号和数值计算功能。理论分析和实验结果表明,与已有同类数据流离群点检测算法相比,该方法的网络传输量与滑动窗口大小无关,更有效地降低了网络通信开销,具有良好的性能和可扩展性。
     2.在数据流稀疏Skyline计算方面。由于Skyline集合的平均数目与数据点数和数据维数成指数增长,并受数据分布的严重影响,从而Skyline集合的急速增长严重降低了在线服务和决策支持等应用的服务质量。针对该问题,首先在总结已有Skyline计算的相关工作基础上,采用一个Skyline点来代表其周围在可接受偏差δ邻域内的所有Skyline点,给出了数据流稀疏Skyline问题的形式化定义。然后,提出了两个算法:基于界限裁剪的BSS算法和基于特征树的ESS算法。前者以现有数据流Skyline算法为基础,通过界限裁剪策略降低稀疏Skyline的计算开销;而后者则直接对滑动窗口内的流数据构建其稀疏Skyline特征索引树,并支持增量更新、可根据数据分布自适应地调整稀疏Skyline的计算结果。最后实验结果表明,与BSS算法相比,ESS算法具有更强的可控性和更高的处理效率。
     3.在数据流子序列匹配方面。子序列匹配问题在时间序列数据库中早有研究,但数据流子序列匹配还处于发展初期。本文在总结并分析现有序列匹配度量差异的基础上,选用抗噪音和形变效果良好的动态时间弯曲距离(Dynamic Time WarpingDistance)作为序列匹配的衡量标准,将数据流子序列匹配归纳为“最佳匹配”、“区域匹配”、“最优区域匹配”和“Top-K最优区域匹配”四个子问题。针对已有数据流子序列匹配算法中时间弯曲矩阵计算开销过大的问题,提出了一种低时空复杂度、近实时的数据流子序列匹配算法—FSM,它充分利用相似性阈值和上下界剪枝技术尽量减少时间弯曲矩阵中的冗余计算。理论分析和实验结果表明,与已有数据流子序列匹配算法相比,算法准确率并未有任何降低,在合理设置相似性阈值和查询序列的情况下,仅需增加几个字节的空间开销,计算速度提高明显,特别是在高维流数据和长查询序列下性能提升更为显著。
     4.在数据流索引结构方面。索引技术是提高数据流查询与分析性能的关键技术之一。本文在比较并分析现有支持数据流频繁更新的R-Tree变种索引的基础上,针对数据流索引结构更需同时考虑如何提高索引更新性能和降低索引存储开销的问题,提出了改进的高效数据流索引结构—QDM-Tree,并给出了相应的数据插入、删除和查询算法。该索引树利用Hash表替换耗时的索引遍历,并支持数据流的Lazy组删除策略;采用“自底向上”的索引更新方式,并结合R-Tree结点的量化压缩技术。实验结果表明,与已有同类索引树相比,QDM-Tree的存储开销与之相当,而更新和查询的性能均有明显的提升。
     综上所述,本文针对数据流分析中四个关键问题提出了更为高效的解决方法,并就其计算、存储、通信等开销作了全面的分析,对于数据流的理论研究和实用化具有一定的理论意义和应用价值。
With the fast development of hardware, network and communication technology, and the ac-celeration of comprehensive real world applications, there occurs a new kind of data type, namelydata stream, which exits in many fields, such as finance analysis, network data management, trac-ing moving objects, communication network monitoring, sensor network management, and etc.The traditional database query processing technology generally manage the static data, which arestored in the medium such as disk or memory, and is hardly suit for the unlimited, continuous,rapid and“one scan”data stream applications. So, the data stream applications provide higherrequirements for the data management and analysis. It is a major challenge in database researchfield that how to quickly find the valuable information over massive data streams.
     The research of data stream has been widely concerned by the academy and industrial field,and can be divided into two parts, data stream management and data stream analysis. Basedon the fruit and deficiency of existing works, this dissertation conducts an in-depth study whichfocuses on the four important issues of data stream analysis, outlier detection, skyline computing,subsequence matching and index structure. The main contents of this dissertation are as follows:
     1. Outlier detection on distributed data streams. Based on the comparison and analysis ofexisting outlier metrics, this dissertation extends the distance-based and density-based outlier def-initions which combine the kernel density estimation technique. It will face with two problems inoutlier detection over distributed data streams, how to improve the detection ratio of global outliersand reduce the network communication. On the base of the common star network structure, Dis-OutlierStreams algorithm is provided which has high efficiency and lower communication. Thenon-parameter kernel estimation technique can obtain the probability density function in sliding-window quickly. An exponential fading strategy is used to catch the evolution of data stream.Divergence Technology can reduce the overhead of network communication with the guaranteeof detection precision. In the implementation this algorithm takes full advantage of the powerfulsymbol and numerical computation in Matlab. theoretical analysis and experiments show that thisalgorithm can effectively reduce the communication overhead which has no relation with the sizeof sliding window, and has high performance and scalability.
     2. Sparse skyline computing over data stream. The average number of skyline set increasesexponentially with the number of objects and the dimensions. On the other hand, skyline set isseriously affected by the data distribution. The quick increase of skyline set decreases the qualityfor the online service and decision making applications. Based on summarization of the relatedwork for skyline computing, we propose a novel concept, called sparse skyline, which uses a skyline object to represent its nearby skyline neighbors within the acceptable difference (δ). Then,two algorithms are provided, the basic sparse skyline BSS algorithm and the extend ESS algorithm.BSS algorithm builds on the original skyline algorithm, and adopts an effective pruning techniqueto reduce the computation. ESS algorithm constructs a sparse skyline feature tree to organize thepoints in sliding window directly, which can adjust adaptively and progressively the quality ofthe sparse skyline set according to the correlations of data distribution. Comparing with the BSSalgorithm, The ESS algorithm has stronger manageability and higher efficiency.
     3. Subsequence matching on data stream. Subsequence matching on time series database hasbeen researched for many years, however, subsequence matching on data stream is still in earlystages of development. On the basis of analysis the existing matching metrics, we select the DTW(dynamic time warping) distance which can tolerate the noise and skew. It can been consideredfrom four sub-problems: best matching, range matching, optimal range matching, top-k optimalrange matching. Aiming to the high computation in time warping matrix, this dissertation providesa new FSM algorithm, which has lower space complexity and can process the stream data in realtime. It makes the best of similarity threshold to prune the redundant computation as early aspossible. Theoretical analysis and experiments with synthetic and real data show that this methodis faster than the existing algorithms, especially to the high dimensions and long query sequences,it only increases several bytes on the memory cost without the loss of precision.
     4. Index structure for data stream. Index technique is one of key techniques to improvethe performance of data stream query and analysis. This dissertation analyzes the prior R-Treevariants to support frequent updates on data stream, and designs an improved QDM-Tree whichconsiders how to improve the update performance and decrease the storage overhead all together.It also provides the corresponding algorithms for the insert, delete and query operations. Usingthe hash method replaces the high time-consuming operation– tree traversing. It also supports thelazy group deletion. It quantizes the tree nodes and updates the data in a“bottom-to-top”approach.The experimental results show that, comparing with the existing index tree variants, QDM-Treehas almost the same storage overhead and higher performance for updating and querying.
     In summary, this dissertation addresses the four key issues in data stream analysis, andpresents several high efficient solutions which analysis the overhead of computation, storage andcommunication in a comprehensive way. It is a promotion of stream data processing on boththeoretical study and practical applications.
引文
[1] Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, Jennifer Widom. Modelsand Issues in Data Stream Systems. In: Proceedings of the Twenty-first ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Madison, Wisconsin,USA. ACM Press, 2002, 1–16
    [2]金澈清,钱卫宁,周傲英.流数据分析与管理综述.软件学报. 2004, 15(8):1172–1181
    [3] S. Muthu Muthukrishnan. Data Streams: Algorithms and Applications (Foundations andTrends in Theoretical Computer Science). Now Publishers Inc, 2005
    [4] Charu C. Aggarwal, (Editor) Data Streams: Models and Algorithms (Advances in DatabaseSystems). Springer, 2006
    [5] Yunyue Zhu, Dennis Shasha. StatStream: Statistical Monitoring of Thousands of DataStreams in Real Time. In: Proceedings of the 28th International Conference on Very LargeData Bases (VLDB), Hong Kong, China. Morgan Kaufmann, 2002, 358–369
    [6] Nick Koudas, Divesh Srivastava. Data Stream Query Processing: A Tutorial. In: Pro-ceedings of the 29th International Conference on Very Large Data Bases (VLDB), Berlin,Germany. Morgan Kaufmann, 2003, 1149
    [7] S. Muthukrishnan. Data streams: algorithms and applications. In: Proceedings of theFourteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Baltimore,Maryland, USA. ACM/SIAM, 2003, 413–413
    [8] Lukasz Golab, M. Tamer O¨zsu. Issues in data stream management. SIGMOD Record.2003, 32(2):5–14
    [9] Arvind Arasu, Brian Babcock, Shivnath Babu, Mayur Datar, Keith Ito, Itaru Nishizawa,Justin Rosenstein, Jennifer Widom. STREAM: The Stanford Stream Data Manager. In:Proceedings of International Conference on Management of Data (SIGMOD), San Diego,California, USA. ACM Press, 2003, 665
    [10] Sirish Chandrasekaran, Owen Cooper, Amol Deshpande, Michael J. Franklin, Joseph M.Hellerstein, Wei Hong, Sailesh Krishnamurthy, Samuel Madden, Vijayshankar Raman,Frederick Reiss, Mehul A. Shah. TelegraphCQ: Continuous Data?ow Processing for anUncertain World. In: Proceedings of the First Biennial Conference on Innovative DataSystems Research (CIDR), Asilomar, CA, USA. 2003
    [11] Sirish Chandrasekaran, Owen Cooper, Amol Deshpande, Michael J. Franklin, Joseph M.Hellerstein, Wei Hong, Sailesh Krishnamurthy, Samuel Madden, Frederick Reiss, MehulA. Shah. TelegraphCQ: Continuous Data?ow Processing. In: Proceedings of InternationalConference on Management of Data (SIGMOD), San Diego, California, USA. ACM Press,2003, 668
    [12] Charles D. Cranor, Yuan Gao, Theodore Johnson, Vladislav Shkapenyuk, OliverSpatscheck. Gigascope: high performance network monitoring with an SQL interface. In:Proceedings of International Conference on Management of Data (SIGMOD), Madison,Wisconsin. ACM Press, 2002, 623
    [13] Charles D. Cranor, Theodore Johnson, Oliver Spatscheck, Vladislav Shkapenyuk. Gigas-cope: A Stream Database for Network Applications. In: Proceedings of International Con-ference on Management of Data (SIGMOD), San Diego, California, USA. ACM Press,2003, 647–651
    [14] Theodore Johnson, S. Muthukrishnan, Vladislav Shkapenyuk, Oliver Spatscheck. A Heart-beat Mechanism and Its Application in Gigascope. In: Proceedings of the 31st InternationalConference on Very Large Data Bases (VLDB), Trondheim, Norway. ACM Press, 2005,1079–1088
    [15] Daniel J. Abadi, Donald Carney, Ugur C? etintemel, Mitch Cherniack, Christian Convey,Sangdon Lee, Michael Stonebraker, Nesime Tatbul, Stanley B. Zdonik. Aurora: a newmodel and architecture for data stream management. VLDB Journal. 2003, 12(2):120–139
    [16] Daniel J. Abadi, Yanif Ahmad, Magdalena Balazinska, Ugur C? etintemel, Mitch Cherni-ack, Jeong-Hyon Hwang, Wolfgang Lindner, Anurag Maskey, Alex Rasin, Esther Ryvkina,Nesime Tatbul, Ying Xing, Stanley B. Zdonik. The Design of the Borealis Stream Process-ing Engine. In: Proceedings of Second Biennial Conference on Innovative Data SystemsResearch (CIDR), Asilomar, CA, USA. 2005, 277–289
    [17] Arvind Arasu, Shivnath Babu, Jennifer Widom. The CQL continuous query language:semantic foundations and query execution. VLDB Journal. 2006, 15(2):121–142
    [18] Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, Dilys Thomas. Operatorscheduling in data stream systems. VLDB Journal. 2004, 13(4):333–353
    [19] Arvind Arasu, Jennifer Widom. Resource Sharing in Continuous Sliding-Window Aggre-gates. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases(VLDB), Toronto, Canada. Morgan Kaufmann, 2004, 336–347
    [20] Brian Babcock, Mayur Datar, Rajeev Motwani. Load Shedding for Aggregation Queriesover Data Streams. In: Proceedings of the 20th International Conference on Data Engineer-ing (ICDE), Boston, MA, USA. IEEE Computer Society, 2004, 350–361
    [21] Samuel Madden, Mehul A. Shah, Joseph M. Hellerstein, Vijayshankar Raman. Continu-ously adaptive continuous queries over streams. In: Proceedings of International Confer-ence on Management of Data (SIGMOD), Madison, Wisconsin, USA. 2002, 49–60
    [22] Tolga Urhan, Michael J. Franklin. Dynamic Pipeline Scheduling for Improving InteractiveQuery Performance. In: Proceedings of the 27th International Conference on Very LargeData Bases (VLDB), Roma, Italy. Morgan Kaufmann, 2001, 501–510
    [23] Donald Carney, Ugur C? etintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, GregSeidman, Michael Stonebraker, Nesime Tatbul, Stanley B. Zdonik. Monitoring Streams -A New Class of Data Management Applications. In: Proceedings of the 28th InternationalConference on Very Large Data Bases (VLDB), Hong Kong, China. Morgan Kaufmann,2002, 215–226
    [24] Yanif Ahmad, Bradley Berg, Ugur C? etintemel, Mark Humphrey, Jeong-Hyon Hwang, An-jali Jhingran, Anurag Maskey, Olga Papaemmanouil, Alex Rasin, Nesime Tatbul, WenjuanXing, Ying Xing, Stanley B. Zdonik. Distributed operation in the Borealis stream process-ing engine. In: Proceedings of International Conference on Management of Data (SIG-MOD), Baltimore, Maryland, USA. ACM Press, 2005, 882–884
    [25] Jeong-Hyon Hwang, Magdalena Balazinska, Alex Rasin, Ugur C? etintemel, Michael Stone-braker, Stanley B. Zdonik. High-Availability Algorithms for Distributed Stream Processing.In: Proceedings of the 21st International Conference on Data Engineering (ICDE), Tokyo,Japan. IEEE Computer Society, 2005, 779–790
    [26] Magdalena Balazinska, Hari Balakrishnan, Samuel Madden, Michael Stonebraker. Fault-tolerance in the Borealis distributed stream processing system. In: Proceedings of Interna-tional Conference on Management of Data (SIGMOD), Baltimore, Maryland, USA. 2005,13–24
    [27] Jeong-Hyon Hwang, Ying Xing, Ugur C? etintemel, Stanley B. Zdonik. A Cooperative,Self-Configuring High-Availability Solution for Stream Processing. In: Proceedings of the23rd International Conference on Data Engineering (ICDE), The Marmara Hotel, Istanbul,Turkey. IEEE, 2007, 176–185
    [28] Arvind Arasu, Gurmeet Singh Manku. Approximate Counts and Quantiles over SlidingWindows. In: Proceedings of the Twenty-third ACM SIGACT-SIGMOD-SIGART Sympo-sium on Principles of Database Systems (PODS), Paris, France. ACM Press, 2004, 286–296
    [29] Zhihong Chong, Jeffrey Xu Yu, Zhengjie Zhang, Xuemin Lin, Wei Wang, Aoying Zhou.Efficient Computation of -Medians over Data Streams Under Memory Constraints. 2006,vol. 21, 284–296
    [30] Brian Babcock, Mayur Datar, Rajeev Motwani, Liadan O’Callaghan. Maintaining varianceand k-medians over data stream windows. In: Proceedings of the Twenty-Second ACMSIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), SanDiego, CA, USA. ACM Press, 2003, 234–243
    [31] Sudipto Guha, Andrew McGregor. Approximate quantiles and the order of the stream.In: Proceedings of the Twenty-Fifth ACM SIGACT-SIGMOD-SIGART Symposium onPrinciples of Database Systems (PODS), Chicago, Illinois, USA. 2006, 273–279
    [32] Graham Cormode, Flip Korn, S. Muthukrishnan, Divesh Srivastava. Space- and time-efficient deterministic algorithms for biased quantiles over data streams. In: Proceedings ofthe Twenty-Fifth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of DatabaseSystems (PODS), Chicago, Illinois, USA. 2006, 263–272
    [33] Linfeng Zhang, Yong Guan. Variance estimation over sliding windows. In: Proceedings ofthe Twenty-Sixth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of DatabaseSystems (PODS), Beijing, China. 2007, 225–232
    [34] Chris Giannella, Jiawei Han, Jian Pei, Xifeng Yan, Philip S. Yu. Mining Frequent Patternsin Data Streams at Multiple Time. Next Generation Data Mining. 2003:191–212
    [35] Gurmeet Singh Manku, Rajeev Motwani. Approximate Frequency Counts over DataStreams. In: Proceedings of the 28th International Conference on Very Large Data Bases(VLDB), Hong Kong, China. Morgan Kaufmann, 2002, 346–357
    [36] Joong Hyuk Chang, Won Suk Lee. Finding recent frequent itemsets adaptively over onlinedata streams. In: Proceedings of the Ninth ACM SIGKDD International Conference onKnowledge Discovery and Data Mining, Washington, DC, USA. 2003, 487–492
    [37] Graham Cormode, S. Muthukrishnan. What’s hot and what’s not: tracking most frequentitems dynamically. In: Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), San Diego, CA, USA.ACM Press, 2003, 296–306
    [38] Cheqing Jin, Weining Qian, Chaofeng Sha, Jeffrey Xu Yu, Aoying Zhou. Dynamicallymaintaining frequent items over a data stream. In: Proceedings of ACM CIKM InternationalConference on Information and Knowledge Management, New Orleans, Louisiana, USA.2003, 287–294
    [39] Barzan Mozafari, Hetal Thakkar, Carlo Zaniolo. Verifying and Mining Frequent Patternsfrom Large Windows over Data Streams. In: Proceedings of the 24th International Confer-ence on Data Engineering (ICDE), Cancu′n, Me′xico. IEEE, 2008, 179–188
    [40] Pedro Domingos, Geoff Hulten. Mining high-speed data streams. In: Proceedings of thesixth ACM SIGKDD international conference on Knowledge discovery and data mining,Boston, MA, USA. 2000, 71–80
    [41] Geoff Hulten, Laurie Spencer, Pedro Domingos. Mining time-changing data streams. In:Proceedings of the seventh ACM SIGKDD international conference on Knowledge discov-ery and data mining, San Francisco, CA, USA. 2001, 97–106
    [42] Maleq Khan, Qin Ding, William Perrizo. k-nearest Neighbor Classification on Spatial DataStreams Using P-trees. In: Proceedings of the 6th Pacific-Asia Conference of Advancesin Knowledge Discovery and Data Mining (PAKDD), Taipei, Taiwan. Springer, 2002, vol.2336 of Lecture Notes in Computer Science, 517–518
    [43] Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu. On demand classificationof data streams. In: Proceedings of the Tenth ACM SIGKDD International Conference onKnowledge Discovery and Data Mining, Seattle, Washington, USA. 2004, 503–508
    [44] Charu C. Aggarwal, Philip S. Yu. LOCUST: An Online Analytical Processing Frameworkfor High Dimensional Classification of Data Streams. In: Proceedings of the 24th Interna-tional Conference on Data Engineering (ICDE), Cancu′n, Me′xico. IEEE, 2008, 426–435
    [45] Sudipto Guha, Nina Mishra, Rajeev Motwani, Liadan O’Callaghan. Clustering DataStreams. In: Proceedings of the 41st Annual Symposium on Foundations of ComputerScience (FOCS), Redondo Beach, California, USA. IEEE Computer Society, 2000, 359–366
    [46] Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu. A Framework for ClusteringEvolving Data Streams. In: Proceedings of the 29th International Conference on Very LargeData Bases (VLDB), Berlin, Germany. Morgan Kaufmann, 2003, 81–92
    [47] Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu. A Framework for Pro-jected Clustering of High Dimensional Data Streams. In: Proceedings of the ThirtiethInternational Conference on Very Large Data Bases (VLDB), Toronto, Canada. MorganKaufmann, 2004, 852–863
    [48] Olfa Nasraoui, Carlos Rojas. Robust Clustering for Tracking Noisy Evolving Data Streams.In: Proceedings of the Sixth SIAM International Conference on Data Mining (SDM),Bethesda, MD, USA. SIAM, 2006
    [49] Feng Cao, Martin Ester, Weining Qian, Aoying Zhou. Density-Based Clustering over anEvolving Data Stream with Noise. In: Proceedings of the Sixth SIAM International Con-ference on Data Mining (SDM), Bethesda, MD, USA. SIAM, 2006
    [50] Aoying Zhou, Feng Cao, Ying Yan, Chaofeng Sha, Xiaofeng He. Distributed Data StreamClustering: A Fast EM-based Approach. In: Proceedings of the 23rd International Confer-ence on Data Engineering (ICDE), The Marmara Hotel, Istanbul, Turkey. IEEE ComputerSociety, 2007, 736–745
    [51] Yixin Chen, Li Tu. Density-based clustering for real-time stream data. In: Proceedingsof the 13th ACM SIGKDD International Conference on Knowledge Discovery and DataMining, San Jose, California, USA. ACM Press, 2007, 133–142
    [52] Sharmila Subramaniam, Themis Palpanas, Dimitris Papadopoulos, Vana Kalogeraki, Dim-itrios Gunopulos. Online Outlier Detection in Sensor Data Using Non-Parametric Models.In: Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB),Seoul, Korea. ACM Press, 2006, 187–198
    [53] Fabrizio Angiulli, Fabio Fassetti. Detecting distance-based outliers in streams of data. In:Proceedings of the Sixteenth ACM Conference on Information and Knowledge Manage-ment (CIKM), Lisbon, Portugal. ACM Press, 2007, 811–820
    [54] Dragoljub Pokrajac, Aleksandar Lazarevic, Longin Jan Latecki. Incremental Local OutlierDetection for Data Streams. In: Proceedings of the IEEE Symposium on ComputationalIntelligence and Data Mining (CIDM), Honolulu, Hawaii, USA. IEEE, 2007, 504–515
    [55] Jon M. Kleinberg. Bursty and hierarchical structure in streams. In: Proceedings of theEighth ACM SIGKDD International Conference on Knowledge Discovery and Data Min-ing, Edmonton, Alberta, Canada. 2002, 91–101
    [56] Charu C. Aggarwal. A Framework for Diagnosing Changes in Evolving Data Streams. In:Proceedings of International Conference on Management of Data (SIGMOD), San Diego,California, USA. ACM Press, 2003, 575–586
    [57] Yunyue Zhu, Dennis Shasha. Efficient elastic burst detection in data streams. In: Proceed-ings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery andData Mining, Washington, DC, USA. 2003, 336–345
    [58] Gabriel Pui Cheong Fung, Jeffrey Xu Yu, Philip S. Yu, Hongjun Lu. Parameter Free BurstyEvents Detection in Text Streams. In: Proceedings of the 31st International Conference onVery Large Data Bases (VLDB), Trondheim, Norway. ACM Press, 2005, 181–192
    [59] Arno Siebes, Jilles Vreeken, Matthijs van Leeuwen. Item Sets that Compress. In: Proceed-ings of the Sixth SIAM International Conference on Data Mining (SDM), Bethesda, MD,USA. SIAM, 2006
    [60] Xuanhui Wang, ChengXiang Zhai, Xiao Hu, Richard Sproat. Mining correlated burstytopic patterns from coordinated text streams. In: Proceedings of the 13th ACM SIGKDDInternational Conference on Knowledge Discovery and Data Mining, San Jose, California,USA. 2007, 784–793
    [61] Ahmed Metwally, Divyakant Agrawal, Amr El Abbadi. Efficient Computation of Frequentand Top-k Elements in Data Streams. In: Proceedings of the 10th International Conferenceon Database Theory (ICDT), Edinburgh, UK. Springer, 2005, vol. 3363 of Lecture Notes inComputer Science, 398–412
    [62] Gautam Das, Dimitrios Gunopulos, Nick Koudas, Nikos Sarkas. Ad-hoc Top-k QueryAnswering for Data Streams. In: Proceedings of the 33rd International Conference on VeryLarge Data Bases (VLDB), University of Vienna, Austria. ACM Press, 2007, 183–194
    [63] Xuemin Lin, Yidong Yuan, Wei Wang, Hongjun Lu. Stabbing the Sky: Efficient SkylineComputation over Sliding Windows. In: Proceedings of the 21st International Conferenceon Data Engineering (ICDE), Tokyo, Japan. IEEE Computer Society, 2005, 502–513
    [64] Yufei Tao, Dimitris Papadias. Maintaining Sliding Window Skylines on Data Streams.IEEE Trans Knowl Data Eng. 2006, 18(2):377–391
    [65] Like Gao, Zhengrong Yao, Xiaoyang Sean Wang. Evaluating continuous nearest neighborqueries for streaming time series via pre-fetching. In: Proceedings of International Con-ference on Information and Knowledge Management (CIKM), McLean, VA, USA. ACMPress, 2002, 485–492
    [66] Xiaoyan Liu, Hakan Ferhatosmanoglu. Efficient k-NN Search on Streaming Data Series.In: Proceedings of the 8th International Symposium on Advances in Spatial and TemporalDatabases (SSTD), Santorini Island, Greece. Springer, 2003, vol. 2750 of Lecture Notes inComputer Science, 83–101
    [67] Nick Koudas, Beng Chin Ooi, Kian-Lee Tan, Rui Zhang. Approximate NN queries onStreams with Guaranteed Error/performance Bounds. In: Proceedings of the Thirtieth In-ternational Conference on Very Large Data Bases (VLDB), Toronto, Canada. Morgan Kauf-mann, 2004, 804–815
    [68] Christian Bo¨hm, Beng Chin Ooi, Claudia Plant, Ying Yan. Efficiently Processing Continu-ous k-NN Queries on Data Streams. In: Proceedings of the 23rd International Conferenceon Data Engineering (ICDE), The Marmara Hotel, Istanbul, Turkey. IEEE Computer Soci-ety, 2007, 156–165
    [69] Yasushi Sakurai, Christos Faloutsos, Masashi Yamamuro. Stream Monitoring under theTime Warping Distance. In: Proceedings of the 23rd International Conference on DataEngineering (ICDE), The Marmara Hotel, Istanbul, Turkey. IEEE Computer Society, 2007,1046–1055
    [70] Dongseop Kwon, Sangjun Lee, Sukho Lee. Indexing the Current Positions of Moving Ob-jects Using the Lazy Update R-tree. In: Proceedings of the Third International Conferenceon Mobile Data Management (MDM), Singapore. IEEE Computer Society, 2002, 113–120
    [71] Mong-Li Lee, Wynne Hsu, Christian S. Jensen, Bin Cui, Keng Lik Teo. Supporting Fre-quent Updates in R-Trees: A Bottom-Up Approach. In: Proceedings of the 29th Interna-tional Conference on Very Large Data Bases (VLDB), Berlin, Germany. Morgan Kaufmann,2003, 608–619
    [72] Bin Lin, Jianwen Su. Handling frequent updates of moving objects. In: Proceedings ofInternational Conference on Information and Knowledge Management (CIKM), Bremen,Germany. ACM Press, 2005, 493–500
    [73] Xiaopeng Xiong, Walid G. Aref. R-trees with Update Memos. In: Proceedings of the22nd International Conference on Data Engineering (ICDE), Atlanta, Georgia, USA. IEEEComputer Society, 2006, 22
    [74] Laurynas Biveinis, Simonas Sˇaltenis, Christian S. Jensen. Main-Memory Operation Buffer-ing for Efficient R-Tree Update. In: Proceedings of the 33rd International Conference onVery Large Data Bases (VLDB), University of Vienna, Austria. ACM Press, 2007, 591–602
    [75] Travis Gagie. Bounds for Compression in Streaming Models. 2007, vol. abs/0711.3338
    [76] Binh Vo, Gurmeet Singh Manku. RadixZip: Linear-Time Compression of Token Streams.In: Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB),University of Vienna, Austria. ACM Press, 2007, 1162–1172
    [77] Richard Cocci, Thanh Tran, Yanlei Diao, Prashant J. Shenoy. Efficient Data Interpretationand Compression over RFID Streams. In: Proceedings of the 24th International Conferenceon Data Engineering (ICDE), Cancu′n, Me′xico. IEEE, 2008, 1445–1447
    [78] Aoying Zhou, Zhiyuan Cai, Li Wei, Weining Qian. M-Kernel Merging: Towards DensityEstimation over Data Streams. In: Proceedings of the Eighth International Conference onDatabase Systems for Advanced Applications (DASFAA), Kyoto, Japan. IEEE ComputerSociety, 2003, 285–292
    [79] Cecilia Magdalena Procopiuc, Octavian Procopiuc. Density Estimation for Spatial DataStreams. In: Proceedings of the 9th International Symposium on Advances in Spatial andTemporal Databases (SSTD), Angra dos Reis, Brazil. Springer, 2005, vol. 3633 of LectureNotes in Computer Science, 109–126
    [80] Christoph Heinz, Bernhard Seeger. Adaptive Wavelet Density Estimators over DataStreams. In: Proceedings of the 19th International Conference on Scientific and StatisticalDatabase Management (SSDBM), Banff, Canada. IEEE Computer Society, 2007, 35
    [81] Christoph Heinz, Bernhard Seeger. Cluster Kernels: Resource-Aware Kernel Density Esti-mators over Streaming Data. IEEE Trans Knowl Data Eng. 2008, 20(7):880–893
    [82] Jiawei Han, Jian Pei, Yiwen Yin. Mining Frequent Patterns without Candidate Generation.In: Proceedings of International Conference on Management of Data (SIGMOD), Texas,USA. 2000, 1–12
    [83] Tian Zhang, Raghu Ramakrishnan, Miron Livny. BIRCH: An Efficient Data ClusteringMethod for Very Large Databases. In: Proceedings of International Conference on Man-agement of Data (SIGMOD), Montreal, Quebec, Canada. ACM Press, 1996, 103–114
    [84]杨宜东,孙志挥,张净.基于核密度估计的分布数据流离群点检测.计算机研究与发展. 2005, 42(9):1498–1504
    [85] Eamonn J. Keogh, Shruti Kasetty. On the need for time series data mining benchmarks: asurvey and empirical demonstration. In: Proceedings of the Eighth ACM SIGKDD Interna-tional Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada.2002, 102–111
    [86] Volker Gaede, Oliver Gu¨nther. Multidimensional Access Methods. ACM Computing Sur-veys. 1998, 30(2):170–231
    [87] Christian Bo¨hm, Stefan Berchtold, Daniel A. Keim. Searching in high-dimensional spaces:Index structures for improving the performance of multimedia databases. ACM ComputingSurveys. 2001, 33(3):322–373
    [88] Edgar Cha′vez, Gonzalo Navarro, Ricardo A. Baeza-Yates, Jose′L. Marroqu′?n. Searching inmetric spaces. ACM Computing Surveys. 2001, 33(3):273–321
    [89] Antonin Guttman. R-Trees: A Dynamic Index Structure for Spatial Searching. In: Pro-ceedings of International Conference on Management of Data (SIGMOD), Boston, Mas-sachusetts. ACM Press, 1984, 47–57
    [90] Stephan Bo¨rzso¨nyi, Donald Kossmann, Konrad Stocker. The Skyline Operator. In: Pro-ceedings of the 17th International Conference on Data Engineering (ICDE), Heidelberg,Germany. IEEE Computer Society, 2001, 421–430
    [91] D.M. Hawkins. Identification of Outliers. London: Chapman and Hall, 1980
    [92] Vic Barnett, Toby Lewis. Outliers in Statistical Data. Wiley, 1994, 3 edn.
    [93] Jiawei Han, Micheline Kamber. Data Mining : Concepts and Techniques (Second Edition).Morgan Kaufmann, 2005
    [94] Edwin M. Knorr, Raymond T. Ng. Algorithms for Mining Distance-Based Outliers in LargeDatasets. In: Proceedings of the 24rd International Conference on Very Large Data Bases(VLDB), New York City, New York, USA. Morgan Kaufmann, 1998, 392–403
    [95] Jon Louis Bentley. Multidimensional Binary Search Trees Used for Associative Searching.Communications of the ACM (CACM). 1975, 18(9):509–517
    [96] Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, Jo¨rg Sander. LOF: IdentifyingDensity-Based Local Outliers. In: Proceedings of the International Conference on Manage-ment of Data (SIGMOD), Dallas, Texas, USA. ACM Press, 2000, 93–104
    [97] Spiros Papadimitriou, Hiroyuki Kitagawa, Phillip B. Gibbons, Christos Faloutsos. LOCI:Fast Outlier Detection Using the Local Correlation Integral. In: Proceedings of the 19thInternational Conference on Data Engineering (ICDE), Bangalore, India. IEEE ComputerSociety, 2003, 315–326
    [98]薛安荣,鞠时光,何伟华,陈伟鹤.局部离群点挖掘算法研究.计算机学报. 2007,30(8):1455–1463
    [99]杨宜东,孙志挥,朱玉全,杨明,张柏礼.基于动态网格的数据流离群点快速检测算法.软件学报. 2006, 17(8):1796–1803
    [100] Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger. The R*-Tree:An Efficient and Robust Access Method for Points and Rectangles. In: Proceedings of In-ternational Conference on Management of Data (SIGMOD), Atlantic City, NJ. ACM Press,1990, 322–331
    [101]倪巍伟,陆介平,陈耿,孙志挥.基于k均值分区的数据流离群点检测算法.计算机研究与发展. 2006, 43(9):1639–1643
    [102] Emanuel Parzen. On the estimation of a probability density function and the mode. Annalsof Mathematical Statistics. 1962, 33(3):1065–1076
    [103] Longin Jan Latecki, Aleksandar Lazarevic, Dragoljub Pokrajac. Outlier Detection withKernel Density Functions. In: Proceedings of the 5th International Conference of MachineLearning and Data Mining in Pattern Recognition (MLDM), Leipzig, Germany. Springer,2007, vol. 4571 of Lecture Notes in Computer Science, 61–75
    [104] David W. Scott. Multivariate Density Estimation: Theory, Practice, and Visualization.Wiley-Interscience, 1992
    [105] I. Gijbels, A. Pope, M. Wand. Automatic Forecasting via Exponential Smoothing: Asymp-totic Properties, 1997
    [106] Lillian Lee. On the effectiveness of the skew divergence for statistical language analysis. In:Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics(AISTATS), Key West, Florida, USA. 2001
    [107] I. Csisza′r. Why least squares and maximum entropy? An axiomatic approach to inference-for linear inverse problems. Annals of Statistics. 1991, 19(4):2032–2066
    [108] Surajit Chaudhuri, Nilesh N. Dalvi, Raghav Kaushik. Robust Cardinality and Cost Estima-tion for Skyline Operator. In: Proceedings of the 22nd International Conference on DataEngineering (ICDE), Atlanta, GA, USA. IEEE Computer Society, 2006, 64
    [109] H. T. Kung, Fabrizio Luccio, Franco P. Preparata. On Finding the Maxima of a Set ofVectors. Journal of the ACM (JACM). 1975, 22(4):469–476
    [110] Franco P. Preparata, Michael Ian Shamos. Computational Geometry - An Introduction.Springer, 1985
    [111] Donald Kossmann, Frank Ramsak, Steffen Rost. Shooting Stars in the Sky: An OnlineAlgorithm for Skyline Queries. In: Proceedings of the 28th International Conference onVery Large Data Bases (VLDB), Hong Kong, China. Morgan Kaufmann, 2002, 275–286
    [112] Wen Jin, Jiawei Han, Martin Ester. Mining Thick Skylines over Large Databases. In:Proceedings of the 8th European Conference on Principles and Practice of Knowledge Dis-covery in Databases (PKDD), Pisa, Italy. Springer, 2004, vol. 3202 of Lecture Notes inComputer Science, 255–266
    [113] Wolf-Tilo Balke, Jason Xin Zheng, Ulrich Gu¨ntzer. Approaching the Efficient Frontier:Cooperative Database Retrieval Using High-Dimensional Skylines. In: Proceedings of the10th International Conference of Database Systems for Advanced Applications (DASFAA),Beijing, China. Springer, 2005, vol. 3453 of Lecture Notes in Computer Science, 410–421
    [114] Mohamed A. Soliman, Ihab F. Ilyas, Nick Koudas. Finding Skyline and Top-k Bargain-ing Solutions. In: Proceedings of the 23rd International Conference on Data Engineering(ICDE), The Marmara Hotel, Istanbul, Turkey. IEEE Computer Society, 2007, 1263–1267
    [115] Ronald Fagin. Fuzzy Queries in Multimedia Database Systems. In: Proceedings of theSeventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Sys-tems (PODS), Seattle, Washington. ACM Press, 1998, 1–10
    [116] King Lum Cheung, Ada Wai-Chee Fu. Enhanced Nearest Neighbour Search on the R-tree.SIGMOD Record. 1998, 27(3):16–21
    [117] Jon Louis Bentley, H. T. Kung, Mario Schkolnick, Clark D. Thompson. On the AverageNumber of Maxima in a Set of Vectors and Applications. Journal of the ACM (JACM).1978, 25(4):536–543
    [118] Pin-Kwang Eng, Beng Chin Ooi, Kian-Lee Tan. Indexing for progressive skyline computa-tion. Data & Knowledge Engineering (DKE). 2003, 46(2):169–201
    [119] Dimitris Papadias, Yufei Tao, Greg Fu, Bernhard Seeger. Progressive skyline computationin database systems. ACM Transactions on Database Systems (TODS). 2005, 30(1):41–82
    [120] Kian-Lee Tan, Pin-Kwang Eng, Beng Chin Ooi. Efficient Progressive Skyline Computation.In: Proceedings of the 27th International Conference on Very Large Data Bases (VLDB),Roma, Italy. Morgan Kaufmann, 2001, 301–310
    [121] Dimitris Papadias, Yufei Tao, Greg Fu, Bernhard Seeger. An Optimal and Progressive Al-gorithm for Skyline Queries. In: Proceedings of International Conference on Managementof Data (SIGMOD), San Diego, California, USA. ACM Press, 2003, 467–478
    [122] Jian Pei, Wen Jin, Martin Ester, Yufei Tao. Catching the Best Views of Skyline: A Se-mantic Approach Based on Decisive Subspaces. In: Proceedings of the 31st InternationalConference on Very Large Data Bases (VLDB), Trondheim, Norway. ACM Press, 2005,253–264
    [123] Yidong Yuan, Xuemin Lin, Qing Liu, Wei Wang, Jeffrey Xu Yu, Qing Zhang. EfficientComputation of the Skyline Cube. In: Proceedings of the 31st International Conference onVery Large Data Bases (VLDB), Trondheim, Norway. ACM Press, 2005, 241–252
    [124] Tian Xia, Donghui Zhang. Refreshing the sky: the compressed skycube with efficientsupport for frequent updates. In: Proceedings of International Conference on Managementof Data (SIGMOD), Chicago, Illinois, USA. ACM Press, 2006, 491–502
    [125] Jian Pei, Yidong Yuan, Xuemin Lin, Wen Jin, Martin Ester, Qing Liu, Wei Wang, Yufei Tao,Jeffrey Xu Yu, Qing Zhang. Towards multidimensional subspace skyline analysis. ACMTransactions on Database Systems (TODS). 2006, 31(4):1335–1381
    [126] Jian Pei, Ada Wai-Chee Fu, Xuemin Lin, Haixun Wang. Computing Compressed Multidi-mensional Skyline Cubes Efficiently. In: Proceedings of the 23rd International Conferenceon Data Engineering (ICDE), The Marmara Hotel, Istanbul, Turkey. IEEE Computer Soci-ety, 2007, 96–105
    [127] Yufei Tao, Xiaokui Xiao, Jian Pei. Efficient Skyline and Top-k Retrieval in Subspaces.IEEE Transactions on Knowledge and Data Engineering (TKDE). 2007, 19(8):1072–1088
    [128] Michael D. Morse, Jignesh M. Patel, William I. Grosky. Efficient Continuous SkylineComputation. In: Proceedings of the 22nd International Conference on Data Engineering(ICDE), Atlanta, GA, USA. IEEE Computer Society, 2006, 108
    [129] Ken C. K. Lee, Baihua Zheng, Huajing Li, Wang-Chien Lee. Approaching the Skyline inZ Order. In: Proceedings of the 33rd International Conference on Very Large Data Bases(VLDB), University of Vienna, Austria. ACM Press, 2007, 279–290
    [130]黄震华,汪卫. Skyline查询处理数据立方体代数.计算机研究与发展. 2007,44(6):990–999
    [131] Cuiping Li, Beng Chin Ooi, Anthony K. H. Tung, Shan Wang. DADA: a data cube for dom-inant relationship analysis. In: Proceedings of International Conference on Management ofData (SIGMOD), Chicago, Illinois, USA. ACM Press, 2006, 659–670
    [132] Cuiping Li, Anthony K. H. Tung, Wen Jin, Martin Ester. On Dominating Your Neighbor-hood Profitably. In: Proceedings of the 33rd International Conference on Very Large DataBases (VLDB), University of Vienna, Austria. ACM Press, 2007, 818–829
    [133]周红福,宫学庆,郑凯,周傲英.基于高维空间的在线高效子空间Skyline算法―CSky.计算机学报. 2007, 30(8):1409–1417
    [134]孙圣力,黄震华,李金玖,郭建奎,朱扬勇.数据流上高效计算子空间Skyline的算法.计算机学报. 2007, 30(8):1418–1428
    [135] Jan Chomicki, Parke Godfrey, Jarek Gryz, Dongming Liang. Skyline with Presorting. In:Proceedings of the 19th International Conference on Data Engineering (ICDE), Bangalore,India. IEEE Computer Society, 2003, 717–816
    [136] Parke Godfrey, Ryan Shipley, Jarek Gryz. Maximal Vector Computation in Large Data Sets.In: Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim,Norway. ACM Press, 2005, 229–240
    [137] Wolf-Tilo Balke, Ulrich Gu¨ntzer, Jason Xin Zheng. Efficient Distributed Skylining for WebInformation Systems. In: Proceedings of the 9th International Conference on ExtendingDatabase Technology (EDBT), Heraklion, Crete, Greece. Springer, 2004, vol. 2992 of Lec-ture Notes in Computer Science, 256–273
    [138] Eric Lo, Kevin Y. Yip, King-Ip Lin, David W. Cheung. Progressive skylining over Web-accessible databases. Data & Knowledge Engineering (DKE). 2006, 57(2):122–147
    [139] Zhiyong Huang, Christian S. Jensen, Hua Lu, Beng Chin Ooi. Skyline Queries AgainstMobile Lightweight Devices in MANETs. In: Proceedings of the 22nd International Con-ference on Data Engineering (ICDE), Atlanta, GA, USA. IEEE Computer Society, 2006
    [140] Ping Wu, Caijie Zhang, Ying Feng, Ben Y. Zhao, Divyakant Agrawal, Amr El Abbadi.Parallelizing Skyline Queries for Scalable Distribution. In: Proceedings of the 10th In-ternational Conference on Extending Database Technology (EDBT), Munich, Germany.Springer, 2006, vol. 3896 of Lecture Notes in Computer Science, 112–130
    [141] Xuegang Huang, Christian S. Jensen. In-Route Skyline Querying for Location-Based Ser-vices. In: Proceedings of the 4th International Workshop Web and Wireless GeographicalInformation Systems (W2GIS), Goyang, Korea. Springer, 2004, vol. 3428 of Lecture Notesin Computer Science, 120–135
    [142] Evangelos Dellis, Bernhard Seeger. Efficient Computation of Reverse Skyline Queries.In: Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB),University of Vienna, Austria. ACM Press, 2007, 291–302
    [143] Jing Yu, Xin Liu, Guo hua Liu. A Window-based Algorithm for Skyline Queries. In:Proceedings of the Sixth International Conference on Parallel and Distributed Computing,Applications and Technologies (PDCAT), Dalian, China. IEEE Computer Society, 2005,907–909
    [144]刘欣,余靖,刘国华.基于窗口查询的轮廓查询算法.燕山大学学报. 2005,29(5):398–402
    [145] Zhenhua Huang, Wei Wang. A Novel Incremental Maintenance Algorithm of SkyCube. In:Proceedings of the 17th International Conference Database and Expert Systems Applica-tions (DEXA), Krako′w, Poland. Springer, 2006, vol. 4080 of Lecture Notes in ComputerScience, 781–790
    [146] Zhenhua Huang, Wei Wang. Mining the Useful Skyline Set Based on the Acceptable Dif-ference. In: Proceedings of the Second International Conference Advanced Data Miningand Applications (ADMA), Xi’an, China. Springer, 2006, vol. 4093 of Lecture Notes inComputer Science, 927–933
    [147] Charu C. Aggarwal, Philip S. Yu. A Framework for Clustering Uncertain Data Streams. In:Proceedings of the 24th International Conference on Data Engineering (ICDE), Cancu′n,Me′xico. IEEE, 2008, 150–159
    [148] Paolo Ciaccia, Marco Patella, Pavel Zezula. M-tree: An Efficient Access Method for Sim-ilarity Search in Metric Spaces. In: Proceedings of the 23rd International Conference onVery Large Data Bases (VLDB), Athens, Greece. Morgan Kaufmann, 1997, 426–435
    [149] Rakesh Agrawal, King-Ip Lin, Harpreet S. Sawhney, Kyuseok Shim. Fast Similarity Searchin the Presence of Noise, Scaling, and Translation in Time-Series Databases. In: Pro-ceedings of the 21th International Conference on Very Large Data Bases (VLDB), Zurich,Switzerland. Morgan Kaufmann, 1995, 490–501
    [150] Rakesh Agrawal, Christos Faloutsos, Arun N. Swami. Efficient Similarity Search In Se-quence Databases. In: Proceedings of the 4th International Conference of Foundations ofData Organization and Algorithms (FODO), Chicago, Illinois, USA. Springer, 1993, vol.730 of Lecture Notes in Computer Science, 69–84
    [151] Dina Q. Goldin, Paris C. Kanellakis. On Similarity Queries for Time-Series Data: Con-straint Specification and Implementation. In: Proceedings of the First International Confer-ence of Principles and Practice of Constraint Programming (CP), Cassis, France. Springer,1995, vol. 976 of Lecture Notes in Computer Science, 137–153
    [152] Davood Rafiei. On Similarity-Based Queries for Time Series Data. In: Proceedings ofthe 15th International Conference on Data Engineering (ICDE), Sydney, Austrialia. IEEEComputer Society, 1999, 410–417
    [153] Woong-Kee Loh, Sang-Wook Kim, Kyu-Young Whang. Index Interpolation: An Approachto Subsequence Matching Supporting Normalization Transform in Time-Series Databases.In: Proceedings of the International Conference on Information and Knowledge Manage-ment (CIKM), McLean, VA, USA. ACM Press, 2000, 480–487
    [154] Eamonn J. Keogh, Michael J. Pazzani. An Enhanced Representation of Time Series WhichAllows Fast and Accurate Classification, Clustering and Relevance Feedback. In: Pro-ceedings of the Fourth International Conference on Knowledge Discovery and Data Mining(KDD), New York City, New York, USA. AAAI Press, 1998, 239–243
    [155] Donald J. Berndt, James Clifford. Using Dynamic Time Warping to Find Patterns in TimeSeries. In: Proceedings of Knowledge Discovery in Databases (KDD-Workshop), Seattle,Washington. AAAI Press, 1994, 359–370
    [156] Mohammed Waleed Kadous. Learning Comprehensible Descriptions of Multivariate TimeSeries. In: Proceedings of the Sixteenth International Conference on Machine Learning(ICML), Bled, Slovenia. Morgan Kaufmann, 1999, 454–463
    [157] Eamonn J. Keogh, Michael J. Pazzani. Scaling up Dynamic Time Warping to MassiveDataset. In: Proceedings of the Third European Conference of Principles of Data Miningand Knowledge Discovery (PKDD), Prague, Czech Republic. Springer, 1999, vol. 1704 ofLecture Notes in Computer Science, 1–11
    [158] Eamonn J. Keogh, Michael J. Pazzani. Scaling up dynamic time warping for datamin-ing applications. In: Proceedings of the sixth ACM SIGKDD international conference onKnowledge discovery and data mining, Boston, MA, USA. ACM Press, 2000, 285–289
    [159] Sanghyun Park, Dongwon Lee, Wesley W. Chu. Fast retrieval of similar subsequences inlong sequence databases. In: Proceedings of the 3rd IEEE Knowledge and Data EngineeringExchange Workshop. IEEE Computer Society, 1999, 60–67
    [160] Sanghyun Park, Sang-Wook Kim, Wesley W. Chu. Segment-based approach for subse-quence searches in sequence databases. In: Proceedings of the 16th ACM Symposium onApplied Computing (SAC), Las Vegas, NV, USA. ACM Press, 2001, 248–252
    [161] Byoung-Kee Yi, H. V. Jagadish, Christos Faloutsos. Efficient Retrieval of Similar TimeSequences Under Time Warping. In: Proceedings of the Fourteenth International Confer-ence on Data Engineering (ICDE), Orlando, Florida, USA. IEEE Computer Society, 1998,201–208
    [162] Sanghyun Park, Wesley W. Chu, Jeehee Yoon, Chih-Cheng Hsu. Efficient Searches forSimilar Subsequences of Different Lengths in Sequence Databases. In: Proceedings of the16th International Conference on Data Engineering (ICDE), San Diego, California, USA.IEEE Computer Society, 2000, 23–32
    [163] Sanghyun Park, Wesley W. Chu, Jeehee Yoon, Jung-Im Won. Similarity search of time-warped subsequences via a suffix tree. Information Systems. 2003, 28(7):867–883
    [164] Sanghyun Park, Sang-Wook Kim, June-Suh Cho, Sriram Padmanabhan. Prefix-Querying:An Approach for Effective Subsequence Matching Under Time Warping in SequenceDatabases. In: Proceedings of the International Conference on Information and Knowl-edge Management (CIKM), Atlanta, Georgia, USA. ACM Press, 2001, 255–262
    [165] Sumit Ganguly, Minos N. Garofalakis, Rajeev Rastogi. Processing Set Expressions overContinuous Update Streams. In: Proceedings of International Conference on Managementof Data(SIGMOD), San Diego, California, USA. ACM Press, 2003, 265–276
    [166] M. Hoffman, S. Muthukrishnan, Rajeev Raman. Location Streams: Models and Algo-rithms. Tech. rep., DIMACS TR: 2004-28, 2004
    [167] Sumit Ganguly. Counting Distinct Items over Update Streams. Theoretical Computer Sci-ence. 2007, 378(3):211–222
    [168] Sumit Ganguly, Abhayendra N. Singh, Satyam Shankar. Finding Frequent Items over Gen-eral Update Streams. In: Proceedings of the 20th International Conference on Scientific andStatistical Database Management(SSDBM), Hong Kong, China. Springer, 2008, vol. 5069of Lecture Notes in Computer Science, 204–221
    [169] Xiaopeng Xiong, Mohamed F. Mokbel, Walid G. Aref. LUGrid: Update-tolerant Grid-based Indexing for Moving Objects. In: Proceedings of the 7th International Conference onMobile Data Management (MDM), Nara, Japan. IEEE Computer Society, 2006, 13
    [170] Junfeng Dong, Xiaohui Yu. CSR+-tree: Cache-conscious Indexing for High-dimensionalSimilarity Search. In: Proceedings of the 19th International Conference on Scientific andStatistical Database Management (SSDBM), Banff, Canada. IEEE Computer Society, 2007,14–23
    [171] Sumit Ganguly, Anirban Majumder. CR-precis: A deterministic summary structure forupdate data streams. The Computing Research Repository (CoRR). 2006
    [172]张明波,陆锋,申排伟,程昌秀. R树家族的演变和发展.计算机学报. 2005,28(3):289–300
    [173] David A. White, Ramesh Jain. Similarity Indexing with the SS-tree. In: Proceedings of theTwelfth International Conference on Data Engineering (ICDE), New Orleans, Louisiana.IEEE Computer Society, 1996, 516–523
    [174] Norio Katayama, Shin’ichi Satoh. The SR-tree: An Index Structure for High-DimensionalNearest Neighbor Queries. In: Proceedings of International Conference on Management ofData (SIGMOD), Tucson, Arizona, USA. ACM Press, 1997, 369–380
    [175] Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, David A. Wood. DBMSs on a ModernProcessor: Where Does Time Go? In: Proceedings of the 25th International Conferenceon Very Large Data Bases (VLDB), Edinburgh, Scotland, UK. Morgan Kaufmann, 1999,266–277
    [176] Jun Rao, Kenneth A. Ross. Cache Conscious Indexing for Decision-Support in MainMemory. In: Proceedings of the 25th International Conference on Very Large Data Bases(VLDB), Edinburgh, Scotland, UK. Morgan Kaufmann, 1999, 78–89
    [177] Jun Rao, Kenneth A. Ross. Making B+-Trees Cache Conscious in Main Memory. In: Pro-ceedings of International Conference on Management of Data (SIGMOD), Dallas, Texas,USA. ACM Press, 2000, 475–486
    [178] Yasushi Sakurai, Masatoshi Yoshikawa, Shunsuke Uemura, Haruhiko Kojima. The A-tree:An Index Structure for High-Dimensional Spaces Using Relative Approximation. In: Pro-ceedings of the 26th International Conference on Very Large Data Bases (VLDB), Cairo,Egypt. Morgan Kaufmann, 2000, 516–526
    [179] Kihong Kim, Sang Kyun Cha, Keunjoo Kwon. Optimizing Multidimensional Index Treesfor Main Memory Access. In: Proceedings of International Conference on Management ofData (SIGMOD), Santa Barbara, California, USA. ACM Press, 2001, 139–150
    [180] Trishul M. Chilimbi, Bob Davidson, James R. Larus. Cache-Conscious Structure Defini-tion. In: Proceedings of SIGPLAN Conference on Programming Language Design andImplementation (PLDI), Atlanta, Georgia, USA. ACM Press, 1999, 13–24
    [181] Simonas Saltenis, Christian S. Jensen, Scott T. Leutenegger, Mario A. Lopez. Indexing thePositions of Continuously Moving Objects. In: Proceedings of International Conference onManagement of Data (SIGMOD), Dallas, Texas, USA. ACM Press, 2000, 331–342
    [182] Yufei Tao, Dimitris Papadias, Jimeng Sun. The TPR*-Tree: An Optimized Spatio-TemporalAccess Method for Predictive Queries. In: Proceedings of the 29th International Conferenceon Very Large Data Bases (VLDB), Berlin, Germany. Morgan Kaufmann, 2003, 790–801
    [183]张巨,肖予钦,景宁,陈宏盛.面向层次编制移动对象的混合特征索引方法.软件学报. 2004, 15(3):371–378
    [184]廖巍,熊伟,景宁,陈宏盛,钟志农.支持频繁更新的移动对象混合索引方法.计算机研究与发展. 2006, 43(5):888–893
    [185]廖巍,唐桂芬,景宁,钟志农.基于速度分布的移动对象混合索引方法.计算机学报. 2007, 30(4):661–671
    [186] Thomas Brinkhoff. Generating Network-Based Moving Objects. In: Proceedings of the12th International Conference on Scientific and Statistical Database Management (SS-DBM), Berlin, Germany. IEEE Computer Society, 2000, 253–255