分布式数据流查询处理若干关键技术的研究

设为首页

收藏本站

网站地图 | English | 公务邮箱

远程访问

NSTL服务站

分布式数据流查询处理若干关键技术的研究

详细信息本馆镜像全文| 推荐本文 | | 获取CNKI官网全文

英文题名：Research on Some Key Techniques of Distributed Data Stream for Query Processing
作者：杨颖
论文级别：博士
学科专业名称：控制理论与控制工程
中文关键词：分布式数据流 ; 概要数据结构 ; 离散小波变换 ; 语义查询 ; 一致哈希 ; 中间件
英文关键词：Distributed Data Stream ; Synopsis Data Structure ; DWT Semantic Query ; Consistent Hash ; Middleware
学位年度：2006
导师：乐嘉锦
学科代码：081101
学位授予单位：东华大学
论文提交日期：2006-03-01

摘要

随着大规模网络的发展和Web的广泛应用，在网络监控、入侵检测、传感器网络、通讯数据管理、股票分析等应用领域中产生了一种新型数据—数据流(或流数据)，如关系元组、传感器读入值、网络性能参数、电话记录和股票交易数据等。与传统数据库应用模型不同，数据流模型具有以下特点：(1)数据连续、实时到达；(2)数据量大、无限制并且难以预测；(3)数据一经处理，除非特意保存，否则不能被再次取出处理，即一次性处理(one-pass)，或者再次提取数据的代价昂贵。如何对这些流数据进行存储、查询处理已经成为当前国际数据库研究领域的热点问题。
     在许多实际应用中，如决策支持系统、查询优化等，用户并不需要获得确切值，而只需要一个近似值。因此，数据流分析和管理的核心是设计一次扫描算法，即在一个远小于数据规模的内存空间里不断更新一个代表数据集的结构—概要数据结构，使得在任何时候都能够根据这个结构快速实时地获得近似查询结果。如果流的长度为N，则概要数据结构的规模大小不超过0(polylog(N))，并且处理流上每一组数据的时间不超过0(polylog(N))。
     传统数据库中的查询主要是一次查询，即系统根据当前数据集合的快照给出查询结果，并将该结果返回给用户。而数据流的查询为连续查询，即查询随着新数据的到来而不断的返回查询结果。连续查询是数据流上特有的操作，具有长期运行的特点。由于数据流环境中的数据集不是静态的，而是不断有数据插入和更新。用户需要的也不是在某个时刻的静态查询结果，而是对整个数据流的一个动态监测，随着数据流的不断变化持续地产生查询结果。
     现有的数据流的研究主要为集中式的流数据系统，而数据流的本质是分布式的，越来越多如传感器网络、数据通讯、Internet流量分析和Web日志等的大量数据都来自不同的远程数据源，因此，需要构建分布式数据流查询处理的中间件以支持上述各种应用。
     P2P技术利用互联网的终端机来建立一个庞大的分布式计算网络，并对迅速涌出的大量信息进行处理。这些计算机(即对等点)在网络中处于同等的地位，各自拥有独立的网络自主权，以解决把所有的计算压力全部加在服务器一端所造成的瓶颈问题。P2P以其可扩展性、通信负载平衡，资源的高利用率以及由基于内容的路由机制所提供的动态变化的适应性等特性成为构建中间件的良好平台，以便在减少网络带宽和网络连接所消耗的计算资源情况下，提供快速有效的数据流查询处理的实时响应。
     本论文以分布式数据流为主要研究对象，分析了国内外的研究现状，从目前存在的问题和不足出发，研究数据流基于时间变化的特性，监测当前流入的数据，探索数据流变化的表示与建模方法，分析数据进化和变化的趋势，并对未来流入的数据进行预测。在大规模分布式环境中，研究时间和空间复杂度最小的分布式数据流查询处理和挖掘算法。一方面，研究小波分解技术，利用小波系数的近似处理方法构建和维护小波直方图，以获得好的精确度，并且将其扩展到多维直方图的构建和维护，解决传统的直方图技术难以解决的问题，并利用小波系数构造数据流集的概要，建立一个复合索引结构来响应各种查询；还研究小波多分辨分析思想，构造一种小波神经网络模型，解决了传统神经网络中隐层节点数难以确定的问题，初步建立分布式时间序列数据流的预测模型。另一方面，运用草图技术解决在数据流上的聚集查询等难点问题。研究分布式数据流中频繁项的发现算法，通过设置精确梯度来减少通信开销，实现数据流查询的实时响应。同时，以P2P环境的Chord网络结构和协议为平台，研究分布式数据流挖掘和及时响应查询处理的中间件，探索在对等计算系统中提供流数据的近似查询功能所涉及到的数据和查询路由、定位与查找、索引及数据流概要的映射等关键技术问题。具体来说，本论文的主要创新点在于以下四个方面：
     (1)研究了基于小波技术的分布式数据流的查询处理算法。首先通过离散小波变换理论与DWT分解哈尔小波方法获得小波系数，然后分析了数据流的计算模型，形式化了数据流的查询模型。在此基础上，提出了一种新的方法来构造数据流集的概要，建立一种复合索引结构来处理内积查询和相似查询。此外，还结合小波神经网络WNN良好的时频局部化性质以及神经网络的自学习功能，初步建立适应于时间序列数据流的预测模型。
     (2)研究了基于草图技术的分布式数据流的聚集查询算法。首先分析了基于草图的近似处理算法，然后利用随机技术，在数据流到达时实时计算数据的伪草图概要。在此基础上，提出新颖的草图分割技术，通过属性值域的智能分割来减小分割后的自联接规模以及为每个分割的独立草图公平地分配存储空间两个方面来保证近似估算质量。
     (3)研究了大规模分布式数据流中频繁项的发现算法。通过对单个数据流频繁项的发现算法的分析，形式化地定义了基于时间点的分布式数据流频繁项的发现问题。并提出了基于Lossy Counting算法的、分布式的合并算法DMA(Distributed Merging Algorithm)的一种分层结构来发现从叶子结点直至根结点的概要结构，并通过设置精确梯度使网络数量最小及数据中心和网络链接所消耗的计算资源最小来优化分布式系统的通信负载。
     (4)研究了基于P2P的分布式数据流查询处理的中间件和原型开发。首先利用P2P的特性改进了索引结构的定位查询过程和稳定性。然后，将数据流的概要映射到改进的弦环节点，将基于内容的路由扩展到分布式流索引中，在此基础上，提供连续近似查询，并利用最小边界矩形MBR等优化方法，通过自适应地调整MBR的每一维f的高低边界来改进系统的精确度。在减小中心数据和网络链接所消耗的计算资源的情况下，加快和提高流数据查询和挖掘的效率，及时响应客户的查询请求。
     本论文的研究依托于国家863项目“基于Web服务的数据库新技术”的子项目“基于Web服务的电子商务”的研究来进行。所有的科研工作是建立在对大量参考文献的阅读理解、理论分析和实验测试的基础上，经实验和分析表明，所提出的算法和基于P2P的中间件具有良好的性能特性，可以为分布式数据流应用提供运行与开发的环境。
With the development of large network and Web application, a new kind of data—Data stream come into being in the application areas of monitor and sensor networks in-break detecting, communication data management stock analysis and so on. These data sequence are relational tuples、sensor values、network parameters、phone records and share data etc. Different from traditional database application model, data stream model have characteristics as below: (1) Continuous, real time arrival; (2)Vast, unrestrainted and hard estimated; (3) Unless storing especially, they can't be taken out to handle again, namely one-pass processing. Otherwise, withdrawing data again is very expensive. How to store、query and mine these data streans has become the hot issues in the field of international database currently.
     In many practical application, such as decision support system, query optimization, the exact values are not necessary to obtain but approximate value. Therefore, the core issue of data stream management and analysis is designing one-pass algorithm, namely the feature structure of data streamsynopsis data structure are unceasingly updated in the least memory than data scale in order to quickly achieve approximation answer in time at all hours. If the length of data stream is N, the size of synopsis structure are not excess to O(polylog(N)), and the processing time of each group of stream are not excess to O(polylog(N)).
     The primary query in traditional database is one-time queries, namely the system give answer according to the snapshot of data set. But in data streams, the query is continuous and long-run because the query answers are returned incessantly along with new data. The data streams are not static but ceaseless insertion and update. Users need dynamic monitoring along with the changed data streams but not static result of some time.
     The exist research of data stream are centralized stream system. But the essence of data stream is distributed. More and more practical data in sensor network、communication、Internet flow analysis and Web log etc, are transported from teledata sources. Therefore, we need to construct the middleware of distributed query processing for data streams to support above application.
     P2P technique is used to build a huge distributed network for the processing of large number of information by internet terminals. These computers (Peers) are equal in independent computing and processing to solve the bottleneck issue of single server. P2P can build the good platform of middleware, which validly provide real-time query answer quickly, by means of its good performance of scalability、load balance、high efficiency and dynamic adaptability based on content-indexing in order to reduce the overhead of network bandwidth and links.
     Focusing on distributed data streams, this thesis analyzes the domestic and international present research. From the current existent problem and shortage, we study the data stream's characteristic based on time variety, monitor the real time streams, investigate on the presentation and modeling methods for stream's variety, mine the data evolution with the trend of the variety, and forecast the future trend of data streams. In large-scale distributed environment, we study the distributed approximate algorithms for query processing and mining with minimum complexity of time and space. On one hand, we utilize the wavele coefficients by DWT to construct histogram with good precision, and extent to multi-dimension histogram to solve the traditional difficulty, and build a composite indexing structure with data stream synopsis, at the same time, we build wavelet-neural network model with the wavelet resolution to solve the tradition difficulty of hidden number of nodes, and elementaryly establish the forecast model of time sequences. On the other hand, we solve the difficulty of aggregate query in data streams by utilizing sketch tenique. We study the algorithm of frequent items in distributed data streams and set the precision gradient to reduce the load of communication. Meanwhile, with the platform of Chord network and protocol in P2P system, we study the middleware of distributed query processing and mining data streams, investigate on key functions and techniques such as query routing、lookup and location、indexing and mapping of data stream synopsis in the peer computing system. That is concrete to say, this thesis has four central innovations as follow:
     (1) Studying the query processing algorithm of distributed data streams based on wavelet.The first step is according to the theory of discrete wavelet transition (DWT), we decompose Haar wavelets in order to obtain the important wavelet coefficients.Then, we analyze the computing model of data streams, formalize the stream's query model. Based on above research, we provide a kind of new method to construct synopsis data structure of data streams, establishing a compound indexing construction to handle the inner product queries and similarity queries. In addition, combining the good performance of time-frequent localization in wavelet neural network (WNN) and self-study in neural network (NN), we establish the forecast model of time sequence.
     (2) Studying the query processing algorithm of distributed data streams based on sketching technique. Firstly, through analyzing the approximate sketch-based processing methods, and utilizing the random techniques, we computing the pseudo sketching synopisis to the on-time arrival streams. Based on above research, we provide a novel technique of sketching partition through intelligent division for the value range of attribute and reduction of the self-join scale to fairly allocate memory space. In result, we ensure the good estimated quantity of approximate query.
     (3) Studying the mining algorithm of frequent items in large scale distributed data streams. Firstly, we formally define the mining algorithm of frequent items base on time point. Then, we take forward to the Distributed Merging Algorithm (DMA), a hierarchical structure to discover the stream synopsies from all leaves to root node, through setting the precision gradient to optimize the communication load and overhead of distributed system.
     (4) Studying the query processing middleware of distributed data stream based on P2P system. The first step is utilizing the P2P characteristic to improve the procedure of location searching and stabilization.Then, the data stream synopisis are mapping to the improved Chord node and content-based routing are extended to distributed stream index. Based on above research, the continuous approximate query are providing to improve the efficiency of query and mining processing for data stream. In meanwhile, the optimization method of Minimum Bounded Rectangle (MBR) is introduced to reduce center data and minimize the computing resource of network link. Through adjusting adaptively the high and low bounding of MBR, The precision of system are improving to achieve the query answer in time.
     By means of the sub-project "Web Services-based E-commerce" in 863 national project "Web Services-based database new technology", the research work in this thesis utilize large number of domestic and international references、theory analysis and experiment test. Estimated experiment and analysis illustrate that above algorithms and the middleware have good performance, which can be maked use of in running and development environment of data stream applications.

引文

[1] D. Carney, U. Cetinternel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, S. Zdonik. Monitoring streams A New Class of Data Management Applications. In Proc. 28th Int. Conf. on Very Large Data Bases, 2002, pp. 215-226.

    [2] Babcock B, Babu S, Datar M, Motwani R, Widom J. Models and issues in data streams. In: Popa L, ed. Proc, of the 21st ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems. Madison: ACM Press, 2002. pp1-16.
    [3] Gibbons PB, Matias Y. Synopsis data structures for massive data sets. In: Tarjan RE, Warnow T, eds. Proc, of the 10th Annual ACM-SIAM Symp. on Discrete Algorithms. Baltimore: ACM/SIAM, 1999. pp909-910.
    [4] Babcock B, Datar M, Motwani R, O'Callaghan L. Maintaining variance and k-Medians over data stream windows. In: Neven F, ed. Proc, of the 22nd ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems. San Diego: ACM Press, 2003. pp234-243.
    [5] M. Cherniack, H. Balakrishnan, M. Balazinska, D. Carney, U. Cetintemel, Y. Xing, S. Zdonik. Scalable Distributed Stream Processing. In Proc. 1st Biennial Conf. on Innovative Data Syst. Res, 2003.

    [6] B. Babcock, S. Babu, M. Datar, R. Motwani. Chain: Operator Scheduling for Memory Minimization in Data Stream Systems. To appear in Proc. ACM Int. Conf. On Management of Data, June 2003.
    [7] S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, M. Shah. TelegraphCQ: Continuous Data Processing for an Uncertain World. In Proc. 1st Biennial Conf. on Innovative Data Syst. Res, 2003,
    [8] Gibbons PB, Tirthapura S. Distributed streams algorithms for sliding windows. In: SPAA 2002: Proc, of the 14th Annual ACM Symp. on Parallel Algorithms and Architectures. Winnipeg: ACM Press, 2002. 63-72.
    [9] A. Arasu, S. Babu, J. Widom. An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations. Stanford University Technical Report 2002-57, November 2002. Available at http://dbpubs.stanford.edu:8090/pub/2002-57.
    [10] Y. Zhu, D. Shasha. StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time. In Proc. 28th Int. Conf. on Very Large Data Bases, 2002, pp. 358-369.

    [11] P. Tucker, D. Maier, T. Sheard, L. Fegaras. Enhancing relational operators for querying over punctuated data streams. Manuscript, 2002. Available at http://www.cse.ogi.edu/dot/niagara/pstream/punctuating.pdf.

    [12] Zhu Y, Shasha D. StatStream: Statistical monitoring of thousands of data streams in real time. In: Bernstein P, Ioannidis Y, Ramakrishnan R, eds. Proc, of the 28th Int'l Conf. on Very Large Data Bases. Hong Kong: Morgan Kaufmann, 2002. 358-369.

    [13]Vitter JS. Random sampling with a reservoir. ACM Trans, on Mathematical Software, 1985,11(1):37-57.
    [14] Garofalakis M, Gehrke J, Rastogi R. Querying and mining data stream: you only get one look.A tutorial. In: Franklin MJ, Moon B, Ailamaki A, eds. Proc, of the 2002 ACM SIGMOD Int'l Conf. on Management of Data. Madison:. 635.

    [15] J. Feigenbaum, S. Kannan,M. Strauss, and M. Viswanathan. An approximate 11-difference algorithm for massive data streams. In Proc, of the 1999 Annual IEEE Symp. On Foundations of Computer Science, pages 501-511,1999.

    [16] P. Flajolet and G. Martin. Probabilistic counting. In Proc. Of the 1983 Annual IEEE Symp. on Foundations of Computer Science, 1983.
    [17] Gibbons PB, Matias Y. New sampling-based summary statistics for improving approximate query answers. In: Haas LM, Tiwary A, eds. SIGMOD 1998, Proc, of the ACM SIGMOD Int'l Conf. on Management of Data. Seattle, 331-342.
    [18] Guha S, Koudas N, Shim K. Data-Streams and histograms. In: Yannakakis M, ed. Proc, of the 33rd Annual ACM Symp. on Theory of Computing. Heraklion: ACM Press, 2001. 471-475.
    [19] Gilbert A, Guha S, Indyk P, Kotidis Y, Muthukrishnan S, Strauss M. Fast, small-space algorithms for approximate histogram maintenance. In: Reif JH, ed. Proc, of the 34th Annual 2002. ACM Symp. on Theory of Computing. Montreal
    [20] Datar M, Gionis A, Indyk P, Motwani R. Maintaining stream statistics over sliding windows. In: Eppstein D, ed. Proc, of the 13th Annual ACM-SIAM Symp. on Discrete Algorithms. San Francisco: ACM/SIAM, 2002. 635-644.
    [21] Gibbons PB, Tirthapura S. Distributed streams algorithms for sliding windows. In: SPAA 2002: Proc, of the 14th Annual ACM Symp. on Parallel Algorithms and Architectures. Winnipeg: ACM Press, 2002. 63-72.
    [22] DeHaan D, Demaine ED, Golab L, Lopez-Ortiz L, Munro JI. Towards identifying frequent items in sliding windows. Technical Report, CS-2003-06, Waterloo: University of Waterloo, 2003.
    [23] Babcock B, Datar M, Motwani R. Sampling from a moving window over streaming data. In: Eppstein D, ed. Proc, of the 13th Annual ACM-SIAM Symp. on Discrete Algorithms. San Francisco: ACM/SIAM, 2002. 633-634.

    [24] X. Lin, H. Lu, J. Xu, and J. X. Yu. Continuously maintaining quantile summaries of the most recent N elements over a data stream. In Proc, of the 20th Intl.Conf. on Data Engineering, Mar. 2004.

    [25] Cormode G. Stable distributions for stream computations: It's as easy as 0, 1, 2. 2003. http://www.research.att.com/conf/mpds2003/schedule/cormode.ps
    [26] M. Fang, N. Shivakumar, H. Garcia-Molina, R. Motwani, and J. D. Ullman. Computing iceberg queries efficiently. In Proc, of the 1998 Intl. Conf. on Very Large Data Bases, pp 299-310, 1998.
    [27] Bloom B. Space/time tradeoffs in hash coding with allowable errors. Communications of the ACM, 1970,13(7), pp422-426.
    [28] Cohen S, Matias Y. Spectral bloom filters. In: Halevy AY, Ives ZG, Doan AH, eds. Proc, of the 2003 ACM SIGMOD Int'l Conf. on Management of Data. San Diego: ACM Press, 2003. pp241-252.
    [30] Greenwald M, Khanna S. Space-Efficient online computation of quantile summaries. In: Proc, of the 2001 ACM SIGMOD Int'l Conf. on Management of Data. Santa Barbara: ACM Press, 2001.
    [30] N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. In Proc, of the 1996 Annual ACM Symp. on Theory of Computing, pp 20-29, 1996.
    [31] N. Alon, P. Gibbons, Y. Matias, andM. Szegedy. Tracking join and self-join sizes in limited storage. In Proc, of the 1999 ACM Symp. on Principles of Database Systems, pp 10-20, 1999.
    [32] J. Gehrke, F. Korn, and D. Srivastava. On computing correlated aggregates over continual data streams. In Proc. Of the 2001 ACM SIGMOD Intl. Conf. on Management of Data, pp 13-24, May 2001.streams. SIGMOD Record, 30(3):109-120, Sept. 2001.
    [33] A. Dobra, J. Gehrke,M. Garofalakis, and R. Rastogi. Processing complex aggregate queries over data streams. In Proc, of the 2002 ACM SIGMOD Intl. Conf. on Management of Data,
    [34] S. Chandrasekaran, M. J. Franklin. Streaming Queries over Streaming Data. In Proc. 28th Int. Conf. on Very Large Data Bases, 2002, pp. 203-214.

    [35] G. Manku and R. Motwani. Approximate frequency counts over streaming data, 2002.
    [36] Jawerth B, Sweldens W. An overview of wavelet based multiresolution analyses. SIAM Review, 1994,36(3), pp377-412.
    [37] Gilbert AC, Kotidis Y, Muthukrishnan S, Strauss MJ. Surfing wavelets on streams: One-Pass summaries for approximate aggregate queries. In: Apers PMG, Atzeni P, Ceri S, Paraboschi S, Ramamohanarao K, Snodgrass RT, eds. VLDB 2001, Proc, of the 27th Int'l Conf. Roma: Morgan Kaufmann, 2001. pp79-88.
    [38] Garofalakis M, Gibbons PB. Wavelet synopses with error guarantees. In: Franklin MJ, Moon B, Ailamaki A, eds. Proc, of the 2002 ACM SIGMOD Int'l Conf. on Management of Data. Madison: ACM Press, 2002., pp476-487.
    [39] A. Deshpande, S. Nath, P. Gibbons, S. Seshan. Cacheand-Query for Wide Area Sensor Databases, in Proc. ACM Int. Conf. on Management of Data, June 2003.
    [40] R. Sadri, C. Zaniolo, A. M. Zarkesh, J. Adibi. Optimization of Sequence Queries in Database Systems. In Proc. ACM SIGACT-SIGMOD Symp. on Principles of Database Systems, 2001.
    [41] P. Seshadri, M. Livny, R. Ramakrishnan. The Design and Implementation of a Sequence Database System. In Proc. 22nd Int. Conf. on Very Large Data Bases, 1996,
    [42] Charikar M, Chen K, Farach-Colton M. Finding frequent items in data streams. Theoretical Computer Science, 2004,312(1), pp3-15.
    [43] Y. Chen, G. Dong, J. Han, B. W.Wah, J. Wang. Multi-Dimensional Regression Analysis of Time-Series Data Streams. In Proc. 28th Int. Conf. on Very Large Data Bases, 2002, pp. 323-334.
    [44] P. Domingos, G. Hulten. Mining High-Speed Data Streams. In Proc. 6th ACM SIGKDD Int. Conf. On Knowledge Discovery and Data Mining, 2000, pp. 71-80.
    [45] G. Hulten, L. Spencer, P. Domingos. Mining Time-Changing Data Streams. In Proc. 7th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 2001, pp. 97-106.

    [46]S. Guha, N. Mishra, R. Motwani, L. O'Callaghan. Clustering data Streams. In Proc. IEEE Symp. on Foundations of Computer Science, pp. 359-366.
    [47]F. Korn, S. Muthukrishnan, D. Srivastava. Reverse Nearest Neighbor Aggregates over Data Streams. In Proc. 28th Int. Conf. on Very Large Data Bases, 2002, pp. 814-825.
    [48]O'Callaghan L, Mishra N, Meyerson A, et al. Streaming-data algorithms for high-quality clustering[C]. Proc of IEEE International Conference on Data Engineering, 2002.
    [49]Guha S,Mishra N,MotwaniR, et al. Clustering data streams[C]. Proc of IEEE Symposium on Foundations of Computer Science(FOCS'00), 2000, pp71-80.
    [50]Aggarwal C, Han J, Wang J, et al. A framework for clustering evolving data streams[C]. Berlin, Germany: Proc of lnt Conf on VLDB'03, 2003.
    [51]Guha S, Meyerson A, Mishra N, et al. Clustering data streams: Theory and practice[J]. Knowledge andData Engineering, IEEE Transactions, 2003, 15(3), pp515-528
    [52]Dora Cai Y, Clutter D, Pape G, et al. MAIDS mining alarming incidents from data streams[C]. Paris, France:Proc of the 23rd ACMSIGMOD, 2004.
    [53]Dong G, Hart J, LVS Lakshmanan, et al. Online mining of changes from data streams: Research problems and preliminary result[C]. Proc of ACM SIGMOD Workshop on Management and Processing of Data Streams, 2003.
    [54]HE Zeng-you, XU Xiao-fei, DENG Sheng-chun. Squeezer: An efficient algorithm for clustering categorical data[J]. Journal of Computer Science and Technology, 2002, 17(5), pp611-624.
    [55]王鹏，施伯乐等．CAPE—数据流上的基于频繁模式的分类算法[J]．计算机研究与发展。VOL．41，NO．10，OCT．2004，PP1677～1683
    [56]Giannella C, HAN Jia-wei, JIAN Pei, et al. Mining frequent patterns in data streams at multiple time granularities[C]. Proc of the NSF Workshop on Next Generation Data Mining, 2002.
    [57]Ganguly S, Garofalakis M, Rastogi R. Processing set expressions over continuous update streams. In: Halevy AY, Ives ZG, Doan AH, eds. Proc. of the 2003 ACM SIGMOD Int'l Conf. on Management of Data. San Diego: ACM Press, pp265-276.
    [58]JIANG Sheng-yi, LI Qing-hua, WANG Hui. A novel intrusion detection method[C]. Proc of IFIP International Conference on Network and Parallel Computing, 2004.
    [59]J. M. Hellerstein, W. Hong, S. Madden, K. Stanek. Beyond Average: Toward Sophisticated Sensing with Queries. To appear in Proc. 2nd Int. Workshop on Information Processing in Sensor Networks, Apr. 2003.
    [60]Stream Query Repository. www-db.stanford.edu/stream/sqr.
    [61]M. A. Shah, J. M. Hellerstein, S. Chandrasekaran, M. J. Franklin. Flux: An Adaptive Partitioning Operator for Continuous Query Systems. To appear in Proc. 19th Int. Conf. on Data Engineering, 2003.
    [62]张联峰，刘乃安等，综述：P2P技术计算机工程与应用2003，12，pp142-145．
    [63]Ion Stoica, Robert Morris, David Karger; MFrans Kaashoek, Hari Balakrishnan. Chord. A Scalable Peer-to-peer lookup service for internet applications. Technical Report. TR-819, MIT LCS, March 2001. http://www.pdos.lcs.mit.edu/chord/papersl.
    [64]Gnutella. http://gnutella.wego.coml.
    [65]Ohaha, Smart decentralized peer-to-peer sharing http://www.ohaha.comldesign.html.
    [66]ROWSTRON, A., AND DRUSCHEL, P. Pastry: Scalable, distributed object location crud routing for large-scale peer-to-peer systems. In Proceedings of the ACM International Conference on Distributed Systems Platforms, Nov. 2001, pp. 329-350.
    [67]徐佩霞, 孙功宪《小波分析与应用实例》第二版, 中国科学技术大学出版社,2001.10, pp1-8 Xu Pei-Xia, Sun Gong-Xian《Wavelet analysis and application instances》the second edition, The publishing of University of science technology of China,2001.10, pp1-8.
    [68]E. J. Stollnitz, T. D. Derose, and D. H. Salesin. Wavelets for Computer Graphics. Morgan Kaufmann, 1996.
    [69]A.Bulut and A.K.Singh. SWAT: Hierarchical stream summarization in large networks. In IEEE International Conference on Data Endineering, pp 563-575.2003.
    [70]N.Beckmann, H.P.Kriegel, R.Schneider, B.Seeger. The R*_tree: An efficient and access method for points and rectangles. In SIGMOD, pp 322-331, 1990.
    [71]S.Guha, N.Koudas, and K.Shim. Approximating a data stream for querying and estimation: Algorithms and performance evaluation. In ICDE, pp 567-576, 2002.
    [72]HTTP://kumo.swcp.com/stocks/. S&P500 historical stock exchange data.
    [73]Yi-Leh Wu, Divyakant Agrawal, and Amr Ei. A comparison of DFT and DWT based similarity search in time-series databases. In CIKM, pp 488-495, 2000.
    [74]潘维民，沈理．基于神经网络的时间序列动态预测器的调整学习算法[J]．电子学报，1999，27(11)，pp1-4．
    [75]杨一文，刘贵忠．基于小波网络的非线性时间序列预测及其在股市中应用[J]．模式识别与人工智能，2001，14(2)，pp213-247．
    [76]刘志刚，王晓茹．小波变换、神经网络和小波网络的函数逼近能力分析与比较[J]．电力系统自动化．2002，26(20)，pp39-43．
    [77]Mallat S. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. on PAM I, 1989, 11(7), pp674-693
    [78]Zhang Qinghau, Benveniste Albert,. Wavelet Neural Networks. IEEE Trans. On Neural Networks, 2003, 3(6), pp889-898
    [79]S. Acharya, P.B. Gibbons, V. Poosala, and S. Ramaswamy."Join Synopses for Approximate Query Answering". In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, May 1999.
    [80]L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone."Classification and Regression Trees". Chapman & Hall, 1984.
    [81]P.J. Haas and J.M. Hellerstein."Ripple Joins for Online Aggregation". In Proc. of the 1999 ACM SIGMOD Intl. Conf. On Management of Data, May 1999.
    [82]N. Alon, P.B. Gibbons, Y. Matias, and M. Szegedy."Tracking Join and Self-Join Sizes in Limited Storage". In Proc. of the Eighteenth ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, May 1999.
    [83]S. Chaudhuri and U. Dayal."An Overview of Data Warehousing and OLAP Technology". ACM SIGMOD Record, 26(1), March 1997.
    [84]J.Feigenbaum, S.Kannan, M.Strauss, and M. Viswanathan."An Approximate L~1-Difference Algorithn for Massive Data" In Proc. of the 40th Annual IEEE Symp. on Foundations of Computer Science, October 1999.
    [85]J. Gehrke, F. Korn, and D. Srivastava. On Computing Correlated Aggregates over Continual Data Streams. In Proc. of the 2001 ACM SIGMOD Intl. Conf. on Management of Data, September 2001.
    [86]G. Manku, S. Rajagopalan, and B. Lindsay. Random sampling techniques for space efficient online computation of order statistics of large datasets. In Proc. of the 1999 ACM SIGMOD Intl. Conf. On Management of Data, May 1999.
    [87]ARASU, A., AND MANKU, G. S. Approximate quantiles and frequency counts over sliding windows. In Proceedings of the Twenty-Third ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems(2004).
    [88]DEMAINE, E. D., LOPEZ-ORTIZ, A., AND MUNRO, J. I. Frequency estimation of internet packet streams with limited space. In Proceedings of the Eleventh Annual European Symposium on Algorithms(2003).
    [89]KIM, H.-A., AND KARP, B. Autograph: Toward automated, distributed worm signature detection. In Proceedings of the 13th Usenix Security Symposium(2004).
    [90]CORMODE, G., AND MUTHUKRISHNAN, S. What's hot and what's not: Tracking frequent items dynamically. In Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems(2003).
    [91]BABCOCK, B., AND OLSTON, C. Distributed top-k monitoring. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data(2003).
    [92]J S VITTER. Random Sampling with a Reservoir. ACM Tran. Math. Software, 11(1), pp 37-57, 1985.
    [93]AGRAWAL, R., AND SRIKANT, R. Fast algorithms for mining association roles. In Proceedings of the Twentieth International Conference on Very Large Data Bases(1994).
    [94]JIN, C., QIAN, W., SHA, C., YU, J. X., AND ZHOU, A. Dynamically maintaining frequent items over a data stream. In Proceedings of the 2003 ACM CIKM International Conference on Information and Knowledge Management(2003).
    [95]K.-Y. WHANG, B. T. VANDER-ZANDEN, AND H. M. TAYLOR. A linear-time probabilistic counting algorithm for database applications. ACM Trans. on Database Systems, 15(2), pp208-229, 1990.
    [96]AKELLA, A., BHARAMBE, A., REITER, M., AND SESHAN, S. Detecting DDoS attacks on ISP networks. In Proceedings of the Twenty-Second ACM IGMOD/PODS Workshop on Management and Processing of Data Streams(2003).
    [97]I. Stoica, R. Morris, D. Karger, M. Kaashoek, and H. Balakrishnan. Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications.In ACM SIGCOMM, August 2001.
    [98]S. Ratnasamy, P. Francis, M. Handley, and R. Karp. A Scalable Content-Addressable Network. In SIGCOMM, August 2001.
    [99]A. Rowstron and P. Druschel. Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems. In IFIP/ACM Middleware, November 2001.
    [100]Ion Stoica. Chord simulator, http://pdos.lcs.mit.edu/cgibin/cvsweb.cgi/sfsnet/simulator/.
    [101]S. Ratnasamy, B. Karp, L. Yin, F. Yu, D. Estrin, R. Govindan, and S. Shenker. GHT: A Geographic Hash Table for Data-Centric Storage in SensorNets. In ACM WSNA, September 2002
    [102] FIPS 180-1. Secure hash standard. In US Department of Commerce/NIST, National Technical Information Service, Springfield, VA, April, 1995.
    [103] LEWIN, D. Consistent hashing and random trees: Algorithms caching in distributed networks. Master's thesis, Department of EECS, MIT, 1998. ailable at the MIT Library, http://thesis.mit.edul
    [104] B. Zhao, J. Kubiatowicz, and A. Joseph. Tapestry: An Infrastructure for Fault-Resilient Wide-Area Location and Routing. Technical Report UCB/CSD-01-1141, U. C. Berkeley, 2001.
    [105] A. Deshpande, S. Nath, P. B. Gibbons, and S. Seshan. Cache-and-query for wide area sensor databases. In SIGMOD, pp503-514, 2003.
    [106] A. V. Oppenheim and R. W. Schafer. Digital Signal Processing. Prentice-Hall, Englewood Cliffs, N.J., 1975.
    [107] LI, J, JANNOTTI, J, DE COUTO, D., KARGER, D., AND MORR1 S, R. A calable location service for geographic ad hoc routing. In Proceedings/the 6~(th) 4CAlInternational Conference on Mobile Computing and Networking , Boston., Massachusetts, August 2000, pp 120-130.
    [108] RATNASAMY S., FRANCIS, P., HANDLEY, M, KAPP, R., AND SHENKER, S. A scalable content-addressable network. In Proc. ACM SIGCOMM ,San Diego, CA, August 2001, pp 161-172.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700