数据流管理系统的研究与设计
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
近年来,在金融服务、电信数据管理、网络监控和传感器监控等领域中,出现了一类新的数据密集型应用。这类应用都有类似的特征:数据是瞬时的、连续的、随时间变化的,数据不宜用持久稳定的关系建模,而适合用瞬态的数据流来建模。把这类连续到达的数据放到传统的关系数据库中进行管理是不可行的,由此产生了一些新的数据流方向上的研究。
     本文的重点在于数据流管理系统方面的研究和设计。分析了国外主要的现有的数据流管理系统,比较了传统的数据库管理系统和数据流管理系统的区别;设计了数据流管理系统MyStream的整体框架,并且给出了各个模块的详细描述,给出了关键部分的数据结构描述,提出了一种新的以SQL语言为基础的数据流查询语言MyCQL,在用户接口方面引入了COM接口的方式;提出了一种改进的数据流时间戳管理算法,在原有的STREAM系统的时间戳管理算法上引入了概率,使得算法的参数设定更加灵活;在操作符调度方面,给出了一种基于速率的抢占式批调度算法,给出了一些速率计算方面的定义,并且比较了和FIFO、RR等调度方式在效率上的区别;将MyStream应用到交通管制中,通过根据交通状况来设置不同的费用,来降低道路的拥堵状况。
Recently there are new data-intensive applications, such as financial service, telecommunication data management, network monitoring, sensor detection, and so on. Such applications have similar features: data is instantaneous, continuous, and time-varying. The data is more suitable for being modeled by transient data streams than by permanent relations. Putting such continuous arrival data into the traditional relation database to manage is not feasible. Then there are some new researches on data stream.
     In this paper the emphasis is the research and design of the data stream management system. We analyze the foreign major existing data stream management systems, compare the difference between traditional database management system and the data stream management system, design the framework of the data stream management system MyStream, give the detailed descriptions of each module and the data structure descriptions of the key parts, propose a new SQL-based data stream query language MyCQL, use the COM interface in the user interface, propose an improved data stream timestamp management algorithm, adding probability to the original STREAM timestamp management algorithm. The algorithm makes the parameter settings more flexible. In the operator schedule aspects, we propose a rate-based preemptive batching scheduling algorithm, give some definitions of the rate computation, and compare with other scheduling algorithms such as FIFO, RR. We apply MyStream to the traffic control, set the different fees according to the traffic conditions to reduce the traffic congestion.
引文
[1] B. Babcock, S. Babu, M. Datar, et al. Models and Issues in Data Stream Systems. Proc. of the 2002 ACM Symp on Principles of Database Systems, 2002:1~16.
    [2] L. Golab, M. Tamer ?zsu. Issues in Data Stream Management. SIGMOD Record, 2003, 32(2):5~14.
    [3] A. Arasu, B. Babcock, S. Babu, et al. STREAM: The Stanford Data Stream Management System. http://dbpubs.stanford.edu/pub/2004-20, 2004.
    [4] D. Carney, U. Centintemel, M. Cherniack, et al. Monitoring Streams-A New Class of Data Management Applications. In Proc. of the 28th Int’l. Conf. on Very Large Data Bases, 2002:215~225.
    [5] D.Abadi, D.Carney, U.Cetintemel, et al. Aurora: A New Model and Architecture for Data Stream Management. VLDB Journal, 2003, 2(12):120~139.
    [6] S. Chandrasekharan, O. Cooper, A. Deshpande, et al. TelegraphCQ: Continuous Data Flow Processing for an uncertain World. In Proc. of the 1st Conf. on Innovative Data Systems Research, 2003:269~280.
    [7] S. Krishnamurthy, S. Chandrasekaran, O. Cooper, et al. TelegraphCQ: An Architectural Status Report. Proc. of IEEE Conference on Data Engineering, 2003, 26(1):11~18.
    [8] The Telegraph Project. http://telegraph.cs.berkeley.edu.
    [9] A. Arasu, S. Babu, J. Widom. The CQL Continuous Query Language: Semantic Foundations and Query Execution. Technical report, Stanford University. http://dbpubs.stanford.edu/pub/2003-67, 2003.
    [10] A. Arasu, S. Babu, J. Widom. CQL: A Language for Continuous Queries over Streams and Relations. In 9th Int’l. Workshop on Database Programming Languages, 2003:1~11.
    [11] A. Arasu, S. Babu, J. Widom. An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations. http://dbpubs.stanford.edu:8090/pub/2002-57, 2002.
    [12] M. Cherniack, H. Balakrishnan, M. Balazinska, et al. Scalable Distributed Stream Processing. Proc. of the First Biennial Conference on Innovative Database SystemsResearch, 2003:257~268.
    [13] M. A. Shah, J. M. Hellerstein, S. Chandrasekaran, et al. Flux: An Adaptive Partitioning Operator for Continuous Query Systems. http://telegraph.cs.berkeley.edu/papers.html, 2003.
    [14] R. Avnur, J. M. Hellerstein. Eddies: Continuously Adaptive Query Processing. Proc. of ACM SIGMOD Conference, 2000:261~272.
    [15] L. Liu, C. Pu, W. Tang. Continual Queries for Internet-Scale Event-Driven Information Delivery. IEEE Trans. Knowledge and Data Eng. 1999, 11(4):610~628.
    [16] J. Chen, D. J. DeWitt, F. Tian, et al. NiagaraCQ: A Scalable Continuous Query System for Internet Databases. In ACM SIGMOD Conf., 2000:379~390.
    [17] D. Terry, D. Goldberg, D. Nichols, et al. Continuous Queries over Append-only Databases. In Proc. of the 1992 ACM SIGMOD Intl. Conf. on Management of Data, 1992:321~330.
    [18] U. Schreier, H. Pirahesh, R. Agrawal, et al. Alert: Architecture for Transforming a Passive DBMS into an Active DBMS. Proc. of the 1991 Intl. Conf. on Very Large Data Bases, 1991:469~478.
    [19] Y. Yao, J. E. Gehrke. Query Processing for Sensor Networks. Proc. Conf. on Innovative Data Syst. Res, 2003:233~244.
    [20] C. Cranor, Y. Gao, T. Johnson, et al. GigaScope: High Performance Network Monitoring with an SQL Interface. Proc. ACM Int. Conf. on Management of Data, 2002:623~623.
    [21] Y. Zhu, D. Shasha. StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time. Proc. Int. Conf. on Very Large Data Bases, 2002:358~369.
    [22] A. Arasu, B. Babcock, S. Babu, et al. Characterizing Memory Requirements for Queries over Continuous Data Streams. ACM PODS, 2002:221~232.
    [23]杜威,邹先霞.基于数据流的滑动窗口机制的研究.计算机工程与设计, 2005, 26(11):2922~2924.
    [24] S. Babu, J. Widom. Continuous Queries over Data Streams. SIGMOD Record, 2001, 30(3):109~120.
    [25] C. Cranor, T. Johnson, O. Spatscheck, et al. The Gigascope Stream Database. IEEE Data Engineering Bulletin, 2003, 26(1):27~32.
    [26] R. Motwani, J. Widom, A. Arasu, et al. Query Processing, Resource Management, and Approximation in a Data Stream Management System. In CIDR Conf., 2003:245~256.
    [27] J. Gehrke, S. Madden. Query Processing in Sensor Networks. IEEE Pervasive Computing, 2004, 03(1):46~55.
    [28] J. R. Levine. Lex与Yacc(杨作梅,张旭东).北京,机械工业出版社, 2003:61~63.
    [29] R. Motwani, D. Thomas. Caching Queues in Memory Buffers. http://dbpubs.stanford.edu:8090/pub/2003-63, 2003.
    [30]潘爱民. COM原理与应用.北京,清华大学出版社, 2001:1~12.
    [31] U. Srivastava, J. Widom. Flexible Time Management in Data Stream Systems. In Proc. of PODS 2004, 2004:263~274.
    [32] S. D. Viglas, J. F. Naughton. Rate-based Query Optimization for Streaming Information Sources. In ACM SIGMOD Conf., 2002:37~48.
    [33] S. D. Viglas, J. F. Naughton, J. Burger. Maximizing the Output Rate of Multi-Way Join Queries over Streaming Information Sources. VLDB 2003:285~296.
    [34] A. M. Ayad, J. F. Naughton. Static Optimization of Conjunctive Queries with Sliding Windows Over Infinite Streams. http://pages.cs.wisc.edu/~naughton/includes/papers/, 2007.
    [35] C. Lee, C. H. Ke, J. B. Chang, et al. Minimization of Resource Consumption for Multidatabase Query Optimization. Proceedings of the 3rd IFCIS International Conference, 1998:241~250.
    [36] M. N. Garofalakis, Y. E. Ioannidis. Parallel Query Scheduling and Optimization with Time- and Space-Shared Resources. VLDB 1997:296~305.
    [37] B. Babcock, S. Babu, M. Datar, et al. Operator Scheduling in Data Stream Systems. VLDB Journal, 2004, 13(4):333~353.
    [38] B. Babcock, S. Babu, M. Datar, et al. Chain: Operator Scheduling for Memory Minimization in Data Stream Systems. In Proc. of the 2003 ACM SIGMOD Intl. Conf. on Management of Data, 2003:253~264.
    [39] M. A. Sharaf, P. K. Chrysanthis, A. Labrinidis. Preemptive Rate-based Operator Scheduling in a Data Stream Management System. http://www.cs.pitt.edu/, 2005.
    [40] D. Carney, U. ?etintemel, A. Rasin, et al. Operator Scheduling in a Data Stream Manager. http://www.cs.brown.edu/research/aurora/, 2003.
    [41]宋宝燕,李志强,李巍,等.数据流系统中一种基于速率的抢占式批处理方法.计算机工程, 2007, 33(3):50~52.
    [42] J. Kang, J. F. Naughton, S. D. Viglas. Evaluating Window Joins over Unbounded Streams. ICDE, 2003:341~352.
    [43] The Linear Road Benchmark. http://www.cs.brown.edu/research/aurora/linear-road.pdf.
    [44] N. Tatbul, U. ?etintemel, S. Zdonik, et al. Load Shedding in a Data Stream Manager. http://www.cs.brown.edu/research/aurora/, 2003.
    [45] B. Babcock, M. Datar, R. Motwani. Load Shedding for Aggregation Queries over Data Streams. Proc. of 20th International Conference on Data Engineering, 2004:350~355.
    [46] A. Das, J. Gehrke, M. Riedewald. Approximate Join Processing Over Data Streams. SIGMOD, 2003:40~51.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700