Research on Load Management Techniques for Distributed Data Streams
Abstract
Data stream processing has emerged in recent years as a new research direction in the database field, and its broad application prospects have drawn growing attention from researchers. Distributed systems combine low cost with strong processing capability, which gives them an inherent advantage in handling high-rate, high-volume data streams; distributed data stream processing has therefore long been an important part of data stream research.
     This thesis studies load management in distributed data stream management systems (DSMSs). The main contributions are as follows. First, it analyzes the load management techniques of existing DSMS prototype systems, identifies the characteristics and design principles of load management in data stream management systems, improves the load management module of Borealis, and builds an experimental platform. Second, it proposes a load balancing algorithm based on operator correlation analysis and network traffic analysis, which effectively reduces the negative impact of operator migration. Third, it studies load shedding in distributed query networks, analyzes the load dependencies between nodes, and improves a distributed load shedding strategy. Fourth, it reviews existing load shedding techniques for sliding window join queries and, drawing on the strengths of earlier work, proposes a load shedding algorithm for sliding window join queries based on the output frequency of basic windows. The work is grounded in practical application environments and keeps unnecessary assumptions to a minimum, providing theoretical support and experimental reference for further research on load management in data stream applications.
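The abstract names a load shedding approach for sliding window joins that is driven by the output frequency of basic windows, but it does not give the algorithm itself. The Python sketch below is only one plausible reading of that idea, not the thesis's method: each stream's window is split into fixed-size basic windows, each basic window counts the join results its tuples have produced, and when memory is exceeded the basic window with the lowest output per tuple is dropped. The names `BasicWindow`, `SheddingWindow`, `basic_size`, and `max_basic_windows` are hypothetical, and time-based expiration is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class BasicWindow:
    """A fixed-size slice of one stream's sliding window (hypothetical structure)."""
    tuples: list = field(default_factory=list)
    outputs: int = 0  # join results produced so far by tuples in this slice

    def output_frequency(self) -> float:
        # Join results per stored tuple; an empty slice scores 0.
        return self.outputs / len(self.tuples) if self.tuples else 0.0


class SheddingWindow:
    """Window for one join input that sheds its least productive basic window."""

    def __init__(self, basic_size: int, max_basic_windows: int):
        self.basic_size = basic_size
        self.max_basic_windows = max_basic_windows
        self.windows = [BasicWindow()]

    def insert(self, key) -> None:
        # Open a new basic window once the newest one is full.
        if len(self.windows[-1].tuples) >= self.basic_size:
            self.windows.append(BasicWindow())
            if len(self.windows) > self.max_basic_windows:
                self._shed()
        self.windows[-1].tuples.append(key)

    def _shed(self) -> None:
        # Assumption: drop the full basic window with the lowest output
        # frequency; ties go to the oldest. The newest (empty) one is kept.
        candidates = range(len(self.windows) - 1)
        victim = min(candidates, key=lambda i: self.windows[i].output_frequency())
        del self.windows[victim]

    def probe(self, key) -> int:
        # Count matches for a tuple arriving on the opposite stream and
        # credit each basic window with the results it contributed.
        matches = 0
        for bw in self.windows:
            hits = bw.tuples.count(key)
            bw.outputs += hits
            matches += hits
        return matches
```

In a two-stream equi-join, each arriving tuple would first call `probe` on the opposite stream's `SheddingWindow` and then `insert` itself into its own. A fuller design would also expire basic windows by age and would likely weigh recency against past output.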
