Event-Based Monitoring of Distributed Systems
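Abstract
As distributed systems grow more complex, run-time monitoring plays an increasingly important role in improving system performance and reliability. This paper proposes a new method that combines a monitoring probe platform with complex event processing (CEP) technology. The method performs run-time monitoring of distributed systems, makes monitoring components easier to develop and use, and improves the efficiency of monitoring and management.
     The monitoring probe platform runs on top of the monitored resources and provides a common management interface for JMX components. Monitoring components are packaged as JMX probes, so the platform can deploy probes at run time, generate their metadata, and manage and look them up. This unifies how probe information is queried and how probe operations are invoked, and it remains compatible with existing JMX products.
     Probes report monitoring information as events. To improve the efficiency and reliability of event transmission over the network, each event passes through an extended event filter before transmission and is then encapsulated in a message and sent to the monitoring server.
     To respond quickly to large volumes of probe events and to analyze the temporal order and correlations among them, the monitoring server uses monitoring rules based on complex event processing and hands the monitoring events to a complex event engine for real-time processing. The rules describe complex events in an SQL-like syntax; they filter, correlate, and aggregate the basic monitoring events to derive higher-level management events. Once a management event is detected, the corresponding management decision action is triggered, and the system is configured and managed automatically at run time by invoking operations on the monitoring probes.
     The monitoring and complex event processing techniques described above have been applied in a simulation computing platform. Drawing on the requirements and practical experience of that project, this paper uses simulation job dispatch and scheduling, job execution monitoring, system performance evaluation, and node information statistics as examples to demonstrate the monitoring system's event definition, rule configuration, response action binding, and decision scheduling.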
With the increasing complexity of distributed systems, run-time monitoring and management have become essential for improving system performance and reliability. This paper proposes a new method for distributed system monitoring that combines monitoring probes with complex event processing (CEP) technology. The method performs automated run-time management of distributed systems and improves the efficiency of monitoring and management.
     A JMX-based monitoring probe platform is deployed on top of the managed resources to provide a common management interface, which standardizes the run-time deployment, metadata generation, location, and configuration of monitoring components as probes. Because all monitoring components in the system are wrapped in probes and loaded by the probe platform, metadata structures and operation calls are unified and remain compatible with existing JMX products.
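The role of the probe platform can be illustrated with a small sketch. It is written in Python rather than against JMX itself, so it is only an analogy: `ProbePlatform`, `CpuProbe`, and every method name below are hypothetical. The `deploy`, `get_attribute`, and `invoke` methods loosely mirror what an MBean server offers through `registerMBean`, `getAttribute`, and `invoke`; the metadata is generated by introspection, which is an assumption about how such a platform might work.

```python
import inspect


class CpuProbe:
    """Hypothetical probe with one readable attribute and one operation."""

    def __init__(self):
        self.interval_s = 5  # sampling interval, exposed as an attribute

    def reset(self):
        """Restore the default sampling interval."""
        self.interval_s = 5
        return "reset"


class ProbePlatform:
    """Toy registry that mimics the role of a JMX MBean server."""

    def __init__(self):
        self._probes = {}

    def deploy(self, name, probe):
        # Run-time deployment: register the probe under a unique name.
        if name in self._probes:
            raise KeyError(f"probe already deployed: {name}")
        self._probes[name] = probe

    def metadata(self, name):
        # Metadata generation by introspection: public attributes and methods.
        probe = self._probes[name]
        attributes = sorted(k for k in vars(probe) if not k.startswith("_"))
        operations = sorted(
            n for n, _ in inspect.getmembers(probe, inspect.ismethod)
            if not n.startswith("_")
        )
        return {"attributes": attributes, "operations": operations}

    def get_attribute(self, name, attribute):
        # Uniform information query, independent of the probe's class.
        return getattr(self._probes[name], attribute)

    def invoke(self, name, operation, *args):
        # Uniform operation call, independent of the probe's class.
        return getattr(self._probes[name], operation)(*args)

    def find(self, prefix):
        # Retrieval: look probes up by name prefix.
        return sorted(n for n in self._probes if n.startswith(prefix))
```

Because callers only ever go through the platform, a management server needs no knowledge of a probe's concrete class, which is the property the abstract attributes to the common management interface.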
     The efficiency and reliability of transmitting monitoring information over the network are also considered. Probes report in an event-driven mode, and extended event filters screen the monitoring events before they are encapsulated in messages and sent from the probes to the management server.
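The abstract does not say what the extended filters test, so the following is only a sketch under an assumed policy: a deadband filter that forwards an event only when its value differs from the last forwarded value by at least `min_change`. The event fields (`source`, `metric`, `value`), the JSON message layout, and all function names are hypothetical.

```python
import json


class DeadbandFilter:
    """Drops events whose value barely changed since the last forwarded one."""

    def __init__(self, min_change):
        self.min_change = min_change
        self._last_sent = {}  # (source, metric) -> last forwarded value

    def accept(self, event):
        key = (event["source"], event["metric"])
        last = self._last_sent.get(key)
        if last is not None and abs(event["value"] - last) < self.min_change:
            return False  # change too small: not worth sending
        self._last_sent[key] = event["value"]
        return True


def to_message(events):
    """Encapsulate the surviving events in one JSON message."""
    return json.dumps({"type": "monitoring-events", "events": events})


def report(events, event_filter):
    """Filter first, then wrap; return None when nothing needs to be sent."""
    kept = [e for e in events if event_filter.accept(e)]
    return to_message(kept) if kept else None
```

Filtering on the probe side, before the message is built, is what saves network traffic: events that are dropped never leave the monitored node.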
     The management server uses complex event processing to analyze the high volume of monitoring events and to correlate event time sequences in real time, which supports the decisions made by the monitoring service. Management rules use an SQL-like syntax to describe filtering, correlation, and aggregation over basic monitoring events, and thereby abstract higher-level management events. Once a management event is detected, the corresponding decision is triggered, and the system manages itself automatically by manipulating probes on the probe platforms.
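To make the idea of such a rule concrete, the sketch below hand-codes one rule that a CEP engine would normally evaluate from a declarative statement, roughly "for each source, raise a management event when the average value over the last 30 seconds exceeds 0.9". The rule itself, the class name `WindowedAverageRule`, and the callback signature are illustrative assumptions; the abstract does not give the actual rules or the engine's exact syntax.

```python
from collections import defaultdict, deque


class WindowedAverageRule:
    """Aggregates basic events per source over a sliding time window."""

    def __init__(self, window_s, threshold, action):
        self.window_s = window_s    # window length in seconds
        self.threshold = threshold  # average above this raises a management event
        self.action = action        # decision callback: action(source, average)
        self._windows = defaultdict(deque)  # source -> deque of (timestamp, value)

    def on_event(self, source, timestamp, value):
        window = self._windows[source]
        window.append((timestamp, value))
        # Evict events that have left the time window.
        while window and window[0][0] <= timestamp - self.window_s:
            window.popleft()
        # Aggregate what remains; the newest event is always still inside.
        average = sum(v for _, v in window) / len(window)
        if average > self.threshold:
            # Management event detected: trigger the bound decision action.
            self.action(source, average)
```

In the architecture described above, the callback would be the point at which the server invokes an operation on a probe, for example to throttle or migrate a job on the overloaded node.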
     This distributed monitoring infrastructure has been used in a real simulation platform. Based on the project's requirements and practical experience, the end of this paper presents several cases that demonstrate the management process, including event definition, rule configuration, event action binding, and decision execution.
