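Research on the Performance Model of Dynamic Tracing Systems, and Design and Implementation of Cluster Monitoring Software Based on Dynamic Tracing Technology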
Abstract
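Over the past decade, personal computers have improved greatly in performance while becoming ever cheaper. As a result, in fields that need servers or a certain amount of computing power, clusters built from inexpensive PCs are gradually winning a share of the low- and mid-range server market and the commercial cluster market, thanks to their respectable performance and lower price. Most such clusters are built on open-source software. It is undeniable that, although software for single machines has matured, management and control software for cluster environments still needs improvement. As for monitoring software, many single-machine implementations exist, and monitoring techniques and tools themselves pose few remaining difficulties, so extending them to a cluster should only require attention to extensibility of function and ease of use. In reality the opposite holds: most current monitoring software can display only simple global information. Two problems lie behind this. First, cluster monitoring software is built only for generic platforms and general-purpose goals; second, its development has neglected extensibility and customizability. It is therefore necessary to build a cluster monitor for small and medium-sized self-built clusters that is "dynamic", "extensible", "comprehensive", "multi-functional" and "customizable".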
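     After a detailed analysis of the current state of clusters and of cluster monitoring software, this thesis proposes a set of criteria for evaluating cluster monitoring software, and designs and develops a cluster monitor, DDTC/DDTS, to meet these criteria.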
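     The monitor is built on a dynamic tracing system, so it can meet the requirements for functionality and extensibility. The software itself is a runtime framework that provides only basic functions; a plug-in mechanism supplies the required dynamism and versatility.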
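The abstract describes the monitor as a bare runtime framework whose features come from plug-ins, but it does not give the actual plug-in API. The following Python sketch only illustrates the idea under stated assumptions: the class names `MonitorPlugin`, `PluginFramework` and `SyscallCountPlugin` are hypothetical, and the example plug-in returns canned numbers instead of running a DTrace script.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class MonitorPlugin(ABC):
    """Hypothetical plug-in interface: each plug-in contributes one kind of metric."""

    name: str = "unnamed"

    @abstractmethod
    def collect(self) -> Dict[str, float]:
        """Return the metrics this plug-in gathered on the current node."""


class PluginFramework:
    """Minimal framework: it only registers plug-ins and dispatches collection."""

    def __init__(self) -> None:
        self._plugins: Dict[str, MonitorPlugin] = {}

    def register(self, plugin: MonitorPlugin) -> None:
        if plugin.name in self._plugins:
            raise ValueError(f"plug-in {plugin.name!r} already registered")
        self._plugins[plugin.name] = plugin

    def unregister(self, name: str) -> None:
        # Plug-ins can be removed at run time without restarting the framework.
        self._plugins.pop(name, None)

    def loaded(self) -> List[str]:
        return sorted(self._plugins)

    def collect_all(self) -> Dict[str, Dict[str, float]]:
        # The framework knows nothing about the metrics; it just asks each plug-in.
        return {name: p.collect() for name, p in self._plugins.items()}


class SyscallCountPlugin(MonitorPlugin):
    """Example plug-in with canned data (hypothetical; a real one might run DTrace)."""

    name = "syscalls"

    def collect(self) -> Dict[str, float]:
        return {"read": 120.0, "write": 45.0}


if __name__ == "__main__":
    fw = PluginFramework()
    fw.register(SyscallCountPlugin())
    print(fw.loaded(), fw.collect_all())
```

Keeping the framework ignorant of what each plug-in measures is what lets new monitoring functions be added, or removed at run time, without touching the core.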
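     Because low intrusiveness is an important criterion for monitoring software, before building DDTC/DDTS this thesis also analyzes a performance model of the dynamic tracing system (DTrace). The model can predict the performance impact the monitor will impose on monitored nodes under different conditions. Experiments show that under normal usage the intrusion on worker nodes is below 1%.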
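The abstract reports intrusion below 1% but does not define how it was measured. One common way, assumed for this sketch rather than taken from the thesis, is the relative slowdown of a benchmark run with the monitor active compared with runs without it. The function name `intrusion` and the run times in the example are hypothetical.

```python
from statistics import median
from typing import Sequence


def intrusion(baseline_s: Sequence[float], monitored_s: Sequence[float]) -> float:
    """Relative slowdown caused by the monitor (assumed definition).

    Compares the median wall-clock time of repeated benchmark runs with the
    monitor active against runs without it; 0.01 means 1% slower.
    """
    if not baseline_s or not monitored_s:
        raise ValueError("need at least one run with and one without the monitor")
    base = median(baseline_s)
    if base <= 0:
        raise ValueError("baseline run time must be positive")
    return (median(monitored_s) - base) / base


if __name__ == "__main__":
    # Illustrative run times in seconds, not measurements from the thesis.
    without_monitor = [100.0, 101.0, 99.0]
    with_monitor = [100.6, 100.9, 100.7]
    print(f"intrusion = {intrusion(without_monitor, with_monitor):.2%}")
```

Using the median of several runs rather than a single run keeps one noisy measurement from dominating a figure as small as 1%.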
Over the last 10 years, personal computers have become cheaper and more powerful. As a result, more and more PC clusters are used as servers, because they deliver respectable performance at a lower price than high-end servers and commercial clusters. Most of them are built with open-source and free software. This raises a problem: useful management and monitoring software for PC clusters is lacking, even though the corresponding software for single PCs is well designed. One reason is that most cluster tools are designed for general-purpose use on generic clusters. Another is that existing cluster management and monitoring software is neither extensible nor tailorable. Hence, we design and implement a new cluster monitor that is "dynamic", "extensible" and "tailorable", intended for deployment on small-scale, non-commercial PC clusters only.
     In this thesis, we first survey clusters and cluster monitoring software. We then put forward criteria for what a cluster monitor should be, and build our cluster monitor, DDTC/DDTS, according to these criteria.
     Because DDTC/DDTS is based on a dynamic tracing system, which can trace whatever part of the system the user chooses, it addresses both the data-scope problem and the extensibility problem. Through our plug-in mechanism, the monitor's functionality can be extended easily by developing new plug-ins.
     Another significant criterion is a monitor's run-time impact on the system it runs on. Before designing DDTC/DDTS, we model the performance of DTrace and derive expressions relating probes to CPU usage. With this model, the overhead introduced by DTrace scripts is easy to estimate. Experiments show that, under normal usage, the overhead is less than one percent.
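The abstract says the DTrace performance model relates probes to CPU usage but does not give its expressions. As a hypothetical illustration only, a first-order linear model charges each probe firing a fixed CPU cost, so overhead is the sum of firing rate times per-firing cost, divided by the number of CPUs. `ProbeLoad`, `predicted_cpu_overhead` and `max_rate_for_budget` are names invented for this sketch, and the rates and costs in the example are made-up values, not figures from the thesis.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ProbeLoad:
    """One enabled probe in a hypothetical first-order overhead model."""

    name: str
    fires_per_sec: float  # expected firing rate on the monitored node
    cost_us: float        # assumed CPU cost of one firing, in microseconds


def predicted_cpu_overhead(probes: Iterable[ProbeLoad], n_cpus: int = 1) -> float:
    """Fraction of total CPU capacity spent in probe firings.

    Assumed linear model: overhead = sum(rate_i * cost_i) / (n_cpus * 1 s).
    """
    if n_cpus <= 0:
        raise ValueError("n_cpus must be positive")
    busy_us_per_sec = sum(p.fires_per_sec * p.cost_us for p in probes)
    return busy_us_per_sec / (1_000_000 * n_cpus)


def max_rate_for_budget(budget: float, cost_us: float, n_cpus: int = 1) -> float:
    """Highest total firing rate (per second) that stays within `budget`."""
    if cost_us <= 0 or n_cpus <= 0:
        raise ValueError("cost_us and n_cpus must be positive")
    return budget * 1_000_000 * n_cpus / cost_us


if __name__ == "__main__":
    # Made-up rates and costs, for illustration only.
    load = [
        ProbeLoad("syscall-entry", fires_per_sec=20_000, cost_us=0.5),
        ProbeLoad("profile-tick", fires_per_sec=1_000, cost_us=2.0),
    ]
    print(f"predicted overhead: {predicted_cpu_overhead(load, n_cpus=2):.2%}")
    print(f"rate allowed at 1%: {max_rate_for_budget(0.01, cost_us=0.5):.0f}/s")
```

Inverting the same relation gives a rough probe-rate budget for a target overhead such as 1%, which is the kind of prediction the abstract says the model enables.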