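A Message-Passing-Based Inter-Node Communication System for Clusters

Subtitle: Design and Implementation of PMI
Author: Mao Yongjie (毛永捷)
Degree level: Master's
Discipline: Computer Organization and System Architecture
Chinese keywords: 机群 (cluster); 消息传递 (message passing); 带宽 (bandwidth); 延时 (latency)
English keywords: cluster; message-passing; bandwidth; latency
Degree year: 1997
Advisor: Zhu Mingfa (祝明发)
Discipline code: 081201
Degree-granting institution: Graduate School of the Chinese Academy of Sciences (Institute of Computing Technology)
Thesis submission date: 1997-05-01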

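Abstract

Parallel processing and parallel computers are among the most active research topics in computer science today, and parallel computers are attracting growing attention for their outstanding performance. Among them, message-passing parallel computers have gradually become the mainstream of supercomputer development; they comprise two classes of machine: massively parallel processors (MPP) and clusters. Taking a message-passing cluster system as its prototype, this thesis presents a performance evaluation model for the cluster's inter-node communication system and describes the implementation of that system in detail.
     With rapid advances in semiconductor technology, processors keep getting faster, and further gains in parallel computer performance depend increasingly on improving the function and efficiency of the comparatively slow communication system. On the one hand, improving the bandwidth and latency of the communication system directly speeds up message passing and so shortens the time to solution; on the other hand, overlapping computation with communication in time hides communication overhead at the macro level, which greatly reduces latency. The quality of the communication system design directly determines how fast a parallel computer can solve problems and how large a problem it can handle.
     This thesis discusses the data links that make up the cluster inter-node communication system and divides them broadly into two link models: a single link with buffering, and a two-dimensional Mesh network based on the wormhole mechanism. It analyses in detail how their bandwidth and latency relate to data flow rate and data buffer size, and gives a performance evaluation model for the cluster inter-node communication system.
     According to this model, when the data flow rate is low and the Mesh network is lightly loaded, the latency of the inter-node communication system is approximately independent of communication distance; when the data flow rate is high and the Mesh network is heavily loaded, latency grows rapidly as the network load rises. Under a steady data flow, system bandwidth is determined by the lowest sub-link bandwidth; when the data flow is unsteady or the network becomes blocked, the buffering mechanism comes into play. In addition, when channel noise is taken into account, data buffers placed at suitable intervals may reduce the bit error rate.
     The implementation of PMI, the key device of the cluster inter-node communication system, is, in this thesis, …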

English abstract

Parallel processing is one of the most active topics in computer science today. Among parallel computers, the two kinds that rely on message passing are becoming the mainstream: massively parallel processors (MPP) and clusters. This thesis presents a model for evaluating the performance of a cluster's inter-node communication subsystem and describes its implementation in detail.
    With the rapid development of microprocessors, the performance of parallel computers is becoming more and more dependent on the functions and efficiency of their communication subsystems. On the one hand, a faster communication subsystem reduces overhead and latency, so applications run faster; on the other hand, overlapping computation with communication hides communication latency, so applications see little of it.
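
The effect of overlapping computation with communication can be illustrated with a first-order timing model. The sketch below is not taken from the thesis: the function name `step_time` and the `overlap` parameter are hypothetical, and it assumes each step has one fixed computation time and one fixed communication time.

```python
def step_time(compute_us: float, comm_us: float, overlap: float = 0.0) -> float:
    """Time for one compute-and-communicate step, in microseconds.

    overlap is the fraction (0.0 to 1.0) of the shorter phase that runs
    concurrently with the longer one. This is an illustrative assumption,
    not a model from the thesis.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0 and 1")
    # Only the shorter phase can be hidden behind the longer one.
    hidden = overlap * min(compute_us, comm_us)
    return compute_us + comm_us - hidden


if __name__ == "__main__":
    for overlap in (0.0, 0.5, 1.0):
        print(f"overlap={overlap:.1f}: {step_time(100.0, 40.0, overlap):.1f} us")
```

With no overlap the step costs the sum of both phases; with full overlap it costs only the longer phase, which is the sense in which the application "sees little" of the communication latency.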
    This thesis analyses the data links of a cluster and classifies them into two types of channel: the buffered single link and the wormhole Mesh. It gives models for evaluating their performance, together with equations relating bandwidth, latency, flow rate and buffer size.
    These models show that when the data flow rate is low or the Mesh is lightly loaded, the latency of such a communication subsystem is approximately independent of communication distance; when the flow rate rises or the Mesh becomes more heavily loaded, latency rises much more rapidly. Under a stable data flow, the bandwidth of the communication subsystem equals that of its slowest sublink. When the flow fluctuates or the Mesh becomes blocked, data buffers act to sustain the bandwidth. In addition, if channel noise is considered, buffers that divide a long link into several shorter ones may reduce errors.
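
The two light-load claims above can be illustrated with common textbook approximations. This is a minimal sketch and not the thesis's own evaluation model, which this record does not reproduce. It assumes a contention-free network, equal bandwidth on every hop, and hypothetical function and parameter names; store-and-forward switching is included only as a contrast.

```python
def wormhole_latency(msg_bytes: int, flit_bytes: int, hops: int, link_bw: float) -> float:
    """Contention-free wormhole latency in microseconds.

    The header flit pays a per-hop cost; the message body then streams
    behind it in a pipeline. link_bw is in bytes per microsecond.
    """
    return hops * flit_bytes / link_bw + msg_bytes / link_bw


def store_and_forward_latency(msg_bytes: int, hops: int, link_bw: float) -> float:
    """Contrast case: the whole message is received at every hop before forwarding."""
    return hops * msg_bytes / link_bw


def path_bandwidth(sublink_bws: list[float]) -> float:
    """Under a steady flow, a chain of sublinks is limited by its slowest member."""
    return min(sublink_bws)


if __name__ == "__main__":
    # Assumed example values: 4 KiB message, 4-byte flits, 100 bytes/us links.
    for hops in (1, 4, 16):
        wh = wormhole_latency(4096, 4, hops, 100.0)
        sf = store_and_forward_latency(4096, hops, 100.0)
        print(f"hops={hops:2d}  wormhole={wh:7.2f} us  store-and-forward={sf:8.2f} us")
    print("path bandwidth:", path_bandwidth([40.0, 25.0, 33.0]))
```

Under these assumptions the wormhole latency changes by less than 2% between 1 and 16 hops, because the per-hop term involves only the small header flit, whereas store-and-forward latency grows in proportion to the hop count. The model deliberately ignores blocking, so it says nothing about the heavy-load regime described above.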
    The implementation of PMI, one of the key devices of the cluster's inter-node communication subsystem, is the basis to

