Design and Implementation of an Ultra-Large-Scale Linux Cluster System Based on KUSU

English title: Ultra-Large-Scale Linux Cluster System Design and Implementation Based on KUSU
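Author: 王维高
Degree level: Master's
Discipline: Computer Technology
Chinese keywords: 集群 ; 超大规模 ; 高可靠性 ; 高可用性
English keywords: Cluster ; Ultra-large-scale ; High-reliability ; High-availability
Degree year: 2011
Advisor: 吴江
Discipline code: 081202
Degree-granting institution: Northwest University (西北大学)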

Abstract

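As computing enters the era of high-speed networks, network services, data traffic, and computational intensity are growing explosively, and a single server can no longer meet the demands of high-performance computing. Mainframes are very expensive, and because their hardware and system software are proprietary, maintenance costs are also very high, so most enterprises cannot afford them. A cluster organizes many ordinary PCs in some topology so that together they act as a server with high-performance computing capability. For ultra-large-scale clusters, however, time-consuming deployment and difficult management have become the bottleneck to enterprise adoption.
     This thesis optimizes KUSU to meet the needs of its user, Fujitsu, for rapid deployment, easy management, high availability, and high reliability of ultra-large-scale clusters in the cloud computing era. The main work is as follows:
     1) System architecture optimization. To reduce the load on the master node of an ultra-large-scale cluster, a tree-based hierarchical structure is proposed to optimize the KUSU system architecture.
     2) High-availability clustering. When an ultra-large-scale cluster handles critical workloads, it must process jobs without interruption. To prevent a master node outage from bringing down the cluster, this thesis uses the heartbeat mechanism, the basic technique of high-availability clusters, to make the Linux cluster highly available. Streaming replication in PostgreSQL 9.0 is used to synchronize data between the primary and standby master nodes of an ultra-large-scale cluster built with KUSU, which addresses the high-availability requirement.
     3) High-reliability clustering. Communication and data transfer between the nodes of an ultra-large-scale cluster depend heavily on the network, so an unobstructed network is the guarantee of a highly reliable cluster. This thesis uses Linux NIC bonding to combine multiple network interfaces into a single logical "bonded" interface, which prevents communication failures between cluster nodes and improves data transfer between them. The bond is set to mode 0, which load-balances data transmission, and a highly reliable cluster is thereby achieved.
     4) Cluster implementation. The thesis closes by showing how to deploy a cluster with KUSU, presents a prototype of an ultra-large-scale cluster, and reports experimental data for the ultra-large-scale cluster together with data from users' experiments in a real environment.
     The research builds on KUSU, a tool for automated cluster deployment. It first optimizes the small-cluster architecture so that it suits large-scale clusters, and then studies high availability and high reliability to achieve cluster stability, so that the system can meet enterprise needs for large-scale clusters.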
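The heartbeat mechanism named in item 2) can be illustrated with a minimal sketch. It is not taken from the thesis or from KUSU: the class name `HeartbeatMonitor`, the 3-second timeout, and the explicit `now` timestamps are all assumptions chosen to keep the example small and deterministic. It shows only the core idea, in which a standby node promotes itself once the master has been silent for longer than a timeout.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Standby-side view of the master's heartbeat (hypothetical, illustrative only)."""

    timeout: float          # seconds of silence tolerated before failover
    last_seen: float = 0.0  # timestamp of the most recent heartbeat
    active: bool = False    # True once the standby has taken over

    def beat(self, now: float) -> None:
        """Record a heartbeat received from the master at time `now`."""
        self.last_seen = now

    def check(self, now: float) -> bool:
        """Promote the standby if the master has been silent longer than `timeout`."""
        if not self.active and now - self.last_seen > self.timeout:
            self.active = True
        return self.active


# Assumed scenario: the master sends one heartbeat at t=0 and then goes silent.
monitor = HeartbeatMonitor(timeout=3.0)
monitor.beat(now=0.0)
print(monitor.check(now=2.0))  # still within the timeout
print(monitor.check(now=4.5))  # silence exceeded the timeout, standby takes over
```

A real deployment would also need the standby's data to be current at the moment of takeover, which is the role the abstract assigns to PostgreSQL streaming replication.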
As computer technology enters the age of high-speed networks, network traffic, data flow, and computational intensity are growing explosively, and a single server can hardly meet the requirements of high-performance computing. Mainframes are very expensive, and because their hardware and system software are proprietary, the maintenance costs of the hardware and software are also very high, so ordinary businesses cannot afford them. A cluster organizes many ordinary PCs in a certain topological structure to form a server with high computing capability. For ultra-large-scale clusters, time-consuming setup and difficult management have become the bottleneck for enterprises using clusters.
     This paper optimizes KUSU to meet the needs of Fujitsu, a KUSU user, for quick setup, easy management, high availability, and high reliability of large-scale clusters in the cloud computing era. The main work includes:
     1) System architecture optimization. With the goal of reducing the load on the master node of a large-scale cluster, a tree-based hierarchical structure is adopted to optimize the system architecture of KUSU.
     2) High-availability cluster research. When an ultra-large-scale cluster handles critical business, it must process jobs continuously. To prevent a master node shutdown from causing the cluster to collapse, this paper adopts the heartbeat mechanism, the basic technology of high-availability clusters, to achieve high availability for the Linux cluster. PostgreSQL 9.0 streaming replication is used to solve data synchronization between the master node and the standby node in a large-scale cluster built with KUSU, thereby providing high availability for the large-scale cluster.
     3) High-reliability cluster research. Communication and data transfer between the nodes of a large-scale cluster depend heavily on the network, and a free-flowing network is the guarantee of a high-reliability cluster. This paper adopts the Linux NIC bonding method, in which multiple NICs are combined into a single logical "bonded" interface, thus preventing communication failure between cluster nodes and improving data transmission between them. The bond working mode is set to 0 to achieve load balancing of data transmission, and thereby a high-reliability cluster.
     4) Cluster implementation. Finally, the paper shows how to use KUSU to deploy a cluster, presents a prototype of an ultra-large-scale cluster, and gives experimental data for large-scale clusters as well as users' experimental data from a real environment.
     The research in this paper is based on KUSU, a tool for automated cluster building. First, the architecture of the small cluster is optimized to fit ultra-large-scale clusters. Second, high availability and high reliability are studied to achieve cluster stability, so that the system can meet enterprise needs for ultra-large-scale clusters.
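Bonding mode 0 in item 3) is the Linux bonding driver's round-robin policy (`balance-rr`), which sends successive packets over the member interfaces in turn. The following is a minimal sketch of that distribution rule only; the function name `round_robin_assign`, the packet labels, and the interface names `eth0` and `eth1` are hypothetical and do not come from the thesis.

```python
from itertools import cycle


def round_robin_assign(packets, interfaces):
    """Pair each packet with the next interface in turn, wrapping around.

    Mimics the scheduling idea of bonding mode 0 (balance-rr); it does not
    touch any real network device.
    """
    if not interfaces:
        raise ValueError("at least one interface is required")
    next_nic = cycle(interfaces)
    return [(packet, next(next_nic)) for packet in packets]


# Assumed example: three packets spread over two bonded interfaces.
for packet, nic in round_robin_assign(["p0", "p1", "p2"], ["eth0", "eth1"]):
    print(packet, "->", nic)
```

Because every member interface carries a share of the traffic, the load is spread across the links in addition to the redundancy they provide.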

