The Research and Application of Linux Cluster System's Load Balance Mechanism for Data Retrieval

Author: Cui Shuang (崔爽)
Degree level: Master's
Discipline: Computer System Architecture
Keywords: Cluster; Load Balance; Real-time Feedback; Load Redundancy; Binary Sort Tree
Degree year: 2010
Advisor: Fang Zhiyi (房至一)
Discipline code: 081201
Degree-granting institution: Jilin University
Submission date: 2010-04-01

Abstract

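The research topic of this thesis comes from a project supported by the Innovation Fund for Technology-based Small and Medium-sized Enterprises of the Ministry of Science and Technology of China: a Linux cluster system for data retrieval. The project delivers high-performance, highly reliable cluster system software for data retrieval on the Linux platform. Because it integrates high-availability software, load balancing, and a cluster file system, it simplifies cluster management, is easier to use, and provides strong support for critical enterprise applications. Load balancing is a key technology of Linux cluster systems: it extends the bandwidth of network devices and servers, increases throughput, improves network processing capacity, and provides a reliable guarantee for the normal operation of a high-availability cluster.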
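     Building on an analysis and comparison of current load balancing algorithms, this thesis proposes a dynamic feedback strategy for load balancing. For load assessment, it combines each server's performance index with the node's dynamic load value as the measure of the node's load capacity. It introduces a load redundancy value for each server node, which effectively predicts the node's current load capacity, helps the load scheduler distribute task requests, and avoids overloading any single server node. A scheduling strategy based on a binary sort tree simplifies the process by which the load balancer assigns tasks.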
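The abstract does not define the load redundancy value, so the following is only a minimal sketch of how such a value could guard a node against overload. It assumes redundancy means the headroom between a node's capacity and its reported load; the class `NodeStatus`, its fields, and the function `pick_node` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Hypothetical snapshot of one server node (all fields are assumptions)."""
    name: str
    capacity: float      # static performance index, in arbitrary task units
    current_load: float  # dynamic load reported by the node, same units

    def redundancy(self) -> float:
        # Assumed definition: headroom left before the node is saturated.
        return max(self.capacity - self.current_load, 0.0)


def pick_node(nodes, task_cost):
    """Return the node with the most headroom that can still absorb
    task_cost, or None when every node would be overloaded."""
    eligible = [n for n in nodes if n.redundancy() >= task_cost]
    if not eligible:
        return None
    return max(eligible, key=NodeStatus.redundancy)
```

With this rule, a request is refused or queued rather than sent to a node whose headroom is smaller than the request's cost, which is one way to read "avoids overloading any single server node".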
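     The thesis gives the concrete implementation of the system, builds a Linux-based cluster platform, and debugs and runs the program to achieve load balancing in a Linux cluster system. The project results have passed testing by the Jilin Branch Center of the China Software Testing Center.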
With the rapid growth of network services, the nodes that provide them face ever more user requests, and data traffic and computing intensity keep rising, which puts great pressure on network bandwidth and servers. Bottlenecks will increasingly appear at the server side, so there is an urgent need to build highly available, cost-effective, and scalable network services that can meet the growing load. Load balancing technology for Linux-based clusters has emerged in response.
     The research topic of this thesis comes from a project funded by the Innovation Fund for SMEs of the National Ministry of Science and Technology: a Linux cluster system for data retrieval. The product of the project is high-performance, highly reliable cluster system software for data retrieval on Linux. Because it integrates high-availability software, load balancing, and a cluster file system, it simplifies cluster management, is convenient to use, and provides strong protection for important enterprise applications. Load balancing is a key technology of a Linux cluster system: it extends the bandwidth of network devices and servers, increases throughput and network capacity, and provides a reliable guarantee for the normal operation of a high-availability cluster.
     First, the thesis introduces the project background and the relevant technical knowledge, and discusses the characteristics and classification of cluster systems. Second, it examines load balancing: after studying commonly used load balancing strategies, it proposes a strategy based on dynamic real-time feedback that can predict load capacity, and describes the algorithm. Finally, it presents the implementation of load balancing on a Linux cluster system and sets up a simulated environment for debugging and running it.
     Cluster technology organizes multiple computers to work together so that they act as a single, more powerful computer. A cluster system consists of several servers with shared data storage that communicate with one another over the network; when one server fails, its applications are automatically taken over by another. In most models, all computers in the cluster share a common name, and a service running on any machine in the cluster is available to all users, so the cluster appears as a single system.
     Depending on how tasks are allocated, load balancing is either static or dynamic. In static load balancing, when the load balancer receives task requests from users, it spreads them across the servers of the cluster according to a fixed algorithm so that each server receives a roughly equal number of requests; this kind of balance, however, does not take the load capacity of each server into account. Dynamic load balancing has several advantages over the static form: it keeps a real-time record of each server node's load information and assigns tasks to the more lightly loaded nodes, which avoids overloading a single node and keeps the load on the cluster members as even as possible. Allocation is therefore dynamic and reflects the actual carrying capacity of each server node.
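The contrast between the two forms can be sketched as follows. Round-robin stands in for the static policy and least-loaded selection for the dynamic one; both are common textbook choices picked here for illustration, not necessarily the algorithms compared in the thesis, and the function names are hypothetical.

```python
from itertools import cycle


def static_round_robin(servers, n_tasks):
    """Static policy: hand out tasks in a fixed rotation, ignoring load."""
    rotation = cycle(servers)
    return [next(rotation) for _ in range(n_tasks)]


def dynamic_least_loaded(load, n_tasks, task_cost=1.0):
    """Dynamic policy: send each task to the currently lightest server.

    `load` maps server name -> current load; it is copied, not mutated.
    """
    load = dict(load)
    assignments = []
    for _ in range(n_tasks):
        target = min(load, key=load.get)   # lightest node right now
        assignments.append(target)
        load[target] += task_cost          # feedback: the node got busier
    return assignments
```

Given a cluster where server "a" already carries a load of 3 and server "b" is idle, the static policy still alternates between the two, while the dynamic policy keeps sending work to "b" until the loads are level.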
     This thesis discusses a load balancing strategy based on dynamic feedback of load information. The algorithm has the following main characteristics. First, it takes full account of each server node's processing power and current load. The server nodes in a cluster may differ in performance, so in practice the server's performance index is considered: more tasks go to servers with high processing ability and fewer to low-performance servers. Alongside the performance index, a dynamic load value, obtained by monitoring each server node in real time, reflects the node's actual load. Second, the collection and calculation of node information are carried out on the nodes themselves, so that the load balancer is not overloaded and does not become the system bottleneck. Instead of the central node collecting information from every node, each node collects its own status and sends it to the scheduling center. The central scheduling node then only needs the load information sent by the nodes to make scheduling decisions and no longer has to gather information from each machine, which reduces both the extra communication overhead of load information collection and the burden on the scheduling node. Third, the algorithm introduces a binary sort tree: a weight is calculated from each server node's performance index and real-time load information, and the weights are used to build the tree. An in-order (LDR) traversal of the tree then lists the server nodes in ascending order of load, and the load balancer dispatches tasks according to this ordering. Finally, the algorithm introduces a load redundancy value for each server node to predict its spare load capacity in real time, so that a single server node is not assigned too many tasks in a short period.
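The binary sort tree step can be illustrated with a short sketch. The abstract does not give the weight formula, so the code assumes weight = load / performance index (a lower weight means a more attractive node); `TreeNode`, `insert`, `in_order`, and `rank_servers` are hypothetical names, and the tree is unbalanced for brevity.

```python
class TreeNode:
    def __init__(self, weight, server):
        self.weight = weight
        self.server = server
        self.left = None
        self.right = None


def insert(root, weight, server):
    """Insert into a binary sort tree ordered by weight; returns the root."""
    if root is None:
        return TreeNode(weight, server)
    if weight < root.weight:
        root.left = insert(root.left, weight, server)
    else:
        root.right = insert(root.right, weight, server)
    return root


def in_order(root):
    """LDR traversal: left subtree, then the node, then the right subtree."""
    if root is not None:
        yield from in_order(root.left)
        yield root.server
        yield from in_order(root.right)


def rank_servers(reports):
    """reports: iterable of (server, performance_index, load) tuples.

    Weight = load / performance_index is an assumed formula.
    """
    root = None
    for server, perf, load in reports:
        root = insert(root, load / perf, server)
    return list(in_order(root))
```

Because the in-order traversal visits weights in ascending order, the first server in the returned list is the one the balancer would try first under this assumed weighting.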
     A Linux load balancing cluster is simulated in a Linux virtual machine environment. The load balancer's main tasks are to collect information from the server nodes at regular intervals, receive task requests from outside, and allocate those tasks to the server nodes. Each server node's main tasks are to send its system information to the load balancer at regular intervals and to handle the task requests it receives from the balancer. A test Web server is set up for the simulation: requests are sent from a user page and exchanged with an Apache2 Web server through XMLHttpRequest; after the server analyzes the type and number of requests, a CGI program written in Perl invokes the relevant procedure over Socket communication to send the external requests to the load balancer. The user interface for simulated task requests is written in HTML and JavaScript; on this page the user can set the request type, the number of task requests, the IP address of the load balancer, and the port number. The Linux cluster is built in the simulated system for debugging and running the program. The system has passed testing by the Jilin Branch Center of the China Software Testing Center and obtained a product test report.
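The node-to-balancer reporting path can be sketched as below. The abstract only says that nodes send system information over Socket communication at regular intervals; the newline-terminated JSON message, the `cpu` and `mem` fields, and all function names are assumptions made for illustration, and the periodic timer is omitted.

```python
import json
import socket


def encode_report(node_id, cpu, mem):
    """Serialize one load report as a newline-terminated JSON message
    (the message format is an assumption, not taken from the thesis)."""
    payload = {"node": node_id, "cpu": cpu, "mem": mem}
    return (json.dumps(payload) + "\n").encode()


def send_report(sock, node_id, cpu, mem):
    """Node side: push the node's own status to the balancer."""
    sock.sendall(encode_report(node_id, cpu, mem))


def receive_report(sock):
    """Balancer side: read until the newline delimiter, then decode.

    Assumes the sender transmits one report and waits before the next.
    """
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1024)
        if not chunk:
            raise ConnectionError("peer closed before a full report arrived")
        buf += chunk
    return json.loads(buf.decode())
```

For a quick local check, `socket.socketpair()` gives two connected sockets in one process: sending a report on one end with `send_report` and calling `receive_report` on the other returns the same dictionary that was sent.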

