用户名: 密码: 验证码:
并行计算系统的负载平衡算法与并行执行时间预测
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
本文主要研究并行计算系统中的负载平衡算法与并行执行时间预测问题。
     为能较快平衡系统中的负载,提出了“均分负载”平衡算法。该方法先对各处理器结点的已有负载按网络中各处理器的速度进行划分,然后把这些划分好的小负载迁移到对应的处理器上,以平衡系统中各处理器的负载。分析表明:该算法时间性能较好,适于解决系统初始分配负载问题与系统负载极度失衡的平衡问题;但对于负载较平衡的系统,其负载迁移量很大。
     为减小负载迁移量同时保持较快的平衡速度,提出了“二分网络”平衡算法。该方法按网络的结点数把网络分为两子网络,然后按两子网络的处理速度之比进行两子网络间的负载迁移,递归上述过程,直到各子网络中只有一个结点时,系统经负载迁移后可达到平衡状态。该算法具有平衡负载速度较快、负载迁移量较小的优点,适于大多数条件下的负载平衡。
     针对环与线性阵列的负载平衡速度较慢与迁移量较大的问题,提出了“贪心线性推移”平衡算法。该算法的思想是:把重负载结点过重的那部分负载按线性或环的路径推移到下一邻居结点,循环推移直到整个系统负载平衡。此算法适用于任何具有哈密顿通路的图结构网络。一般情况下,其平衡过程的负载迁移量不大,且平衡负载速度较快。此外,还对网状网等网络结构的“贪心线性推移”平衡算法进行改进,而得到分两阶段的贪心线性推移平衡算法。实验与分析表明:当平衡条件减弱时,这种改进能较大地提高算法的时间性能。
     为解决由具有独立同分布随机执行时间的子任务组成的并行任务的执行时间预测问题,提出了基于Johnson分布的并行执行时间预测方法。该方法使用Johnson变换把并行子任务的执行时间分布变换成标准正态分布,然后利用正态分布的性质预测并行执行时间。这种方法不仅适于解决“最大”或“最小”的并行执行时间预测问题,而且适于任何求独立同分布随机变量的最大值与最小值问题。
This paper focus on the research of load balancing algorithm and parallel execution time prediction for parallel computing systems.
     “Equal-division load”balancing algorithm is presented to balance the system’s load. Load in every node is divided into smaller tasks based on all power of nodes on internetwork. Then these smaller tasks are sent to corresponding nodes to balance the load among nodes. Analysis results show that the algorithm has lower time complexity, and the algorithm can be used when system is to allocate load initially or when the system’s load is extremely unbalanced. But when the system’s load is approximately balanced, the amount of the transferred load on the internetwork will be considerable.
     “Two-division network”balancing algorithm is presented to reduce the amount of the transferred load and to balance the load fast. According to this algorithm, the network is divided into two sub-networks with same number of the nodes, then the load is transferred between the sub-networks on the basis of their process power. Repeat this procedure until every sub-network only contains one node, and then the system’s load will be balanced. For this algorithm, there is not a large amount of load transferred among nodes and not a large amount of the consumed time. This algorithm is suitable for most of static internetwork.
     “Greedy linear shift”balancing algorithm is presented for ring or linear array to improve balancing time performance and to reduce the amount of the transferred load. According to this algorithm, overloaded task of every node will be transferred into next neighbor-node on the linear array or ring path until the system is balanced. The algorithm can be used for balancing any system in which internetwork contains at least one Hamilton path. For this algorithm, generally, there is not a large amount of load transferred among nodes and not a large amount of the consumed time. In addition, we present two-stage greedy linear shift balancing algorithm to improve the“greedy linear shift”balancing algorithm for mesh etc. Experiments and analysis show that the two-stage linear shift balancing algorithm will improve the execution time performance when the balancing condiction is weaken.
     To prediting the parallel execution time of the parallel program that is composed of a number of sub-tasks, each of which has identically, independent distributed random execution time, this paper presents the parallel execution time predicting method which is based on the Johnson distribution. Johnson system is used for transforming random variable of execution time of parallel sub-tasks into standard normal variable. Then prediction time is derived by using features of the normal distribution. The method can be used not only for predicting the maximum or minimum parallel execution time, but also for solving any problem about the maximum or minimum of identically, indentical distributed random variables.
引文
[1]K.D.devine, E.G.Boman, R.T.Heaphy et al, New Challenges in Dynamic Load Balancing, Applied Numerical Mathematics, 2005, 52: 133~152
    [2]L.Kencl, J.Y.Le Boudec, Adaptive load sharing for network processors, Networking, IEEE/ACM Transactions on, 2008, 16(2): 293~306
    [3]R.Wu, J.Sun, J.Chen, Parallel execution time prediction of the multitask parallel programs, Performance Evaluation, 2008
    [4]徐高潮,胡亮,鞠九滨,分布计算系统,北京:高等教育出版社,2004
    [5]Y.K.Kwok, I.Ahmad, Static scheduling algorithms for allocating directed task graphs to multiprocessors, ACM Computing Surveys, 1999, 31(4): 406~471
    [6]J.A.Bergstraa, C.A.Middelburg, Distributed strategic interleaving with load balancing, Future Generation Computer Systems, 2008, 24(6): 530~548
    [7]陈国良,并行计算-结构算法编程,北京:高等教育出版社,1999
    [8]A.Grama,并行计算导论(英文版),北京:机械工业出版社,2003
    [9]R.Buyya, High performance cluster computing: architectures and systems, NJ.USA: Prentice Hall PTR, 1999
    [10]Y.K.Kwok,I.Ahamad, Benchmarking the task graph scheduling algorithms, Proceedings of IPPS’98, 1998, 531~537
    [11]F.Sourd, Dynasearch for the earliness–tardiness scheduling problem with release dates and setup constraints, Operations Research Letters, 2006, 34(5): 591~598
    [12]L.Gao, G.Malewicz, Toward maximizing the quality of results of dependent tasks computed unreliably, Theory of Computing Systems, 2007, 41(4): 731~752
    [13]A.Gierscha, Y.Robertb, F.Vivienb, Scheduling tasks sharing files on heterogeneous master–slave platforms, 2006, 51(2): 88~104
    [14]K.Andreev, H.Racke, Balanced graph partitioning, Theory of Computing Systems, 2006, 39(6): 929~939
    [15]J.W.Jeong, K.W.Park, J.H.Lee et al, SP2SP: subnet-partition based 2-level Grid service scheduling policy, Semantics, Knowledge and Grid, Third International Conference on, Oct.2007, 158~163
    [16]F.Diedrich., K.Jansen, F.Pascual et al, Approximation algorithms for scheduling with reservations, Lecture Notes in Computer Science, 2007, 4873: 297~307
    [17]D.Grosu, A.T.Chronopoulos, Algorithmic mechanism design for load balancing in distributed systems, IEEE Transactions on Systems, Man, and Cybernetics, Part B, 2004, 34(1): 77~84
    [18]M.Ohara, H.Inoue, Y.Sohda, MPI microtask for programming the Cell Broadband Engine processor, IBM Systems Journal, 2006, 45(1): 85~102
    [19]Z.Shi, J.J.Dongarra, Scheduling workflow applications on processors with different capabilities, Future Generation Computer Systems, 2006, 22(6): 665~675
    [20]T.Davidovic, T.G.Crainic, Benchmark-problem instances for static scheduling of task graphs with communication delays on homogeneous multiprocessor systems, Computers & Operations Research, 2006, 33(8): 2155~2177
    [21]K.E.Coons, X.Chen, D.Burger et al, A spatial path scheduling algorithm for EDGE architectures, ACM SIGOPS Operating Systems Review, 2006, 40(5): 129~140
    [22]Y.Qu, J.P.Soininen, J.Nurmi, Static scheduling techniques for dependent tasks on dynamically reconfigurable devices, 2007, 53(11): 861~876
    [23]N.Satish, K.Ravindran, K.Keutzer, A decomposition-based constraint optimization approach for statically scheduling task graphs with communication delays to multiprocessors, Proceedings of the conference on Design, automation and test in Europe, 2007, 57~62
    [24]S.H.Bokhari, Dual processor scheduling with dynamic reassignment, IEEE Trans. Softw. Eng., 1979, 5(4): 341~349
    [25]H.S.Stone, Multiprocessor scheduling with the aid of network flow algorithms. IEEE Trans. Softw. Eng., 1977, 3(1): 85~93
    [26]S.H.Bokhari, On the mapping problem. IEEE Trans. Comput., 1981, 30(5): 207~214
    [27]M.J.Gonzalez, Jr, Deterministic processor scheduling, ACM Comput. Surv., 1977, 9(3): 173~204
    [28]S.J.Kim, J.C.Browne, A general approach to mapping of parallel computation upon multiprocessor architectures, In Proceedings of International Conference on Parallel Processing, 1988, 1~8
    [29]Y.K.Kwok, I.Ahmad, Dynamic critical- path scheduling: An effective technique for allocating task graphs to multiprocessors, IEEE Trans. Parallel Distrib. Syst., 1996, 7(5): 506~521.
    [30]V.Sarkar, Partitioning and Scheduling Parallel Programs for Multiprocessors, Cambridge: MIT Press, 1989
    [31]W.S.Wong, R.J.T.Morris, A new approach to choosing initial points in local search. Inf. Process. Lett., 1989, 30(2): 67~72
    [32]T.Yang, A.Gerasoulis, DSC: Scheduling parallel tasks on an unbounded number of processors, IEEE Trans. Parallel Distrib. Syst., 1994, 5(9): 951~967
    [33]T.L.Adam, K.M.Chandy, J.R.Dickson, A comparison of list schedulingfor parallel processing systems, Commun.ACM, 1974, 17(12): 685~690
    [34]F.D.Anger, J.J.Hwang, Y.C.Chow, Scheduling with sufficient loosely coupledprocessors, J. Parallel Distrib. Comput., 1990, 9(1): 87~92
    [35]D.Kim, B.G.Yi, A two-pass scheduling algorithm for parallel programs, Parallel Comput., 1994, 20(6): 869~885
    [36]Y.K.Kwok, I.Ahmad, Efficient scheduling of arbitrary task graphs to multiprocessors using a parallel genetic algorithm, J. Parallel Distrib. Comput., 1997, 47(1): 58~77
    [37]C.Mccreary, H.Gill, Automatic determination of grain size for efficient parallel processing, Commun. ACM 1989, 32(9): 1073~1078
    [38]M.A.Palis, J.C.Liou, D.S.L.Wei, Task clustering and scheduling fordistributed memory parallel architectures, IEEE Trans., Parallel Distrib. Syst., 1996, 7(1): 46~55
    [39]G.C.Sih, E.A.Lee, Declustering: a new multiprocessor scheduling technique, IEEE Trans. Parallel Distrib. Syst., 1993, 4(6): 625~637
    [40]N.Mehdiratta, K.Ghose, A bottom-up approach to task scheduling on distributed memory multiprocessor. In Proceedings of the 1994 International Conference on Parallel Processing, CRC Press, Inc., Boca Raton, FL, 1994, 151~154.
    [41]I.Ahmad, Y.K.Kwok, On exploiting task duplication in parallel program scheduling, IEEE Trans. Parallel Distrib.Syst., 1998, 9(9): 872~892
    [42]H.Chen, B.Shirazi, J.Marquis, Performance evaluation of a novel scheduling method: Linear clustering with task duplication, In Proceedings of the 2nd International Conference on Parallel and Distributed Systems, 1993, 270~275
    [43]W.W.Chu, M.T.Lan, J.Hellerstein, Estimation of intermodule communication (IMC) and its applications in distributed processing systems. IEEE Trans. Comput., 1984, 33(8): 691~699
    [44]D.S.Milojicic, F.Douglis, Y.Paindaveine et al, Process migration, ACM Computing Survers, 2000, 32(3): 241~299
    [45]M.H.Willebeek-LeMair, A.p.Reeves, Strategies for Dynamic Load Balancing on Highly Parallel Computers, IEEE Transactions on Parallel and Distributed Systems,1993, 4(9): 979~993
    [46]S.Kandula, D.Katabi, S.Sinha et al, Dynamic load balancing without packet reordering, ACM SIGCOMM Computer Communication Review, 2007, 37(2): 51~62
    [47]Y.f.Hu,R.J.Blake,D.R.Emerson, An optimal migration algorithm for dynamic load balancing,Concurrency: Pract.Exper,1998,10(6): 467~483
    [48]S.Kolev, R.Segala, A.Shvartsman, Dynamic load balancing with group communication, Theoretical Computer Science, 2006, 369(1~3): 348~360
    [49]R.D.Galavao, L.G.Acosta Espejo, B.Boffey et al, Load balancing and capacity constraints in a hierachical, European Journal of Operational Research, 2006, 17(2): 631~646
    [50]C.Bertelle, A.Dutot, F.Guinand et al, Organization detection for dynamic load balancing in individual-based simulations, Multiagent and Grid Systems, 2007, 3(1): 141~163
    [51]G.E.Jan, Y.S.Hwang, An efficient algorithm for perfect load balancing on hypercube multiprocessors, 2003, 25(1): 5~15
    [52]K.D.Devine, B.A.Hendrickson, Tinkertoy parallel programming: A case study with Zoltan, J. Parallel and Distributed Scientific and Engineering,2005,1(2~4): 64~72
    [53]P.Berenbrink, T.Friedetzky, L.A.Goldberg, Distributed selfish load balancing, Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, 2006, 354~363
    [54]R.Elsasser, B.Monien, R.Preis, Diffusion schemes for load balancing on heterogeneous networks, Theory of Computing Systems, 2002, 35(3): 305~320
    [55]T.Rotaru, H.H.Nageli, Dynamic load balancing by diffusion in heterogeneous systems,Journal of Parallel and distributed Computing, 2004, 64(4): 481~497
    [56]G.Cybenko, Dynamic load balancing for distributed memory multiprocessors, J. Parallel Distrib. Comput., 1989, 7: 279~301
    [57]K.Schloegel, G.Karypis, V.Kumar, Multilevel diffusion algorithms for repartitioning of adaptive meshes, Journal of Parallel and Distributed Computing, 1997, 47(2): 109~124
    [58]A.G.Bronevich, W.Meyer, Load balancing algorithms based on gradient methods and their analysis through algebraic graph theory,Journal of Parallel and distributed computing, 2008, 68(2): 209~220
    [59]M.J.Clement, M.J.Quinn, Multivariate statistical techniques for parallel performance prediction, Proc. 28th Hawaii Int'l Conf. System Sciences, 1995, 446~455
    [60]C.L.Mendes, J.C.Wang, D.A.Reed, Automatic performance prediction and scalability analysis for data parallel programs, Proc. Second Workshop Automatic Data Layout and Performance Prediction, 1995.
    [61]A.J.C.van Gemund, Symbolic performance modeling of parallel systems, IEEE Trans. Parallel and Distributed Systems, 2003, 14(2): 154~165
    [62]R.A.Sahner, K.S.Trivedi, Performance and reliability analysis using directed acyclic graphs, IEEE Trans. Software Eng., 1987, 13(10): 1105~1114
    [63]D.R.Liang, S.K.Tripathi, On performance prediction of parallel computations with precedent constraints, IEEE Trans. Parallel and Distributed Systems, 2000, 11(5): 491~508
    [64]H.Gautama, A.J.C.van Gemund, Low-cost static performance prediction of parallel stochastic task compositions, IEEE Transactions on Parallel and Distributed Systems, 2006, 17(1): 78~91
    [65]G.L.Reijns, A.J.C.van Gemund, Predicting the execution times of parallel-independent programs using Pearson distributions, Elsevier Science Publishers B.V., Parallel Computing, 2005, 31(8~9): 877~899
    [66]K.Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability, New York: McGraw-Hill Higher Education, 1992
    [67]D.Lenoski, J.Laudon, K.Gharachorloo et al, The Stanford dash multiprocessor, IEEE Computer, 1992, 25(3): 63~79
    [68]T.Anderson,D.Culler,D.Patterson, A case for networks of workstations, Hot Interconnects II,1994.Symposium Record, 1994, 43~45
    [69]M.A.Baker, G.C.Fox, H.W.Yau, Review of cluster management softwqre, NHSE Review, 1996
    [70]C.L.Leiserson, Fat tree: universal networks for hardware-efficient supercomputing, IEEE Transaction on Computers, 1985, 34(10): 892~901
    [71]T.Kunz, The influence of different workload descriptions on a heuristic load balancing scheme, IEEE Transactions on Software Engineering, 1991, 17(7): 725~730
    [72]J.Wu, E.Fernandez, Ascheduling scheme for communicating tasks, Proceedings of the 22nd Southeastern Symposium on System Theory, 1990, 91~97
    [73]F.Bonomi, A.Kumar, Adaptive optimal load balancing in a non-homogeneous multi-server system with a central task scheduler, IEEE Transactions on Computers, 1990, 39(10): 1232~1250
    [74]R.Elsasser, B.Monien, S.Schamberger, Load balancing in dynamic networks, Proceedings of the 7th International symposium on Parallel Architectures, Algorithms and Networks, 2004
    [75]G.Attiya, Y.Hamam, Two phase algorithm for load balancing in heterogeneous, Proceeding of the 12th Euromicro Conference on Parallel, distributed and Network-Based Processing, 2004
    [76]R.L.Carino, I.Banicescu, A load balancing tool for distributed parallel loops, Cluster Computing, 2005, 8(4): 313~321
    [77]R.F.de Mello, L.C.Trevelin, M.S.V.de Paiva et al, Comparative study of the server-initiated lowest algorithm using a load balancing index based on the process behavior for heterogeneous environment, Cluster Computing, 2006, 9(3): 313~319
    [78]X.Qin, Performance comparisons of load balancing algorithms for I/O-intensive workloads on clusters, Journal of Network and computer Applications, 2008, 31(1): 32~46
    [79]N.Imani, H.Sarbazi-Azad, S.G.Akl, Perfect load balancing on the star interconnection network, J.Supercomput., 2007, 41(3): 269~286
    [80]R.diekmann, A.Frommer, B.Monien, Efficient schemes for nearest neighbor load balancing, Parallel computing, 1999, 25(7): 189~812
    [81]B.Ghosh, S.Muthukrishnan, M.H.Schultz, First and second order diffusive methods for rapid, coarse, distributed load balancing, Theory of Computing Systems, 1998, 31(4): 331~354
    [82]C.Xu, B.Monien, R.Luling, Nearest neighbor algorithms for Load balancing in parallel computers, Concurrency: Practice and Experience, 1995, 7(7): 707~736
    [83]S.C.Chau, Load balancing between heterogeneous computing clusters, Lectur Notes in Computer Science, 2004
    [84]H.Arndt, Load balancing: dimension exchange on product graphs, Parallel and Distributed Processing Symposium, 2004.Proceedings.18th International, 2004, 20~27
    [85]S.Ranka, Y.Won, S.Sahni, Programming a hypercube multicomputer, IEEE Software, 1988, 5(5): 69~77
    [86]M.Y.Wu, W.Shu, A load-balancing algorithm for N-cubes, 1996 International Conference on Parallel Processing, 1996
    [87]E.Ipek, B.R.de Supinski, M.Schulz et al, An approach to performance prediction for parallel applications, Lecture Notes in Computer Science, 2005, 196~205
    [88]S.A.Jarvis, D.P.Spooner, H.N.Lim Choi Keung et al, Performance prediction and its use in parallel and distributed computing systems, Future Generation Computer Systems, 2006, 22(7): 745~754
    [89]S.Pllana, T.Fahringer, PerformanceProphet: A performance modeling and prediction tool for parallel and distributed programs, Proceedings of the 2005 International Conference on Parallel Processing Workshops (ICPPW’05), 2005
    [90]G.Zheng, T.Wilmarth, P.Jagadishprasad et al, Simulation-based performance prediction for large parallel machines, International Journal of Parallel Programming, 2005, 33(2~3): 183~207
    [91]C.Ferdinand, Worst case execution time prediction by static program analysis, Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS’04), 2004
    [92]L.T.Yang, X.Ma, F.Mueller, Cross-platform performance prediction of parallel applications using partial execution, Proceedings of the 2005 ACM/IEEE Conference on Supercomputing (SC’05), 2005
    [93]V.Blanco, J.A.Gonzalez, C.Leon et al, Predicting the performance of parallel programs, Parallel Computing, 2004, 30(3): 337~356
    [94]L.Carrington, A.Snavely, N.Wolter, A performance prediction framework for scientific applications, Future Generation Computer Systems, 2006, 22(3): 336~346
    [95]A.Thomasian, P.Bay, Analytic queuing network models for parallel processing of task systems, IEEE Trans. Computers, 1986, 35(12): 1045~1054
    [96]E.J.Gumbel, Statistical theory of extreme values (main results),in: A.E.Sarhan, B.G.Greenberg (Eds.), Contributions to Order Statistics, John Wiley & Sons, 1962, 56~93
    [97]H.Gautama, A.J C.van Gemund, A statistical approach to performance modeling of parallel systems, PhD thesis, Delft Univ. of Technology, 2004
    [98]H.Gautama, A.J.C.van Gemund, Low-cost performance prediction of data-dependent data parallel programs, Proc. IEEEE Int'l Symp.Modeling, Analysis and Simulation of Computer and Telecomm. Systems '01, 2001, 173~182
    [99]J.S. Ramberg et al., A probability distribution and its uses in fitting data, Technometrics, 1979, 21(2): 201~214
    [100]A.Stuart, J.K.Ord, Kendall's advanced theory of statistics, vol.1, 5th ed., Charles Griffin & Comjpany Limited London, 1986
    [101]N.L.Johnson, Systems of frequency curves generated by methods of translation, Biometrika, 1949, 36(1~2): 149~176
    [102]J.Aitchison, J.A.C.Brown, The lognormal distribution with special reference to its uses in economics, Cambridge: Cambridge University Press, 1957
    [103]N.L.Johnson, Tables to facilitate fitting SU frequency curves, biometrika, 1965, 52(3~4): 547~558
    [104]N.L.Johnson, J.O.Kitchen, Some notes on tables to facilitate fitting SB curves, Biometrika, 1971, 58(1): 223~226
    [105]K.O.Bowman,C.A.Serbin,L.R.Shenton, Explicit approximate solutions for SB, Commmun. Statist. B., 1981, 10(1): 1~15
    [106]I.D.Hill, R.Hill, R.L.Holder, Algorithm AS99.Fitting Johnson curves by moments, Appl. Statist., 1976, 25(2): 180~189
    [107]J.Bukac, Fitting SB curves using symmetrical percentile points, Biometrika, 1972, 59(3): 688~690
    [108]D.T.Mage, An explicit solution for SB parameters using four percentile points, Technometrics, 1980, 22(2): 247~251
    [109]J.F.Slifker, S.S.Shapiro, The Johnson system: Selection and parameter estimation, Technometrics, 1980, 22(2): 239~246
    [110]J.Galambos, The Asymptotic Theory of Extreme Order Statistics, New York, 1978
    [111]R.D.Reiss, Approximate Distribution of Order Statistics. New York: Springer-Verlag, 1989

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700