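     Building on an in-depth analysis of the shortcomings of existing cloud computing platforms and of the fault-tolerance and disaster-recovery capabilities of MPI (Message Passing Interface), this dissertation develops a prototype of an MPI-based high-performance cloud computing platform (MPI-based HPCCP). The platform does not use virtualization; it builds its bottom layer directly from heterogeneous computing nodes. It rewrites the MapReduce programming model using multithreading and MPI extended with multi-level fault tolerance and disaster recovery, which avoids a large number of useless I/O operations and improves the efficiency of cloud computing. The goal is to meet the cloud computing needs of massive-data high-performance computing problems that are both data-intensive and compute-intensive.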
Cloud computing is a main technique for massive data processing; however, it is inefficient for problems that are both data-intensive and compute-intensive. The bottom layer of a cloud platform relies on virtualization, so all system and application software runs on virtual hardware, which, according to the literature, reduces performance by up to 20 percent. In addition, the MapReduce paradigm of cloud computing adopts a store-and-forward strategy for intermediate data, which generates a large number of I/O operations on big data and cannot be applied efficiently to high-performance scientific computing.
     Based on the above considerations, and in view of MPI's weakness in fault tolerance, the dissertation focuses on developing an MPI-based high-performance cloud computing platform (HPCCP). The platform builds its bottom layer directly from heterogeneous computing nodes without virtualization, and reimplements the MapReduce paradigm by integrating multi-layer fault-tolerant MPI techniques with multithreading, in order to avoid a large number of unnecessary I/O operations and increase efficiency. The proposed and implemented MPI-based HPCCP prototype can efficiently handle both data-intensive and compute-intensive problems and so satisfy high-performance cloud computing requirements.
     The main contributions of the proposed MPI-based HPCCP platform are as follows:
     1. A methodology that builds the bottom layer of the cloud computing platform directly from heterogeneous computing nodes, without virtualization.
     The MPI-based HPCCP platform proposed and implemented in the dissertation does not adopt the popular virtualization techniques. Instead, it takes full advantage of MPI's ability to discover and adapt to heterogeneous computing nodes in order to construct the IaaS layer of the cloud computing platform directly. This is an important contribution: it increases the productivity of the cloud platform by reducing the harmful effect of virtualization on hardware capability in the IaaS layer.
     2. Improvement and implementation of multi-layer fault-tolerance techniques for MPI.
     Weak fault tolerance is a crucial defect of MPI, in contrast to its excellent high-performance computing ability, and it limits the application of MPI to big data processing. MPI cannot be adopted in cloud computing unless this defect is resolved. The dissertation comprehensively studies MPI fault-tolerance techniques, and proposes and implements three different ones: job rescheduling, job/task recovery, and dynamic task migration, which are placed in three different layers. This contribution remedies MPI's weakness in fault tolerance and is another distinguishing feature of the dissertation.
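     The idea behind task recovery can be illustrated with a minimal, single-process sketch. It is not the dissertation's implementation: the names `run_task`, `run_with_recovery`, and `SimulatedNodeFailure` are hypothetical, the checkpoint is an in-memory dictionary rather than stable storage, and the failure is simulated. It only shows a task resuming from its last checkpoint instead of starting over.

```python
class SimulatedNodeFailure(Exception):
    """Stands in for a node or process failure (hypothetical)."""


def run_task(items, checkpoint, fail_at=None):
    """Sum `items`, recording progress in `checkpoint` after every item.

    `checkpoint` is a dict with keys "next" (index of the next item to
    process) and "total" (partial sum so far). If `fail_at` equals the
    index about to be processed, a failure is simulated before that item.
    """
    i = checkpoint["next"]
    total = checkpoint["total"]
    while i < len(items):
        if fail_at is not None and i == fail_at:
            raise SimulatedNodeFailure(f"failure before item {i}")
        total += items[i]
        i += 1
        # Record progress; a real system would write this to stable storage.
        checkpoint["next"] = i
        checkpoint["total"] = total
    return total


def run_with_recovery(items, fail_at=None, max_restarts=3):
    """Restart the task from its last checkpoint after each failure.

    Returns (result, number_of_restarts).
    """
    checkpoint = {"next": 0, "total": 0}
    restarts = 0
    while True:
        try:
            return run_task(items, checkpoint, fail_at), restarts
        except SimulatedNodeFailure:
            restarts += 1
            if restarts > max_restarts:
                raise
            # Assumption for this sketch: the fault does not recur.
            fail_at = None


if __name__ == "__main__":
    print(run_with_recovery([1, 2, 3, 4, 5], fail_at=3))
```

     In this sketch the first three items are not recomputed after the simulated failure; only the remaining two are processed on restart.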
     3. Design and implementation of an efficient MapReduce prototype for the MPI-based HPCCP platform.
     In current MapReduce implementations, data transfer is encapsulated by the distributed file system (DFS), so repeated I/O operations against the DFS take place during data processing, which seriously reduces system efficiency. The dissertation reimplements the MapReduce paradigm on a redesigned multi-layer fault-tolerant MPI platform that processes intermediate results directly, reducing unnecessary I/O operations, speeding up cloud computing, and achieving higher efficiency. Compared with Hadoop, the currently popular implementation of MapReduce, our MPI-based HPCCP reduces the processing time of a big-data fingerprint recognition task to 25 percent.
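     The difference from a DFS-backed pipeline can be sketched with a small word count in which the intermediate (key, value) pairs never leave memory. This is an illustration only, not the HPCCP code: it uses Python threads in a single process instead of MPI ranks, and the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(chunk):
    """Map step: emit (word, 1) pairs for one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group intermediate pairs by key, entirely in memory (no file I/O)."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(item):
    """Reduce step: sum the values collected for one key."""
    key, values = item
    return key, sum(values)


def word_count(chunks, workers=4):
    """Run map, shuffle, and reduce with a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_words, chunks))
        groups = shuffle(mapped)
        reduced = list(pool.map(reduce_counts, groups.items()))
    return dict(reduced)


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

     The map output is handed to the reduce step as ordinary in-memory lists; a store-and-forward design would instead write those pairs to the file system and read them back.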
     The dissertation carries out intensive tests and case studies on the MPI-based HPCCP platform. These include: the influence of data block size on data processing performance; the robustness and efficiency of the multi-layer fault tolerance; and the general performance of the MPI-based HPCCP platform. Finally, Hadoop and the MPI-based HPCCP platform are compared. The experiments show that the cloud computing platform proposed and implemented in the dissertation runs four times faster than the traditional Hadoop platform.
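     If the 25 percent figure and the fourfold figure refer to the same kind of measurement (an assumption; the abstract does not state this explicitly), the two are consistent. Writing the runtimes as T_Hadoop and T_HPCCP (symbols introduced here for illustration only):

```latex
\[
T_{\mathrm{HPCCP}} = 0.25\, T_{\mathrm{Hadoop}}
\quad\Longrightarrow\quad
\text{speedup} = \frac{T_{\mathrm{Hadoop}}}{T_{\mathrm{HPCCP}}} = \frac{1}{0.25} = 4 .
\]
```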
     The last section lists the conclusions and some open problems, and briefly describes plans for near-future research.