Cloud computing grew out of parallel computing, distributed computing, and grid computing, and it has brought parallel technology into everyday life. Alongside cloud computing, personal high-performance computer (PHPC) technology has matured, leading many technical staff to move from stand-alone computing to parallel computing. The popularity of cloud computing has made parallel programming a key problem that many programmers must confront and solve.
     The MapReduce parallel programming model proposed by Google greatly reduces the difficulty of parallel programming. Compared with traditional distributed program design, MapReduce encapsulates details such as parallel processing, fault tolerance, data-local computation, and load balancing. It also provides a simple yet powerful programming interface, which greatly simplifies the design of parallel programs.
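     To make the programming interface concrete, the following is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the example. It is an illustration only and not taken from the paper: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework such as Hadoop would run these steps in parallel across cluster nodes and handle fault tolerance and load balancing itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially (hypothetical driver)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```

     The programmer writes only the map and reduce functions; everything between them (grouping, distribution, retries) is what the framework is expected to hide.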
     This paper first introduces the concept, basic theory, and research status of cloud computing, then describes several traditional parallel programming models and analyzes their principles and development. It briefly introduces Google's cloud computing platform and the Hadoop cloud computing architecture, and compares MapReduce with MPI, examining the differences between the two and their respective advantages.
     This paper elaborates the ideas behind MapReduce programming in detail and analyzes the principles, specific steps, and methods by which MapReduce solves practical problems. It introduces MapReduce fault tolerance and analyzes in detail the scheduling algorithm MapReduce uses at run time. It then studies how MapReduce performance differs in a heterogeneous Hadoop cluster environment and analyzes the effect of heterogeneity on MapReduce. It proposes a new data distribution mechanism, HDDM, which distributes the input file according to the computing ratio of the heterogeneous cluster nodes in order to improve MapReduce performance in heterogeneous Hadoop clusters.
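     The abstract does not give the details of HDDM, so the following is only a sketch of the general idea of splitting input in proportion to each node's computing ratio. The function name `allocate_blocks`, the node names, the ratio values, and the largest-remainder rounding rule are all assumptions made for illustration, not the paper's algorithm.

```python
def allocate_blocks(total_blocks, compute_ratios):
    """Split `total_blocks` input blocks across nodes in proportion to
    each node's relative computing ratio (hypothetical helper).

    `compute_ratios` maps a node name to a positive relative speed.
    Rounding uses the largest-remainder rule, which is an assumption here.
    """
    total_ratio = sum(compute_ratios.values())
    if total_ratio <= 0:
        raise ValueError("compute ratios must sum to a positive value")

    # Ideal (fractional) share of blocks for each node.
    exact = {
        node: total_blocks * ratio / total_ratio
        for node, ratio in compute_ratios.items()
    }
    # Start from the integer part of each ideal share.
    shares = {node: int(value) for node, value in exact.items()}

    # Hand the blocks lost to rounding to the nodes with the largest remainders.
    leftover = total_blocks - sum(shares.values())
    by_remainder = sorted(
        exact, key=lambda node: exact[node] - shares[node], reverse=True
    )
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares


if __name__ == "__main__":
    # Assumed ratios: "fast" is three times as fast as "slow".
    print(allocate_blocks(10, {"fast": 3, "medium": 2, "slow": 1}))
    # {'fast': 5, 'medium': 3, 'slow': 2}
```

     With shares sized this way, faster nodes receive more data to process locally, so in principle all nodes finish their map tasks at closer to the same time than under an even split.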
     Finally, experiments show that the proposed data distribution mechanism, HDDM, can greatly improve the efficiency of MapReduce programs.
