Research on Resource Allocation Optimization for MapReduce Computation Job Scheduling
Abstract
In big data processing platforms, job density and data volume grow continuously, and the scale of platform resources expands with them. Faced with the intricate serial and parallel execution of big data computation jobs and their concurrent scheduling, how platform resources are allocated directly determines the platform's business carrying capacity. Existing big data processing technology, centered on data-oriented parallel programming models, focuses mainly on parallelizing the resources used during job scheduling and execution and on the related scalability mechanisms, while resource allocation optimization driven by the differing resource demands of different users and different computation jobs has not yet been studied adequately.
Resource allocation optimization for big data processing platforms is an important research area shaped by the growth of big data applications, and related work is still at an early stage. Targeting this weak point and focusing on the emerging MapReduce big data processing framework, this dissertation analyzes the characteristics of big data processing technology and the scheduling and execution process of MapReduce computation jobs in depth, and proposes a systematic solution for resource allocation optimization that tunes the platform's resources at two levels, the vertical serial execution of a single computation job and the horizontal concurrent scheduling of multiple computation jobs, with the ultimate goal of improving resource utilization and strengthening the platform's business carrying capacity.
The main research work and contributions of this dissertation are summarized as follows:
1. Starting from the pronounced dynamic nature of big data processing, and to build a self-adaptive framework for resource allocation optimization, the concept of a computation job execution profile is proposed to characterize the workload of big data computation jobs. On this basis, the concrete structure and constituent fields of the MapReduce job execution profile are designed in detail according to the working principles and mechanisms of the MapReduce programming model and its supporting system. Furthermore, a non-intrusive dynamic probe program is developed on top of BTrace to observe the actual execution of MapReduce jobs at fine granularity in real time and to produce the concrete values of the job execution profile.
2. Based on the MapReduce job execution profile, a self-adaptive auto-tuning method for dynamic resource allocation, the Profile-Predict-Optimize (PPO) method, is proposed at the vertical level of serial execution of a single MapReduce job, together with the corresponding MapReduce job performance prediction model and performance optimization model. The prediction model combines white-box analysis, based on a known job execution profile and a hypothetical resource allocation plan, with black-box evaluation based on decision-tree learning to predict and estimate job execution performance. Building on it, the optimization model applies subspace decomposition and recursive random search to explore the large, high-dimensional space of resource allocation plans effectively, and compares candidates against the user's optimization objective and constraints to find the optimal plan. Thorough experiments show that, under the extra overhead of the probe program, the prediction model over-predicts job execution time by 15.1% on average, yet it can clearly and effectively identify the configuration parameter values that yield good optimization results; compared with the rule-of-thumb approach in common use, the optimization model raises the average improvement in job execution time by 42% and the maximum improvement by 25.7% when multiple jobs run concurrently.
3. Based on the job execution profile and the job performance prediction model, a self-adaptive Resource-aware Dynamic Scheduler (RDS) is proposed at the horizontal level of concurrent scheduling of multiple MapReduce jobs, and an RDS scheduler prototype is designed and implemented accordingly. The RDS scheduler is novel in that it takes into account the different completion-quality requirements that different users attach to their jobs during concurrent multi-job scheduling. For multiple MapReduce jobs that arrive dynamically and at random, it senses the latest state of system resource usage through a resource placement matrix, builds a job utility evaluation model from the users' completion-quality requirements, takes maximizing the total job utility as the scheduling objective, and continually updates the resource allocation of each job on every processing node, so as to satisfy the completion-quality requirements of multiple users while improving the overall resource utilization of the platform. Comprehensive evaluation shows that the RDS scheduler dynamically adjusts the allocation of platform resources among concurrently executing jobs and outperforms the fair scheduler shipped with Hadoop under both relaxed and tight job completion time goals, reducing job execution time by 5-100% in comparison.
The job frequency and data density in big data processing platforms increase continuously, and so does the scale of the platform resources. To achieve a high business carrying capacity, it is important to allocate platform resources properly among big data computation jobs throughout their complicated execution and concurrent scheduling. Existing research on big data processing technology built around data-oriented parallel programming models pays more attention to the parallel execution of computation jobs than to the differing resource demands of different users and different job execution processes, where a large opportunity lies for improving resource utilization and business carrying capacity by optimizing the resource allocation among them.
Resource allocation optimization for big data processing platforms is a new research area driven by the development of big data applications, and related work is still scarce. Targeting this gap, a complete resource allocation optimization scheme for the emerging MapReduce big data processing framework is proposed, based on an in-depth study of resource allocation during both the vertical execution of a single MapReduce job and the horizontal concurrent scheduling of multiple jobs. The scheme extends the existing MapReduce programming model and its supporting system by optimizing resource allocation at these two levels, so as to improve resource utilization and strengthen the business carrying capacity of big data processing platforms.
     Specifically, the main contributions of this study are as follows:
1. A new concept, the computation job execution profile, is proposed to provide self-adaptive capacity for the dynamic nature of big data processing. By studying the detailed mechanisms of the MapReduce programming model and its supporting system, the structure and constituent fields of the job execution profile are designed according to the fine-grained execution phases of a MapReduce job. A non-invasive dynamic probe program is then designed and implemented with the BTrace technique to trace the actual execution of a MapReduce job at fine granularity in real time and to compute the concrete value of each profile field (a hedged probe sketch follows this abstract).
2. At the vertical level of single-job execution, a new adaptive dynamic auto-tuning method composed of three phases, job execution profiling, job performance prediction, and job performance optimization (Profile-Predict-Optimize, PPO), is proposed, together with the corresponding MapReduce job performance prediction model and job performance optimization model. The prediction model estimates the performance of a MapReduce job from a given job execution profile and a candidate resource allocation plan. Using the prediction model, the optimization model finds the best resource allocation plan by searching the space of allocation plans effectively according to the user's optimization demand (a simplified search sketch follows this abstract). Experimental results show that the prediction model can clearly and effectively identify the configuration values that lead to better optimization, although it over-predicts job execution time by 15.1% on average because of the probe overhead. On this basis, the optimization model improves job execution time by 42% on average and by 25.7% at the maximum over the commonly used rule-of-thumb methods for concurrently running jobs.
3. A new adaptive Resource-aware Dynamic Scheduler (RDS) for the concurrent scheduling of multiple jobs is proposed and implemented. RDS achieves both differentiated user satisfaction and improved resource utilization: it senses the current resource usage through a resource placement matrix that records the resource assignment of each job on every processing node and is updated dynamically, and it maximizes the total utility of all jobs through a job utility evaluation model built from the users' QoS requirements (a small scheduling sketch follows this abstract). Comprehensive evaluation shows that the RDS scheduler dynamically adjusts the allocation of platform resources among concurrently executing jobs and outperforms Hadoop's fair scheduler under both relaxed and tight completion time goals, reducing job completion time by about 5-100%.
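To make the probing approach of contribution 1 concrete, below is a minimal BTrace-style sketch, not the dissertation's actual probe. It assumes the classic com.sun.btrace annotation API and targets the Hadoop 1.x class org.apache.hadoop.mapred.MapTask; the reported field (per-task map wall-clock time) is just one plausible profile entry.

    // Hypothetical BTrace probe sketch; the target class/method and the profile field are assumptions.
    import com.sun.btrace.annotations.*;
    import static com.sun.btrace.BTraceUtils.*;

    @BTrace
    public class MapTaskProbe {

        // Fires when MapTask.run(...) returns; @Duration supplies the elapsed time in nanoseconds.
        @OnMethod(clazz = "org.apache.hadoop.mapred.MapTask",
                  method = "run",
                  location = @Location(Kind.RETURN))
        public static void onMapTaskFinished(@Duration long elapsedNanos) {
            // One candidate profile field: map-phase wall-clock time per task attempt.
            println(strcat("mapTaskMillis=", str(elapsedNanos / 1000000L)));
        }
    }

Such a script would typically be attached to a running task JVM with the btrace launcher (for example, btrace <pid> MapTaskProbe.java), which is what makes the instrumentation non-invasive: neither Hadoop source code nor the user's job code is modified.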
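The search step of the PPO method in contribution 2 can be illustrated with a simplified recursive random search: sample the configuration space uniformly, keep the best point under the predicted execution time, shrink the search box around it, and repeat. The sketch below is a self-contained toy; the parameter names (io.sort.mb, number of reduces), their ranges, and the plugged-in predictor are assumptions standing in for the dissertation's prediction model.

    import java.util.Random;
    import java.util.function.ToDoubleFunction;

    // Simplified recursive random search over a 2-D MapReduce configuration space.
    public class RecursiveRandomSearchSketch {

        // Hypothetical parameters: [0] = io.sort.mb, [1] = number of reduce tasks.
        static final int[] LOW  = {50, 1};
        static final int[] HIGH = {400, 64};

        public static void main(String[] args) {
            // Toy predictor: pretends the predicted job time is minimized near (200 MB, 32 reducers).
            ToDoubleFunction<int[]> predictedSeconds =
                    p -> 600 + Math.abs(p[0] - 200) * 0.5 + Math.abs(p[1] - 32) * 4.0;

            Random rnd = new Random(7);
            int[] low = LOW.clone(), high = HIGH.clone();
            int[] best = null;
            double bestTime = Double.MAX_VALUE;

            // Each round: sample uniformly in the current box, keep the best point,
            // then shrink the box around it (the recursive re-scaling of the search region).
            for (int round = 0; round < 5; round++) {
                for (int s = 0; s < 20; s++) {
                    int[] p = new int[low.length];
                    for (int d = 0; d < p.length; d++) {
                        p[d] = low[d] + rnd.nextInt(high[d] - low[d] + 1);
                    }
                    double t = predictedSeconds.applyAsDouble(p);
                    if (t < bestTime) { bestTime = t; best = p; }
                }
                for (int d = 0; d < low.length; d++) {
                    int half = Math.max(1, (high[d] - low[d]) / 4);   // halve the box width
                    low[d]  = Math.max(LOW[d],  best[d] - half);
                    high[d] = Math.min(HIGH[d], best[d] + half);
                }
            }
            System.out.printf("best plan: io.sort.mb=%d, reduces=%d, predicted %.0f s%n",
                              best[0], best[1], bestTime);
        }
    }

The point of the shrinking box is to spend most of the sampling budget near promising regions while still covering the full space in the first round, which is what makes random search usable in a high-dimensional configuration space.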
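The scheduling idea of contribution 3, a resource placement matrix combined with utility maximization, is sketched below as a single greedy assignment round: every free slot on every node goes to the job whose total utility would rise the most if it received one more slot. The job fields, slot counts, and the deadline-based utility shape are illustrative assumptions, not the RDS scheduler's actual design.

    import java.util.Arrays;

    // Greedy, utility-driven slot assignment sketch in the spirit of a resource-aware scheduler.
    public class UtilitySchedulingSketch {

        // Hypothetical per-job state: remaining tasks and a completion-time goal in seconds.
        static final String[] JOB       = {"jobA", "jobB", "jobC"};
        static final int[]    REMAINING = {120, 40, 80};
        static final double[] DEADLINE  = {600, 300, 900};
        static final double   SECONDS_PER_TASK_PER_SLOT = 10;

        // Assumed utility: 1 when the job is expected to meet its goal, decaying toward 0 otherwise.
        static double utility(int job, int slots) {
            if (slots == 0) return 0;
            double expected = REMAINING[job] * SECONDS_PER_TASK_PER_SLOT / slots;
            return Math.min(1.0, DEADLINE[job] / expected);
        }

        public static void main(String[] args) {
            int[] nodeFreeSlots = {4, 4, 2};                                  // free slots per node
            int[][] placement = new int[JOB.length][nodeFreeSlots.length];    // the placement matrix
            int[] slotsOfJob = new int[JOB.length];

            // One scheduling round: hand out every free slot to the job with the best marginal utility.
            for (int node = 0; node < nodeFreeSlots.length; node++) {
                for (int s = 0; s < nodeFreeSlots[node]; s++) {
                    int bestJob = 0;
                    double bestGain = -1;
                    for (int j = 0; j < JOB.length; j++) {
                        double gain = utility(j, slotsOfJob[j] + 1) - utility(j, slotsOfJob[j]);
                        if (gain > bestGain) { bestGain = gain; bestJob = j; }
                    }
                    placement[bestJob][node]++;
                    slotsOfJob[bestJob]++;
                }
            }
            for (int j = 0; j < JOB.length; j++) {
                System.out.println(JOB[j] + " placement across nodes: " + Arrays.toString(placement[j]));
            }
        }
    }

Because the assumed utility saturates once a job is expected to meet its goal, the greedy pass naturally diverts extra slots to jobs that are behind their completion targets, which is the win-win behavior (QoS satisfaction plus utilization) the abstract describes.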
