内存计算框架局部数据优先拉取策略

英文篇名：Partial Data Shuffled First Strategy for In-Memory Computing Framework
作者：卞琛 ; 于炯 ; 修位蓉 ; 钱育蓉 ; 英昌甜 ; 廖彬
英文作者：Bian Chen;Yu Jiong;Xiu Weirong;Qian Yurong;Ying Changtian;Liao Bin;College of Information Science and Engineering,Xinjiang University;College of Statistics and Information,Xinjiang University of Finance and Economics;
关键词：内存计算 ; 任务分配 ; 作业调度 ; 分配效能熵 ; 节点贡献度 ; 异构环境
英文关键词：in-memory computing;;task allocation;;job scheduling;;allocation efficiency entropy(AEE);;worker contribution degree(WCD);;heterogeneous environment
中文刊名：JFYZ
英文刊名：Journal of Computer Research and Development
机构：新疆大学信息科学与工程学院;新疆财经大学统计与信息学院;
出版日期：2017-04-15
出版单位：计算机研究与发展
年：2017
期：v.54
基金：国家自然科学基金项目(61262088,61462079,61363083,61562086);; 新疆维吾尔自治区高校科研计划(XJEDU2016S106)~~
语种：中文;
页：JFYZ201704011
页数：17
CN：04
ISSN：11-1777/TP
分类号：110-126

摘要

内存计算框架的低延迟特性大幅提高了集群的计算效率,但Shuffle过程的性能瓶颈仍不可规避.宽依赖的同步操作导致大多数工作节点等待慢节点的计算结果,同步过程不仅浪费计算资源,更增加了作业延时,这一现象在异构集群环境下尤为突出.针对内存计算框架Shuffle操作的同步问题,建立了资源需求模型、执行效率模型和任务分配及调度模型.给出了分配效能熵(allocation efficiency entropy,AEE)和节点贡献度(worker contribution degree,WCD)的定义,提出了算法的优化目标.根据模型的相关定义求解,设计了局部数据优先拉取算法(partial data shuffled first algorithm,PDSF),通过高效节点优先调度,提高流水线与宽依赖任务的时间重合度,减少宽依赖Shuffle过程的同步延时,优化集群资源利用率;通过适度倾斜的任务分配,在保障慢节点计算连续性的前提下,提高分配任务量与节点计算能力的适应度,优化作业执行效率;通过分析算法的相关优化原则,证明了算法的帕累托最优性.实验表明:PDSF算法提高了内存计算框架的作业执行效率,并使集群资源得到有效利用.
In-memory computing framework has greatly improved the computing efficiency of cluster,but the low performance of Shuffle operation cannot be ignored.There is a compulsory synchronous operation of wide dependence node on in-memory computing framework,and most executors are obliged to delay their computing tasks to wait for the results of slowest worker,and the synchronization process not only wastes computing resources,but also extends the completion time of jobs and reduces the efficiency of implementation,and this phenomenon is even worse in heterogeneous cluster environment.In this paper,we establish the resource requirement model,job execution efficiency model,task allocation and scheduling model,give the definition of allocation efficiency entropy(AEE)and worker contribution degree(WCD).Moreover,the optimization objective of the algorithm is proposed.To solve the problem of optimizing,we design a partial data shuffled first algorithm(PDSF)which includes more innovative approaches,such as efficient executors priority scheduling,minimize executor wait time strategy and moderately inclined task allocation and so on.PDSF breaks through the restriction of parallel computing model,releases the high performance of efficient executors to decrease the duration of synchronous operation,and establish adaptive task scheduling scheme to improve the efficiency of job execution.We further analyze the correlative attributes of our algorithm,prove that PDSF conforms to Pareto optimum.Experimental results demonstrate that our algorithm optimizes the computational efficiency of inmemory computing framework,and PDSF contributes to the improvement of cluster resources utilization.

引文

[1]Meng Xiaofeng,Ci Xiang.Big data management:Concepts,techniques and challenges[J].Journal of Computer Research and Development,2013,50(1):146-169(in Chinese)(孟小峰,慈祥.大数据管理:概念、技术与挑战[J].计算机研究与发展,2013,50(1):146-169)
    [2]Chen Yong.Towards scalable I/O architecture for exascale systems[C]//Proc of the 2011ACM Int Workshop on Many Task Computing on Grids and Supercomputers.New York:ACM,2011:43-48
    [3]Strande S M,Cicotti P,Sinkovits R S,et al.Gordon:Design,performance,and experiences deploying and supporting a data intensive supercomputer[C]//Proc of the1st Conf on the Extreme Science and Engineering Discovery Environment.New York:ACM,2012:No.3
    [4]Bronevetsky G,Moody A.Scalable I/O systems via nodelocal storage:Approaching 1TB/sec file I/O,LLNL-TR-415791[R].Livermore,CA:Lawrence Livermore National Laboratory,2009:1-15
    [5]Zaharia M,Chowdhury M,Das T,et aI.Fast and interactive analytics over Hadoop data with Spark[J].Login,2012,37(4):45-51
    [6]Apache Spark.Spark overview[EB/OL].2011[2015-03-18].http://spark.apache.org
    [7]Apache Flink.Flink overview[EB/OL].2014[2015-09-21].http://flink.apache.org
    [8]Apache Impala.Impala overview[EB/OL].2013[2015-09-21].http://www.cloudera.com/content/www/en-us/products/apache-hadoop/impala.html
    [9]SAP HANA.HANA overview[EB/OL].2011[2015-09-21].http://hana.sap.com/abouthana.html
    [10]Dean J,Ghemawat S.MapReduce:Simplifed data processing on large clusters[C]//Proc of the 6th Symp on Operating System Design and Implementation(OSDI).New York:ACM,2004:137-150
    [11]Zaharia M,Chowdhury M,Das T,et aI.Resilient distributed datasets:A fault-tolerant abstraction for inmemory cluster computing[C]//Proc of the 9th USENIXConf on Networked Systems Design and Implementation.Berkeley,CA:USENIX Association,2012:No.2
    [12]Lin Xiuqin,Wang Peng,Wu Bin.Log analysis in cloud computing environment with Hadoop and spark[C]//Proc of the 5th IEEE Int Conf on Broadband Network&Multimedia Technology(IC-BNMT).Piscataway,NJ:IEEE,2013:273-276
    [13]Dong Xiangyu,Xie Yuan,Muralimanohar N,et al.Hybrid checkpointing using emerging nonvolatile memories for future exascale system[J].ACM Trans on Architecture and Code Optimization,2011,8(2):No.6
    [14]Wan Hu,Xu Yuanchao,Yan Junfeng,et al.Mitigating log cost through non-volatile memory and checkpoint optimization[J].Journal of Computer Research and Development,2015,52(6):1351-1361(in Chinese)(万虎,徐远超,闫俊峰,等.通过非易失存储和检查点优化缓解日志开销[J].计算机研究与发展,2015,52(6):1351-1361)
    [15]Armbrust M,Xin R S,Lian C,et al.Spark SQL:Relational data processing in Spark[C]//Proc of the 2015 ACMSIGMOD Int Conf on Management of Data.New York:ACM,2015:1383-1394
    [16]Apache Storm.Storm overview[EB/OL].2012[2015-09-21].http://storm.apache.org
    [17]Zaharia M,Das T,Li Haoyuan,et al.Discretized streams:Fault-tolerant streaming computation at scale[C]//Proc of the 24th ACM Symp on Operating Systems Principles.New York:ACM,2013:423-438
    [18]Apache Spark.Spark machine learning library(MLlib)[EB/OL].2012[2015-03-18].http://spark.incubator.apache.org/docs/latest/mllib-guide.html
    [19]Gonzalez J E,Xin R S,Dave A,et al.GraphX:Graph processing in a distributed dataflow framework[C]//Proc of the 11th USENIX Conf on Operating Systems Design and Implementation.Berkeley,CA:USENIX Association,2014:599-613
    [20]Liao Bin,Yu Jiong,Sun Hua,et al.Energy-efficient algorithms for distributed storage system based on data storage structure reconfiguration[J].Journal of Computer Research and Development,2013,50(1):3-18(in Chinese)(廖彬,于炯,孙华,等.基于存储结构重配置的分布式存储系统节能算法[J].计算机研究与发展,2013,50(1):3-18)
    [21]Fitzpatrick B.Memcached-a distributed memory object caching system[EB/OL].2014[2015-09-21].http://memcached.org
    [22]Zawodny J.Redis:Lightweight key/value store that goes the extra mile[EB/OL].2009[2015-09-21].http://redis.io
    [23]Diaconu C,Freedman C,Ismert E,et al.Hekaton:SQLserver's memory-optimized OLTP engine[C]//Proc of 2013ACM SIGMOD Int Conf on Management of Data.New York:ACM,2013:1243-1254
    [24]Garret.FastDB:Main memory relational database management system[EB/OL].2009[2015-09-21].http://www.garret.ru/fastdb.html
    [25]Neumeyer L,Robbins B,Nair A,et al.S4:Distributed stream computing platform[C]//Proc of the 10th IEEE Int Conf on Data Mining Workshops(ICDMW).Piscataway,NJ:IEEE,2010:170-177
    [26]Qian Zhengping,He Yong,Su Chunzhi,et al.TimeStream:Reliable stream computation in the cloud[C]//Proc of 2013ACM European Conf on Computer Systems.New York:ACM,2013:1-14
    [27]Chambers C,Raniwala A,Perry F,et al.FlumeJava:Easy,efficient data-parallel pipelines[C]//Proc of the 31st ACMSIGPLAN Conf on Programming Language Design and Implementation.New York:ACM,2010:363-375
    [28]Chowdhury M,Zaharia M,Ma J,et al.Managing data transfers in computer clusters with orchestra[C]//Proc of the 2011 ACM SIGCOMM Int Conf.New York:ACM,2011:98-109
    [29]Feng X,Kumar A,Recht B,et al.Towards a unified architecture for in-rdbms analytics[C]//Proc of the 2012ACM SIGMOD Int Conf on Management of Data.New York:ACM,2012:325-336
    [30]Gonzalez J E,Low Y,Gu H,et al.PowerGraph:Distributed graph-parallel computation on natural graphs[C]//Proc of the 10th USENIX Conf on Operating Systems Design and Implementation.Berkeley,CA:USENIXAssociation,2012:17-30
    [31]Gunda P K,Ravindranath L,Thekkath C A,et al.Nectar:Automatic management of data and computation in datacenters[C]//Proc of the 9th USENIX Conf on Operating Systems Design and Implementation.Berkeley,CA:USENIX Association,2010:75-88
    [32]Hindman B,Konwinski A,Zaharia M,et al.Mesos:Aplatform for fine-grained resource sharing in the data center[C]//Proc of the 8th USENIX Conf on Networked Systems Design and Implementation.Berkeley,CA:USENIXAssociation,2011:429-483
    [33]Li Haoyuan,Ghodsi A,Zaharia M,et al.Tachyon:Memory throughput I/O for cluster computing frameworks[C/OL].2013[2015-09-21].https://people.eecs.berkeley.edu/~alig/papers/tachyon-workshop.pdf
    [34]Li Haoyuan,Ghodsi A,Zaharia M,et al.Tachyon:Reliable,memory speed storage for cluster computing frameworks[C]//Proc of the 2014 ACM Symp on Cloud Computing.New York:ACM,2014:1-15
    [35]Murray D G,Schwarzkopf M,Smowton C,et al.CIEL:Auniversal execution engine for distributed data-flow computing[C]//Proc of the 8th USENIX Conf on Networked Systems Design and Implementation.Berkeley,CA:USENIX Association,2011:113-126
    [36]Shute J,Vingralek R,Samwel B,et al.F1:A distributed SQL database that scales[C/OL].2013[2015-09-21].http://db.cs.berkeley.edu/cs286/papers/f1-vldb2013.pdf
    [37]McSherry F,Murray D G,Isaacs R,et al.Differential dataflow[C/OL].2013[2015-09-21].http://www.cidrdb.org
    [38]Murray D G,McSherry F,Isaacs R,et al.Naiad:A timely dataflow system[C]//Proc of the 24th ACM Symp on Operating Systems Principles.New York:ACM,2013:439-455
    [39]Zeng K,Agarwal S,Dave A,et al.G-OLA:Generalized online aggregation for interactive analysis on big data[C]//Proc of the 2015ACM SIGMOD Int Conf on Management of Data.New York:ACM,2015:913-918
    [40]Corrigan-Gibbs H,Boneh D,Mazières D.Riposte:An anonymous messaging system handling millions of users[C]//Proc of the 36th IEEE Symp on Security and Privacy.Piscataway,NJ:IEEE,2015:321-338
    [41]Ousterhout K,Wendell P,Zaharia M,et al.Sparrow:Distributed,low latency scheduling[C]//Proc of the 24th ACM Symp on Operating Systems Principles.New York:ACM,2013:69-84
    [42]Ananthanarayanan G,Ghodsi A,Shenker S,et al.Disklocality in datacenter computing considered irrelevant[C]//Proc of the 13th USENIX Conf on Hot Topics in Operating Systems.Berkeley,CA:USENIX Association,2011:No.12
    [43]Ananthanarayanan G,Ghodsi A,Wang A,et al.Pacman:Coordinated memory caching for parallel jobs[C]//Proc of the 9th USENIX Conf on Networked Systems Design and Implementation.Berkeley,CA:USENNIX Association,2012:No.20
    [44]Babu S.Towards automatic optimization of MapReduce programs[C]//Proc of the 1st ACM Symp on Cloud Computing.New York:ACM,2010:137-142
    [45]Cipar J,Ho Q,Kim J K,et al.Solving the straggler problem with bounded staleness[C]//Proc of the 14th USENIX Conf on Hot Topics in Operating Systems.Berkeley,CA:USENIX Association,2013:No.22
    [46]Zaharia M,Konwinski A,Joseph A D,et al.Improving MapReduce performance in heterogeneous environments[C]//Proc of the 8th USENIX Conf on Operating Systems Design and Implementation.Berkeley,CA:USENIXAssociation,2008:29-42
    [47]Thomson A,Diamond T,Weng S C,et al.Calvin:Fast distributed transactions for partitioned database systems[C]//Proc of the 2012 ACM SIGMOD Int Conf on Management of Data.New York:ACM,2012:1-12
    [48]Zou Tao,Wang Guozhang,Salles M V,et al.Making timestepped applications tick in the cloud[C]//Proc of the 2nd ACM Symp on Cloud Computing.New York:ACM,2011:No.20
    [49]Sarma A D,Afrati F N,Salihoglu S,et al.Upper and lower bounds on the cost of a map-reduce computation[C/OL].2013[2015-09-21].http://db.disi.unitn.eu/pages/VLDBProgram/pdf/research/p275-dassarma.pdf
    [50]Kwon Y,Balazinska M,Howe B,et al.Skew-resistant parallel processing of feature-extracting scientific user-defined functions[C]//Proc of the 1st ACM Symp on Cloud Computing.New York:ACM,2010:75-86
    [51]Jure L.Stanford network analysis project[EB/OL].2009[2015-03-18].http://snap.stanford.edu

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700