Multi-tenant Slot-aware Scheduling Strategy Based on Storm

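Title: Multi-tenant Slot-aware Scheduling Strategy Based on Storm
Authors: SHI Kangli; YU Jiong; LU Liang
Affiliations: School of Software, Xinjiang University; School of Information Science and Engineering, Xinjiang University
Keywords: big data; stream computing; Storm scheduling; multi-tenant task scheduling
Journal (Chinese title code): XJDZ
Journal: Journal of Xinjiang University (Natural Science Edition)
Publication date: 2019-02-15
Publisher: Journal of Xinjiang University (Natural Science Edition)
Year: 2019
Issue: v.36; No.153
Funding: National Natural Science Foundation of China (61462079, 61562086); Science and Technology Support Program of the Ministry of Science and Technology of China (2015BAH02F01); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2017D01A20); Scientific Research Program of Higher Education Institutions of Xinjiang Uygur Autonomous Region (XJEDU2016S106)
Language: Chinese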
Record ID: XJDZ201901010
Page count: 9
Issue number: 01
CN: 65-1094/N
Pages: 60-68

Abstract

Storm's default scheduler uses a round-robin algorithm to distribute tasks evenly across the worker nodes. When multiple topologies are submitted, however, the default scheduler assigns tasks to worker-node slots at random, which leaves slots unevenly allocated and the load on worker nodes unbalanced. To address this problem, this paper proposes a multi-tenant slot-aware scheduling strategy. First, worker nodes are ranked by priority weight and placed in a queue, and tasks are assigned in priority order following the queue's FIFO behavior. Second, the fewer slots a worker node has occupied, the higher its priority for receiving tasks. Third, the number of occupied slots on each worker node may not exceed a maximum occupancy threshold. Finally, the slot occupancy information of every worker node is updated in real time to guide scheduling, which lowers the CPU load of the worker nodes, raises throughput, and reduces latency. In benchmark experiments running 4 job topologies on a cluster of 4 worker nodes, the proposed strategy improved data-stream throughput by 24.2%, reduced latency by 29%, and lowered CPU load by 15.1% relative to the default scheduler.
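The four rules in the abstract can be illustrated with a small sketch. This is not the authors' implementation and does not use Storm's scheduler API; the names `WorkerNode`, `assign_slots`, `weight`, and `max_used` are hypothetical, and the tie-breaking order (fewest occupied slots first, then queue position) is an assumption, since the abstract does not say how the rules are combined.

```python
from dataclasses import dataclass


@dataclass
class WorkerNode:
    name: str
    weight: float        # hypothetical priority weight; higher is served first
    total_slots: int
    used_slots: int = 0


def assign_slots(nodes, requests, max_used):
    """Place each (topology, worker_count) request onto worker-node slots.

    Returns a dict mapping each topology name to the node names chosen.
    """
    # Rule 1: queue the nodes by priority weight. sorted() is stable, so
    # nodes with equal weight keep their original (FIFO) order.
    queue = sorted(nodes, key=lambda n: -n.weight)
    placement = {}
    for topology, count in requests:
        chosen = placement.setdefault(topology, [])
        for _ in range(count):
            # Rule 3: a node is eligible only below the occupancy threshold
            # (and below its physical slot count).
            candidates = [
                n for n in queue
                if n.used_slots < min(max_used, n.total_slots)
            ]
            if not candidates:
                raise RuntimeError("no free slot under the threshold")
            # Rule 2: the node with the fewest occupied slots wins; min()
            # returns the first minimum, so queue order breaks ties.
            target = min(candidates, key=lambda n: n.used_slots)
            # Rule 4: update occupancy immediately so the next decision
            # sees the current state.
            target.used_slots += 1
            chosen.append(target.name)
    return placement
```

With four nodes of four slots each, weights 1, 3, 2, 2, and a threshold of three, two topologies requesting three workers each end up spread so that no node holds more than one slot above any other, in contrast to a random placement that could stack several workers on one node.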

