A Staged Cluster Anomaly Job Prediction Method
  • English title: A prediction method of staged cluster anomaly job
  • Authors: XIE Lixia (谢丽霞); WANG Ziying (汪子荧)
  • Keywords: cluster anomaly job; staged prediction; real-time prediction; dynamic features; gated recurrent unit
  • Journal code: DLLG
  • Journal: Journal of Dalian University of Technology
  • Institution: School of Computer Science and Technology, Civil Aviation University of China
  • Publication date: 2019-07-15
  • Publisher: Journal of Dalian University of Technology (大连理工大学学报)
  • Year: 2019
  • Issue: v.59
  • Funding: Civil Aviation Joint Research Fund of the National Natural Science Foundation of China (U1833107); National Science and Technology Major Project (2012ZX03002002); Fundamental Research Funds for the Central Universities (ZYGX2018028)
  • Language: Chinese
  • Page: DLLG201904014
  • Page count: 7
  • CN: 04
  • ISSN: 21-1117/N
  • Classification number: 101-107
Abstract
To address the low prediction efficiency and long prediction time of existing cluster anomaly job prediction methods, a staged cluster anomaly job prediction (SCAJP) method is proposed. The method has two stages, offline prediction and online prediction. In the offline stage, the termination status of each job's sub-tasks is predicted from their static features, and only the jobs to which the sub-tasks predicted as normal belong are passed on to online prediction. In the online stage, while the dynamic features of a job's sub-tasks are being computed, an improved gated recurrent unit (IGRU) neural network uses those features to predict in real time whether a task's termination status is anomalous. At the end of each stage, anomalous jobs are retrieved based on the correlation between a job and its sub-tasks, which completes the prediction. Experimental results show that the method clearly outperforms other methods in sensitivity, precision, and prediction time.
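The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `staged_predict`, the feature layout, and the two threshold-based stand-in models are hypothetical, and the abstract does not specify the IGRU network, so the online model here is only a placeholder callable. The sketch also assumes that a sub-task is skipped online whenever its job was already flagged in the offline stage.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Features = Sequence[float]
# Both model types return True when a sub-task is predicted to end abnormally.
StaticModel = Callable[[Features], bool]             # offline: one static feature vector
DynamicModel = Callable[[Sequence[Features]], bool]  # online: a time series of dynamic features


def staged_predict(
    static_features: Dict[str, Features],
    dynamic_features: Dict[str, List[Features]],
    task_to_job: Dict[str, str],
    offline_model: StaticModel,
    online_model: DynamicModel,
) -> Tuple[Set[str], Set[str]]:
    """Return (jobs flagged in the offline stage, jobs flagged in the online stage)."""
    # Offline stage: classify every sub-task from its static features.
    offline_abnormal = {t for t, f in static_features.items() if offline_model(f)}
    # Retrieve anomalous jobs through the sub-task -> job relation.
    offline_jobs = {task_to_job[t] for t in offline_abnormal}

    # Online stage (assumption): only sub-tasks predicted normal offline,
    # whose job has not already been flagged, are examined further.
    online_jobs: Set[str] = set()
    for task, series in dynamic_features.items():
        if task in offline_abnormal or task_to_job[task] in offline_jobs:
            continue
        if online_model(series):
            online_jobs.add(task_to_job[task])
    return offline_jobs, online_jobs


# Hypothetical toy data: three sub-tasks spread over two jobs.
static = {"t1": [0.9], "t2": [0.1], "t3": [0.2]}
dynamic = {"t2": [[0.1], [0.2]], "t3": [[0.5], [0.95]]}
owner = {"t1": "jobA", "t2": "jobA", "t3": "jobB"}

# Stand-in models: simple thresholds in place of the real classifiers.
offline_jobs, online_jobs = staged_predict(
    static, dynamic, owner,
    offline_model=lambda f: f[0] > 0.8,
    online_model=lambda series: series[-1][0] > 0.9,
)
print(offline_jobs, online_jobs)
```

Filtering before the online stage is what the abstract credits for the shorter prediction time: sub-tasks already judged abnormal from static features never reach the more expensive sequence model.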
