Twice Learning Based Semi-supervised Dictionary Learning for Software Defect Prediction
  • Title: Twice Learning Based Semi-supervised Dictionary Learning for Software Defect Prediction
  • Authors: ZHANG Zhiwu; JING Xiaoyuan; WU Fei
  • Affiliations: School of Computer, Nanjing University of Posts and Telecommunications; State Key Laboratory of Software Engineering, Wuhan University; School of Automation, Nanjing University of Posts and Telecommunications
  • Keywords: Software Defect Prediction; Twice Learning; Semi-supervised Learning; Dictionary Learning
  • Journal code: MSSB
  • Journal: Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
  • Publication date: 2017-03-15
  • Publisher: Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
  • Year: 2017
  • Issue: v.30; No.165
  • Funding: Supported by the National Natural Science Foundation of China (No.61272273, 61073113) and the Graduate Research and Innovation Program of Jiangsu Higher Education Institutions (No.CXZZ12_0478)
  • Language: Chinese
  • Page: MSSB201703006
  • Number of pages: 9
  • CN: 03
  • ISSN: 34-1089/TP
  • Classification number: 52-60
Abstract
When a software history repository contains few labeled training samples, it is difficult to build an effective prediction model. To address this problem, a twice learning based semi-supervised dictionary learning method for software defect prediction is proposed. In the first learning stage, a sparse representation classifier assigns probabilistic soft labels to a large number of unlabeled samples, which are then added to the labeled training set. In the second stage, discriminative dictionary learning is performed on the expanded training set. Finally, defect proneness is predicted with the learned dictionary. Experiments on the widely used NASA MDP and PROMISE AR datasets indicate the superiority of the proposed method.
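The first stage described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state how the sparse codes are solved or how residuals become probabilities, so the code below assumes an ISTA solver for the l1-regularized least-squares problem and normalized inverse class residuals as the soft label. The function names `ista_lasso` and `src_soft_label` and the parameters `lam` and `n_iter` are hypothetical.

```python
import numpy as np

def ista_lasso(D, x, lam=0.1, n_iter=200):
    """Approximately solve min_a 0.5*||x - D a||^2 + lam*||a||_1 with ISTA."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - x))                        # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return a

def src_soft_label(X_lab, y_lab, x, lam=0.1):
    """Soft-label one unlabeled sample x from the labeled samples.

    X_lab has one labeled sample per row; y_lab holds the class labels.
    Assumption: probabilities are normalized inverse class-wise residuals.
    """
    # Dictionary columns are the unit-normalized labeled samples.
    D = X_lab.T / (np.linalg.norm(X_lab, axis=1) + 1e-12)
    a = ista_lasso(D, x, lam)
    classes = np.unique(y_lab)
    # Reconstruct x using only the coefficients that belong to each class.
    residuals = np.array([
        np.linalg.norm(x - D[:, y_lab == c] @ a[y_lab == c]) for c in classes
    ])
    scores = 1.0 / (residuals + 1e-12)
    return classes, scores / scores.sum()

# Toy data (made up for illustration): class 0 = non-defective, class 1 = defective.
X_lab = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
y_lab = np.array([0, 0, 1, 1])
classes, probs = src_soft_label(X_lab, y_lab, np.array([1.0, 0.05]))
print(dict(zip(classes.tolist(), probs.tolist())))
```

In a two-stage pipeline of this kind, the soft-labeled samples would be appended to the labeled set before the second-stage dictionary learning. How the probabilities enter that stage is not specified in the abstract, so it is left out here.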
