Software defect prediction using semi-supervised support vector machine with sampling
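  • English title: Software defect prediction using semi-supervised support vector machine with sampling
  • Authors: 廖胜平; 徐玲; 鄢萌
  • Authors (English): LIAO Shengping; XU Ling; YAN Meng; School of Software Engineering, Chongqing University
  • Keywords: software defect prediction; semi-supervised; Safe semi-supervised support vector machine (S4VM); class imbalance; sampling
  • Keywords (English): software defect prediction; semi-supervised; Safe Semi-Supervised Support Vector Machines (S4VM); class imbalance; sampling
  • Journal (Chinese): JSGG
  • Journal (English): Computer Engineering and Applications
  • Institution: School of Software Engineering, Chongqing University
  • Publication date: 2016-06-17 16:19
  • Publisher: 计算机工程与应用 (Computer Engineering and Applications)
  • Year: 2017
  • Issue: v.53; No.885
  • Funding: Key Program of the National Natural Science Foundation of China (No.91118005); Chongqing Graduate Student Research and Innovation Project (No.CYS14008)
  • Language: Chinese
  • Page: JSGG201714027
  • Number of pages: 6
  • CN: 14
  • Classification number: 166-171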
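Abstract
Software defect prediction helps improve software development quality and ensures that testing resources are allocated effectively. To address two problems in software defect prediction research, the difficulty of obtaining class-labeled data and the imbalanced class distribution, a sampling-based semi-supervised support vector machine prediction model is proposed. The model uses an unsupervised sampling technique to ensure that the number of defective samples in the labeled data is not too low, and uses a semi-supervised support vector machine to build the prediction model from a small amount of labeled data together with the information in unlabeled data. Simulation experiments are conducted on the public NASA software defect prediction datasets. The results show that the proposed method outperforms existing semi-supervised methods on both the composite F-measure and recall, and that, compared with supervised methods, it achieves comparable prediction performance with fewer training samples.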
        Software defect prediction helps improve software quality and allocate test resources effectively. To tackle two practical yet important issues in software defect prediction, namely that labeled data is hard to collect and that classes are imbalanced, a sampling-based semi-supervised support vector machine method is proposed. The method uses an unsupervised sampling approach to select a small percentage of modules to be tested and labeled, and this sampling ensures that the defective instances in the training set are not too few. The semi-supervised support vector machine algorithm then combines the few labeled data with unlabeled data to build the predictor, so that the model can exploit the information in the unlabeled data. In an evaluation on four NASA projects, the experimental results show that the proposed approach achieves performance comparable to supervised learning models while using little defect information. Moreover, the proposed method performs better than other semi-supervised learning methods in terms of recall and F-measure.
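The two evaluation metrics named in the abstract, recall and F-measure, can be computed from the confusion counts of the defective class. The following is a minimal sketch, assuming the balanced F-measure (the harmonic mean of precision and recall); the abstract does not state which weighting the paper uses. The function name and the example counts are hypothetical and are not taken from the paper's experiments.

```python
from typing import Tuple


def recall_and_f_measure(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (recall, F-measure) for the defective class.

    tp: defective modules predicted as defective
    fp: non-defective modules predicted as defective
    fn: defective modules predicted as non-defective
    Assumes the balanced F-measure (harmonic mean of precision and recall).
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall == 0:
        return recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, f_measure


# Hypothetical counts for illustration only, not results from the paper.
recall, f_measure = recall_and_f_measure(tp=30, fp=20, fn=10)
print(f"recall={recall:.3f}, F-measure={f_measure:.3f}")
```

On class-imbalanced defect data, these metrics are more informative than plain accuracy, because a predictor that labels every module as non-defective can score high accuracy while its recall on defective modules is zero.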
