Research Progress in Just-in-Time Software Defect Prediction

English title: Just-in-Time Software Defect Prediction: Literature Review
Authors: 蔡亮; 范元瑞; 鄢萌; 夏鑫
Authors (English): CAI Liang; FAN Yuan-Rui; YAN Meng; XIA Xin (College of Computer Science and Technology, Zhejiang University; Faculty of Information Technology, Monash University)
Keywords: software defect prediction; just-in-time defect prediction; software maintenance; software quality; software engineering
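Chinese journal title: RJXB
English journal title: Journal of Software
Affiliations: College of Computer Science and Technology, Zhejiang University; Faculty of Information Technology, Monash University
Publication date: 2019-05-15
Publisher: Journal of Software (软件学报)
Year: 2019
Issue: v.30
Funding: Project funded by the Zhejiang University-China Mobile Online Joint Innovation Laboratory
Language: Chinese
Pages: RJXB201905007
Page count: 20
CN: 05
ISSN: 11-2560/TP
Classification number: 90-109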

Abstract

Software defect prediction has long been one of the most active research areas in software engineering, and researchers have proposed a large number of defect prediction techniques. By prediction granularity, these techniques mainly fall into module-level, file-level, and change-level defect prediction. Change-level defect prediction aims to predict, at the moment a developer submits a code change, whether that change introduces a defect; it is therefore also called just-in-time defect prediction. In recent years, just-in-time defect prediction has become a research hotspot in defect prediction because it is timely and fine-grained, and it has produced a series of research results. It also faces many challenges in data labeling, feature extraction, and model evaluation, and urgently needs more advanced and unified theoretical guidance and technical support. This paper therefore reviews and summarizes recent progress in just-in-time defect prediction from the perspectives of data labeling, feature extraction, and model evaluation. The main contributions are: (1) the data labeling methods commonly used in building just-in-time defect prediction models are categorized, together with their advantages and disadvantages; (2) the feature types used in just-in-time defect prediction and the methods for computing them are classified and summarized in detail; (3) existing model construction techniques are summarized and categorized; (4) the experimental validation methods and performance measures used in model evaluation are summarized; (5) the key open problems in just-in-time defect prediction are identified; and (6) future directions for just-in-time defect prediction are outlined.

