Studying re-opened bugs in open source software

Authors: Emad Shihab (1)
Akinori Ihara (2)
Yasutaka Kamei (3)
Walid M. Ibrahim (4)
Masao Ohira (2)
Bram Adams (5)
Ahmed E. Hassan (4)
Ken-ichi Matsumoto (2)
Keywords: Bug reports; Re-opened bugs; Open source software
Journal: Empirical Software Engineering
Published: October 2013
Volume: 18
Issue: 5
Pages: 1005-1042
Full-text size: 731KB
Author affiliations:

1. Department of Software Engineering, Rochester Institute of Technology, Rochester, NY, USA
2. Graduate School of Information Science, Nara Institute of Science and Technology, Takayama, Ikoma, Nara, Japan
3. Graduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Nishi-ku, Fukuoka, Japan
4. Software Analysis and Intelligence Lab (SAIL), Queen’s University, Kingston, ON, Canada
5. Lab on Maintenance, Construction and Intelligence of Software (MCIS), École Polytechnique de Montréal, Montréal, QC, Canada

Abstract

Bug fixing accounts for a large share of software maintenance resources. Generally, bugs are reported, fixed, verified and closed. However, in some cases bugs have to be re-opened. Re-opened bugs increase maintenance costs, degrade the overall user-perceived quality of the software and lead to unnecessary rework by busy practitioners. In this paper, we study and predict re-opened bugs through a case study on three large open source projects, namely Eclipse, Apache and OpenOffice. We structure our study along four dimensions: (1) the work habits dimension (e.g., the weekday on which the bug was initially closed), (2) the bug report dimension (e.g., the component in which the bug was found), (3) the bug fix dimension (e.g., the amount of time it took to perform the initial fix) and (4) the team dimension (e.g., the experience of the bug fixer). Using these factors, we build decision trees that aim to predict re-opened bugs. We perform top node analysis to determine which factors are the most important indicators of whether or not a bug will be re-opened. Our study shows that the comment text and the last status of the bug when it is initially closed are the most important factors related to whether or not a bug will be re-opened. Using a combination of these dimensions, we can build explainable prediction models that achieve a precision between 52.1 and 78.6 % and a recall between 70.5 and 94.1 % when predicting whether a bug will be re-opened. We find that the factors that best indicate which bugs might be re-opened vary by project: the comment text is the most important factor for the Eclipse and OpenOffice projects, while the last status is the most important one for Apache. These factors should be closely examined in order to reduce the maintenance cost due to re-opened bugs.
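Top node analysis ranks factors by how often they appear near the root of the decision trees, and the models are evaluated by precision and recall. The Python sketch below illustrates one way to tally top nodes and compute both metrics, assuming the trees have already been built (for example, one tree per cross-validation fold) and that only the top two levels are inspected. The `Node` class, the helper functions and the factor names (`comment_text`, `last_status`, `fix_time`) are hypothetical illustrations, not the authors' implementation or data.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical decision-tree node: a split on `factor`, or a leaf if factor is None."""
    factor: Optional[str] = None
    children: list[Node] = field(default_factory=list)


def top_node_counts(trees: list[Node], depth: int = 2) -> list[Counter]:
    """Count how often each factor is split on at each of the top `depth` levels.

    counts[0] tallies root splits, counts[1] tallies splits directly below the root.
    Leaves are skipped because they carry no factor.
    """
    counts = [Counter() for _ in range(depth)]
    for tree in trees:
        frontier = [tree]
        for level in range(depth):
            next_frontier = []
            for node in frontier:
                if node.factor is not None:
                    counts[level][node.factor] += 1
                    next_frontier.extend(node.children)
            frontier = next_frontier
    return counts


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN), with 're-opened' as the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Two toy trees with made-up factor names (not results from the study).
tree_a = Node("comment_text", [Node("last_status", [Node(), Node()]), Node()])
tree_b = Node("last_status", [Node("comment_text", [Node(), Node()]), Node("fix_time", [Node(), Node()])])

for level, counter in enumerate(top_node_counts([tree_a, tree_b])):
    print(f"level {level}: {counter.most_common()}")
print(precision_recall(tp=3, fp=1, fn=1))
```

Keeping a separate tally per level distinguishes a factor that consistently makes the first split from one that only refines the data further down; factors can then be ranked by their root-level count, with the next level as a tie-breaker.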
