用于软件故障定位的差异比较方法及其改进

英文题名：Fault Localization Method Based on Difference Comparison and Its Improvement
作者：洪丽娜
论文级别：硕士
学科专业名称：计算机科学与技术
中文关键词：自动调试 ; 故障定位 ; 统计调试 ; 聚类 ; 运行约减
英文关键词：Automated Debugging ; Fault Localization ; Statiscal Debugging ; Clustering ; Reduced Runs
学位年度：2010
导师：陈荣
学科代码：081202
学位授予单位：大连海事大学
论文提交日期：2010-05-01

摘要

软件故障定位目的是快速准确定位软件中出现的错误,它是一项复杂又耗时的过程。基于测试的软件故障定位技术(TBFL)利用测试覆盖信息对程序错误(亦称故障)进行定位。TBFL可以分为基于差异度量的技术和基于特征统计的技术：基于差异度量的方法计算与失效运行与所有成功运行的最小差异,但是程序实际故障位置很可能不出现在最小差异中,并且这种方法都没有对可疑代码按照可疑程度进行排序；基于特征统计的方法是通过对动态程序行为的统计信息进行分析来定位故障,这种方法对代码进行了可疑性排序,但排序往往受制于测试用例的质量。对TBFL的研究工作需要大量的失效运行和成功运行,而这些所需数据不可避免地存在大量的运行路径冗余,因此导致定位效率和准确性的降低。
针对以上问题,我们首先提出了一种基于控制流对齐的运行约减算法,然后提出了一种基于运行约减的差异统计软件故障定位方法。为此,我们首先对运行约减,把运行分成若干类,消除运行冗余；再计算每一类成功运行之间及每一类失效运行之间差异,最后对得到的差异进行统计分析,得出故障报告。我们不但对可疑代码进行了可疑性排序,并对一些非可疑代码进行了非可疑性排序。实验表明,我们的算法对分支错误的定位效果明显优于其他算法,对其他类型错误的定位也起到了较好的效果,大大提高了定位效率和准确性,使得用户能快速准确找到错误位置。
Software fault localization with the goal to locate fault in program quickly and correctly is a complex and time-consuming process. Testing-based fault localization (TBFL) localizes program errors (or faults) by using testing coverage information. TBFL can be divided into distance measures-based and characteristic statistics-based methods. Distance measures-based method calculates the minimal difference between failing run and successful runs for fault location, while the position of the fault may not be in the minimal difference but in other differences, also this method doesn not rank suspicious codes in terms of their likelihood of being faulty. In contrast, feature-based statistical method locates faults through analyzing statistical information of dynamic program behaviors, which ranks the suspicious codes but the ranking metrics are sometimes subject to the quality of test cases. Most of the research works in TBFL require a large amount of failing runs and successful runs. Those required execution data inevitably contain a large number of redundant execution paths, and thus leads to a lower efficiency and accuracy of locating.
To tackle these problems, we propose a runs reduction algorithm on the basis of control flow alignment, then propose an improved fault localization method based on statistical differences between reduced runs. To do so, we first cluster all successful runs and failing runs respectively in order to eliminate run redundancy, then calculate the differences between each class of successful runs and each class of failing runs, and finally return bug reports through statistical analysis of these differences. We not only rank suspicious codes by their likelihood of being faulty, but also rank some unsuspicious codes according to their possibilities of not containing the bug. The experimental results show that our algorithm performs significantly better than competitors when there are branch errors in programs, and also makes a good effect on other error types. Our algorithm greatly improved the efficiency and accuracy of fault localization, and can help the user find errors quickly and correctly.

引文

[1]http://www.nist.gov/public affairs/releases/n02-10. htm.
    [2]James S. Collofello and Scott N. Woodfield. Evaluating the effectiveness of reliability-assurance techniques. J. Syst. Softw.,9(3):191-195,1989.
    [3]Iris Vessey. Expertise in debugging computer programs:A process analysis. International Journal of Man-Machine Studies,23(5):459 - 494,1985.
    [4]Franz Wotawa. Debugging vhdl designs using model-based reasoning. AI in Engineering,14(4):331-351,2000.
    [5]A. Groce. Error explanation with distance metrics. In TACAS, volume 2988 of Lecture Notes in Computer Science. Springer,2004.
    [6]http://compilers.cs.ucla.edu/jtb/jtb-2003/download.html.
    [7]Rui Abreu, Peter Zoeteweij, Rob Golstei jn, and Arjan J. C. van Gemund. A practical evaluation of spectrum-based fault localization. J. Syst. Sof tw.,82(11):1780-1792, 2009.
    [8]Hiralal Agrawal, Richard A. Demillo, and Eugene H. Spafford. An execution backtracking approach to program debugging. IEEE Software,8:283-299,1991.
    [9]Hiralal Agrawal and Joseph R. Horgan. Dynamic program slicing. In PLDI'90: Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation, pages 246-256, New York, NY, USA,1990. ACM.
    [10]Hiralal Agrawal, Joseph R. Horgan, Saul London, and W. Eric Wong. Fault localization using execution slices and dataflow tests,1995.
    [11]T. Ball, M. Naik, and S. K. Rajamani. From symptom to cause:localizing errors in counterexample traces. In Proc. of POPL, pages 97-105. ACM Press,2003.
    [12]R. M. Balzer. Exdams:extendable debugging and monitoring system. In AFIPS '69 (Spring):Proceedings of the May 14-16,1969, spring joint computer conference, pages 567-580, New York, NY, USA,1969. ACM.
    [13]Yang Hong, Hong Lina, Chen Rong, and Liu Yaqing. A method of detecting program plagiarism based on variable dependence. The International Conference on Computational Intelligence and Software Engineering,2009.
    [14]Armin Biere. Bounded model checking. In Handbook of Satisfiability, pages 457-481.2009.
    [15]Armin Biere, Alessandro Cimatti, Edmund M. Clarke, Ofer Strichman, and Yunshan Zhu. Bounded model checking. Advances in Computers,58:118-149,2003.
    [16]James F. Bowring, James M. Rehg, and Mary Jean Harrold. Active learning for automatic classification of software,2004.
    [17]Holger Cleve and Andreas Zeller. Locating causes of program failures. In 27th International Conference on Software Engineering (ICSE 2005),15-21 May 2005, St. Louis, Missouri, USA, pages 342-351. ACM Press,2005.
    [18]http://docs.sun.com/app/docs/doc/819-5257/.
    [19]Brian Demsky and Martin Rinard. Automatic detection and repair of errors in data structures. ACM SIGPLAN Notices,38(11):78-95,2003.
    [20]William Dickinson, David Leon, and Andy Podgurski. Finding failures by cluster analysis of execution profiles. In ICSE'01:Proceedings of the 23rd International Conference on Software Engineering, pages 339-348, Washington, DC, USA,2001. IEEE Computer Society.
    [21]Jeanne Ferrante, Karl J. Ottenstein, and Joe D. Warren. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems,9:319-349,1987.
    [22]http://www.gnu.org/software/gdb/.
    [23]Liang Guo, Abhik Roychoudhury, and Tao Wang. Accurately choosing execution runs for software fault localization. In Alan Mycroft and Andreas Zeller, editors, 15th International Conference on Compiler Construction, volume 3923 of Lecture Notes in Computer Science. Springer,2006.
    [24]Dan Hao, Lu Zhang, Ying Pan, Hong Mei, and Jiasu Sun. On similarity-awareness in testing-based fault localization. Automated Software Engg.,15(2):207-249,2008.
    [25]Monica Hutchins, Herb Foster, Tarak Goradia, and Thomas Ostrand. Experiments of the effectiveness of dataflow- and controlflow-based test adequacy criteria. In ICSE'94:Proceedings of the 16th international conference on Software engineering, pages 191-200, Los Alamitos, CA, USA,1994. IEEE Computer Society Press.
    [26]Dennis Jeffrey, Neelam Gupta, and Rajiv Gupta. Fault localization using value replacement. In International Symposium on Software Testing and Analysis, pages 167-178,2008.
    [27]James A. Jones and Mary Jean Harrold. Empirical evaluation of the tarantula automatic fault-localization technique. In Proceedings of theIEEE/ACM International Conference on Automated Software Engineering (ASE 2005), pages 273-282, Long Beach, CA, November 2005.
    [28]James A. Jones, Mary Jean Harrold, and John Stasko. Visualization of test information to assist fault localization. In ICSE'02:Proceedings of the 24th International Conference on Software Engineering, pages 467-477, New York, NY, USA, 2002. ACM.
    [29]B. Korel and J. Laski. Algorithmic software fault localization. In Annual Hawaii International Conference on System Sciences, pages 246-252,1991.
    [30]Bogdan Korel. Dynamic program slicing. In Information Processing Letters, 29(Oct,1988.
    [31]Ben Liblit, Alex Aiken, Alice X. Zheng, and Michael I. Jordan. Bug isolation via remote program sampling. In In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, pages 141-154. ACM Press,2003.
    [32]Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan. Scalable statistical bug isolation. In PLDI'05:Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, pages 15-26, New York, NY, USA,2005. ACM.
    [33]Chao Liu. Failure proximity:A fault localization-based approach. In In Proceedings of the International Symposium on the Foundations of Software Engineering, pages 286-295,2006.
    [34]Chao Liu, Xifeng Yan, Long Fei, Jiawei Han, and Samuel P. Midkiff. Sober: statistical model-based bug localization. SIGSOFT Softw. Eng. Notes,30(5):286-295, 2005.
    [35]F. J. Lukey. Understanding and debugging programs. International Journal of Man-Machine Studies,12(2):189 - 202,1980.
    [36]W. Mayer, M. Stumptner, D. Wieland, and F. Wotawa. Can ai help to improve debugging substantially? debugging experiences with value-based models, pages 417-421. IOS Press,2002.
    [37]Andy Podgurski, David Leon, Patrick Francis, Wes Masri, and Melinda Minch. Automated support for classifying software failure reports. In In ICSE, pages 465-475. IEEE Computer Society,2003.
    [38]Andy Podgurski, Wassim Masri, Yolanda McCleese, Francis G. Wolff, and Charles Yang. Estimation of software reliability by stratified sampling,1999.
    [39]Andy Podgurski and Charles Yang. Partition testing, stratified sampling, and cluster analysis. In SIGSOFT'93:Proceedings of the 1st ACM SIGSOFT symposium on Foundations of software engineering, pages 169-181, New York, NY, USA,1993. ACM.
    [40]Brock Pytlik, Manos Renieris, Shriram Krishnamurthi, and Steven P. Reiss. Automated fault localization using potential invariants,2003.
    [41]Sandra Rapps and Elaine J. Weyuker. Data flow analysis techniques for test data selection. In ICSE'82:Proceedings of the 6th international conference on Software engineering, pages 272-278, Los Alamitos, CA, USA,1982. IEEE Computer Society Press.
    [42]Sandra Rapps and Elaine J. Weyuker. Selecting software test data using data flow information. IEEE Trans. Softw. Eng.,11(4):367-375,1985.
    [43]Steven P. Reiss and Manos Renieris. Encoding program executions, pages 221-230. IEEE Computer Society,2001.
    [44]Manos Renieris and Steven P. Reiss. Fault localization with nearest neighbor queries. In ASE, pages 30-39. IEEE Computer Society,2003.
    [45]Thomas Reps, Thomas Ball, Manuvir Das, and James Larus. The use of program profiling for software maintenance with applications to the year 2000 problem. In ACM Software Engineering Notes, pages 432-449. Springer-Verlag,1997.
    [46]Willem Visser Riacs, Alex Groce, Alex Groce, Willem Visser, and Willem Visser. What went wrong:Explaining counterexamples. In In SPIN Workshop on Model Checking of Software, pages 121-135. Springer-Verlag,2002.
    [47]Raul Santelices, James A. Jones, Yanbing Yu, and Mary Jean Harrold. Lightweight fault-localization using multiple coverage types. In ICSE'09: Proceedings of the 31st International Conference on Software Engineering, pages 56-66, Washington, DC, USA,2009. IEEE Computer Society.
    [48]Robert L. Sedlmeyer, William B. Thompson, and Paul E. Johnson. Knowledge-based fault localization in debugging (preliminary draft). In SIGSOFT'83: Proceedings of the ACM SIGSOFT/SIGPLAN software engineering symposium on High-level debugging, pages 25-31, New York, NY, USA,1983. ACM.
    [49]Ehud Y. Shapiro. Algorithmic Program DeBugging. MIT Press, Cambridge, MA, USA,1983.
    [50]Physical System Simulation, Peter Bunus, and Peter Fritzson. Semi-automatic fault localization and behavior verification for. In In Proceedings of the 18 th IEEE International Conference on Automated Software Engineering, pages 253-258,2003.
    [51]Abdallah Tubaishat. A knowledge base for program debugging. In AICCSA'01: Proceedings of the ACS/IEEE International Conference on Computer Systems and Applications, page 321, Washington, DC, USA,2001. IEEE Computer Society.
    [52]https://javacc. dev. Java. net/.
    [53]Tao Wang and Abhik Roychoudhury. Automated path generation for software fault localization. In ASE'05:Proceedings of the 20th IEEE/ACM international Conference on Automated software engineering, pages 347-351, New York, NY, USA,2005. ACM.
    [54]Mark Weiser. Programmers use slices when debugging. Commun. ACM, 25(7):446-452,1982.
    [55]Mark Weiser. Program slicing. IEEE Trans. Software Eng.,10(4):352-357, 1984. [4] http://www.gnu.org/software/ddd/.
    [56]Andreas Zeller. Isolating cause-effect chains from computer programs,2002.
    [58]Xiangyu Zhang, Neelam Gupta, and Rajiv Gupta. Locating faults through automated predicate switching. In ICSE'06:Proceedings of the 28th international conference on Software engineering, pages 272-281, New York, NY, USA,2006. ACM.
    [59]Rong Chen, Lina Hong, Chunyan lu, and WuDeng. Author identification of software source code with program dependence graphs. The 2nd IEEE International Workshop on Computer Forensics in Software Engineering,2010.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700