基于程序执行的错误定位方法

作者：田志雨
论文级别：硕士
学科专业名称：计算机应用技术
中文关键词：错误定位 ; 自动化调试 ; 程序覆盖信息 ; 软件测试
英文关键词：fault localization ; automated debugging ; program coverage information ; software test
学位年度：2012
导师：张宏
学科代码：081203
学位授予单位：南京理工大学
论文提交日期：2012-01-01

摘要

软件调试是软件开发和维护中最为耗时耗力的部分,而其中的错误定位是最为重要且最为困难的。传统的错误定位方法往往是采用手工定位的方法(比如借助于调试工具设置断点),但是这类方法缺点是可能出错的语句的搜索空间巨大,且往往耗费大量的精力和时间。因此目前在错误定位方面的主要研究内容是自动化错误定位,这类方法主要的手段是借助于设计良好的测试用例集,通过对测试结果和程序特征的自动化分析,计算出可能出错语句的集合。基于程序行为特征的自动化错误定位方法一般根据覆盖信息计算出程序语句的可疑度,然后按照可疑度由高到低的顺序逐条检查程序的可疑语句。
     为了提高定位错误的效率,本文对现有的基于程序行为特征的方法进行改进。
     首先,通过对目前的自动化错误定位方法的研究对比,给出了几种基于程序执行的错误定位方法：基于测试用例对语句可疑度的贡献随着测试用例数目的增加而降低(简记为贡献率的降低)的思想给出了方法Heuristic_1,基于程序执行补集的概念给出Heuristic_2和Heuristic_3,基于贡献率的降低的思想和Tarantula的思想给出了Heuristic_4。
     其次,给出了一个具有可视化界面的通用的自动化错误定位架构的实现流程,其中的可视化是为了方便调试人员查看可疑的语句。
     最后,为了验证方法的有效性,本文根据前面给出的流程实现了一个自动化的错误定位工具Visual_debug,采用测试数据集Siemens Suite作为研究对象,将本文提出的方法和现有的几种方法进行比较。
     实验结果表明,较之于同类思想的方法,Heuristic_1方法的效率略有提高,而Heuristic_2、Heuristic_3和Heuristic_4方法表现出60%左右程序只需要检查20%左右的代码。由此可见本文提出的方法在错误定位的效率方面有所提高,即只需要检查更少的语句就能够定位出程序中的错误。
It is estimated that debugging consumes most time of software development and maintenance. Fault localization is the most important and difficult in debugging. In traditional methods the programmers often set breakpoints and check if the run-time states are right, but the drawback is that the programmers have a very large set of statements to check, and it is time-consuming. So the automated fault localization is proposed, these methods use well designed test cases, and then analyze the test result and program features to get the statements that are suspicious. Always the methods based on program spectra make use of the coverage information to get the suspicious statements, and then the programmer check the statements in the descending order of statement's suspicious.
     In order to improve the efficiency of methods base on program spectra, we make improvement to the existing methods and propose some new methods.
     Firstly, by comparing with the existing automatic fault localization methods, we propose some new methods based on program spectra. We propose Heuristic_1 based on the thought that the contribution of one test case to a statement's suspicious should be decrease when the test case number increases. Thinking of the execution complement we propose Heuristic_2 and Heuristic_3. Based on the idea of Tarantula and contribution, we propose Heuristic_4.
     Secondly, we proposed a general architecture for automated fault localization, in which the visualization component was used to help the developers to check the suspicious statements.
     Finally, we implemented a debugging tool Visual_debug to evaluate the efficiency of the approaches proposed in this paper by comparing with other methods.
     Compared with the existing methods having the same idea, the experimental results show that Heuristic_1 is slightly better, other methods proposed in this paper only need to check 20 percent of the total executed statements in 60 percent of the Siemens suite. So the methods proposed in this paper can reduce the number of statements that need to be examined to find the faulty statement, and thus improve the effect of fault localization.

引文

[1]J. S. Collofello and S. N. Woodfield. Evaluating the effectiveness of reliability-assurance techniques[J]. Journal of Systems and Software,1989,9(3):191-195.
    [2]T. Ball and S. G. Eick. Software visualization in the large [J]. Computer, Apr 1996, 29(4):33-43.
    [3]I. Vessey. Expertise in debugging computer programs [J]. International Journal of Man-Machine Studies:A process analysis,1985,23(5):459-494.
    [4]Mark Weiser, Program Slicing [J]. In IEEE Transactions on Software3 Engineering,1984, Se-10(4):352-357.
    [5]H. Agrawal, J. Horgan, S. London and W. Wong. Fault Localization using execution slices and dataflow tests [J]. In International Symposium on Software Reliability Engineering (ISSRE'95),1995:143-151.
    [6]B. Korel and J. Laski, Dynamic Program Slicing [J]. In Information Processing Letters, 1988,29(3):155-163.
    [7]X. Zhang, R. Gupta and Y. Zhang, Precise Dynamic Slicing Algorithms [C]. In the 25th International Conference on Software Engineering (ICSE'03), May 2003:319-329.
    [8]A. Beszedes, T. Gergely, Z.M. Szabo, J. Csirik, and T. Gyimothy. Dynamic Slicing Method for Maintenance of Large C Programs [C]. In the 5th European Conference on Software Maintenance and Reengineering, Mach 2001:105-113.
    [9]X. Zhang, R. Gupta and Y. Zhang, Effective Forward Computation of Dynamic Slices Using Reduced Ordered Binary Decision Diagrams [C]. In the 26th International Conference on Software Engineering,2004:502-511.
    [10]T. Gyimothym, A. Beszedes, I. Forgacs. An Efficient Relevant Slicing Method for Debugging [C]. In 7th European Software Engineering Conference and 7th ACM SIGSOFT International Symposium on Foundations of Software Engineering,1999: 303-321.
    [11]J. Ferrante, K. J. Ottenstein, J. D. Warren, The program dependence graph and its use in optimization [J]. In ACM Transactions on Programming Languages and Systems, July 1987,9(3):319-349.
    [12]X. Zhang, H. He, N. Gupta and R. Gupta. Experimental evaluation of using dynamic slices for fault location[C]. In the 6th International Symposium on Automated and Analysis-Driven Debugging(AADEBUG'05),2005:33-42.
    [13]A. Zeller and R. Hildebrandt. Simplifying and Isolating Failure-inducing Input [J]. In IEEE Transaction on Software Engineering, Feb 2002,28(2):183-200.
    [14]A. Zeller. Isolating Cause-effect Chains from Computer Programs [J]. In the 10th ACM SIGSOFT Symposium on Foundations of Software Engineering (FSE-10),2002:1-10.
    [15]H. Cleve and A. Zeller. Locating Causes of Program Failures [C]. In the 27th International Conference on Software Engineering (ICSE'05),2005:342-351.
    [16]X. Zhang, N. Gupta, and T. Gupta. Locating Faults Through Automated Predicate Switching [C]. In the 28th International Conference on Software Engineering (ICSE'06), May 2006:272-281.
    [17]T. Reps, T.Ball, M.Das and J.Larus. The use of program profiling for software maintenance with applications to the Year 2000 Problem [C]. In the proceedings of 6th European Software Engineering Conference Held Jointly and the 5th ACM SIG-SOFT International Symposium on Foundations of Software Engineering,1997:432-449.
    [18]M.J. Harrold, G.Rothermel, K.Sayre, R.Wu and L.Yi. An empirical investigation of the relationship between spectra differences and regression faults [J]. In Software Testing Verification and Reliability,2000:171-194.
    [19]H. Agrawal, J. Horgan, S. London, and W. Wong. Fault localization using execution slices and data flow tests [J]. In International Symposium on Software Reliability Engineering (ISSRE'95),1995:143-151.
    [20]J. S. Colofello and L. Cousins. Towards automatic software fault location through decision to decision path analysis [C].In proceedings of the 1987 National Computer Conference,1987:539-544.
    [21]T. Y. Chen and Y. Y. Cheng. On program dicing [J]. In Software Maintenance:Research and Experience, Feb 1997,9(1):33-46.
    [22]H. Pan and E. H. Spafford. Heuristics for automatic localization of software faults [J]. Technical Report SERC-TR-116-P, Purdue University,1992.
    [23]M. Renieris and S.P.Reiss. Fault localization with nearest neighbor queries [C]. In the 18th International Conference on Automated Software Engineering,2003:30-39.
    [24]J. A. Jones, M. J. Harrold and J. Stasko. Visualization of Test Information to Assist Fault Localization [C]. In the 24th International Conference on Software Engineering (ISCE'02), May 2002:467-477.
    [25]J. A. Jones and M. J. Harrold. Empirical evaluation of the Tarantula automatic fault-localization technique [C]. In the 20th ACM international Conference on Automated Software Engineering (ASE'05),2005:273-282.
    [26]E. Wong, T. Wei, Y. Qi and L. Zhao. A Crosstab-based statistical method for effective fault localization [C]. In Proceedings of the 2008 International Conference on Software Testing, Verification and Validation (ICST'08),2008:42-51.
    [27]E. Wong, T. Wei, Y. Qi and L. Zhao. Effective fault location using code coverage [C]. In Proceedings of the 31th IEEE Computer Software and Applications Conference,2007: 449-456.
    [28]E. Wong, V. Debroy and B. Choi. A family of code coverage-based heuristics for effective fault location [J]. The Journal of Systems and Software,2010,13(10):188-208.
    [29]B. Liblit, M. Naik, A.X. Zheng, A. Aiken and M. I. Jordan. Scalable statistical bug isolation [C]. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'05),2005:15-26.
    [30]C. Liu, L. Fei, X. Yan, J. Han and S.P. Midkiff. Statistical debugging:A hypothesis testing-based approach [J]. IEEE Transactions on Software Engineering,2006, 32(10):831-848.
    [31]Miguel C, Manuel C, Tim H. Securing software by enforcing data-flow integrity. USENIX Symposium on Operating Systems Design and Implementation [J]. Berkeley, CA, USA:USENIX Association,2006:147-160.
    [32]林果园,郭山清,黄皓,曹天杰.基于动态行为和特征模式的异常检测模型[J].计算机学报,2006.9.
    [33]尾春光,陈华,张峰.基于数据流的脆弱性静态分析[J].计算机工程,2008.11：124-128.
    [34]汪小飞,赵克佳,田组伟.数据流分析的关键技术研究[J].计算机科学,2005,32(2)：91-93.
    [35]Martin A, Mihai B, Elfar E, et al. Control-flow integrity principles, implementations, and applications [J]. ACM Transactions on Information and System Security (TISSEC), 2009.13(1).
    [36]陆炜,曾庆凯.一种基于控制流的程序行为扩展模型[J].软件学报,2007(11)：2841-2849.
    [37]谭德贵,陈林等.通过增大边际权重提高基于频谱的错误定位效率[J].计算机学报.2010(12)
    [38]胡从兴,陈林等.结合语句执行补集的程序错误定位[J].计算机科学与探索.2011.5(6).
    [39]SIR [EB/OL]. http://sir.unl.edu/portal/index.php.
    [40]GCC [EB/OL]. http://gcc.gnu.org/onlinedocs/gcc
    [41]GPROF [EB/OL]. http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html
    [42]LCOV [EB/OL]. http://ltp.sourceforge.net/coverage/lcov.php

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700