Research on Software Testing Techniques Based on Data Mining


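English title: Software Testing Based on Data Mining Technology
Author: Gao Yaning (高亚宁)
Degree level: Master's
Discipline: Computer Technology
Chinese keywords: 软件测试; 数据挖掘; 数量化; 代码克隆; 程序依赖图; 遗传算法; 数据生成
English keywords: software testing; data mining; quantitative; code cloning; program dependence graph; genetic algorithm; data generation
Degree year: 2011
Advisor: Sun Wenhui (孙文辉)
Discipline code: 081202
Degree-granting institution: Beijing Jiaotong University (北京交通大学)
Thesis submission date: 2011-06-17
Defense committee chair: Yang Wei (杨维)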

Abstract

With the rapid development of information technology, software systems are now widely used in economics, finance, healthcare, communications, transportation, aerospace, aviation, industrial control, and other fields. As a result, software reliability is receiving increasing attention.
     To improve software reliability, software engineering methods are used to guide the entire development process. During coding, large amounts of cloned code tend to accumulate. The clones differ only slightly and are scattered across different parts of the software, so they are hard to maintain consistently by hand and hard to detect during code review in the testing phase. This is one difficulty in software testing. When a program is tested with test cases, generating the test data by hand is laborious, inefficient, and error-prone, and it cannot guarantee test adequacy. How to generate test data is therefore another difficulty in software testing.
     Data mining is an interdisciplinary field that draws on databases, artificial intelligence, mathematical statistics, and machine learning. It can uncover hidden relationships in large volumes of data. This thesis studies data mining techniques together with the characteristics of software testing, and combines the two to address the difficulties described above.
     For code clones, the thesis proposes a new approach: a quantification-based method for finding cloned code. For structural testing, where writing test data by hand is extremely difficult and inefficient and where a large number of test cases is needed, the thesis uses a genetic algorithm to generate the test cases and to run the tests automatically.
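As a rough illustration of what genetic-algorithm test data generation can look like, the sketch below searches for an input that covers one target branch of a small program. It is not the thesis's implementation: the program under test (`classify_triangle`), the branch-distance fitness, the input domain, and every parameter (population size, mutation rate, tournament size) are assumptions chosen only to keep the example short.

```python
import random

LOW, HIGH = 1, 100  # assumed input domain for each side length


def classify_triangle(a, b, c):
    """Hypothetical program under test (not taken from the thesis)."""
    if a + b <= c or a + c <= b or b + c <= a:
        return "invalid"
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"


def branch_distance(sides):
    """Fitness: distance to the 'equilateral' branch; 0 means it is covered."""
    a, b, c = sides
    return abs(a - b) + abs(b - c)


def mutate(sides, rng, rate=0.3):
    """Nudge each gene by a small step with probability `rate`, clamped to the domain."""
    return tuple(
        min(HIGH, max(LOW, g + rng.randint(-3, 3))) if rng.random() < rate else g
        for g in sides
    )


def crossover(p1, p2, rng):
    """Single-point crossover on the three-gene chromosome."""
    cut = rng.randint(1, 2)
    return p1[:cut] + p2[cut:]


def tournament(pop, rng, k=3):
    """Pick the fittest of k randomly sampled individuals."""
    return min(rng.sample(pop, k), key=branch_distance)


def generate_test_input(seed=0, pop_size=60, generations=500):
    """Return an input covering the target branch, or None if the budget runs out."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(LOW, HIGH) for _ in range(3)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=branch_distance)
        if branch_distance(pop[0]) == 0:
            return pop[0]
        elite = pop[:2]  # elitism: carry the two best individuals over unchanged
        children = [
            mutate(crossover(tournament(pop, rng), tournament(pop, rng), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    best = min(pop, key=branch_distance)
    return best if branch_distance(best) == 0 else None


if __name__ == "__main__":
    sides = generate_test_input()
    print(sides, classify_triangle(*sides) if sides else "no input found")
```

Random sampling over this domain would hit the equilateral branch only about once in 10,000 tries, which is why a fitness function that rewards inputs for getting closer to the branch condition is useful here. A fuller generator would repeat the search for every uncovered branch and collect the resulting inputs into a test suite.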

