Automatic identifier inconsistency detection using code dictionary
  • Authors: Suntae Kim ; Dongsun Kim
  • Keywords: Inconsistent identifiers ; Code dictionary ; Source code ; Refactoring ; Code readability ; Part-of-speech analysis
  • Journal: Empirical Software Engineering
  • Publication year: 2016
  • Publication date: April 2016
  • Volume: 21
  • Issue: 2
  • Pages: 565-604
  • Full-text size: 2,381 KB
  • Author affiliations: Suntae Kim (1)
    Dongsun Kim (2)

    1. Department of Software Engineering, Chonbuk National University, 567 Baekje-daero, Deokjin-gu, Jeollabuk-do, 561-756, Jeonju-si, Republic of Korea
    2. Computer Science and Communications Research Unit, Faculty of Science, Technology and Communication, and Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg, 4 rue Alphonse Weicker, L-2721, Luxembourg-Ville, Luxembourg
  • Journal category: Computer Science
  • Journal subjects: Software Engineering, Programming and Operating Systems
    Programming Languages, Compilers and Interpreters
  • Publisher: Springer Netherlands
  • ISSN: 1573-7616
Abstract
Inconsistent identifiers make it difficult for developers to understand source code. Large software systems written by several developers are particularly vulnerable to identifier inconsistency. Unfortunately, it is not easy to detect inconsistent identifiers that are already used in source code. Although several techniques have been proposed to address this issue, many of them produce false alarms because they do not accept domain words and idiom identifiers that are widely used in programming practice. This paper proposes an approach to detecting inconsistent identifiers based on a custom code dictionary. It first automatically builds a Code Dictionary from the existing API documents of popular Java projects by using a Natural Language Processing (NLP) parser. This dictionary records domain words with their dominant part-of-speech (POS) and idiom identifiers. This set of domain words and idioms can improve detection accuracy by reducing false alarms. The approach then takes a target program and detects its inconsistent identifiers by leveraging the Code Dictionary. We provide CodeAmigo, a GUI-based tool that supports our approach. We evaluated our approach on seven Java-based open-source and proprietary projects. The results show that the approach can detect inconsistent identifiers with 85.4% precision and 83.59% recall. In addition, we interviewed developers who used our approach, and the interviews confirmed that inconsistent identifiers frequently and inevitably occur in most software projects. The interviewees also stated that our approach can help detect inconsistent identifiers that would have been missed through manual detection.
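
The two-phase idea in the abstract (build a dictionary of words with their dominant POS, then check a target program against it) can be illustrated with a minimal sketch. This is not the paper's algorithm or the CodeAmigo implementation: the function names, the tag labels, and the hand-written POS tags standing in for the NLP parser's output are all assumptions made for illustration, and idiom handling is left out.

```python
import re
from collections import Counter, defaultdict


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def build_code_dictionary(tagged_words):
    """Map each word to its most frequent POS tag.

    `tagged_words` is an iterable of (word, pos) pairs. In the paper these
    would come from parsing API documents; here they are assumed inputs.
    """
    counts = defaultdict(Counter)
    for word, pos in tagged_words:
        counts[word.lower()][pos] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def find_pos_inconsistencies(identifiers, dictionary):
    """Report words whose POS in an identifier differs from the dominant POS.

    `identifiers` maps an identifier to the POS tags of its words, in order
    (hypothetical tagger output). Words missing from the dictionary are skipped.
    """
    findings = []
    for name, tags in identifiers.items():
        for word, used_pos in zip(split_identifier(name), tags):
            dominant = dictionary.get(word)
            if dominant is not None and dominant != used_pos:
                findings.append((name, word, used_pos, dominant))
    return findings


# Hypothetical observations: "abort" is mostly a verb, "size" a noun.
observations = [
    ("abort", "verb"), ("abort", "verb"), ("abort", "noun"),
    ("size", "noun"),
]
dictionary = build_code_dictionary(observations)

# Hypothetical target identifiers with assumed per-word POS tags.
target = {
    "abortTask": ["verb", "noun"],
    "lastAbort": ["adjective", "noun"],
}
findings = find_pos_inconsistencies(target, dictionary)
```

In this sketch `lastAbort` is flagged because it uses "abort" as a noun while the dictionary records it as predominantly a verb. A real detector would need an actual POS tagger and the domain-word and idiom filtering that the abstract credits with reducing false alarms.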
