Approach to Searching Software Source Code with Graph Embedding
  • Original title (Chinese): 基于图嵌入的软件项目源代码检索方法
  • Authors: LING Chun-Yang (凌春阳); ZOU Yan-Zhen (邹艳珍); LIN Ze-Qi (林泽琦); XIE Bing (谢冰); ZHAO Jun-Feng (赵俊峰)
  • Affiliations: Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education; School of Electronics Engineering and Computer Science, Peking University; Peking University Information Technology Institute (Tianjin Binhai)
  • Keywords: API search; code search; code graph; graph embedding
  • Journal (Chinese abbreviation): RJXB
  • Journal (English name): Journal of Software
  • Publication date: 2019-05-15
  • Publisher: Journal of Software (软件学报)
  • Year: 2019
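  • Issue: v.30
  • Funding: National Key Research and Development Program of China (2016YFB1000801); National Science Fund for Distinguished Young Scholars (61525201)
  • Language: Chinese
  • Pages: RJXB201905019
  • Page count: 17
  • CN: 05
  • ISSN: 11-2560/TP
  • Classification number: 283-299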
Abstract
Source code search is an important research problem in software engineering; its main task is to retrieve and reuse the APIs (application program interfaces) of software projects. As software projects grow larger and more complex, source code search faces two challenges. First, API search driven by natural language queries needs to become more accurate. Second, the relationships between a target API and its related code need to be located and presented, so that users can understand the API's implementation logic and usage scenarios more quickly. To address these challenges, this study proposes an approach to searching a software project's source code based on graph embedding. The approach automatically builds a code structure graph from the project's source code and represents the source code information through graph embedding. Given a natural language question, it returns a connected code subgraph composed of the relevant APIs and their associated information, which improves the efficiency of API search and reuse. In search experiments on the open source projects Apache Lucene and POI, the approach improves the F1-score by 10% over an existing shortest-path-based approach and significantly reduces the average response time.
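The pipeline summarized in the abstract (code graph, embedding, natural language query, connected subgraph) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the graph edges, the hand-written 2-D vectors standing in for learned embeddings, the substring matching of query words, and every function name below are assumptions made only for illustration.

```python
import math
from collections import deque
from itertools import product

# Hypothetical code graph: undirected structural relations between API elements.
EDGES = [
    ("IndexWriter", "Document"),
    ("IndexWriter", "Directory"),
    ("Document", "Field"),
    ("IndexSearcher", "Directory"),
    ("IndexSearcher", "Query"),
    ("QueryParser", "Query"),
]

# Hand-written 2-D vectors standing in for learned graph embeddings (assumption).
EMBEDDING = {
    "IndexWriter": (0.0, 0.0),
    "Document": (1.0, 0.0),
    "Field": (2.0, 0.0),
    "Directory": (0.0, 1.0),
    "IndexSearcher": (0.0, 2.0),
    "Query": (1.0, 2.0),
    "QueryParser": (2.0, 2.0),
}


def build_adjacency(edges):
    """Turn the edge list into an adjacency map."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def candidates(word, nodes):
    """Nodes whose name contains the query word (case-insensitive)."""
    return sorted(n for n in nodes if word.lower() in n.lower())


def spread(nodes):
    """Sum of pairwise Euclidean distances between the nodes' embeddings."""
    nodes = list(nodes)
    return sum(
        math.dist(EMBEDDING[a], EMBEDDING[b])
        for i, a in enumerate(nodes)
        for b in nodes[i + 1:]
    )


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the node list from start to goal."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adjacency[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return []


def search(query_words):
    """Return a connected set of nodes covering one match per query word."""
    adjacency = build_adjacency(EDGES)
    options = [candidates(w, adjacency) for w in query_words]
    if any(not o for o in options):
        return set()
    # Pick one candidate per word so the picks lie close together in embedding space.
    best = min(product(*options), key=spread)
    # Link the picks to the first one so the result is a single connected subgraph.
    subgraph = {best[0]}
    for node in best[1:]:
        subgraph.update(shortest_path(adjacency, best[0], node))
    return subgraph
```

In this sketch, embedding distance is only used to choose among ambiguous matches (for example, "index" matches both `IndexWriter` and `IndexSearcher`), and graph traversal is run once for the chosen nodes rather than for every candidate combination. That is one plausible reading of why an embedding-based approach could respond faster than a purely shortest-path-based one, not a description of the published method.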
