Mining Software Repositories: Contributors and Hot Topics
  • English title: Mining Software Repositories: Contributors and Hot Topics
  • Authors: Jiang He; Chen Xin; Zhang Jingxuan; Han Xuejiao; Xu Xiujuan
  • Keywords: publication analysis; collaboration pattern analysis; data mining; mining software repositories; big data
  • Journal: Journal of Computer Research and Development (journal code JFYZ)
  • Institution: School of Software, Dalian University of Technology
  • Publication date: 2016-12-15
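  • Year: 2016
  • Volume: 53
  • Issue: 12
  • Funding: National Natural Science Foundation of China (61370144); Program for New Century Excellent Talents in University, Ministry of Education (NCET-13-0073)
  • Language: Chinese
  • Article ID: JFYZ201612008
  • Pages: 105-119 (15 pages)
  • CN: 11-1777/TP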
Abstract
As software is continuously updated and evolves over time, software repositories accumulate massive amounts of data, and how to effectively collect, organize, and make use of this software big data has become a key problem in software engineering. Mining software repositories (MSR) aims to improve software quality and productivity by mining the knowledge contained in the complex and diverse data stored in these repositories. Although some studies have described the background, history, and prospects of MSR in detail, existing work does not systematically present domain knowledge such as the most influential authors, institutions, and countries in MSR, or the most popular research topics and how they have shifted over time. This study therefore combines existing classical publication analysis frameworks and algorithms to analyze MSR-related publications and presents some basic domain knowledge of the field. To carry out the analysis, we build the MSR Publication Analysis Framework (MSR-PAF), which consists of three components used, respectively, to create the dataset, perform a basic bibliographic analysis, and perform a collaboration pattern analysis. The bibliographic analysis shows that the most productive author, institution, and country/region are Ahmed E. Hassan, the University of Victoria, and the USA, respectively; the most influential author is Ahmed E. Hassan; and the most frequent keyword is software maintenance. The collaboration pattern analysis shows that Abram Hindle is the most active author and that open source project and software maintenance are the most popular research topics.
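The abstract names the quantities MSR-PAF reports (most productive author, institution, and country; most influential author; most frequent keyword; most active collaborator) but not how each is computed. The Python sketch below shows one common way to compute them, under stated assumptions: productivity as full-count publication tallies, influence as the h-index (Hirsch's definition), keyword popularity as the number of papers using a keyword after case normalization, and collaboration activity as the number of distinct co-authors in an undirected co-authorship graph. These metric choices, the `Paper` record, and all function names are hypothetical illustrations, not the framework's actual implementation; fractional author counting or other centrality measures would be equally plausible.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Paper:
    # Hypothetical record layout; the abstract does not describe MSR-PAF's schema.
    authors: list[str]
    institutions: list[str]
    countries: list[str]
    keywords: list[str]
    citations: int = 0


def productivity(papers, attr):
    """Papers per author/institution/country, using full counting:
    every distinct entity listed on a paper receives one credit."""
    counts = Counter()
    for p in papers:
        counts.update(set(getattr(p, attr)))
    return counts


def h_index(citation_counts):
    """Largest h such that h of the papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citation_counts, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h


def author_h_indices(papers):
    """Group citation counts by author, then compute each author's h-index."""
    per_author = {}
    for p in papers:
        for a in set(p.authors):
            per_author.setdefault(a, []).append(p.citations)
    return {a: h_index(cites) for a, cites in per_author.items()}


def keyword_frequency(papers):
    """Count papers per keyword after case/whitespace normalization."""
    return Counter(
        kw for p in papers for kw in {k.strip().lower() for k in p.keywords}
    )


def coauthor_degree(papers):
    """Distinct co-authors per author in an undirected co-authorship graph."""
    neighbors = {}
    for p in papers:
        authors = set(p.authors)
        for a in authors:
            neighbors.setdefault(a, set())
        for a, b in combinations(authors, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {a: len(n) for a, n in neighbors.items()}


# Toy records (made up for illustration, not data from the study).
papers = [
    Paper(["A", "B"], ["U1"], ["USA"], ["software maintenance"], citations=10),
    Paper(["A", "C"], ["U1", "U2"], ["USA", "Canada"],
          ["Software Maintenance", "open source project"], citations=4),
    Paper(["A", "B", "D"], ["U1"], ["USA"],
          ["software maintenance", "bug reports"], citations=1),
]

print(productivity(papers, "authors").most_common(1))  # [('A', 3)]
print(keyword_frequency(papers).most_common(1))        # [('software maintenance', 3)]
print(author_h_indices(papers)["A"])                   # 2
print(coauthor_degree(papers)["A"])                    # 3
```

Normalizing keywords before counting matters in practice: without it, "Software Maintenance" and "software maintenance" would be tallied as separate topics and split the frequency of what is really one research theme.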