User Recommendation Algorithm Based on Multi-developer Community (基于多开发者社区的用户推荐算法)
Details
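  • English title: User Recommendation Algorithm Based on Multi-developer Community
  • Authors: SHI Yu-Cen (时宇岑); YIN Ying (印莹); ZHAO Yu-Hai (赵宇海); ZHANG Bin (张斌); WANG Guo-Ren (王国仁)
  • Authors (as published in English): SHI Yu-Cen; YIN Ying; ZHAO Yu-Hai; ZHANG Bin; WANG Guo-Ren; School of Computer Science and Engineering, Northeastern University; School of Computer Science and Technology, Beijing Institute of Technology
  • Keywords: multi-developer community; random walk with restart; Taxonomy; collaborative filtering; recommender system
  • Keywords (as published in English): multi-developer community; restart random walk; Taxonomy; collaborative filtering; recommendation system
  • Journal (Chinese abbreviation): RJXB
  • Journal (English): Journal of Software
  • Institutions: School of Computer Science and Engineering, Northeastern University; School of Computer Science and Technology, Beijing Institute of Technology
  • Publication date: 2019-05-15
  • Publisher: Journal of Software (软件学报)
  • Year: 2019
  • Volume: 30
  • Funding: National Key Research and Development Program of China (2018YFB1004402); National Natural Science Foundation of China (61772124, 61872072)
  • Language: Chinese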
  • Record ID: RJXB201905025
  • Page count: 14
  • Issue: 05
  • CN: 11-2560/TP
  • Pages: 363-376
Abstract
With the rapid development of Internet technology, question-and-answer exchanges in developer communities have become one of the main ways developers solve problems encountered during software development and maintenance. Recommending answerers to askers promptly and accurately is therefore a problem of real practical importance. By collecting and analyzing data from two representative mainstream developer communities, Stack Overflow and GitHub, three phenomena are observed that affect the accuracy and timeliness of such recommendations: (1) user-defined tags: a user's tags are chosen subjectively by the user rather than assigned objectively by the system from the user's historical behavior; (2) asymmetric activity: a user may be active in one or several communities but less active, or entirely inactive, in others; (3) closed keyword sets: answerers are recommended using only the keywords that appear in the question text, without considering other semantically related keywords. To address these problems, user information from the developer communities is merged, interactions between users are analyzed to build a cross-community developer network, and an algorithm based on random walk with restart is proposed to update user tags. Furthermore, a Taxonomy is used to expand the set of query keywords for a question, and on this basis the user matrix is used collaboratively to make more accurate recommendations and to enlarge the pool of users who can effectively be recommended. The experimental data comprise 1.7 million valid topics, 400,000 users in total, and 117 tags. The experiments confirm that the proposed algorithm performs well on the F-measure and NDCG metrics. For unpopular tags in particular, recommendation accuracy measured by NDCG improves by at least a factor of 2, and in some cases by as much as a factor of 4, compared with a recommendation algorithm that does not use the proposed method.
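The abstract names random walk with restart as the basis for updating user tags but does not give the procedure. The following is a minimal sketch of the general technique only, not the paper's algorithm. The toy graph, the user names, the restart probability of 0.15, and the rule of scoring a tag by summing the visiting probabilities of the users who carry it are all assumptions made for illustration. Every neighbor is assumed to appear as a key in the adjacency dictionary.

```python
def random_walk_with_restart(adj, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities for a walker that starts at `seed`
    and jumps back to `seed` with probability `restart` at every step."""
    nodes = list(adj)
    # Total outgoing weight per node, used to normalise edge weights.
    out_total = {u: sum(adj[u].values()) for u in nodes}
    p = {u: 1.0 if u == seed else 0.0 for u in nodes}
    for _ in range(max_iter):
        # Restart mass always lands on the seed.
        nxt = {u: (restart if u == seed else 0.0) for u in nodes}
        for u in nodes:
            if out_total[u] == 0:
                # Dangling node: return its mass to the seed.
                nxt[seed] += (1 - restart) * p[u]
                continue
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * p[u] * w / out_total[u]
        delta = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def propagate_tags(adj, user_tags, seed, restart=0.15):
    """Rank candidate tags for `seed` by the proximity of the users holding them.
    The aggregation rule (sum of proximities) is an illustrative assumption."""
    scores = random_walk_with_restart(adj, seed, restart)
    tag_scores = {}
    for user, score in scores.items():
        if user == seed:
            continue
        for tag in user_tags.get(user, ()):
            tag_scores[tag] = tag_scores.get(tag, 0.0) + score
    return sorted(tag_scores.items(), key=lambda kv: -kv[1])


# Hypothetical interaction graph: edge weights count interactions between users.
graph = {
    "alice": {"bob": 3, "carol": 1},
    "bob": {"alice": 3},
    "carol": {"alice": 1},
}
# Hypothetical self-declared tags.
tags = {"bob": {"python"}, "carol": {"java"}}

scores = random_walk_with_restart(graph, "alice")
ranked = propagate_tags(graph, tags, "alice")
print(ranked)
```

In this toy graph alice interacts with bob more often than with carol, so the walk visits bob more often and bob's tag is ranked first for alice. A higher restart probability keeps the walk closer to the seed user, which limits how far tags propagate across the network.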
