Code-based IT Community Answer Quality Evaluation Model

Original title: 含代码的IT社区答案质量评价模型
Authors: XU Neng-chuang (许能闯); YUAN Jian (袁健); GAO Xi-long (高喜龙)
Affiliation: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology
Keywords: quality evaluation; community question answering; similarity; Stack Overflow
Journal: Journal of Chinese Computer Systems (小型微型计算机系统)
Journal code: XXWX
Publication date: 2019-01-15
Publisher: Journal of Chinese Computer Systems
Year: 2019
Volume: 40
Funding: Supported by the National Natural Science Foundation of China (61775139)
Language: Chinese
Record ID: XXWX201901030
Page count: 6
Issue: 01
CN: 21-1106/TP
Pages: 160-165

Abstract

The Stack Overflow question-and-answer community has become an important channel through which software developers solve development problems. However, its answers are diverse and the information is cluttered; the sheer volume of questions and answers makes it hard for developers to find a match for their own problem, so much time is spent searching for the best answer. To address this, a code-based IT community answer quality evaluation model is proposed. The model first collects all qualifying question-answer pairs that contain source code, analyzes the similarity between the source code in the question and the source code in the answer, and measures code quality at the same time. It then combines these with the evaluation of the answer expressed in user comments to compute a score, so that answer quality is quantified. Finally, it reorders the answers by score from high to low, so that answers with high-quality, highly relevant code snippets appear first and users can find high-quality answers easily. Experiments show that the model evaluates the quality of IT community answers quickly and effectively, and that the reordering is of practical value in helping developers locate the best answer quickly. The results indicate that the model is feasible.
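
The score-and-reorder step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual similarity measure, quality metric, or weighting, so everything here is an assumption made for illustration: the token-sequence similarity based on `difflib`, the precomputed `quality` and `comment_score` fields, the `Answer` class, and the weights `(0.4, 0.3, 0.3)` are all hypothetical.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher

# Identifiers, integer literals, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code):
    """Split source text into a flat list of tokens."""
    return TOKEN_RE.findall(code)


def code_similarity(question_code, answer_code):
    """Token-sequence similarity in [0, 1].

    Uses difflib's matching-blocks ratio as a stand-in; the paper's own
    similarity measure is not specified in the abstract.
    """
    matcher = SequenceMatcher(
        None, tokenize(question_code), tokenize(answer_code), autojunk=False
    )
    return matcher.ratio()


@dataclass
class Answer:
    answer_id: int
    code: str
    quality: float        # assumed precomputed code-quality score in [0, 1]
    comment_score: float  # assumed precomputed user-comment score in [0, 1]


def score(question_code, answer, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of similarity, code quality, and comment score.

    The weights are hypothetical, chosen only for illustration.
    """
    w_sim, w_quality, w_comment = weights
    return (
        w_sim * code_similarity(question_code, answer.code)
        + w_quality * answer.quality
        + w_comment * answer.comment_score
    )


def rerank(question_code, answers, weights=(0.4, 0.3, 0.3)):
    """Return the answers sorted by score, highest first."""
    return sorted(
        answers, key=lambda a: score(question_code, a, weights), reverse=True
    )
```

With equal quality and comment scores, the answer whose code is closer to the question's code ranks first; any real use would need to replace the stand-in components and calibrate the weights.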

References

[1] Lai She-an, Cai Zhong-min. Question answering quality evaluation for community question answering based on similarity[J]. Computer Applications and Software, 2013, 30(2): 266-269.
[2] Yuan Jian, Liu Yu. Answer quality evaluation model for community question answering based on hybrid method[J]. Application Research of Computers, 2017, 34(6): 1708-1712.
[3] Xiong Da-ping, Wang Jian, Lin Hong-fei. An LDA-based approach to finding similar questions for community question answer[J]. Journal of Chinese Information Processing, 2012, 26(5): 40-45.
[4] Zhang Cheng, Qu Ming-cheng, Ni Ning, et al. Automatic answer selection based on probabilistic latent semantic analysis model[J]. Computer Engineering, 2011, 37(14): 70-72.
[5] Ma Z, Sun A, Yuan Q, et al. A tri-role topic model for domain-specific question answering[C]. Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI Press, 2015: 224-230.
[6] Xiong Hao, Yan Hai-hua, Guo Tao, et al. Code similarity detection: a survey[J]. Computer Science, 2010, 37(8): 9-14.
[7] Faidhi J A W, Robinson S K. An empirical approach for detecting program similarity and plagiarism within a university programming environment[J]. Computers & Education, 1987, 11(1): 11-19.
[8] Verco K L, Wise M J. Software for detecting suspected plagiarism: comparing structure and attribute-counting systems[C]. Proceedings of the 1st Australian Conference on Computer Science Education, 1996: 3-5.
[9] Liu Yun-long. Token-based structured code matching homology detection technology[J]. Application Research of Computers, 2014, 31(6): 1841-1845.
[10] Arwin C, Tahaghoghi S M M. Plagiarism detection across programming languages[C]. Proceedings of the 29th Australasian Computer Science Conference (ACSC2006), 2006: 277-286.
[11] Zhang Jiu-jie, Wang Chun-hui, Zhang Li-ping, et al. Clone code detection based on Levenshtein distance of token[J]. Journal of Computer Applications, 2015, 35(12): 3536-3543.
[12] Jadalla A, Elnagar A. PDE4Java: plagiarism detection engine for Java source code: a clustering approach[J]. International Journal of Business Intelligence and Data Mining, 2008, 3(2): 121-135.
[13] Zhu Bo, Zheng Hong, Sun Lin-lin, et al. Research on similarity measure for AST-based program codes[J]. Journal of Jilin University (Information Science Edition), 2015, 33(1): 99-104.
[14] Zhu Bo. Research on similarity measure method of program code[D]. Changchun: Changchun University of Technology, 2015.
[15] Liu Chao, Chen Chen, Han Jia-wei, et al. GPLAG: detection of software plagiarism by program dependence graph analysis[C]. Proceedings of ACM SIGKDD 2006, 2006: 872-881.
[16] Komondoor R, Horwitz S. Using slicing to identify duplication in source code[C]. Proceedings of the 8th International Static Analysis Symposium (SAS), 2001: 40-56.
[17] Krinke J. Identifying similar code with program dependence graphs[C]. Proceedings of the 8th Working Conference on Reverse Engineering (WCRE'01), 2001: 301-309.
[18] Boehm B W, Brown J R. Quantitative evaluation of software quality[C]. 2nd International Conference on Software Engineering, IEEE Computer Society, 1979: 592-605.
[19] Sun Meng-lin, Song Xiao-qiu, Chao Yi. Quality measurement technology research in software[J]. Computer Engineering and Design, 2006, 27(2): 325-327.
[20] Sun Meng-lin, Gan Zhi-qiang. Quality measurement evaluation implement in aerospace software[J]. Systems Engineering and Electronics, 2009, 31(4): 956-959.
[21] Charikar M S. Similarity estimation techniques from rounding algorithms[C]. Proceedings of the Thirty-fourth Annual ACM Symposium on Theory of Computing, ACM, 2002: 380-388.
[22] Zhang Pei-yun, Chen Chuan-ming, Huang Bo. Texts similarity algorithm based on subtrees matching[J]. Pattern Recognition and Artificial Intelligence, 2014, 27(3): 226-234.
[23] Goemans M X, Williamson D P. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming[M]. Association for Computing Machinery, ACM, 1995.
[24] Jia Li-jie. Based on the maximum entropy model segmentation technology research[D]. Jinan: Shandong Normal University, 2007.
[25] Gao W, Liang L. Ontology similarity measure by optimizing NDCG measure and application in physics education[M]. Future Communication, Computing, Control and Management, Springer Berlin Heidelberg, 2012: 415-421.
