Research and Implementation of Question and Answer Recommendation Mechanisms in Question Answering Communities

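English title: Question and Answer Recommendation in Question Answering Communities
Author: Qu Mingcheng (曲明成)
Degree level: Master's
Discipline: Computer Application Technology
Chinese keywords: 问答社区 ; 问题推荐 ; 答案推荐 ; 主题建模 ; 链接分析
English keywords: Question Answering Communities ; Question Recommendation ; Answer Recommendation ; Topic Modeling ; Link Analysis
Degree year: 2010
Advisors: Bu Jiajun (卜佳俊) ; Wang Can (王灿)
Discipline code: 081203
Degree-granting institution: Zhejiang University
Thesis submission date: 2010-03-01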

Abstract

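User-interactive question answering (QA) communities have become an important medium for obtaining information and sharing knowledge online. QA sites such as Yahoo! Answers and Baidu Zhidao see tens of thousands of new questions posted every day. As the volume of data grows, however, users need more time to find questions that interest them, so askers wait longer before their questions are answered. Meanwhile, the rapid growth in the number of candidate answers, together with their uneven quality, makes it harder for askers to select the best answer.
     This thesis studies question recommendation and answer recommendation mechanisms for QA communities, aiming to help both askers and answerers obtain information and thereby promote knowledge sharing within these communities. Question recommendation routes open questions to users who are likely to be interested in them, so that the questions are answered sooner. We assume that users choose which questions to answer according to their topics of interest, so the topic of a question should be highly relevant to the interests of its answerers. On this basis, we propose a question recommendation method based on topic modeling: it makes full use of the rich personalized user information available in QA communities, represents the distribution of user interests with the Probabilistic Latent Semantic Analysis (PLSA) model, and computes question recommendation lists from that distribution. Answer recommendation automatically ranks the candidate answers to a question so that the asker can choose the best answer more easily. We assume that askers select the best answer according to answer quality and relevance to the question. We therefore propose an answer recommendation method based on question-answer similarity and user authority. The method builds a user link graph from the asking and answering relationships between users and applies the PageRank algorithm to that graph to estimate user authority. The similarity computation considers both the content similarity between the question and the answer and the similarity between the asker and the answerer.
     Experimental results show that the proposed topic-modeling-based question recommendation method effectively mines user interests and thus recommends open questions well. The answer recommendation experiments demonstrate the effectiveness of jointly considering question-answer content similarity and asker-answerer similarity, and the feasibility of using user authority as a measure of answer quality.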
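As an illustration of the question recommendation idea described above, the following is a minimal sketch rather than the thesis's implementation. It assumes each user is represented by word counts taken from the questions they answered, fits a PLSA model P(w|u) = Σ_z P(z|u) P(w|z) with EM, and ranks open questions by the average log-probability of their words under a user's interest distribution. The function names (`fit_plsa`, `question_score`, `recommend`), the smoothing constant, the scoring rule, and the toy data are all assumptions made for this example.

```python
import math
import random


def fit_plsa(counts, n_topics, n_iter=100, seed=0, eps=1e-9):
    """Fit PLSA by EM. counts[u][w] = times word w occurs in user u's history.

    Returns (p_z_u, p_w_z) holding P(z|u) and P(w|z).
    eps is a tiny smoothing constant (an assumption) that avoids zero divisions.
    """
    rng = random.Random(seed)
    users = sorted(counts)
    vocab = sorted({w for u in users for w in counts[u]})
    topics = range(n_topics)

    def random_distribution(keys):
        raw = {k: rng.random() + 0.1 for k in keys}
        total = sum(raw.values())
        return {k: v / total for k, v in raw.items()}

    p_z_u = {u: random_distribution(topics) for u in users}
    p_w_z = {z: random_distribution(vocab) for z in topics}

    for _ in range(n_iter):
        acc_zu = {u: {z: eps for z in topics} for u in users}
        acc_wz = {z: {w: eps for w in vocab} for z in topics}
        for u in users:
            for w, n in counts[u].items():
                # E-step: P(z|u,w) is proportional to P(z|u) * P(w|z).
                post = [p_z_u[u][z] * p_w_z[z][w] for z in topics]
                norm = sum(post)
                for z in topics:
                    resp = n * post[z] / norm
                    acc_zu[u][z] += resp
                    acc_wz[z][w] += resp
        # M-step: renormalise the expected counts into distributions.
        for u in users:
            total = sum(acc_zu[u].values())
            p_z_u[u] = {z: v / total for z, v in acc_zu[u].items()}
        for z in topics:
            total = sum(acc_wz[z].values())
            p_w_z[z] = {w: v / total for w, v in acc_wz[z].items()}
    return p_z_u, p_w_z


def question_score(user, question_words, p_z_u, p_w_z):
    """Average log P(w|user) over the question's in-vocabulary words."""
    logs = []
    for w in question_words:
        p = sum(p_z_u[user][z] * p_w_z[z].get(w, 0.0) for z in p_w_z)
        if p > 0.0:
            logs.append(math.log(p))
    return sum(logs) / len(logs) if logs else float("-inf")


def recommend(user, open_questions, p_z_u, p_w_z, k=3):
    """Return the ids of the k open questions that best match the user."""
    ranked = sorted(
        open_questions,
        key=lambda q: question_score(user, open_questions[q], p_z_u, p_w_z),
        reverse=True,
    )
    return ranked[:k]


# Toy answering histories and open questions (invented for illustration).
history = {
    "alice": {"python": 4, "bug": 3, "code": 3},
    "bob": {"pasta": 4, "sauce": 3, "recipe": 3},
}
open_questions = {"q_code": ["python", "bug"], "q_food": ["pasta", "sauce"]}
p_z_u, p_w_z = fit_plsa(history, n_topics=2)
print(recommend("alice", open_questions, p_z_u, p_w_z, k=1))
```

Treating each user as a "document" keeps the sketch close to standard PLSA; a real system would also need held-out evaluation and a principled choice of the number of topics.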
User-interactive question answering (QA) communities such as Yahoo! Answers and Baidu Zhidao are growing in popularity. However, because these QA sites have thousands of new questions posted daily, it is difficult for users to find the questions that interest them, which may delay the answering of new questions. Meanwhile, as the number of candidate answers increases, the asker may have difficulty choosing the best one from answers of uneven quality.
     In this thesis, we study question and answer recommendation mechanisms that help askers and answerers seek information and that enhance knowledge sharing within question answering communities. Question recommendation techniques help users locate interesting questions and expedite the answering of new questions. We believe users select questions according to their own preferences. We therefore adopt the Probabilistic Latent Semantic Analysis (PLSA) model to represent users' preference distributions based on their previous answering history, and use these distributions to generate question recommendation lists. To help askers find the best answers, answer recommendation techniques rank candidate answers automatically. The ranking is based on content similarity and user authority. We model the relationships between users as a link structure and adopt the PageRank algorithm to estimate user authority. The similarity calculation considers both the similarity between questions and answers and the similarity between askers and answerers. The experimental results show that our topic-modeling-based question recommendation approach captures users' preferences and recommends questions effectively. The answer recommendation experiments confirm the effectiveness of considering both the similarity between questions and answers and the similarity between askers and answerers.
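The answer recommendation idea can be sketched in the same spirit. The code below is an illustrative sketch under stated assumptions, not the thesis's method: it assumes one directed edge from asker to answerer per answered question, estimates authority with a plain PageRank power iteration, measures question-answer similarity with bag-of-words cosine similarity, and mixes the two with a linear weight. The edge direction, the damping factor of 0.85, the linear combination, and all names are assumptions; the asker-answerer similarity mentioned in the abstract is left out for brevity.

```python
import math
from collections import Counter, defaultdict


def user_pagerank(qa_pairs, damping=0.85, n_iter=100):
    """PageRank over a user graph with one edge asker -> answerer per pair."""
    out_links = defaultdict(Counter)
    users = set()
    for asker, answerer in qa_pairs:
        out_links[asker][answerer] += 1
        users.update((asker, answerer))
    users = sorted(users)
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(n_iter):
        # Users who never asked have no out-links; spread their rank evenly.
        dangling = sum(rank[u] for u in users if not out_links[u])
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in users}
        for u in users:
            total = sum(out_links[u].values())
            for v, weight in out_links[u].items():
                new_rank[v] += damping * rank[u] * weight / total
        rank = new_rank
    return rank


def cosine(words_a, words_b):
    """Cosine similarity between two bags of words."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_answers(question_words, answers, authority, weight=0.5):
    """Order answerers by weight * similarity + (1 - weight) * authority.

    answers is a list of (answerer, answer_words) tuples. Authority is scaled
    by the maximum score so that both terms lie in [0, 1].
    """
    top = max(authority.values())

    def score(item):
        answerer, words = item
        auth = authority.get(answerer, 0.0) / top
        return weight * cosine(question_words, words) + (1.0 - weight) * auth

    return [answerer for answerer, _ in sorted(answers, key=score, reverse=True)]


# Toy asking/answering log (invented for illustration): (asker, answerer).
qa_pairs = [("ann", "bob"), ("cat", "bob"), ("dan", "bob"), ("bob", "cat")]
authority = user_pagerank(qa_pairs)
candidates = [
    ("ann", ["try", "google"]),
    ("bob", ["python", "install", "guide"]),
]
print(rank_answers(["install", "python", "windows"], candidates, authority))
```

Setting `weight=1.0` ranks by content similarity alone and `weight=0.0` by authority alone, which makes it easy to compare the contribution of each signal on a labeled set of best answers.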

