Microblog Retrieval Model Combining User Interest and Mixed Estimation (融合用户兴趣和混合估计的微博检索模型)

English title: Microblog Retrieval Model Combining User Interest and Mixed Estimation
Authors: Wu Shufang (吴树芳); Zhang Xiongtao (张雄涛); Zhu Jie (朱杰)
Affiliations: School of Management, Hebei University; College of Management and Economics, Tianjin University; Department of Information Management, the Central Institute for Correctional Police
Keywords: microblog retrieval; query likelihood model; user interest; user interaction; mixed estimation
Journal code: QBXB
Journal (English name): Journal of the China Society for Scientific and Technical Information
Publication date: 2019-04-24
Publisher: 情报学报
Year: 2019
Issue: v.38
Funding: National Social Science Fund of China, General Program, "Research on Identifying Untrustworthy Users in Social Networks from the Perspective of Network Information Governance" (17BTQ068)
Language: Chinese
Pages: QBXB201904009
Page count: 9
CN: 04
ISSN: 11-2257/G3
Classification: 81-89

Abstract

As mobile internet technology continues to develop, microblog retrieval has become an important part of microblog services. Because microblog retrieval differs from traditional text retrieval, this paper proposes an improved microblog retrieval model that revises two components of the traditional query likelihood model: the document prior probability and the estimation of the document language model. For the document prior, the model quantifies each user's interest in posts to build a library of posts the user is interested in, and computes the microblog prior from that library, so that microblogs matching the searching user's interests receive a higher prior probability. For the document language model, content information and user-interaction information are combined to obtain a set of documents related to each microblog; this set serves as the smoothing term in a mixed estimate of the microblog's language model, which alleviates the data sparseness of short microblog texts. Experiments on real data crawled from Sina Weibo show that, compared with the stronger improved query likelihood models in existing work, the new model achieves gains on P@15, P@30, and MRR.
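
The two changes described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no formulas, so the three-way interpolation in `mixed_lm`, the weights `lam_doc` and `lam_rel`, the overlap-based `interest_prior`, and the toy data are all assumptions chosen only to show where an interest-based prior and a related-document smoothing term enter a query likelihood score.

```python
import math
from collections import Counter


def ml_prob(word, counts):
    """Maximum-likelihood estimate of P(word) from a bag of term counts."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


def mixed_lm(word, doc, related, collection, lam_doc=0.5, lam_rel=0.3):
    """Assumed mixture: the post itself, its related posts, and the collection."""
    lam_col = 1.0 - lam_doc - lam_rel
    return (lam_doc * ml_prob(word, doc)
            + lam_rel * ml_prob(word, related)
            + lam_col * ml_prob(word, collection))


def interest_prior(doc, interest):
    """Assumed unnormalised prior: 1 plus term overlap with the interest library."""
    return 1 + sum(min(c, interest[w]) for w, c in doc.items())


def score(query, doc, related, collection, prior):
    """log P(d) + sum over query terms of log P(w | mixed model of d)."""
    total = math.log(prior)
    for w in query:
        p = mixed_lm(w, doc, related, collection)
        if p == 0.0:
            return float("-inf")  # term unseen anywhere in the collection
        total += math.log(p)
    return total


def rank(query, docs, related_sets, interest):
    """Rank post ids by score, normalising the priors to sum to 1."""
    collection = Counter()
    for d in docs.values():
        collection.update(d)
    raw = {k: interest_prior(d, interest) for k, d in docs.items()}
    z = sum(raw.values())
    scores = {k: score(query, d, related_sets[k], collection, raw[k] / z)
              for k, d in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): three posts, their related-post sets, one user's interests.
docs = {
    "d1": Counter(["weibo", "search", "model"]),
    "d2": Counter(["football", "match", "tonight"]),
    "d3": Counter(["search", "ranking", "weibo"]),
}
related_sets = {"d1": docs["d3"], "d2": Counter(), "d3": docs["d1"]}
interest = Counter(["ranking", "search"])

ranking = rank(["weibo", "search"], docs, related_sets, interest)
print(ranking)
```

In this toy setup, `d1` and `d3` get the same language-model probability for both query terms, so only the prior separates them. That isolates the role the abstract assigns to the interest library: posts closer to the user's interests rank higher when content evidence is equal.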

