A Study of a Method for Detecting Query Intent Boundaries
  • English title: Study on boundary detection of user's query intents
  • Authors: 王凯; 洪宇; 邱盈; 王剑; 姚建民; 周国栋
  • Authors (English): WANG Kai; HONG Yu; QIU Ying-ying; WANG Jian; YAO Jian-min; ZHOU Guo-dong (School of Computer Science and Technology, Soochow University)
  • Keywords: information retrieval; query intent; boundary detection
  • Chinese journal title: SDDX
  • English journal title: Journal of Shandong University (Natural Science)
  • Affiliation: School of Computer Science and Technology, Soochow University
  • Publication date: 2017-06-14 09:03
  • Publisher: Journal of Shandong University (Natural Science)
  • Year: 2017
  • Issue: v.52
  • Funding: National Natural Science Foundation of China (61672368, 61672367, 61373097, 61331011)
  • Language: Chinese
  • Page: SDDX201709003
  • Page count: 6
  • CN: 09
  • ISSN: 37-1389/N
  • Classification number: 16-21
Abstract
To satisfy a single query intent, a user often has to submit several query requests. Effectively identifying the boundaries where intent changes between consecutive queries helps a retrieval system understand the user's complete query intent. This in turn improves query suggestion and query expansion, and supports building user models for personalized search. Building on a thorough analysis of the effective features used in previous research, this paper proposes a method that detects intent boundaries using topic similarity, and the method yields improvements with both SVM and CRF models. Experimental results show that the best performance of the proposed method improves the F-measure by 2% over the baseline system.
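The abstract does not spell out how topic similarity is computed or how it is fed to the SVM and CRF models, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each query has already been mapped to a topic distribution (for example by a topic model), uses cosine similarity as the similarity measure, and replaces the paper's classifiers with a plain threshold. The names `cosine_similarity`, `detect_boundaries`, and the threshold value 0.5 are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine similarity between two equal-length topic vectors."""
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_p * norm_q)


def detect_boundaries(
    topic_vectors: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[int]:
    """Return each index i >= 1 where query i is assumed to start a new intent.

    A boundary is placed before query i when its topic vector is less
    similar than `threshold` to that of query i - 1. The threshold is an
    illustrative stand-in for a trained classifier.
    """
    boundaries = []
    for i in range(1, len(topic_vectors)):
        similarity = cosine_similarity(topic_vectors[i - 1], topic_vectors[i])
        if similarity < threshold:
            boundaries.append(i)
    return boundaries


# Hypothetical session: four consecutive queries over three topics.
session = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.2, 0.7],
]
print(detect_boundaries(session))  # prints [2]
```

In a setup closer to the one the abstract describes, the similarity score would presumably serve as one feature among several for a classifier or sequence labeler rather than being thresholded directly.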
References
[1] SILVERSTEIN C, MARAIS H, HENZINGER M, et al. Analysis of a very large web search engine query log[J]. SIGIR Forum, 1999, 33(1): 6-12.
[2] LI Yanan, ZHANG Sen, WANG Bin, et al. Characteristics of Chinese web searching: A large-scale analysis of Chinese query logs[J]. Journal of Computational Information Systems, 2008, 4(3): 1127-1136.
[3] YU Huijia, LIU Yiqun, ZHANG Min, et al. Research in search engine user behavior based on log analysis[J]. Journal of Chinese Information Processing, 2007, 21(1): 109-114.
[4] BRODER A. A taxonomy of web search[J]. SIGIR Forum, 2002, 36(2): 3-10.
[5] JIANG Xue, SUN Le. Study on segmentation of users query intents[J]. Chinese Journal of Computers, 2013, 36(3): 664-670.
[6] HE Daqing, GÖKER A, HARPER D J. Combining evidence for automatic web session identification[J]. Information Processing & Management, 2002, 38(5): 727-742.
[7] JANSEN B J, SPINK A, BLAKELY C, et al. Defining a session on web search engines[J]. Journal of the American Society for Information Science and Technology, 2007, 58(6): 862-871.
[8] DOWNEY D, DUMAIS S, HORVITZ E. Models of searching and browsing: languages, studies, and applications[C]//Proceedings of the International Joint Conference on Artificial Intelligence. Hyderabad: ACM, 2007: 1465-1472.
[9] NIKOLAI B, BERNARD J B J. Limits of the web log analysis artifacts[C]//Proceedings of Workshop on Logging Traces of Web Activity. Edinburgh: World Wide Web Conference, 2006: 152-156.
[10] MURRAY G C, LIN J, CHOWDHURY A. Identification of user session with hierarchical agglomerative clustering[J]. Journal of American Society for Information Science, 2006, 43(1): 1-9.
[11] OZMUTLU H C, CAVDUR F. Application of automatic topic identification on excite web search engine data logs[J]. Information Processing and Management, 2005, 41(5): 1243-1262.
[12] OZMUTLU S, CAVDUR F. Neural network applications for automatic new topic identification[J]. Online Information Review, 2005, 29(1): 34-53.
[13] OZMUTLU S, OZMUTLU H C, SPINK A. Automatic new topic identification in search engine transaction logs using multiple linear regression[J]. Hawaii International Conference on System Sciences, 2008, 16(3): 140.
[14] OZMUTLU S, OZMUTLU H C, BUYUK B. Using Monte-Carlo simulation for automatic new topic identification of search engine transaction logs[J]. Winter Simulation Conference, 2007, 16(5): 2306-2314.
[15] LI Xiao, WANG Yeyi, ALEX A. Learning query intent from regularized click graphs[C]//Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval. New York: ACM, 2008: 339-346.
