Abstract
Based on the results of user questionnaires and interviews, we propose an evaluation system for search snippet presentation comprising 15 factors in three dimensions: attractiveness, informativeness, and effectiveness. Using this system, the presentation performance of traditional text snippets is evaluated automatically. Experimental results show that the automatic method produces evaluations consistent with both manually annotated assessments and controlled user experiments. It also substantially reduces the human effort required for evaluation and shortens the evaluation feedback cycle.
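The abstract reports that the automatic evaluation agrees with manual annotation but does not say how agreement is measured. One standard option for two sets of ordinal ratings of the same snippets is Cohen's weighted kappa, which penalizes large disagreements (e.g. 1 vs 5) more than near-misses (e.g. 4 vs 5). The sketch below is an illustration under that assumption, not the authors' procedure: the function name `weighted_kappa`, the rating scale, and the sample data are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, categories, weight="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale (illustrative sketch).

    rater_a, rater_b: equal-length sequences of labels taken from `categories`,
        e.g. hypothetical automatic scores and manual scores for the same snippets.
    categories: the ordinal scale, listed from lowest to highest.
    weight: "linear" uses |i - j| / (k - 1); "quadratic" squares that distance.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if weight not in ("linear", "quadratic"):
        raise ValueError("weight must be 'linear' or 'quadratic'")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two ordinal categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint distribution of (rating_a, rating_b) pairs.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal distribution of each rater; their product gives chance agreement.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the two extreme categories.
        d = abs(i - j) / (k - 1)
        return d if weight == "linear" else d * d

    disagree_obs = sum(w(i, j) * observed[i][j] for i in range(k) for j in range(k))
    disagree_exp = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if disagree_exp == 0:
        # Convention for the degenerate case: both raters used one identical category.
        return 1.0
    return 1.0 - disagree_obs / disagree_exp


if __name__ == "__main__":
    # Hypothetical automatic vs. manual ratings on a 3-point scale.
    auto_scores = [0, 1, 2, 0]
    manual_scores = [0, 2, 2, 1]
    print(weighted_kappa(auto_scores, manual_scores, [0, 1, 2]))  # 0.5
```

A value of 1 means perfect agreement and 0 means agreement no better than chance. Linear weights treat each step on the scale as equally costly, while quadratic weights punish distant disagreements more heavily; which is more appropriate depends on how the rating scale was defined.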