Research and Implementation of User-Oriented Query Expansion
Abstract
With the rapid growth of the Internet in recent years, Web resources have increased exponentially; by early 2004 the number of Web pages had reached roughly 8 billion. Web resources can be searched in many ways, of which search engines are the most widely used. However, current search engines are designed for general-purpose retrieval and do not reflect the information needs of individual users, whereas personalized information services can meet those individual needs effectively. In addition, studies show that 58-81% of the pages people visit on the Web are pages they have visited before [21]. Managing a user's previously visited pages effectively is therefore also of practical value when implementing personalized information services.
In most current retrieval systems, a user's information need is expressed as a set of query keywords, and there is often a large semantic gap between the user's actual need and those keywords. Narrowing this gap is the key problem in building user-oriented personalized information services. This thesis applies query expansion and proposes an adaptive model that adds query keywords, deletes them, and modifies their weights so that the query better reflects the user's actual need, which improves retrieval precision. The model also determines the number of expansion keywords and optimizes the weight-adjustment factors α, β, γ, λ used in query feedback.
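The abstract does not give the model's formula. As a point of reference only, the classic Rocchio relevance-feedback update [12] re-weights a query vector as q' = α·q + β·(centroid of relevant documents) − γ·(centroid of non-relevant documents). The sketch below applies that update and then performs the three operations the model is described as doing: re-weighting existing terms, deleting terms whose weight is no longer positive, and adding a bounded number of new expansion terms. The sparse dict representation, the default values α = 1.0, β = 0.75, γ = 0.15, and the `max_new_terms` cap are assumptions for illustration, not the thesis's optimized values; the role of λ is not described in the abstract, so it is not modeled.

```python
from collections import defaultdict
from typing import Dict, List

# Assumption: a query or document is a sparse term -> weight map.
Vector = Dict[str, float]


def centroid(docs: List[Vector]) -> Vector:
    """Average term-weight vector of a list of documents ({} for an empty list)."""
    if not docs:
        return {}
    total: Dict[str, float] = defaultdict(float)
    for doc in docs:
        for term, weight in doc.items():
            total[term] += weight
    return {term: weight / len(docs) for term, weight in total.items()}


def rocchio_expand(query: Vector, relevant: List[Vector], nonrelevant: List[Vector],
                   alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15,
                   max_new_terms: int = 5) -> Vector:
    """Rocchio-style feedback that re-weights, deletes and adds query terms.

    alpha, beta, gamma are illustrative defaults (assumed, not from the thesis);
    the thesis's fourth factor, lambda, is not modeled here.
    """
    rel_c, non_c = centroid(relevant), centroid(nonrelevant)
    terms = set(query) | set(rel_c) | set(non_c)
    updated = {
        t: alpha * query.get(t, 0.0) + beta * rel_c.get(t, 0.0) - gamma * non_c.get(t, 0.0)
        for t in terms
    }
    # Re-weight original terms; delete any whose weight is no longer positive.
    expanded = {t: w for t, w in updated.items() if t in query and w > 0}
    # Add at most max_new_terms new terms, highest weight first (ties broken by term).
    candidates = sorted(
        ((t, w) for t, w in updated.items() if t not in query and w > 0),
        key=lambda tw: (-tw[1], tw[0]),
    )
    expanded.update(candidates[:max_new_terms])
    return expanded


if __name__ == "__main__":
    query = {"java": 1.0}
    relevant = [{"java": 0.8, "jvm": 0.6, "bytecode": 0.4},
                {"java": 0.6, "jvm": 0.4, "compiler": 0.2}]
    nonrelevant = [{"java": 0.2, "coffee": 0.9}]
    # Keeps "java" (re-weighted), adds "jvm" and "bytecode"; "coffee" gets a negative weight.
    print(rocchio_expand(query, relevant, nonrelevant, max_new_terms=2))
```

Capping the number of new terms is one simple way to control how many expansion keywords are added; the thesis determines that number within its own model, which this sketch does not reproduce.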
We jointly designed a prototype personal Electronic Information Assistant system. Its main idea is as follows. First, when a user registers, the system asks for the user's basic information, interest categories, query keywords, and similar information, and builds an initial interest model for each new user from his or her interest categories. The system then searches for information through existing search engines (such as Google and Baidu). From the returned results, it uses the interest model to filter out documents unrelated to the user's interests, re-ranks the remaining documents, and displays them in the user interface. The user can download and browse the documents of interest; based on this behavioral feedback, the system automatically updates the user's interest model and expands the original query, so that the model truly represents the user's current interests. The system also provides network information management, automatically classifying and archiving the documents it retrieves.
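The abstract does not say how the interest model is represented or compared with documents. The sketch below assumes a common vector-space choice: the profile is a term-weight vector, each search-engine result is scored by cosine similarity to it, results below a threshold are filtered out, and the rest are re-ranked by score. Feedback from browsed documents moves the profile toward those documents, and archiving files a document under its most similar interest category. The function names, the 0.1 threshold, and the 0.2 update rate are hypothetical choices for illustration, not the system's actual design.

```python
import math
from typing import Dict, List, Tuple

# Assumption: profiles, categories and documents are sparse term -> weight maps.
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_and_rerank(profile: Vector, results: List[Tuple[str, Vector]],
                      threshold: float = 0.1) -> List[Tuple[str, float]]:
    """Drop results below the (hypothetical) threshold, then sort by similarity."""
    scored = [(doc_id, cosine(profile, vec)) for doc_id, vec in results]
    kept = [(doc_id, s) for doc_id, s in scored if s >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def update_profile(profile: Vector, browsed: List[Vector], rate: float = 0.2) -> Vector:
    """Move the profile toward the mean of the documents the user downloaded or browsed."""
    if not browsed:
        return dict(profile)
    terms = set(profile).union(*browsed)
    return {
        t: (1 - rate) * profile.get(t, 0.0)
        + rate * sum(d.get(t, 0.0) for d in browsed) / len(browsed)
        for t in terms
    }


def archive(categories: Dict[str, Vector], doc: Vector) -> str:
    """File a retrieved document under the most similar interest category."""
    return max(categories, key=lambda name: cosine(categories[name], doc))


if __name__ == "__main__":
    profile = {"java": 1.0, "jvm": 0.5}
    results = [("d1", {"coffee": 1.0}),
               ("d2", {"java": 0.5, "jvm": 0.5}),
               ("d3", {"java": 1.0})]
    # d1 shares no terms with the profile and is filtered out; d2 ranks above d3.
    print(filter_and_rerank(profile, results))
    print(update_profile(profile, [{"java": 1.0, "spring": 1.0}]))
```

The update is a moving average, so older interests fade gradually rather than being replaced outright; the thesis's own update rule may differ.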
Further work: 1. try other methods to further improve the adaptive query expansion model; 2. further optimize the weight-adjustment factors α, β, γ, λ; 3. complete the system's functionality.
References
[1] C. J. Crouch. A cluster-based approach to thesaurus construction. ACM, 1988, pp. 309-320.
[2] C. J. van Rijsbergen. Information Retrieval. http://www.dcs.gla.ac.uk/Keith/Preface.html.
[3] C. T. Yu, C. Buckley, K. Lam, G. Salton. A generalized term dependence model in information retrieval. Information Technology: Research and Development, 2(4): 129-154, October 1983.
[4] H. Cui, J. R. Wen, M. Q. Li. A statistical query expansion model based on query logs. Journal of Software, 2003, 14(9): 1593-1599.
[5] E. Ide. New experiments in relevance feedback. In G. Salton, editor, The SMART Retrieval System, pages 337-354. Prentice Hall, 1971.
[6] G. Salton. The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice Hall Inc., Englewood Cliffs, NJ, 1971.
[7] G. Salton, M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill Book Co., New York, 1983.
[8] H. Cui, J. R. Wen, J. Y. Nie, W. Y. Ma. Query Expansion for Short Queries by Mining User Logs. http://research.microsoft.com/asia/dload_files/group/mediasearching/2002p/QE-TKDE.pdf. January 2003.
[9] H. Drucker, B. Shahrary, D. C. Gibbon. Relevance Feedback using Support Vector Machines. http://citeseer.ist.psu.edu/cache/papers/cs/22095/http:zSzzSzwww.monmouth.eduzSz~druckerzSzdruckerpdfzSzRelevanceFeedback.pdf/drucker01relevance.pdf/. 2001.
[10] J. I. Hong. An Overview of Latent Semantic Indexing. http://www.cs.berkeley.edu/~jas nh/classes/sims240/sims-240-final-paper-lsi.htm. Spring 2000.
[11] J. Y. Nie. Query expansion and query translation as logical inference. Journal of the American Society for Information Science and Technology, 54(4), February 2003.
[12] J. J. Rocchio. Relevance feedback in information retrieval. In G. Salton, editor, The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice Hall Inc., Englewood Cliffs, NJ, 1971.
[13] N. Koutsoupias. Exploring Web Access Logs with Correspondence Analysis. http://www.csd.auth.gr/~setn02/poster_papers/229.pdf. 2002.
[14] O. R. Zaiane, M. Xin, J. W. Han. Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs. http://citeseer.ist.psu.edu/cache/papers/cs/271/ftp:zSzzSzftp.fas.sfu.cazSzpubzSzcszSzhanzSzkddzSzweblog98.pdf/zaiane98discovering.pdf/. 1998.
[15] P. Batista, M. J. Silva. Mining Web Access Logs of an On-line Newspaper. http://ectrlitc..it/rpec/RPEC-Papers/11-batista.pdf.
[16] Y. Qiu, H. P. Frei. Concept based query expansion. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1993), 1993.
[17] R. Baeza-Yates, B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley, 2004. 2001.2: 187-193.
[18] R. Wilkinson. Using Combination of Evidence for Term Expansion. 19th Annual BCS-IRSG Colloquium on IR, Aberdeen, UK, 8-9 April 1997.
[19] S. E. Robertson, K. Sparck Jones. Relevance weighting of search terms. Journal of the American Society for Information Science, 27(3): 129-146, 1976.
[20] S. E. Robertson, S. Walker. Some Simple Effective Approximations to the 2-Poisson Model for Probabilistic Weighted Retrieval. http://www.computing.dcu.ie/~gjones/Teaching/CA437/p232.pdf. 1994.
[21] S. Dumais, E. Cutrell, J. J. Cadiz, G. Jancke, R. Sarin, D. C. Robbins. Stuff I've Seen: A System for Personal Information Retrieval and Re-use. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2003), 2003.
[22] J. R. Wen, J. Y. Nie, H. J. Zhang. Query Clustering Using User Logs. ACM Transactions on Information Systems (ACM TOIS), 20(1): 59-81, 2002.
[23] Y. H. Wu, Y. C. Chen, A. L. P. Chen. Enabling personalized recommendation on the Web based on user interests and behaviors. http://make.cs.nthu.edu.tw/alp/course/cs5730/teacher%20paper/ride01-wcc.pdf. 2001.
[24] Y. H. Xu, K. J. Umemura. Very Low-Dimensional Latent Semantic Indexing for Local Query Regions. http://acl.eldoc.ub.rug.nl/mirror/W/W03/W03-1111.pdf. 1999.
[25] Y. Seo, B. Zhang. A reinforcement learning agent for personalized information filtering. In Proceedings of the International Conference on Intelligent User Interfaces (IUI 2000), 2000, pp. 248-251.
[26] Z. Chen, F. Lin, H. Lin, Y. Liu, W. Y. Ma, W. Y. Liu. User Intention Modeling in Web Application Using Data Mining. http://research.microsoft.com/~zhengc/papers/WWWJ_intention_modeling.pdf. 2002.
[27] Z. Chen, W. Y. Liu, F. Lin, P. Xiao, Y. Liu, B. Lin, W. Y. Ma. User Modeling for Building Personalized Web Assistants. http://research.microsoft.com/~zhengc/papers/www11POSTER-UM.pdf. 2002.
[28] M. Zhang, R. H. Song, C. Lin, S. P. Ma, Z. Jiang, Y. J. Jin, Y. Q. Liu, L. Zhao. Expansion-Based Technologies in Finding Relevant and New Information: THU TREC2002 Novelty Track Experiments. http://trec.nist.gov/pubs/trec11/papers/tsinghuau.novelty2.pdf. 2002.
[29] J. R. Wen, J. Y. Nie, H. J. Zhang. Clustering User Queries of a Search Engine. WWW10, May 1-5, 2001, Hong Kong.
[30] C. H. Chang, C. C. H. Integrating Query Expansion and Conceptual Relevance Feedback for Personalized Web Information Retrieval. Computer Networks, 30(1-7): 621-623, 1998.
[31] I. Cingil, A. Dogac, A. Azgin. A Broader Approach to Personalization. Communications of the ACM, 43(8): 136-141, 2000.
[32] S. Kaasten, S. Greenberg, C. Edwards. How people recognize previously seen WWW pages from titles, URLs and thumbnails. Proceedings of Human Computer Interaction 2002, pp. 247-265, 2002.
[33] M. W. Wang, J. Y. Nie, F. M. Jin. A Dempster-Shafer Model for Query Expansion. 2003.
[34] W. P. Jones, S. T. Dumais, H. Bruce. Once found, what next? A study of 'keeping' behaviors in the personal use of Web information. Proceedings of ASIST 2002, pp. 391-402.
[35] B. Billerbeck, J. Zobel. Questioning query expansion: An examination of behaviour and parameters. Proceedings of the Australasian Database Conference, K.-D. Schewe and H. E. Williams (eds), Dunedin, New Zealand, January 2004, pp. 69-76.
[36] P. Ogilvie, J. Callan. The Effectiveness of Query Expansion for Distributed Information Retrieval. In Proceedings of the Tenth International Conference on Information and Knowledge Management (CIKM 2001), pp. 183-190.
[37] B. Billerbeck, J. Zobel. When Query Expansion Fails. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2003), 2003.
[38] C. Ch. Latiri, S. Ben Yahia, J. P. Chevallet, A. Jaoua. Query expansion using fuzzy association rules between terms. http://www-mrim.imag.fr/publications/2003/Jim2003.pdf. 2003.
[39] G. Amati, C. Carpineto, G. Romano. Query difficulty, robustness and selective application of query expansion. To appear in Proceedings of the 25th European Conference on Information Retrieval (ECIR 2004), Sunderland, Great Britain, 2004.
[40] D. L. Yeung, C. L. A. Clarke, G. V. Cormack, T. R. Lynam, E. L. Terra. Task-Specific Query Expansion (MultiText Experiments for TREC 2003). Presented at the 2003 Text REtrieval Conference (TREC 2003), Gaithersburg, Maryland, November 2003.
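[41] Xu Baowen, Zhang Weifeng. Search Engines and Information Acquisition Technology (in Chinese). Tsinghua University Press, 2003.
[42] Huang Lei. Usage Strategies and Development Trends of Network Information Retrieval Tools (in Chinese). http://www.fjinfo.gov.cn/publicat/qbts/001/11.htm. 2004.
[43] Wang Jicheng, Xiao Rong, Sun Zhengxing, Zhang Fuyan. Advances in Web Information Retrieval Research (in Chinese). Journal of Computer Research and Development, 38(2).
[44] Chen Guanghua, Zhuang Yazhen. Chinese Term Expansion in Information Retrieval (in Chinese). http://www.lis.ntu.edu.tw/~khchen/writtings/pdf/ji
[45] Ying Xiaomin, Dou Wenhua. Key Technologies of Personalized Internet Services (in Chinese). http://www2.ccw.com.cn/03/0322/b/0322651_6.asp.
[46] Zeng Chun, Xing Chunxiao, Zhou Lizhu. A Survey of Personalization Technology (in Chinese). Journal of Software, 13(10): 1952-1961, October 2002.
[47] Zeng Chun, Xing Chunxiao, Zhou Lizhu. A Personalized Search Algorithm Based on Content Filtering (in Chinese). http://www.jos.org.cn/1000-9825/14/999.pdf. 2003.
[48] Li Guangjian. Research and Implementation of a Personalized Network Information Retrieval System (in Chinese). PhD dissertation, Documentation and Information Center, Chinese Academy of Sciences, June 2002.
[49] Wang Xiaoqing. Research on Automatic Text Classification Based on RBF Networks (in Chinese) [Master's thesis]. Nanchang: School of Computer and Information Engineering, Jiangxi Normal University, 2003.
[50] Zhong Maosheng. Research and Implementation of a Personalized Web Browser Based on Intelligent Agents (in Chinese) [Master's thesis]. Nanchang: School of Computer and Information Engineering, Jiangxi Normal University, 2003.
[51] Zhejiang University Library. http://libweb.zju.edu.cn/02/lesson/teach/search/ch5/ch51.htm#51
[52] Northwestern Polytechnical University Library. http://www.lib.nwpu.edu.cn/kjpdf/1-1.pdf.
[53] Northwestern Polytechnical University Library. http://www.lib.nwpu.edu.cn/kjpdf/6-1.pdf.
[54] Wu Fuying. Research and Implementation of User-Oriented Information Filtering (in Chinese) [Master's thesis]. Nanchang: School of Computer and Information Engineering, Jiangxi Normal University, 2004.
[55] Wang Hongzhi, Wang Xuejun. Personalized Information Services on the Internet (in Chinese). http://www.cnki.net/oldcnki/ac2002/doc/lunwen_06.doc.
[56] http://ccl.pku.edu.cn/doubtfire/Course/Chinese%20Information%20Processing/2002_2003_1.htm.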