Research on an Agent-Based Meta-Search Engine
Abstract
With the rapid development of the Internet, the volume of Web information is growing exponentially, and the ways of obtaining it are increasingly varied. How to retrieve information accurately and efficiently has therefore become a concern for both researchers and ordinary users. Existing independent search engines each have their own strengths and can achieve high precision in particular domains, but ordinary users find it difficult to choose the right engine for a given query. A meta-search engine addresses this problem by calling multiple member search engines and thereby widening the scope of retrieval. However, meta-search engines have problems of their own: they return too many results and take too little account of each user's search preferences, so they struggle to meet the needs of different users. Moreover, because the Web environment changes dynamically, existing meta-search engines, like traditional search engines, cannot adapt to those changes and may return useless information.
     To address these problems, this paper introduces agent technology into the meta-search engine and uses the adaptability of agents to cope with the dynamic Web environment. It proposes an agent-based meta-search engine model, IMSA, which uses user feedback to adjust the weights of the corresponding member search engines and so guide subsequent queries. Because users often cannot express their query needs precisely, it also proposes a query expansion algorithm based on a weighted AND/OR tree, which uses the user's relevance feedback to revise the query so that it better matches the user's actual information need and returns useful information. Finally, experiments verify and analyze the effectiveness of the scheduling strategy and the query expansion algorithm.
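The feedback-driven weighting of member search engines described in the abstract can be illustrated with a minimal sketch. The abstract does not give IMSA's actual update rule, so everything below is an assumption made for illustration: the class name `EngineScheduler`, the exponential-smoothing update, the `learning_rate` value, and the engine names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EngineScheduler:
    """Hypothetical scheduler holding one weight per member search engine."""

    weights: dict[str, float]
    learning_rate: float = 0.2  # assumed smoothing factor, not taken from the paper

    def update(self, engine: str, relevant: int, returned: int) -> None:
        """Move an engine's weight toward the share of its results marked relevant."""
        if returned <= 0:
            return  # no results, so there is no feedback to learn from
        precision = relevant / returned
        old = self.weights[engine]
        self.weights[engine] = (1 - self.learning_rate) * old + self.learning_rate * precision

    def select(self, k: int) -> list[str]:
        """Return the k engines with the highest current weights."""
        return sorted(self.weights, key=self.weights.get, reverse=True)[:k]


# Usage: the user marks 8 of 10 results from one engine as relevant
# and only 1 of 10 from another; a third engine receives no feedback.
scheduler = EngineScheduler({"engine_a": 0.5, "engine_b": 0.5, "engine_c": 0.5})
scheduler.update("engine_a", relevant=8, returned=10)
scheduler.update("engine_b", relevant=1, returned=10)
print(scheduler.select(2))
```

Smoothing rather than overwriting the weight is one plausible design choice here: a single unlucky query then shifts an engine's ranking only slightly, while consistent feedback accumulates over later queries.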
