Research on a P2P-Based Distributed Search Engine
Abstract
The rapid growth of the Internet has led to an explosion of online information, so retrieving valuable information quickly and accurately matters more and more. Search engines have made information retrieval on the Internet far more convenient, and their speed and accuracy have made them one of the most important and popular Internet applications.
     However, current search engines have two shortcomings. First, their search depth is limited: they collect resources with web crawlers and therefore cannot retrieve shared resources stored on users' personal computers. Second, they rank results by keyword and hyperlink analysis and do not take user feedback into account.
     This paper introduces P2P technology into search engines and proposes a P2P-based distributed search engine model together with a new ranking algorithm. The paper first designs the model, which has no central server. Each computer acts as a peer, and each peer publishes the index of its local resources to the P2P network so that other peers can search it. Shared resources on users' personal computers thus become retrievable, which improves search depth. Building on this model, the paper then proposes a new ranking algorithm. The algorithm takes relevance as the basic ranking factor and uses a popularity factor and a friendliness factor to refine the ranking. Relevance measures how closely a document matches the query. The popularity factor reflects how popular a resource is in the network. The friendliness factor reflects the user's interests. By using user feedback to refine the ranking, the algorithm can return more accurate results to specific users.
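The abstract names the ingredients of the model and the ranking algorithm but gives no formulas, so the following Python sketch only illustrates how the pieces could fit together. Everything concrete in it is an assumption and not the paper's method: the in-memory `ToyNetwork` standing in for the P2P overlay, popularity computed as a document's share of recorded feedback, friendliness supplied as a per-peer value in [0, 1], and the multiplicative `score` formula with weights `alpha` and `beta`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    peer_id: str      # peer that shares the resource
    doc_id: str       # identifier of the resource on that peer
    relevance: float  # query-document relevance, assumed to lie in [0, 1]


class ToyNetwork:
    """Hypothetical in-memory stand-in for the P2P network (no real overlay)."""

    def __init__(self):
        self.index = defaultdict(list)  # term -> postings published by peers
        self.feedback = {}              # doc_id -> times users selected it

    def publish(self, term, posting):
        # A peer publishes one entry of its local index so others can search it.
        self.index[term].append(posting)

    def record_feedback(self, doc_id):
        # User feedback: count each time a result is selected.
        self.feedback[doc_id] = self.feedback.get(doc_id, 0) + 1

    def popularity(self, doc_id):
        # Assumed definition: this document's share of all recorded feedback.
        total = sum(self.feedback.values())
        return self.feedback.get(doc_id, 0) / total if total else 0.0


def score(relevance, popularity, friendliness, alpha=0.5, beta=0.5):
    # Assumed combination: relevance is the base factor, and the two
    # feedback-derived factors can only boost it, never replace it.
    return relevance * (1.0 + alpha * popularity + beta * friendliness)


def search(network, term, friendliness_of_peer):
    # friendliness_of_peer: hypothetical per-user map, peer_id -> value in [0, 1].
    postings = network.index.get(term, [])
    return sorted(
        postings,
        key=lambda p: score(
            p.relevance,
            network.popularity(p.doc_id),
            friendliness_of_peer.get(p.peer_id, 0.0),
        ),
        reverse=True,
    )
```

In this sketch, a query with no feedback and an empty friendliness map is ordered by relevance alone. Once a user's friendliness toward a peer is high, or a document has collected most of the feedback, a slightly less relevant result from that peer can move ahead, which is the per-user refinement the abstract describes.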