Research and Implementation of a P2P Search Engine Based on JXTA
Abstract
Search engines make it much easier for users to find information. Traditional search engines, however, use a centralized architecture and therefore suffer from problems such as server failures, limited storage capacity, and stored links that are not updated promptly, all of which seriously degrade search performance.
     P2P technology is distributed, dynamic, and scalable. Applying it to search engines opens a new direction for their development.
     This thesis explores one way of bringing the ideas and technical advantages of P2P into a search engine system. The main research topics and the problems addressed are:
     (1) Existing P2P applications are each built from the ground up with no common standard, so they are not compatible with one another. The system therefore adopts JXTA, the general-purpose development platform from Sun, as the development standard for the P2P network, and builds a basic P2P communication network on top of the JXTA protocols.
     (2) Resource discovery is a difficult problem in P2P networks. The implementation uses IP multicast to search within a firewall and HTTP to search across firewalls. It also defines a "search" peer group that provides a membership service and confines communication traffic to the peer group, so that traffic does not spread through the network unnecessarily.
     (3) A secondary ranking module merges and sorts the results returned by multiple peers before displaying them to the querying user. Taking into account the dynamic nature of P2P systems and the characteristics of user needs, the thesis proposes a secondary ranking score, built on the Lucene scoring mechanism, that is adapted to search in a P2P network.
     (4) The thesis defines the services that sit above the P2P network: peer group management, pipe communication, message management, content download, and local resource management. Together with an easy-to-use application interface, these form a complete JXTA-based P2P search engine system.
     Finally, the thesis presents the implementation of the system and tests and analyzes it in a LAN environment. The results indicate that the system can effectively mine information held on computers at the edge of the network and make full use of their computing and storage capacity, which gives it good practical value and prospects for wider adoption.
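
The secondary ranking step in item (3) can be illustrated with a small sketch. The abstract does not give the thesis's actual scoring formula, so everything here is an assumption made for illustration: the class and field names are hypothetical, each peer's scores are normalized by that peer's best score (scores from independent indexes are not directly comparable), and a hypothetical per-peer weight stands in for whatever P2P-specific factors the thesis uses. No Lucene API is called; a peer's Lucene score is treated as a plain number.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SecondaryRanker {

    /** One hit returned by a remote peer (hypothetical shape). */
    static final class Hit {
        final String peerId;
        final String title;
        final double localScore; // score assigned by the peer's own index
        double finalScore;       // filled in by rerank()

        Hit(String peerId, String title, double localScore) {
            this.peerId = peerId;
            this.title = title;
            this.localScore = localScore;
        }
    }

    /**
     * Merges per-peer result lists into one list sorted by a combined score.
     * Assumed formula: finalScore = (localScore / best score of that peer) * peerWeight.
     * Peers without an entry in peerWeight get weight 1.0.
     */
    static List<Hit> rerank(Map<String, List<Hit>> hitsByPeer,
                            Map<String, Double> peerWeight) {
        List<Hit> merged = new ArrayList<>();
        for (Map.Entry<String, List<Hit>> entry : hitsByPeer.entrySet()) {
            List<Hit> hits = entry.getValue();
            double best = 0.0;
            for (Hit h : hits) {
                best = Math.max(best, h.localScore);
            }
            double weight = peerWeight.getOrDefault(entry.getKey(), 1.0);
            for (Hit h : hits) {
                double normalized = best > 0.0 ? h.localScore / best : 0.0;
                h.finalScore = normalized * weight;
                merged.add(h);
            }
        }
        // Highest combined score first; ties broken by title for a stable order.
        merged.sort((a, b) -> {
            int byScore = Double.compare(b.finalScore, a.finalScore);
            return byScore != 0 ? byScore : a.title.compareTo(b.title);
        });
        return merged;
    }

    public static void main(String[] args) {
        Map<String, List<Hit>> byPeer = new LinkedHashMap<>();
        byPeer.put("peerA", Arrays.asList(
                new Hit("peerA", "a1", 4.0), new Hit("peerA", "a2", 2.0)));
        byPeer.put("peerB", Arrays.asList(new Hit("peerB", "b1", 0.5)));

        Map<String, Double> weights = new HashMap<>();
        weights.put("peerB", 0.8); // hypothetical: peerB is trusted slightly less

        for (Hit h : rerank(byPeer, weights)) {
            System.out.println(h.title + " from " + h.peerId + " score " + h.finalScore);
        }
    }
}
```

Normalizing per peer keeps a peer with a small index and inflated raw scores from dominating the merged list. The weight is the place where network-specific factors could enter in a real system.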
