用户名: 密码: 验证码:
结构化对等计算机系统中的查询处理
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
对等计算(Peer-to-Peer,简称P2P)是一个分布式资源管理、定位系统一它的目标是聚集大规模具有存储能力和处理能力的资源并且最小化资源访问和查询代价。这种系统的出现给网络应用带来了新的活力,因为所有节点可以分布地、平等地管理网络中的资源。但这种分布式数据结构也向网络应用提出了很多新的挑战,例如如何保证数据可达性、系统的动态可维护性、自主性、如何使用没有认证的、匿名的组件实现各种高级应用等。
     尽管对等计算系统在查询处理方面已经取得不少研究成果,但仍然存在许多问题有待解决。本文的研究工作的重点是把集中式环境下成熟的索引树结构引入对等计算系统,解决结构化对等计算环境下查询处理问题,包括:单维范围查询,多属性范围查询,多维范围查询以及高维文本查询的问题。本文的主要贡献如下:
     1.针对单维数据范围查询(range query),在BATON索引结构的启发之下,提出基于多叉树(m-way tree)结构的高效P2P索引结构BATON*。BATON*可以实现更高效的范围查询,更良好的均衡负载特性以及更强壮的系统容错能力。通过重新定义节点的路由表,并设计新的查询算法,系统保证查询效率可以达到O(log_mN)(Ⅳ系统规模,m是树分支出度),并且不存在根节点瓶颈问题。考虑到查询代价的降低是以路由表更新代价变大为代价的,文中给出了简单的基于出度的代价模型。该代价模型可以根据查询操作和更新操作的比率估算使系统性能最优的近似出度值。在分布式系统测试平台Planetlab下的实验结果验证了这种新的结构不仅提高了查询效率,还加强了系统的容错能力,同时基于出度的代价模型也得到了验证。
     2.针对多属性范围查询(multi-attribute range query),探索了基于属性分组降维索引的技术。多属性查询的特点是查询属性(维度)、或查询属性数目不固定。用多维数据索引技术处理多属性查询时,查询转换后,冗余查询太多。用单维数据索引结构处理多属性查询时,系统的存储代价、维护代价很高。但这两种索引结构各有各的优点,前者存储代价小,后者查询效率高。因此本系统提出把发布的多维数据按照属性分组,然后采用流行的降维技术,把按组划分的数据降维到一维数据空间,最后采用单维索引结构BATON*索引数据。这种方法的优点是综合考虑了数据存储、维护代价和查询代价,并且BATON*结构保证了查询效率。在各个属性维度上的查询概率已知的情况下,这种基于分组策略的多属性查询的优势将更明显。模拟实验结果表明,把分组索引策略和高效的索引结构结合,可以达到用低存储代价、低维护代价实现高查询效率的目标。
     3.提出对等计算环境下支持多维数据范围查询的索引架构(framework)—虚拟平衡二叉树。它将集中式环境下成熟的、基于空间划分的树索引结构成功地引入对等计算环境。系统设计了网络节点(peer)和树节点映射的方式、数据储存方式以及邻居节点选择策略,在此基础之上,给出范围查询算法,它可以有效地实现P2P环境下的多维数据范围查询和κ最近邻(KNN)查询。算法保证查询效率是O(logN)(N是系统规模),并且不会出现系统根节点瓶颈问题。该架构为不同的层次树索引结构定义了统一的数据操作、网络维护操作的接口。任何基于空间划分的层次树索引结构都可以方便地映射到这个架构上。不同的索引结构可以使用相同的接口实现,唯一不同的是数据空间划分的方式,和被划分空间的选择方式。系统中只有最底层的叶子节点存放数据,中间节点是虚拟节点,它只维护索引空间。系统中每个中间节点具有纵向链接(父子链接,近邻链接)和横向链接(邻居链接),通过这些链接可实现有效的错误恢复。AVL-Tree的树分支旋转(rotation)的方法用来解决在均衡负载之后树结构的重构问题:即恢复树结构的平衡性。在分布式系统测试平台Planetlab下的实验结果验证了这种结构的合理性和它处理范围查询的有效性。
     4.在VBI-Tree基础之上,本文提出了对原有架构的改良方案SDI,主要目的是降低原有系统的更新维护代价。新的系统去除了原来系统中每个节点记录祖先节点的索引区域信息的数据结构,取而代之的是一个简单的、特定的祖先链接,祖先链接均匀地分布在各个节点上。根据新的索引结构,系统提出了完全不同的查询算法,该算法仍然保证同样的查询效率,但大大降低了原来系统的查询代价,即实现了用更少的路由消息数找到同样多的结果。模拟实验结果表明,当数据维度、系统规模增大,数据分布不均衡性增强时,新的索引结构表现出更好的性能,因此该改良系统的可扩展性更好。
     5.针对对等计算环境下基于内容的查询处理,提出了一种基于层次化树结构和Chord的索引结构。这种结构最大的优点是没有全局信息维护节点。每个文件或查询可以表示成由关键词构成的向量。系统采用Chord结构索引词,用层次树结构对文件的词向量逐层分类(classification)和聚合(aggregation)。树的每个分支孩子节点个数在m和2m之间(m是树出度),每层中连续出现的g(m≤g≤2·m)个节点构成组,选出一个叶子节点作为该组的父节点。每组父节点采用TF.IDF(词频与逆文档频率)技术计算聚合向量里每个词的权值,该值越大,词的重要性越强。然后,父节点直接发布组内一定比率权重较大的词,其它的词聚合之后继续往树的高层传送做进一步处理。因为树在各个层次都选择一定比例的重要词汇发布出去,故这种层次化结构不存在全局信息维护节点。并且词被索引出去时,所在的层次越低,词的重要性越强。系统定义的查询算法保证查询可以从树的任意位置开始,另外查询包含的词通常是重要词,那么查询操作通常在树的底层进行,故树的上层节点的负载不会明显偏重。系统提出了改进性能的方法,利用这些方法可以进一步提高查询质量,降低查询和维护的代价。模拟实验结果表明,该方法具有良好的召回率(recall)和精确率(precision),并且不存在系统瓶颈问题。
     综上所述,本文详细介绍了四种基于树结构的索引结构,并且在本文提出的单维索引结构的基础上,设计了一种支持多属性范围查询的数据分配策略。这些工作旨在满足高级应用中的复杂查询处理需求。对每种索引结构,系统给出相应的查询算法的说明和实现,以及均衡负载方法,并且用详尽的实验来验证正文的分析和结论。这些方法对现有对等计算环境下的查询处理技术是有益的补充和改进。
Peer-to-Peer Computing(P2P) is a distributed system that is used for distributed resource management.Its goal is to aggregate enormous storage and process resources while minimizing resource visiting and searching cost.However,it has brought new challenges in designing such distributed data structures that can promise desired functionalities,such as data availability,dynamic system maintenance, independence,and support for complex query processing using unauthentic and unreliable components.
     Despite of current achievements in P2P-based query processing,there are still a lot of problems requiring to be studied.My research work is devoted to introducing the mature tree index structures of centralized systems into P2P environment,so as to solve the problems of query processing in structured P2P systems,including single dimensional and multi-dimensional range query,multi-attribute range query and high-dimensional information retrieval in P2P systems.The major contributions of this thesis include:
     1.A high performance index structure BATON~* is proposed for P2P systems, which is inspired by BATON.It is based on the m-way tree,not the binary tree,to support single dimensional range query processing.BATON~* can help to reach better query processing efficiency,better load balance performance and stronger fault recovery ability.It redefines the routing tables for each tree node and designs new search algorithm,which help to reduce the query processing cost to O(log_m N),where N is the network size and m is tree fanout. Additionally,the new search algorithm promises to avoid the root bottleneck problem.However there is a trade-off between the search cost and the updating cost,that is,the lower the search cost,the higher the updating cost. Considering this problem,a simple fanout-based cost model is introduced.It can help to estimate the approximate optimal value for m using the percentage between search operations and updating operations.Experiments are run in Planetlab.And the results verify that the new index structure has im- proved the search efficiency and the fault tolerance ability,and show that the fanout-based cost model is correct.
     2.A group-based multi-attribute query processing strategy is proposed.The character of multi-attribute query is that each query dose not have fixed attributes(dimensions) or fixed number of attributes.Using multi-dimensional index structures to deal with such queries,it will meet with much redundant messages.Using the single dimensional index structures to deal with such queries,the storage and maintenance cost is high.However each of them has its own advantages:the former one has lower storage and maintenance cost and the latter one has better search efficiency.The current design is going to group the attributes of the data space,use the space filling curve to reduce the dimensionality to single dimensional space and then index the transformed data into state-of-art single dimensional index structures,among which BATON~* is preferred.In such a way,on one hand,it takes full consideration of both the maintenance cost and the query processing cost,and on the other hand, it promises high query processing efficiency by employing BATON~* structure. Especially with knowing query probability to each attribute(dimensionality) in advance,its advantage will be more obvious.Simulation shows that the combination of group-based method with a high performance index structure will achieve good search efficiency with low storage and maintenance cost.
     3.A general framework,a virtual binary balanced tree,which supports multidimensional range query processing,is provided.It introduces the mature multi-dimensional index tree structures of centralized systems,which are based on hierarchical space division,to P2P systems successfully.The mapping method between the peers and the tree nodes,the way to distribute and store the data on the nodes,and the neighbor selection rules for nodes are all designed. Relying on this data structure,it puts forward the algorithm for multi-dimensional range query processing,which can solve range queries and KNN queries efficiently and bound the efficiency by O(log N) without the root bottleneck problem,where N is the network size.And uniform interfaces for data operation and network maintenance are defined,so any kind of spacedivision-based hierarchical tree structure can be easily implemented on the framework.The differences for these implementations are the way to split the space and the method to select the space to split.Only the leaf nodes manage data and the internal nodes are virtual nodes which just manage the indexing space.But each internal node has vertical(parent link and adjacent link) and horizontal(neighbor link) links to other internal nodes.These links are used to for failure recovery.AVL-Tree rotation method is responsible for the network restructure as long as the tree is detected unbalanced after resolving the load balancing problem.Experiments are run in Planetlab.The results verify that the design of the structure is reasonable and the query processing is efficient.
     4.Based on VBI-Tree,a new improved structure SDI is introduced.The main target of the new design is to reduce the maintenance cost.In SDI,it removes the data structure which records the ancestor index information for each internal node on the way to root,but uses one simple and specific ancestor link. The ancestor links are distributed among the nodes evenly.According to the change to tree structure,new search algorithms for both point queries and range queries are proposed,which can still promise the query efficiency but with less messages.That is to say,it can find the same number of results for queries but with less messages.Simulation shows that the new design gets better performance with larger dimensionality,bigger network size or more skewed data distribution.So the new design has better scalability.
     5.A hierarchical index structure,based on tree and Chord,for content-based search in P2P environmental is proposed.The main advantage is that it does not require any global knowledge maintainer.Each file and query can be represented as vectors of a list of terms.In this index structure,Chord is used to index terms and the hierarchical tree structure is used to classify and aggregate the vectors coming from the children.Each node on the tree will have k children(m≤k≤2m,m is the fanout).At each level,it arranges every g(m≤g≤2·m) adjacent nodes in the same group,and selects one leaf node as the parent.In each group,the parent calculates the weights for terms in the aggregated vector by TF.IDF.The larger the weight,the more important the terms to the group.Then some percentage of important terms are indexed directly to Chord by the parent,and the others are aggregated and sent to higher level parent node for further processing.As a result,it has no global knowledge maintainer in the system and the lower level nodes index most of the important terms out.The proposed query processing algorithm promises that query can be started at any node on the tree.Generally,the terms in the queries are important or popular ones,and they can be resolved by lower level nodes,so nodes at higher levels will not meet with obviously higher load.Some performance improvement methods are introduced in the end,which will improve the query processing quality and reduce the cost for both query processing and maintenance.Simulation shows that this index structure has good recall and precision without bottleneck problem.
     In summary,this thesis introduces four kinds of P2P index structures in details, which are all based on tree structures.Relying on the single dimensional index structure proposed by this work,one data distribution method is designed for supporting multi-attribute range query.All the work aims to solve the complex query processing in advanced application.For each index structure,it defines the search algorithms and explains the implementations as well as the load balancing strategy. The analysis or conclusion to each part of the work is verified by extensive experiments. These approaches are a kind of useful complement to and improvement on current query processing methods in P2P environment.
引文
[1]K.Aberer.P-grid:A self-organizing access structure for p2p information systems.In Proceedings of the Sixth International Conference on Cooperative Information Systems(CoopIS'2001),pages 179-194,September 2001.
    [2]F.S.Action.Numerical method that work.Harpercollins College Div,1970.
    [3]D.P.Anderson.Seti@home:An experiment in public-resource computing.Communications of the ACM,45(11):56-61,2002.
    [4]A.Andrzejak and Z.Xu.Scalable,efficient range queries for grid information services.In Proceedings of the 2nd International Conference on Peer-to-Peer Computing(P2P'2002),pages 33-40,September 2002.
    [5]B.Arai,G.Das,D.Gunopulos,and V.Kalogeraki.Approximating aggregation queries in peer-to-peer networks.In Proceedings of the 22nd International Conference on Data Engineering(ICDE'2006),pages 42-44,April 2006.
    [6]J.Aspnes and G.Shah.Skip graphs.In Proceedings of the Fourteenth Annual A CM-SIAM Symposium on Discrete Algorithms(SODA'2003),pages 384-393,January 2003.
    [7]H.Attiya and J.Welch.Distributed computing:fundamentls,simulations and advanced topics.McGraw-Hill Publishing Company,Cambridge,UK,1998.
    [8]N.Beckmann,H.-P.Kriegel,R.Schneider,and B.Seeger.The r~*-tree:An efficient and robust access method for points and rectangles.In Proceedings of the 1990 ACM SIGMOD.International Conference on Data Management (SIGMOD'1990),pages 322-331,June 1990.
    [9]J.L.Bentley.Multidimensional binary search trees used for associative searching.Communications of the ACM,18(9):509-517,September 1975.
    [10]S.Berchtold,D.A.Keim,and H.P.Kriegel.The x-tree:an index structure for high dimensional data.In Proceedings of the 22nd International Conference on Very Large Data Bases(VLDB'1996),pages 28-39,September 1996.
    [11]A.Bharambe,M.Agrawal,and S.Seshan.Mercury:Supporting scalable multiattribute range queries.In Proceedings of the ACM SIGCOMM 2004 Conference on Applications,Technologies,Architectures,and Protocols for Computer Communication(SIGCOMM'2004),pages 353-366,August 2004.
    [12]C.Bohm,G.Klump,and H.Kriegel.Xz-ordering:A space-filling curve for objects with spatial extension.In Advances in Spatial Databases,6th International Syraposium(SSD'1999),pages 75-90,July 1999.
    [13]C.Buckley,A.Ginghal,and G.Salton.New retrieval approaches using smart.In The Fourth Text REtrieval Conference(TREC-4)(TREC'1995),pages 25-48,November 1995.
    [14]H.Burton.Space/time tradeoffs in hash coding with allowable errors.Communications of the A CM,13(7):422-426,1970.
    [15]J.Byers,J.Considine,and M.Mitzenmacher.Simple load balancing for distributed hash tables.In Proceedings for the 2nd International Work,shop on Peer-to-Peer Systems(IPTPS'2003),pages 80-87,February 2003.
    [16]M.Cai,M.Frank,J.Chen,and P.Szekely.Maan:A multi-attribute addressable network for grid information services.In Proceedings of 4th International Workshop on Grid Computing(GRID'2003),pages 184-191,November 2003.
    [17]S.Chaudhuri,G.Das,and U.Srivastava.Effective use of block-level sampling in statistics estimation.In Proceedings of the A CM SIGMOD International Conference on Management of Data(SIGMOD'2004),pages 287-298,June 2004.
    [18]B.Chun,D.Culler,and T.Roscoe.Planetlab:An overlay testbed for broad-coverage services.A CM SIGCOMM Computer Communication Review,33(3):3-12,2003.
    [19]P.Ciaccia,M.Patella,and P.Zezula.M-tree:An efficient access method for similarity search in metric spaces.In Pwceedings of the 23rd VLDB International Conference(VLDB'1997),pages 426-435,September 1997.
    [20]CiteSeer.Citeseer homepage,http://citeseer.nj.nec.com/cs,2003.
    [21]I.Clarke,O.Sandberg,B.Wiley,and T.Hong.Preenet:A distributed anonymous information storage and retrieval system.In Proceedings of ICSI Workshop on Design Issues in Anonymity and Unobservability(ICSI'2001),pages 46-66,July 2001.
    [22]D.Comer.The ubiquitous b-tree.ACM Computing Surveys,11(2):121-137,1979.
    [23]Limewire Host Count.Limewire host count homepage.http://www.limewire.com/english/content/netsize.shtml,2004.
    [24]A.Crainiceanu,P.Linga,J.Gehrke,and J.Shanmugasundaram.Querying peer-to-peer networks using p-trees.In Proceedings of the Seventh International Workshop on the Web and Databases(WebDb'04),pages 25-30,June 2004.
    [25]A.Crainiceanu,P.Linga,A.Machanavajjhala,J.Gehrke,and J.Shanmugasundaram.P-ring:an index structure for peer-to-peer systems.Technical Report,TR2004-1946,New York:Cornell University,2004.
    [26]F.M.Cuenca-Acuna,C.Peery,R.P.Martin,and T.D.Nguyen.Planetp:Using gossiping to build content addressable peer-to-peer information sharing communities.In Proceedings of 12th International Symposium on High-Performance Distributed Computing(HPDC'2003),pages 236-246,June 2003.
    [27]E.Dekel,S.Peng,and S.Iyengar.Optimal parallel algorithms for constructing and maintaining a balanced m-way search tree.Int'l Journal of Parallel Programming,15(5):503-528,1986.
    [28]A.Demers,D.Greene,C.Hauser,W.Irish,J.Larson,S.Shenker,H.Sturgis,D.Swinehart,and D.Terry.Epidemic algorithm for replicated database maintenance.In Proceedings of the Sixth Annual ACM Symposium on Principles of Distributed Computing(PODC'1987),pages 1-12,August 1987.
    [29]FidoNet.Fidonet homepage,http://www.fidonet.org/.
    [30]A.Fisk.Gnutella dynamic query protocol.http://groups.yahoo.com/gwup/the_gdf/files/proposals/working %20proposals/search/dynamic%20Querying/v0.1,2003.
    [31]A.Fisk.Gnutella ultrapeer query routing.http://groups.yahoo.com/group/the_gdf/files/proposals/working %20proposals/search/Ultrapeer%20QRP/v0.1,2003.
    [32]G.W.Furnas,S.Deerwester,S.T.Dumais,T.K.Landauer,R.A.Harshman,L.A.Streeter,and K.E.Lochbaum.Information retrieval using a singular value decomposition model of latent semantic structure.In Proceedings of the 11th Annual International A CM SIGIR Conference on Research and Development in Information Retrieval(SIGIR'1988),pages 465-480,June 1988.
    [33]P.Ganesan,B.Yang,and H.G.Molina.One torus to rule them all:Multidimensional queries in p2p systems.In Proceedings of the 7th International Workshop on the Web and Databases(WebDB'2004),pages 19-24,June 2004.
    [34]H.Garcia-Molina,J.D.Ullman,and J.Widom.Database systems:The complete book.Prentice Hall,2001.
    [35]C.Gkantsidis,M.Mihail,and A.Saberi.Random walks in peer-to-peer networks.In Proceedings of IEEE INFOCOM 2004(INFOCOM'2004),pages 120-130,March 2004.
    [36]Gnutella.Gnutella homepage,http://www.gnutella.wego.com,2003.
    [37]N.S.Good and A.Krekelberg.Usability and privacy:a study of kazaa p2p file-sharing.In Proceedings of the SIGCHI conference on Human factors in computing systems(SIGCHI'2003),pages 137-144,April 2003.
    [38]A.Gupta,D.Agrawal,and A.El Abbadi.Approximate range selection queries in peer-to-peer systems.In Proceedings of First Biennial Conference on Innovative Data Systems Research(CIDR' 2003),pages 141-151,January 2003.
    [39]A.Guttman.R-trees:a dynamic index structure for spatial searching.In Proceedings of the 1984 ACM SIGMOD international conference on Management of Data(SIGMOD'1984),pages 47-57,June 1984.
    [40]P.Haas and C.KSnig.A hi-level bernoulli scheme for database sampling.In Proceedings of the A CM SIGMOD International Conference on Management of Data(SIGMOD'2004),pages 275-286,June 2004.
    [41]A.Y.Halevy,Z.Ives,P.Mork,and I.Tatarinov.Piazza:Data management infrastructure for semantic web applications.In Proceedings of the Twelfth International World Wide Web Conference(WWW'2003),pages 556-567,May 2003.
    [42]A.Y.Halevy,Z.Ives,D.Suciu,and I.Tatarinov.Schema mediation in peer data management systems.In Proceedings of the 19th International Conference on Data Engineering(ICDE'2003),pages 505-516,March 2003.
    [43]M.Harren,J.M.Hellerstein,R.Huebsch,B.T.Loo,S.Shenker,and I.Stoica.Complex queries in dht-based peer-to-peer networks.In Proceedings of the first International Workshop on Peer-to-Peer Systems(IPTPS'2002),pages 242-259,March 2002.
    [44]N.Harvey,M.B.Jones,S.Saroiu,M.Theimer,and A.Wolman.Skipnet:A scalable overlay network with practical locality properties.In Proceedings of the 4th USENIX Symposium on Internet Technologies and Systems (USITS'2003),pages 113-126,March 2003.
    [45]K.Hinrichs and J.Nievergelt.The grid file:A data structure designed to support proximity queries on spatial objects.In Proceedings of the International Workshop on Graphtheoretic Concepts in Computer Science(WG'1983),pages 100-113,1983.
    [46]R.Huebsch,J.M.Hellerstein,N.Lanham,B.T.Loo,S.Shenker,and I.Stoica.Querying the internet with pier.In Proceedings of 29th International Conference on Very Large Data Bases(VLDB'2003),pages 321-332,September 2003.
    [47]ICQ.Icq homepage,http://www.icq.com/,2003.
    [48]Jabber.Jabber homepage,http://www.jabber.org,2003.
    [49]H.V.Jagadish,B.C.Ooi,and Q.H.Vu.Baton:a balanced tree structure for peer-to-peer networks.In Proceedings of the 31st International Conference on Very Large Data Bases(VLDB'2005),pages 661-672,August 2005.
    [50]JXTA.Jxta project homepage,http://www.jxta.org/,2003.
    [51]D.R.Karger and M.Ruhl.Simple efficient load balancing algorithms for peer-to-peer systems.In Pwceedings for the third International Workshop on Peer-to-Peer Systems(IPTPS'2004),pages 131-140,February 2004.
    [52]KaZaA.Kazaa homepage,http://www.kazaa.com,2003.
    [53]I.S.Kim,Y.H.Kang,and Y.I.Eom.An efficient contents discovery mechanism in pure p2p environments.In Proceedings of Grid and Cooperative Computing,Second International Workshop(GCC'2003),pages 420-427,December 2003.
    [54]V.King and J.Saia.Choosing a random peer.In Proceedings of the Twenty-Third Annual A CM Symposium on Principles of Distributed Computing(PODC'2004),pages 125-130,July 2004.
    [55]I.Klampanos and J.Jose.an architecture for information retrieval over semicollaborating peer-to-peer networks.In Proceedings of the 2004 A CM Symposium on Applied Computing(SAC'2004),pages 1078-1083,March 2004.
    [56]D.E.Knuth.The art of computer programming.Addison-Wesley Professional,3,1998.
    [57]J.Lee,H.Lee,S.Kang,S.M.Kim,and J.Song.Ciss:An efficient object clustering framework for dht-based peer-to-peer applications.In Proceedings of Databases,Information Systems,and Peer-to-Peer Computing (DBISP2P'2004),pages 215-229,August 2004.
    [58]J.Li,B.T.Loo,J.Hellerstein,Fi Kaashoek,D.Karger,and R.Morris.On the feasibility of peer-to-peer web indexing and search.In Proceedings of the 2nd International Workshop on Peer-to-Peer Systems(IPTPS'2003),pages 207-215,February 2003.
    [59]C.Y.Liau,W.S.Ng,Y.Shu,K.-L.Tan,and S.Bressan.Efficient range queries and fast lookup services for scalable p2p networks.In Proceedings of 2nd International Workshop On Databases,Information Systems and Peer-to-Peer Computing(DBISP2P'2004),pages 78-92,August 2004.
    [60]M.Lupu and B.Yu.Hiwarpp-hierarchical wavelet-based retrieval on peer-to-peer network.Pwceedings of the 22nd International Conference on Data Engineering(ICDE'2006),page 133,April 2006.
    [61]Nancy A.Lynch.Distributed algorithms.Morgan Kaufmann Publishing,Burlington,MA,USA,1997.
    [62]D.Malkhi,M.Naor,and D.Ratajczak.Viceroy:A scalable and dynamic emulation of the butterfly.In Proceedings of the Twenty-First Annual ACM Symposium on Principles of Distributed Computing(PODC'2002),pages 183-192,July 2002.
    [63]MSN Messenger.Msn messenger homepage,http://messenger.msn.com/,2003.
    [64]Microsoft.Microsoft.net homepage,http://www.microsoft.com/net/,2003.
    [65]Morpheus.Morpheus homepage,http://www.morpheus.com,2003.
    [66]Napster.Napster homepage,http://www.napster.com,2003.
    [67]W.Nejdl,B.Wolf,C.Qu,S.Decker,M.Sintek,A.Nacre,M.Nilsson,M.Palmer,and T.Rishch.Edutella:A p2p networking infrastructure based on rdf.In Proceedings of the 11th World Wide Web Conference(WWW'2002),pages 604-615,May 2002.
    [68]W.S.Ng,B.C.Ooi,K.L.Tan,and A.Y.Zhou.Peerdb:A p2p-based system for distributed data sharing.In Proceedings of the 19th International Conference on Data Engineering(ICDE'2003),pages 633-644,March 2003.
    [69]D.A.Nowitz.Uucp implementation description.UNIX Programmer's Manual,AT & T Bell Laboratories,1978.
    [70]OceanStor.Oceanstore homepage,http://oceanstore.cs.berkeley.edu/,2003.
    [71]A.Oram and editors et.al.Peer-to-peer:Harnessing the power of disruptive technologies.O'Reilly & Associates,2001.
    [72]J.Orenstein and T.Merrett.A class of data structures for associative searching.In Proceedings of the Third ACM SIGACT-SIGMOD Symposium on Principles of Database Systems(PODS'1984),pages 181-190,April 1984.
    [73]M.T.Ozsu and P.Valduriez.Principles of distributed database systems.Prentice Hall,second edition,1999.
    [74]C.G.Plaxton,R.Rajaraman,and A.W.Richa.Accessing nearby copies of replicated objects in a distributed environment.In Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures(SPAA '1997),pages 311-320,June 1997.
    [75]W.Pugh.A skip list cookbook.Technical Report CS-IR-2286.1,University of Maryland,Colledge Park,1989.
    [76]S.Ratnasamy,P.Francis,M.Handley,R.Karp,and S.Shenker.A scalable content-addressable network.In Proceedings of the A CM SIGCOMM 2001Conference on Applications,Technologies,Architectures,and Protocols for Computer Communication(SIGCOMM'2001),pages 161-172,August 2001.
    [77]P.Reynolds and A.Vahdat.Efficient peer-to-peer keyword searching.In Proceedings of A CM/IFIP/USENIX International Middleware Conference(Middleware'2003),pages 21-40,June 2003.
    [78]S.E.robertson,S.Walker,and M.Beaulieu.Okapi at trec7:automatic ad hoc,filtering,vlc and filtering tracks.In Pwc.of TREC 7,pages 253-264,1999.
    [79]A.Rowstron and P.Druschel.Pastry:Scalable,distributed object location and routing for large-scale peer-to-peer systems.In Proceedings of the 18th IFIP/A CM International Conference on Distributed Systems Platforms(Middleware'2001),pages 329-350,Novemeber 2001.
    [80]M.Ruhl.Efficient algorithms for new computational models.Doctoral Dissertation,Massachusetts Institute of Technology,2003.
    [81]O.D.Sahin,A.Gupta,D.Agrawal,and A.El Abbadi.A peer-to-peer framework for caching range queries.In Proceedings of the 20th International Conference on Data Engineering(ICDE'2004),pages 165-176,March 2004.
    [82]G.Salton,A.Wong,and C.S.Yang.a vector space model for automatic indexing.Communications of the A CM,18(11):613-620,1975.
    [83]Scott Shenker.The data-centric revolution in networking.In Proceedings of 29th International Conference on Very Large Data Bases(VLDB'2003),page 15,September 2003.
    [84]S.M.Shi,Y.G.Wen,D.Wang,J.Yu,S.Qu,and M.Chen.making peer-to-peer keyword searching feasible using multi-level partitioning.In Proceedings of the third International Workshop on Peer-to-Peer Systems(IPTPS'2004),pages 151-161,February 2004.
    [85]Y.Shu,K-L.Tan,and Aoying Zhou.Adapting the content native space for load balanced indexing.In Proceedings of Databases,Information Systems,and Peer-to-Peer Computing(DBISP2P'2004),pages 122-135,August 2004.
    [86]A.Singla and C.rohrs.Ultrapeers:anothers step towards gnutella scalability.http://groups.yahoo,com/group/the_gdf/files/Proposals/working %20Proposals/Ultrapecr/version 1.0,26 November,2002.
    [87]E.Sit,F.Dabek,and J.Robertson.Usenetdht:A low overhead usenet server.In Proceedings of the 3rd International Workshop on Peer-to-Peer Systems (IPTPS'2004),pages 26-27,February 2004.
    [88]I.Stoica,R.Morris,D.Karger,M.F.Kaashoek,and H.Balakrishnan.Chord:a scalable peer-to-peer lookup service for internet applications.In Proceedings of the ACM SIGCOMM 2001 Conference on Applications,Technologies,Architectures,and Protocols for Computer Communication(SIGCOMM'2001),pages 149-160,August 2001.
    [89]A.S.Tanenbaum.Computer networks.Prentice Hall International,1981.
    [90]C.Tang and S.Dwarkadas.Hybrid global-local indexing for efficient peer-to-peer information retrieval.In Pwceedings of 1st Symposium on Networked Systems Design and Implementation(NSDI'2004),pages 211-224,March 2004.
    [91]C.Tang,Z.Xu,and S.Dwarkadas.Peer-to-peer information retrieval using self-organizing semantic overlay networks.In Proceedings of the A CM SIG-COMM 2003 Conference on Applications,Technologies,Architectures,and Protocols for Computer Communication(SIGCOMM'2003),pages 175-186,August 2003.
    [92]C.Tang,Z.Xu,and M.Mahalingam.psearch:Information retrieval in structured overlays.A CM SIGCOMM Computer Communication Review,33(1):89-94,2003.
    [93]I.Tatarinov and A.Halevy.Efficient query reformulation in peer data management systems.In Proceedings of the A CM SIGMOD International Conference on Management of Data(SIGMOD'2004),pages 539-550,June 2004.
    [94]I.Tatarinov,P.Mork,Z.Ives,J.Madhavan,A.Y.Halevy,D.Suciu,N.Dalvi,X.Dong,Y.Kadiyska,and G.Miklau.the piazza peer data management project.ACM SIGMOD Record,32(3):47-52,2003.
    [95]D.Tsoumakos and N.Roussopoulos.A comparison of peer-to-peer search methods.In Proceedings of International Workshop on Web and Databases (WebDB'2003),pages 61-66,June 2003.
    [96]R.van Renesse.The importance of aggregation.In Pwceedings of Future Directions in Distributed Computing,pages 87-92,2003.
    [97]Dan S.Wallach.A survey of peer-to-peer security issues.In Software Security-Theories and Systems,Mext-NSF-JSPS International Symposium(ISSS'2002),pages 42-57,November 2002.
    [98]R.Weber,H.-J.Schek,and S.Blott.A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces.In Proceedings of 24rd International Conference on Very Large Data Bases(VLDB'1998),pages 194-205,August 1998.
    [99]D.A.White and R.Jain.Similarity indexing with the ss-tree.In Proceedings of the Twelfth International Conference on Data Engineering(ICDE'1996),pages 516-523,February 1996.
    [100]S.K.M.Wong,W.Ziarko,V.V.Raghavan,and P.C.N.Wong.On modeling of information retrieval concepts in vector spaces.ACM Transactions on Database Systems(TODS),12(3),1987.
    [101]B.Yang and H.Garcia-Molina.Ad hoc,self-supervising peer-to-peer search networks.Technical report,Stanford University,2003.
    [102]B.Yang and H.Garcia-Molina.Improving search in peer-to-peer networks.In Proceedings of the 22nd International Conference on Distributed Computing Systems(ICDCS'2002),pages 5-14,July 2002.
    [103]R.B.Yates and B.R.Neto.Modern information retrieval.Addison Wesley/Pearson,1999.
    [104]Z.Zhang,S.Zhou,W.N.Qian,and A.Y.Zhou.Keynote:Keyword search using nodes selection for text retrieval on dht-based p2p networks.In Proceedings of the 11th International Conference on Database Systems for Advanced Applications(DASFAA'2006),pages 797-806,March 2006.
    [105]B.Y.Zhao,J.Kubiatowicz,and A.Joseph.Tapestry:A fault-tolerant widearea application infrastructure.Technical Report UCB//CSD-01-1141,Berkeley Computer Science Division,University of California,2001.
    [106]A.Y.Zhou,L.H.Xu,and W.N.Qian.Adaptive probabilistic search over unstructured peer-to-peer computing systems.World Wide Web archive,9(4):537-556,2006.
    [107]S.G.Zhou,Z.Zhang,W.N.Qian,and A.Y.Zhou.Sipper:Selecting informative peers in structured p2p environment for.content-based retrieval.In Proceedings of the 22rid International Conference on Data Engineering(ICDE'2006),page 161,April 2006.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700