对等计算系统中的相似查询处理研究

英文题名：Similarity Search in Peer-to-Peer Systems
作者：徐林昊
论文级别：博士
学科专业名称：计算机软件与理论
中文关键词：对等计算 ; 相似查询 ; 索引 ; 缓存 ; 概率
英文关键词：peer-to-peer computing ; similarity search ; index ; cache ; probability
学位年度：2005
导师：周傲英
学科代码：081202
学位授予单位：复旦大学
论文提交日期：2005-04-19

摘要

对等计算(peer-to-peer computing，简称P2P)已经成为了计算机科学领域的研究热点。在对等计算系统中，每个节点都是完全自治的，拥有相同的责任，扮演着双重角色—既可以是客户机(服务消费者)，也可以足服务器(服务提供者)，而且任意一个节点都可以随意地加入或退出系统。因此，对等计算系统是一个完全动态的、没有任何集中控制的分布式系统。对等计算模型具有许多潜在的优势，如扩展性强、鲁棒性好、资源可用性高等特点，特别适用于具有地理分布、资源异构、扩展性要求高、局部自治等特征的分布式系统。因而，对等计算模型推动了“以主机为中心(host-centric)”的传统互联网向“以数据为中心(data-centric)”的未来互联网的发展，被学术界和工业界公认为是重构基于互联网应用的关键技术之一。
     虽然，学术界已经取得了不少对等计算环境下的查询处理研究成果，但仍然存在着许多有待研究与解决问题。本文研究了对等计算环境下的相似查询问题，探索了对等计算环境下的基于路由索引、数据空间划分、协作缓存和概率模型的相似查询处理技术，旨在为现有的对等计算系统提供基于语义或者相似度的查询处理功能。本文的主要贡献有如下四个方面：
     1．将多维数据空间中的相似查询处理(similarity search)技术引入到无结构(unstructured)对等计算系统中，利用近似向量(vector approximation)技术和路由索引(routing index)技术，为系统中的每个节点建立基于近似向量的路由索引，使得用户查询能够准确地路由到并且有效地查询拥有相关数据资源的节点，实现无结构对等计算系统中的相似查询处理。另外，利用无结构对等计算系统中的网络自配置(self-reconfiguration)特性，通过动态调整节点在网络中的位置，使得与相似查询相关的节点保持位置邻近，进一步提高了系统的查询处理性能。仿真实验表明，该方法对无结构对等计算环境下的相似查询处理非常有效。
     2．将数据空间划分(space partitioning)技术引入到结构化(structured)对等计算系统中，通过选定的代表点(reference point)，将整个数据空间划分成没有任何重叠(overlap)的数据子空间。通过将代表点线性化，在节点、代表点和数据子空间三者之间建立起一一映射关系。利用传统的高维索引技术和基于分布式散列表(distributed hash table，或DHT)的资源查找和定位机制，使得高维数据空间中的相似查询处理在结构化对等计算系统上得以实现。此外，通过维护数据子空间之间的物理邻近(physical proximity)特征，降低了系统的查询路由代价；通过调整数据子空间的粒度，达到均衡系统负载(load balance)的目的。仿真实验表明，该方法能够有效地适应数据维度的增长和系统规模的扩展。
     3．针对关系查询处理，探索了基于协商(negotiation)的协作缓存技术(collaborative caching)，提出了一种基于网络传输代价的查询代价模型，用于评价不同查询计划的执行代价。在对等计算环境下，一个查询计划的执行代价可以被分解为子查询计划的执行代价。结合代价模型，利用协调重叠网络(collaborative overlap network)，通过查询请求节点(requester)和协调节点(coordinator)之间的协商，确定协作缓存的逻辑查询表达式和参与数据缓存的查询请求节点，实现了对等计算环境下的基于语义的查询处理。仿真和真实实验表明，该方法能够确定较优的数据缓存放置策略，降低系统的查询处理开销。尤其是在单个节点仅能贡献有限的存储资源的情况下，该方法的优势更为明显。4．针对基于主题(topic)的对等计算文件共享系统，研究了一种基于概率的相似查询处理技术。该技术的核心思想是利用概率模型(probabilistic model)描述共享主题之间的语义重叠度(overlap)以及节点对主题的信息覆盖度(coverage)，为节点建立起概率路由信息。相似查询处理算法以每个节点已有的概率信息为基础，依据推导出的邻居节点对查询主题的覆盖度，决定主题查询的搜索路径。此外，利用查询反馈的信息，通过更新路由查询的节点上的概率信息，使得这些节点能够为将来的主题查询选择更准确的查询搜索路径。模拟实验表明，该方法能够利用基于自反馈的概率更新算法，逐步改善查询处理的效果，提高查询处理的效率。
     总之，本文详细地介绍了四种相似查询处理方法的算法设计与实现，以及测试结果。这些方法是对现有对等计算环境下的查询处理技术的有益补充和改进。本文的研究工作建立在对当前已有技术的详尽分析与理论研究，以及大量的实验测试的基础上。实验和分析表明，与当前对等计算环境下的查询处理技术相比，上述方法在查询效率和资源利用率等方面具有优势。
Peer-to-peer computing (P2P) has become an extremely popular topic in computer science. In a P2P system, each peer is fully autonomic, has equal responsibility and plays the role of both a service costumer and a service provider. Moreover, peers can enter or leave the P2P network at any time. Thus, a P2P system is a type of fully dynamic distributed system without any central administration. The P2P computing paradigm has many advantages, such as, scalability, robustness, resource availability etc, and is especially suitable for distributed applications with properties of geographical distribution, heterogeneous resource, scalability and local autonomy. Hence, the P2P computing paradigm is driving the evolution from the host-centric Internet to the data-centric future Internet, and is thought of as a promising technology to reconstruct the future Internet-based applications.
    Despite of current achievements on P2P-based query processing, there are still a lot of problems need to be studied. To this end, this thesis is devoted to the issue of similarity search in P2P systems. It studies routing index, space partitioning, collaborative caching and probabilistic model-based similarity search techniques, in order to support semantic or similarity-based query processing methods above existing P2P systems. Major contributions of this thesis include:
    1. Similarity search in multi-dimensional data space is introduced to unstructured P2P systems. A simple yet effective routing index structure, called EVA-Index, is designed by combining both vector approximation and routing indexing techniques together. With the aid of EVA-Index, each peer can process similarity query with its local dataset, and route queries to promising peers with the desired data objects. Furthermore, each peer can reconfigure its neighboring peers to keep the relevant peers close by so as to improve system performance. Simulation shows that the proposed approach is efficient and effective to similarity query processing in unstructured P2P environments.
    2. An efficient space partitioning strategy is introduced to a structured P2P system. The whole data space is first partitioned based on a set of pre-generated reference points. Each reference point has an associated subspace and any pair of subspaces does not overlap with each other. As such, through building one-to-one mapping between nodes and reference points (and hence subspace), similarity search in high-dimensional space can be implemented by using the traditional high-dimensional indexing technique and distributed hash table-based resource location mechanism. Moreover, the routing cost of similarity search can be greatly reduced through capturing physical proximity of sub-spaces, and the load balance at nodes can be obtained by tuning the granularity of subspaces. Simulation shows the efficiency of the proposed method can be kept in term of dimensionality and network size increasing.
    3. The technique for collaborative cache, based on negotiation, is studied for supporting SQL query processing, and a cost model for evaluation of the network transmission cost of query plans is given. The cost of the query plans are estimated by using the cost of sub-query plans. The cost model is combined with collaborative overlap network, through negotiation between requesters and coordinators, to determine the logical expression and physical maintenance nodes of data cache. Thus, based on collaborative caching technique, the P2P system can implement semantic-based query processing. Simulation and real experiments show that the proposed method usually obtains near-optimized cache plans and reduces the cost of query processing, especially in the case that a single node can contribute limited storage space.
    4. A similarity search technique based on probabilistic information is investigated for the P2P file sharing application with hierarchical topics. This approach uses probabilistic model to describe the overlap between topics and the coverage of nodes to topics, and hence builds routing indices at nodes. Based on existing probabilistic information at nodes, the similarity search algorithm can deduce the coverage of neighboring nodes to the queried topic, so that a promising routing path can be determined. Further, using feedback of the previous queries, nodes that were responsible for routing topic queries can update their probabilistic information to guide future ones more efficiently. Simulation shows the proposed approach can gradually improve the search efficiency and effectiveness in a feedback-based way.
    In summary, this thesis gives a detailed description of the algorithm, implemen-
    tation and experimental evaluation of the above four similarity query processing approaches. These approaches are a kind of useful complement to and improvement on current query processing methods in the P2P environment. The work is based on complete survey and theoretical analysis along with extensive experimental evaluation. The experiments and analysis show that, compared with existing query processing methods for P2P systems, the approaches mentioned above have advantages in efficiency of both query processing and resource utilization.

引文

ⅰ 三层模型是指浏览器／Web服务器／应用服务器(Browser／Web Server／Application Server，或B／WS／AS)模型或客户机／应用服务器／数据库服务器(Client／Application Server／DatabaseServer，或c／AS／DBS)模型。
    ⅱ 与专用的P2P文件共享系统不同，这些系统一般不提供文件查找功能。
    ⅲ 在本文中，查询类型的分类主要依据了传统数据管理对查询的定义与分类，并综合了现有P2P环境下查询处理技术中对查询的定义、功能与应用场景的说明。
    ⅳ 关于P2P网络结构的分类原则将在第2章中详细介绍。
    ⅴ 度量空间(metric space)可以由一个二元组(D，d)表示。其中D表示特征值的域，d是这个域上的一种距离函数且必须三个性质：1)对称性(symmetry)．即d(x，y)=d(y，x)；2)非负性(non negativity)，即d(x，y)≥0；3)三角不等式(triangle inequality)，即d(x，y)≤d(x，z)+d(z，y)。
    ⅰ 本文给出的分类方法并不保证子类之间没有重叠。实际上，由于每个P2P系统都有可能具备多种特征，所以存在着同一个P2P系统属于多个子类的情况。
    ⅱ 基于Skip List的结构化P2P系统也是一种层次型P2P系统。
    ⅲ 原文为: "...In fact, we wish to stress the point that it may be strategically unwise to discuss peer-to-peer databases at all, with their attendant complexities in software and administration associated with database storage. Instead, we focus on peer-to-peer query processing, and separate it from the problem of storage semantics and administration."[44]以及"However, we do argue that massively distributed database research can and should proceed without waiting for breakthroughs on the semantic frontier."[48]
    ⅳ 突变查询处理是Niagara[73]项目中的一部分，专门用于分布式环境下的XML查询处理。
    ⅰ 关于度量空间的定义详见第一章中的注释。
    ⅰ 范围查询的形式化定义详见第三章中的定义3．1。
    ⅱ 在实现中，将一个数据对象散列为一个标识，并用这个标识作为查询。
    ⅰ 目前，在BestPeer[70]平台上实现了CoCache系统。
    ⅱ 查询计划中的节点是查询树中的节点，是一个关系查询表达式。
    ⅲ 本文暂不考虑两个关系查询之间的等价问题。
    ⅳ 目前，CoCache系统使用IBM DB2 UDB 7.1作为每个节点的查询执行引擎。
    ⅴ 生成器下载白[40]。
    ⅵ DBLP是Digital Bibliography and Library Project的缩写，其前身是DataBase systems and Logic Programming文献目录。
    ⅶ Article表中的keywords属性中存放了从标题title中提取的关键词。
    [1]Karl Aberer. Special issue on peer-to-peer data management: Guest editor's introduction. SIGMOD Record, 32(3):21-22, September 2003.
    [2]AIM. American online instant message homepage, http://www.aim.com/,2003.
    [3]James Aspnes and Gauri Shah. Skip graphs. In Proceedings of the Fourteenth Annual A CM-SIAM Symposium on Discrete Algorithms (SODA '2003), pages 384-393, January 2003.
    [4]Franz Aurenhammer. Voronoi diagrams-a survey of a fundamental geometric data structure. ACM Computing Survey, 23(3): 345-405, September 1991.
    [5]Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. ACM Press, Inc., 1999.
    [6]N. Beckmann, H. -P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: An efficient and robnst access method for points and rectangles. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (SIGMOD'1990), pages 322-331, May 1990.
    [7]J. L. Bentley. Multidimensional binary search trees used for associative searching. Communication of the ACM, 18(9): 509-517, September 1975.
    [8]Michael K. Bergman. The deep web: Surfacing hidden value. Journal of Electronic Publishing, 7(1), August 2001.
    [9]Philip A. Bernstein, Fausto Giunchiglia, Anastasios Kementsietsidis, John Mylopoulos, Luciano Serafini, and Ilya Zaihrayeu. Data management for peer-to-peer computing: A vision. In Proceedings of the Fifth International Workshop on the Web and Databases (WebDB'2002), pages 89-94, June 2002.
    [10]BestPeer. Bestpeer project homepage, http://xena1.ddns.correp.nus.edu.sg/p2p/, 2003.
    [11]Burton H. Bloom. Space/time tradeoffs in hash coding with allowable errors. Communications of the ACM (CACM'1970), 13(7): 422-426, 1970.
    [12]Paul Bratley and Bennett L. Fox. Algorithm 659. implementing sobol's quasirandom sequence generator. A CM Transactions on Mathematical Software, 14(1): 88-100, March 1988.
    [13]Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, and Eve Maler. Extensible markup language (xml) 1.0 (second edition). W3C Recommendation 6 October 2000. Available at: http://www.w3.org/TR/REC-xml, 2000.
    [14]Chris Bucldey. Implementation of the smart information retrieval system. Technical Report TR-85-686, Cornell University, 1985.
    [15]Yatin Chawathe, Sylvia Ratnasamy, Lee Breslau, Nick Lanham, and Scott Shenker. Making gnutella-like p2p systems scalable. In Proceedings of the A CM SIGCOMM 2003 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM'2003), pages 407-418, August 2003.
    [16]Brent N. Chun, Joseph M. Hellerstein, Ryan Huebsch. Shawn R. Jeffery. Boon Thau Loo, Sam Mardanbeigi, Timothy Roscoe, Seen C. Rhea, Scott Shenker, and Ion Stoica. Querying at internet-scale. In Proceedings of the A CM SIGMOD International Conference on Management of Data (SIGMOD'2004), pages 935-936, June 2004.
    [17]CiteSeer. Citeseer homepage, http://citeseer.nj.nec.com/cs, 2003.
    [18]Brian F. Cooper and Hector Garcia-Molina. Studying search networks with sil. In Proceedings for the 2rid International Workshop on Peer-to-Peer Systems (IPTPS'2003), pages 216-224, February 2003.
    [19]Arturo Crespo and Hector Garcia-Molina. Routing indices for peer-to-peer systems. In Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS'2002), pages 21-34, July 2002.
    [20]Francisco Matias Cuenca-Acuna, Christopher Peery, Richard P. Martin, and Thu D. Nguyen. Planetp: Using gossiping to build content addressable peer-to-peer information sharing communities. In Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing (HPDC'2003), pages 236-246, June 2003.
    [21]D~2OL. D~2ol homepage, http://www.d2ol.com/, 2002.
    [22]Mayur Datar. Butterflies and peer-to-peer networks. In Proceedings of the lOth Annual European Symposium of Algorithms (ESA'2002), pages 310-322, September 2002.
    [23]DBLP. Dblp xml records, http://dblp.uni-trier.de/xml/, 2003.
    [24]DBLP. Dblp homepage, http://dblp.uni-trier.de/, 2004.
    [25]Mark de Berg, Otfried Schwarzkopf, Marc van Kreveld, and Mark Overmars. Computational geometry: algorithms and applications. Springer-Verlag, 1997.
    [26]A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, and D. Terry. Epidemic algorithm for replicated database maintenance. In Proceedings of the 6th Annual ACM Symposium on Principles of Distributed Computing (PODC'1987), pages 1-12, August 1987.
    [27]Daniela Florescu, Daphne Koller, and Alon Levy. Using probabilistic information in data integration. In Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB'1997), pages 216-225, August 1997.
    [28]Folding@home. Folding@home homepage, http://fording.stanford.edu/, 2003.
    [29]Ian Foster and Adriana Iamnitchi. On death, taxes, and the convergence of peer-to-peer and grid computing. In Proceedings of the 2nd International Workshop on Peer-to-Peer Systems (IPTPS'2003), pages 118-128, February 2003.
    [30]Volker Gaede and Oliver Günther. Multidimensional access methods. ACM Computing Surveys, 30(2): 170-231, June 1998.
    [31]P. Ganesan, B. Yang, and H. G. Molina. One torus to rule them all: Multidimensional queries in p2p systems. In Proceedings of the 17th International Workshop on the Web and Databases (WebDB'2004), pages 19-24, June 2004.
    [32]Prasanna Ganesan, Mayank Bawa, and Hector Garcia-Molina. Online balancing of range-partitioned data with applications to peer-to-peer systems. In Proceedings of the 13th International Conference on Very Large Data Bases (VLDB'2004), pages 444-455, August 2004.
    [33]Prasanna Ganesan, Krishna Gummadi, and Hector Garcia-Molina. Canon in g major: designing dhts with hierarchical structure. In Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS'2004), pages 263-272, March 2004.
    [34]Hector Garcia-Molina. Jeffrey D. Ullman, and Jennifer Widom. Database Systems: The Complete Book. Prentice Hall, 2001.
    [35]Bugra Gedik and Ling Liu. Peercq: A decentralized and self-configuring peer-to-peer information monitoring system. In Proceedings of the 23rd International Conference on Distributed Computing Systems (ICDCS'2003), pages 490-499, May 2003.
    [36]Genome@home. Genome@home homepage, http://www.stanford.edu/group/pandegroup/genome/, 2003.
    [37]Gnutella. Gnutella developement homepage, http://gnutella.wego.com/, 2003.
    [38]Google. Google directory, http://www.google.com/dirhp, 2004.
    [39]Steven Gribble, Alon Halevy, Zachary Ives, Maya Rodrig, and Dan Suciu. What can databases do for peer-to-peer? In Proceedings of the 4th International Workshop on the Web and Databases (WebDB'2001), pages 31-36, May 2001.
    [40]GT-ITM. Gt-itm: Georgia tech internetwork topology models, http://www.cc.gatech.edu/fac/Ellen.Zegura/gt-itm/gt-itm.tar.9z, 2003.
    [41]Alon Halevy, Oren Etzioni, Anhai Doan, Zachary Ives, Jayant Madhavan, Luke McDowell, and Igor Tatarinov. Crossing the structure chasm. In Proceedings of the 1st Biennial Conference on Innovative Data Systems Research (CIDR '2003), January 2003.
    [42]Alon Halevy, Zachary Ives, Peter Monk, and Igor Tatarinov. Piazza: Data management infrastructure for semantic web applications. In Proceedings of the 12th International World Wide Web Conference (WWW'2003), pages 556-567, May 2003.
    [43]Alon Halevy, Zachary Ires, Dan Suciu, and Igor Tatarinov. Schema mediation in peer data management systems. In Proceedings of the 19th International Conference on Data Engineering (ICDE'2003), pages 505-516, March 2003.
    [44]Matthew Harren, Joseph M. Hellerstein, Ryan Huebsch, Boon T. Loo, Scott Shenker, and Ion Stoica. Complex queries in dht-based peer-to-peer networks. In Proceedings of the 1st International Workshop on Peer-to-Peer Systems (IPTPS'2002), pages 242-259, March 2002.
    [45]N. J. A. Harvey, M. B. Jones, S. Saroiu, M. Theimer, and A. Wolman. Skipnet: A scalable overlay network with practical locality properties. In Proceedings of the 4th USENIX Symposium on Internet Technologies and Systems (USITS'2003), pages 113-126, March 2003.
    [46]David Heckerman. A tutorial on learning with bayesian networks. Technical Report MSR-TR-95-06, Microsoft Research, Advanced Technology Division, 1996.
    [47]Ryan Huebsch, Brent N. Chun, Joseph M. Hellerstein, Boon Thau Loo, Petros Maniatis, Timothy Roscoe, Scott Shenker, Ion Stoica, and Aydan R. Yumerefendi. The architecture of pier: an internet-scale query processor. In Proceedings of the 2nd Biennial Conference on Innovative Data Systems Research (CIDR'2005), pages 28-43, January 2004.
    [48]Ryan Huebsch, Joseph M. Hellerstein, Nick Lanham, Boon Thau Loo, Scott Shenker, and Ion Stoica. Querying the internet with pier. In Proceedings of the 29th International Conference on Very Large Data Bases (VLDB'2003), pages 321-332, September 2003.
    [49]ICQ. Icq homepage, http://www.icq.com/, 2003.
    [50]Stratos Idreos, Manolis Koubarakis, and Christos Tryfonopoulos. P2p-diet: an extensible p2p service that unifies ad-hoc and continuous querying in superpeer networks. In Proceedings of the A CM SIGMOD International Conference on Management of Data (SIGMOD'2004), pages 933-934, June 2004.
    [51]Jabber. Jabber homepage, http://www.jabber.org/, 2003.
    [52]H. V. Jagadish. Linear clustering of objects with multiple atributes. In Proceedings of the 1990 A CM SIGMOD International Conference on Management of Data (SIGMOD'1990), pages 332-342, May 1990.
    [53]JXTA. Jxta project homepage, http://www.jxta.org/, 2003.
    [54]Timour Katchaounov, Vanja Josifovski, and Tore Risch. Scalable view expansion in a peer mediator system. In Proceedings of the 8th International Conference on Database Systems for Advanced Applications (DASFAA '2003), pages 107-116, March 2003.
    [55]KaZaA. Kazaa homepage, http://www.KaZaA.com/, 2005.
    [56]Anastasios Kementsietsidis, Marcelo Arenas, and Renée J. Miller. Managing data mappings in the hyperion project. In Proceedings of the 19th International Conference on Data Engineering (ICDE'2003), pages 732-734, March 2003.
    [57]Anastasios Kementsietsidis, Marcelo Arenas, and Renée J. Miller. Mapping data in peer-to-peer systems: Semantics and algorithmic issues. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD'2003), pages 325-336, June 2003.
    [58]Manolis Koubarakis, Christos Tryfonopoulos, Stratos Idreos, and Yannis Drougas. Selective information dissemination in p2p networks: Problems and solutions. SIGMOD Record, 32(3): 71-76, 2003.
    [59]Boon Thau Loo, Joseph M. Hellerstein, Ryan Huebsch, Scott Shenker, and Ion Stoica. Enhancing p2p file-sharing with an internet-scale query processor. In Proceedings of the 13th International Conference on Very Large Data Bases (VLDB'2004), pages 432-443, August 2004.
    [60]Qin Lv, Pei Cao, Edith Cohen, Kai Li, and Scott Shenker. Search and replication in unstructured peer-to-peer networks. In Proceedings of the 16th International Conference on Supercomputing (ICS'2002), pages 84-95, June 2002.
    [61]Dahlia Malkhi, Moni Naor, and David Ratajczak. Viceroy: A scalable and dynamic emulation of the butterfly. In Proceedings of the 21st Annual ACM Symposium on Principles of Distributed Computing (PODC'2002), pages 183-192, July 2002.
    [62]MSN Messenger. Msn messenger homepage, http://messenger.msn.com/,2003.
    [63]Microsoft. Microsoft. net homepage, http://www.microsoft.com/net/, 2003.
    [64]Tom M. Mitchell. Machine Learning. McGraw-Hill Companies, Inc., 1997.
    [65]Morpheus. Morpheus homepage, http://www.morpheus.com/, 2003.
    [66]Napster. Napster homepage, http://www.napster.com/, 2003.
    [67]W. Nejdl, B. Wolf, C. Qu, S. Decker, M. Sintek, A. Naeve, M. Nilsson, M. Palmer, and T. Risch. Edutella: A p2p networking infrastructure based on rdf. In Proceedings of the 11th International World Wide Web Conference (WWW'2002), pages 604-615, 2002.
    [68]Wolfgang Nejdl, Wolf Siberski, and Michael Sintek. Design issues and challenges for rdf- and schema-based peer-to-peer systems. SIGMOD Record, 32(3): 41-46, 2003.
    [69]Netscape. Open directory project homepage, http://dmoz.org/about.html/, 2002.
    [70]Wee Siong Ng, Beng Chin Ooi, and Kian-Lee Tan. Bestpeer: A self-configurable peer-to-peer system. In Proceedings of the 18th International Conference on Data Engineering (ICDE'2002), page 272, February 2002.
    [71]Wee Siong Ng, Beng Chin Ooi, Kian-Lee Tan, and Aoying Zhou. Peerdb: A p2p-based system for distributed data sharing. In Proceedings of the 19th International Conference on Data Engineering (ICDE'2003), pages 633-644, March 2003.
    [72]Wee Siong Ng, Wee Hyong Tok, Beng Chin Ooi, Kian-Lee Tan, and Aoying Zhou. Efficient distributed cq processing using peers. In Poster Proceedings of the 12th International World Wide Web Conference (WWW'2003), May 2003.
    [73]Niagara. Niagara project homepage, http://datalab.cs.pdx.eduniagara/, 2005.
    [74]OceanStore. Oceanstore project homepage, http://oceanstore.cs.berkeley.edu/, 2003.
    [75]Beng Chin Ooi, Yanfeng Shu, and Kian-Lee Tan. Relational data sharing in peer-based data management systems. SIGMOD Record, 32(3): 59-64, 2003.
    [76]Andy Oram and et. al, editors. Peer-to-Peer: Harnessing the Power of Disruptive Technologies. O'Reilly & Associates, 2001.
    [77]J. Orenstein and T. Merrett. A class of data structures for associative searching. In Proceedings of the 3rd ACM SIGACT-SIGMOD symposium on Principles of database systems (PODS'1990), pages 181-190, 1984.
    [78]M. Tamer Ozsu and Patrick Valduriez. Principles of Distributed Database Systems. Prentice Hall, second edition, 1999.
    [79]Yahoo! Pager. Yahoo messenger homepage, http://messenger.yahoo.com/, 2003.
    [80]Christopher R. Palmer and J. Gregory Steffan. Generating network topologies that obey power law. In Proceedings of the Global Internet Symposium (GLOBECOM'2000), November 2000.
    [81]V. Papadimos and D. Maier. Mutant query plans. Information and Software Technology, 44(4): 197-206, March 2002.
    [82]Vassilis Papadimos, David Maier, and Kristin Tufte. Distributed query processing and catelogs for peer-to-peer systems. In Proceedings of the 1st Biennial Conference on Innovated Data Systems Research (CIDR '2003), January 2003.
    [83]C. Greg Plaxton, Rajmohan Rajaraman, and Andrea W. Richa. Accessing nearby copies of replicated objects in a distributed environment. In Proceedings of the 9th Annual A CM Symposium on Parallel Algorithms and Architectures (SPAA '1997), pages 311-320, June 1997.
    [84]Publius. Publius homepage, http://cs1.cs.nyu.edu/~waldman/publius/, 2003.
    [85]William Pugh. A skip list cookbook. Technical Report CS-TR-2286.1, University of Maryland, Colledge Park, 1989.
    [86]Raghu Ramakrishnan and Johannes Gehrke. Database Management Systems. McGraw-Hill, third edition, 2002.
    [87]Sylvia Ratnasamy, Paul Francis, Kark Handley, Richard Karp, and Scott Shenker. A scalable content-addressable network. In Proceedings of the A CM SIGCOMM 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM'2001), pages 161-172, August 2001.
    [88]P. Reynolds and A. Vahdat. Efficient peer-to-peer keyword searching. In Proceedings of ACM/IFIP/USENIX International Middleware Conference (Middleware'2003), pages 21-40, June 2003.
    [89]A. Rowstron and P. Druschel. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Proceedings of IFIP/ACM In ternational Conference on Distributed Systems Platforms (Middleware'2001), pages 329-350, November 2001.
    [90]Mario Schlosser, Michael Sintek, Stefan Decker, and Wolfgang Nejdl. Hypercup-hypercubes, ontologies and efficient search on p2p networks. In Proceedings of International Workshop on Agents and Peer-to-Peer Computing (AP2PC'2002), July 2002.
    [91]Cristina Schmidt and Manish Parashar. Flexible information discovery in decentralized distributed systems. In Proceedings of the 12th International Symposium on High-Performance Distributed Computing (HPDC'2003), pages 226-235, June 2003.
    [92]SETI@home. Seti@home homepage, http://setiathome.ssl.berkeley.edu/, 2003.
    [93]Shareaza. Shareaza homepage, http://www.shareaza.com/, 2005.
    [94]Heng Tao Shen, Yan Feng Shu, and Bei Yu. Efficient semantic-based content search in p2p network. IEEE Transactions on Knowledge and Data Engineering, 17(7): 813-826, July 2004.
    [95]Scott Shenker. The data-centric revolution in networking. In Proceedings of the 29th International Conference on Very Large Data Bases (VLDB'2003), page 15, September 2003.
    [96]Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: a scalable peer-to-peer lookup service for internet applications. In Proceedings of the ACM SIGCOMM 2001 Conference on Applica- tions, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM'2001), pages 149-160, August 2001.
    [97]C. Tang, Z. Xu, and S. Dwarkadas. Peer-to-peer information retrieval using self-organizing semantic overlay networks. In Proceedings of the A CM SIGCOMM 2003 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM'2003), pages 175-186, August 2003.
    [98]Chunqiang Tang, Zhichen Xu, and Mallik Mahalingam. psearch: Information retrieval in structured overlays. A CM SIGCOMM Computer Communication Review, 33(1): 89-94, 2003.
    [99]Jeffrey D. Ullman. A survey of new directions in database system. In Proceedings of the 8th International Conference on Database Systems for Advanced Applications (DASFAA '2003), page 3, March 2003.
    [100]Roger Weber, Hans-J. Schek, and Stephen Blott. A quantitative analysis and performance study for similarity-search methods in high dimensional spaces. In Proceedings of the 24rd International Conference on Very Large Data Bases (VLDB'1998), pages 194-205, August 1998.
    [101]Beverly Yang and Hector Garcia-Molina. Comparing hybrid peer-to-peer systems. In Proceedings of the 27th International Conference on Very Large Data Bases (VLDB'2001), pages 561-570, September 2001.
    [102]Beverly Yang and Hector Garcia-Molina. Improving search in peer-to-peer networks. In Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS'2002), pages 5-14, July 2002.
    [103]Peter N. Yianilos. Data structures and algorithms for nearest neighbor search in general metric spaces. In Proceedings of the 4th Annual ACM/SIGACTSIAM Symposium on Discrete Algorithms (SODA '1993), pages 311-321, 1993.
    [104]Cui Yu, Beng Chin Ooi, Kian-Lee Tan, and H. V. Jagadish. Indexing the distance: An efficient method to knn processing. In Proceedings of the 27th International Conference on Very Large Data Bases (VLDB'2001), pages 421-430, September 2001.
    [105]Ellen W. Zegura, Kenneth L. Calvert, and Samrat Bhattacharjee. How to model an internetwork. In Proceedings of the 5th Annual Joint Conference of the IEEE Computer and Communications (INFOCOM'1996), volume 2, pages 594-602, March 1996.
    [106]Chi Zhang, Arvind Krishnamurthy, and Randolph Y. Wang. Skipindex: Towards a scalable peer-to-peer index service for high dimensional data. Technical Report TR-703-04, Princeton University, 2004.
    [107]Hui Zhang, Ashish Goel, and Ramesh Govindan. Using the small-world model to improve freenet performance. In Proceedings the 21st Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM'2002), June 2002.
    [108]Ben Y. Zhao, John Kubiatowicz, and Anthony D. Joseph. Tapestry: a faulttolerant wide-area application infrastructure. Technical Report CSD-01-1141, University of California at Berkeley, 2001.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700