基于对等网络的大地规模内容检索研究
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
随着网络技术的迅猛发展和网络应用的迅速普及,互联网日益形成一个巨大的分布式信息库。互联网应用产生的超大规模信息对现有的网络数据管理基础设施提出了新的严峻挑战。互联网信息库的无限扩张性和与生俱来的分布式特性使研究非集中式的数据管理和共享机制成为一种必然趋势。基于分布式技术的大规模内容检索研究具有重要的学术价值和应用价值。
     对等网络(Peer-to-Peer Network,简称P2P)打破了传统的“客户机/服务器”模式,以“自主、平等”的原则将处于网络边缘的计算、存储、通信、信息等各种资源高效地共享起来,形成分布式的协作网络。对等计算模型凭借其分布式、易扩展、容错性高等优点,日益在互联网信息共享方面显示出巨大潜力。
     然而,对等网络的分布式、动态性、异构性等特性,又给基于对等网络的大规模内容检索带来了巨大的挑战。首先,虽然分布式哈希表技术使现有的对等网络系统能准确、快速地定位全局数据对象,但分布式哈希映射的精确性与用户查询语义多样性的矛盾,却是构建大规模对等网络内容检索系统带来难以突破的瓶颈;其次,由于缺乏集中的索引服务器,传统集中式信息检索的模型、算法和技术在分布式对等网络环境下无法适用。
     大规模分布式内容检索系统的核心问题,即如何建立高效的分布式索引以支持大规模网络环境下的复杂内容检索,在国际学术界至今并没有有效解决。基于对等网络的大规模内容检索是一个极具挑战性的开放性课题。
     本文从这一核心问题出发,通过扩展传统对等网络的概念、结构、资源描述与组织、资源发现与路由、结果融合与排序等,在大规模对等网络内容检索方面作了一系列研究,提出了一套行之有效的新理论、新方法,全面、深入、系统地论述了利用对等网络构建大规模分布式文本内容检索系统的解决方案和关键技术。
     具体来说,本文主要提出了以下创新性理论或方法:
     1.分布式集合运算布隆滤波优化理论及其多关键字搜索协议:基于传统的分布式哈希表全局索引,进行多关键字搜索,需要在广域网上进行分布式集合运算,这将给系统带来难以接受的网络开销。本文针对此难题,提出了一套针对分布式集合运算的布隆滤波优化理论,并基于此优化理论设计了一种高效的多关键字搜索协议PWEB。在美国国家标准研究院发布的TREC WT10G大规模文本检索测试集以及主流商业Web搜索引擎的查询日志上对PWEB进行了大规模的模拟测试。实验结果表明,相对现有结构化对等网内容搜索协议,PWEB协议将查询所需的网络流量显著降低了73%,同时将查询延迟降低了41%。
     2.多维分布式哈希表技术及其全文索引、检索及排序策略:提出一种新颖的多维分布式哈希表技术用于更高效的支持全文索引和检索,并设计了一种分布式多维索引剪枝算法TSS。基于TREC WT10G数据集和主流商业搜索引擎查询日志的大规模实验结果表明,TSS显著地将分布式多维索引空间复杂度从O(2n)降低到了O(nlog n);将查询网络流量降低到现有算法的28%;大规模实验结果同时显示TSS算法获得了与传统集中式信息检索算法相当的检索质量和性能。
     3.基于语义拓扑的联邦式搜索策略:基于自主开发的P2P文献共享平台SemreX,证实了对等网内容共享网络中的“兴趣局部性”原理,基于此原理提出一种结点内容相似性度量模型,并采用此模型将对等网络中的相似结点聚集起来形成语义覆盖网络,同时进一步探索了如何利用“small world”特性改进语义覆盖网络的拓扑属性。对提出的算法进行的全面仿真测试结果显示基于语义覆盖的SemreX联邦式搜索协议将传统无结构搜索协议的总体性能提高了81.6%。
     4.难度感知的混合式对等网络搜索协议:通过结合结构化DHT和无结构对等网搜索协议各自的优点,混合对等网搜索策略能有效提高对等网系统的检索效率。混合对等网搜索策略的关键问题是如何高效估计网络中拥有与查询相关数据的结点的数量,并据此选择最优的查询搜索策略。现有研究基于这样的假设:如果网络中与某查询匹配的相关数据很多,则这些数据广泛地分布在网络中,对此查询使用无结构搜索协议更有效;反之,则采用分布式哈希表查找更有效。从“兴趣局部性原理”出发,指出前人的研究假设并不成立,与查询匹配的大量数据往往聚集在少量结点上,而使无结构搜索协议效率显著降低。并进一步提出了一种查询难度感知(Difficulty-aware)的混合搜索协议QRank,它能够根据查询关键字在网络中出现的频率等统计信息有效预测各种搜索策略针对此查询的搜索效率,并智能地选择高效的搜索策略。基于Gnutella网络的真实拓扑和查询跟踪数据对QRank的协议进行了大规模全面的系统仿真测试。实验结果表明QRank混合搜索协议显著地提高了混合对等系统的搜索性能。相对于现有混合搜索协议,QRank将系统查全率提高了21%,将查询延迟降低了26%,同时将查询产生的平均网络流量降低了40%。
The Internet has become an extremely large-scale distributed information repository with the prevalence of network applications and the rapid development of networking techniques. The extremely large-scale information with unlimited expansion brings new challenges to the current Internet data management infrastructure and makes the topic of buding decentralized data management and sharing infrastructure the inevitable research trend. Large-scale distributed content retrieval has an important academic and practical value.
     Peer-to-Peer (P2P) network is one of the state-of-the-art distributed systems built over the Internet. It breaks the traditional client-server model by introducing the concept of peer which basically acts as both a client and a server in the network. The P2P model efficiently harnesses resources such as computation, storage, communication, and information residing at the edges of the Internet to form a distributed cooperative network in an independent and equal manner. P2P networks demonstrate many good features such as decentralization, good scalability, and the ability to tolerate faults, hence it indeed has a great potential to become a major infrastructure for sharing information on the Internet.
     However, P2P networks are distributed, dynamic, and heterogeneous in nature, making it extremely difficult to build a large-scale content retrieval system without centralized repository and index. First, by utilizing distributed hash tables (DHTs), existing P2P data sharing systems can only provide exact-match style search support (such as file-name based search) with poor complex content retrieval capacity. Second, without a cetralized index existing state-of-the-art retrieval models and algorithms extensively used in centralized search engines are not applicable to such a distributed environment. Overall, the key issue - how to build efficient global index that supports complex queries and enables novel distributed retrieval model has not been effectively solved up to now, making large-scale P2P content retrieval a challenging and open research issue.
     To address this issue, we extend dimensions of current P2Ps including the concept, architecture, resource description and management, routing mechanisms, and result merging and ranking mechanisms. In this dissertation, we focus on the solutions to build scalable content retrieval in large-scale P2P networks, and propose a set of novel theories and enabling techniques.
     Specifically, the contributions of this dissertation are fourfold.
     1. Theory of optimizing Bloom Filter (BF) for distributed set operations and multi-keyword search protocol over structured DHTs using optimal BF settings. Multi-keyword searching over DHTs needs distributed set operations over wide area network, raising unacceptable bandwidth cost. To address this issue, we propose a novel Bloom Filter optimizing theory and design PWEB, an effective and efficient multi-keyword search protocol over DHTs based on the theory. We conduct comprehensive large-scale trace-driven simulations to evaluate PWEB design using National Institute of Standards and Technology (NIST) benchmark TREC WT10G large-scale standard testing collection and the query logs from a major commercial Web search engine. Results show that PWEB significantly outperforms existing techniques. It reduces the network traffic by 74%, as well as reduces the search latency by 41%.
     2. Multi-dimensional distributed hash table technique and global full-text indexing searching and ranking strategies. We extend the traditional DHT structure, and propose a novel multi-dimensional distributed hashing structure to support term-set indexing over distributed environments. A novel index pruning algorithm, TSS, is designed to reduce the index size. We base our design on the observations of the query logs from a major commercial search engines. We show through experiment that the TSS significantly reduces the space complexity of the global index from O(2n) to a formal bound of O(nlogn), reduces the communication cost to 28% of the cost of existing schemes, and achieves performance comparable to traditional popular centralized information retrieval algorithms.
     3. Federated search over semantic overlay. We design and implement SemreX, a P2P based literature sharing and retrieval system. The system work demonstrates the principle of interest locality in P2P systems. Based on the principle, we propose a novel peer similarity measurement model. Peers with high similarity values are linked together to form the semantic overlay. We further investigate how to leverage the small world topology features to improve the search performance of the semantic P2P overlay. Performance evaluation results show that SermeX federated search protocol based on semantic overlay can greatly improve the overall search efficiency by 81.6% compared to the popular Gnutella protocol, greatly improving the scalability of the unstructured P2P retrieval systems.
     4. Difficulty-aware hybrid search protocol in P2P networks. By combining structured DHTs and unstructured search protocols, hybrid P2Ps have the potential to effectively improve the performance of P2P retrieval systems. The key issue of hybrid P2P network is how to estimate the number of peers with matched items. Existing studies assume that such popularity can be obtained by estimating the number of relevant items in the distributed networks; that is to say if there are many matched items, they are also widely distributed across the network. In this case, flooding based search protocol will be more efficient, otherwise DHT-based search protocol will be more preferable. Based on the demonstrated principle of interest locality, we argue that the assumption of existing work is invalid and not practical in real world P2P systems. We further propose QRank, a novel difficulty-aware hybrid search protocol. It intelligently selects search strategies based on the ranked query difficulty estimated with the gathered statistical information of keyword popularity in distributed P2P networks. We collect the trace from popular real world P2P systems, and conduct comprehensive large-scale trace-driven simulations to evaluate the performance of QRank design. Results show that QRank hybrid P2P search protocol significantly improve the search performance for hybrid P2P networks. It improves the recall by 21%, reduces the search latency by 26%, and reduces the network traffic by 40%, compared to existing work.
引文
[1]陈贵海,李振华.对等网络:结构、应用与设计.北京:清华大学出版社. 2007
    [2] Yatin Chawathe, Sylvia Ratnasamy, Lee Breslau, et al. Making Gnutella-like P2P Systems Scalable. In: Proceedings of ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (ACM SIGCOMM). Karlsruhe, Germany. 2003. New York, USA: ACM Press. 114-125
    [3] Jian Liang, Rakesh Kumar, Keith Ross. The KaZaA Overlay: A Measurement Study. Computer Networks, 2006, 50 (6): 842-858
    [4] Dongyu Qiu, Rayadurgam Srikant. Modeling and Performance Analysis of BitTorrent-like Peer-to-Peer Networks. In: Proceedings of ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (ACM SIGCOMM). Portland, Oregon, USA. 2004. New York, USA: ACM Press. 367-378
    [5] Chris Anderson. The Long Tail: Why the Future of Business is Selling Less of More. New York: Hyperion. 2006
    [6] Napster, "http://www.napster.com," 2010
    [7]陈浩.对等网络拓扑相关的关键理论与技术研究: [博士学位论文].华中科技大学图书馆. 2005
    [8] The Gnutella Protocol Specification 0.6, "http://rfc-gnutella.sourceforge.net," 2002
    [9] Ion Stoica, Robert Morris, David Karger, et al. Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. In: Proceedings of ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (ACM SIGCOMM). San Diego, CA, USA. 2001. New York, USA: ACM Press. 149-160
    [10] Sylvia Ratnasamy, Paul Francis, Mark Handley, et al. A Scalable Content-addressable Network. In: Proceedings of ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (ACM SIGCOMM). San Diego, CA, USA. 2001. New York, USA:ACM Press. 161-172
    [11] Ben Zhao, Ling Huang, Jeremy Stribling, et al. Tapestry: A Resilient Global-Scale Overlay for Service Deployment. IEEE Journal on Selected Areas in Communications (JSAC), 2004, 2 (1): 41-53
    [12] Antony Rowstron, Peter Druschel. Pastry: Scalable, Distributed Object Location and Routing for Large Scale Peer-to-Peer Systems. In: Proceedings of ACM/IFIP/USENIX International Middleware Conference (Middleware). Heidelberg, Germany. 2001. German: Springer-Verlag GmbH. 329-350
    [13] Krishna Puttaswamy, Alessandra Sala, Christo Wilson, et al. Protecting Anonymity in Dynamic Peer-to-Peer Networks. In: Proceedings of IEEE International Conference on Network Protocols (ICNP). Orlando, Florida, USA. 2008. New York, USA: IEEE Press. 104-113
    [14] Yunhao Liu, Xiaomei Liu, Li Xiao, et al. Location-Aware Topology Matching in P2P Systems. In: Proceedings of IEEE International Conference on Computer Communications (IEEE INFOCOM). Hong Kong, China. 2004. New York, USA: IEEE Press. 2220-2230
    [15] Yan Huang, Tom Fu, Dah-Ming Chiu, et al. Challenges, Design and Analysis of a Large-scale P2P-vod System. In: Proceedings of ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (ACM SIGCOMM). Seattle, WA, USA. 2008. New York, USA: ACM Press. 375-388
    [16] Mao Yang, Zheng Zhang, Xiaoming Li, et al. An Empirical Study of Free-Riding Behavior in the Maze P2P File-Sharing System. In: Proceedings of International Workshop on Peer-to-Peer Systems (IPTPS). Ithaca, NY, USA. 2005. German: Springer-Verlag GmbH. 182-192
    [17] Yunhao Liu, Li Xiao, Xiaomei Liu, et al. Location Awareness in Unstructured Peer-to-Peer Systems. IEEE Transactions on Parallel and Distributed Systems (TPDS). 2005, 16 (2): 163-174
    [18] Mudhakar Srivatsa, Bugra Gedik, Ling Liu. Large Scaling Unstructured Peer-to-Peer Networks with Heterogeneity-Aware Topology and Routing. IEEE Transactions on Parallel and Distributed Systems (TPDS), 2006, 17 (11): 1277-1293
    [19] Edith Cohen, Scott Shenker. Replication Strategies in Unstructured Peer-to-Peer Networks. In: Proceedings of ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (ACM SIGCOMM). Pittsburgh, PA, USA. 2002. New York, USA: ACM Press. 177-190.
    [20] Song Jiang, Lei Guo, Xiaodong Zhang. LightFlood: An Efficient Flooding Scheme for File Search in Unstructured Peer-to-Peer Systems. In: Proceedings of International Conference on Parallel Processing (ICPP). Kaohsiung, Taiwan. 2003. New York, USA: IEEE Press. 627-635
    [21] Boon Thau Loo, Ryan Huebsch, Ion Stoica, et al. The Case for a Hybrid P2P Search Infrastructure. In: Proceedings of International Workshop on Peer-to-Peer Systems (IPTPS). La Jolla, CA, USA. 2004. German: Springer-Verlag GmbH. 141-150
    [22] Matei Zaharia, Srinivasan Keshav. Gossip-based Search Selection in Hybrid Peer-to-Peer Networks. In: Proceedings of International Workshop on Peer-to-Peer Systems (IPTPS). Santa Barbara, CA, USA. 2006
    [23] Hanhua Chen, Hai Jin, Yunhao Liu, et al. Difficulty-aware Hybrid Search in Peer-to-Peer Networks. IEEE Transactions on Parallel and Distributed Systems (TPDS), 2009, 20 (1): 71-82
    [24] Haiying Shen, Cheng-Zhong Xu, Guihai Chen. Cycloid: A Constant-Degree and Lookup-Efficient P2P Overlay Network. In: Proceedings of International Parallel and Distributed Processing Symposium (IPDPS). Santa Fe, New Mexico, USA. 2004. New York, USA: IEEE Press.1-10
    [25] Dongsheng Li, Xicheng Lu, Jie Wu. FISSIONE: A Scalable Constant Degree and Low Congestion DHT Scheme based on Kautz Graphs. In: Proceedings of IEEE International Conference on Computer Communications (IEEE INFOCOM). Miami, FL, USA. 2005. New York, USA: IEEE Press. 1677-1688
    [26] Frank Dabek, Ben Zhao, Peter Druschel, et al. Towards a Common API for Structured Peer-to-Peer Overlays. In: Proceedings of International Workshop on Peer-to-Peer Systems (IPTPS). Berkeley, CA, USA. 2003. German: Springer-Verlag GmbH. 33-44
    [27] Boon Thau Loo, Joseph M. Hellerstein, Ryan Huebsch, et al. Enhancing P2P File-Sharing with an Internet-scale Query Processor. In: Proceedings of InternationalConference on Very Large Data Bases (VLDB). Toronto, Canada. 2004. San Fransisco, CA 94104, USA: Morgan Kaufmann. 432-443
    [28] Yatin Chawathe, Sriram Ramabhadran, Sylvia Ratnasamy, et al. A Case Study in Building Layered DHT Applications. In: Proceedings of ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (ACM SIGCOMM). Philadelphia, PA, USA. 2005. New York, USA: ACM Press. 97-108
    [29] Ion Stoica, Robert Morris, David Liben-Nowell, et al. Chord: A Scalable Peer-to-Peer Lookup Protocol for Internet Applications. ACM/IEEE Transactions on Networking (TON), 2003, 11 (1): 17-32
    [30] Greg Plaxton, Rajmohan Rajaram, Andrea Werneck Richa. Accessing Nearby Copies of Replicated Objects in a Distributed Environment. In: Proceedings of ACM Symposium on Parallel Algorithms and Architectures (SPAA). 1997. New York, USA: ACM Press. 314-326
    [31] Search Engine Downtime Report,“http://www.searchenginejournal.com,”2010
    [32]百度声明, "http://tech.sina.com.cn/i/2008-09-13/15472456026.shtml," 2008
    [33] Matthias Bender, Sebastian Michel, Peter Triantafillou, et al. P2P Content Search: Give the Web Back to the People. In: Proceedings of International Workshop on Peer-to-Peer Systems (IPTPS). Santa Barbara, CA, USA. 2006
    [34]凌波,陆志国,黄维雄,等. PeerIS:基于Peer-to-Peer的信息检索系统.《软件学报》, 2004, 15 (9): 1375-1384
    [35] Jie Lu. Full-text Federated Search in Peer-to-Peer Networks: [Ph.D. Dissertation]. CMU. 2007
    [36]方启明,杨广文,武永卫,等基于P2P的Web搜索技术.《软件学报》, 2008, 19 (10): 2706-2719
    [37] Chunqiang Tang, Zhichen Xu, Sandhya Dwarkadas. Peer-to-Peer Information Retrieval using Self-organizing Semantic Overlay Networks. In: Proceedings of ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (ACM SIGCOMM). Karlsruhe, Germany. 2003. New York, USA: ACM Press. 175 - 186
    [38] Mei Li, Wang-Chien Lee, Anand Sivasubramaniam, et al. SSW: A Small-world-based Overlay for Peer-to-Peer Search. IEEE Transactions on Parallel Distributed Systems, 2008, 19 (6): 735-749
    [39] Chunqiang Tang, Sandhya Dwarkadas. Hybrid Global-local Indexing for Efficient Peer-to-Peer Information Retrieval. In: Proceedings of USENIX Symposium on Networked Systems Design and Implementation (NSDI). San Francisco, California, USA. 2004. Berkeley, CA, USA: USENIX. 211-224
    [40] Patrick Reynolds, Amin Vahdat. Efficient Peer-to-Peer Keyword Searching. In: Proceedings of ACM/IFIP/USENIX International Middleware Conference (Middleware). Rio de Janeiro, Brazil. 2003. German: Springer-Verlag GmbH. 21-40
    [41] Hanhua Chen, Hai Jin, Jiliang Wang, et al. Efficient Multi-Keyword Search over P2P Web. In: Proceedings of International World Wide Web Conference (WWW). Beijing, China. 2008. New York, USA: ACM Press. 989-998
    [42] Jie Lu, Jamie Callan. User Modeling for Full-text Federated Search in Peer-to-Peer Networks. In: Proceedings of ACM Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR). Seattle, Washington, USA. 2006. New York, USA: ACM Press. 332-339
    [43] Francisco Matias Cuenca-Acuna, Christopher Peery, Richard Martin, et al. PlanetP: using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities. In: Proceedings of IEEE International Symposium on High Performance Distributed Computing (HPDC). Seattle, WA, USA. 2003. New York, USA: IEEE Press. 236-246
    [44] Jiangong Zhang, Xiaohui Long, Torsten Suel. Performance of Compressed Inverted List Caching in Search Engines. In: Proceedings of International World Wide Web Conference (WWW). Beijing, China. 2008. New York, USA: ACM Press. 387-396
    [45] Haoyu Song, Sarang Dharmapurikar, Jonathan Turner, et al. Fast Hash Table Lookup using Extended Bloom Filter: an Aid to Network Processing. In: Proceedings of ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (ACM SIGCOMM). Philadelphia, PA, USA. 2005. New York, USA: ACM Press. 181-192
    [46] Andrei Broder, Michael Mitzenmacher. Network Applications of Bloom Filters: ASurvey. Internet Mathematics, 2005, 1 (4): 484-509
    [47] Burton Bloom. Space/Time Trade-offs in Hash Coding with Allowable Errors. Communication of the ACM, 1971, 13 (7): 422-426
    [48] Yifeng Zhu, Hong Jiang. False Rate Analysis of Bloom Filter Replicas in Distributed Systems. In: Proceedings of International Conference on Parallel Processing (ICPP). Columbus, Ohio, USA. 2006. New York, USA: IEEE Press. 255-262
    [49] Jinyang Li, Boon Thau Loo, Joseph M. Hellerstein, et al. On the Feasibility of Peer-to-Peer Web Indexing and Search. In: Proceedings of International Workshop on Peer-to-Peer Systems (IPTPS). Berkeley, CA, USA. 2003. German: Springer-Verlag GmbH. 207-215
    [50] Gerard Salton, Chris Buckley. Term Weighting Approaches in Automatic Text Retrieval. Information Processing and Management, 1988, 24: 513-523
    [51] Stephen Robertson. Understanding Inverse Document Frequency: on Theoretical Arguments for IDF. Journal of Documentation, 2004, 60 (5): 503-520
    [52] Omprakash Gnawali. A Keyword-Set Search System for Peer-to-Peer Networks: [Master's Thesis]. Massachusetts Institute of Technology. 2002
    [53] Matthias Bender, Sebastian Michel, Peter Triantafillou, et al. Improving Collection Selection with Overlap Awareness in P2P Search Engines. In: Proceedings of ACM Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR). Salvador, Brazil. 2005. New York, USA: ACM Press. 67-74
    [54] Jie Lu, Jamie Callan. Content-Based Retrieval in Hybrid Peer-to-Peer Networks. In: Proceedings of ACM International Conference on Information and Knowledge Management (CIKM). New Orleans, Louisiana, USA. 2003. New York, USA: ACM Press. 199-206
    [55] Beverly Yang, Hector Garcia-Molina. Designing a Super-Peer Network. In: Proceedings of International Conference on Data Engineering (ICDE). Bangalore, India. 2003. New York, USA: IEEE Press. 49-60
    [56] Heng Tao Shen, Yan Feng Shu, Bei Yu. Efficient Semantic-based Content Search in P2P Network. IEEE Transaction on Knowledge and Data Engineering (TKDE), 2004, 16 (7): 813-826
    [57] Sunny Wong, Wojciech Ziarko, Vijay Raghavan, et al. On Modeling of Information Retrieval Concepts in Vector Spaces. ACM Transactions on Database Systems (TODS), 1987, 12 (2): 299-321
    [58] Christos Papadimitriou, Prabhakar Raghavan, Hisao Tamaki, et al. Latent Semantic Indexing: A Probabilistic Analysis. In: Proceedings of PODS. Seattle, Washington. 1998. New York, USA: ACM Press. 159-168
    [59] Roger Weber, Hans-J?rg Schek, Stephen Blott. A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. In: Proceedings of International Conference on Very Large Data Bases (VLDB). New York City, New York, USA. 1998. San Fransisco, CA 94104, USA: Morgan Kaufmann. 194-205
    [60] Thomas Cover, Peter Hart. Nearest Neighbor Pattern Classification. IEEE Transactions on Information Theory, 1967, 13 (1): 21-27
    [61] Kunwadee Sripanidkulchai, Bruce Maggs, Hui Zhang. Efficient Content Location using Interest-based Locality in Peer-to-Peer Systems. In: Proceedings of IEEE International Conference on Computer Communications (IEEE INFOCOM). San Franciso, CA, USA. 2003. New York, USA: IEEE Press. 2166-2176
    [62] Wolfgang Nejdl, Boris Wolf, Changtao Qu, et al. Edutella: A Peer-to-Peer Networking Infrastructure based on RDF. In: Proceedings of International World Wide Web Conference (WWW). Honolulu, Hawaii, USA. 2002. New York, USA: ACM Press. 604-615
    [63] Peter Haase, Jeen Broekstra, Marc Ehrig, et al. Bibster - A Semantics-Based Bibliographic Peer-to-Peer System. In: Proceedings of International Semantic Web Conference (ISWC). Hiroshima, Japan. 2004. German: Springer-Verlag GmbH. 122-136
    [64] Mei Li, Wang-Chien Lee, Anand Sivasubramaniam. Semantic Small World: An Overlay Network for Peer-to-Peer Search. In: Proceedings of IEEE International Conference on Network Protocols (ICNP). Berlin, Germany. 2004. New York, USA: IEEE Press. 228-238
    [65] Adriana Iamnitchi, Ian Foster. Interest-aware Information Dissemination in Small-world Communities. In: Proceedings of IEEE International Symposium onHigh Performance Distributed Computing (HPDC). Durham NC USA. 2005. New York, USA: IEEE Press. 167-175
    [66] Resource Description Framework (RDF) Model and Syntax Specification, "http://www.w3.org/TR/REC-rdf-syntax/," 2010
    [67] Matei Zaharia, Srinivasan Keshav. Gossip-based Search Selection in Hybrid Peer-to-Peer Networks. Concurrency and Computation: Practice and Experience, 2008, 20 (2): 139-153
    [68] Hosagrahar Visvesvaraya Jagadish, Beng Chin Ooi, Quang Hieu Vu. BATON: A Balanced Tree Structure for Peer-to-Peer Networks. In: Proceedings of International Conference on Very Large Data Bases (VLDB). Trondheim, Norway. 2005. New York, USA: ACM Press. 661-672
    [69] Li Xiong, Ling Liu. PeerTrust: Supporting Reputation-Based Trust for Peer-to-Peer Electronic Communities. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2004, 16 (7): 843-857
    [70] Wee Siong Ng, Beng Chin Ooi, Kian-Lee Tan, et al. PeerDB: A P2P-based System for Distributed Data Sharing. In: Proceedings of International Conference on Data Engineering (ICDE). Bangalore, India. 2003. New York, USA: IEEE Press. 633-644
    [71]綦宏伟,代亚非,李晓明.针对访问成功率的P2P动态网络对象定位模型.《软件学报》, 2005, 16 (5): 894-902
    [72] Xinyan Zhang, Jiangchuan Liu, Bo Li, et al. CoolStreaming/DONet: A Data-driven Overlay Network for Peer-to-Peer Live Media Streaming. In: Proceedings of IEEE International Conference on Computer Communications (IEEE INFOCOM). Miami, FL, USA. 2005. New York, USA: IEEE Press. 2102-2111
    [73] Li Fan, Pei Cao, Jussara Almeida, et al. Summary Cache: A Scalable Wide-area Web Cache Sharing Protocol. IEEE/ACM Transactions on Networking (TON), 2000, 8 (3): 282-293
    [74] David Hawking. Overview of the TREC-9 Web Track. In: Proceedings of Text REtrieval Conference (TREC). Gaithersburg, Maryland. 2000. National Institute of Standards and Technology (NIST)
    [75] Jirong Wen, Jianyun Nie, Hongjiang Zhang. Query Clustering using User Logs.ACM Transactions on Information Systems, 2002, 20 (1): 59-81
    [76] Yingwu Zhu, Yiming Hu. Efficient, Proximity-aware Load Balancing for DHT Based P2P Systems. IEEE Transactions on Parallel and Distributed Systems (TPDS), 2005, 16 (4): 349-361
    [77] Haiying Shen, Cheng-Zhong Xu. Locality-Aware and Churn-Resilient Load Balancing Algorithms in Structured P2P Networks. IEEE Transactions on Parallel and Distributed Systems (TPDS), 2007, 18 (6): 849-862
    [78] Brighten Godfrey, Ion Stoica. Heterogeneity and Load Balance in Distributed Hash Tables. In: Proceedings of IEEE International Conference on Computer Communications (IEEE INFOCOM). Miami, FL, USA. 2005. New York, USA: IEEE Press. 596-606
    [79] Michael Mitzenmacher. Compressed Bloom Filters. IEEE/ACM Transactions on Networking (TON), 2002, 10 (5): 604-612
    [80] Jiangong Zhang, Torsten Suel. Efficient Query Evaluation on Large Textual Collections in a Peer-to-Peer Environment. In: Proceedings of IEEE International Conference on Peer-to-Peer Computing (P2P). Konstanz, Germany. 2005. New York, USA: IEEE Press. 225-233
    [81] Suman Nath, Phillip Gibbons, Srinivasan Seshan, et al. Synopsis Diffusion for Robust Aggregation in Sensor Networks. In: Proceedings of International Conference on Embedded Networked Sensor Systems (SenSys). Battimore, MD, USA. 2004. New York, USA: ACM Press. 250-262
    [82] David Kempe, Alin Dobra, Johannes Gehrke. Gossip-based Computation of Aggregate Information. In: Proceedings of IEEE Symposium on Foundations of Computer Science (FOCS). Cambridge, MA, USA. 2003. New York, USA: IEEE Press. 482-491
    [83] Philippe Flajolet, Nigel Martin. Probabilistic Counting Algorithms for Database Applications. Journal of Computer and System Sciences, 1985, 31 (2): 182-209
    [84] Stephen Boyd, Arpita Ghosh, Balaji Prabhakar, et al. Gossip Algorithms: Design, Analysis, and Applications. In: Proceedings of IEEE International Conference on Computer Communications (IEEE INFOCOM). Miami, FL, USA. 2005. New York, USA: IEEE Press. 1653-1664
    [85] Gnutella Protocol Version 0.6, "http://rfc-gnutella.sourceforge.net," 2002
    [86] Hao Chen, Hai Jin, Jianhua Sun, et al. Analysis of Large-scale Topological Properties for Peer-to-Peer Networks. In: Proceedings of IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid). Chicago, Illinois, USA. 2004. New York, USA: IEEE Press. 27-34
    [87] Limewire, "http://www.limewire.com," 2010
    [88] Matei Ripeanu, Ian Foster, Adriana Iamnitchi. Mapping the Gnutella Network. IEEE Internet Computing, 2002, 6 (1): 50-57
    [89] Duncan Watts, Steven Strogatz. Collective Dynamics of Small-world Networks. Nature, 1998, 393: 440
    [90] Lada Adamic, Rajan Lukose, Amit Puniyani, et al. Search in Power-law Networks. Physical Review E, 2001, 64 (4): 046135
    [91] Ricardo Oliveira, Beichuan Zhang, Lixia Zhang. Observing the Evolution of Internet AS Topology. In: Proceedings of ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (ACM SIGCOMM). Kyoto, Japan. 2007. New York, USA: ACM Press. 313-324
    [92] Tian Bu, Donald Towsley. On Distinguishing between Internet Power Law Topology Generators. In: Proceedings of IEEE International Conference on Computer Communications (IEEE INFOCOM). New York, USA. 2002. New York, USA: IEEE Press
    [93] BRITE, "http://www.cs.bu.edu/brite," 2010
    [94] Shubho Sen, Jia Wang. Analyzing Peer-to-Peer Traffic Across Large Networks. In: Proceedings of ACM SIGCOMM Workshop on Internet Measurement. Marseille, France. 2002. New York, USA: ACM Press. 137-150
    [95] Shubho Sen, Jia Wang. Analyzing Peer-to-Peer Traffic Across Large Networks. IEEE/ACM Transacions on Networking (TON). 2004. 12(2): 219-232
    [96] Cheng Huang, Jin Li, Keith Ross. Can Internet Video-on-Demand be Profitable? In: Proceedings of ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (ACM SIGCOMM). Kyoto, Japan. 2007. New York, USA: ACM Press. 133-144
    [97] Nen-Fu Huang, Rong-Tai Liu, Chih-Hao Chen, et al. A Fast URL Lookup Enginefor Content-Aware Multi-gigabit Switches. In: Proceedings of International Conference on Advanced Information Networking and Applications (AINA). Taipei, Taiwan. 2005. New York, USA: IEEE Press. 641-646
    [98] Mark Nelson. Data Compression with the Burrows-Wheeler Transform. Dr. Dobb's Journal, 1996
    [99] "http://www.iprospect.com," 2010
    [100] FIPS 180-1. Secure Hash Standard, "http://www.itl.nist.gov/fipspubs/fip180-1.htm," 1995
    [101] David Hawking, Nick Craswell, Peter Bailey, et al. Measuring Search Engine Quality. Information Retrieval, 2001, 4 (1): 33-59
    [102] Wesley Terpstra, Jussi Kangasharju, Christof Leng, et al. Bubblestorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search. In: Proceedings of ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (ACM SIGCOMM). Kyoto, Japan. 2007. New York, USA: ACM Press. 49-60
    [103] The ACM Topic, "http://www.acm.org/class/1998," 2008
    [104] Michael Berry, Zlatko Drmac, Elizabeth R. Jessup. Matrices, Vector Spaces, and Information Retrieval. SIAM Review, 1999, 41 (2): 335-362
    [105] Christopher Burges. A Tutoral on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery. Data Mining and Knowledge Discovery, 1998, 2 (2): 121-167
    [106] Bernhard Boser, Isabelle Guyon, Vladimir Vapnik. A Training Algorithm for Optimal Margin Classifiers. In: Proceedings of Annual ACM Workshop on Computational Learning Theory (COLT). Pittsburgh, PA. 1992. New York, USA: ACM Press. 144-152
    [107] Adriana Iamnitchi, Matei Ripeanu, Ian Foster. Small-world File-sharing Communities. In: Proceedings of IEEE International Conference on Computer Communications (IEEE INFOCOM). Hong Kong, China. 2004. New York, USA: IEEE Press. 952-963
    [108] Yijiao Yu, Hai Jin. Building a Semantic P2P Scientific References Sharing System with JXTA. In: Proceedings of Asia-Pacific Web Conference. Harbin,China. 2006.German: Springer-Verlag GmbH. 937-942
    [109] Alexander Budanitsky, Graeme Hirst. Semantic Distance in WordNet: An Experimental, Application-oriented Evaluation of Five Measures. In: Proceedings of Workshop on WordNet and Other Lexical Resources, in the North American Chapter of the Association for Computational Linguistics (NAACL-2000). Pittsburgh, PA. 2001
    [110] Roy Rada, Hafedh Mili, Ellen Bicknell, et al. Development and Application of a Metric on Semantic Nets. IEEE Transactions on System, Man, and Cybernetics (TSMC), 1989, 19 (1): 17~30
    [111] Yuhua Li, David McLean, Zuhair Bandar, et al. An Approach for Measuring Semantic Similarity between Words using Multiple Information Sources. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2003, 15 (4): 871~882
    [112] Philip Resnik. Semantic Similarity in a Taxonomy: An Information-based Measure and its Application to Problems of Ambiguity in Natural Language. Journal of Artificial Intelligence Research, 1999, 11: 95~130
    [113] Jay Jiang, David Conrath. Semantic Similarity based on Corpus Statistics and Lexical Taxonomy. In: Proceedings of International Conference Research on Computational Linguistics (POCLING X). Taiwan. 1997
    [114] Paul Stelling, Ian Foster, Carl Kesselman, et al. A Fault Detection Service for Wide Area Distributed Computation. Cluster Computing, 1999, 2 (2): 117-128
    [115] Hsinping Wang, Tsungnan Lin. On Efficiency in Searching Networks. In: Proceedings of IEEE International Conference on Computer Communications (IEEE INFOCOM). Miami, FL, USA. 2005. New York, USA: IEEE Press. 1490-1501
    [116] Kui-Lam Kwok. A New Method of Weighting Query Terms for ad-hoc Retrieval. In: Proceedings of ACM Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR). Zurich, Switzerland. 1996. New York, USA: ACM Press. 187-195
    [117] Demetrios Zeinalipour-Yazti, Vana Kalogeraki, Dimitrios Gunopulos. Exploiting Locality for Scalable Information Retrieval in Peer-to-Peer Networks. Information Systems, 2005, 30 (4): 277-298
    [118] Vladimir Vapnik. The Nature of Statistical Learning Theory. Second ed: German:Springer-Verlag GmbH. 1999
    [119] Stefan Saroiu, Krishna Gummadi, Steven Gribble. A Measurement Study of Peer-to-Peer File Sharing Systems. In: Proceedings of SPIE Conference on Multimedia Computing and Networking (MMCN). San Jose, California. 2002
    [120] Jacky Chu, Kevin Labonte, Brian Neil Levine. Availability and Locality Measurements of Peer-to-Peer File Systems. In: Proceedings of ITCom. Boston MA, USA. 2002. Bellingham WA: Society of Photo-Optical Instrumentation Engineers (SPIE). 310-321
    [121] Eibe Frank, Mark Hall, Geoffrey Holmes, et al. Weka - A Machine Learning Workbench for Data Mining. The Data Mining and Knowledge Discovery Handbook, German: Springer-Verlag GmbH, 2005. 1305-1314
    [122] David Choffnes, Fabián Bustamante. Taming the Torrent: A Practical Approach to Reducing Cross-isp Traffic in Peer-to-Peer Systems. In: Proceedings of ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (ACM SIGCOMM). Seattle, WA, USA. 2008. New York, USA: ACM Press. 363-374
    [123] Haiyong Xie, Yang Richard Yang, Arvind Krishnamurthy, et al. P4P: Provider Portal for Applications. In: Proceedings of ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (ACM SIGCOMM). Seattle, WA, USA. 2008. New York, USA: ACM Press. 351-362
    [124] Ronaldo Alves Ferreira, Murali Krishna Ramanathan, Asad Awan, et al. Search with Probabilistic Guarantees in Unstructured Peer-to-Peer Networks. In: Proceedings of IEEE International Conference on Peer-to-Peer Computing. Konstanz, Germany. 2005. New York, USA: IEEE Press. 165-172
    [125] Xucheng Luo, Zhiguang Qin, Jinsong Han, et al. DHT-assisted Probabilistic and Exhaustive Search in Unstructured P2P Networks. In: Proceedings of International Parallel and Distributed Processing Symposium (IPDPS). Miami, Florida, USA. 2008. New York, USA: IEEE Press. 1-9
    [126] Hanhua Chen, Hai Jin, Xucheng Luo, et al. HRS: A Hybrid Replication Strategy for Exhaustive P2P Search. In: Proceedings of IFIP International Conference Networkand Parallel Computing (NPC). Shanghai, China. 2008. German: Springer-Verlag GmbH. 173-184
    [127]罗绪成,秦志光,陈汉华,等. RPXS:一种基于复制的概率穷尽搜索P2P系统.《计算机学报》,已录用, 2010
    [128] Mohammad Al-Fares, Alexander Loukissas, Amin Vahdat. A Scalable, Commodity Data Center Network Architecture. In: Proceedings of ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (ACM SIGCOMM). Seattle, WA, USA. 2008. New York, USA: ACM Press. 63-74
    [129] Chuanxiong Guo, Haitao Wu, Kun Tan, et al. Dcell: A Scalable and Fault-tolerant Network Structure for Data Centers. In: Proceedings of ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (ACM SIGCOMM). Seattle, WA, USA. 2008. New York, USA: ACM Press. 75-86
    [130] Aruna Balasubramanian, Brian Neil Levine, Arun Venkataramani. Enhancing Interactive Web Applications in Hybrid Networks. In: Proceedings of ACM Annual International Conference on Mobile Computing and Networking (ACM MobiCom). San Francisco, California, USA. 2008. 70-80
    [131]欧中洪,宋美娜,战晓苏,等.移动对等网络关键技术研究综述.《软件学报》, 2008, 19 (1): 126-140
    [132] SensorWeb, "http://research.microsoft.com/nec/senseWeb/," 2010
    [133] Yanif Ahmad, Suman Nath. COLR-Tree: Communication-Efficient Spatio-Temporal Indexing for a Sensor Data Web Portal. In: Proceedings of International Conference on Data Engineering (ICDE). Cancun, Mexico. 2008. New York, USA: IEEE Press. 784-793
    [134] Andreas Krause, Eric Horvitz, Aman Kansal, et al. Toward Community Sensing. In: Proceedings of ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN). St. Louis, Missouri, USA. 2008. 481-492
    [135] Hanhua Chen, Mo Li, Hai Jin, et al. MDS: Efficient Multi-dimensional Query Processing in Data-Centric WSNs. In: Proceedings of IEEE Real-Time SystemsSymposium (RTSS). Barcelona, Spain. 2008
    [136] Elizabeth Leicht Leicht, Mark Newman. Community Structure in Directed Networks. Physical Review Letter, 2008, 100 (11): 118703
    [137] Aaron Clauset, Mark Newman, Cristopher Moore. Finding Community Structure in Very Large Networks. Physical Review E, 2004, 70 (6): 066111
    [138]徐志伟.为人民计算的三个问题,《中国计算机学会通讯》, 2008, (10): 15-21
    [139] Huicheng Chi, Qian Zhang, Juncheng Jia, et al. Efficient Search and Scheduling in P2P-based Media-on-demand Streaming Service. IEEE Journal on Selected Areas in Communications (JSAC), 2007, 25 (1): 119-130
    [140] Xiaofei Liao, Hai Jin, Yunhao Liu, et al. AnySee: Peer-to-Peer Live Streaming. In: Proceedings of IEEE International Conference on Computer Communications (IEEE INFOCOM). Anchorage, Alaska, USA. 2006. New York, USA: IEEE Press. 1-10