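Research on Key Techniques of XML-based Content Routing

Chinese title: XML内容路由关键技术研究
Author: Wang Tong (王桐)
Thesis level: Doctoral
Discipline: Computer Application Technology
Chinese keywords: XML ; 内容路由 ; 发布/订阅 ; 粒子群优化 ; 森林自动机
English keywords: XML ; Content Routing ; Publish/Subscribe ; Particle Swarm Optimization ; Hedge Automata
Degree year: 2006
Supervisor: Liu Daxin (刘大昕)
Discipline code: 081203
Degree-granting institution: Harbin Engineering University
Thesis submission date: 2006-07-01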

Chinese Abstract

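With the development of the information superhighway, a large number of applications that follow an event-driven model have appeared on the Internet, such as publish/subscribe systems in active services, content-based XML routing, XML document dissemination, and news delivery. In these applications, information travels as XML streams from a set of producers, through event brokers, to a set of consumers; consumers subscribe through a filtering engine. Because routing depends only on the XML content itself and not on where the information was published, this style of routing is commonly called content routing. However, existing content routing techniques still have problems in areas such as efficient matching algorithms and the handling of heterogeneous events.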
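     The Extensible Markup Language (XML), a standard for data representation and exchange, is self-describing, extensible, and well suited to exchanging heterogeneous data. This thesis takes XML as the event model and XPath as the multi-user subscription model to study several key techniques of content routing.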
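     This thesis proposes the HXFA, an automaton based on hedge grammars, to process XML publication stream events, and gives a filtering optimization algorithm for the HXFA together with an analysis of the algorithm's correctness. Multiple HXFAs are then merged to serve as the system's filtering engine. Experimental analysis of efficiency and scalability shows that the proposed method outperforms the well-known content filtering engine YFilter.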
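     The strengths and weaknesses of existing XML similarity models are analyzed. To address their shortcomings, the vector space model is extended into a hierarchical path model based on semantics and support, and its construction algorithm and complexity analysis are given. The model first mines the paths that occur frequently in the document collection, uses the semantic information in the documents to merge repeated nodes and paths, and at the same time reduces the dimensionality of the document feature vectors. Finally, a distance measure based on semantics and support is given. The method accounts for similarity in both the structural and the semantic information of XML documents. Compared with the tree edit distance model, every document has a "class prototype" description, and the time cost is considerably lower.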
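     Based on the H_path model, an XML document clustering method using improved particle swarm optimization is proposed. The document collection is first mapped into the problem space of the particle swarm model and then clustered with a particle swarm clustering method. Finally, weighing both time and accuracy, a hybrid particle swarm clustering method is proposed that improves the convergence and accuracy of the clustering.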
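     Although the proposed model already applies data reduction during extraction, the curse of dimensionality persists for redundant, heterogeneous XML documents. To address this, a pre-classification method based on independent component analysis is proposed. The method first reduces the dimensionality of the document matrix and then performs cluster analysis in the space spanned by the independent components. It has two advantages. First, it removes correlated redundancy, uncovers more discriminative features, and captures the latent data distribution as far as possible, which increases clustering accuracy. Second, by effectively reducing the dimensionality of the vector space, it greatly shrinks the search space and lowers the cost.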
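     Finally, an architecture for an XML publish/subscribe system that supports heterogeneous event processing is proposed. The system shows how the content routing techniques proposed in this work are applied.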

English Abstract

With the development of Internet technologies, many event-driven applications have emerged, such as content-based publish/subscribe systems, selective dissemination of information, content-based XML routing, and news distribution. In these applications, a stream of XML documents is sent from a set of data producers to a set of data consumers. Consumers subscribe to the data by means of filters and then receive a copy of every document that satisfies their filters. This style of routing is called content-based routing because documents are routed according to their content rather than a destination address. However, existing content-based routing technologies still have problems with efficient filtering and with support for heterogeneous events.
     XML has become the de facto standard for data exchange over the Internet because it is self-describing, extensible, and convenient for exchange. Taking XML documents as the published events and XPath expressions as the multi-user subscriptions, this thesis focuses on several key techniques of content routing.
     To process XML publishing events, a novel HXFA method is presented together with an optimized rewriting method, followed by a theoretical analysis. Thousands of HXFAs are then combined into a filtering engine. In experiments, the proposed method compares favorably with the YFilter engine in efficiency and scalability.
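
The abstract does not spell out the HXFA construction, so the following is only a minimal sketch of the general idea of automaton-based XPath filtering over an XML event stream. It is not the HXFA, which is based on hedge grammars. It assumes linear XPath subscriptions built from `/`, `//`, names, and `*`, and the names `parse_xpath`, `matching_subscriptions`, `DOC`, and `SUBS` are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def parse_xpath(expr):
    """Split a linear XPath such as '/a//b/*' into (axis, label) steps."""
    steps, i = [], 0
    while i < len(expr):
        if expr.startswith("//", i):
            axis, i = "descendant", i + 2
        elif expr.startswith("/", i):
            axis, i = "child", i + 1
        else:
            raise ValueError(f"unsupported XPath: {expr!r}")
        j = expr.find("/", i)
        j = len(expr) if j == -1 else j
        if j == i:
            raise ValueError(f"empty step in XPath: {expr!r}")
        steps.append((axis, expr[i:j]))
        i = j
    return steps


def matching_subscriptions(xml_text, subscriptions):
    """Return the names of all subscriptions matched by one XML document.

    Each subscription is simulated as an NFA whose state k means
    "the first k steps have been matched". A stack keeps the active
    state sets for every open element, so the document is read once.
    """
    compiled = {name: parse_xpath(x) for name, x in subscriptions.items()}
    matched = set()
    stack = [{name: {0} for name in compiled}]  # state sets above the root
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "end":
            stack.pop()
            continue
        current = {}
        for name, steps in compiled.items():
            nxt = set()
            for k in stack[-1][name]:
                if k == len(steps):
                    continue  # already accepting, nothing left to match
                axis, label = steps[k]
                if label in ("*", elem.tag):
                    nxt.add(k + 1)  # this element satisfies step k
                if axis == "descendant":
                    nxt.add(k)  # '//' may skip this element
            if len(steps) in nxt:
                matched.add(name)
            current[name] = nxt
        stack.append(current)
    return matched


DOC = ("<news><sport><title>x</title></sport>"
       "<finance><item><title>y</title></item></finance></news>")
SUBS = {
    "alice": "/news/sport/title",
    "bob": "//item/title",
    "carol": "/news/title",
    "dave": "/news/*/title",
}
print(sorted(matching_subscriptions(DOC, SUBS)))  # ['alice', 'bob', 'dave']
```

Engines such as YFilter go further by sharing common prefixes across subscriptions in one combined automaton. This sketch keeps one small automaton per subscription for clarity.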
     The advantages and shortcomings of existing XML similarity models are analysed. On this basis, the thesis extends the Vector Space Model and proposes a novel H_path model that uses an ontology and supports, and gives its construction algorithm and complexity analysis. The approach first extracts the frequent sequences from the document collection as features and then uses the ontology to judge the semantic relationships between the tags of XML documents. Furthermore, the model merges repeated nodes and paths according to these semantic features. Finally, a distance calculation based on supports is put forward. Compared with the tree edit script model, this model not only provides a description of each document but also has an advantage in time cost.
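
The pipeline described above (path extraction, semantic merging, support-based pruning, distance) might be sketched as follows. This is an illustration under stated assumptions, not the thesis's H_path algorithm. The `SYNONYMS` table is a hypothetical stand-in for an ontology lookup, and the support-weighted Jaccard distance is an assumed measure chosen for simplicity.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for an ontology: maps a tag to a canonical synonym.
SYNONYMS = {"writer": "author"}


def label_paths(xml_text, synonyms=SYNONYMS):
    """Return the set of root-to-node tag paths of one document.

    Using a set collapses repeated sibling paths, and the synonym
    lookup merges tags that are assumed to mean the same thing.
    """
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        path = prefix + "/" + synonyms.get(node.tag, node.tag)
        paths.add(path)
        stack.extend((child, path) for child in node)
    return paths


def frequent_paths(doc_paths, min_support):
    """Keep paths whose support (fraction of documents) reaches min_support."""
    counts = Counter(p for paths in doc_paths for p in paths)
    n = len(doc_paths)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


def distance(paths_a, paths_b, support):
    """Support-weighted Jaccard distance over the frequent paths only."""
    a = paths_a & set(support)
    b = paths_b & set(support)
    union = sum(support[p] for p in a | b)
    if union == 0:
        return 0.0  # neither document contains any frequent path
    shared = sum(support[p] for p in a & b)
    return 1.0 - shared / union


docs = [
    "<book><title/><author/><author/></book>",
    "<book><title/><writer/><year/></book>",
    "<article><title/></article>",
]
doc_paths = [label_paths(d) for d in docs]
support = frequent_paths(doc_paths, min_support=0.5)
print(sorted(support))
print(distance(doc_paths[0], doc_paths[1], support))
print(distance(doc_paths[0], doc_paths[2], support))
```

Pruning infrequent paths is what keeps the feature vectors short: here `/book/year` and the `article` paths each occur in only one of three documents and are dropped before any distance is computed.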
     Based on the H_path model, a clustering method using improved Particle Swarm Optimization (PSO) is given. First, the document collection is mapped into the problem space of the particle model. Then the CIP method is applied for clustering. Furthermore, weighing time against accuracy, a hybrid PSO-based clustering method is applied to XML documents to improve the convergence and accuracy of the clustering.
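
To make the mapping from documents to particles concrete, here is a minimal sketch of plain global-best PSO clustering, in which each particle encodes a full set of k centroids and the fitness is the mean distance from each point to its nearest centroid. It does not reproduce the thesis's improved or hybrid variants or the CIP method. The function names and the parameter values `w`, `c1`, and `c2` are assumptions (commonly used defaults), and `points` stands in for the document feature vectors.

```python
import math
import random


def quantization_error(centroids, points):
    """Mean distance from each point to its nearest centroid (lower is better)."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)


def pso_cluster(points, k, n_particles=15, iters=60,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    """Cluster points with global-best PSO; returns (centroids, labels, history)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # A particle's position is k centroids, each started at a random data point.
    pos = [[list(rng.choice(points)) for _ in range(k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_fit = [quantization_error(p, points) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    history = [gbest_fit]  # best fitness after each iteration
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # inertia + pull toward personal best + pull toward global best
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            fit = quantization_error(pos[i], points)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], fit
        history.append(gbest_fit)
    labels = [min(range(k), key=lambda j: math.dist(p, gbest[j])) for p in points]
    return gbest, labels, history


points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.3), (5.3, 5.0)]
centroids, labels, history = pso_cluster(points, k=2)
print(labels, round(history[-1], 3))
```

A hybrid scheme of the kind the abstract mentions would typically combine such a swarm search with a cheaper local refinement step; which combination the thesis uses is not stated here.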
     Although the H_paths are already dimension-reduced to some extent when they are extracted from a large collection of heterogeneous documents, the curse of dimensionality remains. To address this problem, a novel preprocessing strategy is proposed. Independent Component Analysis is applied to reduce the dimensionality of the document matrix, and the document vectors are then clustered in the reduced Euclidean space spanned by the independent components. The method has two merits: first, it removes correlated redundancy and finds the underlying latent variables of XML structures, which improves clustering quality; second, it reduces dimensionality and thereby compresses the search space at low cost.
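
For orientation, the standard ICA formulation behind this preprocessing step can be written as below. The symbols are assumptions introduced for illustration (the abstract names none of them): X is the centred document matrix with m path features and n documents, k is the number of retained independent components, and s_i is the column of S that represents document i.

```latex
% Linear mixing model: the document matrix is explained by k latent components.
X \approx A S, \qquad A \in \mathbb{R}^{m \times k},\quad S \in \mathbb{R}^{k \times n},\quad k \ll m

% ICA estimates an unmixing matrix W so that the rows of S are
% as statistically independent as possible.
S = W X, \qquad W \in \mathbb{R}^{k \times m}

% Documents are then clustered by Euclidean distance in the reduced space.
d(i, j) = \lVert s_i - s_j \rVert_2
```

Each document is thus described by k coordinates instead of m, which is where the reduction in search space comes from.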
     Finally, the architecture of a publish/subscribe system is presented, which shows how the proposed key techniques work together in such a system.

