Research on Techniques of Mining Frequent XML Patterns (XML数据频繁模式挖掘技术研究)

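English title: Research on Techniques of Mining Frequent XML Patterns
Author: Bei Yijun (贝毅君)
Thesis level: Doctoral
Discipline: Computer Science and Technology
Keywords (translated from the Chinese): XML data mining ; frequent patterns ; tag sequence mining ; query subtree mining ; changing structure mining ; clustering ; query caching
English keywords: XML ; data mining ; frequent patterns ; tag sequence mining ; query patterns ; changing structure ; clustering ; query caching
Degree year: 2008
Advisors: Dong Jinxiang (董金祥) ; Chen Gang (陈刚)
Discipline code: 081203
Degree-granting institution: Zhejiang University
Thesis submission date: 2008-04-01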

Abstract (translated from the Chinese)

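XML is simple, structured, extensible, interoperable, open, general-purpose, and flexible, and it has therefore been widely adopted in data exchange, data integration, data publishing, data storage, data management, knowledge management, information retrieval, and many other fields. The rapid growth of XML data urgently calls for data mining techniques that can process it effectively. Traditional data mining techniques, however, mainly handle structured data in relational databases or data warehouses and cannot solve the problem of mining XML data with its complex hierarchical structure.
     Research on mining frequent patterns from XML data is still at an early stage. Although researchers have proposed some frequent pattern mining algorithms for XML data, XML data is structurally variable, irregular, and lacks a fully fixed schema, and it contains many kinds of minable structures. As a result, there is still no unified, abstract model that describes the frequent pattern mining process for XML data. Building on a study of the structural model characteristics and representations of XML data, this thesis proposes a unified, abstract framework for mining frequent XML patterns. On the basis of this framework, it discusses and studies frequent tag sequence mining, offline mining of frequent query subtrees, online mining of frequent query subtrees, and mining of changing structures across historical document versions:
     Frequent XML tag sequence mining for XML document clustering
     Following a divide-and-conquer approach, the thesis proposes a concept-lattice-based algorithm for mining frequent XML tag sequences. The algorithm partitions the XML tag data into disjoint equivalence classes by common prefix sequence and obtains the frequent tag sequences by running the mining process within each equivalence class separately. Building on tag sequence mining, the thesis studies XML document clustering based on frequent tag sequences. The technique represents document features by frequent tag sequences; by considering containment between tag sequences and introducing properties such as tag path length and the contiguity of a tag path within the XML document, it improves the accuracy of XML document similarity estimation and the quality of clustering.
     Offline mining of frequent XML query subtrees for XML query caching
     After analyzing the structural characteristics of XML queries, the thesis proposes BUXMiner, a bottom-up frequent query subtree mining algorithm based on a global tree view, and BUMXMiner, an algorithm for mining maximal frequent query subtrees. With the global tree view constructed, the frequency of a candidate subtree can be obtained directly from the view and no longer depends on scanning the XML document dataset. Drawing on the frequent query subtree mining algorithm, the thesis presents an XML query framework based on frequent query subtrees. To handle XML query trees that are similar but not identical, the query system introduces four kinds of relationships between XML query trees and gives a rewriting procedure for similar queries. Extensive experiments show that BUXMiner outperforms earlier query subtree mining algorithms, and that caching based on frequent queries achieves better XML query efficiency than the traditional caching policies LRU and MRU.
     Online mining of frequent XML query subtrees based on a sliding window
     By introducing the sliding window model, the thesis proposes an online algorithm for mining frequent query subtrees from XML query data streams. The algorithm uses a global trie as the buffer structure that manages and maintains the stream data in the cache pool, and uses bottom-up traversal based on prefix equivalence classes to quickly generate all rooted query subtrees and standard query subtrees. Experimental results show that the algorithm mines quickly with stable memory consumption and can process XML query data streams effectively and steadily.
     Mining changing structures across historical XML document versions based on the two-bitmap structure B-DOM
     After studying the problem of mining dynamic XML data and a framework for mining changing structures across XML versions, the thesis proposes DXSM, a dynamic changing structure mining algorithm that extracts frequently changing structures and, based on them, frequent insertion structures and frequent deletion structures. The two-bitmap structure B-DOM, which stores and manages information about dynamic changes in the data, solves the problem of extracting the various changing structures. Experimental results show that the algorithm extracts changing structure information from XML version sequences quickly and effectively.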

Abstract (author's English version)

Owing to its simplicity, extensibility, interoperability, openness, and flexibility, XML is widely used in areas such as data exchange, data integration, data distribution, data storage, data management, knowledge management, and information retrieval. Discovering patterns in XML data has recently become an active research topic. However, traditional data mining approaches mainly handle structured data and may not adequately address the problem of mining semi-structured patterns.
     Recently, the research community has seen growing interest in extracting useful information or patterns from XML databases. Researchers have proposed a few algorithms to mine various frequent patterns from XML data. However, because XML data is irregular and complex, many kinds of frequent patterns remain to be mined from it. Moreover, there is still no uniform and abstract model that describes the frequent pattern mining process for XML data.
     In this thesis, we first study the model and representation of XML data, and then propose a uniform and abstract framework for mining frequent patterns from various XML data. Based on this framework, we present our study of the following mining problems: (1) mining frequent XML tag sequences, (2) offline and online mining of frequent XML query patterns, and (3) mining frequently changing structures from historical XML versions:
     Research on mining frequent XML tag sequences for XML document clustering
     Using the idea of "divide and conquer", we propose a frequent XML tag sequence miner based on lattice theory. We partition the database into equivalence classes based on common prefix sequences and mine frequent patterns within each equivalence class. We then propose an XML clustering algorithm that uses the mined tag sequences. When clustering, we not only take the containment between the documents and the sequences into consideration, but also handle sequence characteristics such as the length, the frequency, and the continuity of the sequence in the original documents. In this way, we improve both the similarity estimates between XML documents and the clustering quality.
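The prefix-based partitioning described above can be illustrated with a small sketch. It is not the thesis's lattice-based miner: it uses plain prefix projection (in the style of PrefixSpan) as a stand-in, and it assumes each document has already been flattened into a single tag sequence. The function name `mine_frequent_tag_sequences` and the sample documents are hypothetical.

```python
from collections import defaultdict


def mine_frequent_tag_sequences(docs, min_support):
    """Return {tag_sequence: support} for every subsequence (order kept,
    gaps allowed) that occurs in at least `min_support` documents.

    Assumption: each document is one list of tag names.
    """
    results = {}

    def grow(prefix, projected):
        # `projected` holds (doc_index, start) pairs: the part of each
        # document that follows the leftmost occurrence of `prefix`.
        # Every call therefore works inside one prefix equivalence class.
        first_pos = defaultdict(dict)  # tag -> {doc_index: earliest position}
        for doc_index, start in projected:
            seq = docs[doc_index]
            for pos in range(start, len(seq)):
                first_pos[seq[pos]].setdefault(doc_index, pos)
        for tag, occurrences in sorted(first_pos.items()):
            if len(occurrences) < min_support:
                continue  # infrequent extensions are pruned with their class
            new_prefix = prefix + (tag,)
            results[new_prefix] = len(occurrences)
            grow(new_prefix, [(d, p + 1) for d, p in occurrences.items()])

    grow((), [(i, 0) for i in range(len(docs))])
    return results


# Hypothetical input: three documents flattened to tag sequences.
sample_docs = [
    ["book", "title", "author"],
    ["book", "author"],
    ["article", "title"],
]
frequent = mine_frequent_tag_sequences(sample_docs, min_support=2)
```

With the sample input, `frequent` holds `("author",)`, `("book",)`, `("book", "author")`, and `("title",)`, each with support 2. Classes that share no prefix never interact, which is what makes the divide-and-conquer split possible.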
     Research on mining frequent XML query patterns for XML caching
     We analyze the structural characteristics of XML queries and propose a tree mining algorithm named BUXMiner for finding XML query patterns. BUXMiner employs an efficient bottom-up approach, which enumerates all candidate trees over a compact global tree guide and computes the support of the candidate trees on that tree guide. In addition, we propose a mining approach called BUMXMiner, based on BUXMiner, to discover the maximal frequent XML query patterns. We apply our XML query pattern mining algorithm to a caching prototype system to evaluate the query performance improvement. Our experiments show that the proposed algorithm outperforms previous ones in terms of efficiency. Furthermore, the caching scheme that uses the mined frequent query patterns is more efficient than traditional caching policies such as LRU and MRU.
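A minimal sketch of how a merged global tree can answer support queries without rescanning the individual queries is given below. It is not BUXMiner itself: it assumes child edges only (no wildcards or descendant axes) and distinct sibling labels, so that a rooted subtree is fully determined by its set of root-to-node label paths. The class name `GlobalQueryTree` and the sample queries are hypothetical.

```python
class GlobalQueryTree:
    """Merged view of many query trees, indexed by root-to-node label path."""

    def __init__(self):
        self.ids_by_path = {}  # label path (tuple) -> ids of queries containing it
        self.query_count = 0

    def add_query(self, query):
        """Merge one query, given as a nested (label, [children]) tuple."""
        query_id = self.query_count
        self.query_count += 1
        stack = [((query[0],), query[1])]
        while stack:
            path, children = stack.pop()
            self.ids_by_path.setdefault(path, set()).add(query_id)
            for label, grandchildren in children:
                stack.append((path + (label,), grandchildren))
        return query_id

    def support(self, paths):
        """Count the stored queries that contain every path in `paths`."""
        id_sets = [self.ids_by_path.get(tuple(p), set()) for p in paths]
        if not id_sets:
            return 0
        return len(set.intersection(*id_sets))


# Hypothetical queries over a book catalogue.
tree = GlobalQueryTree()
tree.add_query(("book", [("title", []), ("author", [("name", [])])]))
tree.add_query(("book", [("title", []), ("price", [])]))
tree.add_query(("book", [("author", [("name", [])])]))

# Candidate subtree with two branches, described by its leaf paths.
candidate = [("book", "title"), ("book", "author", "name")]
candidate_support = tree.support(candidate)
```

Under these assumptions the support of a candidate is the size of the intersection of the id sets stored at its leaves, so each support computation touches only the global structure.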
     Research on online mining of frequent XML query patterns in streams
     We propose an online algorithm to mine frequent XML query patterns from an XML query stream. We employ the sliding window model to bound the memory that stores the sampled XML queries. To maintain the query stream in the pool, we introduce a data structure named the global trie. To generate sub-queries, we enumerate each query from bottom to top based on the concept of prefix equivalence classes. Experiments show that our approach can process XML query streams effectively and stably.
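The combination of a sliding window and a trie can be sketched as follows. This is only an illustration under simplifying assumptions, not the thesis's algorithm: each arriving query is represented by the set of its root-to-node label paths, the window holds a fixed number of queries, and only path counts (not full subtrees) are maintained. The names `TrieNode` and `SlidingWindowPathCounter` are hypothetical.

```python
from collections import deque


class TrieNode:
    def __init__(self):
        self.count = 0      # queries in the window that contain this path
        self.children = {}  # label -> TrieNode


class SlidingWindowPathCounter:
    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()  # path sets of the queries currently in the window
        self.root = TrieNode()

    def _update(self, paths, delta):
        for path in paths:
            node = self.root
            for label in path:
                node = node.children.setdefault(label, TrieNode())
            node.count += delta

    def add(self, paths):
        """Insert a new query and evict the oldest one if the window is full."""
        paths = set(map(tuple, paths))
        self.window.append(paths)
        self._update(paths, +1)
        if len(self.window) > self.window_size:
            self._update(self.window.popleft(), -1)

    def frequent(self, min_support):
        """Return {path: count} for paths seen in at least `min_support` queries."""
        result = {}
        stack = [((), self.root)]
        while stack:
            path, node = stack.pop()
            if path and node.count >= min_support:
                result[path] = node.count
            for label, child in node.children.items():
                stack.append((path + (label,), child))
        return result


# Hypothetical stream of three queries with a window of two.
counter = SlidingWindowPathCounter(window_size=2)
counter.add([("a",), ("a", "b")])
counter.add([("a",), ("a", "c")])
counter.add([("a",), ("a", "c")])  # evicts the first query
hot_paths = counter.frequent(min_support=2)
```

For brevity the sketch leaves trie nodes in place after their count drops to zero; an implementation that needs strictly bounded memory would prune them on eviction.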
     Research on frequently changing structure mining from XML historical versions using the two-bitmap structure B-DOM
     We first study the dynamic XML structure mining problem and describe the mining framework for XML changing structures, as well as the criteria for measuring frequently changing patterns. We then introduce the concepts of frequent insertion structures and frequent deletion structures. Based on these concepts, we present an approach for mining frequently changing structures from different versions of XML documents. We propose a two-bitmap structure called B-DOM to store and manage the change information of the XML data, and we extract frequently changing structures from the B-DOM. Experimental results show that our algorithm is both efficient and effective.
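The abstract does not describe the internal layout of B-DOM, so the following is only a sketch of the general two-bitmap idea under stated assumptions: each version is a set of node paths, every node path gets one bitmap for insertions and one for deletions, and bit i is set when the node changed between version i and version i+1. The function names and the change-ratio criterion are hypothetical.

```python
def build_change_bitmaps(versions):
    """Map each node path to (insert_bits, delete_bits) over version transitions.

    Assumption: `versions` is a list of sets of node paths, oldest first.
    Python integers serve as bitmaps.
    """
    bitmaps = {}
    for i in range(1, len(versions)):
        previous, current = versions[i - 1], versions[i]
        bit = 1 << (i - 1)
        for path in current - previous:  # node appears in this transition
            ins, dele = bitmaps.get(path, (0, 0))
            bitmaps[path] = (ins | bit, dele)
        for path in previous - current:  # node disappears in this transition
            ins, dele = bitmaps.get(path, (0, 0))
            bitmaps[path] = (ins, dele | bit)
    return bitmaps


def frequently_changing(bitmaps, transitions, min_ratio):
    """Return {path: change_ratio} for paths changed in at least
    `min_ratio` of the version transitions."""
    result = {}
    if transitions <= 0:
        return result
    for path, (ins, dele) in bitmaps.items():
        changes = bin(ins | dele).count("1")
        ratio = changes / transitions
        if ratio >= min_ratio:
            result[path] = ratio
    return result


# Hypothetical history of four versions of one document.
history = [
    {"/a", "/a/b"},
    {"/a"},
    {"/a", "/a/b", "/a/c"},
    {"/a", "/a/c"},
]
change_bitmaps = build_change_bitmaps(history)
volatile = frequently_changing(change_bitmaps, len(history) - 1, min_ratio=0.5)
```

Keeping insertions and deletions in separate bitmaps lets the same structure answer three questions (frequently changing, frequently inserted, frequently deleted) with bitwise operations and bit counts.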
