Research on Algorithm of Graph-Based Data Mining


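English title: Research on Algorithm of Graph-Based Data Mining
Author: Tang Dequan (唐德权)
Degree level: Master's
Discipline: Computer Software and Theory
Chinese keywords: 数据挖掘 ; 子图同构 ; 规范化编码 ; 嵌入集 ; 频繁子图挖掘
English keywords: data mining ; subgraph isomorphism ; canonicalization ; embedding set ; frequent subgraph
Degree year: 2007
Supervisor: Xia Youming (夏幼明)
Discipline code: 081202
Degree-granting institution: Yunnan Normal University
Submission date: 2007-05-20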

Abstract

Compared with ordinary data, graphs can express richer semantics, and they are widely used in scientific research and in many commercial fields. A graph can describe the intricate relationships among things in the world. In social network analysis, for example, the relationships between people, and between people and things, are complex. By abstraction, an entire society can be turned into a network topology graph in which each person is a node and each relationship between people is an edge, so the analysis of social groups naturally becomes the mining of social network structure. In biotechnology, frequent subgraph mining can reduce the cost of the structure-matching experiments that biologists carry out on protein and gene structures. It is also widely applied in Web mining, spatial data mining, and the design of drug molecular formulas and the prediction of their functions. Research on frequent subgraph mining algorithms therefore has both theoretical significance and practical value. At the same time, this rich semantics increases the complexity of the data structure and the difficulty of mining interesting substructures of graphs, so graph theory has to be combined with a range of data mining techniques.
     The work in this thesis consists mainly of the following parts: (1) After analyzing current definitions of frequent subgraph mining, and observing that support reflects what the elements of a database have in common while frequency reveals their individual characteristics, a definition of frequent subgraph mining based on both support and frequency is given. (2) Building on subgraph isomorphism theory and on the idea behind testing two graphs for isomorphism (if two graphs are isomorphic, their canonical codes must be equal), an effective canonical coding algorithm for graphs is proposed, which avoids the difficulty posed by the NP-completeness of subgraph isomorphism. (3) Because current techniques generate many redundant candidate subgraphs, a new candidate generation method is proposed. Its two operations, FSubgraphM-Join and FSubgraphM-Extension, produce all candidate subgraphs without generating redundant ones, so that candidate support can be computed correctly and invalid subgraph matchings during counting are reduced. (4) Using the concept of an embedding set, the support and frequency of a candidate subgraph are computed efficiently from the way the candidate's canonical adjacency matrix (CAM) embeds in the CAM of a graph. (5) Based on graph canonicalization theory, the candidate generation technique, and the support computation method, the frequent subgraph mining algorithm FSubgraphM is proposed, which can effectively mine frequent induced subgraphs from a graph database.
     Experiments show that FSubgraphM can effectively mine frequent induced subgraph structures from the carcinogen database and extract interesting association knowledge from the resulting set of frequent structures, which is of theoretical and practical value.
     This thesis addresses three key problems in frequent subgraph mining: (1) a new canonical code is proposed for deciding whether one graph is isomorphic to another, which the thesis treats as the subgraph isomorphism problem; (2) new join and extension operators are proposed to solve the problem of generating candidate subgraphs; (3) the embedding set concept is introduced, and embedding sets are computed in combination with the join and extension operations, to solve the problem of counting frequent subgraphs.
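
As a rough illustration of two ideas in the abstract (isomorphic graphs share a canonical code, and support is counted through embedding sets), the sketch below uses a brute-force approach on small labeled graphs. It is not FSubgraphM: the abstract does not give the thesis's coding scheme or its join and extension operators, so none of that is reproduced here. The graph representation (a list of vertex labels plus a dict from `frozenset` vertex pairs to edge labels), the function names, the `"0"` filler for absent edges, and the reading of support as "number of database graphs containing the pattern" and frequency as "total number of embeddings" are all assumptions made for this example.

```python
from itertools import combinations, permutations


def canonical_code(labels, edges):
    """Largest code over all vertex orderings (brute force, exponential time).

    Assumes a simple undirected graph without self-loops, and that "0" is
    never used as an edge label.
    """
    codes = []
    for order in permutations(range(len(labels))):
        code = []
        for i in range(len(order)):
            # Row i of the lower triangle: edge entries, then the vertex label.
            for j in range(i):
                code.append(edges.get(frozenset((order[i], order[j])), "0"))
            code.append(labels[order[i]])
        codes.append(tuple(code))
    return max(codes)


def induced(labels, edges, vertices):
    """Induced subgraph on `vertices`, renumbered 0..k-1."""
    index = {v: i for i, v in enumerate(vertices)}
    sub_edges = {}
    for pair, label in edges.items():
        u, v = tuple(pair)
        if u in index and v in index:
            sub_edges[frozenset((index[u], index[v]))] = label
    return [labels[v] for v in vertices], sub_edges


def embedding_set(pattern, graph):
    """Vertex subsets of `graph` whose induced subgraph matches `pattern`."""
    p_labels, p_edges = pattern
    g_labels, g_edges = graph
    target = canonical_code(p_labels, p_edges)
    return {
        subset
        for subset in combinations(range(len(g_labels)), len(p_labels))
        if canonical_code(*induced(g_labels, g_edges, subset)) == target
    }


def support_and_frequency(pattern, database):
    """Assumed definitions: graphs containing the pattern, total embeddings."""
    sizes = [len(embedding_set(pattern, g)) for g in database]
    return sum(1 for s in sizes if s > 0), sum(sizes)


# Hypothetical toy data: "s" marks a single bond.
pattern = (["C", "C"], {frozenset((0, 1)): "s"})
chain = (["C", "C", "C"], {frozenset((0, 1)): "s", frozenset((1, 2)): "s"})
ether = (["C", "O"], {frozenset((0, 1)): "s"})
print(support_and_frequency(pattern, [chain, ether, pattern]))  # (2, 3)
```

Comparing canonical codes replaces an explicit search for a vertex mapping, which is the property the abstract relies on. The brute-force enumeration here only makes that property visible on toy inputs; a practical miner would need the incremental candidate generation and embedding maintenance that the thesis describes.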

引文

[AAP2000]R.Agarwal, C.Aggarwal, and V.V.V.Prasad. A tree projection algorithm for generation of frequent itemsets. In Journal of Parallel and Distributed Computing,2000.

    [AIS1993]R.Agrawal, T.Imielinski, and A.Swami. Mining association rules between sets of items in large databases. In Proc. 1993 ACM-SIGMOD Int. Conf. Management of Data, 207-216,1993.

    [AS1994] Agrawal R.and Srikant R.,Fast Algorithms for Mining Association Rules,Porceedings of the 1994 International Conference on Very Large Database(VLDB'94), September 1994.

    [AS1995]R.Agrawal and R.Srikant. Mining sequential patterns. In Proc. 1995 Int.Conf.Data Engineering,3-14,1995.

    [ATH2000]Akihiro Inokuchi, Takashi Washio, Hiroshi Motoda: An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data. PKDD 2000: 13-23

    [BB2002] Borgelt C. and Berthold M.R., Mining Molecular Fragments: Fining Relevant Substructures of Molecules,Proceedings of IEEE the 2002 International Conference on Data Mining(ICDM'O2), 2002.

    [BSJ1992] Bayada D.M., Simpson R.W. and Johnson A.P., An Algorithm for the Multiple Common Subgraph Problem, Journal of Chemical Information and Computer Science, 1992.

    [C1995] W. W. Cohen. Fast e_ective rule induction. In Proc. ICML 1995, pages 115-123.Morgan Kaufmann, 1995.

    [CH1994] Cook D.J. and Holder L.B., Substructure Discovery Using Minimum Description Length and Background Knowledge, Journal of Artificial Intelligence Research, 1994.

    [CH2000] Cook D.J. and Holder L.B., Graph-based Data Mining.IEEE Intelligent System,2000.

    [CHD1995] Cook D.J, Holder L.B. and Djoko S., Knowledge Discovery from Structural Data,Journal of Intelligent Information Systems, 1995.

    [CMH2003]D.J.Cook,N.Manocha,and L.B.Holder,Using a graph - based data mining system to perform web search [J],International Journal of Pattern Recognition and Artificial Intelligence,vol. 17,no.5,705 -720,2003.

    [CWJY2004]Chen Wang, Wei Wang, Jian Pei, Yongtai Zhu, Baile Shi: Scalable mining of large disk-based graph databases. KDD 2004: 316-325
    [FCYTW2005]Chen Wang. Yongtai Zhu, Tianyi Wu, Wei Wang. Baile Shi: Constraint-Based Graph Mining in Large Database. APWeb 2005: 133-144.

    [D1998] L. Dehaspe. Frequent Pattern Discovery in First-Order Logic. K. U. Leuven, Dec. 1998.

    [DKK2003] M. Deshpande, M. Kuramochi, and G. Karypis. Frequent sub-structure-based approaches for classifying chemical compounds. In Proc. ICDM-03, pages 35-42,2003.

    [DTK1998] Dehaspe L., Toivonen H. and King R.D., Fining Frequent Substructures in Chemical Compounds, Proceedings of ACM SIGKDD the 1998 International Conference on Knowledge Discovery in Database(KDD'98), 1998.

    [FPSU1996] Fayyad U.M.,Piatetsky-Shapiro G.,Smyth P.and Uthurusamy R.,Advances in Knowledge Discovery and Data Mining,Cambridge,MA:AAAI/MIT Press, 1996.

    [FW1994] J. Furnkranz and G. Widmer. Incremental reduced error pruning. In W. W. Cohen and H. Hirsh, editors, Proc. ICML 1994, pages 70-77. Morgan Kaufmann, 1994.

    [GC2002] Ghazizadeh S.and Chawathe S.,SEuS:Structures Extraction Using Summaries,Proceedings of the 2002 International Conference on Discovery Science, 2002.

    [GHC2003] J. A. Gonzalez, L. B. Holder, and D. J. Cook. Experimental comparison of graphbased relational concept learning with inductive logic programming systems. In Proc. ILP 2003, volume 2583 of Lecture Notes in Arti_cial Intelligence, pages 84-100. Springer-Verlag,2003.

    [GMYMW2003] W. Geamsakul, T. Matsuda, T. Yoshida, H. Motoda, and T. Washio.Constructing a decision tree for graph structured data. In Proc. MGTS 2003, pages 1-10, 2003.

    [HBB2003]H.Hofer,C.Borgelt,and M.R.Berthold,Large scale mining of molecular fragments with wildcards,in Advances in Intelligent Data Analysis V,ser.Lecture Notes in Computer Science (LNCS).Springer Verlag,2003,no.2810,380- 389.

    [HCD1994] Holder L.B., Cook D.J. and Djoko S.,Substructure Discovery in the Subdue System,Proceedings of ACM SIGKDD the 1994 International Conference on Knowledge Discovery in Database(KDD'94),July 1994.

    [HK2000] Han J.and Kamber M., Data Mining:Concepts and Techniques, Morgan Kaufmann Publishers,2000.

    [HKR2002]C.Helma,S.Kramer,and L.de Raedt,The molecular feature miner molfea.in Proceedings of the Beilstein - Institut Workshop,M.G.Hicks and C.Kettner,Eds.,May 2002.
    [HPY2000] Han J., Pei J. and Yin Y.,Mining Frequent Patterns without Candidate Generation,Proceedings of ACM SIGMOD the 2000 International Conference on Management of Data(SIGMOD'00),May 2000.

    [HPY2004] Han Jia-Wei, Pei Jian,Yan Xi—Fleng. From Sequential Pattern Mining to Structured Pattern Mining : A Pattern-Growth Approach. Journal of Computer Science and Technology.2004, 19(3): 257

    [HWB2001] T. Horvath, S. Wrobel, and U. Bohnebeck. Relational instance-based learning with lists and terms. Machine Learning, 43(l/2):53-80, Apr. 2001.

    [HWL2004]Han Jia-Wei, Wang Jianyong, Lu Ying. Petre Tzvetkov. Minging Top-K Frequent Closed Pattern s without Minimum Support. http: //140.116.247.141 /～tsengsm / COURSE/ DM / Paper / DMPAPER—LIST-04. HTM

    [HWPY2004] Huan J., Wang W, Prins J. and Yang J., Spin: Mining Maximal Frequent Subgraphs from Graph Database, Proceedings of ACM SIGKDD the 2004 International Conference on Data Mining and Knowledge Discovery(KDD'04),August 2004.
    [IK2003]. A. Inokuchi and H. Kashima. Mining significant pairs of patterns from graph structures with class labels. In Proc. ICDM-03, pages 83-90, 2003.

    [IM2003]Inokuchi A, Motod a T H. Complette Mining of Frequent Patterns from Graphs: Mining Graph Data. In: Machine Learning, 2003. 321— 354

    [INM2002]Inokuchi A, Nishimura T K, Motod a H. A Fast Algorithm for Mining Frequent Connected Subgraph IBM Research. Tokyo Research Imboratory, 2002

    [IWM2000] Inokuchi A., Washin T. and Motoda H., An Apriori-based Algorithm for Mining Frequent Substructures from Graph Data, Proceedings of the 2000 Europe Conference on Principle of Data Mining and Knowledge Discovery(PKDD'00), September 2000.
    [IWM2001]Inokuchi A, Washio T, Motoda H. Applying the Apriori—based Graph Mining Method to Mutagenesis Data Analysis. Journal of Computer Aided Chemistry , 2001, 2: 87—92

    [IWM2003] A. Inokuchi, T. Washio, and H. Motoda. Complete mining of frequent patterns from graphs: Mining graph data. Machine Learning, 50(3):321-354, 2003.

    [IWNM2002] Inokuchi A., Washin T., Nishimura K. and Motoda H., A Fast Algorithm for Mining Frequent Connected Subgraphs, Research Report RT-0448, IBM Tokyo Research Lab,2002.

    [JL1994]J.Cook and L.Holder. Substructure discovery using minimun description length and background knowledge. .[.Artificial Intel. Research,1:231-255,1994.

    [JWJJ2004] Jun Huan, Wei Wang, Jan Prins, Jiong Yang: SPIN: mining maximal frequent subgraphs from graph databases. KDD 2004: 581-586

    [KGS2004] Koyuturk M, Grama A, Szpankowski W. An Eficient Algorithm for Detecting Frequent Subgraphs in Biological Networks Biolnformatics Supp. In: 12th Intl. Conf. Intelligent Systems Molecular Biology(ISMB04) vol.20, PP.i200a- i207

    [KH1995] K.Koperski and J.Han. Discovery of spatial association rules in geographic infoormation databases. In Proc.4~(th) Int.Symp.Large Spatial Databases,47-66,1995.

    [KK2001] Kuramochi M. and Karypis G, Frequent Subgraph Discovery. Proceedings of IEEE the 2001 International Conference on Data Mining(ICDM'01), November 2001.

    [KK2004] Kuramochi M. and Karypis G, Finding Frequent Patterns in a Large Sparse Graph.Proceedings of SAM the 2004 International Conference on Data Mining(SDM'04), 2004.

    [KMSS1996] R. D. King, S. Muggleton, A. Srinivasan, and M. J. E. Sternberg. Structureactivity relationships derived by machine learning: The use of atoms and their bond connectivities to predict mutagenicity by inductive logic programming. Proc. of the National Academy of Sciences,93:438-442, 1996.

    [KRH2001] S. Kramer, L. De Raedt, and C. Hehna. Molecular feature mining in HIV data.In F.Provost and R. Srikant, editors, Proc. KDD-01, pages 136{143, New York,2001. ACM Press.

    [LH1999] L. Dehaspe and H. Toivonen. Discovery of frequent datalog patterns. Data Mining and Knowledge Discovery,3(1):7-36,1999.

    [LHF1998] H.Lu,J.Han, and L.Feng. Stock movement and n-dimensional intertransaction association rules. In Pro.1998 SIGMOD Workshop on Research Issue on Data Mining and Knowledge Discovry,pages 12:1 -12:7,1998.

    [M1981] McKay B.D.,,Practical Graph Isomorphism,Congressus Numerantium,1981.

    [M1995] S. Muggleton. Inverting entailment and Progol. In Machine Intelligence, volume 14,pages 133-188. Oxford University Press, 1995.

    [M2004]T.Meinl,Erweiterte Fragmentsuche in Molek¨uldatenbanken,diploma thesis,Computer Science Department 2,University of Erlangen-Nuremberg,August 2004.
    [MG2001]Kuramochi,M.,Karypis,G.:Frequent Subgraph Discovery,Proceedings of the 2001IEEE International Conference on Data Mining(ICDM'01),November 2001.
    [MG1902]Kuramochi,M.,Karypis,G.:An Efficient Algorithm for Discovery Frequent Subgraphs.Technical Report TR02-026,Department of Computer Science/Army HPC Research Center University of Minnesota,2002.
    [MHZW2003]Mingsheng Hong,Haofeng Zhou,Wei Wang,Baile Shi:An Efficient Algorithm of Frequent Connected Subgraph Extraction.PAKDD 2003:40-51
    [MTV1994]H.Mannila,H.Toivonen,and A.I.Verkamo.Efficient algorithms for discovering association rules.In Proc.AAAI'94 Workshop Knowledge Discovery in Databases,181-192,1994.
    [MY1997]R.J.Miller and Y.Yang.Association rules over interval data.In Proc.1997ACM-SIGMOD Int.Conf.Management of Data,452-461,1997.
    [NK2004]S.Nijssen and J.N.Kok,A quickstart in frequent structure mining can make a difference,LIACS,Leiden University,Leiden,The Netherlands,Tech.Rep.,April 2004.
    [PBTL1999]N.Pasquier,Y.Bastide,R.Taouil,and L.Lakhal.Discovering frequent closed itemsets for association rules.In Proc.7~(th) Int.Conf.Database Theory,398-416,1999.
    [PCY1995]J.S.Park,M.S.Chen,and P.S.Yu.An effective hash-based algorithm for mining association rules.In Proc.1995 ACM-SIGMOD Int.Conf.Management of Data,175-186,1995.
    [PF1991]Piatetsky-Shapiro G and Frawley W.J.,Knowledge Discovery in Database,Cambridge,MA:AAAI/MIT Press,1991.
    [PHM2000]J.Pei,J.Han,and R.Mao.CLOSET:An efficient algorithm for mining frequent closed itemsets.In Proc.2000 ACM-SIGMOD Int.Workshop Data Mining and Knowledge Discovery,11-20,2000.
    [Q1990]J.R.Quinlan.Learning logical definitions from relations.Machine Learning,5:239-266,1990.
    [QYHW2002]Qingqing Yuan,Yubo Lou,Haofeng Zhou,Wei Wang,Baile Shi:Extract Frequent Pattern from Simple Graph Data.WAIM 2002:158-169
    [S1989]Sridharan N.S.,Editor,Proceedings of the 1989 International Joint Conference on Artificial Intelligence,August 1989.
    [RD1997]Ron Shamir,Dekel Tsur:Faster subtree isomorphism.ISTCS 1997:126-131
    [RD1999]Ron Shamir,Dekel Tsur:Faster Subtree Isomorphism.J.Algorithms 33(2):267-280 (1999)
    [SF1996]Scott Fortin.The Graph Isomorphism Problem.Technical Report TR 96-20,Department of Computing Science,University of Alberta,1996.
    [SJ2003]Siegfried Nijssen,Joost N.Kok:Efficient Frequent Query Discovery in FARMER.PKDD 2003:350-362.
    [SKB1999]A.Srinivasan,R.D.King,and D.W.Bristol.An assessment of ILP-assisted models for toxicology and the PTE-3 experiment.In S.D_zeroski and P.Flach,editors,Proc.ILP-99,volume 1634 of LNAI,pages 291 {302.Springer-Verlag,1999.
    [SMSK1996]A.Srinivasan,S.Muggleton,M.E.Sternberg,and R.D.King.Theories for mutagenicity:a study of first-order and feature based induction.A.I.Journal,85:277-299,1996.
    [TH2003]Takashi Washio,Hiroshi Motoda:State of the art of graph-based data mining.SIGKDD Explorations 5(1):59-68(2003)
    [TSS1987]Takahashi Y.,Satoh Y.and Sasaki S.,Recognition of the Largest Common Fragment among a Variety of Chemical Structures,Analytical Sciences,1987.
    [U1976]Ullmann J.R.,An Algorithm for Subgraph Isomorphism,Journal of ACM,1976.
    [UK2004]U.R uckert and S.Kramer,Frequent free tree discovery in graph data[C],ACM symposium on Applied computing.2004,564-570.
    [VGS2002]Vanetik N.,Gudes E.and Shimony S.E.,Computing Frequent Graph Patterns from Semi-structured Data,Proceedings of IEEE the 2002 International Conference on Data Mining(ICDM'02),2002.
    [W1988]D.Weininger.SMILES,a chemical language and information system 1.Introduction and encoding rules.J.Chem.Inf.Comput.Sci.,28:31-36,1988.
    [WM2003]Washio T,Motoda H.State of the Art of Graph-based Data Mining.ACM SIGKDD Explorations Newsletter,2003,5(1):59-68.
    [XJ2002]Xifeng Yan,Jiawei Han:gSpan:Graph-Based Substructure Pattern Mining.ICDM 2002:721-724
    [XJ2003]Xifeng Yan,Jiawei Han:CloseGraph:mining closed frequent graph patterns.KDD 2003:286-295
    [YH2002]Yan X.and Han J.,Graph-based Substructure Patterns Mining,Proceedings of IEEE the 2002 International Conference on Data Mining(ICDM'02),December 2002.
    [YH2003]Yan X.and Han J.,Closegraph::Mining Closed Frequent Graph Patterns,Proceedings of ACM SIGKDD the 2003 International Conference on Data Mining and Knowledge Discovery(KDD'03),August 2003.
    [YMI1994]Yoshida K.,Motoda H.and Indurkhya N.,Graph-based Induction as a Unified Learning Framework,Journal of Applied Intelligence,1994.
    [YYR2003]Yun Chi,Yirong Yang,Richard R.Muntz:Indexing and Mining Free Trees.ICDM 2003:509-512
    [YYYR1904]Yun Chi,Yirong Yang,Yi Xia,Richard R.Muntz:CMTreeMiner:Mining Both Closed and Maximal Frequent Subtrees.PAKDD 2004:63-73
    [Z2002]M.Zaki.Efficiently mining frequent trees in a forest.In D.Hand,D.Keim,and R.Ng,editors,Proc.KDD-02,pages 71-80.ACM Press,2002.
    [ZJWB2005]Zijing Tan,Jianjun Xu,Wei Wang,Baile Shi:Storing Normalized XML Documents in Normalized Relations.CIT 2005:123-129
    [AAP20001R.Agarwal, CAggarwal, and V.V.V.Prasad. A tree projection algorithm for generation of frequent iterasets. In Journal of Parallel and Distributed Computing,2000.

    [Aga+2000] Ramesh C. Agarwal, e al. Depth first generation of long patterns, KDD 2000,Boston,USA, 2000

    [Aga+2001] Ramesh C. Agarwal, et al. A tree projection algorithm for generation of frequent itemsets. J. of Parallel and Distributed Computing,2001,61 (3):350-371

    [AIS1993]R.AgrawaI, T.Imielinski, and A.Swami. Mining association rules between sets of items in large databases. In Proc. 1993 ACM-SIGMOD Int. Conf. Management of Data, 207-216, 1993.

    [AMSTV1996]R.Agrawal,H.Mannila,R.Srikant,H.Toivonen,and A.I.Verkamo.Fast Discovery of Association Rules.In U.F.et al,editor,Advances in Knowledge Discovery and Data Mining.MIT press, 1996.

    [AS1994a]R.Agrawal and R.Srikant. Fast algorithms for mining association rules in large databases.In Research Report RJ9839, IBM Almaden Research Center, 1994.

    [AS1994b] Agrawal R, Srikant R. Fast algorithm for mining association rules[C]. In:Proceedings of the 20th International Conference on VLDB.Santiago,1994.487-499.

    [AS1995]R.Agrawal and R.Srikant. Mining sequential patterns. In Proc. 1995 Int.Conf.Data Engineering,3-14,1995.

    [Asi2002]T.Asia,Efficient substructure discovery from large semi-structured data[C] .International Conference on Data Mining(SDM2002), Proceedings of the Second SLAM. 2002,158-174

    [ATH2000]Akihiro Inokuchi, Takashi Washio, Hiroshi Motoda: An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data. PKDD 2000: 13-23

    [B1998]R.J.Bayardo.Efficiently mining long patterns from databases.ln Proc. 1998 ACM -SIGMOD In.cof.Management of Data(SIGMOD'98),page85-93,Seattle,Washington, june 1998.

    [BCG2001]D.Burdick,M.Calimlim,and J.Gehrke.MAFIA: A maximal frequent itemset algorithm for transactional databases[C]. Proceedings of the 17th International Conference on Data Engineering,Heidelberg, Germany, 2001. 443-452

    [BMS1997]Brin S.,Motwani R.and Silverstein C.1997.Dynamic Itemset Counting and Implica -tion Rules for Market Basket Data.Proceedings of the ACM SIGMOD International Conference on the Management of Data.New York,pp.255-264.

    [Bril997+] Sergey Brin , et al.. Dynamic itemset counting and implication rules for market basket data SIGMOD 1997 , Tucson, USA ,1997

    [BW1998]Buchter O.Wirth R.1998 Discovery of Association Rules over Ordinal Data: A New and Faster Algorithm and Its Application to Market Basket Data. Proceedings of the Second Pacific-Asia Conference on Knowledge Discovery and Data Mining Melbourne Australia, pp. 36-47.

    [BWJ1997]C.Bcttini,X.Wang,S.Jajodia,Satisfiability of Quantitative Temporal Constraints with Multiple Granularities[C].Third International Conference on Principles and Practice of Constraint Programming (CP97),1997.435-449

    [BWJ1998a]C.Bettini,X.Wang,S.Jajodia, Mining Temporal Relationships with Multiple Granularities in Time Sequences[J],IEEE Data Engineering Bulletin, Volume 21 Number 1,1998,222-237

    [BWJ1998b]C.Bettini,X.Wang,S.Jajodia,A General Framework for Time Granularity and Its Application to Temporal ReasoningfJ], Annals of Mathematics and Artificial Intelligence, v.22 n.1-2,p.29-58, 1998

    [CH1994] J.Cook and L.Holder.Substructure discovery using minimum description length and background knowledge.J.Artificial Intel.Research, 1:231 -255,1994

    [CH2000] Diane J.Cook and Lawrence B.Holder.Graph-based Data Mining.IEEE Volume 15,Issue 2,Page(s):32-41, March-April 2000

    [CHD1996] D.J.Cook,L.B.Holder,and S.Djoko.Scalable discovery of informative structural concepts using domain knowledge.IEEE Expert, 11(15),1996

    [CNFF1996]Cheung D.W.,Ng V.T,Fu A.W and Fu Y.1996. Efficient Mining of Association Rules in Distributed Databases.IEEE Transactions on Knowledge and Data Engineering, 8(6):911-922.

    [CTTYX2004] Chen AL.Tang CJ,Tao HC.Yuan CA,Xie FJ.An improved algorithm based on maximum clique and FP-tree for mining association rules.Journal of Software,2004, 15(8): 1198—1207

    [CYM2003]Y.Chi,Y.Yang,and .R.R. Muntx. Index and mining free trees[C]. In Proceedings of the 2003 IEEE International Conference on Data Mining,,2003,509-512

    [CYM2004]Yun Chi,Yirong Yang,Richard R.Muntz.HybridTreeMiner:An Efficient Algorithm for Mining Frequent Rooted Trees and Free Trees Using Canonical Forms[C].IEEE Transactions on Knowledge and Data Engineering 2004,190-202
    [CYTW2005]Chen Wang,Yongtai Zhu,Tianyi Wu,Wei Wang,Baile Shi:Constraint-Based Graph Mining in Large Database.APWeb 2005:133-144
    [CW1999]CHEN Ying,WANG Neng-bin.Querying and Optimizing Semistructured Data,JOURNAL OF SOFTWARE,1999,10(8):883-890
    [CWJY2004]Chen Wang,Wei Wang,Jian Pei,Yongtai Zhu.,Baile Shi:Scalable mining of large disk-based graph databases.KDD 2004:316-325
    [DK2001]L.De Raedt and S.Kramer.The levelwise version space algorithm and its application to molecular fragment finding.In IJCAI'01:Seventeenth International Joint Conference on Artificial Intelligence,volume 2,pages:853-859,2001
    [DK1998]Lin D1,Kedem ZM.Pincer-Search:A new algorithm for discovering the maximum frequent set[C].In:Schek HJ,ed.Proceedings of the 6th European Conference on Extending Database Technology.Heidelberg:Springer-Verlag,1998.105-119.
    [DLM1998]G.Das,K.Lin,H.Mannila,G.Renganathan,P.Smyth:Rule Discovery from Time Series[C].In proceedings of 4th International Conference on Knowledge Discovery and Data Mining 1998,16-22.
    [DT1999]L.Dehaspe and H.Toivonen.Discovery of frequent datalog patterns.Data Mining and Knowledge Disc.very,3(1):7-36,1999.
    [F1996]S.Fortin.The graph isomorphism problem.Technical Report TR96-20,Department of Computing Science,University of Alberta,1996.
    [FFLP2005]Facca,Federico Michele,Lanzi,PierLuca.Mining interesting knowledge from web logs:a survey.Data & Knowledge Engineering,2005,53(3):225-241
    [GC2002]Ghazizadeh S,Chawathe S.SeuS:Structure Extraction using Summaries.In:Proc.of the 5th Intl.Conf.on Discovery Science.2002
    [GRS1999]M.Garofalakis,R.Rastogi,and K.Shim.Spirit:Sequential pattern mining with regular expression constraints[C].Proceedings of the 25th International Conference on Very Large Data Bases,1999.223-234
    [GW2000]Geoffrey 1.Webb.Efficient search for association rules.In KDD-2000 Boston,MA August,2000
    [HCHC1998]Hilderman R.J.,Carter C.,Hamilton H.J.and Cercone N, 1998.Mining Association Rules from Market Basket Data using Share Measures and Characterized Itemsets.International Journal of Artificial Intelligence Tools.7(2):189-220.

    [HCD1994] Holder L, Cook D,Djoko S. Substructure discovery in the SUBDUE system. In: Proc. of the Workshop on Knowledge Discovery in Databases, 1994.169—ISO

    [HP2000]J Han and J.Pei. Mining Frequent Patterns by Pattern-Growth: Methodology and Implications[C].ACM SIGKDD Explorations SIGKDD Explorations,2000 2(2): 14-20

    [HPY2000]J.Han,J.Pei, and Y.Yin. Mining frequent patterns without candidate generation. In Proc.2000 ACM-SIGMOD Int.Conf.Management of Data,1-12,2000.

    [HWPY2004] Huan Jun, Wang Wei, Prins Jan, Yang Jiong. SPIN: Mining Maximal Frequent Subgraphs from Graph Databases: [UNC Technical Report TR04—018]. 2004

    [IM2003] Inokuchi A. Motoda T H. Complette Mining of Frequent Patterns from Graphs: Mining Graph Data In: Machine Learning , 2003. 321-354

    [INM2002] Inokuchi A. Nishimura T K, Motoda H. A Fast Algorithm for Mining Frequent Connected Subgraph IBM Research. Tokyo Research Laboratory, 2002

    [IWM2002] Inokuchi A, Washio T. Motoda H. An apriori-based algorithm for mining frequent substructures from graph data. In: Proc. of the 4th European Conf. on Principles and Practice of Data Mining and Knowledge Discovery (PKD 2000),2000. Proc of the 8th ACM SIGKDD intl. conf. on Knowledge discovery and data mining . ACM press,2002. 13-23

    [IK2003] A.Inokuchi,H-Kashima: Mining Significant pairs of patterns from graphs Structures with Class Labels:ICDM 2003:83-90,2003.

    [JHC2002] I.Jonyer,L.Holder,and D.Cook.Concept formation using graph grammars.In Workshop Notes:MRDM 2002 Workshop on Multi-Relational Data Mining,pages:71-792,2002.

    [JL1994]J.Cook and L.Holder. Substructure discovery using minimun description length and background knowledge. J.Artificial Intel. Research,1:231-255,1994.

    [JM1998]John Punin, Mukkai Krishnamoorthy. WWW Pal System—A System for Analysis and Synthesis of Web Pages. WebNet 98 Conference, 1998

    [JM2002]Jiawei Han & Micheline Kamber.Data Mining Concepts and Techniques.Bei Jing: China Machine Press,2002

    [JPMMZ2001]John R. Punin, Mukkai S. Krishnamoorthy, Mohammed J. Zaki.LOGML: LogMarkup Language forWeb UsageMining.WWW10 Conference,2001
    [JWJJ2004]Jun Huan,Wei Wang,Jan Prins,Jiong Yang:SPIN:mining maximal frequent subgraphs from graph databases.KDD 2004:581-586
    [KH1995]K.Koperski and J.Han.Discovery of spatial association rules in geographic infoormation databases.In Proc.4~(th) Int.Symp.Large Spatial Databases,47-66,1995.
    [KK2001]Kuramochi M,Karypis G.Frequent Subugraph discovery.In:Proc.2001 Int.Co nf.Data Mining(ICDM01),San Jose,CS,Nov.2001.313-320
    [KK2002]Kuramochi M,Karypis G.An Efficient Algorithm for Discovering Frequent Subgraphs.Technical Report 02-026,Department of Computer Science/Army HPC Research Center,University of Minnesota,2002.
    [KK2004]Michihiro Kurarnochi George Karypis.Finding Frequent Patterns in a Large Sparse Graph:In:Proc.of the 2004 SIAM Data Mining Conf.2004
    [LH1999]L.Dehaspe and H.Toivonen.Discovery of frequent datalog patterns.Data Mining and Knowledge Discovery,3(1):7-36,1999.
    [LHF1998]H.Lu,J.Han,and L.Feng.Stock movement and n-dimensional intertransaction association rules.In Pro.1998 SIGMOD Workshop on Research Issue on Data Mining and Knowledge Discovry,pages 12:1-12:7,1998.
    [LL2001]LU Song-feng,LU Zheng-ding.Fast Mining Maximum Frequent Itemsets.JOURNAL OF SOFTWARE 2001 Vol.12 No.2 P.293-297.
    [LZJ2004]LI Li,ZHAI Dong-hai,JIN Fan.Graph-Based Algorithm for Mining Frequent Closed Itemsets.JOURNAL OF SOUTHWEST JIAOTONG UNIVERSITY,2004,39(3):385-389
    [Man+1997]Heikki Mannila,et al..Search and borders of theories in knowledge discovery.Data Mining and Knowledge Discovery,1997,1(3):241-258
    [MG2001]Kuramochi,M.,Karypis,G.:Frequent Subgraph Discovery,Proceedings of the 2001IEEE International Conference on Data Mining(ICDM01),November 2001.
    [MHZW2003]Mingsheng Hong,Haofeng Zhou,Wei Wang,Baile Shi:An Efficient Algorithm of Frequent Connected Subgraph Extraction.PAKDD 2003:40-51
    [MT1996]H.Mannila and H.Toivonen.Discovering generalized episodes using minimal occurrences.In 2nd Intl.Conf.Knowledge Discovery and Data Mining,pages:146-151 1996.
    [MPE2001]MRDM'01:Workshop multi-relational data mining.In conjunction with PKDD'01 and ECML'01,2002.http://www.kiminkii.com/mrdm/.

    [MTI1997JMannila, H., Toivonen, H., Inkeri Verkamo, A.: Discovery of Frequent Episodes in Event Sequences, Data Mining and Knowledge Discovery, 1(3), 1997.

    [MTV1994]H.Mannila, H.Toivonen, and A.I.Verkamo. Efficient algorithms for discovering association rules. In Proc. AAAI'94 Workshop Knowledge Discovery in Databases, 181-192, 1994.

    [MTV1997]R.J.Miller and Y.Yang. Association rules over interval data. In Proc. 1997 ACM-SIGMOD Int.Conf.Management of Data, 452-461,1997.

    [NYRL1999]Nicolas Pasquier,Yves Bastide,Rafik Taouil and Lotfl.Efficient ming of associ -ation rules using closed itemset lattices.In Information System Vol.24.no.I. pp.25-46,1999.

    [Par+1995] Jong Soo Park , et al. An effective Hash based algorithm for mining association rules.SIGMOD1995,San Jose.USA, 1995

    [PBTL1999]N.Pasquier,Y.Bastide,R.Taouil, and L.Lakhal. Discovering frequent closed itemsets for association rules. In Proc.7~(th) Int.Conf.Database Theory,398-416,1999.

    [PCY1995]J.S.Park, M.S.Chen, and P.S.Yu. An effective hash-based algorithm for mining association rules. In Proc.1995 ACM-SIGMOD Int.Conf.Management of Data, 175-186,1995.

    [PE1997] Mike Perkowitz , Oren Etzioni. Adaptive sites : Automatically learning from user access patterns. WWW'97 , Santa Clara, 1997

    [Pei+2001a]J.Pei , et al. H-Mine: Hyper-structure mining of frequent patterns in large databases.ICDM'01, San Jose, CA, 2001

    [Pei+2001b] J.Pei, et al. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth ICDE'01, Heidelberg,2001

    [PF1998] Povinelli,R.J.and Feng,X.,Temporal Pattern Identification of Time Series Data Using Pattern Wavelets and Genetic Algorithms[M], Artificial Neural Networks in Engineering, St.Louis,Missouri, 691-696.

    [PH2000] J.Pei and J.Han. Can We Push More Constraints into Frequent Pattern Mining?[C]. In Proceedings of conference on Knowledge Discovery and Data Mining.KDD 2000: 350-354

    [PH2001] J.Pei and J.Han.PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth[C]. Proceedings of the 17th International Conference on Data Engineering.IEEE Computer Society Press, 2001.215- 224

    [PHL2001]Jian Pei, Jiawei Han, Laks V.S.Lakshmanan. Mining Frequent Itemsets with Convertible Constraints. In Proc. 2001 ACM-SIGMOD. Submitted.

    [PHM2000]J.Pei, J.Han, and R.Mao. CLOSET: An efficient algorithm for mining frequent closed itemsets. In Proc. 2000 ACM-SIGMOD Int.Workshop Data Mining and Knowledge Discovery, 11-20, 2000.

    [Pov1999]Povinelli, R.J. Time Series Data Mining: Identifying Temporal Patterns for Characterization and Prediction of Time Series Events[D], PhD Dissertation, Marquette University, Milwaukee, 1999

    [QYHW2002]Qingqing Yuan, Yubo Lou, Haofeng Zhou, Wei Wang, Baile Shi: Extract Frequent Pattern from Simple Graph Data. WAIM 2002: 158-169

    [RC1977]R.C.Read and D.G.Corneil. The graph isomorphism disease. Journal of Graph Theory, 1:339-363, 1977.

    [RK1996]Ruckert U, Kramer S. Frequent Free Tree Discovery in Graph Data, https://portal.acm.org/poplogin.cfm?dl=ACM&coll=GUIDE&comp_id=968018&want_href=delivery%2Ecfm%3Fid%313968018%26type%3Dpdf&CFID=33435631&CFTOKEN=519350&td=1102493452296

    [RK2001]L.De Raedt and S.Kramer. The levelwise version space algorithm and its application to molecular fragment finding. In IJCAI'01: Seventeenth International Joint Conference on Artificial Intelligence, volume 2, pages 853-859, 2001.

    [SAR1996]Srikant, R., Agrawal, R.: Mining Sequential Patterns: Generalizations and Performance Improvements, Proc. 5th Int.Conf.Extending Database Technology, EDBT'96, 1996.

    [SF1996]Scott Fortin. The Graph Isomorphism Problem. Technical Report TR 96-20, 1996.

    [SJ2003]Siegfried Nijssen, Joost N.Kok: Efficient Frequent Query Discovery in FARMER. PKDD 2003: 350-362

    [SK1997]Srinivasan, A., King, R.D., Muggleton, S.H. and Sternberg, M.J.E. 1997. The predictive toxicology evaluation challenge. In Proc. of the Fifteenth International Joint Conference on Artificial Intelligence, 4-9.

    [TH2003]Takashi Washio, Hiroshi Motoda: State of the art of graph-based data mining. SIGKDD Explorations 5(1): 59-68 (2003)

    [W1990]Wang Shu He. Graph Theory and Algorithm. China Science and Technology University Press, 1990.

    [WDX2004] Wang XY, Du XP, Xie KQ. Research on Implementation of the FP-growth Algorithm. Computer Engineering and Applications, Sep. 2004. 174-176

    [WHP+2004]Chen Wang, Mingsheng Hong, Jian Pei, Haofeng Zhou, Wei Wang, Baile Shi. Efficient Pattern-Growth Methods for Frequent Tree Pattern Mining[C]. In: Proceedings of 8th Pacific Asia Conference on Knowledge Discovery and Data Mining, 2004. 441-451

    [WL1997]K.Wang and H.Liu. Schema discovery for semistructured data[C]. In Proceedings of Conference on Knowledge Discovery and Data Mining. 1997, 271-274

    [WM2003]T.Washio and H.Motoda, State of the Art of Graph-based Data Mining, SIGKDD Explorations Special Issue on Multi-Relational Data Mining, Volume 5, Issue 1, 2003.

    [XJ2002]Xifeng Yan, Jiawei Han: gSpan: Graph-Based Substructure Pattern Mining. ICDM 2002:721-724

    [XJ2003] Xifeng Yan, Jiawei Han: CloseGraph: mining closed frequent graph patterns. KDD 2003:286-295

    [YH2002]Yan Xifeng, Han Jiawei. gSpan: Graph-based substructure pattern mining. [Technical Report UIUCDCS-R-2002-2296]. Department of Computer Science, University of Illinois at Urbana-Champaign, 2002

    [YH2003]Yan Xifeng, Han Jiawei. CloseGraph: Mining Closed Frequent Graph Patterns. In: Proc. of the ninth ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining. ACM Press, Aug. 2003. 286-295

    [YMI1994] K.Yoshida,H.Motoda,and N.Indurkhya.Graph-based induction as a unified learning framework.J.of Applied Intel.,4:297-328,1994

    [ZA2003] M.J.Zaki and C.C.Aggarwal. XRules: An effective structural classifier for XML data[C]. In Proceedings of the 2003 International Conference on Knowledge Discovery and Data Mining, 2003. 316-325

    [Zak2001]M.Zaki. SPADE: An efficient algorithm for mining frequent sequences[J]. Machine Learning, 2001. 40:31-60

    [Zak2002] M.J.Zaki. Efficiently mining frequent trees in a forest[C]. In Proceedings of 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002, 71-80.

    [ZJWB2005]Zijing Tan, Jianjun Xu, Wei Wang, Baile Shi: Storing Normalized XML Documents in Normalized Relations. CIT 2005: 123-129