图数据库查询处理技术的研究

英文题名：Research on Techniques for Query Processing on Graph Databases
作者：张硕
论文级别：博士
学科专业名称：计算机软件与理论
中文关键词：超图查询 ; 概率top-k子图匹配查询 ; 图统计信息 ; 子图同构 ; 不确定图
英文关键词：Supergraph Query ; Probabilistic Top-k Subgraph Matching Query ; GraphStastics ; Subgraph Isomorphism ; Uncertain Graph
学位年度：2010
导师：李建中
学科代码：081202
学位授予单位：哈尔滨工业大学
论文提交日期：2010-06-01

摘要

作为一种通用的数据结构,图可以用来表示数据对象之间的复杂联系。例如:图可以表示化合物的分子结构,蛋白质交互网络,社会网络等。随着科学与工程领域中图数据的大量出现和累积,图数据管理已成为数据管理领域一个重要和热点研究的子领域。图数据库查询处理是其中最重要的研究分支之一,其对图相关的绝大部分处理和应用(例如:图挖掘、化学数据库PubChem)起着基础支撑作用。本文主要对图数据库中的查询处理技术进行深入研究,归纳总结了现有研究成果的主要思想和优缺点,提出了一些新的图数据库查询处理方法,主要研究成果如下:
     1.提出一种图数据库中高效处理超图包含查询的新方法。新方法综合的从图数据库的压缩组织、构造有效的特征索引以及基于压缩组织来处理查询三个方面着手考虑问题。(1)在图数据库的压缩组织方面,提出图数据库的有效组织方法,以提高整体查询处理效率。现有的采用过滤-验证机制的方法将图数据库中的图逐个的独立存放。提出方法将图数据库中图结构化的压缩组织起来。通过压缩组织方法,产生一个逻辑数据结构GPTree,其中记录了数据库中图的公共子图的信息。为了优化的构造GPTree,形式化定义了最优诱导子图选择问题;证明了其是一个NP难问题,并提出了一个近似比为2的近似算法。(2)在构造有效的特征索引方面,提出高效而不依赖于历史查询的子图索引特征生成方法,以及两种索引结构CRGraph和FGPForest。首先基于分析,给出索引特征的显著性度量。提出了找出所有显著性不小于用户需求的索引特征的方法,即精确索引特征生成方法。为了适应需要更加快速的生成索引的应用场景,提出了特征索引构造的一个近似方法。这两种方法都是基于图模式挖掘的方法。为了高效使用索引特征,对索引特征进行排序;并且基于理论分析给出了求解其最优排序的算法。(3)在基于压缩组织来处理查询方面,提出从多个图到一个图的子图同构检测的新方法,称为GPTreeTest。现有方法逐个的考察每个图对进行检测,新方法能够利用压缩组织中公共子图的信息,显著减少对多个图的子图同构检测的总时间。最后,在真实数据集和合成数据集上的实验结果表明,提出方法比目前最好方法高效1至2个数据量级。
     2.提出不确定图数据库上概率top-k子图匹配查询的新问题、以及一种查询处理方法。首先给出不确定图数据模型,结合现实需求提出概率top-k子图匹配查询问题。一个顶点的邻居子图是由其距离不大于给定阈值内的所有顶点和边构成的子图。基于图结构空间相关性的特点,以附带概率信息的邻居子图为基础,设计一种有效的索引结构NG-Index。NG-Index索引可以很容易实现于成熟的关系数据库中,具有强健壮性。提出一种高效的基于搜索树的算法来进行查询处理。其中运用了一种概率剪枝技术来提高性能。最后通过实验考察并证实提出方法具有良好的效率和可扩展性。
     3.提出结合概念分层的图统计信息定义以及查询处理方法。具体地说,给出了结合顶点关联的概念分层,根据用户指定的搜索兴趣来高效地计算数据图中统计信息的方法。首先提出一种结合概念分层的图统计分布表示。本文将用户搜索兴趣建模为概念图,并以用户概念图的子图匹配计数为基础来表示图统计信息。其次,为了高效计算此统计分布信息,设计了一种基于子图密度的索引结构并提出两阶段的计算方法: (1)先基于索引快速地去除数据图中的不相关边并将数据图打散划分为若干小尺寸的连通图;(2)再对这些连通小图分别计算统计信息,最后合并得出结果。在连通小图上计算统计信息的核心是概念图的子图匹配计数问题。文中针对这个子问题着重提出两种高效算法:前向计算算法和后向计算算法。这种在精确计算之前将数据大图快速打散为多个小图的分治思想是总体效率提升的关键所在。最后,在真实数据集上的实验结果表明所提出方法具有良好的效率和可扩展性。
     4.提出了一种较大尺寸的标签图子图同构检测方法及其应用方法。所提出的检测方法是一种基于搜索的方法。本文从标签图的特性出发,以标签信息和图拓扑结构相结合的方式来缩减搜索空间。首先,将标签按照出现的频率比转换为数值。然后,将标签信息与结构相结合,来构造多组细粒度的顶点不变量。顶点不变量是关于顶点的固有属性,其在同构映射下保持不变。借助于所构造的细粒度的顶点不变量,将标签信息沿图拓扑结构传播开来,并缩减匹配顶点候选集来减小搜索空间。再次,基于顶点不变量生成了细粒度的剪枝条件。由于结合标签信息和拓扑结构,这些条件具有更强的剪枝能力。另外,将提出检测方法中的技术细节应用到第2章提出的GPTree结构上,来显示其可用来优化已有方法的适用性。最后实验结果表明,提出方法具有良好的高效性,同时应用新技术的GPTreeTest*算法效率优于原始方法GPTreeTest。
Graph is a general data structure in computer science, and can be used for modelingcomplex relationships between data objects. For example, graphs can model chemicalcompound structures, protein interaction networks and social networks etc.. With therapid proliferation of graph data in various domains, graph data management has becomean important research subfield in the data management domain. Query processing ongraph databases is one of the most important and fundamental research topics in graphdata management, as a large number of graph-related tasks and many applications, suchas graph mining and the PubChem chemical database, rely on processing queries in anefficient way. In this dissertation, we focus on the techniques for processing querieson graphs databases, review the current research results, formulate several new researchproblems and propose some novel approaches for efficiently query processing. The maincontributions of the dissertation are as follows:
     1. We propose a novel approach for efficiently processing supergraph queries ongraph databases. The proposed new approach involves the techniques of (1) compactlyorganizing graph databases, (2) constructing index by using significant subgraph fea-tures and (3) processing queries based on the compact organization and the index. (1)For graph query processing, many existing methods use the filtering-and-verification (orF&V) methodology, i.e. first fast filter some negative results to obtain a candidate set andthen verify each candidate. For processing supergraph queries, the existing F&V-basedmethod arranges the graphs in databases one by one separately. We propose to mergesome common subgraphs shared by multiple graphs and compactly organizing graphdatabases, which derives a logical data structure, namely GPTree. In order to form a goodGPTree, an optimization problem of optimal induced subgraph selecting is formulated.After proving it is NP-hard, we propose an approximation algorithm with ratio bound 2.(2) For constructing effective indices, two algorithms for generating significant indexedsubgraph features, which are independent of historical queries, are proposed. Based onthe generated indexed subgraph features, two indices CRGraph and FGPForest are de-vised. In particular, by analyzing the effectiveness of a set of subgraphs, a significancemeasure w.r.t. subgraph is defined. An algorithm for finding all the subgraphs with no less than a user-specified significance threshold is presented. Also, another algorithm forextracting an approximate set of the significant subgraphs is proposed, for the purpose offast index creating that is required by some real-life applications. In order to upgrade theeffectiveness of index, based on the mathematical analysis, we propose a method for op-timally ordering the extracted indexed subgraphs. (3) An algorithm, called GPTreeTest,for testing subgraph isomorphisms from multiple graphs in a GPTree to a query graph isproposed. This new algorithm can utilize the information of common subgraphs recordedin a GPTree and reduce the overall computational cost for subgraph isomorphism test-ing. The experimental results on both real and synthetic datasets show the new approachoutperform the existing counterpart by one to two orders of magnitude.
     2. We present a new problem of probabilistic top-k subgraph matching query onuncertain graph databases, and propose a query processing method. Firstly, a uncertaingraph data model is presented. Based on the data model, the formulation of probabilis-tic top-k subgraph matching query is introduced. The neighborhood subgraph w.r.t. avertex in a graph is the subgraph consisting of all the vertices with the distance no morethan a threshold and all the edges between these vertices. By using the characteristics ofspace-locality of structures in a graph, based on neighborhood subgraphs, we propose aneffective index structure, namely NG-Index, with probabilistic information in a data un-certain graph. NG-Index can be easily implemented in relational databases. In addition,based on NG-Index, we present an efficient search-tree-based algorithm with probabilis-tic pruning technique to search large uncertain graphs. Experiment results validate theefficiency and scalability of the proposed techniques.
     3. We present a new problem of querying graph statistics in the presence of concepthierarchy. We formulate the user search interest by a concept graph. Given a concepthierarchy, a user-issued concept graph, a solution to the problem aims at efficiently evalu-ating the statistics of a given data graph. Firstly, we present a definition of graph statistics,which is based on the number of subgraph matchings of user-issued concept graphs. Then,we devise an index structure based on subgraph density, and propose a two-phase evalu-ation method. In the first phase, we fast break apart the original data graph into a set ofsmall-size connected graphs; in the second phase, each small graph is evaluated and thenthe results are merged together. The core sub-problem involved in the second phase iscomputing the number of subgraph matchings of concept graphs. For this sub-problem, we propose two algorithms, i.e. forward computation and backward computation. At last,the experimental results on real datasets show the elegance of the proposed method.
     4. We propose a new search-based method for testing subgraph isomorphisms formoderate-size labeled graphs; and after that we apply the proposed new techniques inthe GPTree data structure and propose GPTreeTest* algorithm. We analyze and utilizethe characteristics of labeled graphs, and reduce the search space by integratedly usingthe label and structure information. Firstly, labels are converted into numbers based onlabel frequency. Secondly, we record a set of fine-grained vertex invariants. Vertex invari-ants are some inherent properties of the vertices that do not change across isomorphismmappings. With the help of these fine-grained vertex invariants, label information is prop-agated through the structure in graphs, which reduces the overall search space. Thirdly,we derive pruning conditions from the fine-grained vertex invariants. These conditionsare quite strong since the label and structure information are combined together. Throughapplying the proposed new techniques in GPTree, we show the proposed new techniquesare able to improve a number of tasks. The experimental results show the high efficiencyof the proposed testing method and show that GPTreeTest* outperforms the original GP-TreeTest algorithm.

引文

1 S. Brin, L. Page. The Anatomy of a Large-scale Hypertextual Web Search En-gine[J]. Computer Networks, 1998, 30(1-7):107–117.
    2 L. B. Holder, D. J. Cook. Graph-based Relational Learning: Current and FutureDirections[J]. ACM SIGKDD Explorations Newsletter, 2003, 5(1):90–93.
    3 D. Shasha, J. T.-L. Wang, R. Giugno. Algorithmics and Applications of Tree andGraph Searching[C]//Proceedings of the 21st ACM SIGACT-SIGMOD-SIGARTSymposium on Principles of Database Systems (PODS). Madison, Wisconsin,USA: ACM Press, 2002:39–52.
    4 Pubchem Structure Search. http://pubchem.ncbi.nlm.nih.gov/search/search.cgi.
    5 P. Hintsanen, H. Toivonen. Finding Reliable Subgraphs from Large ProbabilisticGraphs[C]//Proceedings, Part I, of the European Conference on Machine Learningand Knowledge Discovery in Databases (ECML/PKDD 2008). Antwerp, Belgium:Springer-Verlag, 2008:15.
    6 P. Hintsanen, H. Toivonen. Finding Reliable Subgraphs from Large ProbabilisticGraphs[J]. Data mining and knowledge discovery, 2008, 17(1):3–23.
    7 R. Milo, S. Itzkovitz, N. Kashtan, et al. Superfamilies of Evolved and DesignedNetworks[J]. Science, 2004, 303(5663):1538–1542.
    8 C. E. Tsourakakis. Fast Counting of Triangles in Large Real Networks withoutCounting: Algorithms and Laws[C]//Proceedings of the 8th IEEE InternationalConference on Data Mining (ICDM). Pisa, Italy: IEEE Computer Society Press,2008:608–617.
    9 L. Becchetti, P. Boldi, C. Castillo, et al. Efficient Semi-streaming Algorithms forLocal Triangle Counting in Massive Graphs[C]//Proceedings of the 14th ACMSIGKDD International Conference on Knowledge Discovery and Data Mining(KDD). Las Vegas, Nevada, USA: ACM Press, 2008:16–24.
    10 C. E. Tsourakakis, U. Kang, G. L. Miller, et al. Doulion: Counting Triangles inMassive Graphs with a Coin[C]//Proceedings of the 15th ACM SIGKDD Interna-tional Conference on Knowledge Discovery and Data Mining (KDD). Paris, France:ACM Press, 2009:837–846.
    11 T. Washio, H. Motoda. State of the Art of Graph-based Data Mining[J]. ACMSIGKDD Explorations Newsletter, 2003, 5(1):59–68.
    12 H. Jiang, H. Wang, P. S. Yu, et al. Gstring: A Novel Approach for Efficient Searchin Graph Databases[C]//Proceedings of the 23nd International Conference on DataEngineering (ICDE). Istanbul, Turkey: IEEE Computer Society Press, 2007:566–575.
    13 X. Yan, P. S. Yu, J. Han. Graph Indexing: A Frequent Structure Based Ap-proach[C]//Proceedings of the ACM SIGMOD International Conference on Man-agement of Data. Paris, France: ACM Press, 2004:335–346.
    14 X. Yan, P. S. Yu, J. Han. Graph Indexing Based on Discriminative FrequentStructure Analysis[J]. ACM Transactions on Database Systems (TODS), 2005,30(4):960–993.
    15 J. Cheng, Y. Ke, W. Ng, et al. Fg-index: Towards Verification Free Query Pro-cessing on Graph Databases[C]//Proceedings of the ACM SIGMOD InternationalConference on Management of Data. Beijing, China: ACM Press, 2007:857–872.
    16 S. Zhang, M. Hu, J. Yang. Treepi: A Novel Graph IndexingMethod[C]//Proceedings of the 23rd International Conference on Data Engi-neering (ICDE). Istanbul, Turkey: IEEE Computer Society Press, 2007:966–975.
    17 H. Shang, Y. Zhang, X. Lin, et al. Taming Verification Hardness: An EfficientAlgorithm for Testing Subgraph Isomorphism[J]. Proceedings of the VLDB En-dowment, 2008, 1(1):364–375.
    18 D. W. Williams, J. Huan, W. Wang. Graph Database Indexing Using Struc-tured Graph Decomposition[C]//Proceedings of the 23nd International Conferenceon Data Engineering (ICDE). Istanbul, Turkey: IEEE Computer Society Press,2007:976–985.
    19 P. Zhao, J. X. Yu, P. S. Yu. Graph Indexing: Tree + Delta >=Graph[C]//Proceedings of the 33rd International Conference on Very Large DataBases (VLDB). University of Vienna, Austria: ACM Press, 2007:938–949.
    20 H. He, A. K. Singh. Closure-tree: An Index Structure for GraphQueries[C]//Proceedings of the 22nd International Conference on Data Engineer-ing (ICDE). Atlanta, Georgia, USA: IEEE Computer Society Press, 2006:38–47.
    21 B. T. Messmer, H. Bunke. A Decision Tree Approach to Graph and SubgraphIsomorphism Detection[J]. Pattern Recognition (PR), 1999, 32(12):1979–1998.
    22 B. T. Messmer, H. Bunke. Efficient Subgraph Isomorphism Detection: A Decom-position Approach[J]. IEEE Transactions on Knowledge and Data Engineering,2000, 12(2):307–323.
    23 C. Chen, X. Yan, P. S. Yu, et al. Towards Graph Containment Search and Index-ing[C]//Proceedings of the 33rd International Conference on Very Large Data Bases(VLDB). University of Vienna, Austria: ACM Press, 2007:926–937.
    24 X. Yan, P. S. Yu, J. Han. Substructure Similarity Search in GraphDatabases[C]//Proceedings of the ACM SIGMOD International Conference onManagement of Data. Baltimore, Maryland, USA: ACM Press, 2005:766–777.
    25 X. Yan, F. Zhu, P. S. Yu, et al. Feature-based Similarity Search in Graph Struc-tures[J]. ACM Transactions on Database Systems (TODS), 2006, 31(4):1418–1453.
    26 X. Yan, F. Zhu, J. Han, et al. Searching Substructures with Superimposed Dis-tance[C]//Proceedings of the 22nd International Conference on Data Engineering(ICDE). Atlanta, Georgia, USA: IEEE Computer Society Press, 2006:88–97.
    27 S. Fortin. The Graph Isomorphism Problem[R]. Tech. rep., University of Alberta,1996.
    28 H. Bunke. Graph Matching: Theoretical Foundations, Algorithms, and Appli-cations[C]//Proceedings of the 13th International Conference on Vision Interface.Montre′al, Que′bec, Canada: The International Association for Pattern RecognitionPress, 2000:82–88.
    29 D. Conte, P. Foggia, C. Sansone, et al. Thirty Years of Graph Matching in PatternRecognition[J]. International Journal of Pattern Recognition and Artificial Intelli-gence (IJPRAI), 2004, 18(3):265–298.
    30 S. A. Cook. The Complexity of Theorem-proving Procedures[C]//Proceedings ofthe 3rd Annual ACM Symposium on Theory of Computing. Shaker Heights, Ohio,USA: ACM Press, 1971:151–158.
    31 D. G. Corneil, C. C. Gotlieb. An Efficient Algorithm for Graph Isomorphism[J].Journal of ACM, 1970, 17(1):51–64.
    32 J. R. Ullmann. An Algorithm for Subgraph Isomorphism[J]. Journal of ACM, 1976,23(1):31–42.
    33 S. Fortin. The Graph Isomorphism Problem[R]. Tech. Rep. TR96-20, Departmentof Computing Science, University of Alberta, 1996.
    34 L. P. Cordella, P. Foggia, C. Sansone, et al. A (sub)graph Isomorphism Algorithmfor Matching Large Graphs[J]. IEEE Transactions on Pattern Analysis and MachineIntelligence (TPAMI), 2004, 26(10):1367–1372.
    35 J. Han, M. Kamber. Data Mining: Concepts and Techniques[M]. 2nd ed., MorganKaufmann Publishers, 2006:535–590.
    36 A. Inokuchi, T. Washio, H. Motoda. An Apriori-based Algorithm for Mining Fre-quent Substructures from Graph Data[C]//Proceeding of the 4th European Confer-ence on Principles and Practice of Knowledge Discovery in Databases (PKDD).London, UK: Springer-Verlag, 2000:13–23.
    37 M. Kuramochi, G. Karypis. Frequent Subgraph Discovery[C]//Proceedings of the2001 IEEE International Conference on Data Mining (ICDM). San Jose, California,USA: IEEE Computer Society Press, 2001:313–320.
    38 C. Borgelt, M. R. Berthold. Mining Molecular Fragments: Finding Relevant Sub-structures of Molecules[C]//Proceedings of the 2002 IEEE International Confer-ence on Data Mining (ICDM). Maebashi City, Japan: IEEE Computer SocietyPress, 2002:51–58.
    39 X. Yan, J. Han. Gspan: Graph-based Substructure Pattern Mining[C]//Proceedingsof the 2002 IEEE International Conference on Data Mining (ICDM). MaebashiCity, Japan: IEEE Computer Society Press, 2002:721–724.
    40 C. Wang, W. Wang, J. Pei, et al. Scalable Mining of Large Disk-based GraphDatabases[C]//Proceedings of the 10th ACM SIGKDD International Conference onKnowledge Discovery and Data Mining (KDD). Seattle, Washington, USA: ACMPress, 2004:316–325.
    41 J. Huan, W. Wang, J. Prins. Efficient Mining of Frequent Subgraphs in the Pres-ence of Isomorphism[C]//Proceedings of the 3rd IEEE International Conference onData Mining (ICDM). Melbourne, Florida, USA: IEEE Computer Society Press,2003:549–552.
    42 S. Nijssen, J. N. Kok. A Quickstart in Frequent Structure Mining Can Make aDifference[C]//10th ACM SIGKDD International Conference on Knowledge Dis-covery and Data Mining. Seattle, WA, USA: ACM Press, 2004:549–552.
    43 X. Yan, J. Han. Closegraph: Mining Closed Frequent Graph Pat-terns[C]//Proceedings of the 9th ACM SIGKDD International Conference onKnowledge Discovery and Data Mining (KDD). Washington, DC, USA: ACMPress, 2003:286–295.
    44 M. Wo¨rlein. Extension and Parallelization of a Graph-mining-algorithm[D].Erlangen-Nu¨rnberg, Germany:Friedrich-Alexander-Universita¨t, 2006:1–76.
    45 J. Huan, W. Wang, J. Prins, et al. Spin: Mining Maximal Frequent Subgraphs fromGraph Databases[C]//Proceedings of the 10th ACM SIGKDD International Con-ference on Knowledge Discovery and Data Mining (KDD). Seattle, Washington,USA: ACM Press, 2004:581–586.
    46 L. T. Thomas, S. R. Valluri, K. Karlapalem. Margin: Maximal Frequent SubgraphMining[C]//Proceedings of the 6th IEEE International Conference on Data Mining(ICDM). Hong Kong, China: IEEE Computer Society Press, 2006:1097–1101.
    47 Y. Liu, J. Li, H. Gao. Summarizing Graph Patterns[C]//Proceedings of the 24th In-ternational Conference on Data Engineering (ICDE). Cancun, Mexico: IEEE Com-puter Society Press, 2008:903–912.
    48 L. B. Holder, D. J. Cook, S. Djoko. Substructures Discovery in the Subdue Sys-tem[C]//Knowledge Discovery in Databases: Papers from the 1994 AAAI Work-shop. Seattle, Washington, USA: AAAI Press, 1994:169–180.
    49 K. Yoshida, H. Motoda, , et al. Graph-based Induction as a Unified Learning Frame-work[J]. Journal of Applied Intelligence (JAI), 1994, 4(3):297–316.
    50 M. Kuramochi, G. Karypis. Finding Frequent Patterns in a Large SparseGraph[C]//Proceedings of the 4th SIAM International Conference on Data Mining(SDM). Lake Buena Vista, Florida, USA: Springer Netherlands, 2004:243–271.
    51 A. Srinivasan, R. D. King, S. Muggleton, et al. On Classification and Regres-sion[C]//Proceedings of the 7th International Workshop on Inductive Logic Pro-gramming (ILP). Prague, Czech Republic: Springer-Verlag, 1997:273–287.
    52 N. S. Ketkar, L. B. Holder, D. J. Cook. Comparison of Graph-based and Logic-based Multi-relational Data Mining[J]. ACM SIGKDD Explorations Newsletter,2005, 7(2):64–71.
    53 A. Inokuchi, T. Washio, H. Motoda. Complete Mining of Frequent Patterns fromGraphs: Mining Graph Data[J]. Machine Learning, 2003, 50(3):321–354.
    54 A. Inokuchi. Mining Generalized Substructures from a Set of LabeledGraphs[C]//Proceedings of the 4th IEEE International Conference on Data Mining(ICDM). Brighton, UK: IEEE Computer Society Press, 2004:415–418.
    55 M. Kuramochi, G. Karypis. An Efficient Algorithm for Discovering FrequentSubgraphs[J]. IEEE Transactions on Knowledge and Data Engineering, 2004,16(9):1038–1051.
    56 G. Ehud, S. E. Shimony, V. Natalia. Discovering Frequent Graph Patterns UsingDisjoint Paths[J]. IEEE Transactions on Knowledge and Data Engineering, 2006,18(11):1441–1456.
    57 C. Borgelt, T. Meinl, M. R. Berthold. Moss: A Program for Molecular SubstructureMining[C]//Proceedings of the 1st International Workshop on Open Source DataMining Software. Chicago, IL, USA: ACM Press, 2005:6–15.
    58 M. J. Zaki. Efficiently Mining Frequent Trees in a Forest[C]//Proceedings of the
    8th ACM SIGKDD International Conference on Knowledge Discovery and DataMining (KDD). Edmonton, Alberta, Canada: ACM Press, 2002:71–80.
    59 Y. Chi, Y. Yang, R. R. Muntz. Indexing and Mining Free Trees[C]//Proceedingsof the 3rd IEEE International Conference on Data Mining (ICDM). Melbourne,Florida, USA: IEEE Computer Society Press, 2003:509–512.
    60 D. Shasha, J. T.-L. Wang, S. Zhang. Unordered Tree Mining with Applications toPhylogeny[C]//Proceedings of the 20th International Conference on Data Engineer-ing (ICDE). Boston, USA: IEEE Computer Society Press, 2004:708–719.
    61 A. Termier, M.-C. Rousset, M. Sebag. Dryade: A New Approach for DiscoveringClosed Frequent Trees in Heterogeneous Tree Databases[C]//Proceedings of the4th IEEE International Conference on Data Mining (ICDM). Brighton, UK: IEEEComputer Society Press, 2004:543–546.
    62 R. Jin, C. J. Wang, D. Polshakov, et al. Discovering Frequent Topological Struc-tures from Graph Datasets[C]//Proceedings of the 11th ACM SIGKDD Interna-tional Conference on Knowledge Discovery and Data Mining (KDD). Chicago,Illinois, USA: ACM Press, 2005:606–611.
    63 T. Horvath, P. Ramon, S. Wrobel. Frequent Subgraph Mining in OuterplanarGraphs[C]//Proceedings of the 12th ACM SIGKDD International Conference onKnowledge Discovery and Data Mining (KDD). Philadelphia, PA, USA: ACMPress, 2006:797–802.
    64 Y. Ke, J. Cheng, W. Ng. Correlation Search in Graph Databases[C]//Proceedingsof the 13th ACM SIGKDD International Conference on Knowledge Discovery andData Mining (KDD). San Jose, California, USA: ACM Press, 2007:390–399.
    65 Y. Ke, J. Cheng, W. Ng. Efficient Correlation Search from Graph Databases[J].IEEE Transactions on Knowledge and Data Engineering, 2008, 20(12):1601–1615.
    66 S. Raghavan, H. Garcia-Molina. Representing Web Graphs[C]//Proceedings ofthe 19th International Conference on Data Engineering (ICDE). Bangalore, India:IEEE Computer Society Press, 2003:405–416.
    67周水庚,蔚赵春,蒋豪良.图结构数据搜索的概念、问题与进展[J].中国计算机学会通讯, 2007, 3(8):59–65.
    68 R. Saito, H. Suzuki, Y. Hayashizaki. Interaction Generality, a Measurement toAssess the Reliability of a Protein-protein Interaction[J]. Nucleic Acids Research,2002, 30(5):1163–1168.
    69 Z. Zou, J. Li, H. Gao, et al. Frequent Subgraph Pattern Mining on Uncertain GraphData[C]//Proceeding of the 18th ACM conference on Information and knowledgemanagement (CIKM). Hong Kong, China: ACM Press, 2009:583–592.
    70邹兆年,李建中,高宏,等.从不确定图中挖掘频繁子图模式[J].软件学报,2009, 20(11):2965–2976.
    71 N. N. Dalvi, D. Suciu. Efficient Query Evaluation on ProbabilisticDatabases[C]//Proceedings of the 30th International Conference on Very LargeData Bases (VLDB). Toronto, Canada: Morgan Kaufmann, 2004:864–875.
    72 N. N. Dalvi, D. Suciu. Efficient Query Evaluation on Probabilistic Databases[J].VLDB J., 2007, 16(4):523–544.
    73 C. Re, N. N. Dalvi, D. Suciu. Efficient Top-k Query Evaluation on ProbabilisticData[C]//Proceedings of the 24th International Conference on Data Engineering(ICDE). Istanbul, Turkey: IEEE Computer Society Press, 2007:886–895.
    74 M. A. Soliman, I. F. Ilyas, K. C.-C. Chang. Top-k Query Processing in UncertainDatabases[C]//Proceedings of the 24th International Conference on Data Engineer-ing (ICDE). Istanbul, Turkey: IEEE Computer Society Press, 2007:896–905.
    75 X. Lian, L. Chen. Probabilistic Ranked Queries in UncertainDatabases[C]//Proceedings of the 11th International Conference on Extend-ing Database Technology (EDBT). Nantes, France: ACM Press, 2008:511–522.
    76 M. Hua, J. Pei, W. Zhang, et al. Ranking Queries on Uncertain Data: A ProbabilisticThreshold Approach[C]//Proceedings of the ACM SIGMOD International Confer-ence on Management of Data. Vancouver, BC, Canada: ACM Press, 2008:673–686.
    77 G. Cormode, F. Li, K. Yi. Semantics of Ranking Queries for Probabilistic Dataand Expected Ranks[C]//Proceedings of the 25th International Conference on DataEngineering (ICDE). Shanghai, China: IEEE Computer Society Press, 2009:305–316.
    78 M. A. Soliman, I. F. Ilyas. Ranking with Uncertain Scores[C]//Proceedings ofthe 25th International Conference on Data Engineering (ICDE). Shanghai, China:IEEE Computer Society Press, 2009:317–328.
    79 J. Pei, B. Jiang, X. Lin, et al. Probabilistic Skylines on UncertainData[C]//Proceedings of the 33rd International Conference on Very Large DataBases (VLDB). University of Vienna, Austria: ACM Press, 2007:15–26.
    80 G. Beskales, M. A. Soliman, I. F. Ilyas. Efficient Search for the Top-k ProbableNearest Neighbors in Uncertain Databases[J]. Proceedings of the VLDB Endow-ment, 2008, 1(1):326–339.
    81 R. Cheng, L. Chen, J. Chen, et al. Evaluating Probability Threshold K-nearest-neighbor Queries Over Uncertain Data[C]//Proceedings of the 12th InternationalConference on Extending Database Technology (EDBT). Saint Petersburg, Russia:ACM Press, 2009:672–683.
    82 B. Kimelfeld, Y. Kosharovsky, Y. Sagiv. Query Efficiency in Probabilistic XmlModels[C]//Proceedings of the ACM SIGMOD International Conference on Man-agement of Data. Vancouver, BC, Canada: ACM Press, 2008:701–714.
    83 L. Chang, J. X. Yu, L. Qin. Query Ranking in Probabilistic XmlData[C]//Proceedings of the 12th International Conference on Extending DatabaseTechnology (EDBT). Saint Petersburg, Russia: ACM Press, 2009:156–167.
    84 Y. Tian, R. C. McEachin, C. Santos, et al. Saga: A Subgraph Matching Tool forBiological Graphs[J]. Bioinformatics, 2007, 23(2):232–239.
    85 Y. Tian, J. M. Patel. Tale: A Tool for Approximate Large Graph Match-ing[C]//Proceedings of the 24th International Conference on Data Engineering(ICDE). Cancun, Mexico: IEEE Computer Society Press, 2008:963–972.
    86 S. Zhang, S. Li, J. Yang. Gaddi: Distance Index Based Subgraph Matchingin Biological Networks[C]//Proceedings of the 12th International Conference onExtending Database Technology (EDBT). Saint Petersburg, Russia: ACM Press,2009:192–203.
    87 L. Zou, L. Chen, Y. Lu. Top-k Subgraph Matching Query in a LargeGraph[C]//Proceedings of the First Ph.D. Workshop in the 16th ACM Conferenceon Information and Knowledge Management (CIKM). Lisbon, Portugal: ACMPress, 2007:139–146.
    88 G. Gou, R. Chirkova. Efficient Algorithms for Exact Ranked Twig-pattern Match-ing Over Graphs[C]//Proceedings of the ACM SIGMOD International Conferenceon Management of Data. Vancouver, BC, Canada: ACM Press, 2008:581–594.
    89 J. Cheng, J. X. Yu, B. Ding, et al. Fast Graph Pattern Matching[C]//Proceedings ofthe 24th International Conference on Data Engineering (ICDE). Cancun, Mexico:IEEE Computer Society Press, 2008:913–922.
    90 H. Wang, J. Li, J. Luo, et al. Hash-base Subgraph Query Processing Method forGraph-structured Xml Documents[J]. Proceedings of the VLDB Endowment, 2008,1(1):478–489.
    91 L. Zou, L. Chen, M. T. O¨zsu. Distance-join: Pattern Match Query in a Large GraphDatabase[J]. Proceedings of the VLDB Endowment, 2009, 2(1):886–897.
    92 A. Ruepp, A. Zollner, D. Maier, et al. The Funcat, a Functional Annotation Schemefor Systematic Classification of Proteins from Whole Genomes[J]. Nucleic AcidsResearch, 2004, 32(18):5539–5545.
    93 The Gene Ontology Consortium. Gene Ontology: Tool for the Unification of Biol-ogy[J]. Nature genetics, 2000, 25(1):25–29.
    94 A. Cakmak, G. Ozsoyoglu. Taxonomy-superimposed Graph Min-ing[C]//Proceedings of the 11th International Conference on Extending DatabaseTechnology (EDBT). Nantes, France: ACM Press, 2008:217–228.
    95 C. Chen, X. Yan, F. Zhu, et al. Graph Olap: Towards Online Analytical Process-ing on Graphs[C]//Proceedings of the 8th IEEE International Conference on DataMining (ICDM). Pisa, Italy: IEEE Computer Society Press, 2008:103–112.– 136–
    96 R. Milo, S. Shen-Orr, S. Itzkovitz, et al. Network Motifs: Simple Building Blocksof Complex Networks[J]. Science, 2002, 298(5594):824–827.
    97 J. Leskovec, J. M. Kleinberg, C. Faloutsos. Graphs Over Time: DensificationLaws, Shrinking Diameters and Possible Explanations[C]//Proceedings of the 11thACM SIGKDD International Conference on Knowledge Discovery and Data Min-ing (KDD). Chicago, Illinois, USA: ACM Press, 2005:177–187.
    98 J. Leskovec, K. J. Lang, A. Dasgupta, et al. Statistical Properties of CommunityStructure in Large Social and Information Networks[C]//Proceedings of the 17th In-ternational Conference on World Wide Web (WWW). Beijing, China: ACM Press,2008:695–704.
    99 H. Shang, K. Zhu, X. Lin, et al. Similarity Search on Supergraph Contain-ment[C]//Proceedings of the 26th International Conference on Data Engineering(ICDE). Long Beach, California, USA: IEEE Computer Society Press, 2010:637–648.
    100 Y. Ke, J. Cheng, J. X. Yu. Querying Large Graph Databases[C]//Database Systemsfor Advanced Applications, 15th International Conference (DASFAA). Tsukuba,Japan: Springer, 2010:487–488.
    101 L. Zou, L. Chen, J. X. Yu, et al. A Novel Spectral Coding in a Large GraphDatabase[C]//Proceedings of the 11th International Conference on ExtendingDatabase Technology (EDBT). Nantes, France: ACM Press, 2008:181–192.
    102 D. K. Agrafiotis, D. Bandyopadhyay, J. K. Wegner, et al. Recent Advancesin Chemoinformatics[J]. Journal of Chemical Information and Modeling, 2007,47(4):1279–1293.
    103 L. P. Cordella, P. Foggia, C. Sansone, et al. Fast Graph Matching for DetectingCad Image Components[C]//Proceedings of the International Conference on PatternRecognition. Barcelona, Spain: IEEE Computer Society Press, 2000:6034–6037.
    104 A. K. Gupta, D. Suciu. Stream Processing of Xpath Queries with Predi-cates[C]//Proceedings of the 2003 ACM SIGMOD International Conference onManagement of Data. San Diego, California, USA: ACM Press, 2003:419–430.
    105 P. Bohannon, W. Fan, M. Flaster, et al. Information Preserving Xml Schema Em-bedding[C]//Proceedings of the 31st International Conference on Very Large DataBases (VLDB). Trondheim, Norway: ACM Press, 2005:85–96.
    106 R. M. Karp. Reducibility Among Combinatorial Problems[M]. Plenum Press, 1972.
    107周傲英,金澈清,王国仁,等.不确定性数据管理技术研究综述[J].计算机学报, 2009, 32(1):1–16.
    108李建中,于戈,周傲英.不确定性数据管理的要求与挑战[J].中国计算机学会通讯, 2009, 5(4):6–14.
    109高宏,张炜.不确定图数据管理研究现状[J].中国计算机学会通讯, 2009,5(4):31–36.
    110 J. Wang, W. Hsu, M. L. Lee, et al. A Partition-based Approach to Graph Min-ing[C]//Proceedings of the 22nd International Conference on Data Engineering(ICDE). Atlanta, Georgia, USA: IEEE Computer Society Press, 2006:74–83.
    111 Z. Zeng, J. Wang, J. Zhang, et al. Fogger: An Algorithm for Graph Genera-tor Discovery[C]//Proceedings of the 12th International Conference on ExtendingDatabase Technology (EDBT). Saint Petersburg, Russia: ACM Press, 2009:517–528.
    112 J. Pei, D. Jiang, A. Zhang. On Mining Cross-graph Quasi-cliques[C]//Proceedingsof the 11th ACM SIGKDD International Conference on Knowledge Discovery andData Mining (KDD). Chicago, Illinois, USA: ACM Press, 2005:228–238.
    113 J. Wang, Z. Zeng, L. Zhou. Clan: An Algorithm for Mining Closed Cliques fromLarge Dense Graph Databases[C]//Proceedings of the 22nd International Confer-ence on Data Engineering (ICDE). Atlanta, Georgia, USA: IEEE Computer SocietyPress, 2006:73–82.
    114 Z. Zeng, J. Wang, L. Zhou, et al. Coherent Closed Quasi-clique Discovery fromLarge Dense Graph Databases[C]//Proceedings of the 12th ACM SIGKDD Interna-tional Conference on Knowledge Discovery and Data Mining (KDD). Philadelphia,PA, USA: ACM Press, 2006:797–802.
    115 Z. Zeng, J. Wang, L. Zhou, et al. Out-of-core Coherent Closed Quasi-clique Miningfrom Large Dense Graph Databases[J]. ACM Transactions on Database Systems(TODS), 2007, 32(2):228–267.
    116 National Library of Medicine. http://chem.sis.nlm.nih.gov/chemidplus/.
    117 A. Srinivasan, R. D. King, S. Muggleton, et al. The Predictive Toxicology Evalu-ation Challenge[C]//Proceedings of the Fifteenth International Joint Conference onArtificial Intelligence (IJCAI). Nagoya, Japan: Morgan Kaufmann, 1997:4–9.
    118 J. D. Westbrook, Z. Feng, L. Chen, et al. The Protein Data Bank and StructuralGenomics[J]. Nucleic Acids Research, 2003, 31(1):489–491.
    119 M. Kanehisa, S. Goto. Kegg: Kyoto Encyclopedia of Genes and Genomes[J]. Nu-cleic Acids Research, 2000, 28(1):27–30.
    120 S. Asthana, O. D. King, F. D. Gibbons, et al. Predicting Protein Complex Mem-bership Using Probabilistic Network Reliability[J]. Genome Research, 2004,14(6):1170–1175. http://dx.doi.org/10.1101/gr.2203804.
    121 K.-S. Fu. A Step Towards Unification of Syntactic and Statistical Pattern Recog-nition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986,8(3):398–404.
    122 A. K. C. Wong, S. W. Lu, M. Rioux. Recognition and Shape Synthesis of 3-d Ob-jects Based on Attributed Hypergraphs[J]. IEEE Transactions on Pattern Analysisand Machine Intelligence, 1989, 11(3):279–290.
    123 S. Berretti, A. D. Bimbo, E. Vicario. Efficient Matching and Indexing of GraphModels in Content-based Retrieval[J]. IEEE Trans. Pattern Anal. Mach. Intell.,2001, 23(10):1089–1105.
    124 E. K. Wong. Model Matching in Robot Vision by Subgraph Isomorphism[J]. Pat-tern Recognition, 1992, 25(3):287–303.
    125 The International Network for Social Network Analysis. http://www.insna.org/.
    126 D. J. Watts, P. S. Dodds, M. E. J. Newman. Identity and Search in Social Net-works[J]. 2002, 296(5571):1302–1305. http://arxiv.org/abs/cond-mat/0205383.
    127 L. P. Cordella, P. Foggia, C. Sansone, et al. Subgraph Transformations for theInexact Matching of Attributed Relational Graphs[J]. Computing (Supplement),1998, 12:43–52.
    128 A. T. Balaban, T.-S. Balaban. New Vertex Invariants and Topological Indices ofChemical Graphs Based on Information on Distances[J]. Journal of MathematicalChemistry, 1991.
    129 H. He, A. K. Singh. Graphs-at-a-time: Query Language and Access Methods forGraph Databases[C]//Proceedings of the ACM SIGMOD International Conferenceon Management of Data. Vancouver, BC, Canada: ACM Press, 2008:405–418.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700