Graph summarization with quality guarantees
详细信息    查看全文
  • 作者:Matteo Riondato ; David García-Soriano…
  • 关键词:Graph analysis ; Cut ; norm ; Approximation algorithms ; Summarization ; Regularity lemma
  • 刊名:Data Mining and Knowledge Discovery
  • 出版年:2017
  • 出版时间:March 2017
  • 年:2017
  • 卷:31
  • 期:2
  • 页码:314-349
  • 全文大小:
  • 刊物类别:Computer Science
  • 刊物主题:Data Mining and Knowledge Discovery; Artificial Intelligence (incl. Robotics); Information Storage and Retrieval; Statistics for Engineering, Physics, Computer Science, Chemistry and Earth Sciences;
  • 出版者:Springer US
  • ISSN:1573-756X
  • 卷排序:31
文摘
We study the problem of graph summarization. Given a large graph we aim at producing a concise lossy representation (a summary) that can be stored in main memory and used to approximately answer queries about the original graph much faster than by using the exact representation. In this work we study a very natural type of summary: the original set of vertices is partitioned into a small number of supernodes connected by superedges to form a complete weighted graph. The superedge weights are the edge densities between vertices in the corresponding supernodes. To quantify the dissimilarity between the original graph and a summary, we adopt the reconstruction error and the cut-norm error. By exposing a connection between graph summarization and geometric clustering problems (i.e., k-means and k-median), we develop the first polynomial-time approximation algorithms to compute the best possible summary of a certain size under both measures. We discuss how to use our summaries to store a (lossy or lossless) compressed graph representation and to approximately answer a large class of queries about the original graph, including adjacency, degree, eigenvector centrality, and triangle and subgraph counting. Using the summary to answer queries is very efficient as the running time to compute the answer depends on the number of supernodes in the summary, rather than the number of nodes in the original graph.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700