图概要技术研究进展
详细信息    查看全文 | 推荐本文 |
  • 英文篇名:Progress and Challenges of Graph Summarization Techniques
  • 作者:王雄 ; 董一鸿 ; 施炜杰 ; 潘剑飞
  • 英文作者:Wang Xiong;Dong Yihong;Shi Weijie;Pan Jianfei;Faculty of Electrical Engineering and Computer Science, Ningbo University;Baidu Online Network Technology Company;
  • 关键词:综述 ; 图概要 ; 图聚集 ; 图概化 ; 图压缩 ; 可视化
  • 英文关键词:survey;;graph summarization;;graph aggregation;;graph synopsis;;graph compression;;visualization
  • 中文刊名:JFYZ
  • 英文刊名:Journal of Computer Research and Development
  • 机构:宁波大学信息科学与工程学院;百度在线网络技术有限公司;
  • 出版日期:2019-06-15
  • 出版单位:计算机研究与发展
  • 年:2019
  • 期:v.56
  • 基金:国家自然科学基金项目(61572266);; 浙江省自然科学基金项目(LY16F020003);; 宁波市自然科学基金项目(2017A610114)~~
  • 语种:中文;
  • 页:JFYZ201906021
  • 页数:18
  • CN:06
  • ISSN:11-1777/TP
  • 分类号:208-225
摘要
图的概要化,简称图概要,旨在寻找一组简洁的超图或稀疏图,阐明原始图的主要结构信息或变化趋势.当前图概要的研究大多结合原始图的应用领域和背景,使用不同的概要技术构建一个特定的概要图,解决目前大图面临的信息过载、查询优化、空间压缩、影响分析、社交网络可视化等问题.对现有的图概要技术进行了汇总,以概要主要目的作为分类标准划分为基于空间压缩的图概要、基于查询优化的图概要、基于模式可视化的图概要和基于影响分析的图概要四大类,针对部分属性图和无属性图概要算法在真实数据集上进行了相关实验,并从压缩率、信息保持率、信息熵和时间进行对比分析.点明图概要的发展趋势,并指出图概要面临的挑战和可深入探索的研究方向,结合热门的深度学习技术提出了部分有价值的的宏观想法用以解决当前挑战.
        Graph summarization aims to search a group of simple hypergraphs or sparse graphs, which illustrate the main structural information or change trend of the original graph. Based on the application field and background of original graph, different graph summarization techniques are used to construct a specific summary graph, which can solve the problems of information overload, query optimization, spatial compression, impact analysis, social network visualization and so on. According to the classification criteria of the main purpose of the summary, the existing graph summarization techniques are divided into four categories: the graph summarization based on spatial compression, the graph summarization based on query optimization, the graph summarization based on pattern visualization and the graph summarization based on impact analysis. The partial graph summarization algorithms of non-attribute graphs and attribute graphs are tested on real data sets to analyze the indexes of information retention rate, compression rate, information entropy and running time experimentally. At last, not only the development trends of the graph summarization are highlighted, but also the challenges and the future research directions that can be explored in depth are pointed out. Combining with the popular deep learning technology, some valuable and potential Macro coutermeasures are put forward to solve these challenges.
引文
[1]Cheng Xueqi,Jin Xiaolong,Wang Yuanzhuo,et al.Survey on big data system and analytic technology[J].Journal of Software,2014,25(9):1889- 1908 (in Chinese)(程学旗,靳小龙,王元卓,等.大数据系统和分析技术综述[J].软件学报,2014,25(9):1889- 1908)
    [2]Zhang Yu,Liu Yanbing,Xiong Gang,et al.Survey on succinct representation of graph data[J].Journal of Software,2014,25(9):1937- 1952 (in Chinese)(张宇,刘燕兵,熊刚,等.图数据表示与压缩技术综述[J].软件学报,2014,25(9):1937- 1952)
    [3]Bonchi F,Castillo C,Gionis A,et al.Social network analysis and mining for business applications[J].ACM Transactions on Intelligent Systems and Technology,2011,2(3):No.22
    [4]Kistler M,Perrone M,Petrini F.Cell multiprocessor communication network:Built for speed[J].IEEE Micro,2006,26(3):10- 23
    [5]Newman M E.The structure of scientific collaboration networks[J].Proceedings of the National Academy of Sciences of the United States of America,2000,98(2):404- 409
    [6]Crovella M E,Bestavros A.Self-similarity in world wide Web traffic:Evidence and possible causes[J].ACM SIGMETRICS Performance Evaluation Review,1996,24(1):160- 169
    [7]Holm D B,Eriksson K,Johanson J.Business networks and cooperation in international business relationships[J].Journal of International Business Studies,1996,27(5):1033- 1053
    [8]Nagurney A.A multiclass,multicriteria traffic network equilibrium model[J].Mathematical & Computer Modelling,2000,32(3):393- 411
    [9]Ruan Tong,Feng Donglei,Li Jing.Research on information filtering model based on Bayesian network[J].Journal of Computer Research and Development,2002,39(12):1564- 1571 (in Chinese)(阮彤,冯东雷,李京.基于贝叶斯网络的信息过滤模型研究[J].计算机研究与发展,2002,39(12):1564- 1571)
    [10]Tan Lu,Jiang Lu.Systems biology and biological networks[J].Complex Systems and Complexity Science,2005,2(4):1- 9 (in Chinese)(谭璐,姜璐.系统生物学与生物网络研究[J].复杂系统与复杂性科学,2005,2(4):1- 9)
    [11]Shen Jinping.Statistical report on Internet development in China[OL].[2018-12-03].http://www.techweb.com.cn/internet/2017-01-22/2478300.shtml(沈金萍.中国互联网络发展状况统计报告[OL].[2018-12-03].http://www.techweb.com.cn/internet/2017-01-22/2478300.shtml)
    [12]Tang Jie,Shen Huawei.Review and prospect of social network computing[OL].[2018-12-03].http://www.360doc.com/content/17/1122/10/49955868_706086795.shtml(唐杰,沈华伟.社会网络计算的回顾与展望[OL].[2018-12-03].http://www.360doc.com/content/17/1122/10/49955868_706086795.shtml)
    [13]Seyyedi S H,Minaei-Bidgoli B.Estimator learning automata for feature subset selection in high-dimensional spaces,case study:Email spam detection[J].International Journal of Communication Systems,2018,31(8):e3541
    [14]Newman M E,Forrest S,Balthrop J.Email networks and the spread of computer viruses[J].Physical Review E Statistical Nonlinear & Soft Matter Physics,2002,66(3):035101
    [15]Liu Yike,Safavi T,Dighe A,et al.Graph summarization methods and applications:A survey[J].ACM Computing Surveys,2018,51(3):62- 64
    [16]Aggarwal C,Wang H.Graph data management and mining:A survey of algorithms and applications[M] //Managing and Mining Graph Data.Berlin:Springer,2010:13- 68
    [17]Pan Qiuping,You Jinguo,Zhang Zhipeng,et al.Progress and challenges of graph aggregation and summarization techniques[J].Journal of Software,2015,26(1):167- 177 (in Chinese)(潘秋萍,游进国,张志朋.等.图聚集技术的现状与挑战[J].软件学报,2015,26(1):167- 177)
    [18]Shneiderman B,Dunne C.Interactive network exploration to derive insights:Filtering,clustering,grouping,and simplification[C] //Proc of the 20th Int Conf on Graph Drawing.Berlin:Springer,2012:2- 18
    [19]Escolano F,Curado M,Biasotti S,et al.Shape simplification through graph sparsification[C] //Proc of the 11th Int Workshop on Graph-Based Representations in Pattern Recognition.Berlin:Springer,2017:13- 22
    [20]Ahmed N K,Duffield N,Neville J,et al.Graph sample and hold:A framework for big-graph analytics[C] //Proc of the 20th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining.New York:ACM,2014:1446- 1455
    [21]Kong Xiangnan,Wei Fan,Philip S.Dual active feature and sample selection for graph classification[C] //Proc of the 17th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining.New York:ACM,2011:654- 662
    [22]Hao Yin,Benson A R,Leskovec J,et al.Local higher-order graph clustering[C] //Proc of the 23rd ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining.New York:ACM,2017:555- 564
    [23]Schaeffer S E.Survey:Graph clustering[J].Computer Science Review,2007,1(1):27- 64
    [24]Abbe E.Graph compression:The effect of clusters[C] //Proc of the 54th Annual Allerton Conf on Communication,Control,and Computing.Piscataway,NJ:IEEE,2016:1- 8
    [25]Zhao Danfeng,Zhang Yiyi,Lin Junchen,et al.Query of marine big data based on graph compression and views[C] //Proc of the 2018 Int Conf on Data Mining Workshops (ICDMW).Piscataway,NJ:IEEE,2018:252- 257
    [26]Lin Shoude,Yeh M Y,Li Chengte.Sampling and summarization for social networks[OL].[2018-12-03].https://mslab.csie.ntu.edu.tw/tut-pakdd13/samsum-pakdd13.pdf
    [27]Damnjanovic A,Montojo J,Wei Yongbin,et al.A survey on 3GPP heterogeneous networks[J].Wireless Communications IEEE,2011,18(3):10- 21
    [28]Toivonen H,Zhou F,Hartikainen A,et al.Compression of weighted graphs[C] //Proc of the 17th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining.New York:ACM,2011:965- 973
    [29]Dhillon I,Guan Yuqiang,Kulis B.A fast kernel-based multilevel algorithm for graph clustering[C] //Proc of the 11th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining.New York:ACM,2005:629- 634
    [30]Stella X Y,Shi Jianbo.Multiclass spectral clustering[C] //Proc of the 9th IEEE Int Conf on Computer Vision.Los Alamitos:IEEE Computer Society,2003:313- 326
    [31]Krishna K,Murty M N.Genetic K-means algorithm[J].IEEE Transactions on Systems,1999,29(3):433- 439
    [32]Zhu Linhong,Ghasemi-Gol M,Szekely P,et al.Unsupervised entity resolution on multi-type graphs[C] //Proc of the Int Semantic Web Conf.Berlin:Springer,2016:649- 667
    [33]Barron A,Rissanen J,Yu Bin.The minimum description length principle in coding and modeling[J].IEEE Transactions on Information Theory,1998,44(6):2743- 2760
    [34]Navlakha S,Rastogi R,Shrivastava N.Graph summarization with bounded error[C] //Proc of the 2008 ACM SIGMOD Int Conf on Management of Data.New York:ACM,2008:419- 432
    [35]Khan K U,Nawaz W,Lee Y K.An efficient algorithm for MDL based graph summarization for dense graphs[J].Contemporary Engineering Sciences,2014,7(16):791- 796
    [36]Lim Y,Kang U,Faloutsos C.SlashBurn:Graph compression and mining beyond caveman communities[J].IEEE Transactions on Knowledge & Data Engineering,2014,26(12):3077- 3089
    [37]Maccioni A,Abadi D J.Scalable pattern matching over compressed graphs via dedensification[C] //Proc of the 22nd ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining.New York:ACM,2016:1755- 1764
    [38]Khan K U,Nawaz W,Lee Y K.Set-based unified approach for attributed graph summarization[C] //Proc of the 4th IEEE Int Conf on Big Data and Cloud Computing.Piscataway,NJ:IEEE,2015:378- 385
    [39]Dong Wei,Wang Zhe,Josephson W,et al.Modeling LSH for performance tuning[C] //Proc of the 17th ACM Conf on Information and Knowledge Management.New York:ACM,2008:669- 678
    [40]Wu Ye,Zhong Zhinong,Xiong Wei,et al.Graph summarization for attributed graphs[C] //Proc of the 2014 Int Conf on Information Science,Electronics and Electrical Engineering.Piscataway,NJ:IEEE,2014:503- 507
    [41]Liu Zheng,Yu Xu,Cheng Hong.Approximate homogeneous graph summarization[J].Journal of Information Processing,2012,20(1):77- 88
    [42]Seo H,Park K,Han Y,et al.An effective graph summarization and compression technique for a large-scaled graph[J].Journal of Supercomputing,2018,10(12):1- 15
    [43]LeFevre K,Terzi E.GraSS:Graph structure summarization[C] //Proc of the 2010 SIAM Int Conf on Data Mining.Philadelphia,PA:Society for Industrial and Applied Mathematics,2010:454- 465
    [44]Riondato M,García-Soriano D,Bonchi F.Graph summarization with quality guarantees[J].Data Mining and Knowledge Discovery,2017,31(2):314- 349
    [45]Tian Yuanyuan,Hankins R A,Patel J M.Efficient aggregation for graph summarization[C] //Proc of the 2008 ACM SIGMOD Int Conf on Management of Data.New York:ACM,2008:567- 580
    [46]Zhang Ning,Tian Yuanyuan,Patel J M.Discovery-driven graph summarization[C] //Proc of the 26th Int Conf on Data Engineering.Piscataway,NJ:IEEE,2010:880- 891
    [47]Shoaran M,Thomo A,Weber-Jahnke J H.Zero-knowledge private graph summarization[C] //Proc of the 2013 IEEE Int Conf on Big Data.Piscataway,NJ:IEEE,2013:597- 605
    [48]Miao Qiguang,Wang Baoshu.Image fusion method based on improved Laplace pyramid transform[J].Journal of Optics,2007,27(9):1605- 1610 (in Chinese) (苗启广,王宝树.基于改进的拉普拉斯金字塔变换的图像融合方法[J].光学学报,2007,27(9):1605- 1610)
    [49]Hassanlou N,Shoaran M,Thomo A.Probabilistic graph summarization[C] //Proc of the 14th Int Conf on Web-Age Information Management.Berlin:Springer,2013:545- 556
    [50]Shrivastava A,Li Ping.In defense of minhash over SimHash[J].arXiv preprint arXiv:1407.4416,2014
    [51]Cormode G,Muthukrishnan S.An improved data stream summary:The count-min sketch and its applications[J].Journal of Algorithms,2005,55(1):58- 75
    [52]Cohen E,Kaplan H.Tighter estimation using bottom k sketches[J].Proceedings of the VLDB Endowment,2008,1(1):213- 224
    [53]Zhao Peixiang,Aggarwal C C,Wang Min.gSketch:On query estimation in graph streams[J].Proceedings of the VLDB Endowment,2011,5(3):193- 204
    [54]Tang Nan,Chen Qing,Mitra P.On summarizing graph streams[J].arXiv preprint arXiv:1510.02219,2015
    [55]Bandyopadhyay B,Fuhry D,Chakrabarti A,et al.Topological graph sketching for incremental and scalable analytics[C] //Proc of the 29th ACM Int on Conf on Information and Knowledge Management.New York:ACM,2016:1231- 1240
    [56]Lee B,Plaisant C,Parr C S,et al.Task taxonomy for graph visualization[C] //Proc of the 2006 AVI Workshop on Beyond Time and Errors:Novel Evaluation Methods for Information Visualization.New York:ACM,2006:1- 5
    [57]Shen Zeqian,Ma K L,Eliassiard T,et al.Visual analysis of large heterogeneous social networks by semantic and structural abstraction[J].IEEE Transactions on Visuali-zation and Computer Graphics,2006,12(6):1427- 1439
    [58]Barthelemy M,Chow E,Eliassirad T.Knowledge representation issues in semantic graphs for relationship detection[C] //Proc of the 2005 AI Technologies for Homeland Security.Menlo Park:AAAI,2005:91- 98
    [59]Li Chengte,Lin Shoude.Egocentric information abstraction for heterogeneous social networks[C] //Proc of the 2009 Int Conf on Advances in Social Network Analysis and Mining.Piscataway,NJ:IEEE,2009:255- 260
    [60]Buehrer G,Chellapilla K.A scalable pattern mining approach to Web graph compression with communities[C] //Proc of the 2008 Int Conf on Web Search and Data Mining.New York:ACM,2008:95- 106
    [61]Dunne C,Shneiderman B.Motif simplification:Improving network visualization readability with fan,connector,and clique glyphs[C] //Proc of the SIGCHI Conf on Human Factors in Computing Systems.New York:ACM,2013:3247- 3256
    [62]Koutra D,Kang U,Vreeken J,et al.Summarizing and understanding large graphs[J].Statistical Analysis & Data Mining,2015,8(3):183- 202
    [63]Shah N,Koutra D,Zou Tianmin,et al.TimeCrunch:Interpretable dynamic graph summarization[C] //Proc of the 21st ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining.New York:ACM,2015:1055- 1064
    [64]Qu Qiang,Liu Siyuan,Jensen C S,et al.Interestingness-driven diffusion process summarization in dynamic networks[M] //Machine Learning and Knowledge Discovery in Databases.Berlin:Springer,2014:597- 613
    [65]Lin Yuru,Sundaram H,Kelliher A.Summarization of social activity over time:People,actions and concepts in dynamic networks[C] //Proc of the 17th ACM Conf on Information and Knowledge Management.New York:ACM,2008:1379- 1380
    [66]Padilla P,Lopez M,Gorriz J M,et al.NMF-SVM based CAD tool applied to functional brain images for the diagnosis of Alzheimer's disease[J].IEEE Transactions on Medical Imaging,2012,31(2):207- 216
    [67]Purohit M,Prakash B A,Kang C,et al.Fast influence-based coarsening for large networks[C] //Proc of the 20th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining.New York:ACM,2014:1296- 1305
    [68]Yu Jin,JaJa J F.Network summarization with preserved spectral properties[J].arXiv preprint arXiv:1802.04447,2018
    [69]Lei Shi,Huang Tongtong,Tang Jie,et al.VEGAS:Visual influence graph summarization on citation networks[J].IEEE Transactions on Knowledge & Data Engineering,2015,27(12):3417- 3431
    [70]Maserrat H,Jian Pei.Neighbor query friendly compression of social networks[C] //Proc of the 16th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining.New York:ACM,2010:533- 542
    [71]Yongwook C,Wojciech S.Compression of graphical structures:Fundamental limits,algorithms,and experiments[J].IEEE Transactions on Information Theory,2012,58(2):620- 638
    [72]Boldi P,Vigna S.The webgraph framework I:Compression techniques[C] //Proc of the 13th Int Conf on World Wide Web.New York:ACM,2004:595- 602
    [73]Kang U,Tong Hanghang,Sun Jimeng,et al.Gbase:A scalable and general graph management system[C] //Proc of the 17th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining.New York:ACM,2011:1091- 1099
    [74]Zhou Yang,Cheng Hong,Yuxu.Clustering large attributed graphs:An efficient incremental approach[C] //Proc of the 10th IEEE Int Conf on Data Mining.Piscataway,NJ:IEEE,2011:689- 698
    [75]Liu Xingjie,Tian Yuanyuan,He Qi,et al.Distributed graph summarization[C] //Proc of the 23rd ACM Int Conf on Information and Knowledge Management.New York:ACM,2014:799- 808
    [76]Kalavri V,Simas T,Logothetis D.The shortest path is not always a straight line:Leveraging semimetricity in graph analysis[J].Proceedings of the VLDB Endowment,2016,9(9):672- 683
    [77]Li Ronglu,Wang Jianhui,Chen Xiaoyun,et.al.Chinese text classification based on maximum entropy model[J].Journal of Computer Research and Development,2005,42(1):94- 101(李荣陆,王建会,陈晓云,等.使用最大熵模型进行中文文本分类[J].计算机研究与发展,2005,42(1):94- 101)
    [78]Ley M.The DBLP computer science bibliography:Evolution,research issues,perspectives[C] //Proc of the 9th Int Symp on String Processing and Information Retrieval.Berlin:Springer,2002:1- 10
    [79]Mehmood Y,Barbieri N,Bonchi F,et al.CSI:Community-level social influence analysis[C] //Proc of the 2013 Joint European Conf on Machine Learning and Knowledge Discovery in Databases.Berlin:Springer,2013:48- 63
    [80]You Jinguo,Pan Qiuping,Shi Wei,et al.Towards graph summary and aggregation:A survey[G] //Social Media Retrieval and Mining.Berlin:Springer,2013:3- 12
    [81]Pouriyeh S,Allahyari M,Liu Qingxia,et al.Graph-based ontology summarization:A survey[J].arXiv preprint arXiv:1805.06051,2018
    [82]Fei Hao,Park D S.Sketch:A novel framework for capturing cliques from big graph[J].The Journal of Supercomputing,2018,74(3):1202- 1214
    [83]Cormode G,Cormode G,Muthukrishnan S.Summarizing and mining skewed data streams[C] //Proc of the 2005 SIAM Int Conf on Data Mining.Philadelphia,PA:SIAM,2005:44- 55
    [84]Yu Xiaodong,Chawla N V,Swami A.metapath2vec:Scalable representation learning for heterogeneous networks[C] //Proc of the 23rd ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining.New York:ACM,2017:135- 144
    [85]Kong Xiangnan,Philip S Y.Graph classification in heterogeneous networks[G] //Encyclopedia of Social Network Analysis and Mining.Berlin:Springer,2014:641- 648
    [86]Khan A,Bhowmick S,Bonchi F,et al.Summarizing static and dynamic big graphs[J].Proceedings of the VLDB Endowment,2017,10(12):1981- 1984
    [87]Chuan Shi,Hu Binbin,Xin Zhao,et al.Heterogeneous information network embedding for recommendation[J].arXiv preprint arXiv:1711.10730,2017
    [88]Lecun Y,Bengio Y,Hinton G.Deep learning[J].Nature,2015,521(7553):436- 444

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700