大图上的SuperSimRank近似计算方法
详细信息    查看全文 | 推荐本文 |
  • 英文篇名:Accuracy Estimate and Optimization Techniques for SuperSimRank Computation on Massive Graphs
  • 作者:张应龙 ; 夏学文 ; 余鹰 ; 邓志刚
  • 英文作者:ZHANG Ying-long;XIA Xue-wen;YU Ying;DENG Zhi-gang;School of Physics and Information Engineering,Minnan Normal University;School of Software,East China Jiaotong University;
  • 关键词:SuperSimRank ; 节点相似度 ; 大图
  • 英文关键词:SuperSimRank;;nodes similarity;;big graph
  • 中文刊名:DZXU
  • 英文刊名:Acta Electronica Sinica
  • 机构:闽南师范大学物理与信息工程学院;华东交通大学软件学院;
  • 出版日期:2019-07-15
  • 出版单位:电子学报
  • 年:2019
  • 期:v.47;No.437
  • 基金:国家自然科学基金(No.61762036,No.61663009,No.61563016);; 江西省自然科学基金(No.20171BAB202012,No.20181BAB202023);; 江西省交通厅科研项目(No.2017D0038);; 江西省教育厅科技项目(No.GJJ180322)
  • 语种:中文;
  • 页:DZXU201907026
  • 页数:5
  • CN:07
  • ISSN:11-2087/TN
  • 分类号:201-205
摘要
网络数据具有规模大的特点,而基于关系的相似度计算复杂度高,因此大图上的相似度计算具有很大挑战.文章针对一个新的相似度度量SuperSimRank在大图上的优化计算问题展开研究.首先提出了阈值过滤技术,使得在计算过程中忽略那些对SuperSimRank值影响较小但消耗计算资源的路径值,并通过严格数学证明论证了近似值和准确值的误差;然后在此基础上提出了高效的外存算法,该算法避免了随机访问文件而是通过顺序的读写文件,极大的减少了I/O代价;最后实验验证了算法的有效性.
        Due to the high computational cost and space cost in computing node similarity,it is a challenge when it comes to efficiently computing the similarity on big graphs.In this paper,the following problem will be resolved:how to fast compute SuperSimRank similarity on massive graphs using a single PC.A threshold sieving technology and an external algorithm are introduced.With the help of threshold sieving technology,our external algorithm can efficiently compute the similarity on massive graphs.Experimental results demonstrate the efficiency of the computation.
引文
[1] ASHISH G,et al.The who-to-follow system at twitter:Strategy,algorithms,and revenue impact [J].Interfaces,2015,45(1):98-107.
    [2] FORTUNATO S.Community detection in graphs[J].Physics Reports,2010,486(3-5):75-174.
    [3] 俞春花,等.基于上下文相似度和社会网络的移动服务推荐方法[J].电子学报,2017,45(6):1530-1536.YU Chu-hua,et al.Mobie service recommendation based on context similairity and social network[J].Acta Electronica Sinica,2017,45(6):1530-1536.(in Chinese)
    [4] JEH G,WIDOM J.Scaling Personalized Web Search [OL].http://www2003.org/cdrom/papers/refereed/p185/html/p185-jeh.html.2018-12-01.
    [5] JEH G,WIDOM J.SimRank:A Measure of Structural-Context Similarity[OL].http://ilpubs.stanford.edu:8090/508/1/2001-41.pdf.2018-12-01.
    [6] 张应龙,等.信息网络中一个有效的基于链接的结点相似度度量[J].软件学报,2014,25(11):2602-2615.ZHANG Y,et al.Effective link-based measure of node similarity on information networks[J].Journal of Software,2014,25(11):2602-2615.(in Chinese)
    [7] KYROLA A,et al.Graphchi:Large-scale graph computation on just a PC[A].Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation[C].US:USENIX Association,2012.31-46.
    [8] 吴城文,等.从系统角度审视大图计算 [J].大数据,2015,1(3):39-47.WU C.et al.Reviewing large graph computing from a system perspective[J].Big Data Research,2015,1(3):39-47.(in Chinese)
    [9] CHENG J,et al.VENUS:Vertex-centric streamlined graph computation on a single PC[A].Proceedings of IEEE 31st International Conference on Data Engineering[C].New York:IEEE,2015.1131-1142.
    [10] ZHU X,et al.GridGraph:Large-scale graph processing on a single machine using 2-level hierarchical partitioning[A].Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation[C].US:USENIX Association,2015.375-386.
    [11] CHI Y,et al.NXgraph:An efficient graph processing system on a single machine[A].Proceedings of IEEE 32nd International Conference on Data Engineering[C].New York:IEEE,2016.409-420.
    [12] 吴甘沙.大数据技术发展的十个前沿方向 (中)[J].大数据,2015,1(2):109-117.WU G-S.Ten fronties for big data technologies(Part B)[J].Big Data Research,2015,1(2):109-117.(in Chinese)
    [13] LIZORKIN D,et al.Accuracy estimate and optimization techniques for simrank computation[J].VLDB Journal,2010,19(1):45-66.
    [14] YU W,et al.Dynamical SimRank search on time-varying networks[J].VLDB Journal,2018,27(1):79-104.
    [15] WANG S,et al.FORA:Simple and effective approximate single-source personalized pagerank[A].Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery in Data Mining[C].New York:ACM,2017.505-514.
    [16] MINHAO J,et al.READS:a random walk approach for efficient and accurate dynamic SimRank[J].Proceedings of VLDB Endow,2017,10(9):937-948.
    [17] PEROZZI B,et al.Deepwalk:online learning of social representations[A].Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining[C].New York:ACM,2014.701-710.
NGLC 2004-2010.National Geological Library of China All Rights Reserved.
Add:29 Xueyuan Rd,Haidian District,Beijing,PRC. Mail Add: 8324 mailbox 100083
For exchange or info please contact us via email.