SIMPLE: a simplifying-ensembling framework for parallel community detection from large networks
详细信息    查看全文
  • 作者:Zhiang Wu ; Guangliang Gao ; Zhan Bu ; Jie Cao
  • 关键词:Complex network ; Community detection ; Parallel computing ; MapReduce ; K ; means
  • 刊名:Cluster Computing
  • 出版年:2016
  • 出版时间:March 2016
  • 年:2016
  • 卷:19
  • 期:1
  • 页码:211-221
  • 全文大小:1,234 KB
  • 参考文献:1.Apache Software Foundation: Apache Mahout: Scalable machine-learning and data-mining library. http://​mahout.​apache.​org
    2.Bernard, T., Bui, A., Pilard, L., Sohier, D.: A distributed clustering algorithm for large-scale dynamic networks. Clust. Comput. 15(4), 335–350 (2012)CrossRef
    3.Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. 2008(10), P10008 (2008)CrossRef
    4.Chernoff, H.: A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Stat. 493–507 (1952)
    5.Cho, E., Myers, S.A., Leskovec, J.: Friendship and mobility: user movement in location-based social networks. In: Proceedings of KDD, pp. 621–628 (2011)
    6.Clauset, A.: Finding local community structure in networks. Phys. Rev. E 72, 026132 (2005)CrossRef
    7.Fortunato, S.: Community detection in graphs. Phys. Rep. 486(3–5), 75–174 (2010)MathSciNet CrossRef
    8.Gregori, E., Lenzini, L., Mainardi, S.: Parallel k-clique community detection on large-scale networks. IEEE Trans. Parallel Distrib. Syst. 24(8), 1651–1660 (2013)CrossRef
    9.Hubler, C., Kriegel, H.P., Borgwardt, K., Ghahramani, Z.: Metropolis algorithms for representative subgraph sampling. In: Proceedings of the 2008 IEEE 7th International Conference on Data Mining, pp. 283–292. IEEE (2008)
    10.Hui, P., Yoneki, E., Chan, S.Y., Crowcroft, J.: Distributed community detection in delay tolerant networks. In: Proceedings of 2nd ACM/IEEE International Workshop on Mobility in the Evolving Internet Architecture, p. 7. ACM (2007)
    11.Karypis, G., Kumar, V.: Multilevel k-way hypergraph partitioning. In: Proceedings of the 36th Conference on Design Automation, pp. 343–348 (1999)
    12.LaSalle, D., Karypis, G.: Multi-threaded modularity based graph clustering using the multilevel paradigm. J. Parallel Distrib. Comput. 76, 66–80 (2015)CrossRef
    13.Leskovec, J., Faloutsos, C.: Sampling from large graphs. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 631–636. ACM (2006)
    14.Leskovec, J., Kleinberg, J., Faloutsos, C.: Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, KDD ’05, pp. 177–187. ACM (2005)
    15.Leskovec, J., Mcauley, J.J.: Learning to discover social circles in ego networks. In: Advances in Neural Information Processing Systems, pp. 539–547 (2012)
    16.Li, F., Ooi, B.C., Ozsu, M., Wu, S.: Distributed data management using mapreduce. ACM Comput. Surv. 46(3), 31 (2013)
    17.Moon, S., Lee, J.G., Kang, M.: Scalable community detection from networks by computing edge betweenness on mapreduce. In: International Conference on Big Data and Smart Computing (BIGCOMP), pp. 145–148. IEEE (2014)
    18.Newman, M.E.: Fast algorithm for detecting community structure in networks. Phys. Rev. E 69(6), 66–113 (2004)CrossRef
    19.Newman, M.E.: Modularity and community structure in networks. Proc. Natl. Acad. Sci. 103(23), 8577–8582 (2006)CrossRef
    20.Newman, M.E.J.: The structure of scientific collaboration networks. Proc. Natl. Acad. Sci. 98(2), 404–409 (2001)MathSciNet CrossRef MATH
    21.Palla, G., Derenyi, I., Farkas, I., Vicsek, T.: Uncovering the overlapping community structure of complex networks in nature and society. Nature 435, 814–818 (2005)CrossRef
    22.Papadopoulos, S., Kompatsiaris, Y., Vakali, A., Spyridonos, P.: Community detection in social media. Data Min. Knowl. Discov. 24(3), 515–554 (2012)
    23.Prat-Pérez, A., Dominguez-Sal, D., Brunat, J.M., Larriba-Pey, J.L.: Shaping communities out of triangles. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1677–1681. ACM (2012)
    24.Prat-Pérez, A., Dominguez-Sal, D., Larriba-Pey, J.: High quality, scalable and parallel community detection for large real graphs. In: 23rd International World Wide Web Conference, WWW ’14, Seoul, 7–11 April, pp. 225–236 (2014)
    25.Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76(3), 036–106 (2007)CrossRef
    26.Serrano, M.Á., Boguñá, M., Vespignani, A.: Extracting the multiscale backbone of complex weighted networks. Proc. Natl. Acad. Sci. 106(16), 6483–6488 (2009)CrossRef
    27.Shi, J., Xue, W., Wang, W., Zhang, Y., Yang, B., Li, J.: Scalable community detection in massive social networks using mapreduce. IBM J. Res. Dev. 57(3/4), 12-1 (2013)
    28.Tang, L., Liu, H.: Community Detection and Mining in Social Media. Morgan & Claypool Publishers, San Rafael (2010)
    29.Traud, A.L., Kelsic, E.D., Mucha, P.J., Porter, M.A.: Comparing community structure to characteristics in online collegiate social networks. SIAM Rev. 53(3), 526–543 (2011)MathSciNet CrossRef
    30.White, T.: Hadoop: The Definitive Guide: The Definitive Guide. O’Reilly Media (2009)
    31.Wu, J., Liu, H., Xiong, H., Cao, J.: A theoretic framework of k-means-based consensus clustering. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, pp. 1799–1805. AAAI Press (2013)
    32.Wu, Z., Cao, J., Wu, J., Wang, Y., Liu, C.: Detecting genuine communities from large-scale social networks: a pattern-based method. Comput. J. 57(9), 1343–1357 (2014)CrossRef
    33.Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. In: Proceedings of ICDM, pp. 745–754 (2012)
    34.Yang, J., Leskovec, J.: Overlapping community detection at scale: a nonnegative matrix factorization approach. In: Proceedings of the sixth ACM international conference on Web search and data mining, pp. 587–596. ACM (2013)
    35.Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. NSDI’12, pp. 2–2. USENIX Association, Berkeley, CA (2012)
    36.Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot topics in Cloud Computing, vol. 10, p. 10 (2010)
    37.Zhang, Y., Wang, J., Wang, Y., Zhou, L.: Parallel community detection on large networks with propinquity dynamics. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 997–1006. ACM (2009)
    38.Zhao, W., Ma, H., He, Q.: Parallel k-means clustering based on mapreduce. In: Proceedings of the 1st International Conference on Cloud Computing, CloudCom ’09, pp. 674–679. Springer (2009)
  • 作者单位:Zhiang Wu (1)
    Guangliang Gao (2)
    Zhan Bu (1)
    Jie Cao (1)

    1. Jiangsu Provincial Key Laboratory of E-Business, Nanjing University of Finance and Economics, Nanjing, China
    2. College of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
  • 刊物类别:Computer Science
  • 刊物主题:Processor Architectures
    Operating Systems
    Computer Communication Networks
  • 出版者:Springer Netherlands
  • ISSN:1573-7543
文摘
Community detection is a classic and very difficult task in complex network analysis. As the increasingly explosion of social media, scaling community detection methods to large networks has attracted considerable recent interests. In this paper, we propose a novel SIMPLifying and Ensembling (SIMPLE) framework for parallel community detection. It employs the random link sampling to simplify the network and obtain basic partitionings on every sampled graphs. Then, the K-means-based Consensus Clustering is used to ensemble a number of basic partitionings to get high-quality community structures. All of phases in SIMPLE, including random sampling, sampled graph partitioning, and consensus clustering, are encapsulated into MapReduce for parallel execution. Experiments on six real-world social networks analyze key parameters and factors inside SIMPLE, and demonstrate both effectiveness and efficiency of the SIMPLE.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700