大规模社区网络的社团发现及特征分析
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
信息时代,随着互联网的广泛应用,社区网络种类和规模都在迅速发展。发现其社团特征,有利于人们研究社会活动规律,建立更好的交流平台或阻止不良信息的传播。但是,如何快速有效地发现大规模社区网络中的社团还是一个很大难题。
     本文以利用局部信息快速有效发现大规模网络中的社团为目的,分析改进已有算法实现了适合本文研究的算法,并对社团的特征做出分析。本文工作可以分为三部分。第一,分析已有的一些算法,确定了最适合本文研究的LFM算法。LFM算法是采用团测度的局部算法,只需要局部信息就可以发现社团,并且还能发现重叠点。第二,发现了LFM算法中存在的两个问题,从多个角度尝试改进,验证分析并确定了最佳的改进方案。第一个问题是原始算法遇到干扰点时会出现达不到停止条件的现象,第二个问题是遇到大规模社团时效率极低。本文从断开干扰点连边、提供优化种子、禁用干扰点选择优化种子生长方向三个角度针对第一个问题做了改进。提供优化被选集,且让社团中每个节点都作为种子参与社团发现过程的办法提高了算法的效率。第三,本文分析了社区网络的特征,包括社团规模,社团规模与重叠点关系,社团密度。通过仿真发现社团规模分布呈现重尾分布特征。重叠点在小社团中出现的概率大,大社团中很少或几乎没有重叠点,且在一定范围内社团内重叠点数量与社团规模呈现正相关。社团内部连接密度与规模成正相关,外部连接密度在不同拓扑网络中表现不同,团测度基本不随社团规模变化而变。本文在最后做出了总结并对下一步工作提出了展望。
With the development of Internet, online social networks develop rapidly and vast scale datasets of such networks are available for study. Community characteristics are found critical in understanding social activities, designing better application of communication and preventing the spread of harmful information. However, how to efficiently find community structures in large social networks is still a hard problem.
     This research targets on taking advantage of local information to find community in large-scale networks quickly and effectively. Based on the analysis of existing algorithms, we improve the LFM algorithm and analyze the community characteristics of some social networks with the modified algorithm. The main work of this paper consists of three parts. Firstly, some algorithms have been analyzed and obtained the most suitable algorithm. The LFM algorithm is based on the local optimization of a fitness function, it can find overlapping node. Secondly, we found two disadvantages of LFM, try a number of improved methods, and determine the best improvement. When there is noise node in the network, the original LFM cannot stop. When there is large-scale community, the time complexity of the original LFM is too high. We improve the LFM form three strategies, removing the edge between noise node and community, selecting the best node as the starting node that to found an sub graph as community, and removing the noise node as staring node and adding the node of highest CC as second node. To improve the time complexity, we determine everyone of community as starting node to detecting the sub-graph for finding community. Thirdly, we analyzed characteristics of community of different strategies in three aspects: community size, relationship between overlapping node and community size, and the density of community. Our experiment results have proved that community sizes satisfy Heavy-tailed distribution, and small communities more likely have overlapping node, but large community have litter or no overlapping node. The larger the community size, the higher intra-density. Intra-density and size within the community is positively related. Inter-density has a different performance in different network topologies; the fitness does not change with changes in the size of community. In the end of the paper, we point out the directions of next step.
引文
[1]. http://research.cnnic.cn/
    [2].吴金闪,狄增如.从统计物理学看复杂网络研究.物理学进展,2004,24(1).
    [3]. Kernighan B W, Lin S. An efficient heuristic procedure for partitioning graphs.Bell System Technical Journal 49:291-307,1970.
    [4]. Pothen A, Simon H, L iou K2P. Partitioning sparse matrices with eigenvectors of graphs[J]. SIAM J Matrix AnalApp 1,1990,11 (3):430-452.
    [5]. NewmanM E J, GirvanM. Finding and evaluating community structure in networks[J]. Phys Rev E,2004,69 (2):026113.
    [6]. NewmanM E J. Fast algorithm for detecting community structure in networks[J]. Phys Rev E,2004,69 (6):066133.
    [7]. A. Clauset, M.E.J. Newman, C. Moore, Finding community structure in very large networks, Phys. Rev. E 70 (6) (2004) 066111.
    [8]. Girvan, M. & Newman, M. (2002) Proc. Natl. Acad. Sci. USA 99,8271-8276.
    [9]. Palla, G., I. Der_enyi, I. Farkas, and T. Vicsek,2005, Nature 435,814.
    [10]. Andrea Lancichinetti, Santo Fortunato, Janos Kert' esz.Detecting the overlapping and hierarchical community structure in complex networks.2009
    [11]. Mark Goldberg, Stephen Kelleyj(?), Malik Magdon-Ismail_, Konstantin Mertsalov_, and Al Wallace. Finding Overlapping Communities in Social Networks
    [12]. S. Gregory. Finding Overlapping Communities UsingDisjoint Community Detection Algorithms. Department of Computer Science, University of Bristol, Bristol BS8 1UB, England
    [13]. F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi, Introduction to Information Extraction Technoiogy. Proceedings of National Academy of Sciences 101,2658 (2004).1-3.
    [14]. Zachary W W 1977 J. of Anthropol. Research 33 452
    [15]. Mancoridis, S., B. S. Mitchell, C. Rorres, Y. Chen, and E. R.Gansner,1998, in IWPC'98: Proceedings of the 6th In-temational Workshop on Program Comprehension (IEEE Computer Society, Washington, DC, USA).
    [16]. A Lancichinetti, S Fortunato, J' anos Kert'esz,Detecting, The overlapping and hierarchical community structure in complex networks,New Journal of Physics,2009
    [17].解惶,汪小帆.复杂网络的一种快速局部社团划分算法,计算机仿真.2007第11期.
    [18]. M. Girvan, and M. E. J. Newman. Community structure in social and biological networks.2002 Proc. Nat. Acad.Sci.99 7821.
    [19]. JP BagrewandEM Bolh. A LocalMethodforDetectingCommu-nities[J]. Phys Rev E,2005, 72:046108.
    [20].刘绍海,刘青昆,谢福鼎,等.复络基于局部模块度的社团划分方法2009,30(20)4709
    [21]. W right W E. Gravitationa 1 clustering [J]. Patte rn Recog2nition,1977,9 (3):151-166.
    [22].淦文燕,李德毅,王建民.一种基于数据场的层次聚类方法,电子学报,2006,2.
    [23].淦文燕,赫南,李德毅,王建民.一种基于拓扑势的网络社区发现方法.JournalofSoftware, V0l.20, No.80 Augllst2009, PP.2241-2254
    [24]. Newman MEJ.Strogatz SH.Watts DJ Random graphs with arbitrary degree distributions and their applications 2001(02)
    [25]. Zhemin Zhu,Chen Wang(?) Li Ma, Yue Pan, Zhiming Ding。Scalable Community Discovery of Large Networks. Web-Age Information Management,2008.
    [26]. T.S. Evans (?) R. Lambiotte. Overlapping Communities, Link Partitions and LineGraphs. Theoretical Physics, Imperial College London, SW7 2AZ, U.K.
    [27]. HEP-TH http://snap.stanford.edu/data/ca-HepTh.html
    [28]. "Enron email network", http://snap.stanford.edu/data/email-Enron.html
    [29]. "cond-mat-1999",http://snap.stanford.edu/data/ca-CondMat.html
    [30]. "Pennsylvania road network",http://snap.stanford.edu/data/roadNet-PA.html

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700