社会网络社团挖掘若干关键技术研究
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
社会网络泛指由世界上具有复杂连接关系的、与研究目的相关的事物或对象所组成的一种网络,广义上的社会网络包括人际关系网络、酶蛋白质网络、生物链网络、新陈代谢网络以及以互联网为代表的各种社交网络等。社会网络可以抽象为一个具有复杂连接关系和结构的可演化图。通常,社团是一组内部连接紧密,外部连接稀疏的节点集合。社会网络社团挖掘对于加强对社会网络的拓扑结构理解、功能分析和行为预测具有重要的理论意义及实用价值,已被广泛应用于恐怖组织识别、新陈代谢路径预测、蛋白质相互作用网络分析、搜索引擎开发和网络舆情监控等诸多领域。
     本文针对社团挖掘的不同应用场景,分别就多重图模型下的社会网络、含有不完全连接信息的社会网络、二部图模型下的社会网络和多类型节点组成的社会网络的社团挖掘关键技术进行研究,主要贡献有:
     1、提出了一种基于层次化子图树的多重图社会网络社团并行挖掘方法。根据多重图数学模型,社会网络中任意两个节点之间允许有多条连接边,每条边对应于一个连接属性,赋以一个自然数的权重。对于社会网络图中的任一子图,本文运用该数学模型,通过综合该子图各边权重之和与节点个数引入子图密度的概念,并基于从权重较低的边出发进行逐步分解的思想,构造了子图密度逐层递增的层次化子图树HAT,提出了一种针对静态社会网络的层次化社团并行分解更新算法。进而,根据动态社会网络在相邻时间段内的局部变化原理,提出了针对动态社会网络的层次化社团并行分解动态更新算法。模拟数据集上的实验表明,以上算法在社团挖掘的准确度上优于GN等同类算法;数千万边和节点规模的真实数据集上的实验表明,以上算法适于高度并行计算,可以有效应用于大数据社会网络的社团挖掘。
     2、提出了基于节点亲近度和马氏距离的不完全信息社会网络社团挖掘方法。假设考察的社会网络各节点具有相同的属性,每个节点对应于一个n维属性值向量;节点之间的连接信息具有不完全性,即某些局部区域的某些连接未知。本文选取具有完全连接信息的某些区域作为学习的样本区域,首先引入节点亲近度的概念,任意一对节点的亲近度由各自连接的邻居数和公共邻居数的某个经验公式给出,亲近度越接近于1,节点对越亲近,为0的节点对视为不亲近。接着对样本区域的每个节点的n维值向量,乘以某个矩阵,将它们变换到一个称为马氏空间的新空间中,使越亲近的节点对,其值向量在马氏空间中的距离,即马氏距离越小。于是,问题便转化为能够迭代求解的数学规划问题:寻找马氏变换矩阵,使所有亲近的节点对,其马氏距离的亲近度之加权和达到最小,同时使不亲近的节点对的马氏距离之和达到最大。最后,通过以上学习过程得到的马氏矩阵,将整个社会网络中的所有节点变换到马氏空间中,经贪婪搜索,给出一种基于节点马氏距离的聚类算法,以及一种-近似社团启发式分解方法。经数千个节点规模的真实数据集的实验,表明本文提出的方法和解决方案是有效的。
     3、提出了基于图像变换的二部图社会网络重叠社团挖掘方法。二部图社会网络指由两组子网组成的社会网络,边连接仅存在于不同子网的节点之间,子网内部没有任何边连接。其社团挖掘问题是要找出那些分属两个子网的两个社团,称为社团对,尽管社团内部各节点互不直接相连,却与共同的对方社团的节点有密切关联。设两组子网分别包含m个和n个节点,二部图连接关系可以用一个m×n的0/1关系矩阵或黑白图像描述。其社团对的挖掘问题转化为:如何通过整行(列)图像的不断交换与调整,使图像中的黑色像素点最大程度的聚合在一起。本文首先基于哈夫曼图像编码理论,以信息量最小化为目标,改进了一种行(列)贪婪搜索与交换的双聚类迭代算法,使整幅图像分解成若干矩形块,找出那些黑像素点密度较高的块,也就挖掘出了相应的行社团和列社团。接着,进一步针对某些行(列)节点可能同时属于多个不同的行(列)社团的应用场景,提出了一种基于增益函数的重叠社团模式搜索双聚类算法,其核心思想是,对于某一其它行(列)社团的节点能否同时属于本社团,取决于该节点的加入能否使本行(列)社团得到正收益,为此引入增益函数,它是新老社团所对应的块状区域密度差异的加权和。模拟数据集和真实数据集上的实验表明,算法具有良好的二部图社团挖掘效果。
     4、提出了一种具有多类型节点的社会网络社团挖掘方法。在多类型节点组成的社会网络中,连接既存在于同类节点之间,也存在于不同类节点之间,而社团组成仅限于同类节点。具有n类节点的社会网络的连接关系可以用n×(n+1)/2幅黑白图像描述。本文首先将第一类节点相关的n幅图像拼接成一长幅图像,与第二类节点相关的n-1幅剩余图像也拼成一长幅图像,以此类推。接着把社团挖掘问题转化为与二部图相似的长幅图像的行和列的交换问题,其差异主要在于列交换时必须注意保持各列的语义一致性,其复杂性主要表现在需要综合各长幅图像的黑像素点聚集度。本文运用哈夫曼图像编码理论,以最大化压缩描述上述所有图像为目标,分别在社团数量已知和未知条件下,提出了相应的多聚类贪婪搜索算法。模拟数据集和真实数据集上的实验表明,算法在聚类精度上优于NMF等同类算法。
A social network is a social structure made up of a set of entities such asindividuals or organizations, and the dyadic ties between these entities. Generally, asocial network is related with the research goals and can be used to describe differentrelationships in different fields. Commonly, social networks include human interactingnetworks, enzymatic protein networks, biologic chain networks, metabolic networks anddifferent kinds of virtual interacting networks of the Internet. The social network can bedescribed by a dynamical evolution graph, in which each node stands for an entity andeach edge between two nodes stands for the interaction between them. Communities areimportant components of the social networks. Usually, the community is formed with aset of nodes, in which the nodes are densely connected internally and loosely connectedexternally. Community detection is critical for the topological structures understanding,function analysis, and node actions prediction of the social networks. Hence, the socialnetwork has important theoretical and practical value on research and applications.Moreover, community detection is commonly used in many applications such asterrorist organization identification, metabolic pathway prediction, search enginedevelopment and public sentiment monitoring, etc. In this paper, according to differentapplication environment, we focus on community detection in multigraph socialnetwork, bipartite social network, incomplete information social network and the socialnetwork with multi-type nodes respectively. The contributions of this paper include:
     Firstly, a parallelizable method based on hierarchical subgraph trees forcommunity detection in multigraph social network is proposed. According to themathematical definition of multigraph, the multigraph is permitted to have multipleedges between any two nodes, and each edge is permitted to attach with a naturalnumber weight. We use the multigraph to describe the social networks. Based on themultigraph model, the density of any subgraph in the social network is defined bysimultaneous considering the sum of the weight and the total number of the edges in thesubgraph. In order to generate the hierarchical subgraph tree (called HAT), aparallelizable algorithm for generating hierarchical communities in static social networkis proposed. Considering the dynamical social network only changes in a small localarea in the short continuous time, a parallelizable algorithm for generating hierarchicalcommunities in dynamic social network is designed. Through the experiments on realand synthetic datasets, we demonstrate our algorithms are more accurate than thecompared algorithms such as GN etc. Further experiments on big data social network,which has more than ten millions of nodes and edges, validate our algorithms arescalable and fit for community detection in the big data social networks.
     Secondly, a community detection method based on closeness and Mahalanobis distance for the incomplete information social network is presented. We assume eachnode in the incomplete information social network is attached with a set of attributes,which is described by a vector. We also assume there are many local completeinformation regions in the incomplete information network. Moreover, in each localcomplete information region, the complete link information is provided, but other linkinformation outside of the local complete information areas is unknown or incomplete.The main idea of detecting the community in incomplete information networks is asfollows. We first learn a global distance metric from the local information regions withcomplete link information. Then, we use the global metric to measure the distancebetween any pair of nodes in the network. Because the metric is learned from thestructure of the network, the distance should also reflect the missing link structure in thenetwork. In order to learn such a global distance metric, the closeness between any pairof nodes is defined by considering how many neighbors they shared and how manyneighbors they have respectively. The value of closeness ranges from0to1, and thelarger value of closeness between any two nodes, the closer between them from theview of link structure. Subsequently, the closeness between any pair of nodes in eachlocal complete information region can be computed. Consequently, the problem oflearning the global distance metric is transformed into learning a Mahalanobis matrixwhich satisfies: for any pair of nodes in any local complete information region, thehigher value of closeness between them, the short distance between them after beingprojected in the Mahalanobis space. Finally, a distance-based community detectionalgorithm for incomplete information social network is presented. To speed up theclustering process, a heuristic method for generating the-approximation communityis designed. The experiments on real data sets demonstrate the effectiveness andefficiency of the proposed methods.
     Thirdly, an overlapping community detection method based on imagetransformation for bipartite social network is presented. The bipartite social network ismade up of two sub-networks. Particularly, in bipartite social network, edges can onlyexist between nodes from different sub-networks, and not exist between the nodes fromthe same sub-networks. The problem of community detection in bipartite social networkis to discover the community pairs which exist in different sub-networks of the bipartitesocial network. Assume there are m nodes in one sub-network and n nodes in anothersub-network. Then, the bipartite social network can be described by a0/1relationalmatrix or a white and black image. Then, the community detection problem in bipartitesocial network is transformed into how to adjust the rows and column of the white andblack image for clustering the black pixel together as much as possible. Based on theHuffman coding theory for image, we try to compress the total information which isused to describe the image by adjusting the positions of rows and columns. Then, a greedy co-clustering algorithm, which can discover the row clusters and column clusters,is improved. The blocks, which are at the crossing of the row clusters and columnclusters, either have a very high or a very low density of pixel. In order to detect theoverlapping row clusters and column clusters, a gain function is defined. Next, anoverlapping co-clustering algorithm based on the gain function is designed. The mainidea of this algorithm is that: for any row (column), if it can be put into more than onerow (column) clusters, it should improve the value of the gain function, otherwise, itshould not be put into the new row (column) cluster. The experiments on synthetic andreal data sets validate the effectiveness and efficiency of the proposed methods.
     Fourthly, a community detection method for social networks with multi-type nodesis presented. In the social networks with multi-type nodes, the edges exist betweennodes from not only the same type but also different type. Assume there are n types ofnodes in the social network, we can use n(n+1)/2images at the most to describe therelationships between different types of nodes in the social network. We firstconcatenate the n images, which share the same type of nodes. Then, for the second typeof nodes, the left n-1images are concatenated together as the first type of nodes. Thisprocess is repeated until all of the images are dispatched to different concatenatedimages. According to the Huffman coding theory for image, the problem of communitydetection in social networks with multi-type of nodes is transformed into a new problemwhich is how to compress the set of concatenated images by adjusting the positions ofrows and columns in the concatenated images. Since the new problem was proved to beNP-hard, two greedy algorithms, which are used to detect communities when thenumber of communities are provided and not provided, are presented respectively.Experiments on synthetic and real datasets demonstrate the presented algorithms aremore effective than the compared methods such as NMF.
引文
[1]中国互联网信息中心.中国互联网络发展状况统计报告[R].中国互联网信息中心(CNNIC).2012.
    [2] Dunbar, R.I.M. Neocortex size as a constraint on group size in primates [J].Journal of Human Evolution,1992,22(6):469-493.
    [3] Travers J, Milgram S. An experimental study of the small world problem [J].Sociometry,1969,32(4):425-443.
    [4] Watts DJ, Strogatz SH. Collective dynamics of Small-World networks [J].Nature,1998,393(6638):440442.
    [5] Barabási AL, Albert R. Emergence of scaling in random networks [J]. Science,1999,286(5439):509512.
    [6] Albert R, Barabási AL, Jeong H. The Internet’s Achilles heel: Error and attacktolerance of complex networks [J]. Nature,2000,406(2115):378382.
    [7] Barabási AL, Albert R, Jeong H, et al. Power-Law distribution of the WorldWide Web. Science [J],2000,287(5461):2115a.
    [8] Girvan M, Newman MEJ. Community structure in social and biologicalnetworks [C]. In Proceedings of the National Academy of Science,2002,9(12):78217826.
    [9] Newman MEJ, Girvan M. Finding and evaluating community structure innetworks [J]. Physical Review E,2004,69(2):026113.
    [10] Kleinberg JM. Authoritative sources in a hyperlinked environment [J]. Journalof the ACM,1999,46(5):604632.
    [11] Palla G, Derenyi I, Farkas I, et al. Uncovering the overlapping communitystructures of complex networks in nature and society [J]. Nature,2005,435(7043):814818.
    [12] OSLOM. Community Structure Visualization [Z]. http://www.oslom.org/.
    [13] Barabási AL, Oltvai ZN. Network biology: understanding the cell's functionalorganization [J]. Nature Reviews Genetics,2004,101-113.
    [14] Jure L, Lars B, Ravi K, et al. Microscopic evolution of social networks [C]. InProceedings of the14th ACM SIGKDD international conference on Knowledgediscovery and data mining (KDD '08). ACM, New York, NY, USA,2008,462-470.
    [15] Elena Z, Hossam S, Lise G. Co-evolution of social and affiliation networks[C]. In Proceedings of the15th ACM SIGKDD international conference on Knowledgediscovery and data mining (KDD '09). ACM, New York, NY, USA,2009,1007-1016.
    [16] Alan M, Massimiliano M, Krishna P. Gummadi, et al. Measurement andanalysis of online social networks [C]. In Proceedings of the7th ACM SIGCOMMconference on Internet measurement (IMC '07). ACM, New York, NY, USA,2007,29-42.
    [17]杨博,刘大有,刘继明等.复杂网络聚类方法[J].软件学报,2009,20(1):54-66.
    [18] Albert LB, Albert R. Emergence of scaling in random networks [J]. Science,1999,286(5439):509–512.
    [19] Broder A, Kumar R, Maghoul F,et al. Graph structure in the web [C], inProceedings of the9th International World Wide Web Conference,2000,309-320.
    [20] GitHub Network Structure. Mapping GitHub–a network of collaborativecoders [Z]. http://flowingdata.com/2010/03/31/mapping-the-github-community/.
    [21]杨楠,弓丹志,李忺等. Web社团发现技术综述.计算机研究与发展,2005,2005(3):439-447.
    [22] Moltka D, Judith T, Matthias, et al. IRIS (1978-2006) Historical ReflectionThrough Visual Analysis [C]. in Proceedings of the30th Information Systems ResearchSeminar in Scandinavia IRIS2007,2007,150-159.
    [23] Newman MEJ. Modularity and communities structure in networks [C]. InProceedings of the National Academy of Science,2006,103(23):85778582.
    [24] Shiga M, Takigawa I, Mamitsuka H. A spectral clustering approach tooptimally combining numerical vectors with a modular network [C]. In Proceeding ofthe13th ACM SIGKDD International Conference on Knowledge Discovery and DataMining. New York: ACM Press,2007,647656.
    [25] Newman MEJ. Detecting community structure in networks [J]. EuropeanPhysical Journal (B),2004,38(2):321330.
    [26] Newman MEJ. Fast algorithm for detecting community structure in networks[J]. Physical Review E,2004,69(6):066133.
    [27] Guimera R, Amaral LAN. Functional cartography of complex metabolicnetworks [J]. Nature,2005,433(7028):895900.
    [28] Reichardt J, Bornholdt S. Detecting fuzzy community structures in complexnetworks with a potts model [J]. Physical Review Letters,2004,93(19):218701.
    [29]淦文燕,赫南,李德毅等.一种基于拓扑势的网络社区发现方法[J].软件学报,2009,20(8):2241-2254.
    [30]何东晓,周栩,王佐等.复杂网络社区挖掘--基于聚类融合的遗传算法[J].自动化学报,2010,36(8):1160-1170.
    [31]沈华伟,程学旗,陈海强等.基于信息瓶颈的社区发现[J].计算机学报,2008,31(4):677-686.
    [32] Tyler JR, Wilkinson DM, Huberman BA. Email as spectroscopy: Automateddiscovery of community structure within organizations [C]. In Proceedings of the1stInternational Conference on Communities and Technologies. Dordrecht: KluwerAcademic Publishers,2003.8196.
    [33] Radicchi F, Castellano C, Cecconi F, Loreto V, Parisi D. Defining andidentifying communities in networks [C]. In Proceeding of the National Academy ofScience,2004,101(9):26582663.
    [34] Flake GW, Lawrence S, Giles CL, Coetzee FM. Self-Organization andidentification of Web communities [J]. IEEE Computer,2002,35(3):6671.
    [35] Wu F, Huberman BA. Finding communities in linear time [J]. A physicsapproach. European Physical Journal B,2004,38(2):331338.
    [36] Yang B, Cheung WK, Liu J. Community mining from signed social networks[J]. IEEE Transaction on Knowledge and Data Engineering,2007,19(10):13331348.
    [37] Zhang Y Z, Wang J Y, Wang Y, et al. Parallel community detection on largenetworks with propinquity dynamics [C]. In Proceedings of the15th ACM SIGKDDInterna-tional Conference on Knowledge Discovery and Data Mining. Paris, France:ACM,2009,997-1005.
    [38] Wasserman S, Faust K. Social Network Analysis [M]. Cambridge: CambridgeUniversity Press,1994.
    [39] Pons P, Latapy M. Computing communities in large networks using randomwalks [C]. In Proceedings of the20th International symposium on Computer andInformation Sciences. Berlin: Springer-Verlag,2005,284293.
    [40] Yang B, Liu J, Discovering global network communities based on localcentralities [J]. ACM Transaction on the Web,2008,2(1): article9:132.
    [41] Zhang C Q, Ou Y, Clustering, community partition and disjoint spanning trees[J]. ACM Transactions on Algorithms,2008,4(3):35
    [42] Lin Y R, Sun J, Paul C, et al. Parallel community detection on large networkswith propinquity dynamics [C]. In Proceedings of the15th ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining. Paris, France: ACM,2009,527-536
    [43] Kim MS Han J. A Particle-and-Density Based Evolutionary ClusteringMethod for Dynamic Networks [J]. in Proceedings of Very Large Database,2009,2(1):622-633.
    [44] Lin YR, Chi Y, Zhu S, et al. Facetnet: a framework for analyzingcommunities and their evolutions in dynamic networks [C]. In17th International WorldWide Web Conference,2008:685–694.
    [45] Halperin S, Zwick U, Optimal Randomized EREW PRAM Algorithms forFinding Spanning Forests [J]. Algorithms,2001,39(1):1-46.
    [46] Fortunato S, Barthelemy M, Resolution limit in community detection [J]. InProceedings of the National Academy of Sciences of the United States of America,2007,104(1):3641.
    [47] Integer Partition. Integer Partition [Z].http://en.wikipedia.org/wiki/Integer_partition.
    [48] NASH-WILLIAMS C. Edge-disjoint spanning trees of finite graphs [J].Journal of the London Mathematical Society36(1):445–450.
    [49]陈国良.并行计算——结构、算法、编程[M].北京:高等教育出版社,2003.
    [50] DBLP. DBLP Digital Library [Z]. http://www.informatik.uni-trier.de/~ley/db/.
    [51] Sina Blog. Sina Blog [Z]. http://t.sina.com.cn/.
    [52] Kwak H, Lee C, Park H, and Moon S. What is Twitter, a Social Network or aNews Media?[C]. In Proceedings of the19th International World Wide WebConference, Raleigh, NC, USA,2010,591-600.
    [53] Girvan M, Newman MEJ. Community structure in social and biologicalnetworks [J]. in Proceedings of the National Academy of Sciences of the United Statesof America,2002,99(6):7821-7826.
    [54] Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E, Fast unfolding ofcommunities in large networks [J]. Journal of Statistical Mechanics: Theory andExperiment,2008, P10008.
    [55] Ahn YY, Bagrow JP, Lehmann S. Link communities reveal multiscalecomplexity in networks [J]. Nature,2010,466:761-764.
    [56] Lancichinetti A, Fortunato S, Radicchi F. Benchmark graphs for testingcommunity detection algorithms [J]. Physial Review E,2008,78(4):046110.
    [57] Huang J, Sun H, Han J, et al. SHRINK: A Structural Clustering Algorithm forDetecting Hierarchical Communities in Networks [C]. In Proceedings of the2010ACMCIKM International Conference on Information and Knowledge Management (CIKM2010).2010,219-228.
    [58] David L, Jon K. The Link Prediction Problem for Social Networks [C]. inProceedings of the2003ACM CIKM International Conference on Information andKnowledge Management (CIKM2003). Louisiana, USA: ACM,2003,556-559.
    [59] Ben T, Ming FW. Link Prediction in Relational Data [C]. In Proceedings ofAdvances in Neural Information Processing Systems(NIPS2003). British Columbia,Canada: MIT Press,2003,45-52.
    [60] Sarkar P, Chakrabarti D, Jordan MI. Nonparametric link prediction indynamic networks [C]. In Proceedings of the29th International Conference on MachineLearning (ICML), Edinburgh, UK,2012,245-258.
    [61] Bowen Y, Steve G. Finding missing edges in networks based on theircommunity structure [J]. Physical Review E,2012,85:056112.
    [62] Ron E, Sarit K, Avi R. Identifying Missing Node Information in SocialNetworks [C]. In Proceedings of the25th AAAI Conference on Artificial Intelligence,2011,190-201.
    [63] Wang C, Satuluri V, Parthasarathy S. Local Probabilistic Model for LinkPrediction [C]. In Proceedings of the7th International Conference on Data Mining,Nebraska, USA: IEEE Computer Society,2007.322-331.
    [64] Lin YL, Tao Z. Similarity index based on local paths for link prediction ofcomplex networks [J]. Physical Review E,2009,80(4):1-9.
    [65] Myunghwan K, Jure L. The Network Completion Problem: Inferring MissingNodes and Edges in Networks [C]. in Proceedings of the11th SIAM InternationalConference on Data Mining,2011,47-58.
    [66]万怀宇,林友芳,黄厚宽.社会网络中的链接稳定性预测问题研究[J].北京交通大学学报,2009,33(5):99-103.
    [67]郭景峰,王春燕,邹晓红等.一种改进的针对合作关系网络的链接预测方法[J].计算机科学,2008,35(12):126-128.
    [68] Kluger Y, Basri R, Chang JT, et al. Spectral biclustering of microarray data:coclustering genes and conditions [J]. Genome Research,2003,13(4):703–716.
    [69] Dhillon I, Mallela S, and Modha D. Information-theoretic co-clustering [C]. InProceedings of the19th ACM SIGKDD international conference on Knowledgediscovery and data mining,2003,89-98.
    [70] Wang H, Wang W, Yang J, et al. Clustering by pattern similarity in large datasets [C]. in Proceedings of the The ACM SIGMOD International Conference onManagement of Data,2002,394-405.
    [71] Liu J, Wang W. Op-cluster: clustering by tendency in high dimensional space
    [C]. in Proceedings of the3rd IEEE International Conference on Data Mining,2003,19-22.
    [72] Tanaya A, Sharanr R, Kupiec M, et a1. Revealing modularity and organizationin the yeast molecular network by integrated analysis of highly heterogeneousgenomewide data [C]. in Proceedings National Academy of Science,2004,2981-2986.
    [73] Cho H, Dhillon IS, Guan Y, et al. Minimum sum-squared residueco-clustering of gene expression data [C]. In Proceedings SIAM Proceedings of the4thSIAM International Conference on Data Mining, Lake Buena Vista, USA,2004,114-125.
    [74] Banerjee A, Dhillon I, Ghosh J, et al. A generalized maximum entropyapproach to bregman co-clustering and matrix approximation [J]. Journal of MachineLearning Research,2007,8:1919-1986.
    [75] Anagnostopoulos A, Dasgupta A, Kumar R. Approximation algorithms forco-clustering [C]. In Proceedings of the28th ACM SIGMOD/PODS InternationalConference on Management of Data/Principles of Database Systems. Vancouver, BC,Canada,2008,201-210.
    [76] Bryan K, Cunningham P, Bolsuakova N. Biclustering of Expression DataUsing Simulated Annealing [C]. in Proceedings of the18th IEEE Symposium onComputer-Based Medical Systems,2005,383-388.
    [77] Yin X, Han J, Yu P. Crossclus: user-guided multirelational clustering [J].Journal of Data Mining and Knowledge Discovery,2007,15(3):321-348.
    [78] Sun YZ, Han J, Zhao P, et al. RankClus: integrating clustering with rankingfor heterogeneous information network analysis [C]. In Proceedings of the12thInternational Conference on Extending Database Technology: Advances in DatabaseTechnology (EDBT '09), ACM, New York, NY, USA,2009,565-576.
    [79] Sun YZ, Norick B, et al: Integrating meta-path selection with user-guidedobject clustering in heterogeneous information networks[C]. In Proceedings of the18thACM SIGKDD Knowledge Discovery and Data Mining.2012,1348-1356.
    [80] Multigraph. Multigraph [Z]. http://en.wikipedia.org/wiki/Multigraph.
    [81] Agarwal G and Kempe D. Modularity-maximizing graph communities viamathematical programming [J]. European Physical Journal B,2008,66:409-418.
    [82] Aggarwal C, Xie Y, and Yu P. Towards community detection in locallyheterogeneous networks [C]. in Proceedings of the Eleventh SIAM InternationalConference on Data Mining,2011,391-402.
    [83] Alani H, Dasmahapatra S, Hara K, et al. Identifying communities of practicethrough ontology network analysis [J]. Journal of Intelligent Systems,2003,18(2):18-25.
    [84] Boyd S and Vandenberghe L. Convex Optimization [M]. CambridgeUniversity Press,2004.
    [85] Evans T and Lambiotte R. Line graphs, link partitions and overlappingcommunities [J]. Physical Review E,2009,80:016105.
    [86] Feng Z, Xu X, Yuruk N, et al. A novel similarity-based modularity functionfor graph partitioning [C]. in Proceedings of the9th International Conference of DataWarehousing and Knowledge Discovery (DaWak),2007,385-396.
    [87] Ge R, Ester M, Gao B, et al. Joint cluster analysis of attribute data andrelationship data: The connected k-center problem, algorithms and applications [J].ACM Transactions on Knowledge Discovery From Data,2008,2:1-35
    [88] Good BH, Montjoye YA, and Clauset A. The performance of modularitymaximization in practical contexts [J]. Physical Review E,2010,81:046106.
    [89] Guimera R and Amaral LN. Functional cartography of complex metabolicnetworks [J]. Nature,2005,433(7028):895-900.
    [90] Tang J, Liu H. Community evolution in dynamic multi-mode networks [C]. inProceedings of the14th ACM SIGKDD International Conference on Knowledgediscovery and data mining,2008,677-685.
    [91] Lancichinetti A, Fortunato S. Community detection algorithms: Acomparative analysis [J]. Physical Review E,2009,80(5):056117.
    [92] Leskovec J, Lang K, Dasgupta A, et al. Statistical properties of communitystructure in large social and information networks [C]. in Proceedings of the17thInternational World Wide Web Conference,2008,695-704.
    [93] Leskovec J, Lang K, Mahoney M. Empirical comparison of algorithms fornetwork community detection [C]. in Proceedings of the19th International World WideWeb Conference,2010,631-640.
    [94] Lin Y, Sun J, Castro P, et al. Metafac: community discovery via relationalhypergraph factorization [C]. in Proceedings of the15th ACM SIGKDD InternationalConference on Knowledge discovery and data mining,2009,527-536.
    [95] Palla G, Derenyi I, Farkas I, et al. Uncovering the overlapping communitystructure of complex networks in nature and society [J]. Nature,2005,435:814.
    [96] Rosvall M and Bergstrom C. Maps of random walks on complex networksreveal community structure [J]. in Proceedings of the National Academy of Sciences ofthe United States of America,2008,105:1118.
    [97] Sales-Pardo M, Guimer`a R, Moreira A, and Amaral L. Extracting thehierarchical organization of complex systems [J]. in Proceedings of the NationalAcademy of Sciences of the United States of America,2007,104(39):15224-15229.
    [98] Sen P, Namata G, Bilgic M, et al. Collective classification in network data [J].Artificial Intelligence Magazine,2008,29(3):93-106.
    [99] Shi J and Malik J. Normalized cuts and image segmentation [J]. IEEETransactions on Pattern Analysis and Machine Intelligence,2000,22(8):888-905.
    [100] Jolliffe IT. Principal Component Analysis [M]. Series: Springer Series inStatistics,2nd edtion, Springer, NY,2002.
    [101] Shiga M, Takigawa I, and Mamitsuka H. A spectral clustering approach tooptimally combining numerical vectors with a modular network [C]. in Proceedings ofthe13th ACM SIGKDD International Conference on Knowledge discovery and datamining,2007,647-656.
    [102] Tian Y, Hankins R, and Patel J. Efficient aggregation for graphsummarization [C]. in Proceedings of the28th ACM SIGMOD/PODS InternationalConference on Management of Data/Principles of Database Systems,2008,567-580.
    [103] Tsai C and Chiu C. Developing a feature weight self-adjustment mechanismfor a k-means clustering algorithm [J]. Computational Statistics Data Analysis,2008,52:4658-4672.
    [104] Wakita K, Tsurumi T. Finding community structure in mega-scale socialnetworks [C]. in Proceedings of the16th International World Wide Web Conference,2007,1275-1276.
    [105] Xing E, Jordan A, and Russell S. Distance metric learning, with applicationto clustering with side-information [C]. in Proceedings of the2002Neural InformationProcessing Systems,2002,505-512.
    [106] Xu X, Yuruk N, Feng Z, et al. Scan: A structural clustering algorithm fornetworks. in Proceedings of the13th ACM SIGKDD International Conference onKnowledge discovery and data mining,2007,824-833.
    [107] Zhou Y, Cheng H, and Yu J. Graph clustering based on structural/attributesimilarities [J]. VLDB Endowment,2009,2:718-729.
    [108] Ahn YY, Bagrow JP, and Lehmann S. Link communities reveal multiscalecomplexity in networks [J]. Nature,2010,466:761-764.
    [109] Chakrabarti D, Papadimitriou S, Modha D S, et al. Fully automaticcross-associations [C]. in Proceedings of the10th ACM SIGKDD InternationalConference on Knowledge discovery and data mining,2004,79-88.
    [110] Classic3. Classic3Classic4Data sets [Z].http://www.dataminingresearch.com/index.php/2010/09/classic3-classic4-datasets.
    [111]3Sources.3Sources Dataset [Z]. http://mlg.ucd.ie/datasets.
    [112] Cheng Y, George MC. Biclustering of expression data [C]. In Proceedings ofthe Biclustering of expression data,2000,93-103.
    [113] Linda MC and Clyde WD. Omega: A general formulation of the rand indexof cluster recovery suitable for non-disjoint solutions [J]. Multivariate BehavioralResearch,1988,23(2):231-242.
    [114] Arnetminer. Arnetminer [Z]. http://arnetminer.org/.
    [115] Hirsch JE. An index to quantify an individual's scientific research output [J].in Proceedings of the National Academy of Sciences of the United States of America,2005,102(46):16569-16572.
    [116] Deodhar M, Gupta G, Ghosh J, et al. A scalable framework for discoveringcoherent co-clusters in noisy data [C]. In Proceedings of the26th Annual InternationalConference on Machine Learning (ICML '09),2009,241-248.
    [117] Dhillon IS. Co-clustering documents and words using bipartite spectral graphpartitioning [C]. in Proceedings of the7th ACM SIGKDD International Conference onKnowledge discovery and data mining,2001,269–274.
    [118] Dhillon IS. and Guan Y. Information theoretic clustering of sparseco-occurrence data [C]. Proceedings of the3rd IEEE International Conference on DataMining,2003,517-528.
    [119] Dhillon IS, Mallela S, and Modha DS. Information-theoretic co-clustering. inProceedings of the9th ACM SIGKDD International Conference on Knowledgediscovery and data mining [C],2003,89-98.
    [120] Evans TS and Lambiotte R. Line graphs of weighted networks foroverlapping communities [J]. The European Physical Journal B-Condensed Matter andComplex Systems,2010,77:265-272.
    [121] Steve G. Finding overlapping communities using disjoint communitydetection algorithms [J]. Complex Networks,2009,207:47-61.
    [122] Havemann F, Heinz M, Struck A, et al. Identification of overlappingcommunities and their hierarchy by locally calculating community-changing resolutionlevels [J]. Journal of Statistical Mechanics: Theory and Experiment,2011(01): P01023.
    [123] Hubert L and Arabie P. Comparing partitions [J]. Journal of Classification,1:193-218.
    [124] Lancichinetti A, Fortunato S, and Kertsz J. Detecting the overlapping andhierarchical community structure in complex networks [J]. New Journal of Physics,2009,11(3):033015.
    [125]20Newsgroup Dataset.20Newsgroup Dataset [Z].http://people.csail.mit.edu/jrennie/20Newsgroups/
    [126] Han EH, Karypis G. Centroid-based document classification: Analysis andexperimental results [C]. in Proceedings of the4th European Conference on Principlesof Data Mining and Knowledge Discovery,2010,424–431.
    [127] Lazzeroni L and Owen A. Plaid models for gene expression data [J].Statistica Sinica,2000,12:61-86.
    [128] Lee DD. and Seung HS. Algorithms for nonnegative matrix factorization [C].in Proceedings of the2000Neural Information Processing Systems,2000,556-562.
    [129] Long B, Zhang Z, Wu X, and Yu P. Spectral clustering for multi-typerelational data [C]. in Proceedings of the23rd International Conference on MachineLearning,2006,585-592.
    [130] Long B, Zhang Z, and Yu P. Co-clustering by block value decomposition [C].in Proceedings of the11th ACM SIGKDD International Conference on Knowledgediscovery and data mining,2005,635-640.
    [131] Palla G, Derenyi I, Farkas I, et al. Uncovering the overlapping communitystructure of complex networks in nature and society [J]. Nature,2005,435:814-818.
    [132] Papadimitriou S, Sun S, Faloutsos C, et al. Hierarchical, parameter-freecommunity discovery [C]. In Proceedings of the European conference on MachineLearning and Knowledge Discovery in Databases,2008,170-187.
    [133] Wang X, Tang L, Gao H, et al. Discovering overlapping groups in socialmedia [C]. in Proceedings of the10th IEEE International Conference on Data Mining,2010,569-578.
    [134] Han JW, Sun YZ, Yan XF, et al. Mining Knowledge from Data: AnInformation Network Analysis Approach [C]. in Proceedings of the IEEE28thInternational Conference on Data Engineering,2012,1214-1217.
    [135] Banerjee A, Basu S, and Merugu S. Multi-way clustering on relation graphs[C]. in Proceedings of the7th SIAM International Conference on Data Mining.2006,190-201.
    [136] Barron A, Rissanen J, and Yu B. The minimum description length principlein coding and modeling [J]. IEEE Transactions on Information Theory.1998,44(6):2743-2760.
    [137] Bekkerman R and Jeon J. Multi-modal clustering for multimedia collections[C]. in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition (CVPR '07),2007,1-8.
    [138] Bekkerman R and Mccallum A. Multi-way distributional clustering viapairwise interactions [C]. in Proceedings of the22th International Conference onMachine Learning,2005,41-48.
    [139] Chen Y, Wang L, and Dong M. Non-negative matrix factorization forsemisupervised heterogeneous data coclustering [J]. IEEE Transactions on Knowledgeand Data Engineering,2010,22(10):1459-1474.
    [140] Gao B, Liu TY, and Ma WY. Star-structured high-order heterogeneous dataco-clustering based on consistent information theory [C]. in Proceedings of the IEEEInternational Conference on Data Mining,2006,880-884.
    [141] Wang G, Zhao YC, Shi X, et al. Magnet community identification on socialnetworks [C]. in Proceedings of the18th ACM SIGKDD Knowledge Discovery andData Mining,2012,588-596.
    [142] Gao B, Liu TY, Zheng X, et al. Consistent bipartite graph co-partitioning forstar-structured high-order heterogeneous data coclustering [C]. In Proceedings of the11th ACM SIGKDD international conference on Knowledge discovery in data mining,2005,41-50.
    [143] Guimera R and Amaral L N. Functional cartography of complex metabolicnetworks [J]. Nature,2005,433(7028):895-900.
    [144] Han EH and Karypis G. Centroid-based document classification: Analysisand experimental results [C]. In Proceedings of the4th European Conference onPrinciples of Data Mining and Knowledge Discovery,2000,424-431.
    [145] Havemann F, Heinz M, Struck A, et al. Identification of overlappingcommunities and their hierarchy by locally calculating communitychanging resolutionlevels [J]. Journal of Statistical Mechanics: Theory and Experiment,2011,2011(01):P01023.
    [146] He J, Tong H, Papadimitriou S, et al. Pack: Scalable parameter-freeclustering on k-partite graphs [C]. in Proceedings of the10th SIAM InternationalConference on Data Mining Workshop on Link Analysis, Counterterrorism and Security,2009,10-22.
    [147] ACM. ACM Digital Library [Z]. http://dl.acm.org/
    [148] Hubert L and Arabie P. Comparing partitions [J]. Journal of Classification,1985,(1):193-218.
    [149] Dino I, Robardet C, Pensa R, et al. Parameter-less co-clustering forstar-structured heterogeneous data [J]. Data Mining and Knowledge Discovery,2012,1-38.
    [150] Long B, Wu X, Zhang Z, et al. Unsupervised learning on k-partite graphs [C].In Proceedings of the12th ACM SIGKDD international conference on Knowledgediscovery and data mining,2006,317-326.
    [151] Long B, Zhang Z, and Yu PS. A general framework for relation graphclustering [J]. Knowledge Infornation System,2010,24:393-413.
    [152] Papadimitriou S, Gionis A, Tsaparas P, et al. Parameter-free spatial datamining using mdl [C]. in Proceedings of the5th IEEE International Conference on DataMining,2005,8-17.
    [153] Sales-Pardo M, Guimer`a R, Moreira A, et al. Extracting the hierarchicalorganization of complex systems [J]. In Proceedings of the National Academy ofSciences of the United States of America,2007,104(39):15224-15229.
    [154] Shiga M, Takigawa I, and Mamitsuka H. A spectral clustering approach tooptimally combining numerical vectors with a modular network.[C]. In Proceedings ofthe13rd ACM SIGKDD international conference on Knowledge discovery and datamining,2007,647-656.
    [155] Sun Y, Yu Y, and Han J. Ranking-based clustering of heterogeneousinformation networks with star network schema [C]. In Proceedings of the15th ACMSIGKDD international conference on Knowledge discovery and data mining,2009,797-806.
    [156] Sun YZ, Norick B, Han J, et al. Integrating meta-path selection withuser-guided object clustering in heterogeneous information networks [C]. inProceedings of the18th ACM SIGKDD international conference on Knowledgediscovery and data mining,2012,1348-1356.
    [157] Tsai C and Chiu C. Developing a feature weight self-adjustment mechanismfor a k-means clustering algorithm [J]. Computational Statistics Data Analysis,2008,52:4658-4672.
    [158] Zeng H J, Chen Z, Ma WY. A unified framework for clusteringheterogeneous web objects [C]. in Proceedings of the3rd international conference onweb information systems engineering,2002,161-172.
    [159] Wang J, Zeng H, Chen Z, et al. Recom: reinforcement clustering ofmulti-type interrelated data objects [C]. in Proceedings of the26th annual internationalACM SIGIR conference on Research and development in informaion retrieval,2003,274-281.
    [160] Shi C, Kong XN, Yu P, et al. Relevance search in heterogeneous networks[C]. in Proceedings of the15th International Conference on Extending DatabaseTechnology (EDBT '12),2012,180-191.
    [161] Wang X, Tang L, Gao H, and et al. Discovering overlapping groups in socialmedia [C]. in Proceedings of the10th IEEE International Conference on Data Mining,2010,569-578.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700