Research on Some Key Technologies of Multi-element Social Network Extraction and Analysis Based on the Web and the Email

Chinese title: 基于Web和Email的多元社会网络抽取与分析关键技术研究
Author: 尹美娟 (Yin Meijuan)
Thesis level: Doctoral
Discipline: Computer Software and Theory
Chinese keywords: 社会网络 ; 社会网络分析 ; 多元社会网络 ; 人物属性抽取 ; 社会关系评估 ; 社团发现
English keywords: Social Network ; Social Network Analysis ; Multi-element Social Network ; Person Attribute Extraction ; Social Relation Evaluation ; Community Detection
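Degree year: 2012
Advisor: 王清贤 (Wang Qingxian)
Discipline code: 081202
Degree-granting institution: 解放军信息工程大学 (PLA Information Engineering University)
Submission date: 2012-04-15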

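Abstract (translated from the Chinese)

With the development of information technology and network communication technology, illegal acts and activities organized over the Internet are increasing. How to accurately extract information such as person attributes and social relations from multiple kinds of network data, and then mine intelligence such as potential key persons and community organizations, has become a closely watched problem. Techniques for extracting and analyzing social networks from a single kind of network data are now fairly mature, but they cannot yet handle extraction and analysis based on multiple kinds of network data. This thesis analyzes the applications and research status of techniques for social network extraction and analysis based on network data. On that basis, taking Web pages and Email messages as the two data types, it studies in depth several key technologies in social network extraction and analysis based on multiple kinds of network data, including the social network model, person attribute extraction, social relation evaluation, and community detection. The main work and results are as follows: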
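     (1) Social network model. Because existing social network models cannot fully describe a person's attributes and social relations across multiple kinds of network data, the concept and model of the multi-element social network are proposed, together with a concrete method for describing a multi-element social network instance based on the Web and Email. The model provides a research basis for social network extraction and analysis techniques over multiple kinds of network data, such as person attribute extraction, social relation evaluation, and community detection. Based on this model, a technical framework for multi-element social network extraction and analysis is proposed and its key technologies are analyzed; the framework offers useful guidance for research and system design concerning multi-element social networks.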
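     (2) Person attribute extraction from Web pages. Because existing concepts and methods of Web person attribute extraction do not apply to the automatic extraction of different types of person attributes from Web pages, the concept of generalized Web person attribute extraction is proposed and formally described. To solve this problem, a Web person attribute extraction method based on multi-feature automated reasoning (MFAR) is proposed. For defining association rules in MFAR, several general-purpose association features are proposed, attribute association rules based on a single feature and on multiple features are established, and the features and rules are given a logical representation. Markov logic networks are used to solve the automatic training and inference of association rules in MFAR, and a framework for training and inference based on Markov logic networks is given. Experimental results show that, across different types of Web person attribute extraction problems, the method extracts person attributes from Web pages more accurately than existing methods based on a single rule.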
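     (3) Person attribute extraction from Email data. A framework for person attribute extraction from email data is proposed. For extracting candidate name attributes from the salutation and signature blocks of email bodies, a block locating algorithm based on statistics and rules is proposed. For evaluating the reliability of candidate names, an algorithm based on clustering and communication importance is proposed: it clusters candidate names, analyzes the importance each name shows in email communication, evaluates the reliability of each candidate name cluster, and then extracts the person's credible names. Experiments on the Enron email dataset show that the block locating algorithm extracts salutation and signature blocks from email bodies fairly accurately, and that the reliability evaluation algorithm accurately extracts a person's formal name and aliases.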
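     (4) Social relation evaluation based on Web pages. Because existing Web social relation evaluation methods give results of low accuracy and poor stability, a Web social relation evaluation model based on search engines and text analysis (SETARM) is proposed. On the basis of this model, two relation evaluation functions are designed and the corresponding evaluation methods are built. Experimental results show that, compared with typical existing methods based on search engines or on text analysis, the two SETARM-based methods compute relation weights that are more accurate and more stable, and that the model performs better when its two basic methods are fused linearly and the text-analysis-based method contributes more.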
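     (5) Community detection algorithms. Because existing community detection algorithms do not solve community detection in multi-element social networks well, a basic idea for community detection in such networks is proposed. Following this idea, a relation closeness evaluation method that integrates multi-element information (MICE) is proposed for converting a multi-element social network into a weighted network, and a two-stage local greedy expansion algorithm (TSLGE) is proposed for detecting communities in weighted networks, with improvements in seed selection, the definition of the expansion evaluation function, and the merging of similar communities. Experiments on a multi-element social network instance built from the Enron email set verify that the relation closeness between nodes evaluated by MICE is closer to real social relations. Community detection experiments on synthetic networks and on the Enron-based instance show that TSLGE has good time performance and, compared with typical existing community detection algorithms based on local expansion, detects communities fairly accurately on both unweighted and weighted networks.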
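     Finally, the work of the thesis is summarized, the outlook for multi-element social network extraction and analysis techniques is discussed, and directions for further research are proposed.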
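
Contribution (4) states that the model performs better when a search-engine-based score and a text-analysis-based score are fused linearly, with the text-analysis part weighted more heavily. The abstract does not give the actual evaluation functions, so the following is only a minimal sketch under stated assumptions: the search-engine score is taken to be a Jaccard coefficient over hit counts, the text-analysis score is assumed to be precomputed in [0, 1], and the names `jaccard_from_hits`, `fused_relation_weight`, and `alpha` are hypothetical.

```python
def jaccard_from_hits(hits_a: int, hits_b: int, hits_ab: int) -> float:
    """Co-occurrence strength of two names from search-engine hit counts.

    Assumed measure (Jaccard coefficient); the thesis may use a different one.
    hits_a / hits_b: result counts for each name alone.
    hits_ab: result count for the query containing both names.
    """
    union = hits_a + hits_b - hits_ab
    return hits_ab / union if union > 0 else 0.0


def fused_relation_weight(search_score: float,
                          text_score: float,
                          alpha: float = 0.3) -> float:
    """Linear fusion of the two scores.

    alpha is the weight of the search-engine score (hypothetical parameter).
    alpha < 0.5 gives the text-analysis score the larger contribution.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * search_score + (1.0 - alpha) * text_score


if __name__ == "__main__":
    s = jaccard_from_hits(hits_a=100, hits_b=50, hits_ab=25)
    print(fused_relation_weight(s, text_score=0.8, alpha=0.25))
```

Keeping the fusion weight as an explicit parameter makes it easy to test how the balance between the two scores affects the result.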

Abstract (author's English version)

With the development of information technology and network communication technology, the number of illegal activities and incidents organized through the Internet keeps growing. Accurately extracting person attributes and social relations from multiple types of network data, and then mining potential key persons and community organizations, has therefore become an important research topic. Although many mature technologies exist for social network extraction and analysis on a single type of network data, they cannot solve the problem of extracting and analyzing social networks from a variety of network data. This thesis first analyzes related work and applications of social network extraction and analysis based on network data. Then, taking Web pages and Email data as examples, it carries out an in-depth study of several key issues in social network extraction and analysis based on a variety of network data: social network models, person attribute extraction, social relation evaluation, and community detection. The primary work and contributions are as follows:
     (1) Social network models. Because existing social network models cannot fully represent the attributes of persons and the social relations among them across many kinds of network data, the concept and model of the multi-element social network are proposed, and a description of the multi-element social network instance based on the Web and Email is presented. The model provides a basis for research on social network extraction and analysis over various network data, such as person attribute extraction, social relation evaluation, and community detection. With this model, a framework for multi-element social network extraction and analysis is put forward, and the key techniques in the framework are briefly analyzed. The framework offers good guidance for research and system design related to multi-element social networks.
     (2) Person attribute extraction from Web pages. Because the existing concept and approach of Web person attribute extraction cannot automatically extract different types of person attributes from Web pages, the concept and a formal description of generalized Web person attribute extraction are proposed. To solve the generalized problem, a novel Web person attribute extraction method named MFAR is put forward, which extracts person attributes by multi-feature automated reasoning. For defining the attribute association rules of MFAR, multiple widely applicable association features are proposed, attribute association rules are defined on one or several of these features, and logical representations of the features and rules are given. The automated training of and reasoning over the association rules in MFAR are handled with Markov logic networks, and a corresponding training and reasoning framework is put forward. The experimental results show that, across different kinds of Web person attribute extraction problems, the proposed approach extracts person attributes from Web pages more accurately than some existing methods based on a single rule.
     (3) Person attribute extraction from Email data. A framework for person attribute extraction from Email data is proposed. For one problem in the framework, extracting candidate name attributes from the salutations and signatures in Email bodies, a statistics- and rule-based block locating algorithm is proposed. For the other problem, ranking the reliability of candidate names, a candidate name reliability evaluation algorithm based on clustering and communication importance is put forward. This algorithm evaluates reliability by clustering candidate names and analyzing the importance of names in Email communications, and then extracts each person's credible names and aliases based on the reliabilities. The experimental results on the Enron Email dataset show that the proposed block locating algorithm locates and extracts salutation and signature text in Email bodies fairly accurately, and that the candidate name reliability evaluation algorithm precisely extracts a person's formal names and aliases.
     (4) Social relation evaluation based on Web pages. Because existing Web social relation evaluation methods do not yield accurate and stable results, a Web social relation evaluation model named SETARM, based on search engines and text analysis, is proposed. With this model, two types of relation evaluation functions are designed and the corresponding evaluation methods are presented. The experimental results demonstrate that the SETARM-based relation evaluation methods achieve relatively high accuracy and stability, and that the model performs better when its two basic relation evaluation approaches are combined linearly and the text-analysis-based method makes the larger contribution.
     (5) Community detection algorithms. Because existing community detection algorithms do not solve community detection in multi-element social networks well, a basic idea for community detection in multi-element social networks is proposed. Following this idea, a multi-element-information-based relation closeness evaluation method named MICE is put forward to transform multi-element social networks into weighted networks. To discover communities in weighted networks, a two-stage local greedy expansion algorithm named TSLGE is proposed, which improves on key issues such as seed selection, the definition of the expansion evaluation function, and the merging of similar communities. The experimental results on the Enron Email dataset show that the relation closeness evaluated by MICE reflects the real relationships among people. The experimental results on synthetic benchmarks and empirical networks show that TSLGE has good run-time performance and, compared with some typical community detection algorithms based on local expansion, detects communities with good quality on both unweighted and weighted networks.
     Finally, the research work of this thesis is summarized, and future directions for multi-element social network extraction and analysis are indicated.
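
Contribution (5) relies on local greedy expansion over a weighted network. The abstract does not specify TSLGE's seed selection, evaluation function, or merging step, so the sketch below is not TSLGE itself. It only illustrates the general idea of growing one community from a single seed, assuming a commonly used fitness of the form w_in / (w_in + w_out)^alpha. The graph representation and the names `fitness`, `expand_seed`, and `alpha` are hypothetical.

```python
from typing import Dict, Set

# Undirected weighted graph: graph[u][v] == graph[v][u] == edge weight.
Graph = Dict[str, Dict[str, float]]


def fitness(graph: Graph, community: Set[str], alpha: float = 1.0) -> float:
    """Assumed local fitness: internal weight over total weight ** alpha.

    w_in sums edge weights with both endpoints inside the community
    (each internal edge is counted once per endpoint, i.e. twice);
    w_out sums the weights of edges leaving the community.
    """
    w_in = 0.0
    w_out = 0.0
    for u in community:
        for v, w in graph[u].items():
            if v in community:
                w_in += w
            else:
                w_out += w
    total = w_in + w_out
    return w_in / total ** alpha if total > 0 else 0.0


def expand_seed(graph: Graph, seed: str, alpha: float = 1.0) -> Set[str]:
    """Greedily add the neighbor that most increases fitness; stop when none does."""
    community = {seed}
    while True:
        frontier = {v for u in community for v in graph[u]} - community
        best_node = None
        best_fit = fitness(graph, community, alpha)
        for v in sorted(frontier):  # sorted for deterministic tie-breaking
            candidate_fit = fitness(graph, community | {v}, alpha)
            if candidate_fit > best_fit:
                best_node, best_fit = v, candidate_fit
        if best_node is None:
            return community
        community.add(best_node)
```

On a toy graph of two unit-weight triangles joined by one weak edge, expanding from a node in one triangle should stop at that triangle, because crossing the weak edge lowers the fitness. A full algorithm would also need to choose seeds and merge similar communities, which this sketch leaves out.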

    [150]P. Singla, P. Domingos. Discriminative training of Markov logic networks[A]. In:Proceedings of the20th National Conf. on Artificial Intelligence(AAAI2005)[C].Pittsburgh,2005:868-873.
    [151] S. Kok, M. Sumner, M. Richardson, P. Singla, H. Poon, P. Domingos. The AlchemySystem for Statistical Relational AI [EB/OL].: http://www.cs.wachington.edu/ai/alchemy,2010-11-8.
    [152] IBM developerWorks: IBM’s resource for developers and IT professionals [EB/OL].:http://www.ibm.com/developerworks/,2010-10-19.
    [153] The email collection of Enron Corporation [EB/OL].: http://www-2.cs.cmu.edu/~enron/.2003,2008-3-10.
    [154] gwt-google-apis: The Official Google API Libraries for Google Web Toolkit[EB/OL].:http://code.google.com/p/gwt-google-apis/,2009-10-17.
    [155]Jenny R. Finkel, Trond Grenager, and Christopher Manning. Incorporating non-localinformation into information extraction systems by Gibbs sampling[A]. In: Proceedings ofthe43rd Annual Meeting on Association for Computational Linguistics[C]. Ann Arbor,Michigan,2005:363-370.
    [156].Wanxiang Che, Zhenghua Li, Ting Liu. LTP: A Chinese Language Technology Platform
    [A]. In: Proceedings of the Coling2010: Demonstrations [C]. Beijing, China,2010:13-16.
    [157]Jeffrey Friedl．Mastering Regular Expressions[M]. O'Reilly Media,Inc.2006.
    [158]陈志伟.网页Email信息采集器设计与实[J].电脑编程技巧与维护,2009,24:83-86.
    [159]D. Cai, S. Yu, J. R. Wen, W. Y. Ma. VIPS: a visionbased page segmentation algorithm[R],Beijing: Microsoft Research Asia,2003.
    [160] F. Patman, P. Thompson. Names: A new frontier in text mining[A]. In: Proceedings of theIntelligence and Security Informatics(ISI-2003)[C]. Tucson, Az, USA,2003:27-38.
    [161]李彬.计算字符串相似度的矩阵算法[J].软件技术,2007,41(3):110-111.
    [162]于海英.字符串相似度度量中LCS和GST算法比较[J].电子科技,2011,24(3):101-103.
    [163]P. Christen. A Comparison of Personal Name Matching: Techniques and Practical Issues[R].Canberra, Australia: Computer Science Laboratory, The Australian National University,2006.
    [164]E. S. Ristad, P. N. Yianilos. Learning String Edit Distance[J]. IEEE Transactions on PatternAnalysis and Machine Intelligence.1998,20(5):522-532.
    [165]邢晓辉,刘慧.基于LCS的中文缩写字段匹配问题的研究[J].山东科学,2008,21(4):52-56.
    [166]王映龙,杨炳儒,宋泽锋,陈卓,唐建军.基因序列相似度的LCS算法研究[J].计算机工程与应用,2007,43(31):45-47.
    [167]C. Friedman, R. Sideli. Tolerating spelling errors during patient validation[J]. Computersand Biomedical Research,1992,25(5):486-509.
    [168]Leon Danon, Albert Díaz Guilera, Jordi Duch, Alex Arenas. Comparing communitystructure identification[J]. Journal of Statistical Mechanics: Theory and Experiment,2005(9): P09008–09008.
    [169]H. Tomobe, Y. Matsuo, K. Hasida. Social Network Extraction of ConferenceParticipants[A]. In: Proceedings of the Twelfth International World Wide WebConference(WWW2003)[C]. Budapest, Hungary,2003.
    [170]J. Tang, D. Zhang, L. Yao. Social Network Extraction of Academic Researchers[A]. In:Proceedings of the Seventh IEEE International Conference on Data Mining(ICDM2007)[C]. Omaha, Nebraska, USA,2007:292–301.
    [171]Matsuo Y, Tomobe H, Nishimura T. Robust Estimation of Google Counts for SocialNetwork Extraction [C]. AAAI2007, Twenty-Second National Conference on ArtificialIntelligence,2007:1395-1401.
    [172]H. Chen, M. Lin, and Y. Wei.2006. Novel associationmeasures using web search withdouble checking[A]. In: Proceedings of Proceedings of the21st International Conferenceon Computational Linguistics and the44th annual meeting of the Association forComputational Linguistics(COCLING/ACL2006)[C]. Morristown, NJ, USA,2006:1009-1016.
    [173]2011年福布斯中国名人榜[EB/OL].:http://www.forbeschina.com/review/201105/0009378.shtml,2011-9-5.
    [174]第七届全国信息检索学术会议(CCIR2011)程序委员会[EB/OL].:http://ir.sdu.edu.cn/ccir2011/organization.htm,2011-9-5.
    [175]第27届中国数据库学术会议（NDBC2010）程序委员会[EB/OL].:http://ndbc2010.ruc.edu.cn/committee.html,2011-9-5.
    [176]N. Du, B Wang, B. Wu. Overlapping community structure detection in networks[A]. InProceeding of the17th ACM conference on Information and knowledge management[C].Napa Valley, California, USA,2008:1371-1372.
    [177]H. Shen, X. Cheng, K. Cai, M. B. Hu. Detect overlapping and hierarchical communitystructure in networks[J]. Physica A: Statistical Mechanics and its Applications,2009,388(8):1706-1712.
    [178]H. Shen, X. Cheng, J. Guo. Quantifying and identifying the overlapping communitystructure in networks [J]. Journal of Statistical Mechanics: Theory and Experiment,2009:P07042.
    [179]S. H. Zhang, X. M. Ning, X. S. Zhang．Identification0f Functional Modules in a PPINetwork By Clique Percolation Clustering [J], Comput Biol Chem,2006,30(6):445-451.
    [180]汪小帆,李翔,陈关荣.复杂网络理论及其应用[M].北京:清华大学出版社,2006.
    [181] W. H. Woodall. The design of cusum quality control charts[J]. J. Qual. Technol,1986,18:99–102.
    [182]A. Lancichinetti, S. Fortunato. Benchmarks for testing community detection algorithms ondirected and weighted graphs with overlapping communities [J]. Phys. Rev. E,2009,80(1):16118.
