Research on Authorship Identification Techniques for Chinese Web Information
Abstract
With the spread of the Internet, Web information of many kinds, such as online forums, blogs, and e-mail, has become an important source of information in daily life and work. This convenience brings problems with it: some people use forums, blogs, and e-mail to publish illegal content, including reactionary, fraudulent, pornographic, threatening, and gambling-related messages. The network gives criminals a new space and new means for crime, with serious consequences for social stability and for the security of the state and government.
     At present, the main countermeasure is to install filtering software that blocks messages containing sensitive words. This passive defense cannot eliminate illegal Web information, because offenders evade the filters by substituting other words. Prosecution is a strong deterrent, and the state has enacted a number of relevant laws, but such cases often cannot be opened for investigation because usable evidence is lacking. Identifying the author of a piece of Web information would supply evidence for computer forensics, which has practical value for law enforcement, for public safety and stability, and for a cleaner network environment.
     This thesis applies the principles and techniques of stylometry to the writing style of Web information authors: it extracts writing features that characterize an author and uses machine learning algorithms to determine the author's true identity automatically. The work covers six areas: (1) a comprehensive, detailed survey and analysis of related research in China and abroad, leading to a system model and framework for Web information authorship identification; (2) content extraction methods for Web pages and e-mail messages; (3) extraction of three groups of features that express an author's writing style, namely linguistic, structural, and format features; (4) an improvement to the support vector machine algorithm, a similarity-based progressive transductive support vector machine (PSTSVM) suited to classification with small samples; (5) the design and development of an experimental authorship identification system for Chinese Web information; (6) a study of social networks, undertaken to investigate criminals' social relations, that proposes a method for building social networks based on judging the authenticity of authorship.
     To verify the proposed methods, a large amount of data was collected and several experiments were run to test the factors that influence identification. The results show that the three feature extraction methods are effective and that combining the features performs better than using any single feature. Classification accuracy on the literary works, blog, and e-mail datasets exceeded 86%, which indicates that the approach is effective and feasible for use in computer forensics.
