Text Mining Algorithms and Their Applications in Knowledge Management

Original title: 文本挖掘算法及其在知识管理中的应用研究
Author: 宣照国
Thesis level: Doctoral
Discipline: Management Science and Engineering
Keywords: Knowledge Management; Knowledge Discovery from Texts; Text Categorization; Text Clustering
Degree year: 2008
Advisor: 党延忠
Discipline code: 1201
Degree-granting institution: 大连理工大学 (Dalian University of Technology)
Submission date: 2008-10-01

Abstract

With the arrival of the knowledge economy, knowledge management (KM) plays an increasingly important role in society and the economy. Most KM research serves enterprises, and very little addresses scientific research management departments. This dissertation studies KM in China's research management departments. Compared with other domains, KM in these departments has distinctive features. For instance, the departments manage large numbers of research proposals that contain a great deal of knowledge. Mining and using that knowledge can support decisions at three levels: scientific research as a whole, individual subject fields, and project management.
     The knowledge is implicit in the content of the proposals, and mining it raises three problems: the knowledge representation of a proposal cannot rely entirely on a dictionary; the research content of a proposal does not always match the subject field it was submitted to; and the structure of the subject code system is not fully consistent with the structure of actual research fields. To address these problems, this dissertation makes the following three contributions.
     First, a bridge-connection pattern filtering algorithm (BPFA) is proposed for extracting high-frequency words without a dictionary. N-gram techniques are used to collect the combination patterns of Chinese characters in the text and their frequencies. The support frequency of each pattern is then obtained by eliminating its bridge-connection frequency, and words are identified and extracted on the basis of support frequency rather than raw frequency. Experimental results show that BPFA improves both the precision and the recall of word segmentation. The algorithm suits Chinese information processing tasks that are sensitive to word frequency. In this dissertation it is used to extract new terms from the proposals and add them to the system vocabulary.
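
The idea of discounting bridge-connection occurrences can be sketched in a few lines of Python. This is a simplified illustration, not the BPFA procedure from the dissertation, whose exact definition of bridge-connection frequency is not given here. The sketch assumes bigrams only and assumes that an occurrence counts as a bridge when both overlapping neighbor bigrams are strictly more frequent than the bigram itself. The function names `bigram_counts` and `support_frequencies` are hypothetical.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count every character bigram inside each sentence (raw frequency)."""
    counts = Counter()
    for s in sentences:
        for i in range(len(s) - 1):
            counts[s[i:i + 2]] += 1
    return counts


def support_frequencies(sentences):
    """Return raw frequency minus bridge occurrences for each bigram.

    ASSUMPTION (illustrative only): an occurrence of a bigram is a bridge
    when the bigram overlapping it on the left and the bigram overlapping
    it on the right are both strictly more frequent than the bigram itself.
    """
    raw = bigram_counts(sentences)
    bridge = Counter()
    for s in sentences:
        # Only positions that have both a left and a right neighbor bigram.
        for i in range(1, len(s) - 2):
            mid = s[i:i + 2]
            left = s[i - 1:i + 1]
            right = s[i + 1:i + 3]
            if raw[left] > raw[mid] and raw[right] > raw[mid]:
                bridge[mid] += 1
    return {pattern: raw[pattern] - bridge[pattern] for pattern in raw}


if __name__ == "__main__":
    corpus = ["知识管理", "知识发现", "项目管理", "知识管理"]
    for pattern, freq in sorted(support_frequencies(corpus).items()):
        print(pattern, freq)
```

In this toy corpus the bigram 识管 only ever appears between the more frequent 知识 and 管理, so its support frequency drops to zero while the two real words keep their counts.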
     Second, coarsely classified data contain noise, that is, documents whose content does not match their category label, and this noise degrades the accuracy of text categorization. A noise correction algorithm for coarsely classified data is proposed. A document similarity network is first built from the similarity of document contents. The labeled categories are taken as the initial community structure of the network, and modularity is used to measure the quality of that structure. Optimizing modularity moves noisy documents to the correct categories and thereby improves data quality. Experimental results indicate that the algorithm corrects such noise effectively and is robust. It can be used to preprocess training data for text categorization, or as a supporting technique in tasks such as building document collections. In this dissertation, the proposals submitted under each subject code are treated as coarsely classified data, and the algorithm moves proposals that do not match their code to the correct one. Models of the subject codes are then built from the corrected data to analyze the intension and extension of the research area each code represents, as well as the overlaps between codes.
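
A minimal sketch of relabeling by modularity optimization follows. It is not the dissertation's algorithm: it assumes a weighted, undirected similarity network with at least one edge, uses the standard Newman-Girvan modularity, and applies a simple greedy rule that moves one document at a time to whichever existing category raises modularity most. The names `build_adj`, `modularity`, and `correct_labels` are hypothetical.

```python
def build_adj(edges):
    """Build a symmetric adjacency dict from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def modularity(adj, labels):
    """Newman-Girvan modularity of the partition given by `labels`."""
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    inside, total = {}, {}
    for u, nbrs in adj.items():
        cu = labels[u]
        for v, w in nbrs.items():
            total[cu] = total.get(cu, 0.0) + w          # degree mass of category
            if labels[v] == cu:
                inside[cu] = inside.get(cu, 0.0) + w    # each inner edge seen twice
    return sum(inside.get(c, 0.0) / two_m - (total[c] / two_m) ** 2
               for c in total)


def correct_labels(adj, labels, max_passes=10):
    """Greedily move documents between categories while modularity improves.

    ASSUMPTION: a plain one-node-at-a-time local search, chosen for brevity.
    """
    labels = dict(labels)                   # do not mutate the caller's labels
    categories = sorted(set(labels.values()))
    for _ in range(max_passes):
        moved = False
        for u in sorted(adj):
            original = labels[u]
            best_label, best_q = original, modularity(adj, labels)
            for c in categories:
                if c == original:
                    continue
                labels[u] = c               # tentative move
                q = modularity(adj, labels)
                if q > best_q + 1e-12:
                    best_label, best_q = c, q
            labels[u] = best_label          # keep the best move, or restore
            moved = moved or best_label != original
        if not moved:
            break
    return labels
```

Recomputing modularity from scratch for every tentative move keeps the sketch short but is slow on large networks; a practical version would update the modularity change incrementally.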
     Third, a fast clustering algorithm based on common connection strength is proposed. Inspired by node similarity in social networks, the connection strength of a community is defined from the similarity relations among its members, and a new community similarity measure is defined from the common connection strength of two communities. An agglomerative clustering algorithm is built on this measure: initially each document is its own cluster, and at each step the two most similar clusters are merged. Because the measure takes into account the relations both within and between clusters, it avoids merging errors that other algorithms tend to make in the early stages of clustering. Experiments on both topological (unweighted) and weighted data show that the algorithm is more effective than the algorithms it was compared with. In this dissertation it is used to cluster the proposals into project clusters, and the relations between these clusters and the subject codes are analyzed.
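
The merge loop described above can be sketched as follows. The similarity measure here is an assumption made for illustration, since the dissertation's formula for common connection strength is not given in this abstract: the strength with which a cluster connects to a document is taken as the mean similarity from its members to that document, and two clusters are compared by a weighted-Jaccard ratio of those strengths over all documents. The names `strength`, `community_similarity`, and `agglomerate` are hypothetical, and the sketch makes no attempt to be fast.

```python
from itertools import combinations


def strength(sim, cluster, k):
    """Mean similarity from the members of `cluster` to document k."""
    return sum(sim[i][k] for i in cluster) / len(cluster)


def community_similarity(sim, a, b):
    """ASSUMED measure: shared connection strength over combined strength.

    Every document k contributes, including members of a and b, so both
    the internal and the external relations of the clusters are counted.
    """
    n = len(sim)
    common = sum(min(strength(sim, a, k), strength(sim, b, k)) for k in range(n))
    union = sum(max(strength(sim, a, k), strength(sim, b, k)) for k in range(n))
    return common / union if union > 0 else 0.0


def agglomerate(sim, n_clusters):
    """Merge the two most similar clusters until n_clusters remain.

    `sim` is a symmetric matrix (list of lists) with 1.0 on the diagonal.
    """
    clusters = [frozenset([i]) for i in range(len(sim))]
    while len(clusters) > n_clusters:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: community_similarity(sim, pair[0], pair[1]))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


if __name__ == "__main__":
    sim = [[1.0, 0.9, 0.1, 0.1],
           [0.9, 1.0, 0.1, 0.1],
           [0.1, 0.1, 1.0, 0.8],
           [0.1, 0.1, 0.8, 1.0]]
    print(agglomerate(sim, 2))
```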
     Building on these theoretical results, the dissertation presents an application study of fund management at the National Natural Science Foundation of China. It analyzes the overall state and development patterns of basic scientific research in China, as well as the state of research in each subject field and the relations among fields. The results provide decision support for formulating development plans and strategies, adjusting the subject code system, and managing projects.

