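Research on Network User Interest Mining Methods Based on Cluster Analysis

English title: Clustering Based Net User Interest Mining
Author: Ma Li (马力)
Thesis level: Doctoral
Discipline: Circuits and Systems
Chinese keywords: 网络用户兴趣; 兴趣挖掘模型; 特征降维; 文本聚类; 语义相似度计算
English keywords: net user interest; interest mining model; feature dimension reduction; text clustering; semantic similarity calculation
Degree year: 2012
Advisor: Jiao Licheng (焦李成)
Discipline code: 080902
Degree-granting institution: Xidian University (西安电子科技大学)
Submission date: 2012-04-01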

Abstract

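As network applications mature, the service model of network information service systems is evolving from a centralized, uniform, passive model toward a distributed, personalized, proactive one. A precondition for this transition is a thorough understanding of the regularities in network users' needs, so that these regularities can guide how an information service system organizes and adjusts its information resources, and so that the information the system supplies matches what users need as closely as possible. Network user interest, as one form of these regularities in users' information needs, is the working basis for building adaptive resource-organization mechanisms in next-generation information service systems.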
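     This dissertation focuses on extracting user interest patterns from the Chinese text of the web pages that users visit. Drawing on complex network theory, graph theory, stochastic process theory, artificial immune network principles, and Chinese semantic computation, it studies user interest mining algorithms based on text clustering and related problems, with the aim of reducing the computational complexity of clustering algorithms, achieving soft clustering, and exploring new processing methods. The main research content falls into four areas: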
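     (1) User interest mining model. A network user interest pattern describes the regularities in how individual users and user groups use the network, and a network interest mining model is a standardized set of processing steps for obtaining such patterns. For the process by which Web users visit Web sites, and based on the information process model of full information theory (全信息理论), this dissertation proposes a conceptual model of network user interest mining. Its core is to describe the mining of user interest patterns from the perspective of information cognition, at two levels: syntactic cognition and semantic cognition. An important feature of the model is that it unifies multi-level, multi-perspective user interest processing in a single framework. To guide mining work concretely, the dissertation also gives clustering-based mining models for user interest patterns and for interest migration patterns. Practical application indicates that both proposed models are reasonable.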
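     (2) Dimensionality reduction for text clustering. Text feature sets typically have very high dimension. Because the small-world network model can describe the relationship between the dynamic properties and the structural characteristics of natural and man-made systems, the dissertation builds a word network for a text using K-nearest-neighbor coupling: nodes represent the words of the text, and edges represent adjacency between words within a given positional distance. The importance of a word is measured by two quantities, the change in the clustering coefficient and the change in the average shortest path length. Computing these two changes for a word determines whether it carries small-world characteristics, and feature words are selected accordingly. The method thus selects feature words from the distance-based organizational structure of the text. Experimental results show that the method is effective and offers a new route to text feature extraction.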
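
The word-network measure described above can be illustrated with a small sketch. It is not the dissertation's implementation: the abstract does not say how the two "changes" are computed, so the sketch assumes they are the differences in the network-wide average clustering coefficient and average shortest path length after a word is removed. The function names and the window parameter `k` are hypothetical.

```python
from collections import deque
from itertools import combinations

def build_word_graph(tokens, k=2):
    """Link each word to the words at most k positions after it (K-nearest-neighbor coupling)."""
    graph = {w: set() for w in tokens}
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + k + 1, len(tokens))):
            if tokens[j] != w:
                graph[w].add(tokens[j])
                graph[tokens[j]].add(w)
    return graph

def avg_clustering(graph):
    """Mean local clustering coefficient; nodes with fewer than two neighbors count as 0."""
    if not graph:
        return 0.0
    total = 0.0
    for nbrs in graph.values():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += 2.0 * links / (d * (d - 1))
    return total / len(graph)

def avg_path_length(graph):
    """Mean shortest-path length over reachable ordered pairs, via BFS from every node."""
    total, pairs = 0, 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def remove_node(graph, word):
    """Return a copy of the graph without the given word."""
    return {u: nbrs - {word} for u, nbrs in graph.items() if u != word}

def word_importance(tokens, k=2):
    # Assumption: importance = change in each metric when the word is deleted.
    graph = build_word_graph(tokens, k)
    c0, l0 = avg_clustering(graph), avg_path_length(graph)
    scores = {}
    for w in graph:
        g = remove_node(graph, w)
        scores[w] = (avg_clustering(g) - c0, avg_path_length(g) - l0)
    return scores
```

Under this assumption, words whose removal shifts either quantity the most would be the candidates for feature words; the selection threshold is left open here because the abstract does not give one.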
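     (3) Text clustering algorithms. Many mature clustering methods already handle text clustering reasonably well, but word polysemy, the sparsity of text features, and the diversity of text category distributions make it hard to guarantee that the generated clusters agree closely with the categories people expect. Clustering algorithms therefore still need to be studied along several technical routes.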
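     Traditional optimization-based clustering algorithms commonly need the number of clusters in advance, handle data with unclear cluster boundaries poorly, and easily become trapped in local optima. To address these problems, Artificial Immune System (AIS) methods are introduced into text clustering and an adaptive polyclonal clustering algorithm is proposed. Its main steps are: a recombination operator that increases the diversity of individuals in the antibody population, which widens the search range and avoids premature convergence; a non-uniform mutation operator that makes the local search more adaptive, improves local search performance, and speeds up convergence; and an affinity function that adjusts the number of clusters. The convergence of the algorithm is proved using Markov chains. The algorithm is then tailored to text data, yielding a text clustering algorithm based on an artificial immune network; experimental results show that it clusters effectively.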
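
As an illustration of the non-uniform mutation step, the sketch below uses the textbook form of the operator, in which the perturbation shrinks as the generation counter `t` approaches the limit `t_max`. That the dissertation uses exactly this form is an assumption, and the names `nonuniform_mutate`, `b`, `lo`, and `hi` are hypothetical.

```python
import random

def nonuniform_delta(t, t_max, y, b=2.0, rng=random):
    """Step in [0, y] that tends to 0 as t approaches t_max (textbook non-uniform mutation)."""
    r = rng.random()
    return y * (1.0 - r ** ((1.0 - t / t_max) ** b))

def nonuniform_mutate(x, lo, hi, t, t_max, b=2.0, rng=random):
    """Mutate each coordinate of antibody x toward its upper or lower bound.

    Assumption: an antibody is a real vector (for example, flattened cluster
    centers) with per-coordinate bounds lo[i] <= x[i] <= hi[i].
    """
    mutated = []
    for xi, l, h in zip(x, lo, hi):
        if rng.random() < 0.5:
            xi = xi + nonuniform_delta(t, t_max, h - xi, b, rng)
        else:
            xi = xi - nonuniform_delta(t, t_max, xi - l, b, rng)
        mutated.append(xi)
    return mutated
```

Early generations can move a coordinate anywhere inside its bounds, while late generations make only small local adjustments, which matches the role the abstract assigns to this operator.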
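     Many real-world systems can be described as complex networks, and these networks share a common property: community structure. Detecting community structure in a complex network is essentially clustering the network's nodes. The dissertation therefore brings methods from complex network theory into text clustering and proposes a text clustering algorithm based on community structure detection: a text similarity measure is defined using the HowNet (知网) semantic similarity formula, a text association graph is built from the text similarities, and the texts are clustered with the Newman algorithm. A feature of this method is that it can handle large-scale problems.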
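
The community-detection step can be sketched as greedy agglomerative modularity maximization in the spirit of Newman's fast algorithm. This is a simplified illustration, not the dissertation's code: the edge weights stand in for HowNet-based text similarities (which are not computed here), merging stops as soon as no merge increases the modularity Q, and the function name is hypothetical.

```python
def newman_greedy(nodes, edges):
    """Greedy modularity clustering.

    nodes: iterable of hashable ids (here, texts).
    edges: dict {(u, v): weight} for an undirected graph with u != v.
    Returns (list of node sets, modularity Q of that partition).
    """
    nodes = list(nodes)
    m2 = 2.0 * sum(edges.values())
    if m2 == 0:
        return [{n} for n in nodes], 0.0
    members = {n: {n} for n in nodes}   # community id -> nodes in it
    e = {n: {} for n in nodes}          # e[i][j]: half the weight fraction between i and j
    a = {n: 0.0 for n in nodes}         # a[i]: fraction of edge ends attached to i
    for (u, v), w in edges.items():
        f = w / m2
        e[u][v] = e[u].get(v, 0.0) + f
        e[v][u] = e[v].get(u, 0.0) + f
        a[u] += f
        a[v] += f
    q = -sum(x * x for x in a.values())  # Q when every node is its own community
    while True:
        best, pair = 0.0, None
        for i in e:
            for j, eij in e[i].items():
                dq = 2.0 * (eij - a[i] * a[j])   # modularity gain of merging i and j
                if dq > best + 1e-12:
                    best, pair = dq, (i, j)
        if pair is None:                 # no merge improves Q any further
            break
        i, j = pair
        for k, ejk in e.pop(j).items():  # fold community j into community i
            del e[k][j]
            if k != i:
                e[i][k] = e[i].get(k, 0.0) + ejk
                e[k][i] = e[k].get(i, 0.0) + ejk
        a[i] += a.pop(j)
        members[i] |= members.pop(j)
        q += best
    return list(members.values()), q
```

A text association graph for this function could be built by keeping only text pairs whose similarity exceeds a chosen threshold; the threshold is a free parameter that the abstract does not specify.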
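     Most current text clustering algorithms assign each text strictly to one cluster and have high computational complexity. Because a suffix tree represents the relations between feature words effectively, supports incremental updates, and can be traversed quickly, the dissertation introduces the suffix tree model into text clustering and proposes a suffix tree clustering algorithm based on semantic computation. The algorithm builds the suffix tree by judging the semantic similarity and weight of feature words, selects base-cluster nodes to construct a base-cluster connectivity graph, and solves for connectivity to obtain the clusters. To reduce the time and space complexity further, a clustering algorithm based on a semantic suffix net is proposed. Its improvements are that the suffix net is built from the semantic similarity between feature words, which reduces the number of nodes and branches, and that base clusters are selected by feature-word weight. Experimental results show that both algorithms achieve soft clustering of texts with low time complexity, and that the cluster labels are highly readable.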
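
The base-cluster and connectivity steps can be sketched with the classic suffix tree clustering scheme, which the semantic variants above extend. The sketch makes several simplifying assumptions: a dictionary of word n-grams stands in for the suffix tree, no semantic similarity or term weighting is applied, and the merge rule is the classic one (two base clusters are linked when they share more than half of each other's documents). All names are hypothetical.

```python
from collections import defaultdict

def base_clusters(docs, max_len=3, min_docs=2):
    """Map each phrase (word n-gram) to the documents containing it; keep shared phrases."""
    phrase_docs = defaultdict(set)
    for doc_id, words in enumerate(docs):
        for i in range(len(words)):
            for n in range(1, max_len + 1):
                if i + n <= len(words):
                    phrase_docs[tuple(words[i:i + n])].add(doc_id)
    return {p: d for p, d in phrase_docs.items() if len(d) >= min_docs}

def merge_base_clusters(bases, overlap=0.5):
    """Link overlapping base clusters and return the connected components."""
    phrases = list(bases)
    parent = list(range(len(phrases)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(phrases)):
        for j in range(i + 1, len(phrases)):
            a, b = bases[phrases[i]], bases[phrases[j]]
            common = len(a & b)
            if common > overlap * len(a) and common > overlap * len(b):
                parent[find(i)] = find(j)

    groups = defaultdict(lambda: (set(), set()))
    for i, p in enumerate(phrases):
        labels, members = groups[find(i)]
        labels.add(" ".join(p))
        members |= bases[p]
    # Each cluster: (shared phrases usable as a readable label, member document ids)
    return [(sorted(l), sorted(m)) for l, m in groups.values()]

docs = [
    ["stock", "market", "news"],
    ["stock", "market", "crash"],
    ["football", "match", "result"],
    ["football", "match", "stock"],
]
clusters = merge_base_clusters(base_clusters(docs))
```

In this toy input the last document contains both "stock" and "football match", so it falls into two clusters, which is the soft-clustering behavior the paragraph refers to; the shared phrases double as cluster labels.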
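     (4) Discovery of user interest patterns and migration patterns. In practice a user interest pattern is a set of feature words with clear category significance. The dissertation generates interest patterns by computing, within a text cluster, the frequency of a word, or of a class of semantically similar words, that appears in most of the cluster's texts. An interest migration pattern describes how a user's interest patterns change over time. Because a text may cover multiple topics, a method based on the hidden Markov model is proposed for obtaining user interest sequences: taking the user's access sequence and interests as its objects, it builds a hidden Markov model over user interest sequences and uses an algorithm for the decoding problem to obtain the optimal interest sequence. A sequential pattern mining algorithm then extracts the frequent patterns of the interest sequences. These frequent patterns are the interest migration patterns; in essence they are ordered association rules over user interests. To improve mining efficiency, frequent patterns are obtained with a mining algorithm based on the frequent link-access tree (FLaAT) structure, which is fast and supports incremental mining of sequences by updating the FLaAT structure. Experiments show that the proposed method is feasible, and that the mined migration patterns reveal both how user interests change and the associations and change regularities among them.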
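
The decoding step can be illustrated with the Viterbi algorithm, the standard solution to the HMM decoding problem; the abstract does not name the algorithm, so its use here is an assumption. Hidden states play the role of interests and observations the role of visited pages. The state names and all probabilities below are invented for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence and its probability."""
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at this step
            prev = max(states, key=lambda p: best[-1][p] * trans_p[p][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):   # follow back-pointers to the first step
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical model: two interests, two kinds of visited page.
states = ["sports", "finance"]
start_p = {"sports": 0.6, "finance": 0.4}
trans_p = {"sports": {"sports": 0.7, "finance": 0.3},
           "finance": {"sports": 0.4, "finance": 0.6}}
emit_p = {"sports": {"score_page": 0.9, "stock_page": 0.1},
          "finance": {"score_page": 0.2, "stock_page": 0.8}}

path, prob = viterbi(["score_page", "stock_page", "stock_page"],
                     states, start_p, trans_p, emit_p)
# path is ["sports", "finance", "finance"]
```

For long access sequences, a practical version would work with log-probabilities to avoid numerical underflow; plain products are kept here so the arithmetic stays easy to follow.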

Abstract (author's English version)

With the development of network applications, the service model of information service systems is shifting from an integrated, uniform model to a distributed, personalized one. A precondition for this shift is an in-depth understanding of the regularities in network users' demands, so that these regularities can guide the organization and adjustment of a system's information resources and make the information users require and the information the system supplies as consistent as possible. As one form of these regularities, network user interest is the foundation for building a new generation of information service systems whose resource organization is self-adaptive.
     To reduce the computational complexity of clustering algorithms, achieve soft clustering, and explore new approaches to clustering problems, this dissertation studies user interest mining algorithms based on text clustering and related questions. It focuses on extracting user interest, takes the Chinese text of the web pages users access as its object, and uses complex network theory, graph theory, stochastic process theory, artificial immune network theory, and Chinese semantic computation. The main content comprises the following four parts:
     (1) The interest mining model. The network user interest model describes the behavioral regularities of individual users and user groups on the network, and the mining model is a set of standardized processes for obtaining the user interest model. For the behavior of Web users accessing Web sites, this dissertation proposes a conceptual model of network user interest mining based on the information process model in full information theory. The core of the model is a procedure that describes the mining of user interest patterns from the angle of information cognition, at the syntactic and semantic levels. An important feature of the model is that it unifies multi-level, multi-perspective user interest processing in one framework. To guide mining tasks in detail, the dissertation gives clustering-based mining models for user interest patterns and migration patterns. Experiments demonstrate that the models are reasonable.
     (2) Dimension reduction algorithm in text clustering. To address the typical problem of high feature dimension, and because the small-world network model can describe the relationship between the dynamic properties and structural features of natural and artificial systems, we use K-nearest-neighbor coupling to construct a word network for the text. In this network the nodes stand for the words in the text and the edges stand for neighbor relationships in distance. We introduce the change in the clustering coefficient and the change in the average shortest path length to measure the importance of words. By calculating these changes for a word, we can determine whether it has small-world features and thereby select the feature words. Experimental results demonstrate that the method is effective and offers a new way to extract text features.
     (3) Clustering algorithms. Although many good algorithms exist for text clustering, the ambiguity of words and the sparsity of text features make it hard to guarantee that the resulting clusters are consistent with the expected categories. It is therefore necessary to study clustering algorithms from other technical directions. Based on a brief description of the basic principles of biological immunity and the clonal process, a self-adaptive polyclonal clustering algorithm is put forward. The main idea of the algorithm is to bring various operators of the artificial immune system into the clustering process and to adjust the number of clusters automatically through an affinity function. A recombination operator is introduced to increase the diversity of the antibody population, which broadens the search for the global optimum and avoids premature convergence of the population. A non-uniform mutation operator is introduced to enhance adaptability and improve local search performance, which also speeds up convergence. Experimental results show that the proposed algorithm produces reasonable clusterings.
     This dissertation also introduces methods from complex network theory into text clustering analysis and, based on algorithms for detecting community structure in complex networks, proposes a new clustering method. In this method we define a text similarity measure using the HowNet similarity formula, construct an association graph from the text similarities, and cluster the texts with the Newman algorithm. The method is suitable for large-scale problems.
     To address the strict (hard) assignment and high computational complexity of common algorithms, and considering that a suffix tree expresses the relations between words well, has a short traversal time, and supports incremental updates, we bring the suffix tree model into text clustering and propose a suffix tree clustering algorithm based on semantic computation and a clustering algorithm based on the suffix net. Experimental results show that both algorithms achieve soft clustering, with low time complexity and highly readable cluster labels.
     (4) Finding the interest patterns and drift patterns of network users. In practice the user interest model is composed of a group of feature words with clear category significance. The interest of a network user is generated by calculating the frequency of identical or similar words that occur in most of the texts. The interest drift pattern of a network user expresses how interest changes over time. To address the problem that texts have multiple themes, a method based on the hidden Markov model is proposed for obtaining the interest sequence. In this method, a hidden Markov model of network user interest is built from the access sequence and the interests, and an algorithm for the decoding problem is used to obtain the best interest sequence. A sequential pattern mining algorithm then yields the frequent sequence patterns, which are the interest drift patterns. In essence such a pattern is a kind of interest association rule with sequential features. To improve mining efficiency, a mining algorithm based on the Frequent Link-Access Tree (FLaAT) is used to mine the frequent patterns; its advantages include fast processing and incremental mining through updating the FLaAT structure. Experiments show that the proposed method is viable, and that the mined interest patterns not only show changes in interest but also reflect the relationships among interests and the regularities of their change.

    [183] Jerne N K. Towards a Network Theory of the Immune System[M].ANN.Immunul, Paris(Inst Pasteur).1974,125C:373~389.
    [184]周志华,曹存根.神经网络其应及用.北京:清华大学出版社,2004.
    [185] Dasgupta D. Artificial neural networks and artificial immunesystems:Similarities and differences.1997IEEE International Conf. OnComputational Cybernetics and Simulation,Institute of Electrical andElectronics Engineers,Incorporated.1997:873~878.
    [186] Guha S., Rastogi R., Shim K. CURE: an efficient clustering algorithm for largedatabases. Haas, L.M., Tiwary,A., eds. Proceedings of the1998ACM SIGMODInternational Conference on Management of Data. Seattle: ACM Press,1998.73~84.
    [187]盛骤,谢式千,潘承毅.概率论与数理统计.北京:高等教育出版社,1989.
    [188] Leandro Nunes de Castro, Fenando J. Von Zuben. An Evolutionary ImmuneNetwork for Data Clustering. Proc. of the IEEE SBRN.2000:84~89.
    [189]张文修,梁怡.遗传算法的数学基础.西安:西安交通大学出版社,1999.
    [190]王磊.免疫进化计算理论及应用[D].西安电子科技大学,2001.
    [191] Gibson D., Kleinberg J.M., Raghavan P. Clustering categorical data: an approachbased on dynamical systems.Gupta A., Shmueli O.,Widom J., et al.inProceedings of the24th International Conference on Very Large Data Bases.New York: Morgan Kaufmann.1998:311~322.
    [192]焦李成,刘芳.智能数据挖掘.北京:科学出版社,2006.
    [193]马力,白琳,焦李成等.基于自适应多克隆聚类的入侵检测[C].中国计算机大会论文集(CNCC2005，湖北武汉).2005.
    [194] Ma Li Jiao li-cheng Bai lin et al. Intrusion Detection Based on AdaptivePolyclonal Clustering. International Conference on Computational Intelligentand Security (cis’06).2006.
    [195] MA Li, JIAO Li-cheng, BAI Lin,et al. Polyclonal clustering algorithm and itsconvergence.The Journal of China Universities of Posts andTelecommunications.2008.9,15(3):110~117.
    [196]马力,焦李成,白琳等.自适应多克隆聚类算法及收敛性分析[J].模式识别与人工智能.2008,21(1):72~81.
    [197]马力,周洋,白琳,焦李成.一种基于遗传算法的进化免疫网络聚类算法参数优化[C].2008年中国计算机大会.2008.
    [198]周洋，马力，白琳.基于多克隆的进化免疫网络聚类算法[J].计算机工程与应用.2009.45(27):146~150.
    [199]夏火松,刘建.基于VSM的文本分类挖掘算法综述[J].情报探索.2010,155(9):18~21.
    [200] Zamir O,Etzioni O.Web document clustering: A feasibilitydemonstration[C].Proceedings of SIGIR,New York:ACM.1998:46~54.
    [201] Dell Z., Yisheng D.. Semantic,Hierarchical: Online Clustering of Web SearchResults. APWeb.2004:69~78
    [202] Chim H., Deng X.. A new suffix tree similarity measure for documentclustering.WWW2007, ACM, New York,NY, USA.2007:121~130.
    [203] Janruang J., Kreesuradej W.. A New Web Search Result Clustering based onTrue Common Phrase Label Discovery. CIMCA'06, IEEE Computer Society,Washington, DC, USA.2006:242.
    [204] Zeng H., He Q., Chen Z.,et al.Learning to cluster web search results. SIGIR '04,New York, NY, ACM PressUSA.2004:210~217.
    [205] Guodong H., Wanli Z., Fengling H.,et al. Semantic-Based Hierarchicalize theResult of Suffix Tree Clustering. Second International Symposium onKnowledge Acquisition and Modeling.2009:221~224.
    [206] Li Yanjun.High performance text document clustering[D].USA:Wright StateUniversity.2007.
    [207]梅启斌,白帆.自适应Web站点设计中变色龙算法研究及实现[J].计算机科学.2004,21(7):2~3.
    [208]史庆伟,赵政,朝柯.一种基于后缀树的中文网页层次聚类方法[J].辽宁工程技术大学学报.2006,25(6):1~3.
    [209] L. R. Rabiner. An Introduction to Hidden Markov Models[J]. in Proceeding ofthe IEEE.1986,77:257~286.
    [210]吴瑞,张秀玲.基于FLAAT的加权偏爱模式的挖掘算法[J].计算机工程与应用.2005,41(19):182~184.
    [211] D. Xing, J. Shen. Efficient Data Mining for Web Navigation Pattern[J].Information and Software Technology.2004,46:55~63.
    [212] D.E. Krane,M.L. Raymer.孙啸等译.生物信息学概论[M].北京:清华大学出版社,2004.
    [213]陈卓,杨炳儒.序列模式挖掘综述[J].计算机应用研究.2008,25(7):1961~1976.
    [214] R.Agrawal, R.Srikant. Mining sequential patterns Data Engineering. inProceedings of the Eleventh International Conference on6~10March1995:3~14.
    [215] R.Agrawal and R.Srikant. Fast algorithms for mining as sociation rules. In Proc.1994Int. Conf. Very Large Data Bases (VLDB'94), Santiago, Chile,1994,9:489~499.
    [216] J. Janruang, S. Guha. Applying Semantic Suffix Net to Suffix Tree Clustering.The3rd Conference on Data Mining and Optimization (DMO),28~29June,2011:146~152.
