Research on Web Clustering Algorithms Based on Swarm Intelligence and Random Indexing
Abstract
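Clustering is the process of partitioning data objects into meaningful groups (clusters). As an important technique in data mining, cluster analysis plays a significant role in many fields. In particular, as the volume of information of all kinds keeps growing and research problems become more complex, existing clustering techniques face more and more challenges. Research on new clustering algorithms has therefore become one of the frontier and hot topics in several related fields, including data mining, machine learning, statistics, and biology.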
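     The social behaviors of colony insects, such as finding the best food source, building nests with an optimal structure, brooding, protecting larvae, and guarding the colony, all exhibit macroscopic intelligent behavior at the group level. Swarm Intelligence (SI) is a family of distributed intelligent paradigms created to solve complex optimization problems. Its inspiration originally came from observing insect colonies in nature, and it realizes artificial intelligence by simulating this kind of collective behavior. Because clustering strategies matter in many application areas, several optimization algorithms based on swarm intelligence, such as ant colony optimization and particle swarm optimization, have been introduced into data mining to solve clustering problems. Because the criterion functions used for clustering are usually non-convex or nonlinear, traditional clustering methods, the k-means algorithm in particular, are sensitive to the initial values and tend to trap the search in a local optimum. As the dimensionality of data sets in practical applications keeps growing, finding the optimal solution of the criterion function is an NP-hard problem.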
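     While browsing a Web site, users exhibit a wide variety of behaviors according to their different information needs or underlying tasks and goals, and these behaviors are tracked and recorded in Web access logs. By discovering and analyzing the characteristics and regularities of users' access behavior, Web log mining can identify a site's potential customers and improve the quality of service to users. Clustering-based methods for Web log mining and user behavior analysis emerged relatively late. Among Web clustering techniques, methods that cluster Web sessions and Web page content are currently the most common, while methods that cluster Web users' browsing patterns have received relatively little study. Moreover, existing Web user behavior analysis and clustering techniques attend only to users' page-level browsing behavior. They pay little attention to the latent relationships or implicit features among Web user activities, and the hidden or unobservable factors associated with specific browsing patterns have also been little studied. New Web user clustering and user modeling techniques are therefore needed to uncover the hidden information in user behavior and thereby improve the performance of user clustering.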
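     The results of Web user behavior clustering can serve a variety of advanced application tasks, such as Web caching and prefetching. Many Web mining methods are currently used to improve the accuracy of predicting user access patterns from Web access logs so that Web objects can be prefetched efficiently. In the prefetching field, however, most existing techniques are limited to predicting the requests of a single user, and predicting the requests of groups of users has received little study.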
     The main innovative contributions of this thesis can be summarized as follows:
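     (1) Most existing clustering algorithms are limited to a single type of data set, are easily trapped in local optima during the search, and struggle to achieve satisfactory results on high-dimensional data sets. To address these problems, this thesis builds on the existing Chaotic Ant Swarm (CAS) algorithm. Inspired by behaviors such as the chaotic search of individual ants and the global intelligent optimization of the ant colony, and taking the characteristics of data clustering applications into account, it proposes a new clustering algorithm based on the chaotic behavior of ants (the CAS-C algorithm). This work extends the application domain of the chaotic ant swarm algorithm. Extensive comparative numerical simulation experiments show that the proposed CAS-C algorithm is insensitive to the initial centers, can find the global optimum, and has high stability and accuracy. The proposed algorithm is better suited to clustering real data sets.
     (2) The Bacterial Foraging (BF) optimization algorithm is an optimization search algorithm based on the group behavior and evolutionary process of bacteria, but it is not yet mature. Improving the algorithm and tuning its parameters are important open issues, and research on clustering algorithms based on bacterial foraging behavior in particular is still scarce. Inspired by bacterial foraging behavior, this thesis proposes a new clustering algorithm based on the bacterial foraging optimization idea (the BF-C algorithm), which searches for the optimal cluster centers by imitating the foraging process of bacteria. The thesis also discusses and analyzes in detail how each parameter of the algorithm should be set for data clustering. Compared with other global optimization algorithms, the proposed BF-C algorithm is easy to understand, computationally simple, and fast to converge. Its chemotactic step size, however, lacks adaptivity to the environment and needs to be examined case by case for each application.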
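     (3) When traditional data mining methods are applied to Web user behavior recognition, they are sensitive to initial values, easily trapped in local optima, and degrade on high-dimensional data. To address these problems in Web clustering, this thesis applies the proposed CAS-C clustering algorithm, based on the chaotic behavior of ants, to Web log analysis and user clustering in order to discover users' browsing patterns and thereby improve the performance of Web user clustering. To test the effectiveness and feasibility of the proposed scheme, the CAS-C-based Web user clustering results are compared with those of two algorithms widely used in Web mining (the k-means clustering algorithm and the FCMdd algorithm). Extensive numerical simulation experiments show that the proposed CAS-C algorithm yields Web user clusters with better cohesion and separation and can effectively identify users' common interests.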
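     (4) Analyzing and mining Web user logs requires a formal representation of users' browsing behavior, a process generally called user modeling. Existing Web user behavior analysis and clustering techniques attend only to users' page-level browsing behavior. They pay little attention to the latent relationships or implicit features among Web user activities, and the hidden or unobservable factors associated with specific browsing patterns have been little studied. We therefore propose a user modeling approach based on Random Indexing, which borrows the notion of "context" from natural language processing and models URLs by segmenting and indexing them. In this way, hidden information in browsing patterns can be incorporated during user modeling, which in turn guides the Web user clustering algorithm and improves the clustering results. We compare two modeling approaches, the feature vector method and the Random Indexing method, through clustering experiments, and the results of extensive experiments show the superiority of Random Indexing modeling.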
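     (5) The clustering algorithms proposed in this thesis can be used in a variety of advanced application tasks, such as Web caching and prefetching. To further examine the quality of our user clustering, the thesis proposes, on the basis of the Random Indexing modeling method and the CAS-C algorithm, a new scheme for group user behavior prediction and Web page prefetching. It builds common user profiles that summarize users' shared interests, establishes prefetching rules for groups of users from the user clustering results, prefetches the pages users are likely to click next, and stores them in the Web site's cache. To make the experimental results convincing, we again choose the classical k-means clustering algorithm and the FCMdd algorithm, which is widely used in Web mining, as baselines. Extensive prefetching experiments show that, with the help of the Random Indexing user model, the CAS-C-based Web user clustering scheme achieves relatively high Web page prefetching accuracy.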
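The Random Indexing user model in contribution (4) can be illustrated with a minimal sketch. The abstract does not give the thesis's exact construction, so everything below is an assumption made for illustration: each URL path segment is treated as a "context" and assigned a sparse random index vector of +1/-1 entries, and a user's vector is the sum of the index vectors of all segments in the URLs that user requested. The names `DIM`, `NONZERO`, `index_vector`, `build_user_vectors`, and `cosine` are hypothetical.

```python
import random
from collections import defaultdict

DIM = 64      # dimensionality of the reduced space (assumed value)
NONZERO = 4   # number of +1/-1 entries per index vector (assumed value)

def index_vector(rng):
    """Sparse ternary vector: NONZERO/2 entries of +1, NONZERO/2 of -1, rest 0."""
    vec = [0] * DIM
    for k, pos in enumerate(rng.sample(range(DIM), NONZERO)):
        vec[pos] = 1 if k % 2 == 0 else -1
    return vec

def build_user_vectors(requests, seed=0):
    """requests: iterable of (user, url) pairs. Returns {user: context vector}."""
    rng = random.Random(seed)
    index = {}                               # URL segment -> its index vector
    users = defaultdict(lambda: [0] * DIM)
    for user, url in requests:
        for segment in url.strip("/").split("/"):
            if segment not in index:
                index[segment] = index_vector(rng)
            # Accumulate the segment's index vector into the user's vector.
            users[user] = [a + b for a, b in zip(users[user], index[segment])]
    return dict(users)

def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0

log = [
    ("alice", "/courses/cs/intro"),
    ("bob", "/courses/cs/intro"),
    ("carol", "/sports/football/scores"),
]
vectors = build_user_vectors(log)
```

Users who request URLs with the same segments end up with the same vector, and users who share only some segments (for example `/courses/cs/intro` and `/courses/cs/labs`) end up with overlapping vectors, so the resulting fixed-length vectors can be passed to any clustering algorithm that works on numeric data.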
Clustering divides data into meaningful or useful groups (clusters) without any prior knowledge. It is a key technique in data mining and has become an important issue in many fields. In particular, as the amount of information and data of all kinds increases and research problems become more complex, existing clustering techniques face growing challenges. The study of new clustering algorithms is therefore an important issue in research fields including data mining, machine learning, statistics, and biology.
     The behaviors of social insects, such as finding the best food source, building an optimal nest structure, brooding, protecting the larvae, and guarding, show intelligence at the swarm level. Swarm Intelligence (SI) is a distributed intelligent paradigm for solving optimization problems that originally took its inspiration from biological examples. It achieves artificial intelligence by simulating such natural biological behaviors. Given the importance of clustering strategies in many fields, global optimization methods based on swarm intelligence have been applied to clustering problems. Since criterion functions for clustering are usually non-convex and nonlinear, traditional approaches, especially the k-means algorithm, are sensitive to initialization and easily trapped in locally optimal solutions. As the size and dimensionality of data sets increase, finding the optimal solution of the clustering criterion function has become an NP-hard problem.
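The sensitivity of k-means to initialization can be seen on a tiny example. The sketch below is not taken from the thesis. It assumes the common sum-of-squared-errors criterion and runs plain Lloyd iterations on six one-dimensional points from two different sets of starting centers. The function names `sse` and `kmeans_1d` are illustrative.

```python
def sse(points, centers):
    """Criterion: sum of squared distances from each point to its nearest center."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)

def kmeans_1d(points, centers, iterations=20):
    """Plain Lloyd iterations on 1-D data, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            groups[nearest].append(p)
        # A center with no assigned points stays where it is.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

points = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]   # three well-separated pairs

good = kmeans_1d(points, [0.0, 10.0, 20.0])   # one starting center per pair
bad = kmeans_1d(points, [0.0, 1.0, 15.0])     # two starting centers in one pair

print(good, sse(points, good))   # [0.5, 10.5, 20.5] 1.5
print(bad, sse(points, bad))     # [0.0, 1.0, 15.5] 101.0
```

Both runs converge, but the second stops at a local optimum whose criterion value is far worse. This is the behavior that global search strategies such as swarm-based methods aim to avoid.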
     Users of a Web site usually exhibit various types of behaviors associated with their information needs and intended tasks as they click through or visit Web pages. These behaviors can be traced in the Web access log files of the site the user visited. Web usage mining, which captures the navigational patterns of Web users from log files, can detect and analyze the characteristics of users' access behavior on a Web site, and can therefore identify potential customers and improve the quality of service to users. Clustering technology is a relatively new paradigm in Web usage mining and Web user behavior analysis. Current Web clustering methods are mostly based on Web sessions and Web page content, while there are relatively few approaches to clustering Web users' navigation patterns. Moreover, conventional Web usage mining techniques for analyzing user behavior capture only stand-alone user behaviors at the page-view level. They can neither identify the intrinsic characteristics of Web user activities nor quantify the underlying and unobservable factors associated with specific navigational patterns. It is thus necessary to develop new Web user clustering and modeling methodologies that identify the latent factors or hidden relationships in users' navigational behavior and effectively improve the performance of clustering technology.
     The results produced by Web user clustering can be used in various advanced applications, for example Web prefetching and caching. Many techniques, including Web mining approaches, have been used to improve the accuracy of predicting user access patterns from Web access logs, making the prefetching of Web objects more efficient. Most of these techniques, however, are limited to predicting the requests of a single user. Predicting the interests of groups of users has received little attention in the area of prefetching.
     The main contributions of the dissertation can be summarized as follows:
     (1) Most existing clustering algorithms have limitations: they are restricted to a single type of data set, fall easily into local optima during the search, and have difficulty achieving good results on high-dimensional data sets. To overcome these drawbacks of traditional clustering techniques, and in accordance with the characteristics of data clustering applications, this thesis builds on the existing chaotic ant swarm (CAS) algorithm and proposes a clustering algorithm (referred to as the CAS-C algorithm) based on the chaotic behavior of ants. Our work extends the application fields of the chaotic ant swarm algorithm. Numerical simulation experiments show that the proposed CAS-C algorithm is insensitive to the initial centers, finds a globally optimal clustering result, and is suitable for high-dimensional data and clusters of different shapes.
     (2) The Bacterial Foraging (BF) algorithm is a stochastic search technique and optimization model based on the foraging behavior of bacterial swarms. As a new kind of bionic intelligent algorithm, however, BF is not yet mature. Improving the algorithm and tuning its parameters are important issues in current research on Bacterial Foraging optimization, and studies of clustering based on bacterial foraging behavior are especially rare. Inspired by bacterial foraging behavior, this thesis proposes a new clustering algorithm (called the BF-C algorithm) based on bacterial foraging optimization. The thesis also investigates and analyzes in detail how to set the BF-C parameters for data clustering. Compared with other global-optimization-based clustering techniques, the BF-C algorithm is easier to understand, simpler, and faster. However, the chemotactic step size does not adapt to the environment and needs to be investigated for each application setting.
     (3) Traditional clustering methods are sensitive to initial values and are easily trapped in a local optimum. To address these problems in traditional Web user clustering techniques, this thesis applies the CAS-C clustering algorithm, which is based on the chaotic ant swarm, to Web log analysis and user clustering in order to discover user navigation patterns and thereby improve the performance of Web user clustering. To evaluate the proposed methodology, the CAS-C clustering results are compared with those of two methods widely used in Web mining (the k-means algorithm and the FCMdd algorithm). A large number of numerical simulation experiments show that the proposed CAS-C approach produces more compact and better-separated clusters and can effectively identify the common interests of users.
     (4) In analyzing and mining Web user access logs, users' navigation behaviors need to be processed and formalized. This process is generally called user modeling. Current Web user behavior analysis and clustering techniques capture only stand-alone user behaviors at the page-view level. They can neither identify the intrinsic characteristics of Web user activities nor quantify the underlying and unobservable factors associated with specific navigational patterns. We therefore propose a Web user modeling approach based on Random Indexing (RI), which segments and indexes URLs using the concept of "context" from natural language processing. In this way, information hidden in the browsing patterns can be incorporated during user modeling, which in turn helps the Web user clustering algorithm and improves the clustering results. Clustering experiments are conducted with two user modeling techniques, the feature vector method and the Random Indexing method, and show the superiority of the RI-based user model.
     (5) The results produced by our Web user clustering algorithm can be used in various advanced Web applications, such as Web caching and prefetching. To evaluate the results of the proposed Web user clustering approach, we also present a scheme for predicting the behavior of groups of users and prefetching Web pages. The common interests of users are summarized by creating common user profiles. Based on the results of Web user clustering, we then establish prefetching rules for groups of users and put pages that users may click in the future into the cache of the Web site. To make the experimental results more convincing, our clustering and prefetching approaches are compared with the k-means algorithm and the FCMdd algorithm. Numerical prefetching experiments show that, with the help of the RI-based Web user model, the Web user clustering technique based on CAS-C achieves higher Web page prefetching accuracy.
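The group prefetching idea in contribution (5) can be sketched as follows. The abstract does not specify how the common profiles or prefetching rules are built, so this is only one plausible reading under stated assumptions: a cluster's common profile is the set of pages requested by at least a fraction `min_support` of its users, and the pages to prefetch for a user are the profile pages of that user's cluster not yet requested in the current session. The function names, the threshold, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict

def common_profiles(user_pages, user_cluster, min_support=0.5):
    """Pages requested by at least `min_support` of the users in each cluster."""
    members = defaultdict(list)
    for user, cluster in user_cluster.items():
        members[cluster].append(user)
    profiles = {}
    for cluster, users in members.items():
        # Count each page once per user, however often that user requested it.
        counts = Counter(page for u in users for page in set(user_pages[u]))
        profiles[cluster] = {
            page for page, n in counts.items() if n / len(users) >= min_support
        }
    return profiles

def pages_to_prefetch(user, session_pages, user_cluster, profiles):
    """Profile pages of the user's cluster not yet requested in this session."""
    return sorted(profiles[user_cluster[user]] - set(session_pages))

# Hypothetical history and cluster labels (labels would come from the clustering step).
user_pages = {
    "u1": ["/a", "/b", "/c"],
    "u2": ["/a", "/b"],
    "u3": ["/a", "/d"],
    "u4": ["/x", "/y"],
    "u5": ["/x", "/z"],
}
user_cluster = {"u1": 0, "u2": 0, "u3": 0, "u4": 1, "u5": 1}

profiles = common_profiles(user_pages, user_cluster, min_support=0.6)
print(profiles[0] == {"/a", "/b"}, profiles[1] == {"/x"})      # True True
print(pages_to_prefetch("u3", ["/a"], user_cluster, profiles))  # ['/b']
```

Prefetching accuracy could then be estimated as the share of prefetched pages that the user actually requests later, which is one way to compare clusterings produced by different algorithms.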
