Research on Adaptive Web Sites Based on Web Mining (基于web挖掘的自适应站点研究)

English title: The Research of Adaptive Web Sites Based on Web Mining
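Author: Fang Chengxiao (方成效)
Degree level: Master's
Discipline: Computer Application Technology
Keywords: adaptive web sites; Web usage (log) mining; path completion algorithm; MFFP+; MAWSS
Degree year: 2006
Supervisor: Yuan Kefeng (袁可风)
Discipline code: 081203
Degree-granting institution: East China Jiaotong University (华东交通大学)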

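Abstract

With the rapid development of Internet technology and e-commerce, the Internet is changing our lives as never before. More and more commodity transactions and services are conducted over the Web, so adapting to market changes and serving customers better has become a central concern for every website. To manage the relationship between site operators and customers more effectively, adaptive web sites have become an active research topic.
     The log files of user visits to a site give us an opportunity to observe how users interact with it. This thesis studies and builds adaptive web sites by analyzing and mining web log files.
     The thesis makes a comprehensive study of the basic theory and algorithms for building adaptive web sites: it analyzes website types and users' browsing habits; proposes new session identification and path completion algorithms for tree-structured sites containing pop-up pages; and presents an improved maximal forward frequent path mining algorithm and a target page association algorithm.
     To apply and validate these algorithms, a J2EE-based adaptive site system, MAWSS, was implemented. It consists of four modules: data preprocessing, site adaptation, page recommendation and target page association. Data preprocessing is the foundation, and site adaptation is the core.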
With the rapid development of Internet technology and e-commerce, the Web is changing our lives in unprecedented ways. As more and more business transactions and services are carried out over the Web, providing better Web-based services and understanding customer behavior have become focal concerns. To improve the relationship between customers and providers, adaptive Web sites have become a focus of current research.
    Logs of user accesses to a site provide an opportunity to observe how users interact with that site. This thesis uses web usage mining to study and build adaptive web sites.
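Before patterns can be mined from such logs, each user's requests have to be grouped into sessions. The thesis proposes its own session identification algorithm for tree-structured sites with pop-up pages; the sketch below does not reproduce it. It shows only the common timeout heuristic, assuming users have already been identified (the `user_id` field) and assuming a 30-minute gap threshold, a conventional choice rather than a value taken from the thesis. `identify_sessions`, `Request` and `SESSION_TIMEOUT` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Assumed threshold: 30 minutes is a common convention, not a value from the thesis.
SESSION_TIMEOUT = timedelta(minutes=30)

# (user_id, timestamp, url); user identification is assumed to be done already.
Request = Tuple[str, datetime, str]


def identify_sessions(requests: List[Request],
                      timeout: timedelta = SESSION_TIMEOUT) -> Dict[str, List[List[str]]]:
    """Split each user's requests into sessions wherever the gap between two
    consecutive requests of that user exceeds `timeout`."""
    sessions: Dict[str, List[List[str]]] = {}
    last_seen: Dict[str, datetime] = {}
    # Visit requests in time order so gaps are measured between neighbours.
    for user, ts, url in sorted(requests, key=lambda r: r[1]):
        if user not in last_seen or ts - last_seen[user] > timeout:
            sessions.setdefault(user, []).append([])  # open a new session
        sessions[user][-1].append(url)
        last_seen[user] = ts
    return sessions


t0 = datetime(2006, 1, 1, 9, 0)
log = [("u1", t0, "/"), ("u1", t0 + timedelta(minutes=5), "/a"),
       ("u1", t0 + timedelta(hours=2), "/"), ("u2", t0, "/b")]
print(identify_sessions(log))  # {'u1': [['/', '/a'], ['/']], 'u2': [['/b']]}
```

A gap of exactly the timeout keeps both requests in the same session; only a strictly longer gap opens a new one.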
    This thesis presents a comprehensive study of the principles and algorithms for building adaptive sites. Based on an analysis of website types and users' browsing habits, it proposes a new session identification algorithm and a new path completion algorithm for tree-structured sites containing pop-up pages, together with an improved maximal forward frequent path mining algorithm and a target page association algorithm.
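The abstract does not give the thesis's algorithms, so the following is only a minimal sketch of the classic ideas they build on, under stated assumptions: the site is a pure tree described by a hypothetical `PARENT` map, pop-up pages are ignored, and a page can only be reached from its parent. `complete_path` re-inserts pages the user most likely revisited with the Back button (served from the browser cache and therefore missing from the server log); `maximal_forward_references` cuts the completed path at each backward move, following the classic maximal forward reference idea; `frequent_paths` keeps contiguous sub-paths whose support reaches a threshold and discards any contained in a longer frequent one. All three names are hypothetical, and this is a simplified illustration, not the MFFP+ algorithm.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical site tree: each page maps to its parent (None for the home page).
PARENT: Dict[str, Optional[str]] = {"A": None, "B": "A", "C": "B", "D": "B", "E": "A"}


def complete_path(session: List[str], parent: Dict[str, Optional[str]]) -> List[str]:
    """Re-insert pages the user probably revisited with the Back button.
    Assumes a pure tree site: a page can only be reached from its parent."""
    completed: List[str] = []
    stack: List[str] = []  # current forward path from the entry page

    def back_to(page: str) -> None:
        # Pop the forward path until `page` is on top, logging each page passed.
        while stack[-1] != page:
            stack.pop()
            completed.append(stack[-1])

    for page in session:
        if page in stack:
            back_to(page)  # revisit of a page on the path; a reload adds nothing
            continue
        target = parent.get(page)
        if target in stack:
            back_to(target)  # user backed up to the parent, then followed its link
        else:
            stack.clear()  # unreachable from the current path: treat as a new entry
        stack.append(page)
        completed.append(page)
    return completed


def maximal_forward_references(path: List[str]) -> List[List[str]]:
    """Split a completed path into maximal forward references."""
    refs: List[List[str]] = []
    current: List[str] = []
    moving_forward = False
    for page in path:
        if page in current:
            # Backward move: the forward path built so far is maximal.
            if moving_forward:
                refs.append(list(current))
            del current[current.index(page) + 1:]
            moving_forward = False
        else:
            current.append(page)
            moving_forward = True
    if moving_forward:
        refs.append(list(current))
    return refs


def frequent_paths(refs: List[List[str]], min_support: int) -> List[Tuple[str, ...]]:
    """Return maximal contiguous sub-paths (length >= 2) that occur in at
    least `min_support` references."""
    counts: Counter = Counter()
    for ref in refs:
        # A set, so each sub-path counts at most once per reference.
        counts.update({tuple(ref[i:j]) for i in range(len(ref))
                       for j in range(i + 2, len(ref) + 1)})
    frequent = {p for p, c in counts.items() if c >= min_support}

    def inside(p: Tuple[str, ...], q: Tuple[str, ...]) -> bool:
        return len(p) < len(q) and any(q[i:i + len(p)] == p
                                       for i in range(len(q) - len(p) + 1))

    return sorted(p for p in frequent if not any(inside(p, q) for q in frequent))


completed = complete_path(["A", "B", "C", "D", "E"], PARENT)
print(completed)  # ['A', 'B', 'C', 'B', 'D', 'B', 'A', 'E']
refs = maximal_forward_references(completed)
print(refs)  # [['A', 'B', 'C'], ['A', 'B', 'D'], ['A', 'E']]
print(frequent_paths(refs, 2))  # [('A', 'B')]
```

Completing the path before splitting it matters: without the re-inserted `B` and `A`, the logged sequence A B C D E would look like a single forward walk, and the three distinct branches the user actually explored would be lost.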
    To apply and verify these algorithms, a J2EE-based adaptive site system, MAWSS, was implemented. It consists of four modules: data preprocessing, site adaptation, page recommendation and target page association. Data preprocessing provides the foundation, and site adaptation is the core.
