Research on Personalized Service Based on Web Data Mining

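English title: Personalized Service Based on Web Data Mining
Author: Guo Huihui (郭辉辉)
Degree level: Master's
Discipline: Computer Application Technology
Keywords: Web data mining; association rules; FUP algorithm; partition-based FUP algorithm
Degree year: 2011
Advisor: Cui Guangcai (崔广才)
Discipline code: 081203
Degree-granting institution: Changchun University of Science and Technology (长春理工大学)
Thesis submission date: 2011-03-01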

Abstract

In recent years, with the development of network technology, the tension between the rapid growth of data and the growing scarcity of useful information has drawn increasing attention to data mining. Among the various data mining techniques, association rule mining has become an important research topic. The most common approach at present is to apply association rule algorithms to Web logs to discover users' access patterns and interests, so that different users can be offered different services, that is, personalized service.
This thesis first briefly introduces the origin of data mining and the basic concepts, classification, mining process, and several commonly used techniques of Web data mining, and discusses the data preprocessing process in detail. Second, it describes the concept of association rules and analyzes in detail the classical association rule algorithm, Apriori. Third, it discusses incremental mining algorithms for association rules in detail and, on that basis, analyzes and summarizes their shortcomings and proposes a partition-based FUP algorithm to improve the efficiency of association rule mining. Finally, the partition-based FUP algorithm is implemented experimentally and its results are compared with those of the traditional FUP algorithm, which verifies the effectiveness and correctness of the partition-based FUP algorithm.
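To make the incremental idea concrete, the following is a minimal Python sketch of the basic FUP update step as it is commonly described (Cheung et al.), applied to Web sessions represented as sets of visited pages. It is not the partition-based variant proposed in the thesis, whose details are not given in this record. The function names (`apriori`, `fup_update`, and so on) and the data layout are assumptions made for illustration only. The key point is that itemsets already known to be frequent in the old database are updated by scanning only the increment, and a new itemset triggers a scan of the old database only if it is frequent within the increment.

```python
from itertools import combinations


def count_support(transactions, candidates):
    """Count how many transactions contain each candidate itemset."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}


def generate_candidates(prev_level, k):
    """Join frequent (k-1)-itemsets, then prune any with an infrequent subset."""
    joined = {a | b for a in prev_level for b in prev_level if len(a | b) == k}
    return {c for c in joined
            if all(frozenset(s) in prev_level for s in combinations(c, k - 1))}


def apriori(transactions, min_support):
    """Plain level-wise Apriori; returns {itemset: support count}."""
    min_count = min_support * len(transactions)
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while candidates:
        counts = count_support(transactions, candidates)
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        candidates = generate_candidates(set(level), k)
    return frequent


def fup_update(old_db, old_frequent, increment, min_support):
    """FUP-style update (illustrative sketch, not the thesis's algorithm).

    old_frequent must hold the frequent itemsets of old_db, with counts,
    mined at the same min_support.
    """
    min_count = min_support * (len(old_db) + len(increment))
    min_count_inc = min_support * len(increment)
    # Level-1 candidates: old frequent items plus items seen in the increment.
    candidates = {c for c in old_frequent if len(c) == 1}
    candidates |= {frozenset([i]) for t in increment for i in t}
    updated, k = {}, 1
    while candidates:
        inc_counts = count_support(increment, candidates)
        level = {}
        for c, inc_n in inc_counts.items():
            if c in old_frequent:
                total = old_frequent[c] + inc_n      # no scan of old_db needed
            elif inc_n >= min_count_inc:
                total = inc_n + sum(1 for t in old_db if c <= t)  # rescan old_db
            else:
                continue  # infrequent in both parts, so infrequent overall
            if total >= min_count:
                level[c] = total
        updated.update(level)
        k += 1
        candidates = generate_candidates(set(level), k)
    return updated
```

Under these assumptions, the itemsets returned by `fup_update(old_db, apriori(old_db, s), increment, s)` should match those from re-running `apriori(old_db + increment, s)` from scratch, while touching the old sessions only for itemsets that are newly frequent in the increment.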

