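Research on Application of Web Usage Mining in Electronic Commerce Recommendation System

Original title (Chinese): Web使用挖掘在电子商务推荐系统中的应用研究
Author: Liang Wei (梁伟)
Degree level: Master's
Discipline: Management Science and Engineering
Keywords (translated from Chinese): Web usage mining; electronic commerce recommendation system; data preprocessing; sequential patterns
Keywords (original English): Web Usage Mining; electronic commerce recommendation system; data preprocessing; frequent patterns
Degree year: 2004
Supervisor: Zhang Huiying (张慧颖)
Discipline code: 1201
Degree-granting institution: Tianjin University
Thesis submission date: 2004-12-01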

Abstract

The rising popularity of electronic commerce has made data mining an indispensable technology for business competition. Visits to a website generate massive amounts of raw data, stored on the Web server as Web log files; without data mining techniques, this data cannot be turned into useful information. This thesis focuses on Web usage mining because it reveals users' browsing behavior patterns, which are key to the success of an electronic commerce recommendation system. Web usage mining is the application of data mining techniques to Web log files, with the aim of extracting valuable information for use by such a system.
    The thesis first presents an architecture for an electronic commerce recommendation system, then explains in detail the structure and function of each module and how the modules cooperate to complete the recommendation task. It concentrates on the implementation of data preprocessing and sequential pattern mining. Data preprocessing is a critical step in Web usage mining: the quality of its output directly affects later steps such as transaction identification, path analysis, association rule mining, and sequential pattern mining. A data preprocessing algorithm called USIA is proposed. It identifies both users and sessions in a single pass, and experiments show that it is efficient and identifies them accurately.
    To meet the needs of association rule and sequential pattern mining, a simple but efficient algorithm called Predictor is proposed. First-stage experiments indicate that it largely meets the requirements of real-time page recommendation, and the algorithm also supports incremental mining. All experimental data come from the Web logs of a real website rather than simulation, which further supports the accuracy and reliability of the results.
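
The abstract does not spell out how USIA or Predictor work, so the following Python sketch is only a generic illustration of the two ideas it names: identifying users and sessions in one pass over a Web log, and updating page-recommendation statistics incrementally. Everything specific here is an assumption rather than the thesis's method: users are approximated by the (IP, user agent) pair, sessions are split by a 30-minute inactivity timeout, and recommendations come from simple first-order page-transition counts. The names `sessionize` and `TransitionCounter` are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Assumed inactivity threshold; the thesis abstract does not state one.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Identify users and sessions in a single pass.

    `records` is an iterable of (ip, user_agent, timestamp, url) tuples
    sorted by timestamp. A user is approximated by (ip, user_agent); a new
    session starts when the gap since that user's last request exceeds
    `timeout`. Returns {user_key: [session, ...]}, each session a URL list.
    """
    sessions = defaultdict(list)
    last_seen = {}
    for ip, agent, ts, url in records:
        user = (ip, agent)
        if user not in last_seen or ts - last_seen[user] > timeout:
            sessions[user].append([])  # open a new session for this user
        sessions[user][-1].append(url)
        last_seen[user] = ts
    return dict(sessions)


class TransitionCounter:
    """Incrementally maintained counts of page-to-next-page transitions."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, session):
        # Add one session's transitions without rescanning older data.
        for current, following in zip(session, session[1:]):
            self.counts[current][following] += 1

    def recommend(self, page, k=3):
        # Return up to k pages most often visited right after `page`.
        followers = self.counts.get(page, Counter())
        return [p for p, _ in followers.most_common(k)]


if __name__ == "__main__":
    log = [
        ("10.0.0.1", "Mozilla", datetime(2004, 1, 1, 9, 0), "/home"),
        ("10.0.0.1", "Mozilla", datetime(2004, 1, 1, 9, 5), "/books"),
        ("10.0.0.1", "Mozilla", datetime(2004, 1, 1, 10, 0), "/home"),
    ]
    model = TransitionCounter()
    for user_sessions in sessionize(log).values():
        for session in user_sessions:
            model.update(session)
    print(model.recommend("/home"))
```

Because `update` only adds counts for the sessions it is given, newly arrived log data can be folded in without reprocessing earlier logs, which is the sense of "incremental" used in this sketch.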

