Research on User Pattern Recognition Based on Web Usage Mining
Abstract
Data mining is a new information technology that has emerged in recent years alongside advances in database and artificial intelligence technology. It is also an important problem, made pressing by the development and widespread use of computer science and technology, and of computer networks in particular.
     Data mining discovers interesting, hidden, and previously unknown knowledge in large databases. Data mining techniques mainly target structured data, whereas Web data mining applies to the WWW and extracts interesting, potentially useful patterns from semi-structured or unstructured Web pages. Web server logs are well structured, which makes them well suited to data mining. Web usage mining is one of the three research areas of Web mining and a very important one. By analyzing and exploring regularities in Web log records, it can identify potential customers in e-commerce, improve the quality of network services offered to users, and improve the performance of Web server systems.
     This thesis discusses the problems of Web usage mining from a clustering perspective. It first gives a systematic account of the progression from data mining through Web data mining to Web log mining. Through a discussion of data mining on Web logs, it explains how Web log mining is carried out and which data mining techniques it should use. It then examines clustering from a theoretical standpoint, covering the concept of clustering, common clustering methods, and common clustering algorithms. For the pattern discovery phase of Web usage mining, the thesis improves the BIRCH algorithm and applies the improved algorithm to Web user pattern recognition, verifying its effectiveness.
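As a rough illustration of the kind of summary BIRCH works with, the sketch below builds per-user page-visit vectors from log entries and merges them into a clustering feature (point count, linear sum, sum of squared norms). This is the standard clustering feature from the original BIRCH algorithm, not the improved variant the thesis proposes, whose details the abstract does not give. The log entries, page list, and the names ClusteringFeature and visit_vectors are hypothetical, chosen only for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusteringFeature:
    """Standard BIRCH clustering feature: count, linear sum, sum of squared norms."""
    n: int
    linear_sum: list
    square_sum: float

    @classmethod
    def from_point(cls, point):
        # A single point is summarized by itself and its squared norm.
        return cls(1, [float(x) for x in point], float(sum(x * x for x in point)))

    def merge(self, other):
        # CF additivity: the summary of a union is the component-wise sum.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.linear_sum, other.linear_sum)],
            self.square_sum + other.square_sum,
        )

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def radius(self):
        # Root-mean-square distance of the summarized points from the centroid.
        centroid_norm_sq = sum(c * c for c in self.centroid())
        return math.sqrt(max(self.square_sum / self.n - centroid_norm_sq, 0.0))


def visit_vectors(log_entries, pages):
    """Turn (user, page) log entries into per-user page-visit count vectors."""
    index = {page: i for i, page in enumerate(pages)}
    vectors = {}
    for user, page in log_entries:
        vec = vectors.setdefault(user, [0] * len(pages))
        vec[index[page]] += 1
    return vectors


# Hypothetical, hand-made log entries: (user id, requested page).
LOG = [
    ("u1", "/news"), ("u1", "/news"), ("u1", "/sports"),
    ("u2", "/news"), ("u2", "/sports"), ("u2", "/sports"),
    ("u3", "/shop"),
]
PAGES = ["/news", "/sports", "/shop"]

vectors = visit_vectors(LOG, PAGES)
cf = ClusteringFeature.from_point(vectors["u1"]).merge(
    ClusteringFeature.from_point(vectors["u2"])
)
print(cf.n, cf.centroid(), round(cf.radius(), 3))
```

Because clustering features are additive, a cluster of users can be updated one visit vector at a time without keeping the individual vectors, which is what makes this kind of summary attractive for large Web logs.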