Data Mining Based on Web Logs
Abstract
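Data mining is one of the most active fields in database research. Because of its broad range of applications and its practical significance, both research on and application of data mining technology have advanced rapidly and attracted considerable attention in academia and the information industry, in China and abroad.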
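     Data mining is the discovery of interesting, hidden and previously unknown knowledge from large volumes of data. Data mining technology mainly addresses structured data, whereas Web data mining applies to the Internet: it extracts interesting, potential patterns from semi-structured or unstructured Web pages. Although the Internet is a semi-structured system that is difficult to process, Web server log records are well structured, which makes them highly amenable to data mining. Moreover, Web log mining is a branch of Web usage mining and, as an important component of Web mining, has distinctive theoretical and practical significance.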
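The abstract's point that server logs are well structured, unlike the pages themselves, can be illustrated by parsing one log line into named fields. The Python sketch below assumes the NCSA Common Log Format (CLF). The thesis does not state which log format the site's server used, so the CLF layout, the regular expression and the names `parse_clf_line` and `LogEntry` are hypothetical and for illustration only.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumed input format: NCSA Common Log Format (CLF)
#   host ident authuser [date] "method url protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


class LogEntry(NamedTuple):
    """Hypothetical record type holding the fields used for mining."""
    host: str
    time: datetime
    method: str
    url: str
    status: int
    size: int


def parse_clf_line(line: str) -> Optional[LogEntry]:
    """Parse one CLF line; return None for lines that do not match the format."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    return LogEntry(
        host=m.group("host"),
        # CLF timestamps look like 10/Oct/2003:13:55:36 +0800
        time=datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        method=m.group("method"),
        url=m.group("url"),
        status=int(m.group("status")),
        # "-" means no response body was sent
        size=0 if m.group("size") == "-" else int(m.group("size")),
    )


if __name__ == "__main__":
    sample = ('192.168.0.7 - - [10/Oct/2003:13:55:36 +0800] '
              '"GET /news/index.html HTTP/1.0" 200 2326')
    print(parse_clf_line(sample))
```

Because every field sits at a fixed position, a single regular expression is enough here; malformed lines are returned as `None` so that a later cleaning step can simply skip them.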
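     This thesis systematically describes the whole progression from data mining through Web data mining to Web log mining, with the focus on Web log mining. Through a discussion of data mining based on Web logs, it explains how Web log mining is carried out and which data mining techniques should be used in it. The Web log mining techniques are then applied to the Shangqiu Information Port (商丘信息港) website: its Web server log records are mined and a Web log mining system is built. Network administrators can use the results of the Web log analysis to improve the site design, manage the site effectively and safeguard network security. Finally, the thesis summarizes its work and proposes directions for further research and future work.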
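The abstract describes Web log mining as a process but does not give its algorithms. The following is a minimal sketch of one common arrangement, assuming the log has already been parsed into (host, time, URL, status) records: data cleaning, session identification by client host with an inactivity timeout, and counting frequent page-to-page transitions as a simple form of access-pattern discovery. The 30-minute timeout, the ignored file suffixes and all function names are hypothetical choices, not the author's method.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# A pre-parsed log record: (client host, request time, requested URL, HTTP status)
Record = Tuple[str, datetime, str, int]

# Hypothetical cleaning rule: requests for embedded resources are not page views
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold


def clean(records: List[Record]) -> List[Record]:
    """Keep successful (2xx) page requests and drop embedded resources."""
    return [r for r in records
            if 200 <= r[3] < 300 and not r[2].lower().endswith(IGNORED_SUFFIXES)]


def sessionize(records: List[Record]) -> List[List[str]]:
    """Group page views by host, starting a new session after a long gap."""
    by_host: Dict[str, List[Record]] = {}
    for r in sorted(records, key=lambda rec: (rec[0], rec[1])):
        by_host.setdefault(r[0], []).append(r)
    sessions: List[List[str]] = []
    for visits in by_host.values():
        current = [visits[0][2]]
        for prev, cur in zip(visits, visits[1:]):
            if cur[1] - prev[1] > SESSION_TIMEOUT:
                sessions.append(current)  # close the session on inactivity
                current = []
            current.append(cur[2])
        sessions.append(current)
    return sessions


def frequent_transitions(sessions: List[List[str]],
                         min_support: int) -> List[Tuple[Tuple[str, str], int]]:
    """Count consecutive page pairs (A -> B); keep pairs seen >= min_support times."""
    counts: Counter = Counter()
    for s in sessions:
        counts.update(zip(s, s[1:]))
    return [(pair, n) for pair, n in counts.most_common() if n >= min_support]


if __name__ == "__main__":
    t0 = datetime(2003, 10, 10, 9, 0)
    log = [
        ("10.0.0.1", t0, "/index.html", 200),
        ("10.0.0.1", t0 + timedelta(minutes=1), "/logo.gif", 200),
        ("10.0.0.1", t0 + timedelta(minutes=2), "/news.html", 200),
        ("10.0.0.1", t0 + timedelta(hours=2), "/index.html", 200),
        ("10.0.0.1", t0 + timedelta(hours=2, minutes=3), "/news.html", 200),
        ("10.0.0.2", t0, "/index.html", 200),
        ("10.0.0.2", t0 + timedelta(minutes=5), "/missing.html", 404),
    ]
    sessions = sessionize(clean(log))
    print(sessions)
    # [['/index.html', '/news.html'], ['/index.html', '/news.html'], ['/index.html']]
    print(frequent_transitions(sessions, min_support=2))
    # [(('/index.html', '/news.html'), 2)]
```

Identifying users by host alone is a simplification: proxies and shared addresses merge several visitors, so a real system might also use the user-agent field or cookies. Pairs of consecutive pages are the simplest access patterns; longer paths can be mined the same way by counting longer subsequences.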
Data mining is one of the most important fields in database research. Given its wide applicability and practical significance, data mining techniques and applications have developed rapidly and attracted much attention in both academic research and the information industry.
    The purpose of data mining is to discover interesting, hidden and previously unknown knowledge from large data sets. Data mining mainly deals with structured data, while Web data mining works on the Internet, extracting interesting and potential patterns from semi-structured or unstructured Web pages. Because the Internet is a semi-structured system, its data are difficult to process. Fortunately, Web server log files have a well-defined structure, which makes them very convenient for data mining. Furthermore, Web log mining is a branch of Web usage mining and, as an important part of Web mining, has particular theoretical and practical significance.
    This thesis describes the process from data mining through Web data mining to Web log mining. Focusing on Web log mining, it discusses the methods and techniques used in Web log mining. These techniques were then applied to the Shangqiu information website (http://www.sqinfo.ha.en): by mining its Web server log files, a data mining system based on Web log mining was established. The system will facilitate site management, improvement of the site design and network security. Finally, future directions and work in Web log mining are proposed.