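Decision Support System Based on the Chinese National Enterprise Information Network

Original title: 基于中国国家企业信息网的决策支持系统
Author: Tao Yuxiang (陶玉香)
Thesis level: Master's
Discipline: Pattern Recognition and Intelligent Systems
Keywords: decision support system; data mining; data preprocessing; Apriori algorithm
Degree year: 2008
Advisor: Gao Zhongwen (高中文)
Discipline code: 081104
Degree-granting institution: Harbin University of Science and Technology (哈尔滨理工大学)
Thesis submission date: 2008-03-01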

Abstract

With the development of Internet technology, network resources have grown rapidly. Helping Internet users obtain the resources they need quickly and effectively has become a pressing problem for website designers. The decision support system based on the Chinese National Enterprise Information Network applies data mining to Web server logs: by mining the server's log files it obtains users' access patterns, which are then analyzed to improve the website's organizational structure and services.
     Combining theory with practical application, this thesis develops a decision support system based on the Chinese National Enterprise Information Network and studies the following aspects in depth:
     1. It analyzes the Web log mining process and the collection of Web data, and examines in detail the four steps of Web log preprocessing: data cleaning, user identification, session identification, and path completion. A concrete implementation flow is given and, by adding temporal information to traditional data preprocessing, a Web log preprocessing flow based on a temporal database is proposed.
     2. It analyzes the Apriori algorithm in detail and proposes an improved version. Experiments confirm that the improved algorithm is correct, takes less time, and runs more efficiently.
     3. Drawing on the logs and topology of the Chinese National Enterprise Information Network, it designs and develops the decision support system. The system follows the MVC pattern and is implemented with Java and SQL Server. It does two main things: it compiles statistics on site access, such as daily visits, and it uses the Apriori algorithm to mine the logs for hidden user access paths and rules.
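     The preprocessing steps named in item 1 can be illustrated with a minimal Java sketch. It is not the thesis's implementation: the class and field names are hypothetical, users are assumed to be identified by a key such as IP address plus user agent, data cleaning is assumed to mean dropping requests for images, stylesheets, and scripts, and sessions are assumed to be split by a fixed inactivity timeout. Path completion and the temporal-database extension are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Timeout-based session identification over Web log entries (illustrative only). */
final class SessionIdentifier {

    /** One parsed log entry; the field names are hypothetical. */
    static final class Hit {
        final String userKey;     // assumed user-identification key, e.g. IP plus user agent
        final long epochSeconds;  // request time
        final String url;         // requested resource

        Hit(String userKey, long epochSeconds, String url) {
            this.userKey = userKey;
            this.epochSeconds = epochSeconds;
            this.url = url;
        }
    }

    /** Assumed cleaning rule: keep page requests, drop embedded resources. */
    static boolean isPageRequest(String url) {
        String u = url.toLowerCase();
        return !(u.endsWith(".gif") || u.endsWith(".jpg") || u.endsWith(".png")
                || u.endsWith(".css") || u.endsWith(".js"));
    }

    /**
     * Splits each user's hits into sessions. A gap longer than timeoutSeconds
     * between two page requests of the same user starts a new session.
     * The input list must already be sorted by time.
     */
    static List<List<String>> sessionize(List<Hit> hits, long timeoutSeconds) {
        Map<String, List<String>> open = new LinkedHashMap<>(); // current session per user
        Map<String, Long> lastSeen = new LinkedHashMap<>();     // time of last page request per user
        List<List<String>> sessions = new ArrayList<>();
        for (Hit h : hits) {
            if (!isPageRequest(h.url)) {
                continue; // data cleaning
            }
            List<String> current = open.get(h.userKey);
            Long prev = lastSeen.get(h.userKey);
            if (current == null || h.epochSeconds - prev > timeoutSeconds) {
                current = new ArrayList<>();
                open.put(h.userKey, current);
                sessions.add(current);
            }
            current.add(h.url);
            lastSeen.put(h.userKey, h.epochSeconds);
        }
        return sessions;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("10.0.0.1|Mozilla", 0, "/index.html"));
        hits.add(new Hit("10.0.0.1|Mozilla", 5, "/logo.gif"));
        hits.add(new Hit("10.0.0.2|Opera", 60, "/index.html"));
        hits.add(new Hit("10.0.0.1|Mozilla", 120, "/news.html"));
        hits.add(new Hit("10.0.0.1|Mozilla", 4000, "/index.html"));
        System.out.println(sessionize(hits, 1800));
    }
}
```

     With the sample data in `main` and a 30-minute timeout, the image request is discarded and the first user's visits fall into two sessions, so three sessions are produced in total. The timeout value is a common convention, not a figure taken from the thesis.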
     The decision support system based on the Chinese National Enterprise Information Network gives website designers and decision makers a basis for improving the site's structure and performance. Further research could lead to prediction of user browsing behavior and to self-adaptive websites that adjust automatically.
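     For readers unfamiliar with the Apriori algorithm that the abstract refers to, the following Java sketch shows the textbook version applied to page-visit sessions. It does not reproduce the thesis's improved variant, whose details are not given here. The class name, the method names, and the sample page paths are hypothetical, and support is assumed to be an absolute session count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Textbook Apriori over page-visit sessions (not the thesis's improved variant). */
final class AprioriSketch {

    /** Returns every page set contained in at least minSupport sessions, with its count. */
    static Map<Set<String>, Integer> frequentItemsets(List<Set<String>> sessions, int minSupport) {
        Map<Set<String>, Integer> result = new LinkedHashMap<>();
        // Level-1 candidates: each distinct page on its own.
        Set<Set<String>> candidates = new HashSet<>();
        for (Set<String> s : sessions) {
            for (String page : s) {
                candidates.add(new TreeSet<>(Arrays.asList(page)));
            }
        }
        while (!candidates.isEmpty()) {
            // Count the support of each candidate by scanning the sessions.
            List<Set<String>> frequent = new ArrayList<>();
            for (Set<String> c : candidates) {
                int count = 0;
                for (Set<String> s : sessions) {
                    if (s.containsAll(c)) {
                        count++;
                    }
                }
                if (count >= minSupport) {
                    frequent.add(c);
                    result.put(c, count);
                }
            }
            candidates = nextCandidates(frequent);
        }
        return result;
    }

    /** Joins frequent k-itemsets into (k+1)-candidates and prunes by the Apriori property. */
    static Set<Set<String>> nextCandidates(List<Set<String>> frequent) {
        Set<Set<String>> known = new HashSet<>(frequent);
        Set<Set<String>> next = new HashSet<>();
        for (int i = 0; i < frequent.size(); i++) {
            for (int j = i + 1; j < frequent.size(); j++) {
                Set<String> union = new TreeSet<>(frequent.get(i));
                union.addAll(frequent.get(j));
                if (union.size() != frequent.get(i).size() + 1) {
                    continue; // the two sets differ in more than one item
                }
                // Prune: every k-subset of the candidate must itself be frequent.
                boolean allFrequent = true;
                for (String drop : union) {
                    Set<String> subset = new TreeSet<>(union);
                    subset.remove(drop);
                    if (!known.contains(subset)) {
                        allFrequent = false;
                        break;
                    }
                }
                if (allFrequent) {
                    next.add(union);
                }
            }
        }
        return next;
    }

    public static void main(String[] args) {
        List<Set<String>> sessions = new ArrayList<>();
        sessions.add(new TreeSet<>(Arrays.asList("/index", "/news", "/contact")));
        sessions.add(new TreeSet<>(Arrays.asList("/index", "/news")));
        sessions.add(new TreeSet<>(Arrays.asList("/index", "/contact")));
        sessions.add(new TreeSet<>(Arrays.asList("/news")));
        frequentItemsets(sessions, 2).forEach((set, n) -> System.out.println(set + " -> " + n));
    }
}
```

     In the sample data, the pair of `/news` and `/contact` appears in only one session, so the three-page candidate is pruned without being counted. That pruning step is the property that improved variants of Apriori typically try to exploit further.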

