Research on an OLAP and Customer Data Mining Method in Business Intelligence
Abstract
The future development model of the commercial economy is a customer-centered, service-oriented mode of operation whose core is a sound service assurance system and an efficient business management strategy. Web log analysis and mining is a business intelligence application that improves service quality by understanding the interests of Web users. This dissertation applies XML to OLAP and studies the analysis and mining of Web logs. Its main contributions are as follows.
     1. It analyzes the advantages of XML for data description and, building on those advantages, introduces several applications of XML in OLAP systems.
     2. It proposes an XML-based OLAP and data mining process. The process takes the integration of heterogeneous data sources into account, uses XML-based data cubes, has the OLAP system and the data mining system work together, and publishes the analysis and mining results over the Internet.
     3. It analyzes in detail the key technologies involved in this process, such as XML-based integration of heterogeneous data sources; it introduces a multidimensional expression language designed specifically for XML, which is used to build data cubes and run multidimensional queries; and it describes how to represent a Web log cube in XML.
     4. It proposes a method for clustering users by their degree of interest in categories of Web pages. The method ignores a user's click counts on individual pages and sets each page's weight according to its content and depth, so it can be understood as clustering users by their macro-level interests. Finally, a worked example is used to analyze how this method differs from traditional ones.
     This work offers a reference for research on XML-based OLAP and data mining systems, for their application to Web log analysis and mining, and for research on Web log clustering.
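The clustering idea in item 4 can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm, which the abstract does not specify. The page table, the weights, the session data, the cosine-similarity measure, the greedy grouping, and the 0.8 threshold are all assumptions made for illustration, and "ignoring click counts on individual pages" is read here as counting each visited page once per user.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical page metadata: page -> (category, weight).
# Deeper, more specific pages are given a higher weight (an assumption).
PAGES = {
    "/sports/index.html": ("sports", 0.5),
    "/sports/nba/game1.html": ("sports", 1.0),
    "/tech/index.html": ("tech", 0.5),
    "/tech/xml/olap.html": ("tech", 1.0),
}

# Hypothetical per-user page requests extracted from a Web log.
SESSIONS = {
    "u1": ["/sports/index.html", "/sports/nba/game1.html", "/sports/nba/game1.html"],
    "u2": ["/sports/nba/game1.html", "/tech/index.html"],
    "u3": ["/tech/xml/olap.html", "/tech/index.html"],
}


def interest_vector(visited, categories):
    """Return one user's normalized interest in each page category."""
    totals = defaultdict(float)
    for page in set(visited):  # repeated hits on the same page count once
        category, weight = PAGES[page]
        totals[category] += weight
    norm = sum(totals.values())
    return [totals[c] / norm if norm else 0.0 for c in categories]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_users(sessions, threshold=0.8):
    """Greedy grouping: a user joins the first cluster whose first member
    has a sufficiently similar interest vector, else starts a new cluster."""
    categories = sorted({category for category, _ in PAGES.values()})
    clusters = []  # list of (representative vector, [user ids])
    for user, visited in sessions.items():
        vec = interest_vector(visited, categories)
        for rep, members in clusters:
            if cosine(rep, vec) >= threshold:
                members.append(user)
                break
        else:
            clusters.append((vec, [user]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    print(cluster_users(SESSIONS))
```

Because the vectors are built per category rather than per page, two users who read different sports pages still end up close together, which is the "macro-level interest" reading of the method.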
