Personalized Information Service Based on Web Data Mining Classification Algorithms
Abstract
With the development of Internet technology, and following the successful application of data mining to traditional databases, researchers have begun applying it to other types of databases. Web-based data mining (Web mining for short) developed against this background.
This thesis introduces the theory and practical application of data mining, Web mining, and OLAP, and focuses on applying data mining classification algorithms to Web mining in order to provide personalized information services for different kinds of users. It consists of the following parts:
    1. An introduction to the development of data mining, the mining process, and practical applications of classification algorithms.
    2. A detailed discussion of Web mining, including its background, its technical characteristics, and methods for determining user access transactions (sessions) in Web usage mining.
    3. A detailed introduction to an online classification algorithm based on concept induction.
    4. To deliver personalized information services, the author applies this algorithm to Web data mining and improves its decision tree construction algorithm. This part is the core of the thesis.
    5. Finally, the author builds a model of an online personalized information service tool, outlines future work, and looks ahead to the development of Web mining technology.
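The abstract names decision tree construction as the core technique but does not say which tree algorithm or which improvement the author uses. As a rough illustration only, the sketch below shows the standard ID3-style information gain criterion for choosing a split attribute. It is not the author's improved algorithm, and the session records, attribute names (`entry_page`, `visit_time`), and target label (`interest`) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(records, attribute, target):
    """Entropy reduction obtained by splitting `records` on `attribute`."""
    base = entropy([r[target] for r in records])
    # Group the target labels by the value each record has for `attribute`.
    partitions = {}
    for r in records:
        partitions.setdefault(r[attribute], []).append(r[target])
    # Weighted entropy of the partitions after the split.
    remainder = sum(len(p) / len(records) * entropy(p)
                    for p in partitions.values())
    return base - remainder


def best_split(records, attributes, target):
    """Pick the attribute with the highest information gain (ID3 criterion)."""
    return max(attributes, key=lambda a: information_gain(records, a, target))


# Hypothetical user sessions extracted from a Web access log (assumed data).
sessions = [
    {"entry_page": "news", "visit_time": "day", "interest": "sports"},
    {"entry_page": "news", "visit_time": "night", "interest": "sports"},
    {"entry_page": "shop", "visit_time": "day", "interest": "tech"},
    {"entry_page": "shop", "visit_time": "night", "interest": "tech"},
]

print(best_split(sessions, ["entry_page", "visit_time"], "interest"))
```

In this toy data `entry_page` separates the two interest classes completely while `visit_time` carries no information, so a tree builder using this criterion would split on `entry_page` first.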
