A Heuristic Bayesian Classification Algorithm and Its Application to Railway Freight Customer Segmentation
Abstract
Data mining is a set of methods and techniques for discovering latent patterns and extracting useful knowledge from large volumes of data. In recent years it has attracted wide attention and has become one of the research hotspots in information systems and computer science.
     As a classification technique in data mining, a Bayesian network is a graphical model that represents the probabilistic relationships among variables. It provides a natural way to represent causal information, is one of the most effective theoretical models for representing and reasoning about uncertain knowledge, and plays an increasingly important role in the design and analysis of machine learning algorithms.
     This thesis surveys the state of research on Bayesian networks and analyzes the theoretical foundation of Bayesian classifiers together with three classical classifiers: the naive Bayes classifier, the Bayesian network classifier, and the TAN (tree-augmented naive Bayes) classifier. On this basis it proposes a heuristic Bayesian classification algorithm that combines the strengths of the K2 search algorithm and the TAN classifier and, to some extent, compensates for the weaknesses of both. The order of the edges is determined while TAN builds its maximum weighted spanning tree, the nodes are then ordered according to certain rules, and finally the K2 search algorithm constructs the Bayesian network structure. Experimental results show that the proposed algorithm yields a more reasonable network structure and higher classification accuracy.
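     The ordering stage described above can be illustrated with a minimal sketch. It assumes discrete attributes, weights each attribute pair by the conditional mutual information I(X_i; X_j | C) as in the standard TAN construction, and builds the maximum weighted spanning tree with a Kruskal-style greedy pass. The abstract does not state the rules used to turn the edge order into a node order, so `node_order_from_edges` is a hypothetical stand-in (rank by first appearance), and the final K2 search that would consume this ordering is not shown.

```python
from collections import Counter
from itertools import combinations
from math import log


def conditional_mutual_information(rows, i, j, labels):
    """Estimate I(X_i; X_j | C) from discrete data by counting frequencies."""
    n = len(rows)
    n_c = Counter(labels)
    n_ic = Counter((r[i], c) for r, c in zip(rows, labels))
    n_jc = Counter((r[j], c) for r, c in zip(rows, labels))
    n_ijc = Counter((r[i], r[j], c) for r, c in zip(rows, labels))
    total = 0.0
    for (a, b, c), count in n_ijc.items():
        # P(a,b|c) / (P(a|c) P(b|c)) = n_abc * n_c / (n_ac * n_bc)
        ratio = count * n_c[c] / (n_ic[(a, c)] * n_jc[(b, c)])
        total += (count / n) * log(ratio)
    return total


def spanning_tree_edge_order(rows, labels):
    """Kruskal-style maximum weighted spanning tree over the attributes.

    Returns the accepted edges (i, j, weight) in the order they were added,
    heaviest first.
    """
    n_attrs = len(rows[0])
    weighted = sorted(
        ((conditional_mutual_information(rows, i, j, labels), i, j)
         for i, j in combinations(range(n_attrs), 2)),
        key=lambda e: -e[0],
    )
    parent = list(range(n_attrs))  # union-find forest to detect cycles

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    for w, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it creates no cycle
            parent[ri] = rj
            edges.append((i, j, w))
    return edges


def node_order_from_edges(edges, n_attrs):
    """Hypothetical rule: rank attributes by first appearance in edge order."""
    order = []
    for i, j, _ in edges:
        for node in (i, j):
            if node not in order:
                order.append(node)
    order.extend(k for k in range(n_attrs) if k not in order)
    return order


# Toy data (made up): X0 and X1 always agree, X2 is independent of both.
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1),
        (0, 0, 0), (1, 1, 1), (0, 0, 1), (1, 1, 0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
edges = spanning_tree_edge_order(rows, labels)
print(node_order_from_edges(edges, n_attrs=3))
```

     K2 requires a node ordering as input and only considers earlier nodes as candidate parents of later ones, which is why deriving that ordering from the spanning tree is the key design step here.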
     Given the increasingly wide use of data mining in customer relationship management (CRM), the thesis also proposes a scheme for railway freight customer segmentation: clustering and classification techniques are used to mine the information hidden in the massive data of the railway waybill database. Historical freight data are first analyzed by clustering, and new customers are then classified with a Bayesian classifier according to the clustering result. This segmentation method can support decision-making in railway freight marketing departments and thereby improve the customer relationship management and decision-making of railway enterprises.
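     The two-stage scheme (cluster the history, then classify new customers against the resulting segments) can be sketched as follows. Everything concrete here is an assumption for illustration: the two features (annual tonnage, shipments per month), the data values, plain k-means as the clustering method, and Gaussian naive Bayes standing in for the Bayesian classifier. The abstract does not specify the clustering method or the waybill fields actually used.

```python
from math import log, pi
from statistics import mean, pvariance


def kmeans(points, centers, iterations=20):
    """Plain Lloyd's k-means; returns (cluster index per point, centers)."""
    centers = [tuple(c) for c in centers]
    assign = []
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [
            min(range(len(centers)),
                key=lambda k: sum((p - q) ** 2 for p, q in zip(pt, centers[k])))
            for pt in points
        ]
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [pt for pt, a in zip(points, assign) if a == k]
            if members:  # keep the old center if a cluster empties
                centers[k] = tuple(mean(col) for col in zip(*members))
    return assign, centers


def fit_gaussian_nb(points, labels, var_floor=1e-6):
    """Per class: prior, per-feature mean, per-feature variance (floored)."""
    model = {}
    for c in set(labels):
        members = [pt for pt, l in zip(points, labels) if l == c]
        cols = list(zip(*members))
        model[c] = (
            len(members) / len(points),
            [mean(col) for col in cols],
            [max(pvariance(col), var_floor) for col in cols],
        )
    return model


def predict(model, pt):
    """Return the class with the highest log posterior under naive Bayes."""
    def log_posterior(c):
        prior, means, variances = model[c]
        return log(prior) + sum(
            -0.5 * log(2 * pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(pt, means, variances)
        )
    return max(model, key=log_posterior)


# Hypothetical history: (annual tonnage in kt, shipments per month).
history = [(1.0, 2.0), (1.5, 1.0), (2.0, 2.5),
           (40.0, 30.0), (42.0, 28.0), (45.0, 33.0)]

# Stage 1: cluster historical customers into segments.
segments, _ = kmeans(history, centers=[history[0], history[3]])

# Stage 2: train a classifier on the segments and assign a new customer.
model = fit_gaussian_nb(history, segments)
print(predict(model, (3.0, 2.0)))
```

     Using the cluster labels as training targets means a new customer can be assigned to a segment without re-running the clustering over the whole history.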
     In addition, building on the study of Bayesian classification algorithms and the practical needs of railway freight customer segmentation, a Bayesian classification software platform was developed; it can also serve as a general-purpose data mining platform in related fields.
