Research and Application of Decision Tree Classification Algorithms
Abstract
Classification is an important topic in data mining research. Commonly used classification models include decision trees, neural networks, genetic algorithms, and rough sets. This thesis focuses on the ID3 decision tree algorithm and its improved variants. It first reviews the theory of decision trees and analyzes and compares several typical decision tree algorithms. Then, to address the shortcomings of ID3, it proposes an ID3 algorithm based on attribute priority association degree (AID3). Experiments show that AID3 speeds up decision tree construction and overcomes ID3's tendency to favor attributes with many distinct values, and that the classification performance of the resulting tree improves as the dataset grows. Finally, the thesis examines a practical application of AID3 to human resources management; analysis of the results further indicates that AID3 is effective.
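The bias of ID3 toward attributes with many values can be illustrated with a minimal sketch of the standard information-gain criterion. This is not the AID3 algorithm, whose details the abstract does not give; the toy records and the attribute names `employee_id`, `degree`, and `promoted` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Standard ID3 criterion: target entropy minus the weighted
    entropy of the subsets produced by splitting on `attribute`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


# Hypothetical toy records (not taken from the thesis).
rows = [
    {"employee_id": 1, "degree": "BSc", "promoted": "yes"},
    {"employee_id": 2, "degree": "BSc", "promoted": "no"},
    {"employee_id": 3, "degree": "MSc", "promoted": "yes"},
    {"employee_id": 4, "degree": "MSc", "promoted": "yes"},
]

gains = {a: information_gain(rows, a, "promoted") for a in ("employee_id", "degree")}
best = max(gains, key=gains.get)
print(gains, best)
```

Because every `employee_id` value is unique, each subset is pure and the attribute receives the maximum possible gain, so plain ID3 would split on it even though it cannot generalize to new records. This is the kind of shortcoming that improved criteria aim to correct.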