Sensitivity Analysis of Parameters in the Generation of Fuzzy Decision Trees
Abstract
Decision-tree induction based on the ID3 algorithm is an important branch of inductive learning and can be used for the automatic acquisition of knowledge. As research on inductive learning has deepened, learning from examples with crisp descriptions can no longer meet the demands of automatically acquiring imprecise knowledge within a system, and studying learning from examples under uncertainty has become essential. This led to the fuzzy extension of the traditional ID3 algorithm: fuzzy ID3. During the generation of a fuzzy decision tree, the expanded attribute selected by fuzzy entropy cannot separate the classes cleanly, as in a classical decision tree; instead, the examples covered by the attribute's linguistic terms overlap to some extent. The whole tree-generation process is therefore carried out under a given significance level α. Introducing the parameter α can reduce this overlap to a certain degree, thereby decreasing the uncertainty of the classification and improving the classification results of the fuzzy decision tree. In practice, however, α is usually supplied directly by a domain expert based on experience or need; this manual intervention relies excessively on expert knowledge, so the actual classification results may fall short of the optimum in terms of the number of rules and the accuracy.
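The abstract does not reproduce the formulas it refers to. As a sketch only, assuming the common fuzzy-ID3 formulation in which the significance level acts as an α-cut on membership degrees, the fuzzy entropy at a node D with memberships μ_D(x_i) and classes C_1, …, C_m can be written as

\[
\mu_D^{\alpha}(x_i) =
\begin{cases}
\mu_D(x_i), & \mu_D(x_i) \ge \alpha \\
0, & \mu_D(x_i) < \alpha
\end{cases}
\qquad
p_k = \frac{\sum_{x_i \in C_k} \mu_D^{\alpha}(x_i)}{\sum_{x_i \in D} \mu_D^{\alpha}(x_i)},
\qquad
E_{\alpha}(D) = -\sum_{k=1}^{m} p_k \log_2 p_k .
\]

Raising α discards the weakly covered examples, which is exactly the overlap between attribute terms described above; this is why a suitable α can lower the classification uncertainty.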
Building on the fuzzy ID3 algorithm and implemented on the Visual C++ development platform, this thesis proceeds from an analytical point of view. By examining the functional relationship between the parameter α and the fuzzy entropy, it discusses how the fuzzy entropy function changes as α increases, further analyzes the sensitivity of the classification results of the fuzzy decision tree to α in terms of training accuracy, testing accuracy, and number of rules, and investigates an experimental method for obtaining the optimal α. Experiments show that the optimal value of α found by this method yields the best classification results for the fuzzy decision tree, and it thus provides a sound theoretical basis for choosing this parameter when classifying with fuzzy decision trees so as to obtain optimal classification results.
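The thesis's Visual C++ implementation is not included here. Purely as an illustrative sketch (the toy data, function names, and the two-class node below are all invented for this example), the following self-contained C++ program isolates the quantity at the heart of the sensitivity analysis: the fuzzy entropy of a node after the α-cut, swept over a grid of α values.

// alpha_sweep.cpp -- an illustrative sketch only, not the thesis's code.
// Computes the fuzzy entropy of a single tree node after the alpha-cut
// and sweeps alpha over a grid to expose its dependence on alpha.
#include <cmath>
#include <cstdio>
#include <vector>

// memberships[i]: degree to which example i belongs to the node;
// labels[i]: its class (0 .. numClasses-1). Memberships below alpha
// are discarded -- this is how the significance level prunes overlap.
double fuzzyEntropy(const std::vector<double>& memberships,
                    const std::vector<int>& labels,
                    int numClasses, double alpha) {
    std::vector<double> classSum(numClasses, 0.0);
    double total = 0.0;
    for (std::size_t i = 0; i < memberships.size(); ++i) {
        if (memberships[i] < alpha) continue;   // the alpha-cut
        classSum[labels[i]] += memberships[i];
        total += memberships[i];
    }
    if (total == 0.0) return 0.0;               // node emptied by the cut
    double e = 0.0;
    for (int k = 0; k < numClasses; ++k) {
        if (classSum[k] > 0.0) {
            double p = classSum[k] / total;     // fuzzy relative frequency
            e -= p * std::log2(p);
        }
    }
    return e;
}

int main() {
    // Invented toy node: class 0 covers it strongly, class 1 only weakly,
    // so the weak class-1 examples are the "overlap" the abstract describes.
    std::vector<double> mu    = {0.90, 0.85, 0.80, 0.75, 0.30, 0.25, 0.20, 0.35};
    std::vector<int>    label = {0,    0,    0,    0,    1,    1,    1,    1};
    for (int i = 0; i <= 9; ++i) {
        double alpha = 0.1 * i;
        std::printf("alpha = %.1f   fuzzy entropy = %.4f\n",
                    alpha, fuzzyEntropy(mu, label, 2, alpha));
    }
    return 0;
}

In the full experiment one would regenerate the whole tree at each grid point and record training accuracy, testing accuracy, and rule count, reading the optimal α off those curves; the entropy sweep above only isolates the analytical α-entropy relationship that the thesis starts from.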