Research on Data Mining Classification Algorithms Based on Hybrid Intelligent Systems
Abstract
As database applications have deepened, databases have grown rapidly in size, and people need to analyze these data to discover valuable information. Database management systems themselves, however, provide no effective tools or methods for exploiting the data, so data mining has become an active research topic. Classification in particular has attracted growing attention because it is so widely used. This thesis presents a systematic study of classification in data mining based on hybrid intelligent systems.
The author surveys the state of research and applications of data mining classification in China and abroad, analyzes the basic theory of classification in depth, introduces the classical algorithms, and compares the various classification methods. Building on this theoretical analysis, the author constructs a new hybrid intelligent system (HIS), R-FC-DENN, which is grounded in cognitive psychology and model integration theory and combines rough set theory, clustering, fuzzy logic, genetic algorithms, and artificial neural networks. R-FC-DENN first reduces the input data with rough sets and then clusters the reduced data. A neural network improved by a genetic algorithm is trained on each cluster. The outputs of these networks are then combined using fuzzy weights and fed into a further neural network, also improved by a genetic algorithm, for a final round of training, which completes the training process for classification.
Finally, the author designs and implements each submodule of the system, assembles them in the Matlab 6.5 environment, and develops a toolbox for R-FC-DENN named RFCDENNTool. The practicality of R-FC-DENN is tested on real data sets from UCI, with fairly satisfactory results.
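The abstract gives the order of the R-FC-DENN stages but not their internals, so the following is only a minimal sketch of two of them, written in Python rather than the Matlab toolbox the thesis describes. It assumes that the rough-set step can be approximated by greedily dropping attributes whose removal keeps the decision table consistent, and that the "fuzzy weights" behave like fuzzy c-means memberships; neither choice is confirmed by the source. The names `reduce_attributes`, `fuzzy_memberships`, and `combine` are hypothetical, the cluster centers are taken as given, and the per-cluster networks are stand-in callables. The clustering step, the genetic-algorithm training, and the final network are omitted.

```python
def is_consistent(rows, labels, attrs):
    """True if rows that agree on `attrs` never disagree on the label."""
    seen = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, label) != label:
            return False
    return True


def reduce_attributes(rows, labels):
    """Greedy rough-set style reduction (assumed variant, not the thesis code).

    Drops every attribute whose removal leaves the decision table
    consistent. Assumes the full table is consistent to begin with.
    """
    attrs = list(range(len(rows[0])))
    for a in list(attrs):
        trial = [b for b in attrs if b != a]
        if trial and is_consistent(rows, labels, trial):
            attrs = trial
    return attrs


def fuzzy_memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of point x in each cluster (sums to 1)."""
    dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) ** 0.5 for c in centers]
    if 0.0 in dists:  # x lies exactly on one or more centers
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** power for dj in dists) for di in dists]


def combine(x, centers, experts):
    """Weight each per-cluster expert's output by the membership of x."""
    weights = fuzzy_memberships(x, centers)
    return sum(w * expert(x) for w, expert in zip(weights, experts))


# Toy decision table: only attribute 0 determines the class.
rows = [(0, 0, 5), (0, 1, 5), (1, 0, 5), (1, 1, 5)]
labels = [0, 0, 1, 1]
kept = reduce_attributes(rows, labels)

# Hypothetical cluster centers and stand-in experts on the reduced data.
centers = [(0.0,), (1.0,)]
experts = [lambda x: 0.0, lambda x: 1.0]
score = combine((0.25,), centers, experts)
print(kept, round(score, 3))
```

In the system described above, the combined values would be passed to a second genetic-algorithm-improved network rather than used directly as the classification; the weighted sum here only illustrates how memberships can blend the per-cluster outputs.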
