Research on Data Mining Based on Talent Cognition

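Author: Zhu Hong (朱红)
Degree level: Master's
Discipline: Computer Application Technology
Keywords: Data Mining; Knowledge Discovery in Databases (KDD); Data Preprocessing; Clustering Analysis; Regression Analysis
Degree year: 2002
Advisors: Wang Qingxin (王清心); Hu Jianhua (胡建华)
Discipline code: 081203
Degree-granting institution: Kunming University of Science and Technology (昆明理工大学)
Thesis submission date: 2002-03-01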

Abstract

With the development of computer technology, and database technology in particular, talent markets have accumulated large volumes of talent data. Discovering the rules and knowledge hidden in these data, and using them to support decision making, has become an urgent problem. The emergence and development of data mining technology provides strong support for solving it.
Data mining is the process of extracting previously unknown but valuable information and knowledge from large amounts of incomplete data. Building on a theoretical study of data mining technology, this thesis describes the application of that technology in a talent cognition system. It explains how the system, after data preprocessing, applies an improved clustering method to partition the talent database reasonably and efficiently, and then performs regression analysis on the resulting clusters to obtain evaluation criteria for the abilities of each category of talent. The improved clustering algorithm shows notable gains in the reasonableness, efficiency, and accuracy of clustering.
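The pipeline outlined in the abstract (preprocess, cluster, then regress within each cluster) can be illustrated with a minimal sketch. The abstract does not specify the thesis's improved clustering algorithm, so ordinary k-means with a deterministic farthest-first initialization is used here as a stand-in. The function names, the two features (years of experience and a skill score), and the sample ratings are all hypothetical and serve only as illustration.

```python
def min_max_scale(rows):
    """Preprocessing step: scale each column to [0, 1] (constant columns map to 0.0)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100):
    """Standard k-means (a stand-in, NOT the thesis's improved algorithm)."""
    # Farthest-first initialization: start from the first point, then
    # repeatedly add the point farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def regress_per_cluster(features, targets, k):
    """Cluster on scaled features, then regress the target on the first
    (unscaled) feature separately inside each cluster."""
    labels = k_means(min_max_scale(features), k)
    models = {}
    for c in sorted(set(labels)):
        xs = [f[0] for f, lab in zip(features, labels) if lab == c]
        ys = [t for t, lab in zip(targets, labels) if lab == c]
        models[c] = fit_line(xs, ys)
    return labels, models


# Hypothetical records: (years of experience, skill score) and an ability rating.
FEATURES = [(1, 10), (2, 12), (3, 11), (4, 13), (5, 12),
            (20, 80), (21, 82), (22, 81), (23, 83), (24, 82)]
RATINGS = [3.0, 5.0, 7.0, 9.0, 11.0, 20.0, 20.5, 21.0, 21.5, 22.0]

if __name__ == "__main__":
    labels, models = regress_per_cluster(FEATURES, RATINGS, k=2)
    for cluster, (intercept, slope) in models.items():
        print(f"cluster {cluster}: rating = {intercept:.2f} + {slope:.2f} * experience")
```

Each cluster receives its own fitted line, which is one simple way to read "evaluation criteria for each category of talent": the same input attribute can carry a different weight in different clusters.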
