Research on Clustering Preprocessing of Data Resources and Its Applications (数据资源聚类预处理及其应用研究)
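Abstract
Water, water, everywhere, and all the boards did shrink;
     Water, water, everywhere, nor any drop to drink.
     Data, data, everywhere, yet users of every kind are at a loss;
     Data, data, everywhere, yet not one hint to help me decide.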
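     In his speech "The Digital Earth: Understanding Our Planet in the 21st Century" [Gore1998], delivered on January 31, 1998, former U.S. Vice President Al Gore observed that a new wave of technological innovation is allowing us to capture, store, process and display an unprecedented amount of data about the Earth, along with a wide variety of environmental and cultural information, and that the hard part of exploiting this flood of data is making sense of it, that is, turning raw data into understandable information. Today we often find that we have plenty of data but do not know what to do with it. We hunger for knowledge, yet a great deal of material sits idle and unused.
     Without materials, nothing exists; without energy, nothing happens; without information, nothing makes sense [Oet1965]. Information, one of the three fundamental resources, has an increasingly profound influence on our lives. The question of how to extract valuable information and knowledge from such rich and complex data gave rise to a new research direction: Knowledge Discovery in Databases (KDD) and the related theory and techniques of Data Mining (DM).
     As a basic object of study in the information field, the Data Resource is a re-examination and high-level generalization, from the viewpoint of resources, of data and the state in which data exists. Combining effective KDD and DM techniques to raise the quality of data resources and to make data objects more efficient to use has become the main research direction for developing and exploiting data resources. Preprocessing of data resources is an important stage of the KDD and DM process, and cluster analysis is a mature technique in the KDD and DM field, so research that combines the two has both theoretical significance and practical value.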
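     This dissertation introduces cluster analysis into the preprocessing of data resources, studies the topic from several angles, and obtains the following main results:
     1. Borrowing from divisive hierarchical clustering, a database clustering preprocessing method based on the Analytic Hierarchy Process, DCP-AHP, is constructed at three levels: plane, section and space. It uses hierarchical thinking to evaluate the target iteratively and removes sets of data objects with high dissimilarity, thereby cleaning the data object sets by clustering and reducing the effect of the errors introduced when qualitative problems are quantified.
     2. Following the principle of minimum correlation, a clustering preprocessing method based on database principal component extraction, DCP-PCE, is proposed to reduce the dimensionality of high-dimensional data systems. The projections onto the directions of greatest variation of the data objects are taken as the principal components of a given data object set, so that principal components are extracted by clustering, level by level. DCP-PCE also verifies that the principal components fully cover the original information, addresses comprehensive variable coverage and dimension reduction at the same time, lowers the dissimilarity and dimensionality of the data object sets, and achieves clustering reduction of the data object sets.
     3. Exploiting the "0/1" nature of the physical storage of data objects, a numeric clustering algorithm, NC-SEDS, is proposed for data objects of the same entity from different sources (SEDS). All data objects in the data resource are converted to a numeric state during preprocessing, and that numeric state is then used as the basis for grouping. Without considering any other attribute of the data objects, this increases the cohesion of SEDS data objects, reduces the number of comparisons and the overall execution time, and achieves clustering integration of the data objects.
     4. To carry through the idea of "complex problem solving", a clustering preprocessing method based on ontic kernels and histograms, CPOKH, is proposed. When data objects are preprocessed by clustering, the object data frequency of each weak ontic kernel is obtained first; then, according to the user's explicit requirements, all the required weak ontic kernels are collected and combined into a strong ontic kernel; finally, a histogram is constructed and analyzed to determine the class to which each data object belongs.
     5. Drawing on the basic notions of "energy" and "collision", and taking the classes or clusters of data objects produced by data resource preprocessing as the main object of study, an "effective" energy-based dynamic threshold is constructed and a strategy of clustering optimization by energy hit, COEH, is realized. User topic requirements act as the energy that drives a data space already showing preliminary clustering structure; data objects inside clusters and outlier data objects are handled together on a unified footing, which ensures that the clustering of the data objects is optimized.
     In addition, as an application of the theoretical results, the evaluation system for higher education is chosen as the object of the application study. Cluster analysis is introduced into the preprocessing of university data resources and worked examples are given, providing an effective means of analysis for making good use of existing data resources, assessing rationally how well the various aspects of university work perform, and exploring models of student training in depth.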
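     The first result above builds on the Analytic Hierarchy Process. As a rough illustration only, the sketch below shows the standard AHP step of turning a pairwise comparison matrix into priority weights and a consistency ratio (Saaty's principal-eigenvector approach). It is not the DCP-AHP method itself, whose details the abstract does not give; the function name `ahp_weights`, the example matrix, and the random-index table as commonly tabulated are all assumptions made for illustration.

```python
import numpy as np

# Saaty's commonly tabulated random consistency index, keyed by matrix order.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(pairwise):
    """Return (weights, consistency_ratio) for a reciprocal comparison matrix.

    Hypothetical helper: standard AHP, not the dissertation's DCP-AHP.
    """
    a = np.asarray(pairwise, dtype=float)
    n = a.shape[0]
    eigvals, eigvecs = np.linalg.eig(a)
    k = int(np.argmax(eigvals.real))       # index of the principal eigenvalue
    lam_max = eigvals[k].real
    w = np.abs(eigvecs[:, k].real)         # principal eigenvector, sign removed
    w = w / w.sum()                        # normalize so the weights sum to 1
    ci = (lam_max - n) / (n - 1) if n > 1 else 0.0   # consistency index
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0        # consistency ratio
    return w, cr


# Illustrative, perfectly consistent judgments: entry (i, j) is w_i / w_j
# for the assumed weights 0.6, 0.3 and 0.1.
pairwise = [[1.0, 2.0, 6.0],
            [1 / 2, 1.0, 3.0],
            [1 / 6, 1 / 3, 1.0]]
weights, cr = ahp_weights(pairwise)
print(weights, cr)
```

     A consistency ratio below about 0.1 is the usual rule of thumb for accepting a judgment matrix; how DCP-AHP uses such weights to discard dissimilar data object sets is not specified here.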
A new wave of technological innovation is allowing us to capture, store, process and display an unprecedented amount of information about our planet and a wide variety of environmental and cultural phenomena. The hard part of taking advantage of this flood of geospatial information will be making sense of it, turning raw data into understandable information. Today, we often find that we have more information than we know what to do with. Now we have an insatiable hunger for knowledge. Yet a great deal of data remains unused. (Al Gore, former U.S. Vice President, "The Digital Earth: Understanding Our Planet in the 21st Century" [Gore1998], January 31, 1998.)
     Without materials, nothing exists. Without energy, nothing happens. Without information, nothing makes sense [Oet1965]. As one of the three basic resources (materials, energy and information), information has a growing influence on our lives. The wide availability of huge amounts of data, and the pressing need to turn such data into useful information and knowledge, gave rise to Knowledge Discovery in Databases (KDD) and Data Mining (DM), which have attracted a great deal of attention.
     As the fundamental object of study in the information field, the Data Resource is a re-examination and generalization, from the viewpoint of resources, of data and the state in which it exists. Using KDD and DM techniques effectively to improve the quality of data resources and the efficiency with which data objects are used has become the main research goal. Preprocessing of data resources is a necessary stage of KDD and DM, and cluster analysis is a mature KDD and DM technique. Research on preprocessing data resources with cluster analysis is therefore of both theoretical and practical significance.
     This dissertation studies clustering preprocessing of data resources; the main results are as follows.
     Firstly, following divisive hierarchical clustering, a method of Database Cluster Preprocessing on Analytic Hierarchy Process (DCP-AHP) is constructed. Working at the levels of plane, section and space, DCP-AHP evaluates the target iteratively in a hierarchical way. With DCP-AHP, data object sets with high dissimilarity can be removed, the data object sets can be cleaned by clustering, and the error introduced when qualitative judgments are quantified can be reduced.
     Secondly, following the principle of lowest correlation among data objects, a method of Database Cluster Preprocessing on Principal Component Extraction (DCP-PCE) is proposed, which extracts principal components by clustering in a hierarchical manner. The projections onto the directions of greatest variation of the data objects are defined as the principal components, and these are shown to cover the original information of the data object sets. DCP-PCE addresses completeness of information and dimension reduction at the same time, lowers the dissimilarity and dimensionality of the data object sets, and achieves clustering reduction of the data object sets.
     Thirdly, exploiting the "0" and "1" nature of the physical storage of data objects, an algorithm of Numerical Cluster on Same Entity from Different Sources (NC-SEDS) is put forward to convert all data objects into a numerical state. Without considering any other attribute of the data objects, this numerical state serves as the basis for clustering and increases the cohesion of SEDS. The method reduces the number of comparisons among data objects and the execution time, and achieves clustering integration.
     Fourthly, to carry through the idea of "complex problem solving", a method of Cluster Preprocessing on Ontic Kernel and Histogram (CPOKH) is proposed for cluster preprocessing of data objects. In this method, the Weak Ontic Kernels (WOK) are obtained from the Object Data Time according to the user's demands and are combined into a Strong Ontic Kernel (SOK). Based on the SOK, a histogram is built and analyzed to determine the class to which each data object belongs.
     Fifthly, drawing on the notions of "energy" and "hit", a strategy of Clustering Optimized by Energy Hit (COEH) is adopted, which builds a valid, energy-based dynamic threshold among the clusters. Under COEH, energy driven by the user's demands is applied throughout the data space, and all data objects, including outliers, are treated as a whole on a unified footing. The clustering of the data objects can therefore be optimized in a unified, overall setting.
     Finally, an evaluation system for college and university education is chosen as the application study. The work in the dissertation is verified by real experiments. Introducing cluster analysis into the preprocessing of university data resources makes it possible to assess soundly how effective the work in each area is, in particular the gains and losses in student training.
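     The second result describes taking projections onto the directions of greatest variation as principal components. The sketch below illustrates that idea with ordinary principal component analysis on standardized variables; it is not the DCP-PCE method, whose hierarchical extraction procedure the abstract does not spell out. The function name `principal_components`, the 0.85 cumulative-variance threshold, and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np


def principal_components(data, var_threshold=0.85):
    """Project rows of `data` onto the leading principal components.

    Hypothetical helper: plain PCA on the correlation matrix, keeping the
    fewest components whose cumulative variance share reaches the threshold.
    """
    x = np.asarray(data, dtype=float)
    # Standardize each column so that variables on different scales compare.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    corr = np.cov(z, rowvar=False)             # correlation matrix of x
    eigvals, eigvecs = np.linalg.eigh(corr)    # symmetric matrix: use eigh
    order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    share = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(share, var_threshold)) + 1, len(eigvals))
    scores = z @ eigvecs[:, :k]                # projections of each row
    return scores, eigvals[:k] / eigvals.sum()


# Synthetic example: the second column nearly duplicates the first,
# so two components should carry almost all of the variation.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
c = rng.normal(size=200)
data = np.column_stack([a, 2.0 * a + 0.01 * rng.normal(size=200), c])
scores, explained = principal_components(data)
print(scores.shape, explained)
```

     The retained components are mutually uncorrelated by construction, which is the property the abstract refers to as the principle of lowest correlation.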
References
[ABKS1999] M. Ankerst, M. M. Breunig, H.-P. Kriegel, J. Sander. OPTICS: Ordering Points to Identify the Clustering Structure[A]. In: A. Delis, C. Faloutsos, S. Ghandeharizadeh, eds. Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data[C]. Philadelphia: ACM Press. 1999: 49~60.
    [AGGR1998] Rakesh Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan. Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications[A]. In: L.M. Haas, A. Tiwary, eds. Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data[C]. Seattle: ACM Press. 1998: 94~105.
    [Ago2000] Lou Agosta. The Essential Guide to Data Warehousing[M]. Prentice-Hall, Inc. 2000.
    [AS1994] Rakesh Agrawal, Ramakrishnan Srikant. Fast Algorithms for Mining Association Rules in Large Databases[A]. In: J. B. Bocca, M. Jarke, C. Zaniolo, eds. Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94)[C]. Santiago: Morgan Kaufmann. 1994: 487~499.
    [Bao2005] Tu H. O. Bao. Knowledge Discovery and Data Mining Techniques and Practice[EB/OL]. http://www.netnam.vn/unescocourse/knowlegde/3-5.htm. 2005.
    [BB1999] Andrea Baraldi, Palma Blonda. A Survey of Fuzzy Clustering Algorithms for Pattern Recognition[J]. IEEE Transactions on Systems, Man and Cybernetics, Part B(Cybernetics), 1999, 29: 786-801.
    [BF1996] Dominique Bicout, Martin Field. Quantum Mechanical Simulation Methods For Studying Biological System[M]. New York: Springer-Verlag. 1996.
    [BL2000] Michael J. A. Berry, Gordon S. Linoff. Mastering Data Mining: The Art and Science of Customer Relationship Management[M]. New York: John Wiley & Sons, Inc. 2000.
    [BGG+1999] Daniel Boley, Maria Gini, Robert Gross, Eui-Hong Han, Kyle Hastings, George Karypis, Vipin Kumar, Bamshad Mobasher, Jerome Moore. Partitioning-Based Clustering for Web Document Categorization[J]. Decision Support Systems, 1999, 27(3): 329~341.
    [BGRS1999] Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft. When Is "Nearest Neighbor" Meaningful?[A]. In: C. Beeri, P. Buneman, eds. Proceedings of the 7th International Conference on Database Theory (ICDT'99)[C]. Jerusalem: Springer. 1999: 217~235.
    [Bur1969] Burke P.G.. Theory of Electron Atom Collisions[A]. In: Proceedings of the First International Conference on Atomic Physics[C]. New York: Plenum Press. 1969: 265-278.
    [CJB1999] Balakrishnan Chandrasekaran, John R. Josephson, V. Richard Benjamins. What Are Ontologies, and Why Do We Need Them[J]. IEEE Intelligent Systems, 1999, 14(1): 20~26.
    [Col1999] Samuel Taylor Coleridge. The Rime of the Ancient Mariner[EB/OL]. http://etext.virginia.edu/stc/Coleridge/poems/Rime_Ancient_Mariner.html. 1999.
    [CS1996] Peter Cheeseman, John Stutz. Bayesian Classification (AutoClass): Theory and Results[A]. In: Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, Ramasamy Uthurusamy, eds. Advances in Knowledge Discovery and Data Mining[C]. Menlo Park, California: AAAI Press. 1996: 153~180.
    [DC1994] M.W. Du, S. C. Chang. An Approach to Designing Very Fast Approximate String Matching Algorithms[J]. IEEE Transactions on Knowledge and Data Engineering, 1994, 6(4): 620~633.
    [DLR1977] Arthur Dempster, Nan Laird, Donald Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm[J]. Journal of the Royal Statistical Society, Series B. 1977, 39(1): 1~38.
    [EKS+1998] Martin Ester, Hans-Peter Kriegel, Jorg Sander, Michael Wimmer, Xiaowei Xu. Incremental Clustering for Mining in a Data Warehousing Environment[A]. In: A. Gupta, O. Shmueli, J. Widom, eds. Proceedings of the 24th International Conference on Very Large Data Bases[C]. New York: Morgan Kaufmann. 1998: 323~333.
    [EKSX1996] Martin Ester, Hans-Peter Kriegel, Jorg Sander, Xiaowei Xu. A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise[A]. In: E. Simoudis, J. W. Han, U. M. Fayyad, eds. Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining[C]. Portland: AAAI Press. 1996: 226~231.
    [EV2003] Mohamed G. Elfeky, Vassilios S. Verykios. On Search Enhancement of the Record Linkage Process[A]. In: Proceedings of the KDD-2003 Workshop on Data Cleaning, Record Linkage, and Object Consolidation[C]. Washington, DC. 2003: 31~33.
    [Fas1999] Daniel Fasulo. An Analysis of Recent Work on Clustering Algorithms[EB/OL]. http://www.cs.washington.edu/homes/dfasulo/clustering.ps. 1999.
    [FPS1996a] Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth. The KDD Process for Extracting Useful Knowledge from Volumes of Data[J]. Communications of the ACM, 1996, 39(11): 27~34.
    [FPS1996b] Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth. From Data Mining to Knowledge Discovery: An Overview[A]. In: Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, Ramasamy Uthurusamy, eds. Advances in Knowledge Discovery and Data Mining[C]. Menlo Park, California: AAAI Press. 1996: 1~30.
    [FPS1996c] Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth. From Data Mining to Knowledge Discovery in Databases[J]. AI Magazine, 1996, 17(3): 37~54.
    [FU1996] Usama M. Fayyad, Ramasamy Uthurusamy. Data Mining and Knowledge Discovery in Databases (Introduction to the Special Section)[J]. Communications of the ACM, 1996, 39(11): 24~26.
    [GGR1999] Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan. CACTUS - Clustering Categorical Data Using Summaries[A]. In: Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining[C]. San Diego: ACM Press. 1999: 73~83.
    [GKR1998] David Gibson, Jon Kleinberg, Prabhakar Raghavan. Clustering Categorical Data: An Approach Based on Dynamical Systems[A]. In: A. Gupta, O. Shmueli, J. Widom, eds. Proceedings of the 24th International Conference on Very Large Data Bases[C]. New York: Morgan Kaufmann. 1998: 311~322.
    [Gore1998] Al Gore. The Digital Earth: Understanding Our Planet in the 21st Century[EB/OL]. http://www.digitalearth.gov/VP19980131.html. 1998.01.31.
    [Gra2005] Jim Gray. What Next? A Dozen Information-Technology Research Goals[EB/OL]. http://research.microsoft.com/~gray/papers/MS_TR_99_50_TuringTalk.pdf. 2005.
    [GRS1998a] S. Guha, R. Rastogi, K. Shim. CURE: an Efficient Clustering Algorithm for Large Databases[A]. In: L. M. Haas, A. Tiwary, eds. Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data[C]. Seattle: ACM Press. 1998: 73~84.
    [GRS1998b] S. Guha, R. Rastogi, K. Shim. CURE: an Efficient Clustering Algorithm for Large Databases[J]. Information System Journal, 1998, 26(1): 35~58.
    [GRS1999] S. Guha, R. Rastogi, K. Shim. ROCK: a Robust Clustering Algorithm for Categorical Attributes[A]. In: Proceedings of the 15th International Conference on Data Engineering[C]. Sydney: IEEE Computer Society Press. 1999: 512~521.
    [HJ1985] Roger A. Horn, Charles R. Johnson. Matrix Analysis[M]. Cambridge University Press. 1985.
    [HK1998] Alexander Hinneburg, Daniel A. Keim. An Efficient Approach to Clustering in Large Multimedia Databases with Noise[A]. In: R. Agrawal, P. E. Stolorz, G. Piatetsky-Shapiro, eds. Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining[C]. New York: AAAI Press. 1998: 58~65.
    [HK1999] Alexander Hinneburg, Daniel A. Keim. Optimal Grid-Clustering: Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering[A]. In: M. P. Atkinson, M. E. Orlowska, P. Valduriez, et al., eds. Proceedings of the 25th International Conference on Very Large Data Bases[C]. Edinburgh: Morgan Kaufmann. 1999: 506~517.
    [HK2000] Jiawei Han, Micheline Kamber. Data Mining: Concepts and Techniques[M]. Morgan Kaufmann Publishers. 2000.
    [HK2003] Khaled M. Hammouda, Mohamed S. Kamel. Incremental Document Clustering Using Cluster Similarity Histograms[A]. In: Proc. of IEEE/WIC International Conference on Web Intelligence (WI2003)[C]. Halifax: IEEE Computer Society. 2003: 597~601.
    [HKKM1998] Eui-Hong Han, George Karypis, Vipin Kumar, Bamshad Mobasher. Hypergraph Based Clustering in High-Dimensional Data Sets: A Summary of Results[J]. IEEE Bulletin of the Technical Committee on Data Engineering, 1998, 21(1): 15~22.
    [HMS2001] David Hand, Heikki Mannila, Padhraic Smyth. Principles of Data Mining[M]. MIT Press. 2001.
    [Hot1933] Harold Hotelling. Analysis of a Complex of Statistical Variables into Principal Components[J]. Journal of Educational Psychology, 1933, 24: 417~441, 498~520.
    [HS1995] Mauricio A. Hernandez, Salvatore J. Stolfo. The Merge/Purge Problem for Large Databases[A]. In: Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data (SIGMOD-95)[C]. San Jose, CA. 1995: 127~138.
    [HS1998] Mauricio A. Hernandez, Salvatore J. Stolfo. Real-World Data is Dirty: Data Cleansing and The Merge/Purge Problem[J]. Data Mining and Knowledge Discovery, 1998, 2(1): 9~37.
    [HTF2001] Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction[M]. Springer. 2001.
    [Hua1998] Zhexue Huang. Extensions to the K-means Algorithm for Clustering Large Data Sets with Categorical Values[J]. Data Mining and Knowledge Discovery, 1998, 2(3): 283~304.
    [Inm1993] William H. Inmon. Building the Data Warehouse[M]. New York: John Wiley & Sons, Inc. 1993.
    [Inm2005] William H. Inmon. What is A Data Warehouse[EB/OL]. http://inmoncif.com/library/whiteprs/earlywp/ttdw.pdf. 2005.
    [JHW+2005] Shardrom Johnson, Daniel Hsu, Gengfeng Wu, Shenjie Jin, Wu Zhang. Clustering Approach on Core-based and Energy-based Vibrating[A]. In: Wu Zhang, Zhangxin Chen, Roland Glowinski, Weiqin Tong, eds. Current Trends in High Performance Computing and Its Applications[C]. Proceedings of the International Conference on High Performance Computing and Applications. Springer. 2005: 325~331. (ISI number: ISIP000231829400039)
    [Jol1986] Ian Timothy Jolliffe. Principal Component Analysis[M]. New York: Springer-Verlag. 1986.
    [Kar1947] Kari Karhunen. Über lineare Methoden in der Wahrscheinlichkeitsrechnung[D]. Helsingin Yliopisto, Finland, 1947. (in Finnish)
    [KH1999] Daniel A. Keim, Alexander Hinneburg. Clustering Techniques for Large Data Sets: From The Past To The Future[A]. In: Tutorial notes of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining[M]. ACM Press. 1999: 141~181.
    [KHK1999] George Karypis, E. H. Han, V. Kumar. CHAMELEON: a Hierarchical Clustering Algorithm Using Dynamic Modeling[J]. IEEE Computer, 1999, 32(8): 68~75.
    [KLB2002] Abraham Kandel, Mark Last, Horst Bunke. Data Mining and Computational Intelligence[M]. Physica-Verlag. 2002.
    [KR1990] L. Kaufman, P. J. Rousseeuw. Finding Groups in Data: an Introduction to Cluster Analysis[M]. New York: John Wiley & Sons, Inc. 1990.
    [Lau1995] Steffen L. Lauritzen. The EM Algorithm for Graphical Association Models with Missing Data[J]. Computational Statistics and Data Analysis, 1995, 19: 191~201.
    [Lin1990] T. Lindeberg. Scale-Space for Discrete Signals[J]. IEEE Trans. Pattern Analysis and Machine Intelligence, 1990, 12(3): 234~254.
    [Loè2000] Michel Loève. Probability Theory[M]. New York: Springer-Verlag. 2000.
    [LT1998] Jiaxue Liu, Youde Tao. The Discussion of Analytic Hierarchy Process for Quantitative Type Multiple Attribute Decision Systems Analysis[J]. Journal of Systems Science and Systems Engineering, 1998, 7(2): 129~133.
    [Luan2002] Jing Luan. Data Mining as Driven by Knowledge Management in Higher Education Persistence Clustering and Prediction[EB/OL]. http://www.cabrillo.edu/services/pro/oir_reports/UCSFpaper.pdf. 2002.
    [LZX2000] Yee Leung, Jiang-She Zhang, Zong-Ben Xu. Clustering by Scale-Space Filtering[J]. IEEE Trans. Pattern Analysis and Machine Intelligence, 2000, 22(12): 1396~1410.
    [Mac1967] J.B. MacQueen. Some Methods for Classification and Analysis of Multivariate Observations[A]. In: L. Le Cam, J. Neyman, eds. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability[C]. Berkeley: University of California Press. 1967, 1: 281~297.
    [Mal2002] Efrem G. Mallach. Decision Support And Data Warehouse Systems[M]. McGraw-Hill Higher Education. 2002.
    [Mey2000] Carl D. Meyer. Matrix Analysis & Applied Linear Algebra[M]. Society for Industrial and Applied Mathematics. 2000.
    [Mil1956] George A. Miller. The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information[J]. The Psychological Review, 1956, 63: 81~97.
    [Mon2003] Alvaro E. Monge. An Adaptive and Efficient Algorithm for Detecting Approximately Duplicate Database Records[EB/OL]. http://citeseer.ist.psu.edu/cache/papers/cs/23472/http:zSzzSzwww.cecs.csulb.eduzSz~mongezSzPaperszSzis-special-issues.paf/monge00adaptive.pdf, 2003-03.
    [NFW2005] Eric Ka Ka Ng, Ada Wai-chee Fu, Raymond Chi-Wing Wong. Projective Clustering by Histograms[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(3): 369~383.
    [NH1994] R.T. Ng, J. Han. Efficient and Effective Clustering Methods for Spatial Data Mining[A]. In: J. B. Bocca, M. Jarke, C. Zaniolo, eds. Proceedings of the 20th International Conference on Very Large Data Bases (VLDB '94)[C]. Santiago: Morgan Kaufmann. 1994: 144~155.
    [NKAJ1959] H.B. Newcombe, J. M. Kennedy, S. J. Axford, A. P. James. Automatic Linkage of Vital Records[J]. Science, 1959, 130: 954~959.
    [Oet1965] Anthony G. Oettinger. An Essay in Information Retrieval or The Birth of A Myth[J]. Information and Control, 1965, 8(1): 64~79.
    [Oja1983] Erkki Oja. Subspace Methods of Pattern Recognition[M]. England: Research Studies Press. 1983.
    [PB1999] Asunción Gómez Pérez, V. Richard Benjamins. Overview of Knowledge Sharing and Reuse Components: Ontologies and Problem-solving Methods[A]. In: Proc. of the IJCAI-99 workshop on Ontologies and Problem-Solving Methods (KRR5)[C]. Stockholm: Morgan-Kaufmann, 1999: 1~15.
    [PC1998] Matthew Partridge, Rafael Calvo. Fast Dimensionality Reduction and Simple PCA[J]. Intelligent Data Analysis, 1998, 2(1-4): 203~214.
    [Pea1901] Karl Pearson. On Lines and Planes of Closest Fit to Systems of Points in Space[J]. Philosophical Magazine. 1901, 2: 559~572.
    [QZ2002] QIAN Wei-ning, ZHOU Ao-ying. Analyzing Popular Clustering Algorithms from Different Viewpoints[J]. Journal of Software, 2002,13(8): 1382~1394.
    [RD2000] Erhard Rahm, Hong Hai Do. Data Cleaning: Problems and Current Approaches[J]. IEEE Data Engineering Bulletin, 2000, 23(4): 3~13.
    [Saa1980] Thomas L. Saaty. The Analytic Hierarchy Process[M]. New York: McGraw Hill. 1980.
    [Saa2004] Thomas L. Saaty. Decision Making - The Analytic Hierarchy and Network Processes (AHP/ANP)[J]. Journal of Systems Science and Systems Engineering, 2004, 13(1): 1~35.
    [SBSG2006] Petr Sereda, Anna Vilanova Bartrolí, Iwo W. O. Serlie, Frans A. Gerritsen. Visualization of Boundaries in Volumetric Data Sets Using LH Histograms[J]. IEEE Transactions on Visualization and Computer Graphics, 2006, 12(2): 208~218.
    [SC2004] John Shawe-Taylor, Nello Cristianini. Kernel Methods for Pattern Analysis[M]. Cambridge: Cambridge University Press. 2004.
    [SCZ1998] G. Sheikholeslami, S. Chatterjee, A. Zhang. WaveCluster: a Multi-resolution Clustering Approach for Very Large Spatial Databases[A]. In: A. Gupta, O. Shmueli, J. Widom, eds. Proceedings of the 24th International Conference on Very Large Data Bases[C]. New York: Morgan Kaufmann. 1998: 428~438.
    [SEKX1998] Jorg Sander, Martin Ester, Hans-Peter Kriegel, Xiaowei Xu. Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications[J]. Data Mining and Knowledge Discovery, 1998, 2(2): 169~194.
    [SM2003] Thomas L. Saaty, M. Ozdemir. Why the Magic Number Seven Plus or Minus Two[J]. Mathematical and Computer Modelling, 2003, 38: 233~244.
    [ST1999] William Swartout, Austin Tate. Ontologies[J]. IEEE Intelligent Systems, 1999, 14(1): 18~19.
    [TB1998] Luis Talavera, Javier Bejar. Efficient Construction of Comprehensible Hierarchical Clustering[A]. In: J. M. Zytkow, M. Quafalou, eds. Principles of Data Mining and Knowledge Discovery[C]. Proceedings of the 2nd European Symposium. PKDD'98. Nantes: Springer-Verlag. 1998: 93~101.
    [Usc1998] Mike Uschold. Knowledge Level Modelling: Concepts and Terminology[J]. Knowledge Engineering Review, 1998, 13(1): 5~29.
    [WYM1997] W. Wang, J. Yang, R. Muntz. STING: a Statistical Information Grid Approach to Spatial Data Mining[A]. In: M. Jarke, M. J. Carey, K. R. Dittrich, et al., eds. Proceedings of the 23rd International Conference on Very Large Data Bases[C]. Athens: Morgan Kaufmann. 1997: 186~195.
    [XEKS1998] Xiaowei Xu, Martin Ester, Hans-Peter Kriegel, Jorg Sander. A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases[A]. In: Proceedings of the 14th International Conference on Data Engineering[C]. Orlando: IEEE Computer Society Press. 1998: 324~331.
    [ZJC+2005] Zhengde Zhao, Shardrom Johnson, Xiaobo Chen, Hongcan Ren, Daniel Hsu. Research on a Visual Collaborative Design System: A High Performance Solution Based on Web Service[A]. In: Wu Zhang, Zhangxin Chen, Roland Glowinski, Weiqin Tong, eds. Current Trends in High Performance Computing and Its Applications[C]. Proceedings of the International Conference on High Performance Computing and Applications. Springer. 2005: 611~615. (ISI number: ISIP000231829400086)
    [ZRL1996] T. Zhang, R. Ramakrishnan, M. Livny. BIRCH: an Efficient Data Clustering Method for Very Large Databases[A]. In: H. V. Jagadish, I. S. Mumick, eds. Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data[C]. Quebec: ACM Press. 1996: 103~114.
    [陈安2006] 陈安,陈宁,周龙骧.数据挖掘技术及应用[M].北京:科学出版社,2006.03.
    [陈莉2005] 陈莉,焦李成.基于自适应聚类的数据预处理算法I[J].计算机应用与软件,2005,22(3):28~29,47.
    [陈舜麟2005] 陈舜麟.计算材料科学[M].北京:化学工业出版社,2005.07.
    [陈伟志2003] 陈伟志,魏振军,王春迎.多元统计分析在数据挖掘中的作用[J].信息工程大学学报,2003.4(4):22~25.
    [陈文伟2004] 陈文伟,黄金才.数据仓库与数据挖掘[M].北京:人民邮电出版社,2004.01.
    [董诚2006] 董诚,黄鼎成.科学数据资源的管理[J].中国基础科学,2006,(2):20~24.
    [董红斌1999] 董红斌,梁意文,郭学理,张健.校园网络数据资源分层访问策略[J].武汉大学学报(自然科学版),1999,45(1):61~64.
    [段仁军2002] 段仁军,张伟.我国普通高等教育发展水平的全局主成分分析[J].数理统计与管理,2002,21(6):27~31.
    [冯德益1996] 冯德益,小山顺二.数理地震学进展[M].北京:地震出版社,1996.06.
    [傅荣林2001] 傅荣林.主成分综合评价模型的探讨[J].系统工程理论与实践,2001,21(11):68-74.
    [淦文燕2004] 淦文燕,李德毅.基于核密度估计的层次聚类算法[J].系统仿真学报,2004,16(2):302~305,309.
    [高策理2004] 高策理,蔡斌.使用主成分分析进行综合排名时出现高相关指标的研究[J].数学的实践与认识.2004.34(12):21~24.
    [高述珉2003] 高述珉,沈岩,刘惠琴.校企合作培养应届推免工程硕士生的实践与思考[J].清华大学教育研究,2003,24(2):101~103.
    [辜寄蓉2003] 辜寄蓉.基于元数据的综合数据管理与信息共享[D].成都理工大学博士学位论文.2003.09.
    [郭志懋2002] 郭志懋,周傲英.数据质量和数据清洗研究综述[J].软件学报,2002,13(11):2076~2082.
    [何海芸2005a] 何海芸,袁春风.基于Ontology的领域知识构建技术综述[J].计算机应用研究,2005,22(3):14~18.
    [何海芸2005b] 何海芸,包云岗,袁春风.领域概念语义关系类型的半自动提取技术[J].计算机工程.2005,31(18):68~70,118.
    [胡侃1998] 胡侃,夏绍纬.基于大型数据仓库的数据挖掘[J].软件学报.1998,9(1): 53~61.
    [胡琦2004] 胡琦.中值滤波方法在数据挖掘数据预处理中的应用[J].武汉冶金管理干部学院学报,2004,14(3):69~71.
    [胡小勇2003] 胡小勇,祝智庭.教育信息资源的本地化研究[J].中国远程教育,2003,(5):1~5.
    [华东2005] 华东师范大学法政学院课题组.完善高校免试直升研究生制度的调查研究[J].思想理论教育,2005,(7-8):96~101,31.
    [黄德峰2002] 黄德峰,王正平.对进一步做好高校推荐优秀应届本科毕业生免试攻读硕士学位工作的思考[J].学位与研究生教育,2002,(4):36~39.
    [黄鼎成2003] 黄鼎成.科学数据共享的理论基础与共享机制[J].中国基础科学,2003,(2):22~27.
    [黄发良2004] 黄发良,万东升.分类挖掘及其在学生保持工程中的应用[J].现代教育技术,2004,14(6):72~75.
    [黄建军1999] 黄建军,甘仞初.管理信息系统数据资源配置的研究与应用[J].北京理工大学学报.1999,19(4):521~524.
    [黄万华2004] 黄万华,陆声链,林士敏.孤立点挖掘在新务管理中的应用研究[J].广西科学院学报,2004,20(3):155~158,162.
    [菅志刚2004] 菅志刚,金旭.数据挖掘中数据预处理的研究与实现[J].计算机应用研究,2004,21(7):117~118,157.
    [金沈杰2004] 金沈杰,吴绍春,吴耿锋,严胜祥.基于预聚类技术的并行序贯模式挖掘算法[J].计算机工程与科学,2004.26(10):66~68,90.
    [喀兴林2000] 喀兴林.量子力学与原子世界[M].太原:山西科学技术出版社,2000.02.
    [李德毅1994] 李德毅.发现状态空间理论[J].小型微型计算机系统,1994,15(11):1~6.
    [李强2004] 李强,夏骄雄,焦政.An Extensible and Type Safe Implementation of Abstract Factory Pattern[A].《第十届联合国际计算机会议文集》[C].北京:世界图书出版公司,2004,10:459~464.
    [李强2005] 李强,夏骄雄,焦政.基于J2EE的数据挖掘算法组件库设计[J].计算机工程与设计,2005,26(11):3091~3093.
    [李庆忠2006] 李庆忠,王栋.关于语义网格环境中异构数据资源整合的研究[J].南京大学学报(自然科学),2006,42(2):141~147.
    [李雄飞2003] 李雄飞,李军.数据挖掘与知识发现[M].北京:高等教育出版社,2003.11.
    [李言荣2001] 李言荣,恽正中.材料物理学概论[M].北京:清华大学出版社,2001.05.
    [李玉珍2005] 李玉珍,王宜怀.主成分分析及算法[J].苏州大学学报(自然科学版),2005,21(1):32~36.
    [林阳2002] 林阳.数据挖掘在教育信息化中的潜在价值[J].现代教育技术.2002.(1):65~67.
    [刘宏2003] 刘宏.通过标记样本与未标记样本学习文本分类规则[D].上海交通大学博士学位论文.2003.08.
    [刘来福1999] 刘来福,曾文艺.问题解决的数学模型方法[M].北京:北京师范大学出版社,1999.08.
    [刘莉2003] 刘莉,徐玉生,马志新.数据挖掘中数据预处理技术综述[J].甘肃科学学报,2003,15(1):117~119.
    [刘宁2005] 刘宁,时金芝.生态信息科学与数据资源管理[J].现代情报,2005,(3):166~167,170.
    [刘贤龙1998a] 刘贤龙,胡国亮.综合评价结果的合理性研究[J].统计研究,1998,(1):38~40.
    [刘贤龙1998b] 刘贤龙.“多元分析”教学中培养学生应用能力的探讨[J].数学教育学报,1998,7(3):88~90.
    [刘贤龙1998c] 刘贤龙.我国普通高等教育发展水平的统计分析[J].数理统计与管理,1998,17(5):1~4.
    [刘星晔2005] 刘星晔,阳生权,李润求.加强高校管理信息系统的数据共享与利用[J].教育信息化,2005,(6):43~44.
    [刘越江2003] 刘越江,黄今慧.数据挖掘中的数据预处理技术[J].科技情报开发与经济,2003,13(5):170~171.
    [楼伟进2006] 楼伟进,孔繁胜,曹永生.数据库中的知识发现综述[DB/OL].http://icgr.caas.net.cn/training/forum/kj.htm.2006.
    [路甬祥2000] 路甬祥.合作开发“数字地球”共享全球数据资源[J].地球信息科学,2000,(1):6~7.
    [罗长勋1986] 罗长勋.量子场论引论[M].陕西:陕西师范大学出版社,1986.07.
    [罗明高1998] 罗明高.定量储层地质学[M].北京:地质出版社,1998.12.
    [罗雨滋2005] 罗雨滋,付兴宏.数据挖掘在教育信息化中的应用[J].固原师专学报(自然科学).2005,26(6):54~57.
    [彭木根2002] 彭木根.数据仓库技术与实现[M].北京:电子工业出版社,2002.06.
    [钱伟长1996] 钱伟长.谈当前学生工作的原则,要求和方向问题[J].中国高等教育,1996,(5):8~9.
    [曲春锦2005] 曲春锦.改进的关联规则挖掘算法及其在教育信息挖掘中的应用[J].交通与计算机,2005,23(4):68~71.
    [阮秋琦2001] 阮秋琦.数字图像处理学[M].北京:电子工业出版社,2001.01.
    [萨师煊1991] 萨师煊,王珊.数据库系统概论[M].北京:高等教育出版社,1991.04.
    [盛子宁2003] 盛子宁.多指标评估体系的主成分分析及应用实例[J].上海海运学院学报,2003,24(3):251~253.
    [施伯乐1999] 施伯乐,丁宝康,周傲英,田增平.数据库系统教程[M].北京:高等教育出版社,1999.12.
    [施佳2007] 施佳,夏骄雄,张武.基于策略模式的特征选择算法工具库FSLS的设计[J].计算机工程与应用,2007,43(1):181~184,197.
    [孙九林2003] 孙九林.科学数据资源与共享[J].中国基础科学,2003,(1):30~33.
    [孙霞2004] 孙霞,郑庆华.教育资源元数据语义扩展查找方法的研究[J].计算机研究与发展.2004.41(12):2170~2174.
    [唐敖庆1980] 唐敖庆,江元生,鄢国森,戴树珊.分子轨道图形理论[M].北京:科学出版社,1980.06.
    [陶兰2003] 陶兰,王保迎,吕建军.数据挖掘技术在高等学校决策支持中的应用[J].中国农业大学学报,2003,8(2):39~41.
    [王红卫2002] 王红卫.建模与仿真[M].北京:科学出版社,2002.03.
    [王洪伟2004] 王洪伟,吴家春,蒋馥.基于粗糙集与主成分分析的属性约简的启发式算法研究[J].管理工程学报,2004,18(3):87~90.
    [王晓云2004] 王晓云,刘鲁.数据仓库系统的数据预处理问题研究与应用[J].北京航空航天大学学报(社会科学版),2004,17(2):45~50.
    [王羽2005] 王羽.基于层次结构的一种数据预处理设计[J].湖南经济管理干部学院学报,2005,16(1):108~109.
    [魏萍萍2003] 魏萍萍,王翠茹,王保义,张振兴.数据挖掘技术及其在高校教学系统中的应用[J].计算机工程,2003,29(11):87~89.
    [魏涛2005] 魏涛.改进的ID3算法及其在教育信息挖掘中的应用[J].上海海事大学学报,2005,26(3):82~84.
    [吴鹤龄2000] 吴鹤龄,崔林.ACM图灵奖(1966-1999):计算机发展史的缩影[M].北京:高等教育出版社.2000.08.
    [吴胜利1998] 吴胜利.估算查询结果大小的直方图方法之研究[J].软件学报,1998,9(4):285~289.
    [武成岗2001] 武成岗,焦文品,田启家,史忠植.基于本体论和多主体的信息检索服务器[J].计算机研究与发展,2001.38(6):641~647.
    [夏骄雄1998] 夏骄雄,高珏,唐毅.需求驱动及其发展[J].计算机工程,1998,24(5):49~51,62.(EI Compendex Code:AN98074299153)
    [夏骄雄2000a] 夏骄雄,陆菊康,施振夏.Intranet与管理信息系统[J].计算机工程与应用,2000,36(3):142~144,147.
    [夏骄雄2000b] 夏骄雄,高珏,陆菊康,施振夏.数据库存取技术在IBMIS中的应用[J].计算机工程.2000.26(7):26~28,76.(EI Compendex Code:AN00115399565)
    [夏骄雄2000c] 夏骄雄.计算机辅助实验考核系统[J].计算机应用与软件,2000,17(10):57~64.
    [夏骄雄2001] 夏骄雄,陆菊康,吴耿锋.基于Intranet的管理信息系统[J].小型微型计算机系统,2001,22(4):497~500.(EI Inspec Code:AN6914067)
    [夏骄雄2002] 夏骄雄,陆菊康,施振夏.CAT、CAI、CSET在实验考核中的应用[J].计算机应用与软件,2002,19(1):29~34.
    [夏骄雄2003a] 夏骄雄,滕建勇,竺剑,卫静芬,顾晔.经静.高校共青团工作信息化建设及其途径的研究[A].《汇聚服务年——上海青年工作课题调研集》[M].上海:百家出版社,2003,01:228~233.
    [夏骄雄2003b] 夏骄雄,徐俊,吴耿锋.Extranet与管理信息系统[J].计算机工程,2003,29(4):13~15.
    [夏骄雄2003c] 夏骄雄,徐俊,吴耿锋.CSET理念的应用研究[J].计算机应用与软件,2003,20(5):66~68.
    [夏骄雄2003d] 夏骄雄,金沈杰,吴耿锋.基于媒体导航的方法及相关算法研究[J].计算机工程与科学,2003,25(6):64~67.
    [夏骄雄2006a] 夏骄雄,徐俊,吴耿锋.“数据库主成份提取”方法及其应用[J].计算机工程与应用,2006,42(20):134~137,202.
    [夏骄雄2006b] 夏骄雄,徐俊,吴耿锋.基于“震动方法”的类删减策略[J].小型微型计算机系统,2006,27(9):1632~1636.(EI Inspec Code:AN9167559)
    [夏骄雄2006c] 夏骄雄,徐俊.高校共青团工作信息化建设研究[J].中国教育导刊,2006,(11):85~86.
    [夏骄雄2007] 夏骄雄,徐俊,吴耿锋.数据清理中同体不同源数据的数化算法研究[J].计算机工程,2007,33(1):71~73.
    [谢昌浩2004] 谢昌浩.对高校学生评价指标体系主成分分析[J].云南财贸学院学报.2004,20(4):113~117.
    [徐恒钧2001] 徐恒钧.材料科学基础[M].北京:北京工业大学出版社.2001.10.
    [徐坚成2002] 徐坚成.我国各地区高等教育发展背景与现状的评价[J].中国高等教育评估,2000,(2):9~12.
    [徐俊2006] 徐俊,夏骄雄,施佳.基于“核”和“能”的震动聚类方法研究[J].计算机应用研究,2006.23(增):865~866.
    [薛毅2001] 薛毅.最优化原理与方法[M].北京:北京工业大学出版社.2001.02.
    [杨景春2001] 杨景春,李有利.地貌学原理[M].北京:北京大学出版社,2001.08.
    [殷鹏程1986] 殷鹏程.量子场论纲要[M].上海:上海科学技术出版社,1986.09.
    [尹鸿钧1999] 尹鸿钧.量子力学[M].合肥:中国科学技术大学出版社,1999.10.
    [张东生2001] 张东生.用关联规则方法指导高校人才培育[J].河南大学学报(教育科学版),2001,17(2):20~22.
    [张建林2006] 张建林,夏婷婷.应届本科生攻读硕士研究生的机会成本分析[J].武汉科技学院学报,2006,19(1):83~86.
    [张翎2001] 张翎.主成分分析法在高校学生质量评价中的应用[J].云南民族学院学报(自然科学版),2001,10(1):283~286.
    [张蓉2004] 张蓉,申德荣,于戈.Ontology在异构数据库集成中的应用[J].计算机工程,2004,30(24):29~31,133.
    [张哲华1997] 张哲华,刘莲君.量子力学与原子物理学[M].武汉:武汉大学出版社,1997.09.
    [赵安郎1992] 赵安郎.孙子兵法百战韬略[M].南京:东南大学出版社.1992.08.
    [赵广社2003] 赵广社,张希仁.数据挖掘中的统计方法概述[J].计算机测量与控制,2003,11(12):914~917.
    [赵广社2004] 赵广社,张希仁.基于主成分分析的支持向量机分类方法研究[J].计算机工程与应用,2004,40(3):37~38,144.
    [赵仲牧2000] 赵仲牧.物理场论对哲学思考的提示[J].思想战线(云南大学人文社会科学学报),2000,26(5):1~6.
    [郑乐民2000] 郑乐民.原子物理[M].北京:北京大学出版社,2000.09.
    [郑明2001] 郑明,沈怡,黄治斌.对使用C.R.衡量互反矩阵一致性的统计解释及探讨[J].系统工程理论方法应用,2001,10(1):75~78.
    [周济2006] 周济.教学评估是提高教育质量的关键举措——在普通高等学校本科教学评估工作经验交流暨评估专家组组长工作研讨会上的讲话[J].中国大学教学,2006,(5):4~8.
    [周世勋1979] 周世勋.量子力学教程[M].北京:高等教育出版社.1979.02.
    [朱洪元1960] 朱洪元.量子场论[M].北京:科学出版社,1960.09.
    [朱雪龙2001] 朱雪龙.应用信息论基础[M].北京:清华大学出版社.2001.03.
    [朱正和1997] 朱正和,俞华根.分子结构与分子势能函数[M].北京:科学出版社,1997.06.
    [邹国兴1980] 邹国兴.量子场论导论[M].北京:科学出版社.1980.02.
