Outlier Data Mining Method Based on Constrained Concept Lattice and Its Application
Abstract
A concept lattice is an effective formal tool for data analysis and knowledge extraction, characterized by accuracy and completeness. A constrained concept lattice uses the user's interest in, and understanding of, the data set as background knowledge to guide lattice construction, which makes the resulting structure more targeted and practical. This thesis studies the algebraic system of constrained concept lattices and outlier mining based on constrained concept lattices. The main work is as follows:
     First, the algebraic system of the constrained concept lattice. Using the supremum and infimum operations between the nodes of a constrained concept lattice, the algebraic system of the lattice is constructed and its algebraic properties are given. The completeness of knowledge representation in constrained concept lattices is proved, which lays a theoretical foundation for data mining and knowledge discovery based on constrained concept lattices.
     Second, an outlier mining algorithm based on the constrained concept lattice, CLOM, is proposed. First, the intent reduction of each concept node is treated as a subspace and its sparsity coefficient is computed; if the sparsity coefficient of a k-dimensional intent reduction is below the sparsity coefficient threshold set by the user in advance, all of its (k-1)-dimensional proper subsets are enumerated to determine whether the subspaces they form are dense. Next, based on the sparsity coefficient and the density coefficient, the objects contained in the extent of the concept node are judged to be outliers or not. Finally, experiments using celestial spectral data (star spectra from the LAMOST project) as the formal context show that the algorithm mines outliers in low-dimensional subspaces accurately, completely, and effectively.
     Third, building on the work above, a prototype outlier mining system for celestial spectral data is designed and implemented with VC++ 6.0 and Oracle 9i as development tools, and its module functions, software architecture, and key technologies are described in detail. Running results show that the system is feasible and valuable, providing a new approach to outlier mining in celestial spectral data.
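The sparse-then-dense test described in the second point can be illustrated with a minimal sketch. It is not the thesis's CLOM implementation. It assumes that the sparsity coefficient follows the definition of Aggarwal and Yu for high-dimensional outlier detection, S = (n - N·f^k) / sqrt(N·f^k·(1 - f^k)) with f = 1/phi for phi equi-depth ranges per dimension, and that a (k-1)-dimensional subspace counts as dense when the same coefficient reaches a separate threshold. The function names, the toy context, and both threshold values are hypothetical, and the candidate intents are passed in directly rather than taken from a constructed lattice.

```python
from itertools import combinations
from math import sqrt


def sparsity_coefficient(count, n_total, k, phi):
    """Assumed Aggarwal-Yu coefficient for a k-dimensional cell (k >= 1)
    holding `count` of `n_total` objects, with `phi` ranges per dimension."""
    p = (1.0 / phi) ** k  # expected fraction of objects in the cell
    return (count - n_total * p) / sqrt(n_total * p * (1.0 - p))


def support(context, attrs):
    """Objects of the formal context that have every attribute in `attrs`."""
    return {obj for obj, has in context.items() if attrs <= has}


def is_outlier_subspace(context, intent, phi, sparse_threshold, dense_threshold):
    """True if `intent` is sparse and all its (k-1)-dimensional subsets are dense."""
    n, k = len(context), len(intent)
    if k == 0:
        return False
    s = sparsity_coefficient(len(support(context, intent)), n, k, phi)
    if s >= sparse_threshold:
        return False  # not sparse enough
    if k < 2:
        return True  # no non-empty proper subsets to check
    return all(
        sparsity_coefficient(len(support(context, frozenset(sub))), n, k - 1, phi)
        >= dense_threshold
        for sub in combinations(intent, k - 1)
    )


def mine_outliers(context, intents, phi, sparse_threshold, dense_threshold):
    """Union of the extents of all intents that pass the sparse/dense test."""
    outliers = set()
    for intent in intents:
        if is_outlier_subspace(context, intent, phi, sparse_threshold, dense_threshold):
            outliers |= support(context, intent)
    return outliers


# Toy formal context: 16 objects, two dimensions split into two ranges each.
cells = [
    (("x=lo", "y=lo"), 7),
    (("x=lo", "y=hi"), 1),
    (("x=hi", "y=lo"), 1),
    (("x=hi", "y=hi"), 7),
]
context = {}
for attrs, count in cells:
    for _ in range(count):
        context[f"o{len(context)}"] = frozenset(attrs)

# Hypothetical candidate intents; in the thesis these come from lattice nodes.
intents = [frozenset({"x=lo", "y=hi"}), frozenset({"x=lo", "y=lo"})]
outliers = mine_outliers(context, intents, phi=2,
                         sparse_threshold=-1.5, dense_threshold=-0.5)
```

In this toy context the cell `{x=lo, y=hi}` holds one object where four are expected, while each of its one-dimensional projections holds the expected eight, so only that cell passes the test.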
