空间同位模式挖掘算法研究

设为首页

收藏本站

网站地图 | English | 公务邮箱

远程访问

NSTL服务站

空间同位模式挖掘算法研究

详细信息本馆镜像全文| 推荐本文 | | 获取CNKI官网全文

作者：杨柽
论文级别：硕士
学科专业名称：计算机软件与理论
中文关键词：空间数据挖掘 ; 空间同位模式 ; 空间事务化 ; 极大团
英文关键词：Spatial Data Mining ; Co-location Pattern ; Spatial Transactionalization ; Maximal Clique
学位年度：2010
导师：李勃
学科代码：081202
学位授予单位：昆明理工大学
论文提交日期：2010-05-24

摘要

由于计算机技术、网络技术、空间数据采集技术的迅速发展,使得复杂多变的空间数据日益膨胀,远远超出了人的解译能力。人们迫切需要从空间数据库中发现知识,跳出这种数据丰富而知识贫乏的困境。一个崭新的研究领域——空间数据挖掘正是产生于这种需求背景之下。空间数据挖掘是指从空间数据库中抽取隐含的知识、空间关系或非显式地存储在空间数据库中的其它模式的过程。它汇集了来自数据挖掘、机器学习、模式识别、空间数据库、统计学、人工智能、地理信息系统、遥感以及决策支持系统等多门学科的成果,是多个学科和多种技术交叉综合的新领域。
     论文首先对传统数据挖掘与空间数据挖掘的研究和应用现状以及发展趋势进行了比较和分析。认为空间数据挖掘作为数据挖掘的一个新的研究和应用领域,有较好的应用前景,但目前的开发和利用程度较低；然后通过对传统数据挖掘与空间数据挖掘基本理论和主要方法的探讨,阐明二者关系：由于空间数据的特殊性,传统的数据挖掘技术不能从空间数据库中有效发现知识。只有通过研究新的理论、技术和方法,或将空间数据进行某种形式的转化、满足原有方法的运用条件,才能最终从空间数据库中挖掘出有用的知识。在此基础上,本文对基于同位模式的空间数据挖掘算法进行了分类研究、分析、比较和改进。
     同位模式挖掘是空间数据挖掘中一个新的研究热点。可以把它看作是传统关联规则在空间数据集上应用时的一种特殊形式。空间同位模式挖掘用于发现空间中频繁出现的事件或特征子集。本文将目前提出的同位模式挖掘算法分为两类：一是采用新度量、新方法进行同位模式挖掘的算法,以“类Apriori"算法为代表；二是采用极大团对空间数据进行事务化,再结合传统频繁项集挖掘方法进行同位模式挖掘的算法。在对第二类方法进行研究时,本文发现了一种空间事务化算法GridClique的不足之处,并尝试运用CoreClique算法对其进行改进。通过在合成数据集上的实验验证了改进的有效性。在前面研究的基础上,本文尝试提出一种新的极大团空间事务化算法——MCEBOSN (Maximal Clique Enumeration Based on Star Neighborhood).这种方法将极大团的应用引入到正空间同位模式挖掘下的空间事务化操作上。通过对算法的完整性、有效性、时间复杂度的分析及在合成数据集上的实验,验证MCEBOSN算法可将极大团事务化操作有效应用于正空间关系同位模式挖掘上。最后,.总结了全文的主要研究内容和创新之处,并对未来的研究工作进行了展望。
The rapid development of computer science, network technology and spatial data collection technique cause the explosion of spatial data which goes beyond people's understanding ability. We are eager to find knowledge from the spatial database and escape from this dilemma so called "rich data and poor knowledge". A brand new study field——spatial data mining comes up due to this request. Spatial data mining refers to the process to extract the implicit knowledge, spatial relationships, or other patterns not explicitly stored in spatial databases. It is a multi-disciplinary and multi-technology comprehensive new cross area which brings together subjects like data mining, machine learning, pattern recognition, spatial database, statistics, artificial intelligence, GIS, remote sensing and decision support systems etc..
     This thesis first compares and analyzes the study and application situation and development trend of traditional data mining and spatial data mining. As a new study and application field, spatial data minig has brilliant prospect, but its level of development and useage are still not high. Then, this thesis tries to explain the relationship between traditional data mining and spatial data mining by studying their basic principles and methods and draws the conclusion that due to the specialty of spatial data, using the traditional mining techniques directly on spatial data cannot effectively find knowledge. There are two ways to solve this problem:studying new theory, technique and method or tranzactionize the spatial data to make the present methods still work. Taking this as the basic principle, this thesis classifies, analyzes and compares the present spatial co-location data mining algorithms and tries to improve one of them.
     Co-location data mining is a new important study of spatial data mining. It can be seen as a kind of transformation of traditional association rule mining when applied on spatial dataset. This thesis classifies co-location mining algorithms presented into two:1. Using new interesting measures and new methods to discover co-location patterns; 2. Using maximal cliques to transactionalize the space and combine the traditional frequent item set mining methods to mine co-locations. When studying the algorithm named GlidClique, this thesis finds out its flaws and tries to improve it by using a new algorithm CoreClique. Experiments on synthetic data show the efficiency of this improvement. Based on all the analysis and study, this thesis proposes a new algorithm MCEBOSN (Maximal Clique Enumeration Based on Star Neighborhood) to introduce the maximal clique transactionalization to the co-location mining algorithm which finds out the positive co-location patterns. The analysis of completeness, correctness and time complexity and experimental evaluations on synthetic data show the efficiency of MCEBOSN algorithm. At the end of this thesis the conclusion and the future work directions are given.

引文

[1]Lu W, Han J. W, et al. Discovery of general knowledge in large spatial databases[C]. In: Proc Far East Workshop on Geographic Information Systems. Singapore,1993: 275-289.
    [2]Li D R,Cheng T. KDG-Knowledge Discovery from GIS[C]. Proceedings of the Canadian Conference on GIS, Ottawa,1994.
    [3]李德仁,王树良,李德毅.空间数据挖掘理论与应用[M].北京：科学出版社,2006.
    [4]TOBLER W. A computer movie simulating urban growth in the detroit region [J]. Economic Geography,1970,46 (2):234-240.
    [5]Shekhar S, Chawla S. Spatial Databases:A Tour[M]. Beijing:China Machine Press, 2004 (in Chinese). ([美] Shashi Shekhar, Sanjay Chawla.谢昆青,马修军,杨冬青等译.空间数据库[M].北京:机械工业出版社,2004)
    [6]Zhang X, Mamoulis N, Cheung D, et al. Fast mining of spatial collocations[C]. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York:ACM Press,2004:3842393.
    [7]Shekhar S, Huang Y. Discovering spatial co-location patterns:a summary of results [C]. In Proc. of Int'l Symposium on Spatio and Temporal Database(SSTD),2001: 236-256
    [8]Yoo J, Shekhar S. A partial join approach for mining co-location patterns[C]. In Proc. of ACM Int'l Symposium on Advances in Geographic Information Systems(ACM-GIS),2004:241-249
    [9]YOO J, Shekhar S, Celik M. A joinless approach for mining spatial co-location patterns[J]. Knowledge and Data Engineering, IEEE Transactions on Volume 18, Issue 10,2006:1323-1337.
    [10]Fayyad U M, Piatetsky-Shapiro G, Smyth P, et al. Advances in knowledge discovery and data mining[M]. AAAI/MIT Press,1996.
    [11]Han J, Kamber M. Data Mining:concepts and techniques, Second Edition[M].Beijing: China Machine Press,2008(in Chinese).([加]Jiawei Han, Micheline Kamber范明,孟小峰等译.数据挖掘概念与技术(原书第二版)[M].北京：机械工业出版社,2008)
    [12]Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases[C]. SIGMOD,1993:207-216.
    [13]Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation[C]. SIGMOD,2000:1-12.
    [14]Olson D, Shi Y. Introduction on business data mining[M]. Beijing:China Machine Press,2007.8(in Chinese). (David Olson,石勇.吕巍等译.商业数据挖掘导论[M].北京：机械工业出版社,2007.8)
    [15]Wang L, Bao Y, Lu J, et al. A new join-less approach for co-location pattern mining[C]. Proceedings-2008 IEEE 8th International Conference on Computer and Information Technology, CIT 2008, Piscataway, NJ 08855-1331, United States:Institute of Electrical and Electronics Engineers Computer Society,2008:197-202.
    [16]Wan Y, Zhou J, Bian F. CODEM:A novel spatial co-location and de-location patterns mining algorithm[C]. Proceedings-5th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2008, Piscataway, NJ 08855-1331, United States: Institute of Electrical and Electronics Engineers Computer Society,2008:576-580.
    [17]Huang Y, Zhang P, Zhang C. On the relationships between clustering and spatial co-location pattern mining[J]. International Journal on Artificial Intelligence Tools, 2008,17(1):55-70.
    [18]Deeb FK, Niepel L. A methodology for discovering spatial co-location patterns[C]. AICCSA 08-6th IEEE/ACS International Conference on Computer Systems and Applications, Piscataway, NJ 08855-1331, United States:Institute of Electrical and Electronics Engineers Computer Society,2008:134-141.
    [19]Celik M, Kang JM, Shekhar S. Zonal co-location pattern discovery with dynamic parameters[C]. Proceedings-IEEE International Conference on Data Mining, ICDM, Piscataway, NJ 08855-1331, United States:Institute of Electrical and Electronics Engineers Inc.,2007:433-438.
    [20]Huang Y, Pei J, Xiong H. Mining co-location patterns with rare events from spatial data sets[J]. GeoInformatica,2006,10 (3):239-260.
    [21]William H. Inmon. Building the data warehouse.fourth edition[M]. Beijing:China Machine Press,2007.1 (in Chinese). (William H. Inmon.王志海等译.数据仓库(原书第四版)[M].北京：机械工业出版社,2007)
    [22]陆亿红,王子仁,黄燕.适合稀少空间特征的同位模式挖掘算法[J].浙江工业大学学报,2007,8.
    [23]Huang Y, Xiong H, Shekhar S, et al. Mining confident co-location rules without a support threshold[C]. Proc 2003 ACM Symposium on Applied Computing. New York: ACM Press,2003:4972501.
    [24]Huang Y, Xiong H, Shekhar S, et al. Mining confident co-location rules without a support threshold[C]. Proceedings of the ACM Symposium on Applied Computing: Association for Computing Machinery,2003:497-501.
    [25]胡彩平,秦小麟.一种新的正负空间同位规则挖掘算法[J].小型微型计算机系统,2008,1.
    [26]王占全,王申康,华成.空间分类数据同位规则挖掘算法[J].计算机辅助设计与图形学学报,2005,10.
    [27]Koperski K, Han J. Discovery of spatial association rules in geographic information databases[C]. In proc. Fourth International Symposium on Large Spatial Data bases, Maine,1995:47-66
    [28]Shekhar S, Huang Y, Xiong H. Performance evaluation of co-location miner[C]. the IEEE International Conference on Data Mining (ICDM'01),2001.
    [29]曾玲,熊才权,胡恬.关联规则在空间数据挖掘中的研究[J].计算机与数字工程,2005(06).
    [30]倪凯,祝晓东,张超.基于关联规则的空间数据知识发现及实现[J].计算机应用与软件,2005(12).
    [31]蔡伟杰,张晓辉,朱建秋等.关联规则挖掘综述[J].计算机工程,2001(06)
    [32]佟强,周园春,阎保平.关联规则挖掘算法[J].微电子学与计算机,2005(06).
    [33]郭伟,叶德谦.改进的基于FP-tree的频繁项集挖掘算法[J].计算机工程与应用,2007(19).
    [34]颜跃进,李舟军,陈火旺.频繁项集挖掘算法[J].计算机科学,2004(03).
    [35]宋余庆,朱玉全,孙志挥等.基于FP-Tree的最大频繁项目集挖掘及更新算法[J].软件学报,2003(09).
    [36]Verhein F, Al-naymat G Fast mining of complex spatial co-location patterns using GLIMIT[C]. Proceedings-IEEE International Conference on Data Mining, ICDM, Piscataway, NJ 08855-1331, United States:Institute of Electrical and Electronics Engineers Inc.,2007:679-684.
    [37]Al-naymat G Enumeration of maximal clique for mining spatial co-location patterns[C]. AICCSA 08-6th IEEE/ACS International Conference on Computer Systems and Applications, Piscataway, NJ 08855-1331, United States:Institute of Electrical and Electronics Engineers Computer Society,2008:126-133.
    [38]Arunasalam B, Chawla S, Sun P. Striking two birds with one stone:Simultaneous mining of positive and negative spatial patterns[C]. In Proceedings of the Fifth SIAM International Conference on Data Mining,2005:173-182.
    [39]Verhein F, Chawla S. Geometrically inspired itemset mining[C]. In ICDM. IEEE Computer Society,2006:655-666
    [40]Agrawaland R, Srikant R. Fast algorithms for mining association rules[J]. VLDB,1994: 487-499.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700