Research on Earthquake Prediction Based on a Time-Series Similarity Matching Algorithm
Abstract
China is an earthquake-prone country. Its seismic activity is frequent, strong, widely distributed, and shallow-focus, and earthquake disasters are severe, causing huge losses to the country and its people. Earthquakes are triggered by many factors, and the relationships among these factors are highly uncertain and nonlinear. Data mining techniques support more systematic, in-depth, comprehensive, and detailed research on earthquake prediction. This thesis studies the theory, methods, and practical application of time-magnitude sequence data mining algorithms for earthquake prediction. Its aim is to combine classic time-series data mining methods with high-performance computing, in light of the characteristics of seismic data, in order to develop data mining algorithms suited to earthquake prediction, uncover the regularities behind seismic data, and discover potentially valuable knowledge for earthquake prediction.
The work on similarity matching of earthquake-related regions consists of the following parts:
1. Preprocessing of seismic data. Earthquake catalog records consist of origin time, epicenter location, magnitude, and similar fields. Mining association rules directly from these records yields only point-to-point relationships, so the seismic data are denoised, partitioned into blocks, delimited with circles, clustered, and otherwise discretized, followed by main-class analysis, which converts them into the format required for mining.
2. Drawing on seismological domain knowledge, a similarity measure for time-magnitude sequences is defined, and a sequence similarity matching algorithm based on this measure is proposed. The algorithm matches sequences against a two-dimensional threshold on time and magnitude, which makes similarity matching fast and allows earthquake-related regions to be identified from earthquake sequences.
3. A constraint-rule sequential-pattern metric model is used to build a similarity-matching association algorithm, in which similarity matching is defined in two stages. Coarse-grained matching finds, in the earthquake catalog, regions whose numbers of recorded events differ by no more than a threshold margin: for example, if over the same period one region recorded only a few earthquakes and another recorded tens of thousands, the two regions are very unlikely to be similar. Fine-grained matching, applied to the coarse candidates, converts time, magnitude, epicenter, and related information into earthquake sequences with two-dimensional threshold support counts, then compares the query sequence with the sequence records in the seismic data warehouse to find sequences with high similarity. A high degree of similarity indicates that earthquake occurrence in the two regions follows certain regular relationships.
4. A parallel data mining platform for earthquake prediction was implemented on a cluster system. On this platform, massive data are first preprocessed and filtered, and time similarity matching is then performed, extended to horizontal and vertical matching across multiple regions and multiple time periods, and to matching under different time differences and thresholds. The model was validated repeatedly through a large number of experiments. Matching experiments on several decades of historical earthquake data from China's earthquake-prone regions produced results with relatively high credibility, confirming the effectiveness and practicality of the proposed sequence similarity matching control strategy and the advantages of the algorithm.
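The abstract does not give the exact similarity formula, so the following is only a minimal sketch of the two-stage (coarse, then fine) matching described in item 3, under stated assumptions. It assumes each region's catalog is a list of (origin time, magnitude) events; that coarse matching keeps regions whose event counts differ from the query region's by at most `margin`; and that the fine-grained "two-dimensional threshold support count" is the number of query events that have a partner event within `dt` in time and `dm` in magnitude, optionally after shifting the candidate sequence by a time lag `lag` (a stand-in for the "different time differences" of item 4). The names `Quake`, `coarse_match`, `fine_similarity`, and `find_related`, the time unit, and the normalization by query length are all hypothetical, not the thesis's definitions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Quake:
    t: float    # origin time, e.g. days since a common epoch (assumed unit)
    mag: float  # magnitude


def coarse_match(counts: Dict[str, int], query: str, margin: int) -> List[str]:
    """Coarse stage: keep regions whose event count is within `margin` of the query region's."""
    q = counts[query]
    return [r for r, c in counts.items() if r != query and abs(c - q) <= margin]


def fine_similarity(query: List[Quake], cand: List[Quake],
                    dt: float, dm: float, lag: float = 0.0) -> float:
    """Fine stage (hypothetical measure): fraction of query events that have a
    partner in `cand` within the (time, magnitude) threshold after shifting by `lag`."""
    if not query:
        return 0.0
    support = 0  # two-dimensional threshold support count
    for e in query:
        if any(abs((c.t - lag) - e.t) <= dt and abs(c.mag - e.mag) <= dm for c in cand):
            support += 1
    return support / len(query)


def find_related(catalog: Dict[str, List[Quake]], query: str, margin: int,
                 dt: float, dm: float, lag: float, min_sim: float) -> List[Tuple[str, float]]:
    """Run the coarse filter, then score survivors; return (region, similarity), best first."""
    counts = {r: len(events) for r, events in catalog.items()}
    hits = []
    for r in coarse_match(counts, query, margin):
        s = fine_similarity(catalog[query], catalog[r], dt, dm, lag)
        if s >= min_sim:
            hits.append((r, s))
    return sorted(hits, key=lambda x: -x[1])


if __name__ == "__main__":
    catalog = {
        "A": [Quake(0, 4.1), Quake(30, 5.0), Quake(90, 4.6)],
        "B": [Quake(10, 4.0), Quake(41, 5.2), Quake(99, 4.5)],  # A's pattern, 10 days later
        "C": [Quake(t, 3.0) for t in range(0, 200, 5)],         # 40 events
    }
    print(find_related(catalog, "A", margin=2, dt=3, dm=0.3, lag=10, min_sim=0.6))
    # → [('B', 1.0)]
```

Running the coarse stage first means the event-by-event comparison, which is quadratic in sequence length, is only performed for regions with comparable activity levels; here region C, with 40 events against A's 3, is discarded before any event-level comparison. With `lag=0` the same pair A and B would score 0.0, which is why sweeping over several time differences matters.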