Statistical Learning Theory and Its Application in Geoscience
Abstract
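This paper aims to bring the ideas and methods of statistical learning theory into research on nonlinear methods for geoscience information processing. It studies statistical learning theory together with the mathematical model, algorithms, and program implementation of the support vector machine (SVM), in order to provide technical support for the nonlinear processing of geoscience information. Geoscience data are multi-scale, multi-temporal, multi-precision, compiled at multiple map scales, and open to multiple interpretations, so the correspondence between the observed data and the underlying nature of the object under study is nonlinear. Geoscience information processing therefore needs nonlinear methods. Because an SVM can transform a problem posed in a nonlinear space into one that is solved in a linear space, it is well suited to this task.
     The paper first presents statistical learning theory, the theoretical foundation of the SVM. It then discusses replacing the empirical risk minimization criterion with the structural risk minimization principle, which addresses the shortcoming of estimating the expected risk from asymptotic theory when only a finite sample is available; the SVM is a concrete realization of structural risk minimization. Finally, a least squares support vector machine is applied in simulation experiments on several specific geoscience problems, and the results are compared with those of multivariate statistical methods and BP neural networks.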
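For orientation, structural risk minimization is usually motivated by Vapnik's bound on the expected risk. The form below is the standard one from the statistical learning theory literature, not a formula taken from this abstract, and the symbols are assumptions introduced here: $R$ is the expected risk, $R_{\mathrm{emp}}$ the empirical risk on $n$ training samples, $h$ the VC dimension of the function class, and $1-\eta$ the confidence level.

```latex
% Holds with probability at least 1 - \eta for every function f_\alpha in a class of VC dimension h
R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha)
  + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}
```

Empirical risk minimization controls only the first term on the right. Structural risk minimization chooses the function class so that the sum of both terms is small, which matters most when $n$ is small relative to $h$.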
Statistical learning theory (SLT) is a small-sample statistical theory proposed by Vapnik and others. It focuses on statistical laws and learning methods when only a small sample is available. SLT provides a sound theoretical framework for machine learning problems and has given rise to a new general-purpose learning algorithm, the support vector machine (SVM), which handles small-sample learning problems well. SLT and SVM have become an active research topic in the international machine learning community. SVM, a classification technique with great potential whose earliest form dates to 1963, was proposed by Vapnik and later developed by his research group at AT&T Bell Laboratories. It is a pattern recognition method based on statistical learning theory and is applied mainly in the pattern recognition field.
     In the 1990s SLT matured, while research on newer machine learning methods such as neural networks ran into serious difficulties, for instance how to determine the network architecture, overfitting, underfitting, and local minima. These circumstances drove the rapid development and refinement of SVM. SVM shows distinctive advantages on small-sample, nonlinear, and high-dimensional pattern recognition problems. It has since developed quickly and has been applied successfully in many domains, including bioinformatics and text and handwriting recognition.
     The kernel function is the most attractive part of SVM. A set of vectors in a low-dimensional space is often hard to separate, and the remedy is to map the vectors into a high-dimensional space. The mapping increases computational complexity, but the kernel function neatly removes this difficulty: once a suitable kernel function is selected, the classification function of the high-dimensional space is obtained without computing the mapping explicitly. In SVM theory, different kernel functions lead to different SVM algorithms.
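The kernel idea described above can be checked numerically on a tiny example. The sketch below is illustrative only and is not taken from the paper: it assumes a homogeneous polynomial kernel of degree 2 on two-dimensional inputs, and the names `phi`, `dot`, and `poly2_kernel` are hypothetical helpers introduced here. It compares the inner product computed after an explicit mapping to three dimensions with the kernel value computed directly in the original two-dimensional space.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2-D input (illustrative choice)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2, evaluated in the input space."""
    return dot(x, z) ** 2


# Two arbitrary sample points (made up for the demonstration).
x, z = (1.0, 2.0), (3.0, -1.0)

explicit = dot(phi(x), phi(z))   # map to 3-D first, then take the inner product
implicit = poly2_kernel(x, z)    # same quantity without ever building phi
print(explicit, implicit)
```

The two printed values agree up to floating-point rounding. For this kernel the saving is small, but for kernels such as the Gaussian (RBF) kernel the implicit feature space is infinite-dimensional, so evaluating the kernel is the only practical route.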
     In the 1960s and 1970s, geological research gradually adopted mathematical methods and techniques. Grouped by type of geological application, these statistical models include: forecasting (moving average, simple and multiple regression forecasting, grey system forecasting, and so on); classification and pattern recognition (distance-based cluster analysis, Bayesian classifiers, maximum likelihood classification); correlation analysis (factor analysis, principal component analysis, and so on); and optimization, evaluation, and planning (linear programming, fuzzy comprehensive evaluation, the analytic hierarchy process, and so on). These mathematical methods helped transform geology from a descriptive science into a quantitative one. However, they have shown many shortcomings on nonlinear problems, and the problems that dominate geological research are in fact high-dimensional, nonlinear, and complex. In the mid-1990s, neural network models began to be applied in geological analysis. After more than ten years of research, the performance of artificial neural networks has improved greatly, and they are now used in nearly every area of geological application, including nonlinear pattern recognition, classification, forecasting, optimization, and control.
     The method used for information processing must be chosen according to the nature of the variables and the data. When there are few variables and the relationship between the phenomenon and its essence is fairly clear, conclusions can be drawn directly by logical inference. If the phenomenon under study occurs under some probabilistic condition, probabilistic and statistical analysis can be used. When there are many variables and the relationship between the phenomenon and its essence is complex, it is generally difficult to draw conclusions directly by logical inference. In that case, multivariate statistical analysis can be applied if all variables are quantitative, and quantification theory can be used if qualitative variables are included.
     Because the origin, evolution, and development of the Earth cannot be repeated, and for historical reasons in the development of science and technology, geological data have several distinctive characteristics: they are collected under multiple standards, over multiple time intervals, at multiple precisions and multiple scales, and they admit multiple interpretations. As a result, the correspondence between the observed data and the essence of the object under study is nonlinear, so processing geological data and information requires nonlinear methods. Ideal results are also hard to obtain when the number of samples is limited. In practice the available training samples are always limited, and approaches that are mature in theory can perform unsatisfactorily in application. The BP algorithm, for example, can become trapped in local minima during optimization and is prone to overfitting.
     Encouragingly, several strengths of SVM meet the needs of geological work, so this paper introduces SVM into geological research. In a preliminary simulation experiment on a gold deposit, a least squares SVM was used to classify geochemical anomalies, with an accuracy of 86%.
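As a rough illustration of the least squares SVM mentioned above, the sketch below trains a binary classifier by solving a single linear system, which is what distinguishes the least squares variant from the quadratic program of the standard SVM. It is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not give the kernel, the parameters, or the data, so the RBF kernel, the values of `gamma` and `sigma`, the helper names, and the four XOR-style points are all hypothetical.

```python
import math


def rbf(x, z, sigma=1.0):
    """Gaussian (RBF) kernel; sigma is an assumed bandwidth."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Build and solve the LS-SVM classifier system
    [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1],
    where Omega[i][j] = y_i * y_j * K(x_i, x_j)."""
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]
    for i in range(n):
        row = [float(y[i])]
        for j in range(n):
            k = y[i] * y[j] * rbf(X[i], X[j], sigma)
            row.append(k + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]  # bias b, multipliers alpha


def lssvm_predict(x, X, y, b, alpha, sigma=1.0):
    """Sign of the kernel expansion sum_i alpha_i * y_i * K(x, x_i) + b."""
    s = sum(a * yi * rbf(x, xi, sigma) for a, yi, xi in zip(alpha, y, X)) + b
    return 1 if s >= 0 else -1


# Made-up XOR-style toy data: not linearly separable in the input space.
X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
y = [1, 1, -1, -1]

b, alpha = lssvm_train(X, y)
print([lssvm_predict(x, X, y, b, alpha) for x in X])
```

On real geochemical data the labels would mark anomaly classes and the inputs would be element concentrations or derived features; `gamma` and `sigma` would then need to be tuned, for example by cross-validation.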
     SVM research has produced some results so far, but most of it is limited to the simulation-experiment stage, and applied research remains insufficient. This paper aims to apply SVM to geological research, to enrich the theory of quantitative analysis, and to propose ideas and methods for solving complex nonlinear geological problems.
