Research on the Application of Time-Series Data Mining in the Economic Domain

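Original English title: The Application Research of Time Sequence Data Mining in Financial Domain
Author: Zhou Qiang (周强)
Degree level: Master's
Discipline: Computer Software and Theory
Keywords: KDD; financial transaction; time sequence data; cluster; small sample data
Degree year: 2005
Supervisor: Ouyang Yiming (欧阳一鸣)
Discipline code: 081202
Degree-granting institution: Hefei University of Technology (合肥工业大学)
Thesis submission date: 2005-05-01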

Abstract

Knowledge Discovery in Databases (KDD) combines database technology, artificial intelligence, and knowledge from other specialist domains, and has become an active research topic in computer science in recent years. KDD research currently spans several areas. It has produced good results in the discovery of time-series rules, association rules, classification rules, and clustering rules; it is widely applied in practice in areas such as online analytical processing (OLAP) and data warehousing (DW); and, with the rapid development of network technology, Web-based KDD is attracting growing attention.
     This thesis studies the analysis and mining of time-series data to uncover its underlying regularities, and applies the results to time-series transaction data in finance.
     The financial domain holds very large volumes of data. Because the volume is so large, traditional processing methods struggle to find the knowledge it contains, and new knowledge and techniques are urgently needed. KDD in finance has so far concentrated on customer relationship analysis and management; mining of transaction data is still uncommon. Practical work, however, calls for a tool that can analyze transaction data, discover its underlying regularities, and thereby judge the nature and likely trend of transactions.
     To this end, this thesis studies the application of KDD to financial time-series data mining, explores a suitable pattern-mining method, and designs an experimental system that mines transaction patterns, analyzes the nature of transactions, and predicts transaction trends, in the hope of helping to advance the application of KDD in finance.
     The main results are as follows:
     First, an application system framework is proposed. Drawing on financial domain knowledge, it performs data preprocessing, pattern generation and evaluation, and data analysis and prediction, giving users a deeper understanding of the regularities and characteristics of time-series transaction data.
     Second, to account for the temporal correlation of financial transaction data, the C-means algorithm used in time-series data mining is modified with the help of statistical theory so that pattern discovery can run automatically.
     Third, for time-series prediction, pattern matching is used to obtain a small sample space, and a regression method based on that small sample replaces judgment based on the trend of the data curve, with the aim of obtaining better predictions.
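The clustering and small-sample regression steps named in the abstract can be illustrated with a minimal sketch. It is not the thesis's method: the abstract does not say how C-means was modified or which regression is used, so the sketch assumes plain hard C-means (k-means) on z-normalized sliding windows and ordinary least squares on the windows that fall in the same cluster as the most recent window. All function names (`make_windows`, `znorm`, `nearest`, `c_means`, `predict_next`) and parameters (window width `w`, cluster count `c`) are hypothetical.

```python
import numpy as np

def make_windows(series, w):
    """Return every length-w window and the value that immediately follows it."""
    series = np.asarray(series, dtype=float)
    n = len(series) - w
    windows = np.stack([series[i:i + w] for i in range(n)])
    return windows, series[w:]

def znorm(x):
    """Scale each window to zero mean and unit variance so clusters reflect shape, not level."""
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / np.where(sd == 0, 1.0, sd)

def nearest(x, centers):
    """Index of the closest center (squared Euclidean distance) for each row of x."""
    x = np.atleast_2d(x)
    return ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def c_means(x, c, iters=50, seed=0):
    """Plain hard C-means (assumption: the thesis's modified variant is not specified)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=c, replace=False)]
    for _ in range(iters):
        labels = nearest(x, centers)
        # Recompute each center; an empty cluster keeps its previous center.
        new_centers = np.array([
            x[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(c)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, nearest(x, centers)

def predict_next(series, w=5, c=3):
    """Match the latest window to a cluster, then regress on that cluster's windows only."""
    windows, targets = make_windows(series, w)
    centers, labels = c_means(znorm(windows), c)
    query = np.asarray(series[-w:], dtype=float)
    mask = labels == nearest(znorm(query), centers)[0]
    if not mask.any():
        # Fallback if the matched cluster is empty: use every window.
        mask[:] = True
    # Small sample: only the windows sharing the query's pattern, plus an intercept column.
    X = np.column_stack([np.ones(mask.sum()), windows[mask]])
    coef, *_ = np.linalg.lstsq(X, targets[mask], rcond=None)
    return float(np.concatenate([[1.0], query]) @ coef)
```

A call such as `predict_next(prices, w=5, c=3)` returns a one-step-ahead estimate for a one-dimensional sequence `prices`. Restricting the regression to the matched cluster is what makes the sample small; the window width and cluster count would need to be chosen for the data at hand.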
