A Study on Prediction of Stock Time Series Trend Based on Association Rules

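English title: A Study on Prediction of Stock Time Series Trend Based on Association Rules
Author: 闭英权 (Bi Yingquan)
Degree level: Master's
Discipline: Computer Software and Theory
Chinese keywords: 关联规则 (association rule); 时间序列 (time series); 成交量 (trading volume); 双事务 (dual transaction)
English keywords: Association Rule; Time Series; Trading Volume; Dual Transaction
Degree year: 2008
Supervisor: 秦亮曦 (Qin Liangxi)
Discipline code: 081202
Degree-granting institution: Guangxi University
Submission date: 2008-06-01
Chair of the defense committee: 罗海鹏 (Luo Haipeng)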

Abstract

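The Chinese stock market rose from just over 1,100 points in January 2006 to over 6,000 points in October 2007, a gain rarely seen anywhere in the world. The rally offered enormous investment opportunities, but it also left some investors with heavy losses. As the economy develops, the stock market attracts growing attention and plays an increasingly important role in the economic system. Its healthy development and prosperity have become a central concern and research focus for regulators and investors alike. Returns on stock investment tend to be proportional to risk: the higher the potential return, the greater the risk taken. Research on stock market prediction methods therefore has considerable practical value and theoretical significance. Traditional technical analysis and fundamental analysis each have their own strengths and weaknesses. Meanwhile, China's stock market is becoming more mature and better regulated, and investors are making their decisions more rationally. This thesis attempts to apply data mining to stock analysis so that investors can obtain more information about relationships in the market and analyze and judge individual stocks more effectively. No reliable method for predicting the stock market exists yet, but statistical analysis methods can uncover regularities hidden in stock data. This work therefore builds its analysis on association rules to help investors forecast stocks.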
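     Algorithms hold a particularly important place in data mining research. Data mining operates on very large data sets, so algorithm efficiency is decisive, and studying and improving existing algorithms is highly significant. For this reason, this thesis studies association rule mining algorithms.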
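     The thesis first briefly summarizes basic stock market knowledge and gives a general introduction to data mining. This introduction covers the concept, patterns, main issues, classification of data mining systems, applications, and development trends. The thesis then studies in depth the association rule mining algorithms that are central to data mining. It analyzes the classical Apriori and AprioriTid algorithms and improvements of Apriori for stock data, and summarizes the problems in these algorithms. Next, it details one of the thesis's key contributions: an improvement of the OptimizedApriori algorithm for dual-transaction association analysis of stock time series, based on trading volume and a two-dimensional time pattern.

     To mine stock market information well, the method must reflect the characteristics of the market, especially the way individual stocks actually behave. Price movements embody the thinking and judgment of tens of thousands of people, and only careful, patient observation reveals even part of them. Through long-term study, market tracking, and simulated trading, the author found time-constrained rules of the following kind: within a time segment w (for example, one day), if the closing price of stock A rises by more than 2% and its trading volume exceeds vol_min (a preset threshold), then stocks B and C will also rise (or fall) with 80% probability. This happens in the segment that follows an interval of DAY segments (for example, on the third day after a two-day interval).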
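The time-constrained rule described above can be checked against price history by counting how often its condition holds and how often the predicted move follows. The sketch below is only an illustration under stated assumptions; it is not the thesis's OptimizedApriori algorithm. It assumes the following:

* "rise" means a positive close-to-close return in a segment.
* The consequent segment is t + DAY + 1, so DAY segments are skipped and the next one is tested.
* The "80% probability" corresponds to the rule's confidence.

The names `evaluate_lagged_rule`, `RuleStats`, and `segment_returns`, and the sample prices, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RuleStats:
    windows: int           # segments t for which both t and t + lag lie inside the data
    antecedent_count: int  # segments where stock A's price and volume condition held
    joint_count: int       # ...and every consequent stock moved as required at t + lag

    @property
    def support(self) -> float:
        return self.joint_count / self.windows if self.windows else 0.0

    @property
    def confidence(self) -> float:
        # Assumption: the rule's "80% probability" is this conditional frequency.
        return self.joint_count / self.antecedent_count if self.antecedent_count else 0.0


def segment_returns(closes: List[float]) -> List[float]:
    """Close-to-close return per segment; segment 0 has no predecessor, so it gets 0.0."""
    return [0.0] + [closes[i] / closes[i - 1] - 1.0 for i in range(1, len(closes))]


def evaluate_lagged_rule(closes: Dict[str, List[float]], volumes: Dict[str, List[float]],
                         a: str, consequents: List[str], int_day: int = 2,
                         min_rise: float = 0.02, vol_min: float = 1e6,
                         up: bool = True) -> RuleStats:
    """Hypothetical helper: test 'A rises > min_rise with volume > vol_min in segment t'
    => 'every consequent rises (up=True) or falls (up=False) in segment t + int_day + 1'."""
    rets = {s: segment_returns(c) for s, c in closes.items()}
    lag = int_day + 1  # assumption: skip int_day segments, test the one after them
    n = len(closes[a])
    windows = antecedent = joint = 0
    for t in range(1, n - lag):  # start at 1: segment 0 has no return
        windows += 1
        if rets[a][t] > min_rise and volumes[a][t] > vol_min:
            antecedent += 1
            moves = [rets[s][t + lag] for s in consequents]
            if all((r > 0) if up else (r < 0) for r in moves):
                joint += 1
    return RuleStats(windows, antecedent, joint)


# Hypothetical sample data: eight daily segments for three stocks.
demo_closes = {
    "A": [10.0, 10.3, 10.3, 10.3, 10.6, 10.6, 10.6, 10.6],
    "B": [5.0, 5.0, 5.0, 5.0, 5.5, 5.5, 5.5, 5.4],
    "C": [8.0, 8.0, 8.0, 8.0, 8.8, 8.8, 8.8, 9.0],
}
demo_volumes = {"A": [2e6] * 8}
stats = evaluate_lagged_rule(demo_closes, demo_volumes, "A", ["B", "C"])
print(stats.windows, stats.antecedent_count, stats.joint_count, stats.confidence)
# prints: 4 2 1 0.5
```

In the sample data, stock A meets the condition on segments 1 and 4. B and C both rise three segments after the first event, but B falls after the second, so the rule holds with confidence 0.5. A real miner would enumerate many (A, {B, C}) combinations and keep those above a minimum support and confidence.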
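     Finally, the stock data preprocessing, the algorithm improvement, and the mining were implemented in Microsoft Visual C++ 6.0. Experiments show that the improved OptimizedApriori algorithm is, to a certain extent, more efficient than Apriori. The mining also produced a large number of association rules, some of which offer useful practical guidance.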
The Chinese stock index rose from about 1,100 points in January 2006 to over 6,000 points in October 2007, a rally rarely seen anywhere in the world. Although it offered large gains from the bottom, it also caused heavy losses for some investors. With economic growth and changing attitudes toward investment, the stock market has become an increasingly important part of the economy, and stock investment has become a focus of public attention. Keeping the stock market healthy and prosperous has become a key concern and research topic for regulators and investors. Returns on stock investment tend to be proportional to risk: higher potential returns come with a greater risk of loss. The study of stock prediction methods therefore has great application value and theoretical significance. Traditional technical analysis and fundamental analysis each have their own advantages and disadvantages, and China's stock market is becoming more mature day by day. Because there is still no reliable way to predict the stock market, this thesis applies data mining technology to stock analysis. The aim is to help investors obtain more information about stocks and strengthen their analysis and judgment of individual shares. Statistical analysis methods can discover rules concealed in stock information and thereby help investors analyze and forecast stocks.
     Algorithm research plays an important role in all data mining research. Data mining deals with large databases, so algorithm efficiency is critical, and researching and improving existing algorithms is highly significant. On this basis, this thesis mainly studies algorithms for association rule mining.
     First, the thesis introduces basic stock market knowledge and data mining in general. This introduction covers the concepts and patterns, main mining problems, system classifications, applications, and development trends. Second, it studies association rule algorithms, which are important in data mining, in depth. It analyzes the classical Apriori and AprioriTid algorithms and improved versions of Apriori for stock data, and summarizes the problems in these algorithms. Then, as one of its key contributions, the thesis presents an improved OptimizedApriori algorithm for dual-transaction analysis of stock time series association rules, based on trading volume and a two-dimensional time mode.

     To discover stock market information well, we must take into account the characteristics of the stock market, especially the operating rules of the stocks themselves. Stock movements embody the thinking and wisdom of tens of thousands of people and can be understood only through detailed and patient observation. Through long-term study, tracking of the stock market, and simulated trading, we looked for time-constrained rules such as the following: if, within a time segment W (such as one day), the closing price of stock A rises by more than 2% and its trading volume is greater than vol_min (a preset threshold), then stocks B and C will also rise (or fall) with 80% probability. This happens in the time segment immediately after INT_DAY time segments (for example, on the third day after a two-day interval).
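For reference, the classical Apriori algorithm analyzed above works in three repeated steps: a join step, a pruning step, and a counting step that makes one database pass per level. It can be sketched as follows. This is a generic textbook version, not the thesis's OptimizedApriori improvement, whose details this abstract does not give. The function names `apriori` and `rules` are hypothetical. The encoding, in which each trading day is one transaction containing items such as `A_up`, is also hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Itemset = Tuple[str, ...]  # items kept as a sorted tuple so itemsets can be compared


def apriori(transactions: List[Set[str]], min_count: int) -> Dict[Itemset, int]:
    """Return every itemset contained in at least min_count transactions, with its count."""
    counts: Dict[Itemset, int] = {}
    for t in transactions:  # level 1: count single items
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 2
    while level:
        prev = sorted(level)
        prev_set = set(prev)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                a, b = prev[i], prev[j]
                if a[:-1] == b[:-1]:  # join: same first k-2 items
                    cand = a + (b[-1],)
                    # prune: every (k-1)-subset must already be frequent
                    if all(sub in prev_set for sub in combinations(cand, k - 1)):
                        candidates.add(cand)
        counts = {c: 0 for c in candidates}
        for t in transactions:  # one pass over the data per level
            for c in candidates:
                if t.issuperset(c):
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent


def rules(frequent: Dict[Itemset, int], min_conf: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Emit rules lhs -> rhs whose confidence count(lhs + rhs) / count(lhs) >= min_conf."""
    out = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):  # subsets of a frequent set are frequent
                rhs = tuple(x for x in itemset if x not in lhs)
                conf = count / frequent[lhs]
                if conf >= min_conf:
                    out.append((lhs, rhs, conf))
    return out


# Hypothetical encoding: one transaction per trading day, one item per stock that rose.
days = [
    {"A_up", "B_up", "C_up"},
    {"A_up", "B_up"},
    {"A_up", "C_up"},
    {"B_up", "C_up"},
    {"A_up", "B_up", "C_up"},
]
freq = apriori(days, min_count=2)
for lhs, rhs, conf in rules(freq, min_conf=0.6):
    print(lhs, "->", rhs, round(conf, 2))
```

Apriori's cost comes mainly from the candidate sets and the repeated full scans of the data, which is the usual target of improved variants. Note that this plain version only relates items within the same day. A rule like the one above, which links stock A on one day to B and C a few days later, additionally needs items tagged with their time offset, as the lagged-rule sketch earlier illustrates.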
     Finally, the stock data preprocessing, the algorithm improvement, and the mining were implemented on the VC++ 6.0 platform. The experiments show that the improved OptimizedApriori algorithm is, to a certain extent, more efficient than the Apriori algorithm. Many association rules were extracted, and some of them have useful practical significance.
