Applications of Time-Series-Based Association Rule Data Mining in the Securities Market

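English title: The Applications in Stock Market of the Association Rules Data Mining Based on Time Series
Author: Ye Xiang (叶翔)
Degree level: Master's
Discipline: Computer Application Technology
Chinese keywords: 数据挖掘 ; 关联规则 ; 时间序列 ; 股票 ; 比特串
English keywords: data mining ; association rules ; time series ; stock ; bit strings
Degree year: 2012
Supervisor: Cheng Congcong (程从从)
Discipline code: 081203
Degree-granting institution: Nanchang University (南昌大学)
Thesis submission date: 2012-06-05
Chair of the defense committee: Chu Jun (储珺)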

Abstract

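Stock trend analysis has long been a concern of investors, and many methods exist for it. This thesis uses association rules, an important branch of data mining, to mine linkage relationships between stocks: it counts how many times a time-ordered pattern of three stocks rising occurred within a past period. If the pattern occurs often, that is, if both its support and its confidence are high, then when "stock A rises on day Ta and stock B rises on day Tb" is observed, an investor may consider buying the third stock on day Tc. Here Ta, Tb and Tc may take any values. The resulting rules can be used to assist stock investment.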
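
The support and confidence referred to above can be written out for a time-lagged rule as follows. This is a sketch based on the standard association-rule definitions, not a formula quoted from the thesis; the symbol $N$ (the number of positions examined, for example trading days or window placements) and the $\mathrm{count}(\cdot)$ notation are assumptions introduced here.

```latex
\[
\mathrm{supp}\bigl(A_{T_a} \wedge B_{T_b} \Rightarrow C_{T_c}\bigr)
  = \frac{\mathrm{count}\bigl(A_{T_a} \wedge B_{T_b} \wedge C_{T_c}\bigr)}{N},
\qquad
\mathrm{conf}\bigl(A_{T_a} \wedge B_{T_b} \Rightarrow C_{T_c}\bigr)
  = \frac{\mathrm{count}\bigl(A_{T_a} \wedge B_{T_b} \wedge C_{T_c}\bigr)}
         {\mathrm{count}\bigl(A_{T_a} \wedge B_{T_b}\bigr)}
\]
```

Under these definitions, a rule whose three-stock pattern occurred 30 times, and whose two-stock antecedent occurred 40 times, would have a confidence of 30/40 = 75%.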
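     The main research content of this thesis covers the following aspects:
     (1) It analyzes and summarizes the domestic literature on association rule mining for stocks, discusses the stock mining process in depth, and summarizes and evaluates existing improvements in three areas of that process: data preprocessing, algorithms, and association rule interestingness.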
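     (2) It targets the rules that people want to see, of the form "stock A rises on day Ta and stock B rises on day Tb, so stock C rises on day Tc, with support X% and confidence Y%" (Ta ...). Two algorithms are presented for this.
     Algorithm 1 is a time-window-based association rule mining algorithm. It defines a time window over the stock time series, searches for 2-itemsets and 3-itemsets in a loop inside the window, and finds all 2-itemsets and 3-itemsets by moving the window.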
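     Algorithm 2 is the bit-search algorithm. It introduces bit strings, which represent the rising information of a stock over a continuous time series. Bit strings lend themselves to shift and logical operations, which considerably simplifies computations between stocks and also reduces the memory required. Checking whether the support count of the resulting string meets the minimum support count threshold yields the required frequent 2-itemsets and 3-itemsets.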
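     (3) It describes the design of the stock association rule mining workflow and the preprocessing of stock data, then applies the bit-search algorithm to stock data mining and generates linkage rules between stocks.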
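
The time-window idea of Algorithm 1 can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `window_counts` and `rules`, the `width` parameter, the 0/1 encoding of daily rises, and the choice to anchor the first stock at the window start with strictly increasing day offsets are all assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations


def window_counts(rises, width):
    """Count time-ordered 2- and 3-item patterns, anchored at each window start.

    rises: dict mapping stock name -> list of 0/1 flags (1 = rose that day).
    width: window length in trading days (hypothetical parameter).
    """
    n_days = len(next(iter(rises.values())))
    pairs, triples = Counter(), Counter()
    # Slide the window one day at a time over the series.
    for start in range(n_days - width + 1):
        for a, b in permutations(rises, 2):
            if not rises[a][start]:
                continue  # the first stock must rise on the window's first day
            for d1 in range(1, width):
                if not rises[b][start + d1]:
                    continue
                pairs[(a, b, d1)] += 1  # 2-itemset: a rises, b rises d1 days later
                for c in rises:
                    if c in (a, b):
                        continue
                    for d2 in range(d1 + 1, width):
                        if rises[c][start + d2]:
                            triples[(a, b, d1, c, d2)] += 1  # 3-itemset
    return pairs, triples


def rules(pairs, triples, min_count, min_conf):
    """Keep triples meeting the thresholds; confidence = triple count / pair count."""
    kept = []
    for (a, b, d1, c, d2), count in triples.items():
        conf = count / pairs[(a, b, d1)]
        if count >= min_count and conf >= min_conf:
            kept.append(((a, b, d1, c, d2), count, conf))
    return kept
```

Anchoring the first stock at the window start means each occurrence is counted once even though consecutive windows overlap. A kept entry such as `(("A", "B", 1, "C", 2), count, conf)` reads as "A rises, B rises one day later, so C rises two days after A".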
Research on stock market behavior has always been of common concern to investors, and many research methods exist. This thesis uses association rules, an important branch of data mining, as a tool to mine the linkage between stocks, counting how often a time-ordered pattern of three stocks rising occurred within a certain past period.
     If the pattern occurs frequently, that is, if its support and confidence are both high, then when the situation "stock A rises on day Ta and stock B rises on day Tb" occurs, investors could consider buying a third stock on day Tc. Rules obtained in this way can be used to assist stock investment.
     The main content of this thesis includes the following aspects:
     (1) The domestic literature on mining association rules for stocks is analyzed and summarized. The author explores the stock mining process in depth, and summarizes and evaluates some existing improved methods for data preprocessing, algorithms, and association rule interestingness within the mining process.
     (2) Investors would like to see rules such as "stock A rises on day Ta and stock B rises on day Tb, so stock C will rise on day Tc" (Ta ...). Two algorithms are presented for this.
     Algorithm one is a time-window-based association rule mining algorithm: a time window is applied to the stock time series, the 2-itemsets and 3-itemsets inside the window are searched in a loop, and moving the window finds all of the 2-itemsets and 3-itemsets.
     Algorithm two is the bit-search algorithm, which introduces the concept of a bit string to represent the rising information of a stock over a continuous time series. Because bit strings are easy to shift and to combine with logical operations, they greatly simplify the calculations between stocks and reduce the memory needed. Checking whether the support count of the resulting string meets the minimum support count threshold gives the frequent 2-itemsets and 3-itemsets required.
     (3) The design of the stock association rule mining workflow and the preprocessing of stock data are introduced; the author then applies the bit-search algorithm to stock data mining and derives linkage rules between the stocks.
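
The shift-and-AND idea behind the bit-search algorithm can be sketched as below. This is an illustration under stated assumptions rather than the thesis's code: the helper names `to_bits`, `support_count`, and `lagged_rule` are hypothetical, Python integers stand in for bit strings, and bit i is assumed to mark a rise on day i.

```python
def to_bits(flags):
    """Pack a list of 0/1 daily rise flags into an int; bit i stands for day i."""
    bits = 0
    for day, up in enumerate(flags):
        if up:
            bits |= 1 << day
    return bits


def support_count(lagged):
    """AND together lag-aligned bit strings and count the remaining 1 bits.

    lagged: non-empty list of (bits, lag). Shifting right by `lag` moves the
    flag for day t + lag onto bit t, so all stocks line up on the anchor day t.
    """
    combined = -1  # all ones, the identity element for AND
    for bits, lag in lagged:
        combined &= bits >> lag
    return bin(combined).count("1")


def lagged_rule(a, b, c, d1, d2, min_count):
    """Evaluate "a rises, b rises d1 days later => c rises d2 days after a".

    Returns (support count, confidence), or None if the pattern is not frequent.
    """
    pair = support_count([(a, 0), (b, d1)])
    triple = support_count([(a, 0), (b, d1), (c, d2)])
    if pair == 0 or triple < min_count:
        return None
    return triple, triple / pair
```

For example, with `a = to_bits([1, 0, 0, 1, 0, 0])`, `b = to_bits([0, 1, 0, 0, 1, 0])` and `c = to_bits([0, 0, 1, 0, 0, 0])`, the call `lagged_rule(a, b, c, 1, 2, 1)` evaluates the whole series with two shifts and two AND operations instead of a day-by-day loop.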

