The Model of Stock Sequence with Time Constraint and the Research of Mining Algorithm

Original title (Chinese): 具有时间约束的股票序列模型及采掘算法研究
Author: 龚惠群
Degree level: Master's
Discipline: Management Science and Engineering
Keywords: Data mining; Stock; Sequence rule; Time constraint; Mining algorithm
Degree year: 2003
Advisor: 彭江平
Discipline code: 1201
Degree-granting institution: Hunan University (湖南大学)
Thesis submission date: 2003-10-08

Abstract

As China's market economy develops, its stock market is becoming more mature and better regulated, and investors are making their decisions more rationally. Many statistical analysis methods can already uncover regularities hidden in stock data and so help investors analyze and forecast stocks.
     However, these common statistical methods cannot discover time-constrained regularities of the following kind: if the closing price of stock A rises by more than 5% within a time segment W (such as one day), then in the time segment that follows an interval of INT time segments (for example, with an interval of two days, the third day), stocks B and C will also rise (or fall) with 80% probability. This thesis therefore uses data mining, a technique still under active development, to discover such complex sequence rules in the stock market. Mining sequence rules that are constrained in two dimensions, by the time segment W and by the time interval INT, is of clear value for guiding investment decisions.
     The thesis makes three contributions. First, it builds two models for mining time-constrained stock sequence patterns: a one-dimensional model with a fixed time-segment constraint W, and a two-dimensional model with a fixed time segment W and a fixed time interval INT. Second, it mines one-dimensional stock sequence rules by extending two association rule algorithms, Apriori and FP_Growth. Third, it designs a new algorithm for mining two-dimensional stock sequence rules. The final chapter validates the feasibility of the proposed algorithms through an empirical study.
     The thesis has four parts. The first part introduces traditional stock analysis methods and the basic concepts of data mining. The second part builds the two models for mining time-constrained stock sequence patterns. The third part implements the one-dimensional and two-dimensional algorithms for mining time-constrained stock sequence rules, and also discusses several issues to consider when mining such rules in a distributed environment. The last part presents an empirical study that verifies the correctness of the proposed algorithms.
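
The kind of rule described in the abstract can be made concrete with a small sketch. The Python code below is not the thesis's mining algorithm. It only evaluates one given candidate rule of the form "A rises by more than 5% in a segment, so B and C rise INT segments later" on daily closing prices, with W taken as one trading day. The function names, the sample prices, the reading of INT as "skip INT segments and check the next one", and the support definition (hits divided by the number of segments that can be evaluated) are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def rising_segments(closes: List[float], threshold: float) -> Set[int]:
    """Return indices t >= 1 whose close rose by more than `threshold` over t-1."""
    return {
        t
        for t in range(1, len(closes))
        if (closes[t] - closes[t - 1]) / closes[t - 1] > threshold
    }


def lagged_rule_stats(
    prices: Dict[str, List[float]],
    antecedent: str,
    consequents: List[str],
    interval: int,
    threshold: float = 0.05,
) -> Dict[str, float]:
    """Support and confidence of one rule: antecedent up -> consequents up later.

    Assumes every price series has the same length and one entry per segment W.
    """
    n = len(prices[antecedent])
    # Assumption: INT segments are skipped, then the following segment is checked,
    # so INT = 2 means "the third day after the trigger day".
    lag = interval + 1
    last_start = n - 1 - lag  # last segment whose consequent segment still exists
    candidates = range(1, last_start + 1)

    triggers = rising_segments(prices[antecedent], threshold)
    rises = [rising_segments(prices[name], 0.0) for name in consequents]

    fired = [t for t in candidates if t in triggers]
    hits = [t for t in fired if all(t + lag in r for r in rises)]
    return {
        "triggers": len(fired),
        "hits": len(hits),
        "support": len(hits) / len(candidates) if len(candidates) else 0.0,
        "confidence": len(hits) / len(fired) if fired else 0.0,
    }


# Hypothetical closing prices, one value per trading day (W = one day).
prices = {
    "A": [10.0, 10.6, 10.6, 10.6, 10.6, 11.2, 11.2, 11.2, 11.2, 11.2],
    "B": [20.0, 20.0, 20.0, 20.0, 21.0, 21.0, 21.0, 21.0, 22.0, 22.0],
    "C": [5.0, 5.0, 5.0, 5.0, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2],
}
stats = lagged_rule_stats(prices, "A", ["B", "C"], interval=2)
```

In this toy data, A triggers on days 1 and 5, and B and C both rise three days later only after the first trigger, so `stats["confidence"]` is 0.5. A rule like the one in the abstract would be reported only when the confidence reaches the chosen threshold, such as 80%.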
