Stock Price Time Series Analysis Based on a Hybrid Genetic Algorithm
Abstract
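Forecasting plays an important role in financial investment. Stock prices fluctuate widely, depend on many factors, and change erratically, which makes them one of the most complex and hardest-to-predict kinds of financial data. For the same reasons, their elusive behavior has attracted many economists, who have long studied stock price movements in the hope of finding regularities that could help prevent large market swings and thus keep the economy prosperous and stable.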
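     Time series analysis is a classic problem in data mining and applied statistics. Gene Expression Programming (GEP) is a relatively new adaptive evolutionary algorithm that has been applied successfully in many fields, but it is prone to premature convergence and to getting trapped in local optima. This work therefore merges GEP with the idea of simulated annealing and designs the GEPSAT-STOCK algorithm for building time series models of stock indices. To fit the characteristics of stock data, a suitable GEPSAT-STOCK model is chosen, covering the GSAT-G encoding model, the choice of fitness function, and the GSAT-CT algorithm for computing fitness values.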
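The abstract names GSAT-G, the fitness function, and GSAT-CT without defining them, so the sketch below does not reproduce GEPSAT-STOCK. It only illustrates, under stated assumptions, the general idea of letting simulated annealing help GEP escape local optima: a standard Karva-encoded GEP chromosome with a head/tail layout, a hypothetical function set {+, -, *}, hypothetical terminals a, b, c, d standing for four lagged closing prices, mean relative error as the fitness to minimize, and Metropolis acceptance with geometric cooling applied to a single chromosome rather than a whole population.

```python
import math
import random

# Hypothetical primitive sets: a, b, c, d stand for four lagged closing prices.
FUNCS = {
    "+": (2, lambda x, y: x + y),
    "-": (2, lambda x, y: x - y),
    "*": (2, lambda x, y: x * y),
}
TERMS = ["a", "b", "c", "d"]
HEAD = 5
TAIL = HEAD * (2 - 1) + 1  # GEP rule t = h * (n - 1) + 1 with max arity n = 2


def arity(sym):
    return FUNCS[sym][0] if sym in FUNCS else 0


def evaluate(gene, env):
    """Decode a Karva expression level by level and evaluate it on env."""
    # Length of the expressed part: stop once no argument slots remain open.
    open_slots, length = 1, 0
    while open_slots > 0:
        open_slots += arity(gene[length]) - 1
        length += 1
    # Breadth-first layout: each node's children occupy the next unused positions.
    children, nxt = [], 1
    for sym in gene[:length]:
        children.append(range(nxt, nxt + arity(sym)))
        nxt += arity(sym)

    def value(i):
        sym = gene[i]
        if sym in FUNCS:
            return FUNCS[sym][1](*(value(c) for c in children[i]))
        return env[sym]

    return value(0)


def fitness(gene, samples):
    """Mean relative error on (window, target) pairs; lower is better."""
    total = 0.0
    for window, target in samples:
        env = dict(zip(TERMS, window))
        total += abs(evaluate(gene, env) - target) / abs(target)
    return total / len(samples)


def random_gene(rng):
    head = [rng.choice(list(FUNCS) + TERMS) for _ in range(HEAD)]
    tail = [rng.choice(TERMS) for _ in range(TAIL)]
    return head + tail


def mutate(gene, rng):
    """Point mutation; tail positions may only receive terminals."""
    child = list(gene)
    i = rng.randrange(len(child))
    child[i] = rng.choice(list(FUNCS) + TERMS if i < HEAD else TERMS)
    return child


def anneal(samples, steps=2000, t0=1.0, alpha=0.995, seed=0):
    """Simulated annealing over one GEP chromosome (Metropolis acceptance)."""
    rng = random.Random(seed)
    cur = random_gene(rng)
    cur_f = fitness(cur, samples)
    best, best_f = cur, cur_f
    temp = t0
    for _ in range(steps):
        cand = mutate(cur, rng)
        cand_f = fitness(cand, samples)
        delta = cand_f - cur_f
        # Always accept improvements; accept worse moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur, cur_f = cand, cand_f
            if cur_f < best_f:
                best, best_f = cur, cur_f
        temp *= alpha  # geometric cooling schedule
    return best, best_f


if __name__ == "__main__":
    prices = [10 + 0.1 * i + 0.05 * (-1) ** i for i in range(40)]  # made-up series
    samples = [(prices[i:i + 4], prices[i + 4]) for i in range(len(prices) - 4)]
    gene, err = anneal(samples)
    print("".join(gene), f"mean relative error = {err:.4f}")
```

Accepting some worse candidates while the temperature is high is what lets the search leave a local optimum; as the temperature falls, the loop behaves more and more like plain hill climbing.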
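     Three new operators are studied: a preservation operator, a replacement operator, and an adaptive operator. A visual interface built with VC multithreading displays the results dynamically. The experimental data are the daily closing prices of stock 000002 (Vanke A) from the first trading day of 2007 to May 12, 2008, and the results are compared with those of the traditional GEP algorithm to assess the strengths and weaknesses of GEPSAT-STOCK on this problem. The results show that GEPSAT-STOCK forecasts with fairly high accuracy: using 4d as the embedding dimension, the mean relative error is about 1.4%.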
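The 1.4% figure is a mean relative error measured with 4d as the embedding dimension. Assuming "4d" means that each forecast uses the previous four daily closing prices (an interpretation the abstract does not spell out), the windowing and the error metric can be sketched as below. The names `embed` and `mean_relative_error` are illustrative, and the naive "tomorrow equals today" predictor in the usage example is only a placeholder for a fitted model.

```python
def embed(series, dim=4):
    """Pair each window of `dim` consecutive values with the value that follows it."""
    return [(series[i:i + dim], series[i + dim]) for i in range(len(series) - dim)]


def mean_relative_error(actual, predicted):
    """Mean of |predicted - actual| / |actual|, as a fraction (0.014 means 1.4%)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    closes = [10.0, 10.2, 10.1, 10.4, 10.3, 10.6]  # made-up prices, not the thesis data
    pairs = embed(closes, dim=4)
    actual = [target for _, target in pairs]
    predicted = [window[-1] for window, _ in pairs]  # naive baseline forecast
    print(f"{mean_relative_error(actual, predicted):.2%}")
```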
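     Although the average forecast error is small, forecasting stock prices precisely is still practically impossible because so many factors affect them. That does not make stock time series forecasting meaningless. Even when exact prices cannot be predicted, their approximate range can be estimated, which helps in forecasting the trend of a stock. A predicted trend can in turn give trading a reliable theoretical basis, lowering financial risk and reducing losses in stock trading. Experiments show that with 4d as the embedding dimension and continued optimization, the algorithm correctly judges whether the price will rise or fall more than 85% of the time.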
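The 85% figure measures how often the predicted direction of movement (rise or fall) matches the actual one. One plausible way to score this is sketched below, assuming each forecast is compared against the previous day's actual close and that an unchanged price counts as "not up"; both conventions, and the function name `direction_accuracy`, are assumptions for illustration.

```python
def direction_accuracy(previous, actual, predicted):
    """Fraction of days on which the forecast gets the direction of the move right.

    A move counts as "up" only if the price is strictly above the previous close;
    an unchanged price is grouped with "down" (an assumed convention).
    """
    if not actual or not (len(previous) == len(actual) == len(predicted)):
        raise ValueError("need three non-empty sequences of equal length")
    hits = sum(
        (a > prev) == (p > prev)
        for prev, a, p in zip(previous, actual, predicted)
    )
    return hits / len(actual)


if __name__ == "__main__":
    previous = [10.0, 10.2, 10.1, 10.4]   # actual close on the day before each forecast
    actual = [10.2, 10.1, 10.4, 10.3]
    predicted = [10.1, 10.3, 10.2, 10.2]  # made-up forecasts
    print(direction_accuracy(previous, actual, predicted))  # 0.75
```

Scoring direction separately from the error magnitude matters because a forecast can have a small relative error and still point the wrong way on a given day.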