Exploiting Financial News and Social Media Opinions for Stock Market Analysis using MCMC Bayesian Inference
  • Authors: Manolis Maragoudakis ; Dimitrios Serpanos
  • Keywords: Stock return forecasting ; Data mining ; Hierarchical Bayesian methods ; Trading strategies
  • Journal: Computational Economics
  • Publication year: 2016
  • Publication date: April 2016
  • Volume: 47
  • Issue: 4
  • Pages: 589-622
  • Full-text size: 3,537 KB
  • Author affiliations: Manolis Maragoudakis (1)
    Dimitrios Serpanos (2)

    1. Department of Information and Communication Systems Engineering, University of the Aegean, Samos, Greece
    2. Qatar Computing Research Institute (QCRI), Doha, Qatar
  • Journal category: Business and Economics
  • Journal subject: Economics
    Economic Theory
  • Publisher: Springer Netherlands
  • ISSN: 1572-9974
Abstract
Stock market analysis using Information and Communication Technology methods is a dynamic and volatile domain. In recent years there has been an increasing focus on developing modeling tools, especially when the expected outcomes appear to yield significant profits for investors' portfolios. In today's globalized economy, the available resources are becoming steadily more plentiful and therefore more difficult to analyze with standard statistical tools. Thus far, a number of research papers have focused solely on past data from stock and bond prices and other technical indicators. In recent studies, however, prediction also draws on textual information, on the logical assumption that the course of a stock price can also be affected by news articles and perhaps by public opinions posted on various Web 2.0 platforms. Despite recent advances in Natural Language Processing and Data Mining, when data grow in both the number of records and the number of attributes, many mining algorithms face significant difficulties, resulting in poor forecasting ability. The aim of this study is to propose a potential answer to this problem by considering a Markov Chain Monte Carlo Bayesian Inference approach, which estimates conditional probability distributions in structures obtained from a Tree-Augmented Naïve Bayes algorithm. The novelty of this study rests on the observation that technical analysis captures the event and not the cause of a change, while textual data may explain that cause. The paper takes into account a large number of technical indices, accompanied by features extracted through a text mining methodology from financial news articles and opinions posted on different social media platforms. Previous research has demonstrated that, due to the high dimensionality and sparseness of such data, the majority of widespread Data Mining algorithms suffer from either convergence or accuracy problems.
Results acquired from the experimental phase, including a virtual trading experiment, are promising. Because it is tedious for a human investor to read all the daily news about a company along with other financial information, a prediction system that analyzes such textual resources and relates them to price movements over future time frames is valuable.
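
The abstract does not describe the sampler itself, so the following is only a minimal sketch of the general idea of estimating one conditional probability entry by MCMC, not the authors' implementation. It assumes a single binary entry such as P(price up | a given news feature is present), a Beta prior, and a random-walk Metropolis sampler; the function names, the counts (30 "up" days out of 50) and the step size are all hypothetical.

```python
import math
import random


def log_posterior(theta, successes, trials, alpha=1.0, beta=1.0):
    """Unnormalised log posterior of a Bernoulli parameter with a Beta(alpha, beta) prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # outside the support: zero posterior density
    return ((successes + alpha - 1.0) * math.log(theta)
            + (trials - successes + beta - 1.0) * math.log(1.0 - theta))


def metropolis_cpt_entry(successes, trials, n_samples=20000, burn_in=2000,
                         step=0.1, seed=0):
    """Draw posterior samples of one conditional probability with random-walk Metropolis."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, successes, trials)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = theta + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        candidate = log_posterior(proposal, successes, trials)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            samples.append(theta)
    return samples


if __name__ == "__main__":
    # Hypothetical counts: the feature was present on 50 days, 30 of them "up" days.
    draws = metropolis_cpt_entry(successes=30, trials=50)
    print("posterior mean estimate:", sum(draws) / len(draws))
```

For this conjugate toy case the posterior is available in closed form (Beta(31, 21) under the uniform prior assumed here), which makes it easy to check the sampler; MCMC becomes useful when the entries of a network are tied together by a hierarchical prior and no closed form exists.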
