Exploiting Financial News and Social Media Opinions for Stock Market Analysis using MCMC Bayesian Inference
Abstract
Stock market analysis using Information and Communication Technology methods is a dynamic and volatile domain. In recent years, increasing attention has been paid to the development of modeling tools, especially when the expected outcomes promise significant profits for investors’ portfolios. In today's globalized economy, the available resources are becoming steadily more plentiful and thus more difficult to analyze with standard statistical tools. So far, a number of research papers have focused solely on past data from stock and bond prices and other technical indicators. In recent studies, however, prediction is also based on textual information, on the logical assumption that the course of a stock price can also be affected by news articles and perhaps by public opinions posted on various Web 2.0 platforms. Despite recent advances in Natural Language Processing and Data Mining, when data grow in both the number of records and the number of attributes, many mining algorithms face significant difficulties, resulting in poor forecasting ability. The aim of this study is to propose a potential answer to this problem by considering a Markov Chain Monte Carlo Bayesian Inference approach, which estimates conditional probability distributions in structures obtained from a Tree-Augmented Naïve Bayes algorithm. The novelty of this study rests on the observation that technical analysis captures the event and not the cause of the change, while textual data may explain that cause. The paper takes into account a large number of technical indices, together with features extracted by a text mining methodology from financial news articles and opinions posted on different social media platforms. Previous research has demonstrated that, due to the high dimensionality and sparseness of such data, the majority of widely used Data Mining algorithms suffer from either convergence or accuracy problems.
Results from the experimental phase, including a virtual trading experiment, are promising. Because it is tedious for a human investor to read all the daily news concerning a company along with other financial information, a prediction system that can analyze such textual resources and relate them to price movements over future time frames is valuable.
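The abstract does not describe how the Markov Chain Monte Carlo step is implemented. As a rough illustration only, the sketch below shows how a single conditional probability, for example the chance that a price rises given that one binary feature is present, could be estimated with a random-walk Metropolis sampler. The function names, the uniform prior, the binomial likelihood, and the counts used are all assumptions made for this example and are not taken from the paper.

```python
import math
import random


def log_posterior(theta, ups, total):
    """Unnormalized log posterior of theta = P(price up | feature present).

    Assumes a uniform prior on (0, 1) and a binomial likelihood
    (illustrative choices, not the paper's model).
    """
    if not 0.0 < theta < 1.0:
        return -math.inf
    return ups * math.log(theta) + (total - ups) * math.log(1.0 - theta)


def metropolis(ups, total, n_samples=20000, burn_in=2000, step=0.1, seed=0):
    """Random-walk Metropolis sampler for one conditional-probability entry."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, ups, total)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, ups, total)
        # The Gaussian proposal is symmetric, so the acceptance probability
        # is min(1, posterior ratio); proposals outside (0, 1) get probability 0.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            samples.append(theta)
    return samples


if __name__ == "__main__":
    # Hypothetical counts: the price rose on 30 of the 50 days the feature was present.
    draws = metropolis(ups=30, total=50)
    print(sum(draws) / len(draws))
```

With a uniform prior the exact posterior is Beta(ups + 1, total - ups + 1), so the sample mean can be checked against (ups + 1) / (total + 2). In a Tree-Augmented Naïve Bayes structure, one such estimate would be needed for every combination of a node's parent values, which is where sampling becomes more useful than in this one-parameter case.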