Abstract
Investors are often said to be driven by emotions, and studies in sentiment analysis claim a causal relationship between negative affect in text and prices in financial markets. The text collections used in these studies vary in size and source, with little justification of their design criteria. This is a classic data engineering problem, which requires specifying the data sources and designing the data repositories and retrieval facilities. In this paper, we explore the statistical properties of negative affect expressed in various textual corpora that differ in specification, size and provenance. We ask whether there are any stylized facts of negative affect that hold universally across texts. We report two main findings: (1) the frequency distribution of negative terms is generally stable across different corpus sizes, and (2) negative terms account for a relatively small proportion of the total terms in a corpus.