If all you have is a hammer… natural language processing in financial markets

Posted by Joachim Gassen - Nov 25, 2018

The amount of corporate information is increasing exponentially, and most of it is non-numerical data such as texts, images and video. Regulatory innovations in financial and non-financial reporting require corporations to provide rich information not only on their financial activities but also on their corporate governance and their environmental and social activities. Information provided by financial analysts, the media and users of social networks adds to the mix.

How does this new informational landscape shape corporate transparency? Are financial markets still capable of sifting through all this data and pricing firms correctly? How do institutional investors deal with these questions? Is regulatory intervention needed? These were the motivating questions that led the Center for Financial Reporting and Auditing at ESMT Berlin to organize a one-day workshop on natural language processing in financial markets.

Based on two keynotes, a panel discussion and three research paper presentations, the short answer to the above questions is “We don’t know yet”. While Steven Young from Lancaster University clearly communicated the fascinating potential of exploring textual data via statistical methods such as machine learning, he also stressed that these methods of automated data processing complement, but will never fully replace, careful reading of reports by humans. He also indicated that the literature on this topic is still in its infancy. While methodological researchers develop increasingly advanced tools, they tend to ignore the conceptual and institutional details of the analyzed texts. Applied researchers, in contrast, often stick to simple “bag of words” approaches and thus do not exploit the full richness of the data.

The presentation by Nicolas Pröllochs from the University of Oxford, which had a methodological focus, was a perfect example of this observation. Nicolas introduced a refined training method for sentiment analysis at the sentence level, a task that is almost impossible with traditional approaches. He presented a use case based on ad hoc filings by German firms, and the discussion of his findings indicated that including more information about the nature of these regulatory filings would most likely have improved the predictive ability of the model.

The presentations by Peter Wysocki (Boston University) and Beatriz García Osma (Universidad Carlos III de Madrid) stressed the importance of balancing information from textual analysis with the remaining information environment. Peter talked about the link between text and numbers. As financial reporting texts are anchored in numbers, it is surprising that prior literature has rarely studied this link. The balance of numbers and narratives in financial texts appears to be very important when assessing their content, sentiment and market effects. Beatriz presented evidence highlighting that conventional U.S.-based natural language processing studies tend to ignore a sizable fraction of the market because they require linking accounting and stock market data with textual disclosures. This sample selection process systematically affects the findings of prior literature. Beatriz and the workshop audience discussed what characterizes the firms that prior literature has largely overlooked.

Given that the academic speakers clearly identified both the potential of natural language processing and significant shortcomings of the current academic literature in this field, the workshop audience was interested to hear how the investment community assesses the usefulness of natural language processing and related research for investment decisions. Ryan LaFond (Deputy CIO, Algert Global LLC) was well positioned to discuss these issues, as he worked in academia for several years before successfully transitioning to the investment industry. Ryan indicated that the investment community uses natural language processing to generate trading signals but that its methods tend to differ from those used in the academic literature. He emphasized that the key success factors are data quality and the analysts’ detailed understanding of the data. Relatedly, he argued that a substantial body of the existing empirical capital markets literature does not replicate out of sample and is thus of little use for practical investment purposes. From an investment viewpoint, the challenge is to identify high-quality information signals that reliably add to the quantitative information already available. On a slightly tongue-in-cheek note, he concluded that “the only players that currently make reliable money out of NLP are the data vendors”.

In the concluding panel discussion, the presenters and the audience briefly touched on practical issues of regulation and on how corporations should deal with these methodological innovations. There was general agreement that the key regulatory role is to establish open, low-friction access to financial and non-financial disclosures. From a corporate and consulting perspective, participants wondered where a strategic response by corporations and information intermediaries would lead. If these actors start to optimize their disclosures and reports with machine learning algorithms in mind, it remains to be seen how this will interact with algorithmic outcomes and, ultimately, with the attractiveness of corporate disclosures for human readers.

BTW: The CFRA at ESMT regularly hosts accounting-related events that are generally open to the public. If you are interested in receiving invitations, please send an email to: cfra@esmt.org.