In this article, we’re going to explore Natural Language Processing / NLP applications in finance.
Just a quick recap though, recall that we said that NLP is just a set of techniques which help us gain insight from text data.
NLP Applications in Finance
NLP applications in finance are quite wide in their scope. However, we can broadly categorise them into three different types:
- Context,
- Compliance, and
- Quantitative Analysis
Let’s now explore NLP Applications in Finance within each of these three categories.
NLP Applications in Context
Applications in Context are quite extensive in and of themselves. We will discuss 4 key areas, but remember that there’s a whole other world of applications out there.
Exploring what text “moves the market”
Applications of NLP in Context include, for instance, exploring what type of text moves the financial market.
For example, what sort of text causes the prices of stocks to increase, or decrease?
Back in 2004, Antweiler and Frank explored whether or not the messages on internet investment forums actually matter. You’ve probably come across these forums on “the internet”.
If you haven’t come across them personally, you might well have heard of these forums where you usually have a lot of people participating; many claiming to be “gurus” or the sort of “Gods of finance” if you like.
The ones who claim to know how to predict stocks for instance. Or predict the movement of the market day to day.
Antweiler and Frank (2004) looked to study whether those forum messages actually mattered. So when people say XYZ stock is going to increase tomorrow; and if you’ve got enough people saying that, does that actually cause the price of XYZ Plc to increase?
What they found, arguably unsurprisingly, is that the messages on the forums don’t quite matter! They don’t have any impact as far as the returns of securities go.
Obviously, the case of the GME stock contradicts this. However, the GME short squeeze is the exception rather than the rule.
What is interesting, however, is that they found some evidence to show that the internet investment forums do have some sort of an effect on volatility. That is, on the risk of securities. They can in fact, increase the volatility of securities.
Other studies have looked at the impact of news on securities. This ranges from comparing good news with bad news, to comparing macroeconomic news with firm-level news, for example.
But what they ultimately try to do is evaluate what sort of text has an impact on either securities individually, collectively, or at a macro level.
This Article features concepts that are covered extensively in our course on Investment Analysis with Natural Language Processing (NLP).
If you’re interested in learning how to leverage the power of text data for investment analysis while working with real world data, you should definitely check out the course.
Extracting themes / topics from large samples
The other area of Context includes extracting themes or topics from news articles.
In a relatively recent study, Bybee et al. (2019) took about 800,000 articles from the Wall Street Journal and applied Topic Modelling (or unsupervised machine learning algorithms) to try and deduce themes or topics from within those news articles.
As a result of that, they went from 800,000 newspaper articles (where it’s difficult to figure out what’s going on) to about 180 unique themes or topics.
And thanks to that, we can kind of get a feel for what the news articles are talking about or discussing every single month.
So in this case, it’s really about using NLP to bring some sort of method into the madness; or order into the chaos.
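To make the idea a little more concrete, here is a minimal sketch of topic modelling on a toy corpus. It uses a small collapsed Gibbs sampler for Latent Dirichlet Allocation written in plain Python. That is an illustrative choice on our part, and not necessarily the estimation approach used by Bybee et al. The headlines, the function names `fit_lda` and `top_words`, and the hyperparameters `alpha` and `beta` are all assumptions made up for this example.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA. `docs` is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for w in doc:
            k = rng.randrange(n_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic given all the other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """Words most often assigned to each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


# Made-up headlines, already lower-cased and stripped of stop words.
headlines = [
    "oil prices surge opec supply cut",
    "opec oil output supply falls",
    "central bank raises interest rates inflation",
    "inflation data interest rates bank decision",
    "oil supply shock lifts prices",
    "bank signals rates pause inflation cools",
]
docs = [h.split() for h in headlines]
vocab, doc_topic, topic_word = fit_lda(docs, n_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"topic {k}: {words}")
```

With a real corpus you would more likely reach for an established library than hand-roll the sampler, but the principle is the same: each article ends up described by a handful of topic weights rather than by thousands of individual words.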
Classifying firms into relevant industries dynamically
The other application of Context includes classifying firms into relevant industries, dynamically.
Traditionally, firms are usually classified into one industry, and they tend to remain in that industry indefinitely.
And this approach has existed for about a century, if not longer.
The problem of course, is that the pace of change and the pace of development is significantly faster now than it was say, a hundred years ago.
And we’ve moved away from the times where firms operated in just one single industry. Today, firms tend to operate in multiple industries.
The case for multiple industries
Think about firms like Coca-Cola and Pepsi. You would think that they operate in the beverages industry. But actually, they oftentimes also have food products like crisps (or other unhealthy snacks).
And so they’re not just in the drinks or beverages industry; they’re also in the food industry.
Now, of course, you could argue that they are just in the food and beverage industry. They’re not in some sort of automobile industry; and you’d be right!
But if you now think about a firm like say, Apple. Or Google (now Alphabet) for instance. Then of course they operate in the tech industry, but they also operate in hardware.
So they create phones, MacBooks, notebooks, and laptops; they run Apple Stores and the Google Play store… we could go on. But the point is that it’s not one single industry.
And arguably more importantly, industries can change over time.
And so what Hoberg and Phillips (2010, 2016) did was to look at the business descriptions of firms and then use that text to try and classify firms into industries more dynamically.
The idea is that firms are talking about their industries in their business descriptions or annual reports, which are available every year.
And if you were to look at the text inside those annual reports or business descriptions, over a period of time, then you can get a time varying industry classification.
Of course this is only really possible because we’re now able to analyse these vast volumes of text data programmatically.
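As a rough illustration of how text can drive a time-varying industry classification, the sketch below compares firms’ business descriptions year by year and finds each firm’s closest peer. It uses cosine similarity between simple bag-of-words vectors, which is one common way to compare texts. Treat that as an assumption for this example rather than a description of Hoberg and Phillips’ exact procedure. The firm names, descriptions, and the helper `closest_peer` are hypothetical.

```python
import math
from collections import Counter


def word_vector(text):
    """Bag-of-words counts for a lower-cased, whitespace-tokenised text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_peer(descriptions):
    """For each firm, return the other firm with the most similar description."""
    vectors = {firm: word_vector(text) for firm, text in descriptions.items()}
    peers = {}
    for firm, vec in vectors.items():
        others = [f for f in vectors if f != firm]
        peers[firm] = max(others, key=lambda f: cosine_similarity(vec, vectors[f]))
    return peers


# Hypothetical business descriptions for two fiscal years.
descriptions_by_year = {
    2010: {
        "FirmA": "sells soft drinks and bottled water",
        "FirmB": "sells soft drinks and juices",
        "FirmC": "designs smartphones and laptops",
    },
    2020: {
        "FirmA": "sells smartphones laptops and streaming services",
        "FirmB": "sells soft drinks and juices",
        "FirmC": "designs smartphones and laptops",
    },
}
for year, descriptions in descriptions_by_year.items():
    print(year, closest_peer(descriptions))
```

In this toy data, FirmA’s closest peer changes between the two years because its description changes. That is the sense in which a text-based classification can move with the firm, instead of staying fixed.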
Measuring technological innovation
Thanks to advancements in NLP, it’s not limited to just classification of industries. We can also look at measuring technological innovation.
We can see, for instance, whether or not firms’ patents are value relevant. In other words, do they actually increase the value of the firm?
And if we’re trying to measure technological innovation, we don’t have to just restrict ourselves to looking at the amount firms are spending on R&D. Because that has quite a few accounting issues, depending on which Accounting Standard the firm’s using.
We can dig deeper and get a much richer measure of technological innovation by leveraging the power of text data (see Kelly et al., 2018; Balsmeier et al., 2018; and Chen et al., 2019 for examples).
Hopefully that gives you some sort of feel for NLP applications in finance, for context.
Do bear in mind that we’re just about scratching the surface here. And there are of course, several other NLP Applications in Finance for Context.
But let’s move on for now, and look at the applications for compliance.
NLP Applications in Compliance
Applications in Compliance, at least today, appear to be more prevalent in the ‘real world’ or in the practitioner world. This is particularly prevalent within large firms.
There’s not a lot of academic research that’s gone into this, relative to the amount that’s gone into say, Context.
Of course, there is research in compliance as well. The applications we found particularly interesting are those involving the detection and prevention of insider trading, fraud detection and the reduction of internal threats, and meeting regulatory requirements.
In fact, this last part (meeting regulatory requirements) is also looking at applying Natural Language Generation, and not just Natural Language Processing.
Natural Language Generation (as the name sort of suggests) involves generating texts programmatically. In other words, it involves using artificial intelligence to create text automatically.
You might well have heard of instances where the artificial intelligence is creating news articles or writing out stories automatically for instance.
And so we’re probably not too far away from artificial intelligence or AI writing out annual reports automatically. Or even other regulatory text documents for example.
But for the most part, at least today, the applications for Compliance are focused on detecting and preventing insider trading, or breaches of compliance, including fraud and internal threats.
We’d encourage you to take a look at the “reg tech” firm Behavox, which is doing some pretty interesting things by applying NLP in Compliance for a variety of firms, for example.
For now though, let’s explore applications in quantitative analysis.
NLP Applications in Quantitative Analysis
Similar to applications in Context, there’s a wealth of research exploring applications in quantitative analysis. We explore and survey 3 core areas.
Hedging / reducing risk
Applications here have included identifying and then hedging or reducing risks, including economic shocks, uncertainty, and climate change. See, for example, Birz and Lott (2011); Alexopoulos and Cohen (2015); Manela and Moreira (2017); Caporale et al. (2018); and Engle et al. (2020).
While all of these studies are looking at different types of risks and reducing these different types of risks, they’re all united in their approach of working with text data.
And in using some state-of-the-art NLP techniques to identify and quantify these risks.
For the most part, when people are looking to quantify these macro level risks using text data, they tend to work with news articles as the source of the text data.
And of course that’s intuitive.
Because news is likely your best source of what’s going on in the world. It’s not the only source by any means; but it’s certainly one of the richer and more diverse sources of text out there.
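A simple way to picture a news-based risk measure is to count, month by month, the share of articles that mention uncertainty-related words. The sketch below does just that. The term list, the headlines, and the function name `uncertainty_index` are assumptions for illustration only; the studies cited above use far richer data and methods.

```python
from collections import defaultdict

# Illustrative term list; an assumption for this sketch, not a published dictionary.
UNCERTAINTY_TERMS = {"uncertain", "uncertainty", "risk", "risks", "fear", "fears"}


def uncertainty_index(articles):
    """Share of articles per month that mention at least one uncertainty term."""
    total = defaultdict(int)
    flagged = defaultdict(int)
    for month, text in articles:
        tokens = set(text.lower().split())
        total[month] += 1
        if tokens & UNCERTAINTY_TERMS:
            flagged[month] += 1
    return {month: flagged[month] / total[month] for month in sorted(total)}


# Made-up (month, headline) pairs.
articles = [
    ("2020-01", "markets calm as earnings beat forecasts"),
    ("2020-01", "trade deal signed after long talks"),
    ("2020-03", "uncertainty grips markets as outbreak spreads"),
    ("2020-03", "investors fear recession risks"),
    ("2020-03", "central bank cuts rates"),
]
print(uncertainty_index(articles))
```

The resulting monthly series can then be treated like any other risk factor, for instance by checking how asset returns co-move with it.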
Apart from looking at news articles and estimating macro level risks, people have also looked at measuring things like the readability and indeed usefulness of disclosure.
Quantifying readability and usefulness of disclosure
Firms all around the world spend so much time, effort and money, and indeed other resources to create these reports. Be they annual reports, or quarterly reports, or corporate social responsibility (CSR) reports.
There’s so much information that firms publish.
In a related article, we saw that the average annual report, at least in 2013, stood at about 42,000 words. And that number is not getting any smaller.
But is it actually useful?
It is interesting to at least explore whether all of these large volumes of text are actually meaningful.
And it turns out, actually, they are!
Because what the research has found is that firms that disclose more – so firms that provide more information or are more transparent with their information, or the volume of information that they disclose – those firms tend to perform better.
Investors naturally tend to like firms that disclose more, or are more transparent.
It turns out that those firms also tend to be more profitable compared to firms that disclose less.
And so this is just another example of how, thanks to NLP, we’re able to firstly, measure something like readability. And then link it to something like profitability to analyse whether firms that disclose more, or have more readable financial statements, are more profitable vis-a-vis those firms that disclose less (or have less readable financial statements or annual reports).
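To give a flavour of what “measuring readability” can look like in practice, here is a minimal sketch of the Gunning Fog index, one widely used readability score. Choosing the Fog index is our assumption for this example; the article does not commit to a particular measure. The syllable counter is a crude vowel-group heuristic, and the two sample texts are made up.

```python
import re


def count_syllables(word):
    """Rough syllable count: number of vowel groups, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fog_index(text):
    """Gunning Fog index: 0.4 * (words per sentence + 100 * share of complex words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    # "Complex" words are approximated as those with three or more syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))


plain = "We sold more goods. Costs fell. Profit rose."
dense = (
    "Notwithstanding unfavourable macroeconomic developments, consolidated "
    "profitability improved considerably following operational restructuring."
)
print(round(fog_index(plain), 2), round(fog_index(dense), 2))
```

A higher score means harder-to-read text. Once every report has a score like this, it can be lined up against profitability or returns in the usual way.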
But it doesn’t just stop at that. Because we can actually create trading strategies using sentiment analysis for instance.
Creating trading strategies
The research has found that sentiment can matter. Things like the tone in which firms communicate in their annual reports, as well as in their conference / earnings calls. So whether they’re communicating more positively or more negatively, or with greater uncertainty, etc.
The idea with sentiment analysis based trading strategies is firstly, to measure the sentiment and then use that measure of sentiment to create trading strategies.
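Those two steps can be sketched as follows: score each firm’s text for tone using word lists, then go long the most positive firms and short the most negative. The word lists here are tiny and purely illustrative (they are not a published dictionary), and the firm names, excerpts, and the helpers `tone` and `long_short` are hypothetical.

```python
import re

# Tiny illustrative word lists; assumptions for this sketch, not a published dictionary.
POSITIVE = {"growth", "strong", "improved", "record", "profit"}
NEGATIVE = {"loss", "decline", "weak", "impairment", "litigation"}


def tone(text):
    """Net tone: (positive - negative) / total words, or 0.0 for empty text."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def long_short(texts, n=1):
    """Go long the n most positive firms and short the n most negative."""
    ranked = sorted(texts, key=lambda firm: tone(texts[firm]), reverse=True)
    return {"long": ranked[:n], "short": ranked[-n:]}


# Hypothetical excerpts from earnings calls.
calls = {
    "FirmA": "record profit and strong growth this quarter",
    "FirmB": "revenue was flat and margins were stable",
    "FirmC": "a weak quarter with a loss and an impairment charge",
}
print(long_short(calls))
```

This is only the signal-construction step. Whether such a signal actually earns anything is an empirical question that has to be tested on real data.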
Of course, as with any trading strategy, it’s crucial to statistically test and validate the core investment thesis. Failure to do this can of course be catastrophic. And that’s one of the main reasons we dive deep into precisely how you can create, test, and validate your investment thesis or idea, especially while working with unstructured text data, in this course.
But as far as this article goes, hopefully you’ve had a fairly good insight into the NLP applications in Finance in Context, Compliance, and Quantitative Analysis.
Importantly, NLP Applications in Finance are still relatively new. Furthermore, we’re really only just getting started.
We’re therefore particularly pleased that you’re here; joining us on this journey of working with text data in finance. And using NLP in finance, because it is in fact a fascinating new world, or a new realm within finance.
It allows us to explore some fascinating questions, including for instance, whether “positive” firms outperform “negative” firms. Or whether firms with “less uncertainty” are less risky compared to firms with “more uncertainty”.
2 Parts to NLP Questions in Finance
There are always two parts to these questions.
There’s the implicit or subtle part, which is, well, firstly, how do you define a “positive firm” or a “negative firm”, or a “less uncertain” or “more uncertain” firm?
And then there’s the empirical finance question or the investment analysis question, which is, do these firms outperform the other ones?
So do positive firms outperform negative ones? Do firms with less uncertainty outperform those with more uncertainty? Do firms’ Corporate Social Responsibility (CSR) policies matter?
Can we create some sort of a trading strategy based on firms’ CSR policies? Or even more broadly, do narcissistic CEOs outperform their modest counterparts?
Now, we’re not going to be able to answer all of these questions here, or even in several articles for that matter.
Because the questions are quite diverse and extensive; and they can require working with completely different sets of data.
But as diverse as all of these questions are, the methodologies used in exploring and answering these questions rigorously are in fact quite similar and fairly standardised.
In other words, the overarching approach and the principles are fairly consistent, even though the questions being posed are significantly different.
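As a rough sketch of that shared approach, the example below takes a text-based score for each firm (part one), splits firms into a high and a low group, and then compares the groups’ average returns with Welch’s t-statistic (part two). The scores and returns are made up, the helper names are hypothetical, and a median split with a t-statistic is just one simple choice among many.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for the difference in means of two samples."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def compare_groups(scores, returns):
    """Split firms at the median text score, then compare average returns.

    Needs at least four firms so that each group has two or more observations.
    With an odd number of firms, the middle firm is left out.
    """
    ranked = sorted(scores, key=scores.get)
    half = len(ranked) // 2
    low = [returns[f] for f in ranked[:half]]
    high = [returns[f] for f in ranked[-half:]]
    return mean(high) - mean(low), welch_t(high, low)


# Made-up text scores (for example, tone) and subsequent returns for six firms.
scores = {"A": 0.30, "B": 0.10, "C": -0.20, "D": 0.25, "E": -0.05, "F": -0.15}
returns = {"A": 0.04, "B": 0.01, "C": -0.02, "D": 0.03, "E": 0.00, "F": -0.01}
spread, t_stat = compare_groups(scores, returns)
print(f"high-minus-low return: {spread:.4f}, t-statistic: {t_stat:.2f}")
```

Swap in a different score (uncertainty, readability, CSR emphasis) and the second half of the code stays exactly the same, which is the point being made above.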
All right. Hopefully that’s inspired you to keep an open mind and look out for more NLP Applications in Finance.
Alexopoulos, M., Cohen, J., 2015. The power of print: Uncertainty shocks, markets, and the economy. International Review of Economics & Finance 40, 8–28. https://doi.org/10.1016/j.iref.2015.02.002
Antweiler, W., Frank, M.Z., 2004. Is all that talk just noise? The information content of internet stock message boards. The Journal of Finance 59, 1259–1294.
Balsmeier, B., Assaf, M., Chesebro, T., Fierro, G., Johnson, K., Johnson, S., Li, G.-C., Lück, S., O’Reagan, D., Yeh, B., others, 2018. Machine learning and natural language processing on the patent corpus: Data, tools, and new measures. Journal of Economics & Management Strategy 27, 535–553.
Birz, G., Lott Jr, J.R., 2011. The effect of macroeconomic news on stock returns: New evidence from newspaper coverage. Journal of Banking & Finance 35, 2791–2800.
Boudoukh, J., Feldman, R., Kogan, S., Richardson, M., 2013. Which News Moves Stock Prices? A Textual Analysis 46.
Bybee, L., Kelly, B., Manela, A., Xiu, D., 2019. The Structure of Economic News 53.
Caporale, G.M., Spagnolo, F., Spagnolo, N., 2018. Macro news and bond yield spreads in the euro area. The European Journal of Finance 24, 114–134.
Chen, M.A., Wu, Q., Yang, B., 2019. How valuable is FinTech innovation? The Review of Financial Studies 32, 2062–2106.
Davis, A.K., Tama-Sweet, I., 2012. Managers’ use of language across alternative disclosure outlets: earnings press releases versus MD&A. Contemporary Accounting Research 29, 804–837.
Demers, E., Vega, C., 2011. Linguistic tone in earnings announcements: News or noise. FRB International Finance Discussion Paper 951.
Doran, J.S., Peterson, D.R., Price, S.M., 2012. Earnings conference call content and stock price: the case of REITs. The Journal of Real Estate Finance and Economics 45, 402–434.
Durnev, A., Mangen, C., 2011. The real effects of disclosure tone: Evidence from restatements. Available at SSRN 1650003.
Engelberg, J., 2008. Costly information processing: Evidence from earnings announcements, in: AFA 2009 San Francisco Meetings Paper.
Engelberg, J.E., Reed, A.V., Ringgenberg, M.C., 2012. How are shorts informed?: Short sellers, news, and information processing. Journal of Financial Economics 105, 260–278.
Engle, R.F., Giglio, S., Kelly, B., Lee, H., Stroebel, J., 2020. Hedging climate change news. The Review of Financial Studies 33, 1184–1216.
Feldman, R., Govindaraj, S., Livnat, J., Segal, B., 2010. Management’s tone change, post earnings announcement drift and accruals. Review of Accounting Studies 15, 915–953. https://doi.org/10.1007/s11142-009-9111-x
Feldman, R., Govindaraj, S., Livnat, J., Segal, B., 2008. The incremental information content of tone change in management discussion and analysis.
Ferguson, N.J., Philip, D., Lam, H., Guo, J.M., 2015. Media content and stock returns: The predictive power of press. Multinational Finance Journal 19, 1–31.
Hoberg, G., Phillips, G., 2016. Text-based network industries and endogenous product differentiation. Journal of Political Economy 124, 1423–1465.
Hoberg, G., Phillips, G., 2010. Product Market Synergies and Competition in Mergers and Acquisitions: A Text-Based Analysis. The Review of Financial Studies 23, 3773–3811. https://doi.org/10.1093/rfs/hhq053
Kelly, B., Papanikolaou, D., Seru, A., Taddy, M., 2018. Measuring technological innovation over the long run. National Bureau of Economic Research.
Li, F., 2006. Do stock market investors understand the risk sentiment of corporate annual reports? Available at SSRN 898181.
Loughran, T., McDonald, B., 2014. Measuring readability in financial disclosures. The Journal of Finance 69, 1643–1671.
Manela, A., Moreira, A., 2017. News implied volatility and disaster concerns. Journal of Financial Economics 123, 137–162.
Sinha, N.R., 2016. Underreaction to news in the US stock market. Quarterly Journal of Finance 6, 1650005.
Tetlock, P.C., 2007. Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance 62, 1139–1168.
Tetlock, P.C., Saar-Tsechansky, M., Macskassy, S., 2008. More than words: Quantifying language to measure firms’ fundamentals. The Journal of Finance 63, 1437–1467.
Twedt, B., Rees, L., 2012. Reading between the lines: An empirical examination of qualitative attributes of financial analysts’ reports. Journal of Accounting and Public Policy 31, 1–21.