In this article, we’re going to get an introductory overview of Natural Language Processing (NLP) for Finance.
So let’s get into it.
What is Natural Language Processing (NLP)?
Firstly, what is NLP?
Well, ultimately it’s just a set of techniques which help us gain insights from text data.
Or, for that matter, from any other type of language data; for instance, voice.
The idea is to use this set of techniques to gain insights, or more broadly to gain value, from language data.
And for the most part in Finance, at least today, when we think about language data, we typically work with text data.
But it wasn’t always like this in finance.
NLP for Finance – A Brief History
Historically, academics and practitioners in finance have largely relied on numerical data for investment analysis.
And this ranges from something as simple as ratios to more advanced portfolio optimisation techniques.
But regardless of which aspect of finance you look at, be it investment analysis, financial modelling, financial analysis, or capital budgeting, people have for the most part worked with structured numerical data.
Text Data in Finance
Now, this wasn’t because we didn’t have a lot of text data in finance. Far from it.
In fact, finance has so much text data, that few fields can actually compete with that sort of volume.
Predominantly relying on numerical data instead of text data was largely because analysing these large volumes of text data was extremely time consuming and cumbersome.
Large sizes of unstructured content
To give you just a minuscule idea of the sheer scale of text data that’s available in finance…
Back in 2015, the Wall Street Journal reported that the average annual report or 10-K had about 42,000 words.
And this was in 2013.
That was up from roughly 30,000 words in 2000.
To put this in perspective, consider the Sarbanes-Oxley Act of 2002, the really massive piece of legislation that came about as a result of scandals like Enron and WorldCom and the other corporate scandals of the dot-com era...
Well, that massive piece of legislation had approximately 32,000 words!
Annual reports, which firms have to publish every single year, had about 42,000 words on average, at least back in 2013.
And they are not really getting any smaller today.
Importantly, of course, if you’re thinking 42,000 words is not all that much; this is just an average.
So you’ll find plenty of annual reports that have hundreds of thousands of words.
And of course you will find some annual reports that have tens of thousands of words.
But the point is that this is for a single annual report.
And listed firms need to publish these annual reports every single year!
So just take a single firm and, say you’re looking at 10 years worth of data. And the average number of words is 42,000.
Well, you have 420,000 words to analyse now.
So good luck if you’re doing that manually!
I wouldn’t be keen and, quite frankly, very few people would be.
And this is why until fairly recently, these really massive volumes of text data in finance, which have potentially so much value in them, were just left untouched.
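The back-of-the-envelope arithmetic above can be sketched in a few lines of Python. This is only an illustration: the 42,000-word average and the 10-year window come from the article, while the reading speed of 250 words per minute and the `count_words` helper are assumptions added here to make the scale tangible.

```python
AVG_WORDS_PER_REPORT = 42_000  # average annual report length cited above (2013)
YEARS = 10                     # ten years of reports for a single firm
READING_SPEED_WPM = 250        # ASSUMPTION: words read per minute, not from the article


def count_words(text: str) -> int:
    """Count whitespace-separated tokens; a crude but common word count."""
    return len(text.split())


# Total reading load for one firm, and the hours it would take at the assumed speed.
total_words = AVG_WORDS_PER_REPORT * YEARS
reading_hours = total_words / READING_SPEED_WPM / 60

sample_count = count_words("Annual reports are long and full of jargon")

print(f"Words to analyse: {total_words:,}")
print(f"Approximate reading time: {reading_hours:.0f} hours")
print(f"Words in the sample sentence: {sample_count}")
```

Under that assumed reading speed, a single firm's reports already take more than a full day of uninterrupted reading, before any actual analysis happens.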
Of course, the size isn’t the only factor that meant people weren’t analysing these reports.
For instance, the CFO of GE, Jeffrey Bornstein, was taken aback by the sheer size of their own annual report!
Their annual report was about 110,000 words long. And he himself suggested that not a single retail investor on earth could get through it, let alone understand it.
And in terms of this latter part here, “understanding these annual reports”: that’s ultimately because annual reports tend to have a lot of technical jargon that not a lot of people actually understand.
And this is not limited to just retail investors.
Although mutual fund managers and hedge fund managers and pension fund managers may not openly admit it…
Not all of them necessarily understand what all these annual reports are on about.
Because sometimes they just have terms that one might not have come across.
Why use NLP for Finance?
The point is, academics and practitioners didn’t really work with text data in finance, despite there being so much of it. That was partly because of the technical jargon involved, but largely because of the sheer size of this alternative data.
Which meant of course, analysing all of this text data manually was simply not feasible.
Fortunately, though, thanks to major advancements in NLP technology, and in computational linguistics in particular, it’s now significantly easier to analyse insanely large volumes of text data, the so-called “Big Data”.
But it’s about more than just analysing this text data. It’s ultimately about gaining actionable insights or value from it.
Applications of NLP for Finance
And if we think about the applications of NLP for Finance… they’re fairly extensive.
They’re certainly increasing.
And I think, with time, they’re only going to get bigger and better.
Specifically though, while the applications of NLP for Finance are fairly wide in their scope, we think we can broadly categorise them into three different types.
Applications in Context
The first of which is Context.
This is about using NLP techniques to try and gain context from text data in finance.
For example, it’s a case of using Topic Modelling algorithms to try and establish the context of news articles or firm announcements, business descriptions, annual reports, and a whole host of other “Big Data” or “Big Text Data” in Finance.
It’s a case of using these machine learning algorithms in unsupervised settings to try and establish the themes or topics that are being discussed or talked about in these various different kinds of text data.
So that’s context.
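To make the idea of unsupervised topic discovery a little more concrete, here is a minimal sketch in plain Python. The article does not say which Topic Modelling algorithm to use, so this example assumes Non-negative Matrix Factorisation (NMF) with multiplicative updates, which is one common choice among several (Latent Dirichlet Allocation is another). The four headlines, the stopword list, and all function names are invented for illustration; in practice you would most likely reach for an established library rather than hand-rolled matrix code.

```python
import random
from collections import Counter

# Toy headlines, invented purely for illustration.
docs = [
    "bank raises interest rates as inflation climbs",
    "central bank holds interest rates amid inflation fears",
    "oil prices jump as crude supply tightens",
    "crude oil supply cut lifts energy prices",
]
STOPWORDS = {"as", "amid"}  # hypothetical, deliberately tiny


def tokenize(text):
    return [w for w in text.lower().split() if w not in STOPWORDS]


tokens = [tokenize(d) for d in docs]
vocab = sorted({w for doc in tokens for w in doc})
# Document-term matrix V: one row per document, one column per vocabulary word.
V = [[counts[w] for w in vocab] for counts in map(Counter, tokens)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Approximate V by W @ H with non-negative W (docs x topics) and H (topics x words)."""
    rng = random.Random(seed)
    W = [[rng.random() + eps for _ in range(n_topics)] for _ in range(len(V))]
    H = [[rng.random() + eps for _ in range(len(V[0]))] for _ in range(n_topics)]
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def top_words(H, vocab, n=3):
    """Return the n highest-weighted vocabulary words for each topic."""
    return [[w for _, w in sorted(zip(row, vocab), reverse=True)[:n]] for row in H]


W, H = nmf(V, n_topics=2)
for i, words in enumerate(top_words(H, vocab)):
    print(f"Topic {i}: {', '.join(words)}")
```

Nobody tells the algorithm what the topics are. It only sees word counts, and the groupings it prints are whatever structure it finds in them, which is what "unsupervised" means here.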
Applications in Compliance
Then there’s Compliance, which focuses on things like detecting insider trading, or detecting and preventing fraud.
And it’s doing so using unique sets of data; for instance, emails or indeed chat transcripts inside firms.
Generally speaking, applications in Compliance will require internal data sources instead of external ones like transcripts, for example.
Applications in Quantitative Analysis
And lastly, there’s the case of applications in Quantitative Analysis.
For instance, one major application involves creating trading strategies using sentiment.
This involves firstly estimating the sentiment that firms may display, using text data like annual reports, transcripts, posts, etc.
And then using that sentiment to create trading strategies.
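The two steps just described, estimating sentiment and then turning it into a trading signal, can be sketched as follows. This is a deliberately simple dictionary-based approach and not a recommended strategy. The word lists, the firm names, and the filing snippets are all hypothetical; real work would typically use a much larger, finance-specific dictionary (the Loughran-McDonald word lists are a well-known example) or a trained model.

```python
# HYPOTHETICAL word lists, far too small for real use.
POSITIVE = {"growth", "strong", "improved", "profit", "record"}
NEGATIVE = {"loss", "decline", "weak", "impairment", "litigation"}


def sentiment_score(text: str) -> float:
    """Net sentiment in [-1, 1]: (positive - negative) / (positive + negative)."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


# Invented snippets standing in for annual report text.
filings = {
    "FIRM_A": "Record profit and strong growth across all segments.",
    "FIRM_B": "Revenue was flat; margins improved despite a small loss overseas.",
    "FIRM_C": "Weak demand led to a decline in sales and an impairment charge.",
}

# Step 1: estimate sentiment per firm.
scores = {firm: sentiment_score(text) for firm, text in filings.items()}

# Step 2: turn sentiment into a toy long-short signal:
# +1 (long) for the most positive firm, -1 (short) for the most negative, 0 otherwise.
ranked = sorted(scores, key=scores.get, reverse=True)
signals = {firm: 0 for firm in ranked}
signals[ranked[0]] = 1
signals[ranked[-1]] = -1

for firm in ranked:
    print(f"{firm}: score={scores[firm]:+.2f}, signal={signals[firm]:+d}")
```

Ranking firms against each other, instead of using a fixed threshold, is just one possible design choice. It keeps the toy portfolio balanced between long and short positions regardless of whether the overall tone of the text is upbeat or gloomy.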
Your biggest takeaway from this article should be that Natural Language Processing (NLP) allows us to really leverage the power of text data and work on interesting problems in Finance.