In this article, we’re going to get an overview of sentiment analysis in finance.
Just a quick recap, though: recall that we said Natural Language Processing (NLP) is essentially a set of techniques that help us gain insights from text (or indeed any other language) data.
What is Sentiment Analysis?
Sentiment analysis, at least in finance, essentially involves quantifying and exploiting sentiment – or emotions – for some sort of objective.
The objective could be for investment purposes. Or for understanding firm performance, including profitability or cash flow management, the likelihood of fraud, etc.
Ultimately though, it’s about quantifying sentiment, and exploiting it by linking it to firms.
So it’s a case of firstly estimating the level of sentiment that a firm may have, and then linking that sentiment to other attributes or characteristics of firms. After that, we can explore whether there’s some sort of relationship between sentiment and other firm characteristics.
This holds regardless of which sentiment measure you end up using.
For example, one could link the “positivity” of firms’ annual reports to firms’ profitability. This would allow us to explore whether, for instance, positivity has some relationship with profitability.
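As a minimal sketch of what “linking” sentiment to a firm characteristic could look like, the snippet below computes a Pearson correlation between a positivity score and a profit margin. The data, the variable names and the choice of correlation as the measure of “relationship” are all assumptions made for illustration, not a prescribed method.

```python
from math import sqrt

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical data: one positivity score and one profit margin per firm.
positivity = [0.10, 0.25, 0.40, 0.55, 0.70]
profit_margin = [0.02, 0.05, 0.04, 0.09, 0.11]

print(round(pearson_correlation(positivity, profit_margin), 3))
```

A value close to +1 or -1 would hint at a strong linear relationship, but on its own it says nothing about statistical significance or causality.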
What is sentiment in the context of sentiment analysis?
Now, if we think about what we mean by sentiment itself, it can include things like:
- positive sentiment, and
- negative sentiment
It could also include things like “uncertainty”, “narcissism”, “anxiety”, or “panic”, to name a few.
Essentially, it includes all of the things that you might think about when you think of the word sentiment.
It’s just that, rather than thinking about it in terms of sentiment of humans, we’re thinking about it in terms of sentiment of firms.
For example, sentiment can include the level of positivity of firms, or the level of negative tone of firms.
It could also be about the amount of uncertainty that’s being portrayed by firms. Or whether a firm’s CEO is narcissistic or humble. Or whether a firm is anxious about its future prospects, or even more broadly, whether an economy is currently facing some sort of panic.
It’s very much about the sentiment or emotion that you know and are familiar with as a human.
It’s just that we’re applying it either to the firm level context, or to an aggregate macro level context.
And rather than “feeling” that sentiment / emotion, it’s a case of assigning a sentiment score to a firm and using that in the analysis.
Applying Sentiment Analysis
In terms of how we actually go about using sentiment once we’ve estimated it, it’s a case of first identifying whether or not the overall sentiment matters.
If a specific type of sentiment does in fact matter, then we can use it to create some sort of trading strategy, for example.
But importantly, when we’re trying to identify whether sentiment matters, we’re not talking about whether we think sentiment matters or whether you think sentiment matters.
It’s not about personal opinions or subjective thought, or subjective debate.
It’s about gaining insight by letting the data determine whether or not sentiment matters.
Data driven validation
In other words, it’s about using statistics to identify whether sentiment actually matters.
And then, if it does matter (statistically), we can use that sentiment estimate to create a trading strategy.
For instance, if the returns of “more positive firms” are greater than those of “less positive firms”, then we can invest more in firms with stronger positivity and short, or sell, the shares of firms with weaker positivity, for example.
Strictly, if the returns of more positive firms are statistically greater than the returns of less positive firms, then we could create a trading strategy that goes long in “more positive firms” and shorts “less positive firms”.
And again, the distinction between, or the classification of, “more positive” and “less positive” is ultimately a function of the sentiment score assigned to each firm.
The same would naturally hold for a “more” versus “less” classification of any other type of sentiment, for example.
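The long / short idea above can be sketched in a few lines. This is an illustration under stated assumptions: the firm names and scores are made up, and splitting firms around the median score is just one possible classification rule.

```python
from statistics import median

def long_short_portfolio(scores):
    """Split firms into long and short legs around the median sentiment score.

    `scores` maps a firm name to its (already estimated) sentiment score.
    Firms above the median go in the long leg, firms below go in the short
    leg, and firms exactly at the median are left out.
    """
    cutoff = median(scores.values())
    long_leg = sorted(f for f, s in scores.items() if s > cutoff)
    short_leg = sorted(f for f, s in scores.items() if s < cutoff)
    return long_leg, short_leg

# Hypothetical positivity scores for five made-up firms.
scores = {"AAA": 0.62, "BBB": 0.15, "CCC": 0.48, "DDD": 0.30, "EEE": 0.71}
long_leg, short_leg = long_short_portfolio(scores)
print("Long: ", long_leg)
print("Short:", short_leg)
```

Such a split only makes sense as a strategy once the data has shown that the sentiment measure actually matters.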
Similarly, if you find for instance, that firms with narcissistic CEOs do better than firms with humble CEOs, then you can invest in firms that are led by narcissistic CEOs and short firms that are led by humble CEOs.
Or indeed, vice-versa, depending on what the data shows. So if, for instance, you find that humble CEOs on average tend to outperform narcissistic CEOs, then you could invest in firms that are led by humble CEOs and short firms that are led by narcissistic CEOs.
It’s not just about measuring sentiment
At this stage, don’t worry about how you actually go about measuring sentiment.
It’s just important that you understand that it’s not just about measuring sentiment.
It’s also about validating whether or not the sentiment actually matters.
And only then going into things like whether we can exploit it by creating a trading strategy.
Crucially then, the idea is to start with some sort of notion, or a premise, or an idea, or more formally, what’s called a testable hypothesis.
And then test or validate that hypothesis, to see whether or not it holds.
This approach applies to any scientific analysis, not just sentiment analysis. The best thing you can do is to start with a testable hypothesis and then test and validate whether or not that hypothesis holds.
In other words, let the hypothesis drive your sentiment analysis process.
The Fervent 5 Step Sentiment Analysis System
We think this is so important, that we actually created what we call a 5 Step Sentiment Analysis System to guide you along the way.
But we do want to be clear. Although we’ve created this 5 Step System, it’s important to know that the real world isn’t necessarily always so systematic and organised.
Things can in fact get messy.
But if we were to think about some sort of systematic approach to sentiment analysis, here’s what a 5 Step System might look like.
Step 1: Start with a Testable Hypothesis
You want to start with perhaps the most important part – create a testable hypothesis.
If you’re not familiar with what a “testable hypothesis” is, in a nutshell, you can just think of it as a formalised version of an idea or notion that you’re looking to statistically test.
It’s just a way of formalising your beliefs or what you think might be true. And expressing it in a way that you can test empirically.
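As a sketch, the earlier example (whether more positive firms earn higher returns than less positive firms) could be formalised as a null and an alternative hypothesis. The notation is an assumption made here for illustration: R denotes the return of each group of firms and E the expected value.

```latex
H_0:\ \mathbb{E}[R_{\text{more positive}}] = \mathbb{E}[R_{\text{less positive}}]
\qquad \text{versus} \qquad
H_1:\ \mathbb{E}[R_{\text{more positive}}] > \mathbb{E}[R_{\text{less positive}}]
```

The data then either gives us enough evidence to reject the null hypothesis, or it doesn’t.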
Step 2: Extract Relevant Data
Once you have a testable hypothesis, you can then think about extracting relevant data.
And in the context of sentiment analysis, because we tend to work with text data for the most part, the relevant data is some sort of Corpus.
Just in case you’re not familiar with what a “Corpus” is: a Corpus is essentially the entire sample of text data that you’re going to be working with.
Hypothesis driven choice
And of course, to determine whether the data is relevant, you really want to go back to Step 1, and let the hypothesis drive that choice.
So for example, if your hypothesis is that more positive firms outperform less positive firms, then the data that’s most relevant is some sort of firm level data.
It might be annual reports for instance. Or it might be interview transcripts of the management or the CEO. It might even be a tweet on Twitter, or other social media posts and updates by firms.
But the bottom line is that it’s some sort of firm level data. It wouldn’t make sense to work with macro aggregate level data if you’re trying to gain an insight into whether the positivity of firms matters.
If on the other hand, your hypothesis is something about whether or not the economy is in a state of panic, then it’s unlikely that the most relevant data is firm level data.
Because we’re now thinking about things in aggregate terms.
And so a large sample of news articles would probably be a better and more relevant source of insight than firm level annual reports, for instance.
You might also argue that general / generic social media posts make a fairly good data source.
For example, you could use some sort of / to gain a proxy for the macroeconomic effect via the posts.
Thus again, the key takeaway is that you want to let the hypothesis drive the choice of data that you work with.
Step 3: Clean Your Data
Now, once you have your relevant data, it’s a case of cleaning that text data.
And the importance of this particular step really cannot be emphasised enough.
There’s a term called “GIGO”, which is quite important in computer science / data science. It stands for Garbage In Garbage Out.
So GIGO, or Garbage In Garbage Out, is essentially saying that if your data is garbage (if it’s not clean, not usable, or fundamentally flawed), then it doesn’t matter how good or how sophisticated your sentiment model is.
The results from your text analysis will almost certainly be garbage.
If the input is garbage, then the output is garbage. Garbage In Garbage Out. GIGO.
And so again, we really can’t stress enough the importance of cleaning the text data for any sort of text analysis.
As a result of cleaning the text data, the additional incremental benefit is that you really get to know and fully understand the data that you’re working with.
This in turn will allow you to conduct better, and richer text analytics. And gain much deeper insights from your data.
That’s something which is imperative to conducting any sort of half decent analysis.
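To make the cleaning step a little more concrete, here is a minimal sketch of one possible cleaning routine. The specific choices (stripping HTML-like tags, lower-casing, keeping letters only, splitting on whitespace) are assumptions for illustration; real financial text often needs more careful handling, and dropping numbers in particular may not suit every hypothesis.

```python
import re

def clean_text(raw):
    """Strip HTML-like tags, lower-case, drop non-letters, and tokenise."""
    text = re.sub(r"<[^>]+>", " ", raw)      # drop HTML-like tags
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)    # keep letters and whitespace only
    return text.split()                      # split on any run of whitespace

sample = "<p>Revenue GREW by 12% in 2023, despite   uncertainty.</p>"
print(clean_text(sample))
```

Printing the tokens for a handful of documents is also a simple way of getting to know the data you’re working with.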
Okay. Now you’ve created your testable hypothesis. You’ve let the hypothesis drive the choice of data that you work with. And then you’ve obsessively cleaned the data.
The next thing you can do is perhaps the most fun part. And in the context of sentiment analysis, it’s actually estimating the sentiment.
Step 4: Estimate Sentiment
It’s at this stage where you can actually quantify things like the positivity of firms, or negativity, or uncertainty, or indeed any other type of sentiment or emotion.
How do you actually estimate sentiment? Answering that is well worth an entire article.
Or in fact, several videos as part of a robust course.
We have both for you though.
This article discusses sentiment analysis in finance and talks about the two approaches to estimating sentiment including:
- a machine learning technique, and
- a sentiment lexicon / dictionary based approach
And our course on Investment Analysis with Natural Language Processing (NLP) shows you how to estimate sentiment from scratch. And a whole lot more, too.
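As a rough illustration of the lexicon / dictionary based approach mentioned above, the sketch below counts positive and negative words and scales the difference by the number of words. The word lists are tiny and made up, and the scoring formula is one simple assumption among many possible ones.

```python
# Tiny made-up word lists, purely for illustration;
# a real sentiment lexicon would be far larger.
POSITIVE_WORDS = {"growth", "profit", "strong", "improved"}
NEGATIVE_WORDS = {"loss", "decline", "weak", "litigation"}

def lexicon_sentiment(tokens):
    """Return (positive count - negative count) / total tokens, or 0.0 if empty."""
    if not tokens:
        return 0.0
    positive = sum(1 for t in tokens if t in POSITIVE_WORDS)
    negative = sum(1 for t in tokens if t in NEGATIVE_WORDS)
    return (positive - negative) / len(tokens)

tokens = "strong growth offset a small loss this year".split()
print(lexicon_sentiment(tokens))
```

A score above zero suggests a net positive tone under this particular lexicon, and a score below zero a net negative one.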
Now, once you’ve got this estimate of sentiment for a firm, or a set of firms, or indeed the aggregate economy, you finally test and validate the original hypothesis.
Step 5: Test & Validate the Hypothesis
Now that you’ve got a measure of sentiment, you can empirically or statistically test and validate whether or not that measure of sentiment matters.
And once you’ve validated the sentiment measure, then – and only then – should you proceed to creating a trading or investment strategy.
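One way to sketch the validation step is to compare the average returns of the two groups of firms with a two-sample test. The example below computes Welch’s t-statistic using only the standard library. The return figures are hypothetical, and the choice of Welch’s test is an assumption for illustration rather than the only valid approach.

```python
from math import sqrt
from statistics import mean, variance

def welch_t_statistic(sample_a, sample_b):
    """Welch's t-statistic for the difference in means of two samples."""
    if len(sample_a) < 2 or len(sample_b) < 2:
        raise ValueError("each sample needs at least 2 observations")
    standard_error = sqrt(variance(sample_a) / len(sample_a)
                          + variance(sample_b) / len(sample_b))
    if standard_error == 0:
        raise ValueError("t-statistic is undefined when both samples are constant")
    return (mean(sample_a) - mean(sample_b)) / standard_error

# Hypothetical monthly returns for the two groups of firms.
more_positive = [0.021, 0.015, 0.030, 0.008, 0.019, 0.025]
less_positive = [0.010, 0.004, 0.017, -0.003, 0.012, 0.006]

t_stat = welch_t_statistic(more_positive, less_positive)
print(f"t-statistic: {t_stat:.2f}")
```

Turning the statistic into a p-value needs the t-distribution, which the standard library doesn’t provide; in practice, something like `scipy.stats.ttest_ind(a, b, equal_var=False)` would return both.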
If any part of this article is not quite clear, especially the part about the five step process to conducting sentiment analysis, then please do read it again.
For now, though, just a quick summary.
We learned that sentiment analysis, at least in finance, essentially involves quantifying and then exploiting sentiment for some sort of investment objective.
The fundamental idea of sentiment analysis is to start with a testable hypothesis. A hypothesis on whether or not some type of sentiment matters. Then statistically test and validate that core hypothesis, before moving on to exploit it in a trading or investment strategy.
Lastly, of course, we talked about the 5 Step Sentiment Analysis Process.
It’s a systematic approach, or the ideal approach, you could use to conduct sentiment analysis in a rigorous and robust manner.
Importantly though, do remember that the real world isn’t quite so systematic or organised.
The real world is in fact chaotic. And that’s the beauty of working in the real world. That’s the beauty of working with real world data.
It’s chaotic, it’s messy, it’s exciting.
There’s a lot of uncertainty; there’s a lot of unknowns.
And you really do want to embrace and enjoy the chaos. But at the same time, you want to have some sort of order.
Thus, although conducting real-world analysis is quite messy, it’s important to have some sort of idea as to where exactly you are in the overall sentiment analysis process. Because otherwise, it’s akin to just running around like headless chickens!