Sentiment analysis has become a very popular big data application in recent years. Consider, for example, the hundreds of millions of tweets posted every day. As a result, organizations can listen to the voice of their consumers like never before. Manufacturers of consumer goods, from electronics to food products, can measure the popularity of their brands versus those of their competitors by simply counting the number of positive versus negative comments they find on forums like Twitter, Facebook, email, wherever. They even convert the voice they hear in a call center to text and then measure the amount of positive versus negative sentiment. Most often the sentiment is negative, because people don't say many positive things about brands; but when they do, that usually means the brand is doing very well. So you can build that bias in. Either way, this has become an extremely popular application in the big data world, so let's see how it works using Bayesian machine learning.

Think about a few comments that you might have posted on the forum. I haven't used real comments; I made these up, obviously. There aren't that many comments on the forum; I wish there were. Be that as it may, think about a comment like this one, which is positive. There may be lots of these, but there may also be some which are very negative, and so on. Suppose we were able to manually label each comment as positive or negative. For this we obviously use human judgment, not some automated technique. This is called the training phase. After that, could we figure out, using Naive Bayes, whether a new comment is positive or negative? Let's see how to do that.

First, we need to compute the a priori probability, that is, the overall chance that a comment is positive, which is simply the number of positive comments divided by the total number of comments. In this case there are 8,000 comments in total, 6,000 of them positive. Similarly, the a priori probability of a comment being negative is the number of negative comments divided by the total.

Then we need to compute the likelihoods, for example, the probability that the word "like" occurs within the positive comments. Of the 6,000 positive comments, "like" occurs in only 2,000 of them, so we get this likelihood. The probability that "enjoy" occurs amongst the positive comments can be computed the same way. Now, notice that there are no comments which are positive but include the word "hate". We can't put zero for this, because that would make the formula go completely out of whack, so we replace such zeros by a low count, like one. This is called smoothing in the Naive Bayes classifier, and it is important because we have to include all the likelihood probabilities.

Now we do the same thing for the negative comments. The probability that "hate" occurs amongst negative comments is 800 out of the 2,000 negative comments, because those comments contain "hate". The probability that "war" occurs is 1/2,000, and "like", again by smoothing, is 1/2,000, and so on. Note also that the probability of "enjoy", which might be thought of as a positive word, amongst the negative comments is not that small. I will come back to this phenomenon later; the reason is that "enjoy" occurs alongside a negative term, and similarly with "lot". For the moment, we simply factor in all the likelihood probabilities by looking at the words and their occurrences.
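To make the training phase concrete, here is a minimal sketch in Python. It assumes the counts mentioned above (8,000 comments, 6,000 positive and 2,000 negative; "like" in 2,000 positive comments; "hate" in 800 negative comments; "war" in one negative comment); the counts for "enjoy" and the variable names are invented purely for illustration and are not from the lecture.

```python
# A minimal sketch of the training phase (assumed counts, not the lecture's real data).

# Number of labeled comments per class, as in the example: 8,000 total.
n_pos, n_neg = 6000, 2000
total = n_pos + n_neg

# A priori probabilities.
prior_pos = n_pos / total   # 6000 / 8000 = 0.75
prior_neg = n_neg / total   # 2000 / 8000 = 0.25

# How many comments of each class contain each bold word.
# "hate" never appears in a positive comment; counts for "enjoy" are made up.
pos_counts = {"like": 2000, "enjoy": 1500, "hate": 0,   "war": 0}
neg_counts = {"like": 0,    "enjoy": 500,  "hate": 800, "war": 1}

def likelihood(count, n_class):
    """P(word occurs | class); a zero count is replaced by 1 (smoothing)."""
    return max(count, 1) / n_class

pos_likelihood = {w: likelihood(c, n_pos) for w, c in pos_counts.items()}
neg_likelihood = {w: likelihood(c, n_neg) for w, c in neg_counts.items()}

print(pos_likelihood)   # P("like" | positive) = 2000/6000, P("hate" | positive) = 1/6000
print(neg_likelihood)   # P("hate" | negative) = 800/2000, P("like" | negative) = 1/2000
```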
And we'll worry about handling things like "not" next week, or rather the week afterwards. For now, we're only going to consider the words marked in bold, and for these words the likelihood probabilities look like this. We can compute the probability that "like" occurs amongst all the positive comments easily, and so on, and the same thing is done for the negative comments. You can work this out for yourself; it's a good exercise.

Now, faced with a new tweet, say "I really liked this simple course a lot", something we haven't seen before, we can compute the likelihood ratio. The numerator is simply the probability of "like" occurring, because "like" happens to occur in this tweet, given that the comment is positive; everything in the numerator is conditioned on the comment being positive. For "hate", which does not occur, we compute the probability that "hate" does not occur given that the sentiment is positive, by taking one minus the likelihood. Clearly "hate" can either be there or not there, so if it's not there, the probability of not-"hate" given positive is one minus the probability of "hate" given positive. We include every one of the bold words we have considered; even for those which don't occur in this tweet, we include their probabilities by taking one minus the likelihood. Lastly, we multiply by the a priori probability. The denominator is computed similarly, conditioned on the comment being negative. We get a likelihood ratio of 0.026 over a very small number, 0.00005, which is very much larger than one. So the system can easily label this tweet as being positive without ever having seen it before. This is an example of a machine having learned to identify which tweets are positive and which are negative, based on historical data, using the Naive Bayes classifier.
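Continuing from the training sketch above, here is one way the new tweet could be scored. The likelihood ratio multiplies, for each bold word, either its likelihood (if the word occurs in the tweet) or one minus its likelihood (if it does not), and then multiplies by the a priori probability of each class. Because some of the counts above were invented, the resulting numbers will differ from the lecture's 0.026 and 0.00005; the structure of the calculation is the point.

```python
def class_score(tweet_words, likelihoods, prior):
    """Multiply P(word | class) for words present in the tweet and
    1 - P(word | class) for vocabulary words that are absent, times the prior."""
    score = prior
    for word, p in likelihoods.items():
        score *= p if word in tweet_words else (1.0 - p)
    return score

# The new tweet, reduced to a set of words ("liked" treated as "like").
tweet = {"i", "really", "like", "this", "simple", "course", "a", "lot"}

pos_score = class_score(tweet, pos_likelihood, prior_pos)   # numerator
neg_score = class_score(tweet, neg_likelihood, prior_neg)   # denominator

ratio = pos_score / neg_score
print("likelihood ratio:", ratio)
print("label:", "positive" if ratio > 1 else "negative")
```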