Sentiment analysis has become a very popular big data application in recent years. Consider, for example, the hundreds of millions of tweets posted every day. As a result, organizations can listen to the voice of their consumers like never before. Manufacturers of consumer goods, from electronics to food products, can measure the popularity of their brands versus those of their competitors by simply counting the number of positive versus negative comments they find on forums like Twitter, Facebook, email, wherever. They even convert the voice they hear in a call center to text and then measure the amount of positive versus negative sentiment. Most often the sentiment is negative, because people don't say many positive things about brands; but when they do, that usually means the brand is doing very well. So you can build that bias in. Either way, this has become an extremely popular application in the big data world, so let's see how it works using Bayesian machine learning.

Think about a few comments that you might have posted on the forum. I haven't used real comments; I made these up, obviously. There aren't that many comments on the forum; I wish there were. Be that as it may, think about a comment like this one, which is positive. There may be lots of these, but there may also be some which are very negative, and so on. Suppose we were able to manually label each comment as positive or negative. For this we obviously use human judgment, not some automated technique. This is called the training phase. After that, could we figure out, using Naive Bayes, whether a new comment is positive or negative? Let's see how to do that.

First, we need to compute the a priori probability, that is, the overall chance that a comment is positive, which is simply the number of positive comments divided by the total number of comments. In this case there are 8,000 comments in total, 6,000 of them positive. Similarly, the a priori probability of a comment being negative is the number of negative comments divided by the total.

Then we need to compute the likelihoods, for example, the probability that the word "like" occurs within the positive comments. Of the 6,000 positive comments, "like" occurs in only 2,000 of them, so we get this likelihood. The probability that "enjoy" occurs amongst the positive comments can be computed the same way. Now, notice that there are no comments which are positive but include the word "hate". We can't put zero for this, because that would make the formula go completely out of whack, so we replace such zeros by a low count, like one. This is called smoothing in the Naive Bayes classifier, and it is important because we have to include all the likelihood probabilities.

Now we do the same thing for the negative comments. The probability that "hate" occurs amongst negative comments is 800 out of the 2,000 negative comments, because those comments contain "hate". The probability that "war" occurs is 1/2,000, and "like", again by smoothing, is 1/2,000, and so on. Note also that the probability of "enjoy", which might be thought of as a positive word, amongst the negative comments is not that small. I will come back to this phenomenon later; the reason is that "enjoy" occurs alongside a negative term, and similarly with "lot". For the moment, we simply factor in all the likelihood probabilities by looking at the words and their occurrences.
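To make the training phase concrete, here is a minimal sketch in Python. It assumes the counts mentioned above (8,000 comments, 6,000 positive and 2,000 negative; "like" in 2,000 positive comments; "hate" in 800 negative comments; "war" in one negative comment); the counts for "enjoy" and the variable names are invented purely for illustration and are not from the lecture.

```python
# A minimal sketch of the training phase (assumed counts, not the lecture's real data).

# Number of labeled comments per class, as in the example: 8,000 total.
n_pos, n_neg = 6000, 2000
total = n_pos + n_neg

# A priori probabilities.
prior_pos = n_pos / total   # 6000 / 8000 = 0.75
prior_neg = n_neg / total   # 2000 / 8000 = 0.25

# How many comments of each class contain each bold word.
# "hate" never appears in a positive comment; counts for "enjoy" are made up.
pos_counts = {"like": 2000, "enjoy": 1500, "hate": 0,   "war": 0}
neg_counts = {"like": 0,    "enjoy": 500,  "hate": 800, "war": 1}

def likelihood(count, n_class):
    """P(word occurs | class); a zero count is replaced by 1 (smoothing)."""
    return max(count, 1) / n_class

pos_likelihood = {w: likelihood(c, n_pos) for w, c in pos_counts.items()}
neg_likelihood = {w: likelihood(c, n_neg) for w, c in neg_counts.items()}

print(pos_likelihood)   # P("like" | positive) = 2000/6000, P("hate" | positive) = 1/6000
print(neg_likelihood)   # P("hate" | negative) = 800/2000, P("like" | negative) = 1/2000
```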
And we'll worry about handling things like "not" next week, or rather the week afterwards. For now, we're only going to consider the words marked in bold, and for these words the likelihood probabilities look like this. We can compute the probability that "like" occurs amongst all the positive comments easily, and so on, and the same thing is done for the negative comments. You can work this out for yourself; it's a good exercise.

Now, faced with a new tweet, say "I really liked this simple course a lot", something we haven't seen before, we can compute the likelihood ratio. The numerator is simply the probability of "like" occurring, because "like" happens to occur in this tweet, given that the comment is positive; everything in the numerator is conditioned on the comment being positive. For "hate", which does not occur, we compute the probability that "hate" does not occur given that the sentiment is positive, by taking one minus the likelihood. Clearly "hate" can either be there or not there, so if it's not there, the probability of not-"hate" given positive is one minus the probability of "hate" given positive. We include every one of the bold words we have considered; even for those which don't occur in this tweet, we include their probabilities by taking one minus the likelihood. Lastly, we multiply by the a priori probability. The denominator is computed similarly, conditioned on the comment being negative. We get a likelihood ratio of 0.026 over a very small number, 0.00005, which is very much larger than one. So the system can easily label this tweet as being positive without ever having seen it before. This is an example of a machine having learned to identify which tweets are positive and which are negative, based on historical data, using the Naive Bayes classifier.
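Continuing from the training sketch above, here is one way the new tweet could be scored. The likelihood ratio multiplies, for each bold word, either its likelihood (if the word occurs in the tweet) or one minus its likelihood (if it does not), and then multiplies by the a priori probability of each class. Because some of the counts above were invented, the resulting numbers will differ from the lecture's 0.026 and 0.00005; the structure of the calculation is the point.

```python
def class_score(tweet_words, likelihoods, prior):
    """Multiply P(word | class) for words present in the tweet and
    1 - P(word | class) for vocabulary words that are absent, times the prior."""
    score = prior
    for word, p in likelihoods.items():
        score *= p if word in tweet_words else (1.0 - p)
    return score

# The new tweet, reduced to a set of words ("liked" treated as "like").
tweet = {"i", "really", "like", "this", "simple", "course", "a", "lot"}

pos_score = class_score(tweet, pos_likelihood, prior_pos)   # numerator
neg_score = class_score(tweet, neg_likelihood, prior_neg)   # denominator

ratio = pos_score / neg_score
print("likelihood ratio:", ratio)
print("label:", "positive" if ratio > 1 else "negative")
```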