So, in order to compute these conditional probabilities when there are a large number of possible words or features, we need to do a little more work.

Let's look at the simple case of just one word, say "red". There are two possibilities: "red" could be present or absent. So, for R of our cases "red" is present. Out of these, I cases are also B, a "buy" situation, and the rest are not. The total number of cases that are "buy" situations is, as before, K.

So let's see what the simple conditional probabilities are in this case. The probability of a B given that there is an R is just I over R. On the other hand, the probability of an R overall is just R over the total number N. And the probability that both R occurs and there is a B is I over N. Now, this is the joint probability, so you have to divide by the total number of instances N rather than by either of the conditioning counts K or R.

Bayes' rule is actually just simple arithmetic. In particular, if you write I over N as I over R times R over N, let's see what you get. Well, we already know what I over R is: it's just the conditional probability of a B given R. And R over N is the probability of R itself. So simple arithmetic tells us that the joint probability of R and B is the product of the conditional probability and the a priori probability. This is just Bayes' rule: the probability of B and R is the conditional probability of B given R times P of R, which is also the same thing as the probability of R given B times the probability of B. We can see that by rewriting I over N, not by introducing an R but by introducing a K, in which case you get I over K, which is just the probability of R out of all those that are B, times K over N, which is just the probability of B.

Bayes' rule, which some of you may or may not remember, turns out to be just simple arithmetic. As we shall soon see, Bayes' rule is critical to machine learning because it allows us to compute any of those many, many joint probabilities even if there is no data for a particular combination.
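To make the counting argument concrete, here is a minimal Python sketch; the counts N, R, K, and I below are made-up illustrative values, not numbers from the lecture:

    # Hypothetical counts: N queries in total, R contain "red",
    # K are "buy" situations, and I are both "red" and "buy".
    N = 100
    R = 40
    K = 25
    I = 10

    p_B_given_R = I / R   # P(B | R)
    p_R_given_B = I / K   # P(R | B)
    p_R = R / N           # P(R)
    p_B = K / N           # P(B)
    p_joint = I / N       # P(B and R)

    # Bayes' rule as simple arithmetic: both factorizations of I/N agree.
    assert abs(p_joint - p_B_given_R * p_R) < 1e-12
    assert abs(p_joint - p_R_given_B * p_B) < 1e-12
    print(p_joint)  # 0.1 either way

The point of the assertions is exactly the lecture's claim: I/N can be split as (I/R)(R/N) or as (I/K)(K/N), and both products give the same joint probability.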
Before we can do machine learning using Bayesian techniques, we need one more important concept, and that is independence. Think about two words, like "red" and "cheap". As before, we have red equal to yes for R queries and cheap equal to yes for C cases, and in I cases both keywords are present. The probability of an R occurring is clearly R over N, and the probability of "cheap" occurring is C over N, as before. Similarly, the conditional probability of an R given that C occurs can be computed. Independence says that the probability of R does not depend on whether or not the word "cheap" is already present in the query. In other words, the probability of R should be the same as the probability of R given C. Similarly, the probability that "cheap" occurs should not depend on whether or not "red" occurs. In such situations, these two features are independent. Of course, that might not necessarily always be the case. For example, somebody searching for "big data" might actually also search for "map reduce" at the same time, rather than for something unrelated, like "red" or "flower".
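Here is a similar minimal sketch of the independence check, again with made-up counts, chosen here so that the two words come out independent:

    # Hypothetical counts: N queries, R contain "red", C contain "cheap",
    # and I contain both words.
    N = 100
    R = 40
    C = 30
    I = 12

    p_R = R / N            # P(red)
    p_R_given_C = I / C    # P(red | cheap)

    # Independence: P(red | cheap) equals P(red), or equivalently
    # P(red and cheap) equals P(red) * P(cheap).
    print(p_R, p_R_given_C)        # 0.4 and 0.4 -> independent
    print(I / N, p_R * (C / N))    # 0.12 and 0.12

With counts from real queries, the two printed values would generally differ, which is exactly the "big data" and "map reduce" situation: the words are not independent.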