1 00:00:00,000 --> 00:00:09,190 Let's take a look at how we might view one particular situation, the wetness of grass 2 00:00:09,190 --> 00:00:14,880 being caused by rain, in the language of probability. 3 00:00:15,340 --> 00:00:21,714 As before, when we studied Bayes' rule, we break it up into cases. 4 00:00:21,714 --> 00:00:29,631 So, for our situation, we have rain occurring in k cases out of a total of n observations 5 00:00:29,631 --> 00:00:34,714 or n days. Out of those n, in m cases we have the grass 6 00:00:34,714 --> 00:00:41,903 being wet, and in i cases we have both rain occurring as well as the 7 00:00:41,903 --> 00:00:48,014 grass being wet. So we have data which is simply W being 8 00:00:48,014 --> 00:00:55,833 yes or no, rain being yes or no, and lots and lots of data, and we can simply compute 9 00:00:55,833 --> 00:01:03,099 the probabilities that rain and wetness either occur or don't occur in the 10 00:01:03,099 --> 00:01:07,922 following way. So, the probability that it rains and the 11 00:01:07,922 --> 00:01:13,709 grass is wet is i over n. The probability that it doesn't rain but 12 00:01:13,709 --> 00:01:17,480 the grass is still wet is m minus i over n. 13 00:01:18,380 --> 00:01:26,244 That it rains and the grass is not wet is k minus i over n. And the probability that 14 00:01:26,244 --> 00:01:33,080 it neither rains nor the grass is wet is n minus m minus k plus i over n. 15 00:01:33,080 --> 00:01:39,858 Notice that we had to add in i, because these m cases included the i cases, and so did these k 16 00:01:39,858 --> 00:01:46,380 cases. So the overlap cases were counted twice, and we had to add them back 17 00:01:46,380 --> 00:01:51,786 in, since they were subtracted twice when we subtracted m and k. 18 00:01:51,786 --> 00:01:58,780 Well, a table like this is called a probability table, or also a potential. 
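As a rough illustration of the table just described, the sketch below builds the four entries from the counts n, m, k and i. The function name `joint_table` and the specific counts (n=100, m=30, k=20, i=15) are assumptions made up for this example; they do not come from the lecture.

```python
from fractions import Fraction

def joint_table(n, m, k, i):
    """Build the joint potential P(R, W), keyed by (rain, wet).

    n = total days, m = days with wet grass, k = days with rain,
    i = days with both rain and wet grass.
    """
    return {
        ("y", "y"): Fraction(i, n),              # rain and wet
        ("n", "y"): Fraction(m - i, n),          # no rain, still wet
        ("y", "n"): Fraction(k - i, n),          # rain, not wet
        ("n", "n"): Fraction(n - m - k + i, n),  # neither; i is added back
    }

# Hypothetical counts, chosen only for illustration.
table = joint_table(n=100, m=30, k=20, i=15)
for (r, w), p in table.items():
    print(f"R={r} W={w} P={p}")
```

Exact fractions are used so that the four entries can be checked to sum to exactly one.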
19 00:01:59,060 --> 00:02:03,827 Now, suppose one wanted to find out the 20 00:02:03,827 --> 00:02:11,110 probability that it rained or didn't rain. Well, one could simply do that from the 21 00:02:11,110 --> 00:02:17,929 data directly, by adding up all the cases where it rained and dividing by the 22 00:02:17,929 --> 00:02:21,907 total, and similarly adding up all the cases where it 23 00:02:21,907 --> 00:02:28,727 didn't rain and dividing by the total. Or one could work with the smaller table, where we 24 00:02:28,727 --> 00:02:35,222 had already done some of the calculations beforehand, but forget about W, that is, 25 00:02:35,222 --> 00:02:39,606 sum it out, in the sense that 26 00:02:39,606 --> 00:02:47,294 wherever we have the same value of R, we simply add up those rows regardless of 27 00:02:47,294 --> 00:02:52,832 what the value of W is. So, the first row and third row get added 28 00:02:52,832 --> 00:02:59,494 together, and the second row and the last row get added together, resulting in the 29 00:02:59,494 --> 00:03:03,951 following table, which is clearly the sum of the first and 30 00:03:03,951 --> 00:03:10,877 the third rows, and of the second and the fourth. This is also called the marginalization 31 00:03:10,877 --> 00:03:15,751 of the W column. In other words, we have gotten rid of the 32 00:03:15,751 --> 00:03:21,480 W variable by summing it out wherever the values of R are the same. 33 00:03:21,480 --> 00:03:27,513 You can easily verify that by marginalizing out W, or summing it out, one 34 00:03:27,513 --> 00:03:33,296 does indeed get k over n, which is the probability of rain being true, 35 00:03:33,296 --> 00:03:38,660 and n minus k over n, which is the probability of rain being not true. 
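The summing-out step can be sketched in a few lines. This is a minimal illustration under assumed counts (n=100, m=30, k=20, i=15, which are not from the lecture), and `marginalize_w` is a hypothetical helper name.

```python
from collections import defaultdict
from fractions import Fraction

# Joint potential P(R, W), keyed by (rain, wet).
# Hypothetical counts: n=100 days, m=30 wet, k=20 rain, i=15 both.
joint = {
    ("y", "y"): Fraction(15, 100),
    ("n", "y"): Fraction(15, 100),
    ("y", "n"): Fraction(5, 100),
    ("n", "n"): Fraction(65, 100),
}

def marginalize_w(table):
    """Sum out W: add up the rows that share the same value of R."""
    result = defaultdict(Fraction)  # Fraction() is 0
    for (r, _w), p in table.items():
        result[r] += p
    return dict(result)

p_r = marginalize_w(joint)
print(p_r)  # the entries should equal k/n and (n - k)/n
```

With these counts the first and third rows add up to k/n for rain, and the second and fourth rows add up to (n - k)/n for no rain.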
36 00:03:39,340 --> 00:03:47,469 Now notice that in the language of SQL, marginalization is equivalent to an 37 00:03:47,469 --> 00:03:54,948 aggregation on the column P. For example, in SQL, one would write it 38 00:03:54,948 --> 00:04:03,620 as follows, if you think about this as a table: select R and the sum of P from this table, 39 00:04:04,080 --> 00:04:09,264 group by R. That means wherever rows have an equal R, 40 00:04:09,264 --> 00:04:15,159 we add them all up, and in the result 41 00:04:15,159 --> 00:04:21,360 we keep only one row for each distinct value of R. 42 00:04:22,200 --> 00:04:28,458 In the language of relational algebra, this is written as follows. 43 00:04:28,458 --> 00:04:33,080 We have an aggregation operation, which is a sum, 44 00:04:33,440 --> 00:04:43,952 grouping by R, on the table R,W. It's quite interesting that one can 45 00:04:43,952 --> 00:04:49,994 perform this operation on the probability table using SQL, and we shall exploit 46 00:04:49,994 --> 00:04:54,341 this fact as we go along. So please take a careful look and 47 00:04:54,341 --> 00:04:58,393 understand this. Now, let's see how we might write Bayes' 48 00:04:58,393 --> 00:05:02,520 rule in terms of these potentials or probability tables. 49 00:05:02,520 --> 00:05:10,315 Remember that Bayes' rule stated that the joint probability of R and W together is 50 00:05:10,315 --> 00:05:17,943 the conditional probability of R given W multiplied by the probability of W itself. 51 00:05:17,943 --> 00:05:22,171 In the simplest case, of yes and yes, 52 00:05:22,171 --> 00:05:29,064 we simply write i over n as i over m times m over n. 53 00:05:29,064 --> 00:05:34,303 Simple arithmetic. 
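Written out as equations, the factorization above looks as follows for all four rows of the table (the remaining three rows follow the same pattern as the yes/yes case). This is a worked restatement using the lecture's counts n, m, k and i. The identity P(R, W) = P(R | W) P(W), which the lecture refers to as Bayes' rule, is often called the product rule.

```latex
\begin{align*}
P(R{=}y, W{=}y) &= \frac{i}{n} = \frac{i}{m}\cdot\frac{m}{n}
  = P(R{=}y \mid W{=}y)\,P(W{=}y) \\
P(R{=}n, W{=}y) &= \frac{m-i}{n} = \frac{m-i}{m}\cdot\frac{m}{n}
  = P(R{=}n \mid W{=}y)\,P(W{=}y) \\
P(R{=}y, W{=}n) &= \frac{k-i}{n} = \frac{k-i}{n-m}\cdot\frac{n-m}{n}
  = P(R{=}y \mid W{=}n)\,P(W{=}n) \\
P(R{=}n, W{=}n) &= \frac{n-m-k+i}{n} = \frac{n-m-k+i}{n-m}\cdot\frac{n-m}{n}
  = P(R{=}n \mid W{=}n)\,P(W{=}n)
\end{align*}
```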
Similarly, for every other row, we would 54 00:05:34,303 --> 00:05:41,667 simply replace the denominator m by the appropriate value, so that we get the 55 00:05:41,667 --> 00:05:49,904 conditional probability over here, and the probability of W falls out in this 56 00:05:49,904 --> 00:05:56,000 term. Verify this and remind yourself of Bayes' 57 00:05:56,000 --> 00:06:04,193 rule from the previous lecture. Now, let's notice how this multiplication 58 00:06:04,193 --> 00:06:10,741 actually takes place. Consider these as three tables, 59 00:06:10,741 --> 00:06:16,502 T0, T1, and T2. T1 has R and W, but T2 has only W. 60 00:06:16,502 --> 00:06:26,520 Now, let's see how we might multiply these two probability tables in another way. 61 00:06:26,840 --> 00:06:33,941 The probability of R given W multiplied by the probability of W is some way of 62 00:06:33,941 --> 00:06:39,902 combining these two tables. That way just happens to be the 63 00:06:39,902 --> 00:06:44,900 join of these tables in SQL on the common attribute W. 64 00:06:45,620 --> 00:06:49,658 Suppose we were to run the following SQL: 65 00:06:49,658 --> 00:06:54,787 select R and the sum of the product of P1 and P2 from these 66 00:06:54,787 --> 00:07:03,468 two tables respectively, where W1 equals W2, group by R. 67 00:07:03,468 --> 00:07:10,761 So essentially all we're saying is we're going to multiply i over m by m over n. 68 00:07:10,761 --> 00:07:15,411 Similarly, for the other case where R equals yes, 69 00:07:15,411 --> 00:07:22,646 we also multiply k minus i over n minus m by the corresponding value in this table, 70 00:07:22,646 --> 00:07:29,375 where W has the same value, no. Then we add this term and this term up, and you can 71 00:07:29,375 --> 00:07:36,021 easily verify that this gives k over n, which is the probability of rain being 72 00:07:36,021 --> 00:07:39,677 yes. 
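The join described above can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the table names `t1` and `t2`, the column names, and the counts (n=100, m=30, k=20, i=15) are assumptions chosen for the example, not the lecture's own data.

```python
import sqlite3

# Hypothetical counts: n days, m wet, k rain, i both.
n, m, k, i = 100, 30, 20, 15

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (r TEXT, w TEXT, p REAL)")  # P(R | W)
conn.execute("CREATE TABLE t2 (w TEXT, p REAL)")          # P(W)

conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", [
    ("y", "y", i / m),
    ("n", "y", (m - i) / m),
    ("y", "n", (k - i) / (n - m)),
    ("n", "n", (n - m - k + i) / (n - m)),
])
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [
    ("y", m / n),
    ("n", (n - m) / n),
])

# Join on the common attribute W, multiply the probabilities,
# then sum out W by grouping on R.
rows = conn.execute(
    "SELECT t1.r, SUM(t1.p * t2.p) "
    "FROM t1, t2 WHERE t1.w = t2.w "
    "GROUP BY t1.r ORDER BY t1.r"
).fetchall()

p_r = dict(rows)
print(p_r)  # values should be close to (n - k)/n for 'n' and k/n for 'y'
```

Because the probabilities are stored as floating-point numbers, the results match k/n and (n - k)/n only up to rounding.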
Similarly, for the case of rain being no, 73 00:07:39,677 --> 00:07:46,323 one can work out that the result of the SQL query gives the correct value, n 74 00:07:46,323 --> 00:07:50,974 minus k over n. To conclude, we can multiply probability 75 00:07:50,974 --> 00:07:58,040 tables, or rather potentials, as they are often called in the probability literature, 76 00:07:58,460 --> 00:08:05,740 using SQL. Now, let's turn to another important 77 00:08:06,140 --> 00:08:13,408 element, which is evidence. Suppose we have a joint probability 78 00:08:13,408 --> 00:08:20,318 distribution, like the probability of rain and wetness, and 79 00:08:20,318 --> 00:08:28,594 we find that the grass is actually wet. In other words, we have observed some 80 00:08:28,594 --> 00:08:33,084 evidence, and that evidence tells us the grass is wet. 81 00:08:33,084 --> 00:08:38,473 So, W equals yes. We need to restrict this table to only 82 00:08:38,473 --> 00:08:44,608 those entries where W equals yes. So, essentially we're saying, let's drop 83 00:08:44,608 --> 00:08:50,412 all the entries where W equals no and restrict the table to W equals yes. 84 00:08:50,412 --> 00:08:56,381 And that's this restriction operator, which is called the application of 85 00:08:56,381 --> 00:09:01,909 evidence to this potential. If we once again expand this restricted 86 00:09:01,909 --> 00:09:07,514 table using Bayes' rule, which is just a subset of what we had earlier, 87 00:09:07,514 --> 00:09:14,338 we get the probability of R given W equals yes, which is just a restriction of the 88 00:09:14,338 --> 00:09:20,837 overall probability of R given W, times the probability that W equals yes. 89 00:09:20,837 --> 00:09:24,980 We don't have to worry about W equals no anymore. 
90 00:09:26,760 --> 00:09:34,147 In other words, we say that the probability of R and W, restricted, is the probability of 91 00:09:34,147 --> 00:09:41,067 R given W equals yes times the probability of the evidence, which is the 92 00:09:41,067 --> 00:09:50,033 probability that W equals yes. Now, coming back to SQL, it turns out 93 00:09:50,033 --> 00:09:58,702 that applying evidence is the same thing as using the select operator on the table 94 00:09:58,702 --> 00:10:06,383 R,W, which is just this table. So what we did is multiply the R,W table by the 95 00:10:06,383 --> 00:10:11,894 restriction operator, which is the same as doing a select. 96 00:10:11,894 --> 00:10:17,792 In other words, select R, W, P from this table where W equals yes. 97 00:10:17,792 --> 00:10:25,007 That's very obvious. The a posteriori probability of R given 98 00:10:25,007 --> 00:10:34,301 evidence, which was just this, is now merely the restriction of the joint 99 00:10:34,301 --> 00:10:42,127 probability to the case W equals yes, divided by the probability of the evidence 100 00:10:42,127 --> 00:10:46,880 itself. Notice that if we did that division, we 101 00:10:46,880 --> 00:10:53,408 get exactly this table, which is exactly what we wanted: the a 102 00:10:53,408 --> 00:11:00,377 posteriori probability, the probability of R given W equals yes, which is different from the 103 00:11:00,377 --> 00:11:06,111 probability of R in general. We'll work out an example in a short 104 00:11:06,111 --> 00:11:13,788 while. But for the moment, just notice that the act of multiplying 105 00:11:13,788 --> 00:11:19,259 probability tables is just taking a join in SQL. 106 00:11:19,259 --> 00:11:26,195 The act of taking evidence, that is, observing 107 00:11:26,195 --> 00:11:34,443 one or more of the variables, is merely issuing an appropriate select statement in 108 00:11:34,443 --> 00:11:39,148 SQL. 
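Applying evidence and then dividing by the probability of the evidence can be sketched the same way with `sqlite3`. The table name `rw`, the column names, and the counts (n=100, m=30, k=20, i=15) are assumptions for illustration only.

```python
import sqlite3

# Hypothetical counts: n days, m wet, k rain, i both.
n, m, k, i = 100, 30, 20, 15

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rw (r TEXT, w TEXT, p REAL)")  # joint P(R, W)
conn.executemany("INSERT INTO rw VALUES (?, ?, ?)", [
    ("y", "y", i / n),
    ("n", "y", (m - i) / n),
    ("y", "n", (k - i) / n),
    ("n", "n", (n - m - k + i) / n),
])

# Applying the evidence W = yes is a plain SELECT with a WHERE clause.
restricted = conn.execute(
    "SELECT r, w, p FROM rw WHERE w = 'y' ORDER BY r"
).fetchall()

# The probability of the evidence is the sum of the restricted rows.
p_evidence = sum(p for _r, _w, p in restricted)

# Dividing by it gives the a posteriori probability P(R | W = yes).
posterior = {r: p / p_evidence for r, _w, p in restricted}
print(posterior)
```

With these counts the posterior probability of rain given wet grass comes out near i/m, which differs from the prior k/n obtained by marginalization.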
So, let's now take a look at what all this 109 00:11:39,148 --> 00:11:46,326 means for the case of classification, or abductive reasoning, which is what we 110 00:11:46,326 --> 00:11:52,787 were interested in in the first place. That will bring us right back to the 111 00:11:52,787 --> 00:11:59,786 language of classifiers and the naive Bayes classifier, but this time in terms 112 00:11:59,786 --> 00:12:05,080 of probability tables rather than individual probabilities.