Let's return to our Naive Bayes classifier now, with rain as the class, taking values yes or no, and the grass being wet as the only feature that we're going to measure. Suppose we have the conditional probability that the grass is wet given that there is rain. Obviously, if there is rain, the chance that the grass is wet is very high; if there's no rain, there is only a small chance that the grass is wet, and a very high chance that it is not wet. Notice again that this is a conditional probability table, so the values for a particular value of rain, say rain being yes, have to add up to one. The prior, or a priori, probability of rain, regardless of whether the grass is wet or not, is that twenty percent of the time it rains in this locality and 80% of the time it doesn't. Now, with just this one feature, Bayes' rule tells us that the joint probability of R and W can be factored in two ways: as the probability of R given W times the probability of W, or as the conditional probability of W given R times the prior probability of R. Suppose we are given some evidence: we actually observe that the grass is wet, that is, W = yes. We can condition this joint probability by restricting it to the case where W is yes. On one side we get the probability of R given W = yes times the probability that W = yes; on the other side we get the probability of W = yes given R, which is the likelihood, times the prior, which doesn't involve W at all, so we don't have to condition it. The posterior probability of R given W = yes can now be written as that right-hand side divided by the probability of the evidence. We'll write the inverse of the probability that W = yes simply as sigma: it is one over the probability of the evidence, and we'll use sigma wherever we need to refer to one over the probability of whatever evidence we are observing.
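This single-feature setup can be sketched in a few lines of Python; the variable names are illustrative, and only the numbers come from the tables in the example.

```python
# Conditional probability table P(W | R) and the prior P(R) from the
# example; keys are (w, r) pairs, names are illustrative.
p_w_given_r = {
    ("yes", "yes"): 0.9,  # grass wet given rain
    ("no",  "yes"): 0.1,
    ("yes", "no"):  0.2,  # grass wet despite no rain
    ("no",  "no"):  0.8,
}
p_r = {"yes": 0.2, "no": 0.8}  # it rains 20% of the time here

# Sanity check: for each fixed value of R, the CPT entries sum to one.
for r in ("yes", "no"):
    assert abs(sum(p_w_given_r[(w, r)] for w in ("yes", "no")) - 1.0) < 1e-9

# Observe W = yes.  Bayes' rule: P(R | W=yes) = sigma * P(W=yes | R) * P(R),
# where sigma is one over the probability of the evidence.
unnorm = {r: p_w_given_r[("yes", r)] * p_r[r] for r in p_r}
sigma = 1.0 / sum(unnorm.values())
posterior = {r: sigma * p for r, p in unnorm.items()}
print(posterior)  # belief in rain comes out to about 0.53
```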
Pardon the use of sigma for this purpose; earlier we used sigma as the selection operator, but from now on we will use sigma only as the inverse of the evidence probability. Study this carefully: what we want is the a posteriori probability of R given the evidence, which is simply proportional to the likelihood multiplied by the prior. In SQL, we can write this as selecting R and the sum of P times P from these two tables, where W = yes and R = R, because we have to make sure that we're joining the tables on the common attribute R, and finally grouping by R. This is what we stated before: how to multiply two probability tables. So the result is what we get by restricting ourselves to the rows where W is yes, multiplying the P values, and adding them up. In fact, there is nothing to add up here, since each distinct value of R appears in only one row. So we simply multiply 0.9 by 0.2 to get the first row, R = yes, and 0.2 by 0.8 to get the second row. This is the product of these two potentials, or probability tables. Now we need to normalize so that the total probability of R is one; the sum has to be one. Essentially, we take the probability of R being yes as 0.18 divided by the sum of these two values, 0.18 and 0.16, and we get 0.53, or 53%. This is the chance, or our belief, that it's raining once we see the grass being wet. The reason it's so small, when one might expect it to be higher, is that there are cases where the grass can be wet without there being any rain; in fact, twenty percent of the time the grass is wet even when there's no rain. In addition, it hardly ever rains. Combining these two things gives us only a 53% chance that it's actually raining if the grass is wet. Now, let's see what happens when we have more than one feature, as we normally do in a Bayesian classifier.
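Before adding more features, the SQL multiplication just described can be sketched with an in-memory SQLite database. The table names T1 and T3 follow the transcript's labels, but the column names are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# T1: conditional table P(W | R); T3: prior P(R).
cur.execute("CREATE TABLE t1 (w TEXT, r TEXT, p REAL)")
cur.executemany("INSERT INTO t1 VALUES (?,?,?)", [
    ("yes", "yes", 0.9), ("no", "yes", 0.1),
    ("yes", "no", 0.2), ("no", "no", 0.8),
])
cur.execute("CREATE TABLE t3 (r TEXT, p REAL)")
cur.executemany("INSERT INTO t3 VALUES (?,?)", [("yes", 0.2), ("no", 0.8)])

# Multiply the two probability tables, restricted to the evidence W = yes,
# joining on the common attribute R and grouping by R.
rows = cur.execute("""
    SELECT t1.r, SUM(t1.p * t3.p)
    FROM t1, t3
    WHERE t1.w = 'yes' AND t1.r = t3.r
    GROUP BY t1.r
""").fetchall()

unnorm = dict(rows)          # unnormalized: R=yes -> 0.18, R=no -> 0.16
total = sum(unnorm.values())
posterior = {r: p / total for r, p in unnorm.items()}
print(posterior["yes"])      # about 0.53
```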
We're going to have another feature called thunder, which says whether or not we are hearing thunder. For the moment, let's assume that we don't actually know whether we heard thunder at all. The probability of hearing thunder given that it's raining is 0.8, of not hearing it 0.2, and so on; so we have a conditional probability table for this variable thunder as well, but in this case let's assume that we haven't actually observed it. We could be asking our neighbor over the phone whether the grass is wet, and then trying to conclude whether or not it's raining there, but we didn't ask our neighbor whether they're hearing thunder. So now, the probability of R and T given W = yes, because that's all we know, is, by the same equation we had earlier, the probability of W = yes given R, times the probability of T given R, times the probability of R. This is our Bayesian formula. Again, we have the probability of the evidence over here, but this time we don't have anything that says T = yes or no. So we need to sum T out of this product if we want to know only the probability of rain. The way we sum it out is, again, with SQL: we select R and the sum of the products of all three of these P columns from these three tables, where W is yes and all the common variables match; the only common variable between these tables is R. And then we group by R. This effectively sums out T, because we're only selecting R and summing up the values for the different values of T. Now, you can verify that you could do this by first joining T1 and T3, that is, the P of W given R table and the P of R table, just like we did earlier, and then joining the result of that with the P of T given R table. So we just take the result we had earlier and join it with this new table. But notice that we get the same result, because for R = yes we multiplied this element by 0.8 and again by 0.2.
And when we sum those up, we get the same result, because 0.8 plus 0.2 is one; similarly for 0.1 and 0.9 in the R = no case, so it doesn't change anything. This is to be expected, since there is no new evidence compared to earlier: just by including something new in our diagram, we shouldn't expect to change our belief in R. Another important point: if you remember, in one of the homeworks we asked whether ignoring some of the features changes our belief. Well, it does change our belief compared to what we'd have if we had actually observed those features, but it is, in some sense, equally correct, because we simply didn't observe them. This is actually the same as a Bayesian classifier with two features of which only one is observed. You could have millions of features and observe only one or two, and you'd still get the same result just by putting them all in your classifier. The summing-out process makes sure that this always works and you don't get any wrong results. Go back to that example where we asked whether partial evidence changes things: in fact, it doesn't; it is simply as if the unobserved feature didn't exist. Of course, if we are observing the feature, it's better to include it. But by ignoring it, we're not producing a wrong result; it's just that we didn't have that feature to measure, that's all. Now, let's see what happens if we actually do have evidence about T; suppose we observe T = yes. Now we're looking for the probability of rain given that W and T are both yes. In this case, we restrict our multiplication by the P of T given R table to the rows where T = yes only. Again, we use the same SQL, only with another selection condition which restricts us to those rows of this table, joined with the prior join that we had of T1 and T3.
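Before working out the observed-thunder case just set up, the summing-out of an unobserved T described above can be sketched directly in Python; names are illustrative, numbers are from the example's tables.

```python
# CPTs from the example: P(W|R), P(T|R), and the prior P(R).
p_w_given_r = {("yes", "yes"): 0.9, ("no", "yes"): 0.1,
               ("yes", "no"): 0.2, ("no", "no"): 0.8}
p_t_given_r = {("yes", "yes"): 0.8, ("no", "yes"): 0.2,
               ("yes", "no"): 0.1, ("no", "no"): 0.9}
p_r = {"yes": 0.2, "no": 0.8}

# W = yes is observed; T is not, so it is summed out:
#   P(R=r | W=yes)  is proportional to  sum over t of
#   P(W=yes | r) * P(T=t | r) * P(r)
unnorm = {r: sum(p_w_given_r[("yes", r)] * p_t_given_r[(t, r)] * p_r[r]
                 for t in ("yes", "no"))
          for r in ("yes", "no")}
total = sum(unnorm.values())
posterior = {r: p / total for r, p in unnorm.items()}
print(posterior["yes"])  # still about 0.53 -- the unobserved feature
                         # changes nothing, since 0.8 + 0.2 = 1
```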
But with the restriction that T = yes, we now get a different result, because we multiply the first row by 0.8 and the second row by 0.1, and the other rows don't count. Normalizing gives us the probability that rain = yes given the evidence, which is now 90%. In a sense, our belief has undergone a revision from the earlier value of 53% to 90%. So new evidence has changed our belief. In classical logic, once you have asserted that, say, rain occurs or doesn't occur, you can't change that belief. But in probabilistic reasoning, belief can be revised. That's very important.
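This revision can be seen by restricting T to yes instead of summing it out; again the names are illustrative and only the numbers come from the example.

```python
# Same tables as before; now both W = yes and T = yes are observed.
p_w_given_r = {("yes", "yes"): 0.9, ("no", "yes"): 0.1,
               ("yes", "no"): 0.2, ("no", "no"): 0.8}
p_t_given_r = {("yes", "yes"): 0.8, ("no", "yes"): 0.2,
               ("yes", "no"): 0.1, ("no", "no"): 0.9}
p_r = {"yes": 0.2, "no": 0.8}

# Restrict the P(T|R) factor to its T = yes rows instead of summing out T:
unnorm = {r: p_w_given_r[("yes", r)] * p_t_given_r[("yes", r)] * p_r[r]
          for r in ("yes", "no")}
total = sum(unnorm.values())
posterior = {r: p / total for r, p in unnorm.items()}
print(posterior["yes"])  # belief revised from about 0.53 to about 0.9
```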