Let's take a look at how we might view one particular situation, the wetness of grass being caused by rain, in the language of probability. As before, when we studied Bayes' rule, we break it up into cases. So, for our situation, we have rain occurring in k cases out of a total of n observations, or n days. Out of those n observations, we have the grass being wet in m cases, and we have both rain occurring and the grass being wet in i cases. So we have data which is simply w being yes or no and rain being yes or no, lots and lots of it, and we can compute the probabilities that rain and wetness either occur or don't occur in the following way. The probability that it rains and the grass is wet is i over n. The probability that it doesn't rain but the grass is still wet is m minus i over n. The probability that it rains and the grass is not wet is k minus i over n. And the probability that it neither rains nor the grass is wet is n minus m minus k plus i over n. Notice that we had to add back i: the m wet cases included the i overlap cases, and so did the k rain cases, so the overlap was subtracted twice when we subtracted m and k, and we have to add it back in once. A table like this is called a probability table, or also a potential.

Now, suppose one wanted to find the probability that it rained or didn't rain. One could do that from the data directly, by adding up all the cases where it rained and dividing by the total, and likewise adding up all the cases where it didn't rain and dividing by the total. Or one could work with the smaller table, where we have already done some of the calculations beforehand, and forget about w, that is, sum it out: wherever we have the same value of r, we simply add up those rows regardless of the value of w. So the first row and the third row get added together, and the second row and the last row get added together. The result, the sum of the first and third rows and of the second and fourth rows, is called the marginalization of the w column. In other words, we have gotten rid of the w variable by summing it out wherever the values of R are the same. You can easily verify that by marginalizing out w, or summing it out, one does indeed get k over n, which is the probability of rain being true, and n minus k over n, which is the probability of rain being false.

Now, notice that in the language of SQL, marginalization is equivalent to an aggregation on the column P. For example, thinking of this as a table, one would write: select R and the sum of P from this table, group by R. That means wherever there are equal values of R, we keep a single row in the result but add the probabilities up, so we only keep distinct rows for distinct values of R. In the language of relational algebra, this is written as an aggregation operation, a sum, grouping by R, on the table R,W. It's quite interesting that one can perform this operation on the probability table using SQL, and we shall exploit this fact as we go along. So please take a careful look and understand this.
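As a concrete sketch, suppose the raw data lives in a hypothetical table obs with one row per observed day and text columns r and w, each holding 'yes' or 'no' (these names are illustrative, not from the lecture). Building the joint probability table and then marginalizing out w might look like this:

    -- Raw data: one row per day, r and w each 'yes' or 'no'.
    CREATE TABLE obs (r TEXT, w TEXT);

    -- Build the joint probability table (potential) P(R, W):
    -- count each (r, w) combination and divide by the total number of rows.
    CREATE TABLE prw AS
    SELECT r,
           w,
           COUNT(*) * 1.0 / (SELECT COUNT(*) FROM obs) AS p
    FROM obs
    GROUP BY r, w;

    -- Marginalize out w: sum p over rows that agree on r.
    SELECT r, SUM(p) AS p
    FROM prw
    GROUP BY r;

The last query is exactly the "select R and the sum of P, group by R" aggregation just described: with the counts above, it returns k over n for r = 'yes' and n minus k over n for r = 'no'.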
Now, let's see how we might write Bayes' rule in terms of these potentials or probability tables. Remember that Bayes' rule states that the joint probability of R and W together is the conditional probability of R given W multiplied by the probability of W itself. In the simplest case, of yes and yes, we simply rewrite i over n as i over m times m over n; simple arithmetic. Similarly, for every other row, we rewrite the fraction with the appropriate denominator so that the first term becomes the conditional probability and the probability of W falls out in the second term. Verify this, and remind yourself of Bayes' rule from the previous lecture.

Now, let's notice how this multiplication actually takes place. Consider these as three tables, T0, T1, and T2: T0 is the joint table, T1 has R and W, but T2 has only W. Let's see how we might multiply these two probability tables in another way. The probability of R given W multiplied by the probability of W is some way of combining these two tables, and that way just happens to be the join of the tables in SQL on the common attribute W. Suppose we were to perform the following SQL: select R and the sum of the product of p1 and p2, from these two tables respectively, where W1 equals W2, group by R. Essentially, all we're saying is that we're going to multiply i over m by m over n; similarly, for the other row where R equals yes, we multiply k minus i over n minus m by the corresponding value in this table, where W has the same value (no), and then add the two terms up. You can easily verify that this gives k over n, which is the probability of rain being yes. Similarly, for the case of rain being no, one can work out that the SQL gives the correct value, n minus k over n. To conclude, we can multiply probability tables, or rather potentials as they are often called in the probability literature, using SQL.
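Here is a minimal sketch of that multiplication, assuming a hypothetical table t1 holding the conditional P(R | W) in columns r, w, p1, and a table t2 holding the marginal P(W) in columns w, p2 (again, the names are illustrative):

    -- Multiply the two potentials by joining on the shared attribute w,
    -- then sum out w to leave the marginal P(R).
    SELECT t1.r,
           SUM(t1.p1 * t2.p2) AS p
    FROM t1
    JOIN t2 ON t1.w = t2.w
    GROUP BY t1.r;

The join lines up rows that agree on W, the product p1 * p2 applies Bayes' rule row by row, and the SUM with GROUP BY marginalizes W out in the same stroke. If we wanted the joint table P(R, W) rather than the marginal, we would keep w in the select list and drop the SUM and GROUP BY.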
Now, let's turn to another important element, which is evidence. Suppose we have a joint probability distribution, like the probability of rain and wetness, and we find that the grass is actually wet. In other words, we have observed some evidence, and that evidence tells us the grass is wet: W equal to yes. We need to restrict the table to only those entries where W equals yes. So, essentially, we're saying: let's drop all the entries where W equals no and restrict the table to W equal to yes. That restriction operator is called the application of evidence to this potential.

If we once again expand this restricted table using Bayes' rule, which is just a subset of what we had earlier, we get the probability of R given W equal to yes, which is just a restriction of the overall probability of R given W, times the probability that W equals yes. We don't have to worry about W equal to no anymore. In other words, the restricted probability of R and W is the probability of R given W equal to yes times the probability of the evidence, which is the probability that W equals yes.

Now, coming back to SQL, it turns out that applying evidence is the same thing as using the select operator on the table R,W. What we did, multiplying R,W by the restriction operator, is the same as doing a select: select R, W, P from this table where W equals yes. That's fairly obvious. The a posteriori probability of R given the evidence is then merely the restriction of the joint probability to the case W equal to yes, divided by the probability of the evidence itself. Notice that if we do that division, we get exactly the table we wanted: the a posteriori probability of R given W equal to yes, which is different from the probability of R in general. (A sketch of this step in SQL appears at the end of this section.)

We'll work out an example in a short while. But for the moment, just notice that multiplying probability tables is just taking a join in SQL, and taking evidence, that is, observing one or more of the variables, is merely issuing an appropriate select statement in SQL. So, let's now take a look at what all this means for classification, or abductive reasoning, which is what we were interested in in the first place. That will bring us right back to the language of classifiers and the naive Bayes classifier, but this time in terms of probability tables rather than individual probabilities.
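As promised, here is a minimal sketch of the evidence step, reusing the hypothetical prw table from the first sketch (names are again illustrative):

    -- Apply the evidence W = 'yes': keep only the matching rows of the joint table.
    SELECT r, w, p
    FROM prw
    WHERE w = 'yes';

    -- Divide by the probability of the evidence, P(W = 'yes'),
    -- to normalize the restricted table into the posterior P(R | W = 'yes').
    SELECT r,
           p / (SELECT SUM(p) FROM prw WHERE w = 'yes') AS posterior
    FROM prw
    WHERE w = 'yes';

With the counts from before, the restriction keeps the rows i over n and m minus i over n, the evidence probability is m over n, and the division yields i over m and m minus i over m, which is P(R | W = yes) rather than P(R).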