Let's take a look at how we might view one particular situation, the wetness of grass being caused by rain, in the language of probability. As before, when we studied Bayes' rule, we break it up into cases. So, for our situation, we have rain occurring in k cases out of a total of n observations, or n days. Out of those n observations, we have the grass being wet in m cases, and we have both rain occurring and the grass being wet in i cases. So we have data which is simply w being yes or no and rain being yes or no, lots and lots of it, and we can compute the probabilities that rain and wetness either occur or don't occur in the following way. The probability that it rains and the grass is wet is i over n. The probability that it doesn't rain but the grass is still wet is m minus i over n. The probability that it rains and the grass is not wet is k minus i over n. And the probability that it neither rains nor the grass is wet is n minus m minus k plus i over n. Notice that we had to add back i: the m wet cases included the i overlap cases, and so did the k rain cases, so the overlap was subtracted twice when we subtracted m and k, and we have to add it back in once. A table like this is called a probability table, or also a potential.

Now, suppose one wanted to find the probability that it rained or didn't rain. One could do that from the data directly, by adding up all the cases where it rained and dividing by the total, and likewise adding up all the cases where it didn't rain and dividing by the total. Or one could work with the smaller table, where we have already done some of the calculations beforehand, and forget about w, that is, sum it out: wherever we have the same value of r, we simply add up those rows regardless of the value of w. So the first row and the third row get added together, and the second row and the last row get added together. The result, the sum of the first and third rows and of the second and fourth rows, is called the marginalization of the w column. In other words, we have gotten rid of the w variable by summing it out wherever the values of R are the same. You can easily verify that by marginalizing out w, or summing it out, one does indeed get k over n, which is the probability of rain being true, and n minus k over n, which is the probability of rain being false.

Now, notice that in the language of SQL, marginalization is equivalent to an aggregation on the column P. For example, thinking of this as a table, one would write: select R and the sum of P from this table, group by R. That means wherever there are equal values of R, we keep a single row in the result but add the probabilities up, so we only keep distinct rows for distinct values of R. In the language of relational algebra, this is written as an aggregation operation, a sum, grouping by R, on the table R,W. It's quite interesting that one can perform this operation on the probability table using SQL, and we shall exploit this fact as we go along. So please take a careful look and understand this.
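As a concrete sketch, suppose the raw data lives in a hypothetical table obs with one row per observed day and text columns r and w, each holding 'yes' or 'no' (these names are illustrative, not from the lecture). Building the joint probability table and then marginalizing out w might look like this:

    -- Raw data: one row per day, r and w each 'yes' or 'no'.
    CREATE TABLE obs (r TEXT, w TEXT);

    -- Build the joint probability table (potential) P(R, W):
    -- count each (r, w) combination and divide by the total number of rows.
    CREATE TABLE prw AS
    SELECT r,
           w,
           COUNT(*) * 1.0 / (SELECT COUNT(*) FROM obs) AS p
    FROM obs
    GROUP BY r, w;

    -- Marginalize out w: sum p over rows that agree on r.
    SELECT r, SUM(p) AS p
    FROM prw
    GROUP BY r;

The last query is exactly the "select R and the sum of P, group by R" aggregation just described: with the counts above, it returns k over n for r = 'yes' and n minus k over n for r = 'no'.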
Now, let's see how we might write Bayes' rule in terms of these potentials or probability tables. Remember that Bayes' rule states that the joint probability of R and W together is the conditional probability of R given W multiplied by the probability of W itself. In the simplest case, of yes and yes, we simply rewrite i over n as i over m times m over n; simple arithmetic. Similarly, for every other row, we rewrite the fraction with the appropriate denominator so that the first term becomes the conditional probability and the probability of W falls out in the second term. Verify this, and remind yourself of Bayes' rule from the previous lecture.

Now, let's notice how this multiplication actually takes place. Consider these as three tables, T0, T1, and T2: T0 is the joint table, T1 has R and W, but T2 has only W. Let's see how we might multiply these two probability tables in another way. The probability of R given W multiplied by the probability of W is some way of combining these two tables, and that way just happens to be the join of the tables in SQL on the common attribute W. Suppose we were to perform the following SQL: select R and the sum of the product of p1 and p2, from these two tables respectively, where W1 equals W2, group by R. Essentially, all we're saying is that we're going to multiply i over m by m over n; similarly, for the other row where R equals yes, we multiply k minus i over n minus m by the corresponding value in this table, where W has the same value (no), and then add the two terms up. You can easily verify that this gives k over n, which is the probability of rain being yes. Similarly, for the case of rain being no, one can work out that the SQL gives the correct value, n minus k over n. To conclude, we can multiply probability tables, or rather potentials as they are often called in the probability literature, using SQL.
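Here is a minimal sketch of that multiplication, assuming a hypothetical table t1 holding the conditional P(R | W) in columns r, w, p1, and a table t2 holding the marginal P(W) in columns w, p2 (again, the names are illustrative):

    -- Multiply the two potentials by joining on the shared attribute w,
    -- then sum out w to leave the marginal P(R).
    SELECT t1.r,
           SUM(t1.p1 * t2.p2) AS p
    FROM t1
    JOIN t2 ON t1.w = t2.w
    GROUP BY t1.r;

The join lines up rows that agree on W, the product p1 * p2 applies Bayes' rule row by row, and the SUM with GROUP BY marginalizes W out in the same stroke. If we wanted the joint table P(R, W) rather than the marginal, we would keep w in the select list and drop the SUM and GROUP BY.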
Now, let's turn to another important element, which is evidence. Suppose we have a joint probability distribution, like the probability of rain and wetness, and we find that the grass is actually wet. In other words, we have observed some evidence, and that evidence tells us the grass is wet: W equal to yes. We need to restrict the table to only those entries where W equals yes. So, essentially, we're saying: let's drop all the entries where W equals no and restrict the table to W equal to yes. That restriction operator is called the application of evidence to this potential.

If we once again expand this restricted table using Bayes' rule, which is just a subset of what we had earlier, we get the probability of R given W equal to yes, which is just a restriction of the overall probability of R given W, times the probability that W equals yes. We don't have to worry about W equal to no anymore. In other words, the restricted probability of R and W is the probability of R given W equal to yes times the probability of the evidence, which is the probability that W equals yes.

Now, coming back to SQL, it turns out that applying evidence is the same thing as using the select operator on the table R,W. What we did, multiplying R,W by the restriction operator, is the same as doing a select: select R, W, P from this table where W equals yes. That's fairly obvious. The a posteriori probability of R given the evidence is then merely the restriction of the joint probability to the case W equal to yes, divided by the probability of the evidence itself. Notice that if we do that division, we get exactly the table we wanted: the a posteriori probability of R given W equal to yes, which is different from the probability of R in general. (A sketch of this step in SQL appears at the end of this section.)

We'll work out an example in a short while. But for the moment, just notice that multiplying probability tables is just taking a join in SQL, and taking evidence, that is, observing one or more of the variables, is merely issuing an appropriate select statement in SQL. So, let's now take a look at what all this means for classification, or abductive reasoning, which is what we were interested in in the first place. That will bring us right back to the language of classifiers and the naive Bayes classifier, but this time in terms of probability tables rather than individual probabilities.
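As promised, here is a minimal sketch of the evidence step, reusing the hypothetical prw table from the first sketch (names are again illustrative):

    -- Apply the evidence W = 'yes': keep only the matching rows of the joint table.
    SELECT r, w, p
    FROM prw
    WHERE w = 'yes';

    -- Divide by the probability of the evidence, P(W = 'yes'),
    -- to normalize the restricted table into the posterior P(R | W = 'yes').
    SELECT r,
           p / (SELECT SUM(p) FROM prw WHERE w = 'yes') AS posterior
    FROM prw
    WHERE w = 'yes';

With the counts from before, the restriction keeps the rows i over n and m minus i over n, the evidence probability is m over n, and the division yields i over m and m minus i over m, which is P(R | W = yes) rather than P(R).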