Let's return to our Naive Bayes classifier now, with rain as the class, taking values yes or no, and the grass being wet as the only feature that we're going to measure. Suppose we have the conditional probability that the grass is wet given that there is rain. Obviously, if there is rain, the chance that the grass is wet is very high; if there's no rain, there is only a small chance that the grass is wet, and a very high chance that it is not wet. Notice again that this is a conditional probability table, so the values for a particular value of rain, say rain being yes, have to add up to one. The prior, or a priori, probability of rain, regardless of whether the grass is wet or not, is that twenty percent of the time it rains in this locality and 80% of the time it doesn't. Now, with just this one feature, Bayes' rule tells us that the joint probability of R and W can be factored in two ways: as the probability of R given W times the probability of W, or as the conditional probability of W given R times the prior probability of R. Suppose we are given some evidence: we actually observe that the grass is wet, that is, W = yes. We can condition this joint probability by restricting it to the case where W is yes. On one side we get the probability of R given W = yes times the probability that W = yes; on the other side we get the probability of W = yes given R, which is the likelihood, times the prior, which doesn't involve W at all, so we don't have to condition it. The posterior probability of R given W = yes can now be written as that right-hand side divided by the probability of the evidence. We'll write the inverse of the probability that W = yes simply as sigma: it is one over the probability of the evidence, and we'll use sigma wherever we need to refer to one over the probability of whatever evidence we are observing.
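This single-feature setup can be sketched in a few lines of Python; the variable names are illustrative, and only the numbers come from the tables in the example.

```python
# Conditional probability table P(W | R) and the prior P(R) from the
# example; keys are (w, r) pairs, names are illustrative.
p_w_given_r = {
    ("yes", "yes"): 0.9,  # grass wet given rain
    ("no",  "yes"): 0.1,
    ("yes", "no"):  0.2,  # grass wet despite no rain
    ("no",  "no"):  0.8,
}
p_r = {"yes": 0.2, "no": 0.8}  # it rains 20% of the time here

# Sanity check: for each fixed value of R, the CPT entries sum to one.
for r in ("yes", "no"):
    assert abs(sum(p_w_given_r[(w, r)] for w in ("yes", "no")) - 1.0) < 1e-9

# Observe W = yes.  Bayes' rule: P(R | W=yes) = sigma * P(W=yes | R) * P(R),
# where sigma is one over the probability of the evidence.
unnorm = {r: p_w_given_r[("yes", r)] * p_r[r] for r in p_r}
sigma = 1.0 / sum(unnorm.values())
posterior = {r: sigma * p for r, p in unnorm.items()}
print(posterior)  # belief in rain comes out to about 0.53
```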
Pardon the use of sigma for this purpose; earlier we used sigma as the selection operator, but from now on we will use sigma only as the inverse of the evidence probability. Study this carefully: what we want is the a posteriori probability of R given the evidence, which is simply proportional to the likelihood multiplied by the prior. In SQL, we can write this as selecting R and the sum of P times P from these two tables, where W = yes and R = R, because we have to make sure that we're joining the tables on the common attribute R, and finally grouping by R. This is what we stated before: how to multiply two probability tables. So the result is what we get by restricting ourselves to the rows where W is yes, multiplying the P values, and adding them up. In fact, there is nothing to add up here, since each distinct value of R appears in only one row. So we simply multiply 0.9 by 0.2 to get the first row, R = yes, and 0.2 by 0.8 to get the second row. This is the product of these two potentials, or probability tables. Now we need to normalize so that the total probability of R is one; the sum has to be one. Essentially, we take the probability of R being yes as 0.18 divided by the sum of these two values, 0.18 and 0.16, and we get 0.53, or 53%. This is the chance, or our belief, that it's raining once we see the grass being wet. The reason it's so small, when one might expect it to be higher, is that there are cases where the grass can be wet without there being any rain; in fact, twenty percent of the time the grass is wet even when there's no rain. In addition, it hardly ever rains. Combining these two things gives us only a 53% chance that it's actually raining if the grass is wet. Now, let's see what happens when we have more than one feature, as we normally do in a Bayesian classifier.
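Before adding more features, the SQL multiplication just described can be sketched with an in-memory SQLite database. The table names T1 and T3 follow the transcript's labels, but the column names are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# T1: conditional table P(W | R); T3: prior P(R).
cur.execute("CREATE TABLE t1 (w TEXT, r TEXT, p REAL)")
cur.executemany("INSERT INTO t1 VALUES (?,?,?)", [
    ("yes", "yes", 0.9), ("no", "yes", 0.1),
    ("yes", "no", 0.2), ("no", "no", 0.8),
])
cur.execute("CREATE TABLE t3 (r TEXT, p REAL)")
cur.executemany("INSERT INTO t3 VALUES (?,?)", [("yes", 0.2), ("no", 0.8)])

# Multiply the two probability tables, restricted to the evidence W = yes,
# joining on the common attribute R and grouping by R.
rows = cur.execute("""
    SELECT t1.r, SUM(t1.p * t3.p)
    FROM t1, t3
    WHERE t1.w = 'yes' AND t1.r = t3.r
    GROUP BY t1.r
""").fetchall()

unnorm = dict(rows)          # unnormalized: R=yes -> 0.18, R=no -> 0.16
total = sum(unnorm.values())
posterior = {r: p / total for r, p in unnorm.items()}
print(posterior["yes"])      # about 0.53
```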
We're going to have another feature called thunder, which says whether or not we are hearing thunder. For the moment, let's assume that we don't actually know whether we heard thunder at all. The probability of hearing thunder given that it's raining is 0.8, of not hearing it 0.2, and so on; so we have a conditional probability table for this variable thunder as well, but in this case let's assume that we haven't actually observed it. We could be asking our neighbor over the phone whether the grass is wet, and then trying to conclude whether or not it's raining there, but we didn't ask our neighbor whether they're hearing thunder. So now, the probability of R and T given W = yes, because that's all we know, is, by the same equation we had earlier, the probability of W = yes given R, times the probability of T given R, times the probability of R. This is our Bayesian formula. Again, we have the probability of the evidence over here, but this time we don't have anything that says T = yes or no. So we need to sum T out of this product if we want to know only the probability of rain. The way we sum it out is, again, with SQL: we select R and the sum of the products of all three of these P columns from these three tables, where W is yes and all the common variables match; the only common variable between these tables is R. And then we group by R. This effectively sums out T, because we're only selecting R and summing up the values for the different values of T. Now, you can verify that you could do this by first joining T1 and T3, that is, the P of W given R table and the P of R table, just like we did earlier, and then joining the result of that with the P of T given R table. So we just take the result we had earlier and join it with this new table. But notice that we get the same result, because for R = yes we multiplied this element by 0.8 and again by 0.2.
And when we sum those up, we get the same result, because 0.8 plus 0.2 is one; similarly for 0.1 and 0.9 in the R = no case, so it doesn't change anything. This is to be expected, since there is no new evidence compared to earlier: just by including something new in our diagram, we shouldn't expect to change our belief in R. Another important point: if you remember, in one of the homeworks we asked whether ignoring some of the features changes our belief. Well, it does change our belief compared to what we'd have if we had actually observed those features, but it is, in some sense, equally correct, because we simply didn't observe them. This is actually the same as a Bayesian classifier with two features of which only one is observed. You could have millions of features and observe only one or two, and you'd still get the same result just by putting them all in your classifier. The summing-out process makes sure that this always works and you don't get any wrong results. Go back to that example where we asked whether partial evidence changes things: in fact, it doesn't; it is simply as if the unobserved feature didn't exist. Of course, if we are observing the feature, it's better to include it. But by ignoring it, we're not producing a wrong result; it's just that we didn't have that feature to measure, that's all. Now, let's see what happens if we actually do have evidence about T; suppose we observe T = yes. Now we're looking for the probability of rain given that W and T are both yes. In this case, we restrict our multiplication by the P of T given R table to the rows where T = yes only. Again, we use the same SQL, only with another selection condition which restricts us to those rows of this table, joined with the prior join that we had of T1 and T3.
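Before working out the observed-thunder case just set up, the summing-out of an unobserved T described above can be sketched directly in Python; names are illustrative, numbers are from the example's tables.

```python
# CPTs from the example: P(W|R), P(T|R), and the prior P(R).
p_w_given_r = {("yes", "yes"): 0.9, ("no", "yes"): 0.1,
               ("yes", "no"): 0.2, ("no", "no"): 0.8}
p_t_given_r = {("yes", "yes"): 0.8, ("no", "yes"): 0.2,
               ("yes", "no"): 0.1, ("no", "no"): 0.9}
p_r = {"yes": 0.2, "no": 0.8}

# W = yes is observed; T is not, so it is summed out:
#   P(R=r | W=yes)  is proportional to  sum over t of
#   P(W=yes | r) * P(T=t | r) * P(r)
unnorm = {r: sum(p_w_given_r[("yes", r)] * p_t_given_r[(t, r)] * p_r[r]
                 for t in ("yes", "no"))
          for r in ("yes", "no")}
total = sum(unnorm.values())
posterior = {r: p / total for r, p in unnorm.items()}
print(posterior["yes"])  # still about 0.53 -- the unobserved feature
                         # changes nothing, since 0.8 + 0.2 = 1
```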
But with the restriction that T = yes, we now get a different result, because we multiply the first row by 0.8 and the second row by 0.1, and the other rows don't count. Normalizing gives us the probability that rain = yes given the evidence, which is now 90%. In a sense, our belief has undergone a revision from the earlier value of 53% to 90%. So new evidence has changed our belief. In classical logic, once you have asserted that, say, rain occurs or doesn't occur, you can't change that belief. But in probabilistic reasoning, belief can be revised. That's very important.
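This revision can be seen by restricting T to yes instead of summing it out; again the names are illustrative and only the numbers come from the example.

```python
# Same tables as before; now both W = yes and T = yes are observed.
p_w_given_r = {("yes", "yes"): 0.9, ("no", "yes"): 0.1,
               ("yes", "no"): 0.2, ("no", "no"): 0.8}
p_t_given_r = {("yes", "yes"): 0.8, ("no", "yes"): 0.2,
               ("yes", "no"): 0.1, ("no", "no"): 0.9}
p_r = {"yes": 0.2, "no": 0.8}

# Restrict the P(T|R) factor to its T = yes rows instead of summing out T:
unnorm = {r: p_w_given_r[("yes", r)] * p_t_given_r[("yes", r)] * p_r[r]
          for r in ("yes", "no")}
total = sum(unnorm.values())
posterior = {r: p / total for r, p in unnorm.items()}
print(posterior["yes"])  # belief revised from about 0.53 to about 0.9
```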