1
00:00:00,000 --> 00:00:09,190
Let's take a look at how we might view one
particular situation of wetness of grass

2
00:00:09,190 --> 00:00:14,880
being caused by rain in the language of
probability.

3
00:00:15,340 --> 00:00:21,714
As before, when we studied Bayes' rule, we
break it up into cases.

4
00:00:21,714 --> 00:00:29,631
So, for our situation we have rain
occurring in k cases out of a total of n observations

5
00:00:29,631 --> 00:00:34,714
or n days.
Out of those n cases we have the grass

6
00:00:34,714 --> 00:00:41,903
being wet in m cases, and in some situations we
have both rain occurring, as well as the

7
00:00:41,903 --> 00:00:48,014
grass being wet for i cases.
So we have data which is simply w being

8
00:00:48,014 --> 00:00:55,833
yes or no, rain being yes or no, and lots
and lots of data and we can simply compute

9
00:00:55,833 --> 00:01:03,099
the probabilities that rain and wetness
either occur or don't occur, in the

10
00:01:03,099 --> 00:01:07,922
following way.
So, the probability that it rains and the

11
00:01:07,922 --> 00:01:13,709
grass is wet, is i over n.
The probability that it doesn't rain but

12
00:01:13,709 --> 00:01:17,480
the grass is still wet is (m minus i) over
n.

13
00:01:18,380 --> 00:01:26,244
That it rains, and the grass is not wet is
(k minus i) over n. And the probability that

14
00:01:26,244 --> 00:01:33,080
it neither rains nor the grass is wet is
(n minus m minus k plus i) over n.

15
00:01:33,080 --> 00:01:39,858
Notice that we had to add in i, because
these m cases included i, so did these k

16
00:01:39,858 --> 00:01:46,380
cases, so we had these overlap cases
coming twice, so we had to add them back

17
00:01:46,380 --> 00:01:51,786
in since they were subtracted twice when
we subtracted m and k.

18
00:01:51,786 --> 00:01:58,780
Well, a table like this is called a
probability table or also a potential.
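
The four entries of this table follow directly from the counts n, m, k and i. The following is a minimal sketch in Python, assuming hypothetical counts chosen only for illustration; the function name `joint_table` and the dictionary layout are not from the lecture.

```python
from fractions import Fraction


def joint_table(n, m, k, i):
    """Joint probability table P(R, W) built from counts.

    n: total days, k: days with rain, m: days with wet grass,
    i: days with both rain and wet grass.
    Keys are (R, W) pairs with values "y" or "n".
    """
    return {
        ("y", "y"): Fraction(i, n),
        ("n", "y"): Fraction(m - i, n),
        ("y", "n"): Fraction(k - i, n),
        # i is added back because it was subtracted once with m and once with k.
        ("n", "n"): Fraction(n - m - k + i, n),
    }


# Hypothetical counts, for illustration only.
table = joint_table(n=100, m=30, k=20, i=18)
for (r, w), p in table.items():
    print(f"R={r} W={w} P={p}")

# The four entries of a joint table must sum to one.
total = sum(table.values())
print("total:", total)
```

Exact fractions are used here so that the sum of the four entries can be checked without rounding error.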

19
00:01:59,060 --> 00:02:03,827
Now,
suppose one wanted to find out the

20
00:02:03,827 --> 00:02:11,110
probability that it rained or didn't rain.
Well, one could simply do that from the

21
00:02:11,110 --> 00:02:17,929
data directly by simply adding up all the
cases where it rained and dividing by the

22
00:02:17,929 --> 00:02:21,907
total.
Similarly adding up all the cases where it

23
00:02:21,907 --> 00:02:28,727
didn't rain, dividing by the total, or one
could work with the smaller table where we

24
00:02:28,727 --> 00:02:35,222
had already done some of the calculations
beforehand, but forget about w, that is

25
00:02:35,222 --> 00:02:39,606
sum it out.
In the sense that we sum over the values of w,

26
00:02:39,606 --> 00:02:47,294
and wherever we have the same values of r,
we simply add up those rows regardless of

27
00:02:47,294 --> 00:02:52,832
what the value of w is.
So, the first row and third row get added

28
00:02:52,832 --> 00:02:59,494
together. The second row and the last row
get added together, resulting in the

29
00:02:59,494 --> 00:03:03,951
following,
which is clearly the sum of the first and

30
00:03:03,951 --> 00:03:10,877
the third, and of the second and the fourth.
This is also called the marginalization

31
00:03:10,877 --> 00:03:15,751
of the w column.
In other words, we have gotten rid of the

32
00:03:15,751 --> 00:03:21,480
w variable by summing it out wherever the
values of R are the same.

33
00:03:21,480 --> 00:03:27,513
You can easily verify that by
marginalizing out w or summing it out, one

34
00:03:27,513 --> 00:03:33,296
does indeed get k over n, which is the
probability of rain being true.

35
00:03:33,296 --> 00:03:38,660
And n minus k over n, which is the
probability of rain being not true.

36
00:03:39,340 --> 00:03:47,469
Now notice that in the language of SQL,
marginalization is equivalent to an

37
00:03:47,469 --> 00:03:54,948
aggregation on the column P.
For example, in SQL one would write it

38
00:03:54,948 --> 00:04:03,620
as, if you think about this as a table,
select R and the sum of P from this table

39
00:04:04,080 --> 00:04:09,264
group by R.
That means wherever there is an equal R,

40
00:04:09,264 --> 00:04:15,159
we keep a single row in the result and
add the P values up;

41
00:04:15,159 --> 00:04:21,360
we keep distinct rows only where there are
distinct values of R.
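
The query just described can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the table name `joint`, the column types, and the counts are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical counts, for illustration only:
# n days in total, k with rain, m with wet grass, i with both.
n, m, k, i = 100, 30, 20, 18

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE joint (R TEXT, W TEXT, P REAL)")
conn.executemany(
    "INSERT INTO joint VALUES (?, ?, ?)",
    [
        ("y", "y", i / n),
        ("n", "y", (m - i) / n),
        ("y", "n", (k - i) / n),
        ("n", "n", (n - m - k + i) / n),
    ],
)

# Marginalize W out: one result row per distinct value of R,
# with the P values of the grouped rows added up.
rows = conn.execute("SELECT R, SUM(P) FROM joint GROUP BY R").fetchall()
marginal = dict(rows)
print(marginal)
```

Up to floating-point rounding, `marginal["y"]` should equal k/n and `marginal["n"]` should equal (n - k)/n.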

42
00:04:22,200 --> 00:04:28,458
In the language of relational algebra,
this is written as follows.

43
00:04:28,458 --> 00:04:33,080
We have an aggregation operation, which
is a sum,

44
00:04:33,440 --> 00:04:43,952
grouping by R on the table R,W.
It's quite interesting that one can

45
00:04:43,952 --> 00:04:49,994
perform this operation on the probability
table using SQL, and we shall exploit

46
00:04:49,994 --> 00:04:54,341
this fact as we go along.
So please take a careful look and

47
00:04:54,341 --> 00:04:58,393
understand this.
Now, let's see how we might write Bayes'

48
00:04:58,393 --> 00:05:02,520
rule in terms of these potentials or
probability tables.

49
00:05:02,520 --> 00:05:10,315
Remember that Bayes' rule stated that the
joint probability of R and W together is

50
00:05:10,315 --> 00:05:17,943
the conditional probability of R given W
multiplied by the probability of W itself,

51
00:05:17,943 --> 00:05:22,171
which was, in the simplest case of yes
and yes,

52
00:05:22,171 --> 00:05:29,064
simply i over n rewritten
as i over m times m over n.
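
Written out as a formula, the rewriting for the yes/yes row looks as follows, and the remaining rows follow the same pattern with the appropriate denominator. This is only a restatement of the spoken arithmetic, with k the number of rainy days, m the number of wet-grass days, i the number of days with both, and n the total.

```latex
\begin{align*}
P(R{=}y, W{=}y) &= \frac{i}{n} = \frac{i}{m}\cdot\frac{m}{n}
                 = P(R{=}y \mid W{=}y)\,P(W{=}y) \\
P(R{=}n, W{=}y) &= \frac{m-i}{n} = \frac{m-i}{m}\cdot\frac{m}{n}
                 = P(R{=}n \mid W{=}y)\,P(W{=}y) \\
P(R{=}y, W{=}n) &= \frac{k-i}{n} = \frac{k-i}{n-m}\cdot\frac{n-m}{n}
                 = P(R{=}y \mid W{=}n)\,P(W{=}n) \\
P(R{=}n, W{=}n) &= \frac{n-m-k+i}{n} = \frac{n-m-k+i}{n-m}\cdot\frac{n-m}{n}
                 = P(R{=}n \mid W{=}n)\,P(W{=}n)
\end{align*}
```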

53
00:05:29,064 --> 00:05:34,303
Simple arithmetic.
Similarly for every other row, we would

54
00:05:34,303 --> 00:05:41,667
simply replace the denominator m by the
appropriate value so that we get the

55
00:05:41,667 --> 00:05:49,904
conditional probability over here, and the
probability of W falls out in this

56
00:05:49,904 --> 00:05:56,000
term.
Verify this and remind yourself of Bayes'

57
00:05:56,000 --> 00:06:04,193
rule from the earlier lecture.
Now, let's notice how this multiplication

58
00:06:04,193 --> 00:06:10,741
actually takes place.
Consider these as three tables.

59
00:06:10,741 --> 00:06:16,502
T0, T1, and T2.
T1 has R and W, but T2 has only W.

60
00:06:16,502 --> 00:06:26,520
Now, let's see how we might multiply these
two probability tables in another way.

61
00:06:26,840 --> 00:06:33,941
The probability of R given W multiplied
by the probability of W is some way of

62
00:06:33,941 --> 00:06:39,902
combining these two tables.
That way just happens to be the

63
00:06:39,902 --> 00:06:44,900
join of these tables in SQL on the
common attribute W.

64
00:06:45,620 --> 00:06:49,658
Suppose we were to perform the following
SQL query.

65
00:06:49,658 --> 00:06:54,787
Select R and the
sum of the product of p1 and p2 from these

66
00:06:54,787 --> 00:07:03,468
two tables respectively where W1 equals W2
and group by R.
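
The join just described can be run with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the counts are hypothetical, and the columns are addressed as `T1.W`, `T2.W`, `T1.P` and `T2.P` rather than the W1, W2, p1 and p2 spoken in the lecture.

```python
import sqlite3

# Hypothetical counts, for illustration only:
# n days in total, k with rain, m with wet grass, i with both.
n, m, k, i = 100, 30, 20, 18

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (R TEXT, W TEXT, P REAL)")  # P(R | W)
conn.execute("CREATE TABLE T2 (W TEXT, P REAL)")          # P(W)
conn.executemany(
    "INSERT INTO T1 VALUES (?, ?, ?)",
    [
        ("y", "y", i / m),
        ("n", "y", (m - i) / m),
        ("y", "n", (k - i) / (n - m)),
        ("n", "n", (n - m - k + i) / (n - m)),
    ],
)
conn.executemany(
    "INSERT INTO T2 VALUES (?, ?)",
    [("y", m / n), ("n", (n - m) / n)],
)

# Join on the common attribute W, multiply the two P columns,
# then sum the products within each value of R.
query = """
    SELECT T1.R, SUM(T1.P * T2.P)
    FROM T1, T2
    WHERE T1.W = T2.W
    GROUP BY T1.R
"""
p_rain = dict(conn.execute(query).fetchall())
print(p_rain)
```

Dropping the `SUM` and the `GROUP BY` and selecting `T1.R, T1.W, T1.P * T2.P` instead would give back the four rows of the joint table, which is the multiplication of the two potentials on its own.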

67
00:07:03,468 --> 00:07:10,761
So essentially all we're saying is we're
going to multiply i over m by m over n,

68
00:07:10,761 --> 00:07:15,411
and similarly for the other case where R
equals yes.

69
00:07:15,411 --> 00:07:22,646
We also multiply k minus i over n minus m by
the corresponding value in this table,

70
00:07:22,646 --> 00:07:29,375
where W has the same value, no, and then add
this term and this term up, and you can

71
00:07:29,375 --> 00:07:36,021
easily verify that this gives k over n,
which is the probability of rain being

72
00:07:36,021 --> 00:07:39,677
yes.
Similarly, for the case of rain being no,

73
00:07:39,677 --> 00:07:46,323
one can work out that the result of
the SQL query gives the correct value, n

74
00:07:46,323 --> 00:07:50,974
minus k over n.
To conclude, we can multiply probability

75
00:07:50,974 --> 00:07:58,040
tables or rather potentials as they are
often called in the probability literature

76
00:07:58,460 --> 00:08:05,740
using SQL.
Now, let's turn to another important

77
00:08:06,140 --> 00:08:13,408
element which is evidence.
Suppose we have a joint probability

78
00:08:13,408 --> 00:08:20,318
distribution,
like the probability of rain and wetness, and

79
00:08:20,318 --> 00:08:28,594
we find that the grass is actually wet.
In other words, we have observed some

80
00:08:28,594 --> 00:08:33,084
evidence and that evidence tells us the
grass is wet.

81
00:08:33,084 --> 00:08:38,473
So, W equals yes.
We need to restrict this table to only

82
00:08:38,473 --> 00:08:44,608
those entries where W equals yes.
So, essentially we're saying, let's drop

83
00:08:44,608 --> 00:08:50,412
all the entries where W equals no and
restrict it to W equals yes.

84
00:08:50,412 --> 00:08:56,381
And that's this restriction operator,
which is called the application of

85
00:08:56,381 --> 00:09:01,909
evidence to this potential.
If we once again expand this restricted

86
00:09:01,909 --> 00:09:07,514
table using Bayes' rule, which is just a
subset of what we had earlier,

87
00:09:07,514 --> 00:09:14,338
we get the probability of R given W equal
to yes, which is just a restriction of the

88
00:09:14,338 --> 00:09:20,837
overall probability of R given W,
times the probability that W equals yes.

89
00:09:20,837 --> 00:09:24,980
We don't have to worry about W equals no
anymore.

90
00:09:26,760 --> 00:09:34,147
In other words, we say that the probability of
R and W, restricted, is the probability of

91
00:09:34,147 --> 00:09:41,067
R given W equal to yes times the
probability of the evidence, which is the

92
00:09:41,067 --> 00:09:50,033
probability that W equals yes.
Now, coming back to SQL, it turns out

93
00:09:50,033 --> 00:09:58,702
that applying evidence is the same thing
as using the select operator on the table

94
00:09:58,702 --> 00:10:06,383
R,W, which is just this table.
So what we did is multiply the R,W table by the

95
00:10:06,383 --> 00:10:11,894
restriction operator, which is the same as
doing a select.

96
00:10:11,894 --> 00:10:17,792
In other words, select R,W,P from this
table where W equals yes.
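
Applying evidence as a select can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, assuming a hypothetical table named `joint` and hypothetical counts chosen only for illustration.

```python
import sqlite3

# Hypothetical counts, for illustration only:
# n days in total, k with rain, m with wet grass, i with both.
n, m, k, i = 100, 30, 20, 18

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE joint (R TEXT, W TEXT, P REAL)")
conn.executemany(
    "INSERT INTO joint VALUES (?, ?, ?)",
    [
        ("y", "y", i / n),
        ("n", "y", (m - i) / n),
        ("y", "n", (k - i) / n),
        ("n", "n", (n - m - k + i) / n),
    ],
)

# Apply the evidence W = yes: keep only the rows consistent with it.
evidence_rows = conn.execute(
    "SELECT R, W, P FROM joint WHERE W = 'y' ORDER BY R"
).fetchall()
for row in evidence_rows:
    print(row)
```

The restricted table no longer sums to one; its entries add up to the probability of the evidence, m/n.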

97
00:10:17,792 --> 00:10:25,007
That's very obvious.
The a posteriori probability of R given

98
00:10:25,007 --> 00:10:34,301
evidence, which was just this,
is now merely the restriction of the joint

99
00:10:34,301 --> 00:10:42,127
probability to the case W equals yes,
divided by the probability of the evidence

100
00:10:42,127 --> 00:10:46,880
itself.
Notice that if we did that division, we

101
00:10:46,880 --> 00:10:53,408
get exactly this table, which is
exactly what we wanted, which is the a

102
00:10:53,408 --> 00:11:00,377
posteriori probability of R given W
equals yes, which is different from the

103
00:11:00,377 --> 00:11:06,111
probability of R in general.
We'll work out an example in a short

104
00:11:06,111 --> 00:11:13,788
while. But, for the moment just notice
that the act of multiplying

105
00:11:13,788 --> 00:11:19,259
probability tables is just taking a join
in SQL.
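
The a posteriori table described a moment ago, the joint restricted to W equals yes and divided by the probability of the evidence, can be sketched with Python's built-in `sqlite3` module. The table name `joint` and the counts are hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical counts, for illustration only:
# n days in total, k with rain, m with wet grass, i with both.
n, m, k, i = 100, 30, 20, 18

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE joint (R TEXT, W TEXT, P REAL)")
conn.executemany(
    "INSERT INTO joint VALUES (?, ?, ?)",
    [
        ("y", "y", i / n),
        ("n", "y", (m - i) / n),
        ("y", "n", (k - i) / n),
        ("n", "n", (n - m - k + i) / n),
    ],
)

# Probability of the evidence: the sum of the restricted rows.
p_evidence = conn.execute(
    "SELECT SUM(P) FROM joint WHERE W = 'y'"
).fetchone()[0]

# A posteriori P(R | W = yes): each restricted row divided by P(W = yes).
restricted = conn.execute("SELECT R, P FROM joint WHERE W = 'y'").fetchall()
posterior = {r: p / p_evidence for r, p in restricted}

# Prior P(R), for comparison, by marginalizing W out.
prior = dict(conn.execute("SELECT R, SUM(P) FROM joint GROUP BY R").fetchall())

print("prior:    ", prior)
print("posterior:", posterior)
```

With these counts the posterior probability of rain is i/m, which differs from the prior k/n, as the lecture points out.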

106
00:11:19,259 --> 00:11:26,195
The act of taking evidence, that is,
observing one or more of the variables,

107
00:11:26,195 --> 00:11:34,443
is merely
issuing an appropriate select statement in

108
00:11:34,443 --> 00:11:39,148
SQL.
So, let's now take a look at what all this

109
00:11:39,148 --> 00:11:46,326
means for the case of classification or
abductive reasoning, which is what we

110
00:11:46,326 --> 00:11:52,787
were interested in in the first place.
This will bring us right back to the

111
00:11:52,787 --> 00:11:59,786
language of classifiers and the naive
Bayes classifier, but this time, in terms

112
00:11:59,786 --> 00:12:05,080
of probability tables rather than
individual probabilities.