Coins and dice provide a nice simple 
model of how to calculate probabilities, 
but everyday life is a lot more 
complicated, and it's not taken up with
gambling.
At least I hope your life is not taken
up with gambling.
So in order to make probabilities more 
applicable to everyday life, we need to 
look at slightly more complicated 
methods. 
Now, because these methods are more 
complicated, this lecture is going to be 
an honors lecture. 
It's optional. 
It will not be on the quiz, so don't get 
worried about that. 
But it is still useful, and it's 
fascinating, 
and it'll help you avoid some mistakes 
that people make and that create a lot of 
problems. 
So I hope you will stick with it and 
listen to this lecture. 
And there will be exercises to help you 
figure out whether you understand the 
material or not. 
But don't get too worried, because it's 
not going to be on the quiz. 
The real problem that we'll be facing in
this lecture is the problem of tests.
We use tests all the time. We use tests
to figure out
whether you have a certain medical
condition. We use tests to predict the
weather or to predict people's future
behavior.
We have certain indicators of how
they're going to act:
whether they'll commit a crime or not commit a
crime, but also whether they're going to
pass, do well in school, or fail.
We always use these tests when we don't
know for certain, but we want some kind
of evidence or some kind of indicator.
The problem is, none of these tests are 
perfect. 
They always contain errors of various 
sorts, and what we're going to have to do 
is to see how to take those errors of 
different sorts and build them together 
into a method, and then a formula for 
calculating how reliable the method is 
for detecting the thing that we want to 
detect. 
This problem is a lot like a problem we 
faced earlier when we were talking about 
applying generalizations to particular 
cases, because here we're going to be 
applying probabilities to particular 
cases. 
So it'll seem familiar to you in certain 
parts but you'll see that this case is a 
little trickier. 
The best examples occur in medicine. 
So just imagine that you go to your 
doctor for a regular checkup. 
You don't have any special symptoms, but 
he decides to do a few screening tests. 
And unfortunately, and very worryingly, 
it turns out that you test positive on 
one test for a particular form of cancer, 
a certain kind of medical condition. 
Well, what that means is that you might 
have cancer. 
Might? 
Great. 
You want to know whether you do have 
cancer, but of course, finding out for 
sure whether or not you have cancer is 
going to take further tests, and those 
tests might be expensive. They might be 
dangerous, they're going to be invasive 
in various ways. 
So you really want to know what's the 
probability, given that you tested 
positive on this one test, that you 
really have cancer. 
Now clearly, that probability is going to
depend on a number of facts about this
type of cancer, about the type of test,
and so on.
And I am not a doctor, I am not giving 
you medical advice. 
If you test positive on a test, go talk 
to your doctor, don't trust me because I 
am just making up numbers here. 
But let's make up a few numbers and
figure out what the likelihood is of
having cancer, given that you tested
positive.
So, let's imagine that the base rate of 
this particular type of cancer in the 
population is 0.3%.
That is three out of 1,000, or 0.003.
To say that's the base rate, or what's
sometimes called the prevalence of the
condition in the population,
is simply to say that out of 1,000 people
chosen randomly in the population, you'd 
have about three that have this 
condition. 
It's just a percentage of the general 
population. 
So, that's the condition, what about the 
test? 
Well the first thing we want to know is 
the sensitivity of the test. 
The sensitivity of the test, we're going 
to assume is 99%, and what that means is
that out of 100 people, who have this 
condition, 
99 of them will test positive. 
So, this test is pretty good at figuring 
out, from among the people who have the 
condition, 
which ones do. 
99 of those 100 people who have the 
condition will test positive. 
The other feature is specificity. 
And what that means is the percentage of 
the people who don't have the condition 
who will test negative. 
The point here is that you don't want to
get a positive result for people
who don't have the condition.
Right? 
Because you want it to be specific to 
this particular condition, and not get a 
bunch of positives for people who have 
other types of conditions, or no medical 
condition at all. 
So the specificity, we're going to 
assume, in this particular case we're 
talking about, is also 99 percent. 
Now, what we want to know is the 
probability, that you have the cancer, 
the condition, 
given that you tested positive on the 
test. But notice that the sensitivity 
tells you the probability that you will 
test positive, given that you have the 
condition. 
We want to know the opposite of that, the 
probability that you have the condition, 
given that you tested positive. 
And that's what we have to do a little 
calculation to figure out. 
But before we do that calculation, I want 
you to think about these figures that 
I've given you, the prevalence in the 
population, the sensitivity of the test, 
the specificity of the test, and just 
make a guess. 
Just start out by writing down on a piece 
of paper what you think the probability 
is that you would have the cancer, given 
that you tested positive on the test. 
Take a minute and think about it, and 
write it down. 
But we don't want to just guess about
medical conditions, about probabilities
that really matter as much as this one
does.
Instead, we want to calculate what the 
probability really is. So let's go 
through it carefully, and show you how to 
use what I'll call the Box Method, in 
order to calculate the real likelihood 
that you have the condition, given that 
you got a positive test result. 
What we need to do is to divide the 
population into four different groups. 
The group that has the condition and 
tested positive, 
the group that has the condition and 
tested negative, the group that doesn't 
have the condition and tested positive 
and the group that doesn't have the 
condition and tested negative. And this 
chart will show you a nice simple way of 
organizing all of that information. 
Because this row, the top row, 
tells you all the people who tested 
positive. 
The bottom row tells you the people who 
tested negative. 
Then, the left column gives you the 
people who do have the medical condition, 
in this case some kind of cancer, 
and the right column tells you the people 
who do not have that condition. 
Now, what we need to do is to start 
filling it out with numbers. 
Now, the first thing we need to specify 
is the population. 
In this case, we want to start with a big 
enough population that we're not going to 
have a lot of fractions in the other 
boxes. 
So let's just imagine that the population 
is 100,000. 
Make it one million or ten million. 
It doesn't matter, because we're going to 
be interested in the ratios of the 
different groups. 
We can use that 100,000 to fill out the 
other boxes if we know the prevalence or 
the base rate. 
Because the base rate tells you what 
percentage of that 100,000 actually do 
have the condition and don't have the 
condition. 
We imagined, remember we're just making 
up numbers here, but we imagined that the 
prevalence of this condition is 0.3%, and
that means out of 100,000 people, there 
will be 300 who do have the medical 
condition. 
Well if there are 300 who have it and 
there are 100,000 total, we can figure 
out how many don't have the medical 
condition by just subtracting, 
which means, 99,700 do not have the 
medical condition. 
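For anyone who likes to check the arithmetic, here is a minimal Python sketch of this first step, using the made-up numbers from this example (100,000 people, a base rate of three in 1,000). The variable names are just illustrative, and `Fraction` is used only so the arithmetic stays exact.

```python
from fractions import Fraction

population = 100_000
base_rate = Fraction("0.003")  # three out of every 1,000 people

# Left column: people who have the condition
with_condition = int(population * base_rate)
# Right column: everyone else, found by subtraction
without_condition = population - with_condition

print(with_condition, without_condition)  # 300 99700
```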
Okay? 
Now, we've divided the population into 
our two columns. 
The ones that do and the ones that don't 
have the medical condition. 
The next step is to figure out how many 
are going to test positive,
and how many are going to test negative,
out of each of these groups.
For that, we first need the sensitivity. 
The sensitivity tells us the percentage 
of the cases that have the condition who 
will test positive. 
So the people who have the condition are 
the 300. 
The ones who test positive are going to 
go up in this area. 
And we know from the sensitivity being 
0.99 or 99% that the number in that area 
should be 99% of 300, or 297. 
And of course, if that's the number that 
tests positive then the remainder are 
going to test negative, and that means 
that we'll have three, 
which shouldn't surprise you because if 
99% of the cases that have it test 
positive, then 1% will test negative and 
1% of 300 is 3. 
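The same step for the left column can be sketched like this, assuming the 300 people with the condition and a sensitivity of 99%, as in the example (names are illustrative):

```python
from fractions import Fraction

with_condition = 300            # from the base-rate step
sensitivity = Fraction("0.99")  # share of people WITH the condition who test positive

true_positives = int(with_condition * sensitivity)  # top-left box
false_negatives = with_condition - true_positives   # bottom-left box: the remainder

print(true_positives, false_negatives)  # 297 3
```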
Good. 
So we got the first column done. 
Now, the next question is going to be the 
specificity. 
We can use the specificity to figure out 
what goes in that next column. 
If the specificity is 99%,
and we know that 99,700 people do not 
have the condition out of our sample of 
100,000. 
Well, that means that 99% of 99,700 are 
going to test negative. 
Because the specificity is the percentage
of cases without the condition that test
negative.
And that means that we'll have 98,703 
among people who do not have the 
condition who test negative. 
How many are going to test positive?
The rest of them. 
So 99,700 minus 98,703 is going to be 
997. 
And of course that shouldn't be 
surprising again, because 1% of 99,700 is 
997. 
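And the right column, sketched the same way under the example's assumption of 99% specificity:

```python
from fractions import Fraction

without_condition = 99_700
specificity = Fraction("0.99")  # share of people WITHOUT the condition who test negative

true_negatives = int(without_condition * specificity)  # bottom of the right column
false_positives = without_condition - true_negatives   # top of the right column: the rest

print(true_negatives, false_positives)  # 98703 997
```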
We've only got two boxes left to fill out.
How do you fill out those? 
Well this box in the upper right is the 
total number of people in this population 
of 100,000 who test positive. 
And so we can get that by adding the ones 
that do have the condition and test 
positive and the ones that don't have the 
condition and test positive. Just add 
them together, 
and you get 1294. 
And you do the same on the next row 
because that blank is the area that has 
all the people who test negative, 
and three people who have the condition 
test negative. 
98,703 people who do not have the 
condition test negative. 
So the total is going to be 98,706. 
And we can check to make sure that we got 
it right by just adding them together, 
1,294 plus 98,706 is equal to 100,000.
We got it right, okay.
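The row totals and the final check can be written out the same way. This is only a sketch; the four inner numbers are the ones worked out in the lecture.

```python
# The four inner boxes from the worked example
true_positives, false_positives = 297, 997
false_negatives, true_negatives = 3, 98_703

total_positive = true_positives + false_positives  # everyone who tests positive
total_negative = false_negatives + true_negatives  # everyone who tests negative

# Sanity check: the two rows must add back up to the whole population
assert total_positive + total_negative == 100_000
print(total_positive, total_negative)  # 1294 98706
```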
So, now we divided the population into 
those people who have the condition, 
those people who don't have the condition 
and we know how many of each of those 
groups test positive and how many of each 
of those groups test negative. 
The real question is what's the 
probability that I have cancer or the 
medical condition given that I tested 
positive. 
How do we figure that out? 
Well, 
the total number of positive tests was 
1294, 
and the people who tested positive who 
really had the condition was 297, so it 
looks like the probability of actually 
having the condition given that you 
tested positive is 297 out of 1294, or
0.23.
That's 23%, a little less than one out of four.
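The final ratio is a single division: true positives over all positives. A minimal sketch with the counts from the box:

```python
true_positives = 297   # have the condition AND test positive
total_positive = 1294  # everyone who tests positive

# Probability of having the condition, given a positive test
posterior = true_positives / total_positive
print(f"{posterior:.3f}")  # 0.230
```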
Is that what you guessed? 
Most people, including most doctors, when 
they hear that the test is 99% sensitive 
and 99% specific will guess a lot higher
than one in four.
Oh my gosh, I'm a doctor and I never 
would have thought that. 
Now, don't worry, she's not a physician, 
she's a meta-physician. 
But in this case, the
probability really is just one in four 
that you have that medical condition. 
Now, how did that happen? 
The reason was that the prevalence or the
base rate was so low that even a small
rate of false positives, given the
massive numbers of people who don't have
the condition, will mean that there are
more false positives, more than three times as many
as there are true positives, and that's
why the probability is just one in four,
actually a little less than one in four,
that you have the medical condition even
when you tested positive.
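The comparison between the two kinds of positive results can be checked directly, using the counts from the box:

```python
true_positives = 297   # 99% of the 300 people who have the condition
false_positives = 997  # only 1% of the 99,700 who don't, but that group is huge

ratio = false_positives / true_positives
print(f"{ratio:.2f}")  # 3.36 false positives for every true positive
```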
I want to add a quick caveat here in 
order to avoid misinterpretation. 
The point here is that if you
have a screening test for a condition
with a very low base rate or prevalence,
and you don't have any symptoms that put
you in a special category, then you need
to get another test before you jump to
any conclusions about having the medical
condition.
Because if you have that other test, the
fact that you tested positive on the first test
puts you in a smaller class with a much
higher base rate or prevalence, and now
the probability is going to go up.
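Here is a rough sketch of why the second test helps. It writes the box calculation as a small helper function (a hypothetical name, not something from the lecture) and then, as an explicit assumption, treats the second test as independent of the first and exactly as sensitive and specific. The probability after the first positive result then plays the role of the new base rate.

```python
def posterior(prior: float, sensitivity: float, specificity: float) -> float:
    """Probability of the condition given one positive result (the box method as a formula)."""
    true_pos = prior * sensitivity               # share of population: condition AND positive
    false_pos = (1 - prior) * (1 - specificity)  # share of population: no condition AND positive
    return true_pos / (true_pos + false_pos)

after_first = posterior(0.003, 0.99, 0.99)
# Assumption: a second, independent test with the same accuracy.
# The group that tested positive once becomes the new "population".
after_second = posterior(after_first, 0.99, 0.99)

print(f"{after_first:.3f} {after_second:.3f}")  # 0.230 0.967
```

Under these assumptions a second positive result raises the probability from under one in four to roughly 97%; real follow-up tests differ, which is why the numbers here are only illustrative.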
Most doctors know that. 
And that's why after the first test they 
don't jump to conclusions and they order 
another test. 
But many patients don't realise that, 
and they get extremely worried after a 
single test, even when they don't have 
any symptoms. 
So that's the mistake that we're trying 
to avoid here. 
Now that's surprising, 
but it actually applies to many different 
areas of life. 
It applies for example to medical tests 
with all kinds of other diseases, not 
just cancer, or colon cancer, 
but pretty much every disease where the 
prevalence is extremely low. 
It applies also to drug tests. 
If somebody gets a positive drug test, 
does that mean that they really were 
using drugs? 
Well, if it's a population where the base 
rate or prevalence of drug use is quite 
low, then it might not. 
Of course, if you assume that the 
prevalence or base rate is quite high, 
then you're going to believe that drug 
test, but you need to know the facts 
about what the prevalence, or base rate 
really is in order to calculate 
accurately the probability that this 
person really was using drugs. 
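The dependence on the base rate can be sketched by holding the test fixed and varying the prevalence. The accuracy figures below (99% sensitive, 99% specific) are borrowed from the cancer example purely as an assumption, and the base rates are illustrative, not facts about any real population.

```python
def posterior(prior, sensitivity, specificity):
    """Probability of the condition given a positive result (box method as a formula)."""
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# Hypothetical drug test with the same accuracy as the cancer example
for base_rate in (0.001, 0.01, 0.1, 0.5):
    p = posterior(base_rate, 0.99, 0.99)
    print(f"base rate {base_rate:>5}: P(user | positive) = {p:.2f}")
```

With these assumed numbers, a positive result means only about a 9% chance of drug use at a 0.1% base rate, even odds at 1%, and about 99% when half the population uses drugs.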
Same applies to evidence in legal trials. 
Take eye witnesses for example. 
It's very tricky. 
Someone's trying to use their eyes as a 
test for what they see. 
They might identify a friend. 
Or they might just say that car that did 
the hit and run accident was a Porsche. 
Well, how good are they at identifying 
Porsches? 
If they get it right most of the time, 
but not always, 
and sometimes they don't get it right 
when it is a Porsche, then we've got the 
sensitivity and specificity of what they 
identify, 
and we can use that to calculate how 
likely it is that their evidence in the 
trial really is reliable or not. 
Another example is the prediction of 
future behavior. 
We might have some kind of marker,
such that people with that
marker have a certain likelihood of
committing crimes, but if crimes are very 
rare in that community and every other, 
then a test which has a pretty good 
sensitivity and specificity still might 
not be good enough when we're talking 
about something like crime that's 
actually very rare, and has a very low 
prevalence, or base rate in most 
communities. 
And the same applies to failing out of 
school. 
Are SAT scores or GRE scores going to
be good predictors of who's going to
fail out of school?
Well, if very few people fail out of
school, so that the prevalence or base
rate is very low, then even if they're
pretty sensitive and specific, they might
not be good predictors.
So this same type of problem arises in a 
lot of different areas, and I'm not 
going to go through more examples right 
now, but we'll have plenty of examples in 
these exercises at the end of this 
chapter. 
I want to end, though, by saying a few 
things that are a bit more technical 
about this method. 
First, there's a lot of terminology to 
learn, because when you read about using 
this method in other areas for other 
types of topics, then you'll run into 
these terms, and it's a good idea to know 
them. 
So first,
the cases where the person does have the
condition and also tests positive are
called hits or true positives; different
people use different terms.
The cases where
the person tests positive, but they don't
have the condition, are called false
positives or false alarms.
The cases where a person really does have 
the condition but tests negative are 
called misses or false negatives. 
And the cases where the person does not 
have the condition and the test comes out 
negative are called true negatives,
because they're negative and it's true
that they don't have the condition. 
If we put together the false negatives
and the true negatives,
we get the total set of negatives.
And if we put together the true positives 
and the false positives, we get the total 
set of positives. 
And of course we have the general 
population. 
And within that population a percentage 
that have the condition and a percentage 
that don't have the condition. 
Now, what's the base rate? 
The base rate, in this population, is
simply the number that have the condition divided by
the total population,
which is box seven divided by box nine.
If we use E for the evidence, and H for
the hypothesis being true,
that the condition really does exist,
then that's the probability of H.
And the sensitivity is going to be
the total number of true positives
divided by the total number of people
with the condition,
because it's the percentage of people who
have the condition and test positive.
Okay, so that's the probability of E given
H, and it's box one divided by box seven.
The specificity, in contrast, is the ratio
of the true negatives to the total
number of people who do not have the
condition.
That is, the probability of Not E, 
that is, not having the evidence of a 
positive test result, given not H, 
given that you're in this second column, 
where the hypothesis is false, because 
you don't have the condition. 
So that's box five divided by box eight. 
That's the specificity. 
So we can define all of these in terms of 
each other. 
The hits divided by the total with that 
condition is going to be the sensitivity, 
and you can use this terminology to guide 
your way through this box. 
And the big question is, again, going to 
be, what's the solution? 
What's the probability of the hypothesis,
having the condition,
given the evidence, that is, a positive
test result?
That's going to be box one divided by box 
three, 
and as we saw in the case that we just 
went through, that gives you the 
probability of having the medical 
condition, or colon cancer, given a 
positive test result. 
That's called the posterior probability 
or, in symbols, the probability of the 
hypothesis, given the evidence. 
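As a compact summary of this terminology, here is a sketch that stores the nine boxes and computes each ratio. The numbering (one to three across the positive row, four to six across the negative row, seven to nine across the totals row) is inferred from the ratios named in the lecture, so treat the layout as an assumption.

```python
# Boxes numbered left to right, top to bottom (inferred layout):
#             condition   no condition   total
# positive        1            2           3
# negative        4            5           6
# total           7            8           9
box = {
    1: 297, 2: 997,    3: 1_294,
    4: 3,   5: 98_703, 6: 98_706,
    7: 300, 8: 99_700, 9: 100_000,
}

base_rate   = box[7] / box[9]  # P(H)
sensitivity = box[1] / box[7]  # P(E | H)
specificity = box[5] / box[8]  # P(not-E | not-H)
posterior   = box[1] / box[3]  # P(H | E), the posterior probability

print(base_rate, sensitivity, specificity, round(posterior, 3))
```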
So I hope this terminology helps you 
understand some of the discussions of 
this if you go on and read about it.
This procedure that we've been discussing 
is actually just an application of a 
famous theorem called Bayes' Theorem, 
after Thomas Bayes, an eighteenth-century
English clergyman who was also a
mathematician, and who proved this
theorem in probability theory.
Now, some of you out there will use the 
boxes and it'll make sense to you, but 
some Courserians, I assume, are
mathematicians and they want to see the
mathematics behind it.
So now, I want to show you how to
derive Bayes' theorem from the rules of
probability that we learned in earlier 
lectures. 
So for all you Math nerds out there, here 
it goes. 
You start with rule 2G, 
apply it to the probability that the 
evidence and the hypothesis are both 
true, 
and by the rule, that probability is 
equal to the probability of the evidence 
times the probability of the hypothesis, 
given the evidence. 
You have to have that conditional 
probability, because they're not 
independent. 
Then you simply divide both sides of that 
by the probability of the evidence, a 
little simple algebra. 
And you end up with the probability of 
the hypothesis, given the evidence, is 
equal to the probability of the evidence 
and the hypothesis divided by the 
probability of the evidence. 
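One way to write these first two steps in symbols, with E for the evidence and H for the hypothesis: rule 2G for the conjunction, then division of both sides by P(E).

```latex
P(E \wedge H) = P(E) \times P(H \mid E)
\qquad\Longrightarrow\qquad
P(H \mid E) = \frac{P(E \wedge H)}{P(E)}
```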
Now we can do a little trick, which is
ingenious.
Substitute for E something that's
logically equivalent to E, namely the
evidence and the hypothesis, or the
evidence and not the hypothesis.
Now if you think about it, you'll see 
that those are equivalent, because either 
the hypothesis has to be true or not the 
hypothesis is true. 
One or the other has to be true, 
and that means that the evidence and the 
hypothesis or the evidence and not the 
hypothesis is going to be equivalent to 
E. 
So this is equivalent to this, and 
because they are equivalent, we can 
substitute them within the formula for 
probability without affecting the truth 
values. 
So we just substitute this formula in 
here for the E up there. 
And we end up with the probability of the 
hypothesis, given the evidence, is equal 
to the probability of the evidence and 
the hypothesis divided by the probability 
of the evidence and the hypothesis, or 
the evidence and not the hypothesis. 
Now that's not supposed to make much 
sense, but it helps with the derivation. 
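The substitution step in the same notation, where ¬H stands for not H:

```latex
E \equiv (E \wedge H) \vee (E \wedge \neg H)
\qquad\Longrightarrow\qquad
P(H \mid E) = \frac{P(E \wedge H)}{P\big((E \wedge H) \vee (E \wedge \neg H)\big)}
```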
The next step is to apply rule three, 
because we have a disjunction. 
And notice that the disjuncts are mutually
exclusive.
It cannot be true both that the evidence 
and the hypothesis is true, 
and also, that the evidence and not the 
hypothesis is true, 
because it can't be both h and not h. 
So, we can apply the simple version of 
rule three, 
and that means that the probability of E 
and H, or E and Not H, is equal to the 
probability of E and H plus the 
probability of E and Not H. 
We're just applying that rule three for
disjunction that we learned a few
lectures ago.
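The rule three step, in symbols: because the two disjuncts are mutually exclusive, the probability of the disjunction in the denominator splits into a sum.

```latex
P\big((E \wedge H) \vee (E \wedge \neg H)\big) = P(E \wedge H) + P(E \wedge \neg H)
\qquad\Longrightarrow\qquad
P(H \mid E) = \frac{P(E \wedge H)}{P(E \wedge H) + P(E \wedge \neg H)}
```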
Now we apply rule 2G again, because we 
have the probability of a conjunction up 
in the top. 
And since these are not independent of
each other (we hope not, if it's a
hypothesis and the evidence for it),
we have to use the conditional
probability.
And using rule 2G, we find that the 
probability of the hypothesis, given the 
evidence, is equal to the probability of 
the hypothesis times the probability of 
the evidence given the hypothesis divided 
by the probability of the hypothesis 
times the probability of the evidence, 
given the hypothesis, plus the 
probability of the hypothesis being 
false. 
That is, the probability of not h times 
the probability of the evidence given, 
not H, or the hypothesis being false. 
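Written out in symbols, applying rule 2G to each conjunction, P(E and H) = P(H) P(E | H) and P(E and not H) = P(not H) P(E | not H), gives the formula just described:

```latex
P(H \mid E) = \frac{P(H)\,P(E \mid H)}{P(H)\,P(E \mid H) + P(\neg H)\,P(E \mid \neg H)}
```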
And that's a mouthful,
and it's a long formula.
But that's the mathematical formula that
Bayes proved in the 18th century, and
that provides the mathematical basis for
that whole system of boxes that we talked
about before.
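As a quick check that the formula and the boxes agree, here is a sketch that plugs the example's made-up numbers into Bayes' theorem with exact fractions and compares the result with box one divided by box three (297 out of 1,294). The function name `bayes` is only an illustrative label.

```python
from fractions import Fraction

def bayes(p_h, p_e_given_h, p_e_given_not_h):
    """P(H | E) = P(H)P(E|H) / (P(H)P(E|H) + P(not H)P(E|not H))."""
    numerator = p_h * p_e_given_h
    return numerator / (numerator + (1 - p_h) * p_e_given_not_h)

p_h = Fraction("0.003")                 # base rate
p_e_given_h = Fraction("0.99")          # sensitivity
p_e_given_not_h = 1 - Fraction("0.99")  # 1 - specificity: the false-positive rate

formula = bayes(p_h, p_e_given_h, p_e_given_not_h)
boxes = Fraction(297, 1294)             # box one / box three

print(formula == boxes, float(formula))
```

Both routes give exactly the same fraction, which is the sense in which the formula is the mathematical basis for the boxes.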
But if you don't like the mathematical 
proof, if that's too confusing for you, 
then use the boxes. 
And if you don't like the boxes, use the 
mathematical proof. 
They're both going to work. 
Just pick the one that works for you. 
In fact, you don't have to pick either of 
them, because remember, this is an honors 
lecture. It's optional, and it won't be 
on the quiz. 
But if you do want to try this method and 
make sure you understand it, we'll have a 
bunch of exercises for you, where you can 
test your skills.