In this video, we're going to look at a proof that the perceptron learning procedure will eventually get the weights into the cone of feasible solutions. I don't want you to get the wrong idea about the course from this video. In general, it's going to be about engineering, not about proofs of things. There will be very few proofs in the course. But we get to understand quite a lot more about perceptrons when we try to prove that they will eventually get the right answer.

So we're going to use our geometric understanding of what's happening in weight space as the perceptron learns, to get a proof that the perceptron will eventually find a weight vector that gets the right answer for all of the training cases, if any such vector exists. Our proof is going to assume that there is a vector that gets the right answer for all training cases. We'll call that a feasible vector. An example of a feasible vector is shown by the green dot in the diagram.

So we start with a weight vector that's getting some of the training cases wrong, and in the diagram we've shown a training case that it is getting wrong. What we want to show, and this is the idea for the proof, is that every time it gets a training case wrong, it will update the current weight vector in a way that makes it closer to every feasible weight vector. We can represent the squared distance of the current weight vector from a feasible weight vector as the sum of a squared distance along the line of the input vector that defines the training case, and another squared distance orthogonal to that line. The orthogonal squared distance won't change, and the squared distance along the line of the input vector will get smaller. So our hopeful claim is that every time the perceptron makes a mistake, the current weight vector gets closer to all feasible weight vectors.

Now this is almost right, but there's an unfortunate problem. If you look at the feasible weight vector in gold, it's just on the right side of the plane that defines one of the training cases, and the current weight vector is just on the wrong side, and the input vector is quite big. So when we add the input vector to the current weight vector, we actually get further away from that gold feasible weight vector. So our hopeful claim doesn't work, but we can fix it up so that it does.

What we're going to do is define a generously feasible weight vector. That's a weight vector that not only gets every training case right, but gets it right by at least a certain margin, where the margin is at least as big as the length of the input vector for that training case. So we take the cone of feasible solutions, and inside that we have another cone of generously feasible solutions, which get everything right by at least the size of the input vector. And now our proof will work. We can make the claim that every time the perceptron makes a mistake, the squared distance to all of the generously feasible weight vectors will be decreased by at least the squared length of the input vector, which is the update we make.

So given that, we can get an informal sketch of a proof of convergence. I'm not going to try to make this formal. I'm more interested in the engineering than the mathematics. If you're a mathematician, I'm sure you can make it formal yourself. So, every time the perceptron makes a mistake, the current weight vector moves, and it decreases its squared distance from every generously feasible weight vector by at least the squared length of the current input vector.
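Written out, that claim is one line of algebra. This is my own sketch, not something spelled out in the lecture, and it assumes the usual conventions: a mistake on a positive training case with input vector \(\mathbf{x}\) triggers the update \(\mathbf{w} \to \mathbf{w} + \mathbf{x}\), and "generously feasible" means \(\mathbf{w}^{*}\!\cdot\mathbf{x} \ge \|\mathbf{x}\|^2\) (the negative case is symmetric):

\[
\|\mathbf{w}^{*} - (\mathbf{w} + \mathbf{x})\|^2
  = \|\mathbf{w}^{*} - \mathbf{w}\|^2 \;-\; 2\,\mathbf{w}^{*}\!\cdot\mathbf{x} \;+\; 2\,\mathbf{w}\cdot\mathbf{x} \;+\; \|\mathbf{x}\|^2
  \;\le\; \|\mathbf{w}^{*} - \mathbf{w}\|^2 \;-\; \|\mathbf{x}\|^2 ,
\]

using \(\mathbf{w}\cdot\mathbf{x} < 0\) (the case was misclassified) and \(\mathbf{w}^{*}\!\cdot\mathbf{x} \ge \|\mathbf{x}\|^2\) (generous feasibility).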
And so the squared distance to all the generously feasible weight vectors decreases by at least that squared length. Assuming that none of the input vectors are infinitesimally small, that means that after a finite number of mistakes, the weight vector must lie in the feasible region, if this region exists. Notice it doesn't have to lie in the generously feasible region, but it has to get into the feasible region to stop making mistakes. And that's it. That's our informal sketch of a proof that the perceptron convergence procedure works. But notice, it all depends on the assumption that there is a generously feasible weight vector. And if there is no such vector, the whole proof falls apart.
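For reference, here is a minimal sketch of the kind of learning procedure the proof is about. The encoding details are my assumptions rather than anything from the lecture: targets are coded as +1 and -1, and the bias is folded into the input as a constant feature.

```python
import numpy as np

def perceptron_train(inputs, targets, max_passes=1000):
    """Minimal perceptron learning procedure (a sketch, not the course's code).

    inputs:  array of shape (n_cases, n_features); a bias is assumed to be
             folded in as a constant input feature if one is wanted.
    targets: array of +1 / -1 labels.
    On each mistake the input vector is added to the weights (positive case)
    or subtracted from them (negative case).
    """
    weights = np.zeros(inputs.shape[1])
    for _ in range(max_passes):
        mistakes = 0
        for x, t in zip(inputs, targets):
            prediction = 1 if np.dot(weights, x) >= 0 else -1
            if prediction != t:
                weights += t * x   # add for positive cases, subtract for negative
                mistakes += 1
        if mistakes == 0:          # no mistakes: weights are in the feasible region
            break
    return weights

# Tiny usage example: an AND-like, linearly separable problem with a bias feature.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w = perceptron_train(X, y)
print(w, [1 if np.dot(w, x) >= 0 else -1 for x in X])
```

The loop stops as soon as a full pass produces no mistakes, which is exactly the point the proof is about: the weights have reached the feasible region. If the data are not linearly separable, it simply gives up after max_passes, since no feasible vector exists and the convergence argument no longer applies.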