In this video, we're going to get a geometrical understanding of what happens when a perceptron learns. To do this, we have to think in terms of a weight space. It's a high-dimensional space in which each point corresponds to a particular setting of all the weights. In this space, we can represent the training cases as planes, and learning consists of trying to get the weight vector onto the right side of all the training planes.

For non-mathematicians, this may be tougher than previous material. You may have to spend quite a long time studying the next two parts. In particular, if you're not used to thinking about hyperplanes and high-dimensional spaces, you're going to have to learn to do that. To deal with hyperplanes in a 14-dimensional space, for example, what you do is you visualize a 3-dimensional space and you say "fourteen" to yourself very loudly. Everybody does it. But remember that when you go from a 13-dimensional space to a 14-dimensional space, you're creating as much extra complexity as when you go from a 2D space to a 3D space. 14-dimensional space is very big and very complicated.

So, we're going to start off by thinking about weight space. This is the space that has one dimension for each weight in the perceptron. A point in the space represents a particular setting of all the weights. Assuming we've eliminated the threshold, we can represent every training case as a hyperplane through the origin in weight space. So, points in the space correspond to weight vectors, and training cases correspond to planes. And, for a particular training case, the weights must lie on one side of that hyperplane in order to get the answer correct for that training case.

So, let's look at a picture of it so we can understand what's going on. Here's a picture of weight space. The training case (we're going to think of just one training case for now) defines a plane, which in this 2D picture is just the black line. The plane goes through the origin, and it's perpendicular to the input vector for that training case, which here is shown as a blue vector. We're going to consider a training case in which the correct answer is one. For that kind of training case, the weight vector needs to be on the correct side of the hyperplane in order to get the answer right. It needs to be on the same side of the hyperplane as the direction in which the input vector points.

For any weight vector like the green one, that's on that side of the hyperplane, the angle with the input vector will be less than 90 degrees, so the scalar product of the input vector with the weight vector will be positive. And since we've already got rid of the threshold, that means the perceptron will give an output of one. It'll say yes, and so we'll get the right answer. Conversely, if we have a weight vector like the red one, that's on the wrong side of the plane, the angle with the input vector will be more than 90 degrees, so the scalar product of the weight vector and the input vector will be negative. The perceptron will say no, or zero, and in this case we'll get the wrong answer.

So, to summarize: on one side of the plane, all the weight vectors will get the right answer, and on the other side of the plane, all the possible weight vectors will get the wrong answer.

Now, let's look at a different training case, in which the correct answer is zero. So here, we have the weight space again. We've chosen a different input vector, and for this input vector, the right answer is zero.
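Before we work through that second case in the picture, here is a minimal NumPy sketch of the scalar-product test we just used for the first case, whose correct answer is one. The specific input and weight vectors are made up for illustration, and the threshold is assumed to have already been eliminated as described above:

```python
import numpy as np

# One made-up training case whose correct answer is 1.
# Its input vector is perpendicular to the plane it defines
# through the origin of weight space (the black line in the figure).
x = np.array([2.0, 1.0])           # input vector (the blue vector)

good_w = np.array([1.0, 2.0])      # like the green weight vector: angle with x < 90 degrees
bad_w  = np.array([-1.0, -0.5])    # like the red weight vector: angle with x > 90 degrees

def perceptron_output(w, x):
    # With the threshold eliminated, the decision is just the sign
    # of the scalar product of the weight vector with the input vector.
    return 1 if np.dot(w, x) > 0 else 0

print(perceptron_output(good_w, x))   # 1 -> right answer for this case
print(perceptron_output(bad_w, x))    # 0 -> wrong answer for this case
```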
So again, the input case corresponds to a plane, shown by the black line. In this case, any weight vector that makes an angle of less than 90 degrees with the input vector will give us a positive scalar product, causing the perceptron to say yes, or one, and it will get the answer wrong. Conversely, weight vectors on the other side of the plane will make an angle of greater than 90 degrees with the input vector, and they will correctly give the answer of zero. So, as before, the plane goes through the origin, it's perpendicular to the input vector, and on one side of the plane all the weight vectors are bad, and on the other side they're all good.

Now, let's put those two training cases together in one picture of weight space. Our picture of weight space is getting a little bit crowded. I've moved the input vector over so we don't have all the vectors in quite the same place. And now you can see that there's a cone of possible weight vectors, and any weight vector inside that cone will get the right answer for both training cases. Of course, there doesn't have to be any cone like that. It could be that there are no weight vectors that get the right answer for all of the training cases. But if there are any, they'll lie in a cone. So, what the learning algorithm needs to do is consider the training cases one at a time and move the weight vector around in such a way that it eventually lies in this cone.

One thing to notice is that if you've got a good weight vector, one that works for all the training cases, it'll lie in the cone. And if you have another one, it'll also lie in the cone. So if you take the average of those two weight vectors, that will also lie in the cone. That means the problem is convex: the average of two solutions is itself a solution. And in general in machine learning, if you can get a convex learning problem, that makes life easy.
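To make the cone and the convexity argument concrete, here is a small NumPy sketch with two made-up training cases, one whose correct answer is one and one whose correct answer is zero, again assuming the threshold has already been eliminated. Any weight vector that gets both cases right lies in the feasible cone, and the average of two such vectors does too:

```python
import numpy as np

# Two made-up training cases, each an (input vector, correct answer) pair.
cases = [
    (np.array([2.0, 1.0]), 1),    # needs w . x > 0
    (np.array([-1.0, 2.0]), 0),   # needs w . x <= 0
]

def gets_all_right(w):
    # A weight vector lies in the feasible cone if it gets every case right.
    return all((1 if np.dot(w, x) > 0 else 0) == target for x, target in cases)

w1 = np.array([1.0, -0.5])    # one weight vector inside the cone
w2 = np.array([3.0, 0.5])     # another weight vector inside the cone

print(gets_all_right(w1))              # True
print(gets_all_right(w2))              # True
print(gets_all_right((w1 + w2) / 2))   # True: the average of two solutions is also a solution
```

The last line is the convexity point from the lecture: because each training case imposes a linear constraint on the weights, any mixture of two weight vectors that satisfy all the constraints also satisfies them.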