In this and the next video I want to work through a detailed example showing how a neural network can compute a complex nonlinear function of the input, and hopefully this will give you a good sense of why neural networks can be used to learn complex, nonlinear hypotheses.

Consider the following problem, where we have input features x1 and x2 that are binary values, so either zero or one. So x1 and x2 can each take on only one of two possible values. In this example I've drawn only two positive examples and two negative examples, but you can think of this as a simplified version of a more complex learning problem where we may have a bunch of positive examples in the upper right and the lower left, and a bunch of negative examples denoted by the circles, and what we'd like to do is learn a nonlinear decision boundary that separates the positive and the negative examples.

So how can a neural network do this? Rather than use the example on the right, I'm going to use this maybe easier-to-examine example on the left. Concretely, what this is really computing is the target label y = x1 XNOR x2, where XNOR is notation for NOT (x1 XOR x2). So x1 XOR x2 is true only if exactly one of x1 or x2 is equal to 1, and XNOR is its negation. It turns out that the specific example I'm going to use works out a little bit better if we use the XNOR version instead; the two differ only by a negation. Because it means NOT (x1 XOR x2), we're going to have positive examples, y = 1, when x1 and x2 are either both true or both false, and y = 0 when exactly one of them is true. We want to figure out whether we can get a neural network to fit this sort of training set.

In order to build up to a network that fits the XNOR example, we're going to start with a slightly simpler one and show a network that fits the AND function. Concretely, let's say we have inputs x1 and x2 that are again binary, so either zero or one, and let's say our target label is y = x1 AND x2. This is a logical AND. So can we get a one-unit network to compute this logical AND function?

In order to do so, I'm going to draw in the bias unit as well, the +1 unit. Now, let me assign some values to the weights, or the parameters, of this network. I'm going to write the parameters on this diagram: minus 30 here, plus 20, and plus 20. What this means is that I'm assigning a value of minus 30 to the parameter associated with x0, the +1 bias unit going into this unit, a value of plus 20 for the parameter that multiplies x1, and a value of plus 20 for the parameter that multiplies x2. So, concretely, this is saying that my hypothesis h(x) is equal to g(-30 + 20x1 + 20x2). It's often convenient to draw these weights, these parameters, on the diagram of the neural network like this. Of course, this minus 30 is actually Theta^(1)_10, this is Theta^(1)_11, and that's Theta^(1)_12, but it's just easier to think of these parameters as associated with the edges of the network.

Let's look at what this little single-neuron network computes. Just to remind you, the sigmoid activation function g(z) looks like this: it starts near 0, rises smoothly, crosses 0.5 at z = 0, and then asymptotes at 1. And to give you some landmarks, if the horizontal-axis value z is equal to 4.6, then the sigmoid function is equal to 0.99.
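To make this concrete, here is a minimal sketch, not from the lecture itself, that just mirrors the formulas above in Python (assuming only NumPy): the sigmoid landmarks, and the single AND unit h(x) = g(-30 + 20x1 + 20x2).

```python
import numpy as np

def sigmoid(z):
    """The activation function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Landmarks from the lecture: g(4.6) ~ 0.99 and g(-4.6) ~ 0.01.
print(sigmoid(4.6), sigmoid(-4.6))   # ~0.990, ~0.010

# Single-unit AND network: h(x) = g(-30 + 20*x1 + 20*x2),
# where -30, 20, 20 are Theta^(1)_10, Theta^(1)_11, Theta^(1)_12.
def and_unit(x1, x2):
    return sigmoid(-30 + 20 * x1 + 20 * x2)

# Print the truth table; the output is ~1 only when x1 = x2 = 1.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(and_unit(x1, x2), 3))
```

Running this reproduces the truth table we're about to walk through by hand.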
So g(4.6) is very close to 1, and, symmetrically, if z is negative 4.6, then the sigmoid function equals 0.01, which is very close to 0.

Let's look at the four possible input values for x1 and x2, and look at what the hypothesis outputs in each case. If x1 and x2 are both equal to 0, then the hypothesis outputs g(-30). That's very far to the left on this diagram, so it will be very close to 0. If x1 equals 0 and x2 equals 1, then this formula evaluates to g, the sigmoid function, applied to -10, and again that's far to the left of the plot, so that's again very close to 0. This is also g(-10): if x1 is equal to 1 and x2 is 0, we get -30 plus 20, which is -10. And finally, if x1 equals 1 and x2 equals 1, then you have g(-30 + 20 + 20), so that's g(+10), which is therefore very close to 1. And if you look down this column, this is exactly the logical AND function. So this is computing h(x) approximately equal to x1 AND x2; in other words, it outputs 1 if and only if x1 and x2 are both equal to 1. So by writing out our little truth table like this, we managed to figure out what logical function our neural network computes.

The network shown here computes the OR function. Just to show you how I worked that out: if you write out the hypothesis, you find that it's computing g(-10 + 20x1 + 20x2). And if you fill in the four input values, you find g(-10), which is approximately 0, then g(10), which is approximately 1, and so on; the last two are also approximately 1. These numbers are essentially the logical OR function.

So, hopefully with this, you now understand how single neurons in a neural network can be used to compute logical functions like AND and OR. In the next video, we'll continue building on these examples and work through a more complex example. We'll show how a neural network, now with multiple layers of units, can be used to compute more complex functions like the XOR function or the XNOR function.
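If you'd like to verify the OR computation from this video yourself, here is a similar minimal sketch. It again just mirrors the formula g(-10 + 20x1 + 20x2) from the lecture; the helper name logic_unit is my own, not something from the course.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logic_unit(theta, x1, x2):
    """One sigmoid neuron with weights theta = [bias, w1, w2]."""
    return sigmoid(theta[0] + theta[1] * x1 + theta[2] * x2)

# The AND and OR units differ only in the bias weight: -30 vs. -10.
for name, theta in [("AND", [-30, 20, 20]), ("OR", [-10, 20, 20])]:
    print(name)
    for x1 in (0, 1):
        for x2 in (0, 1):
            print(f"  {x1} {x2} -> {logic_unit(theta, x1, x2):.3f}")
```

Note how moving the bias from -30 to -10 shifts the threshold: with a bias of -10, a single +20 input already pushes z to +10 and the output to approximately 1, which is exactly the OR behavior.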