In this video, I'd like to keep working through our example to show how a neural network can compute complex nonlinear hypotheses. In the last video, we saw how a neural network can be used to compute the functions "x1 AND x2" and "x1 OR x2" when x1 and x2 are binary, that is, when they take on values 0 or 1. We can also have a network compute negation, that is, compute the function "NOT x1". Let me just write down the weights associated with this network. We have only one input feature, x1, in this case, plus the bias unit +1, and if I associate these with the weights +10 and -20, then my hypothesis is computing h(x) = g(10 - 20*x1). So when x1 is equal to 0, my hypothesis will be computing g(10 - 20*0), which is just g(10), and so that's approximately 1; and when x1 is equal to 1, this will be g(-10), which is approximately equal to 0. And if you look at what these values are, that's essentially the "NOT x1" function.

So, to include negations, the general idea is to put a large negative weight in front of the variable you want to negate. If it's -20 multiplied by x1, that's the general idea of how you end up negating x1. And so, in an example that I hope you will work out yourself, if you want to compute a function like "(NOT x1) AND (NOT x2)", part of that will involve putting large negative weights in front of x1 and x2, and it should be feasible to get a neural network with just one output unit to compute this as well. This logical function, "(NOT x1) AND (NOT x2)", is going to be equal to 1 if, and only if, x1 equals x2 equals 0: "NOT x1" means x1 must be 0, and "NOT x2" means x2 must be 0 as well. So this logical function is equal to 1 if, and only if, both x1 and x2 are equal to 0, and hopefully you should be able to figure out how to make a small neural network that computes it.

Now, taking the three pieces that we have put together, the network for computing "x1 AND x2", the network for computing "(NOT x1) AND (NOT x2)", and one last network for computing "x1 OR x2", we should be able to combine them to compute the "x1 XNOR x2" function. And just to remind you: if these axes are x1 and x2, the function we want to compute has negative examples here and here, and positive examples there and there, so clearly we'll need a nonlinear decision boundary in order to separate the positive and negative examples.

Let's draw the network. I'm going to take my inputs +1, x1, x2 and create my first hidden unit here. I'm going to call it a(2)1, because it's my first hidden unit, and I'm going to copy over the weights from the red network, the "x1 AND x2" network: so -30, 20, 20. Next, let me create a second hidden unit, which I'm going to call a(2)2, the second hidden unit of layer two, and I'm going to copy over the cyan network in the middle, so I'm going to have the weights 10, -20, -20. Now let's fill in some of the truth table values. We know the red network computes "x1 AND x2", so a(2)1 is going to be approximately 0, 0, 0, 1, depending on the values of x1 and x2. And for a(2)2, the cyan network, we know the function "(NOT x1) AND (NOT x2)" outputs 1, 0, 0, 0 for the four values of x1 and x2.
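Before moving on to the output layer, here is a small Python sketch, not part of the lecture itself, that plugs the weights quoted above into a sigmoid unit and checks these truth tables; the function names are placeholders of my own.

```python
import math

def g(z):
    """Sigmoid (logistic) activation."""
    return 1.0 / (1.0 + math.exp(-z))

def not_x1(x1):
    # "NOT x1" unit: bias +10, weight -20 on x1, i.e. g(10 - 20*x1)
    return g(10 - 20 * x1)

def x1_and_x2(x1, x2):
    # "x1 AND x2" unit (the red network): g(-30 + 20*x1 + 20*x2)
    return g(-30 + 20 * x1 + 20 * x2)

def not_x1_and_not_x2(x1, x2):
    # "(NOT x1) AND (NOT x2)" unit (the cyan network): g(10 - 20*x1 - 20*x2)
    return g(10 - 20 * x1 - 20 * x2)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    a21 = x1_and_x2(x1, x2)           # approx. 0, 0, 0, 1
    a22 = not_x1_and_not_x2(x1, x2)   # approx. 1, 0, 0, 0
    print(f"x1={x1} x2={x2}  NOT x1 = {not_x1(x1):.4f}  "
          f"a(2)1 = {a21:.4f}  a(2)2 = {a22:.4f}")
```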
Finally, I'm going to create my output node, my output unit, which is a(3)1. This is one more output, h(x), and I'm going to copy over the OR network for it. I'm also going to need a +1 bias unit here, so let me draw that in, and I'm going to copy over the weights from the green network: so -10, 20, 20, and we know from earlier that this computes the OR function. So let's fill in the truth table entries. The first entry is 0 OR 1, which is going to be 1; the next is 0 OR 0, which is 0; then 0 OR 0, which is 0; and 1 OR 0, which comes to 1. And thus h(x) is equal to 1 when either x1 and x2 are both 0, or x1 and x2 are both 1. Concretely, h(x) outputs 1 exactly at these two locations and outputs 0 otherwise, and so with this neural network, which has an input layer, one hidden layer, and one output layer, we end up with a nonlinear decision boundary that computes the XNOR function.

The more general intuition is that in the input layer we just have our raw inputs; then we have a hidden layer that computes some slightly more complex functions of the inputs, as shown here; and by adding yet another layer, we end up with an even more complex nonlinear function. This is the sort of intuition for why neural networks can compute pretty complicated functions: when you have multiple layers, you have relatively simple functions of the inputs in the second layer, the third layer can build on those to compute even more complex functions, and the layer after that can compute functions that are more complex still.

To wrap up this video, I want to show you a fun example of an application of a neural network that captures this intuition of the deeper layers computing more complex features. I want to show you a video that I got from a good friend of mine, Yann LeCun. Yann is a professor at New York University, at NYU; he was one of the early pioneers of neural network research, is something of a legend in the field now, and his ideas are used in all sorts of products and applications throughout the world. I want to show you a video from some of his early work in which he was using a neural network to recognize handwriting, that is, to do handwritten digit recognition. You might remember that at the start of this class, I said one of the early successes of neural networks was using them to read zip codes, to help route mail, in other words, to read postal codes. So this is one of the algorithms that was used to try to address that problem.

In the video I'll show you, this area here is the input area that displays a handwritten character presented to the network. This column here shows a visualization of the features computed by the first hidden layer of the network; for the first hidden layer, this visualization shows the different features, the different edges and lines and so on, that are detected. This next one is a visualization of the next hidden layer. It's rather harder to understand what the deeper hidden layers are computing, and you'll probably have a hard time seeing what's going on much beyond the first hidden layer. But finally, all of these learned features get fed to the output layer, and shown over here are the final answers, the final predicted values, for what handwritten digit the neural network thinks it is being shown. So, let's take a look at the video.
So, I hope you enjoyed the video, and that it gave you some intuition about the sorts of pretty complicated functions neural networks can learn: the network takes an image, just the raw pixels, as input, the first hidden layer computes some set of features, the next hidden layer computes even more complex features, and then still more complex ones, and those features can then be used by what is essentially a final layer of logistic regression classifiers to make accurate predictions about which digits the network is seeing.
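Finally, to tie the pieces of the XNOR construction from this video together, here is a minimal sketch, again my own Python rather than material from the lecture, that stacks the two hidden units and the OR output unit using exactly the weights quoted earlier.

```python
import math

def g(z):
    """Sigmoid (logistic) activation."""
    return 1.0 / (1.0 + math.exp(-z))

def xnor_network(x1, x2):
    # Hidden layer: a(2)1 computes "x1 AND x2", a(2)2 computes "(NOT x1) AND (NOT x2)"
    a21 = g(-30 + 20 * x1 + 20 * x2)
    a22 = g(10 - 20 * x1 - 20 * x2)
    # Output layer: a(3)1 computes "a(2)1 OR a(2)2" with the green network's weights
    return g(-10 + 20 * a21 + 20 * a22)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"x1={x1} x2={x2}  h(x) = {xnor_network(x1, x2):.4f}")
```

Running this prints values close to 1, 0, 0, 1 for the four input combinations, which is exactly the XNOR truth table described above.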