In this video, I want to start telling you about how we represent neural networks; in other words, how we represent our hypothesis, or how we represent our model, when using neural networks. Neural networks were developed as a way of simulating neurons, or networks of neurons, in the brain. So, to explain the hypothesis representation, let's start by looking at what a single neuron in the brain looks like. Your brain and mine are jam-packed full of neurons like these. Neurons are cells in the brain, and the two things to draw attention to are, first, that the neuron has a cell body like so, and moreover, that the neuron has a number of input wires. These are called the dendrites; think of them as input wires, and they receive inputs from other locations. The neuron also has an output wire called the axon, and this output wire is what it uses to send signals, or messages, to other neurons. So, at a simplistic level, what a neuron is, is a computational unit that gets a number of inputs through its input wires, does some computation, and then sends outputs via its axon to other nodes, or other neurons, in the brain.

Here's an illustration of a group of neurons. The way that neurons communicate with each other is with little pulses of electricity. These are also called spikes, but that just means a little pulse of electricity. So, here's one neuron, and if it wants to send a message, what it does is send a little pulse of electricity via its axon to some different neuron. Here, the axon, that is, this output wire, connects to the input wire, or the dendrite, of this second neuron over here, which then accepts the incoming message, does some computation, and may in turn decide to send out its own messages on its axon to other neurons. And this is the process by which all human thought happens: these neurons doing computations and passing messages to other neurons as a result of whatever inputs they've got. And by the way, this is how our senses and our muscles work as well. If you want to move one of your muscles, the way that works is that a neuron may send these pulses of electricity to your muscle, and that causes your muscle to contract. And if some sensor like your eye wants to send a message to your brain, what it does is send its pulses of electricity to a neuron in your brain, like so.

In a neural network, or rather in an artificial neural network that we implement on a computer, we're going to use a very simple model of what a neuron does. We're going to model a neuron as just a logistic unit. So, when I draw a yellow circle like that, you should think of it as playing a role analogous to maybe the body of a neuron, and we then feed the neuron a few inputs via its dendrites, or its input wires, and the neuron does some computation and outputs some value on this output wire; in a biological neuron, that's the axon. Whenever I draw a diagram like this, what it means is that it represents the computation h of x equals 1 over 1 plus e to the negative theta transpose x where, as usual, x is our feature vector and theta is our parameter vector. So, this is a very simple, maybe vastly oversimplified, model of the computation that the neuron does, where it gets a number of inputs x1, x2, x3 and outputs some value computed like so. When I draw a neural network, usually I draw only the input nodes x1, x2, x3; sometimes, when it's useful to do so, I draw an extra node for x0.
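To make the single logistic unit concrete, here is a minimal sketch in Python with NumPy; the function and variable names (sigmoid, neuron_output) and the specific input and parameter values are my own choices for illustration, not anything from the lecture. It computes h of x equals 1 over 1 plus e to the negative theta transpose x for a neuron with three inputs plus the bias input x0 = 1.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(theta, x):
    # A single artificial neuron (logistic unit): h(x) = g(theta' x).
    # Here x includes the bias entry x0 = 1 as its first element.
    return sigmoid(theta @ x)

x = np.array([1.0, 0.5, -1.2, 0.3])      # [x0, x1, x2, x3], with x0 = 1
theta = np.array([0.1, 2.0, -0.5, 1.0])  # one parameter ("weight") per input
print(neuron_output(theta, x))           # a value between 0 and 1
```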
This x0 node is sometimes called the bias unit or the bias neuron, and because x0 is always equal to 1, sometimes I'll draw it and sometimes I won't, depending on which is more notationally convenient for that example. Finally, one last bit of terminology: when we talk about neural networks, sometimes we'll say that this is a neuron, an artificial neuron, with a sigmoid or logistic activation function. This "activation function," in the neural network terminology, is just another term for that nonlinearity g of z equals 1 over 1 plus e to the negative z. And whereas so far I've been calling theta the parameters of the model, and I'll mostly continue to use that terminology to refer to the parameters, in the neural networks literature you might sometimes hear people talk about the weights of a model, and weights means exactly the same thing as the parameters of the model. I'll mostly use the terminology "parameters" in these videos, but sometimes you may hear others use the "weights" terminology.

So, this little diagram represents a single neuron. What a neural network is, is just a group of these different neurons strung together. Concretely, here we have input units x1, x2, and x3, and once again, sometimes I'll draw this extra node x0 and sometimes not; I'll just draw it in here. And here we have three neurons, which I have written as a(2)1, a(2)2 and a(2)3 (I'll explain these superscript and subscript indices in a moment), and once again, we can, if we want, add this a(2)0 as an extra bias unit there; it always outputs the value 1. Then finally we have this third node at the final layer, and it's this third node that outputs the value that the hypothesis h of x computes.

To introduce a bit more terminology: in a neural network, the first layer is also called the input layer, because this is where we input our features x1, x2, x3. The final layer is also called the output layer, because that layer has the neuron (this one over here) that outputs the final value computed by the hypothesis. And layer two, in between, is called the hidden layer. The term "hidden layer" isn't great terminology, but the intuition is that in supervised learning you get to see the inputs and you get to see the correct outputs, whereas the hidden layer's values are ones you don't get to observe in the training set: they're not x and they're not y, and so we call them hidden. Later on we'll see neural networks with more than one hidden layer, but in this example we have one input layer, layer 1; one hidden layer, layer 2; and one output layer, layer 3. Basically, anything that isn't an input layer and isn't an output layer is called a hidden layer.

So, I want to be really clear about what this neural network is doing. Let's step through the computational steps that are embodied by, or represented by, this diagram. To explain the specific computations represented by a neural network, here's a little bit more notation. I'm going to use a superscript j, subscript i to denote the activation of neuron i, or of unit i, in layer j. So concretely, a superscript 2 subscript 1 denotes the activation of the first unit in layer 2, in our hidden layer. And by "activation" I just mean the value that is computed by, and output by, that specific unit.
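As a small illustration of the layer structure and the always-1 bias unit just described, here is a hedged sketch, again in Python with NumPy; the names layer_sizes and add_bias_unit are my own, and the layer sizes simply mirror the example network from the lecture (three inputs, three hidden units, one output).

```python
import numpy as np

# Layer sizes for the example network:
# layer 1 (input): 3 units, layer 2 (hidden): 3 units, layer 3 (output): 1 unit
layer_sizes = [3, 3, 1]

def add_bias_unit(a):
    # Prepend the bias unit, which always has the value 1
    # (x0 = 1 for the input layer, a(2)0 = 1 for the hidden layer).
    return np.concatenate(([1.0], a))

x = np.array([0.5, -1.2, 0.3])   # features x1, x2, x3
print(add_bias_unit(x))          # [ 1.   0.5 -1.2  0.3]
```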
In addition, our neural network is parametrized by these matrices, theta superscript j, where theta(j) is going to be a matrix of weights controlling the function mapping from one layer to the next, say from the first layer to the second layer, or from the second layer to the third layer.

So, here are the computations represented by this diagram. This first hidden unit has its value computed as follows: a(2)1 is equal to the sigmoid function, also called the sigmoid activation function or the logistic activation function, applied to this linear combination of its inputs, that is, a(2)1 = g(Theta(1)10 x0 + Theta(1)11 x1 + Theta(1)12 x2 + Theta(1)13 x3). Then this second hidden unit has its activation value computed as the sigmoid of the corresponding linear combination, and similarly for this third hidden unit; it's computed by that formula. So here we have three input units and three hidden units, and so the dimension of theta 1, the matrix of parameters governing our mapping from the three input units to the three hidden units, is going to be 3 by 4. More generally, if a network has Sj units in layer j and Sj+1 units in layer j+1, then the matrix theta(j), which governs the function mapping from layer j to layer j+1, will have dimension Sj+1 by (Sj + 1). Just to be clear about this notation: the first term is S subscript j+1, and the second is S subscript j, and then plus 1, where this last plus 1 is not part of the subscript. So theta(j) is S subscript j+1 by S subscript j, plus 1.

So, we've talked about what the three hidden units do to compute their values. Finally, in this last layer, the output layer, we have one more unit, which computes h of x. That can also be written as a(3)1, and it's equal to g(Theta(2)10 a(2)0 + Theta(2)11 a(2)1 + Theta(2)12 a(2)2 + Theta(2)13 a(2)3). And you notice that I've written this with a superscript 2 here, because theta superscript 2 is the matrix of parameters, or the matrix of weights, that controls the function mapping from the hidden units, that is, the layer 2 units, to the one layer 3 unit, that is, the output unit.

To summarize, what we've done is shown how a picture like this over here defines an artificial neural network, which defines a function h that maps from input values x to, hopefully, predictions y. This hypothesis is parametrized by parameters that I am denoting with a capital Theta, so that as we vary Theta, we get different hypotheses; that is, we get different functions mapping from x to y. So this gives us a mathematical definition of how to represent the hypothesis of a neural network. In the next few videos, what I'd like to do is give you more intuition about what these hypothesis representations do, as well as go through a few examples and talk about how to compute them efficiently.
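Putting the whole computation together, here is a minimal forward-pass sketch under the same assumptions as the earlier snippets (Python with NumPy; the helper name hypothesis is mine, and the Theta matrices are filled with arbitrary random values purely for illustration, since the lecture doesn't specify any). Note the dimensions: Theta1 is 3 by 4, that is, S2 by (S1 + 1), and Theta2 is 1 by 4, that is, S3 by (S2 + 1), matching the rule above.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation function g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
Theta1 = rng.standard_normal((3, 4))  # maps layer 1 -> layer 2: S2 x (S1 + 1) = 3 x 4
Theta2 = rng.standard_normal((1, 4))  # maps layer 2 -> layer 3: S3 x (S2 + 1) = 1 x 4

def hypothesis(x, Theta1, Theta2):
    # Forward computation for the 3-layer example network.
    a1 = np.concatenate(([1.0], x))    # input layer with bias x0 = 1
    a2 = sigmoid(Theta1 @ a1)          # hidden activations a(2)1, a(2)2, a(2)3
    a2 = np.concatenate(([1.0], a2))   # add hidden bias unit a(2)0 = 1
    a3 = sigmoid(Theta2 @ a2)          # output layer: h(x) = a(3)1
    return a3[0]

x = np.array([0.5, -1.2, 0.3])         # features x1, x2, x3
print(hypothesis(x, Theta1, Theta2))   # scalar prediction in (0, 1)
```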