In this video, I want to start telling you about how we represent neural networks; in other words, how we represent our hypothesis, or how we represent our model, when using neural networks. Neural networks were developed as a way of simulating neurons, or networks of neurons, in the brain. So, to explain the hypothesis representation, let's start by looking at what a single neuron in the brain looks like. Your brain and mine are jam-packed full of neurons like these. Neurons are cells in the brain, and the two things to draw attention to are, first, that the neuron has a cell body like so, and moreover, that the neuron has a number of input wires. These are called the dendrites; think of them as input wires, and they receive inputs from other locations. The neuron also has an output wire called the axon, and this output wire is what it uses to send signals, or messages, to other neurons. So, at a simplistic level, what a neuron is, is a computational unit that gets a number of inputs through its input wires, does some computation, and then sends outputs via its axon to other nodes, or other neurons, in the brain.

Here's an illustration of a group of neurons. The way that neurons communicate with each other is with little pulses of electricity. These are also called spikes, but that just means a little pulse of electricity. So, here's one neuron, and if it wants to send a message, what it does is send a little pulse of electricity via its axon to some different neuron. Here, the axon, that is, this output wire, connects to the input wire, or the dendrite, of this second neuron over here, which then accepts the incoming message, does some computation, and may in turn decide to send out its own messages on its axon to other neurons. And this is the process by which all human thought happens: these neurons doing computations and passing messages to other neurons as a result of whatever inputs they've got. And by the way, this is how our senses and our muscles work as well. If you want to move one of your muscles, the way that works is that a neuron may send these pulses of electricity to your muscle, and that causes your muscle to contract. And if some sensor like your eye wants to send a message to your brain, what it does is send its pulses of electricity to a neuron in your brain, like so.

In a neural network, or rather in an artificial neural network that we implement on a computer, we're going to use a very simple model of what a neuron does. We're going to model a neuron as just a logistic unit. So, when I draw a yellow circle like that, you should think of it as playing a role analogous to maybe the body of a neuron, and we then feed the neuron a few inputs via its dendrites, or its input wires, and the neuron does some computation and outputs some value on this output wire; in a biological neuron, that's the axon. Whenever I draw a diagram like this, what it means is that it represents the computation h of x equals 1 over 1 plus e to the negative theta transpose x where, as usual, x is our feature vector and theta is our parameter vector. So, this is a very simple, maybe vastly oversimplified, model of the computation that the neuron does, where it gets a number of inputs x1, x2, x3 and outputs some value computed like so. When I draw a neural network, usually I draw only the input nodes x1, x2, x3; sometimes, when it's useful to do so, I draw an extra node for x0.
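To make the single logistic unit concrete, here is a minimal sketch in Python with NumPy; the function and variable names (sigmoid, neuron_output) and the specific input and parameter values are my own choices for illustration, not anything from the lecture. It computes h of x equals 1 over 1 plus e to the negative theta transpose x for a neuron with three inputs plus the bias input x0 = 1.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(theta, x):
    # A single artificial neuron (logistic unit): h(x) = g(theta' x).
    # Here x includes the bias entry x0 = 1 as its first element.
    return sigmoid(theta @ x)

x = np.array([1.0, 0.5, -1.2, 0.3])      # [x0, x1, x2, x3], with x0 = 1
theta = np.array([0.1, 2.0, -0.5, 1.0])  # one parameter ("weight") per input
print(neuron_output(theta, x))           # a value between 0 and 1
```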
This x0 node is sometimes called the bias unit or the bias neuron, and because x0 is always equal to 1, sometimes I'll draw it and sometimes I won't, depending on which is more notationally convenient for that example. Finally, one last bit of terminology: when we talk about neural networks, sometimes we'll say that this is a neuron, an artificial neuron, with a sigmoid or logistic activation function. This "activation function," in the neural network terminology, is just another term for that nonlinearity g of z equals 1 over 1 plus e to the negative z. And whereas so far I've been calling theta the parameters of the model, and I'll mostly continue to use that terminology to refer to the parameters, in the neural networks literature you might sometimes hear people talk about the weights of a model, and weights means exactly the same thing as the parameters of the model. I'll mostly use the terminology "parameters" in these videos, but sometimes you may hear others use the "weights" terminology.

So, this little diagram represents a single neuron. What a neural network is, is just a group of these different neurons strung together. Concretely, here we have input units x1, x2, and x3, and once again, sometimes I'll draw this extra node x0 and sometimes not; I'll just draw it in here. And here we have three neurons, which I have written as a(2)1, a(2)2 and a(2)3 (I'll explain these superscript and subscript indices in a moment), and once again, we can, if we want, add this a(2)0 as an extra bias unit there; it always outputs the value 1. Then finally we have this third node at the final layer, and it's this third node that outputs the value that the hypothesis h of x computes.

To introduce a bit more terminology: in a neural network, the first layer is also called the input layer, because this is where we input our features x1, x2, x3. The final layer is also called the output layer, because that layer has the neuron (this one over here) that outputs the final value computed by the hypothesis. And layer two, in between, is called the hidden layer. The term "hidden layer" isn't great terminology, but the intuition is that in supervised learning you get to see the inputs and you get to see the correct outputs, whereas the hidden layer's values are ones you don't get to observe in the training set: they're not x and they're not y, and so we call them hidden. Later on we'll see neural networks with more than one hidden layer, but in this example we have one input layer, layer 1; one hidden layer, layer 2; and one output layer, layer 3. Basically, anything that isn't an input layer and isn't an output layer is called a hidden layer.

So, I want to be really clear about what this neural network is doing. Let's step through the computational steps that are embodied by, or represented by, this diagram. To explain the specific computations represented by a neural network, here's a little bit more notation. I'm going to use a superscript j, subscript i to denote the activation of neuron i, or of unit i, in layer j. So concretely, a superscript 2 subscript 1 denotes the activation of the first unit in layer 2, in our hidden layer. And by "activation" I just mean the value that is computed by, and output by, that specific unit.
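As a small illustration of the layer structure and the always-1 bias unit just described, here is a hedged sketch, again in Python with NumPy; the names layer_sizes and add_bias_unit are my own, and the layer sizes simply mirror the example network from the lecture (three inputs, three hidden units, one output).

```python
import numpy as np

# Layer sizes for the example network:
# layer 1 (input): 3 units, layer 2 (hidden): 3 units, layer 3 (output): 1 unit
layer_sizes = [3, 3, 1]

def add_bias_unit(a):
    # Prepend the bias unit, which always has the value 1
    # (x0 = 1 for the input layer, a(2)0 = 1 for the hidden layer).
    return np.concatenate(([1.0], a))

x = np.array([0.5, -1.2, 0.3])   # features x1, x2, x3
print(add_bias_unit(x))          # [ 1.   0.5 -1.2  0.3]
```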
In addition, our neural network is parametrized by these matrices, theta superscript j, where theta(j) is going to be a matrix of weights controlling the function mapping from one layer to the next, say from the first layer to the second layer, or from the second layer to the third layer.

So, here are the computations represented by this diagram. This first hidden unit has its value computed as follows: a(2)1 is equal to the sigmoid function, also called the sigmoid activation function or the logistic activation function, applied to this linear combination of its inputs, that is, a(2)1 = g(Theta(1)10 x0 + Theta(1)11 x1 + Theta(1)12 x2 + Theta(1)13 x3). Then this second hidden unit has its activation value computed as the sigmoid of the corresponding linear combination, and similarly for this third hidden unit; it's computed by that formula. So here we have three input units and three hidden units, and so the dimension of theta 1, the matrix of parameters governing our mapping from the three input units to the three hidden units, is going to be 3 by 4. More generally, if a network has Sj units in layer j and Sj+1 units in layer j+1, then the matrix theta(j), which governs the function mapping from layer j to layer j+1, will have dimension Sj+1 by (Sj + 1). Just to be clear about this notation: the first term is S subscript j+1, and the second is S subscript j, and then plus 1, where this last plus 1 is not part of the subscript. So theta(j) is S subscript j+1 by S subscript j, plus 1.

So, we've talked about what the three hidden units do to compute their values. Finally, in this last layer, the output layer, we have one more unit, which computes h of x. That can also be written as a(3)1, and it's equal to g(Theta(2)10 a(2)0 + Theta(2)11 a(2)1 + Theta(2)12 a(2)2 + Theta(2)13 a(2)3). And you notice that I've written this with a superscript 2 here, because theta superscript 2 is the matrix of parameters, or the matrix of weights, that controls the function mapping from the hidden units, that is, the layer 2 units, to the one layer 3 unit, that is, the output unit.

To summarize, what we've done is shown how a picture like this over here defines an artificial neural network, which defines a function h that maps from input values x to, hopefully, predictions y. This hypothesis is parametrized by parameters that I am denoting with a capital Theta, so that as we vary Theta, we get different hypotheses; that is, we get different functions mapping from x to y. So this gives us a mathematical definition of how to represent the hypothesis of a neural network. In the next few videos, what I'd like to do is give you more intuition about what these hypothesis representations do, as well as go through a few examples and talk about how to compute them efficiently.
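Putting the whole computation together, here is a minimal forward-pass sketch under the same assumptions as the earlier snippets (Python with NumPy; the helper name hypothesis is mine, and the Theta matrices are filled with arbitrary random values purely for illustration, since the lecture doesn't specify any). Note the dimensions: Theta1 is 3 by 4, that is, S2 by (S1 + 1), and Theta2 is 1 by 4, that is, S3 by (S2 + 1), matching the rule above.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation function g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
Theta1 = rng.standard_normal((3, 4))  # maps layer 1 -> layer 2: S2 x (S1 + 1) = 3 x 4
Theta2 = rng.standard_normal((1, 4))  # maps layer 2 -> layer 3: S3 x (S2 + 1) = 1 x 4

def hypothesis(x, Theta1, Theta2):
    # Forward computation for the 3-layer example network.
    a1 = np.concatenate(([1.0], x))    # input layer with bias x0 = 1
    a2 = sigmoid(Theta1 @ a1)          # hidden activations a(2)1, a(2)2, a(2)3
    a2 = np.concatenate(([1.0], a2))   # add hidden bias unit a(2)0 = 1
    a3 = sigmoid(Theta2 @ a2)          # output layer: h(x) = a(3)1
    return a3[0]

x = np.array([0.5, -1.2, 0.3])         # features x1, x2, x3
print(hypothesis(x, Theta1, Theta2))   # scalar prediction in (0, 1)
```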