In this video, I'd like to keep working through our example to show how a neural network can compute complex nonlinear hypotheses. In the last video, we saw how a neural network can be used to compute the functions "x1 AND x2" and "x1 OR x2" when x1 and x2 are binary, that is, when they take on values 0 or 1. We can also have a network compute negation, that is, compute the function "NOT x1". Let me just write down the weights associated with this network. We have only one input feature, x1, in this case, plus the bias unit +1, and if I associate these with the weights +10 and -20, then my hypothesis is computing h(x) = g(10 - 20*x1). So when x1 is equal to 0, my hypothesis will be computing g(10 - 20*0), which is just g(10), and so that's approximately 1; and when x1 is equal to 1, this will be g(-10), which is approximately equal to 0. And if you look at what these values are, that's essentially the "NOT x1" function.

So, to include negations, the general idea is to put a large negative weight in front of the variable you want to negate. If it's -20 multiplied by x1, that's the general idea of how you end up negating x1. And so, in an example that I hope you will work out yourself, if you want to compute a function like "(NOT x1) AND (NOT x2)", part of that will involve putting large negative weights in front of x1 and x2, and it should be feasible to get a neural network with just one output unit to compute this as well. This logical function, "(NOT x1) AND (NOT x2)", is going to be equal to 1 if, and only if, x1 equals x2 equals 0: "NOT x1" means x1 must be 0, and "NOT x2" means x2 must be 0 as well. So this logical function is equal to 1 if, and only if, both x1 and x2 are equal to 0, and hopefully you should be able to figure out how to make a small neural network that computes it.

Now, taking the three pieces that we have put together, the network for computing "x1 AND x2", the network for computing "(NOT x1) AND (NOT x2)", and one last network for computing "x1 OR x2", we should be able to combine them to compute the "x1 XNOR x2" function. And just to remind you: if these axes are x1 and x2, the function we want to compute has negative examples here and here, and positive examples there and there, so clearly we'll need a nonlinear decision boundary in order to separate the positive and negative examples.

Let's draw the network. I'm going to take my inputs +1, x1, x2 and create my first hidden unit here. I'm going to call it a(2)1, because it's my first hidden unit, and I'm going to copy over the weights from the red network, the "x1 AND x2" network: so -30, 20, 20. Next, let me create a second hidden unit, which I'm going to call a(2)2, the second hidden unit of layer two, and I'm going to copy over the cyan network in the middle, so I'm going to have the weights 10, -20, -20. Now let's fill in some of the truth table values. We know the red network computes "x1 AND x2", so a(2)1 is going to be approximately 0, 0, 0, 1, depending on the values of x1 and x2. And for a(2)2, the cyan network, we know the function "(NOT x1) AND (NOT x2)" outputs 1, 0, 0, 0 for the four values of x1 and x2.
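Before moving on to the output layer, here is a small Python sketch, not part of the lecture itself, that plugs the weights quoted above into a sigmoid unit and checks these truth tables; the function names are placeholders of my own.

```python
import math

def g(z):
    """Sigmoid (logistic) activation."""
    return 1.0 / (1.0 + math.exp(-z))

def not_x1(x1):
    # "NOT x1" unit: bias +10, weight -20 on x1, i.e. g(10 - 20*x1)
    return g(10 - 20 * x1)

def x1_and_x2(x1, x2):
    # "x1 AND x2" unit (the red network): g(-30 + 20*x1 + 20*x2)
    return g(-30 + 20 * x1 + 20 * x2)

def not_x1_and_not_x2(x1, x2):
    # "(NOT x1) AND (NOT x2)" unit (the cyan network): g(10 - 20*x1 - 20*x2)
    return g(10 - 20 * x1 - 20 * x2)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    a21 = x1_and_x2(x1, x2)           # approx. 0, 0, 0, 1
    a22 = not_x1_and_not_x2(x1, x2)   # approx. 1, 0, 0, 0
    print(f"x1={x1} x2={x2}  NOT x1 = {not_x1(x1):.4f}  "
          f"a(2)1 = {a21:.4f}  a(2)2 = {a22:.4f}")
```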
Finally, I'm going to create my output node, my output unit, which is a(3)1. This is one more output, h(x), and I'm going to copy over the OR network for it. I'm also going to need a +1 bias unit here, so let me draw that in, and I'm going to copy over the weights from the green network: so -10, 20, 20, and we know from earlier that this computes the OR function. So let's fill in the truth table entries. The first entry is 0 OR 1, which is going to be 1; the next is 0 OR 0, which is 0; then 0 OR 0, which is 0; and 1 OR 0, which comes to 1. And thus h(x) is equal to 1 when either x1 and x2 are both 0, or x1 and x2 are both 1. Concretely, h(x) outputs 1 exactly at these two locations and outputs 0 otherwise, and so with this neural network, which has an input layer, one hidden layer, and one output layer, we end up with a nonlinear decision boundary that computes the XNOR function.

The more general intuition is that in the input layer we just have our raw inputs; then we have a hidden layer that computes some slightly more complex functions of the inputs, as shown here; and by adding yet another layer, we end up with an even more complex nonlinear function. This is the sort of intuition for why neural networks can compute pretty complicated functions: when you have multiple layers, you have relatively simple functions of the inputs in the second layer, the third layer can build on those to compute even more complex functions, and the layer after that can compute functions that are more complex still.

To wrap up this video, I want to show you a fun example of an application of a neural network that captures this intuition of the deeper layers computing more complex features. I want to show you a video that I got from a good friend of mine, Yann LeCun. Yann is a professor at New York University, at NYU; he was one of the early pioneers of neural network research, is something of a legend in the field now, and his ideas are used in all sorts of products and applications throughout the world. I want to show you a video from some of his early work in which he was using a neural network to recognize handwriting, that is, to do handwritten digit recognition. You might remember that at the start of this class, I said one of the early successes of neural networks was using them to read zip codes, to help route mail, in other words, to read postal codes. So this is one of the algorithms that was used to try to address that problem.

In the video I'll show you, this area here is the input area that displays a handwritten character presented to the network. This column here shows a visualization of the features computed by the first hidden layer of the network; for the first hidden layer, this visualization shows the different features, the different edges and lines and so on, that are detected. This next one is a visualization of the next hidden layer. It's rather harder to understand what the deeper hidden layers are computing, and you'll probably have a hard time seeing what's going on much beyond the first hidden layer. But finally, all of these learned features get fed to the output layer, and shown over here are the final answers, the final predicted values, for what handwritten digit the neural network thinks it is being shown. So, let's take a look at the video.
So, I hope you enjoyed the video, and that it gave you some intuition about the sorts of pretty complicated functions neural networks can learn: the network takes an image, just the raw pixels, as input, the first hidden layer computes some set of features, the next hidden layer computes even more complex features, and then still more complex ones, and those features can then be used by what is essentially a final layer of logistic regression classifiers to make accurate predictions about which digits the network is seeing.
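Finally, to tie the pieces of the XNOR construction from this video together, here is a minimal sketch, again my own Python rather than material from the lecture, that stacks the two hidden units and the OR output unit using exactly the weights quoted earlier.

```python
import math

def g(z):
    """Sigmoid (logistic) activation."""
    return 1.0 / (1.0 + math.exp(-z))

def xnor_network(x1, x2):
    # Hidden layer: a(2)1 computes "x1 AND x2", a(2)2 computes "(NOT x1) AND (NOT x2)"
    a21 = g(-30 + 20 * x1 + 20 * x2)
    a22 = g(10 - 20 * x1 - 20 * x2)
    # Output layer: a(3)1 computes "a(2)1 OR a(2)2" with the green network's weights
    return g(-10 + 20 * a21 + 20 * a22)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"x1={x1} x2={x2}  h(x) = {xnor_network(x1, x2):.4f}")
```

Running this prints values close to 1, 0, 0, 1 for the four input combinations, which is exactly the XNOR truth table described above.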