1 00:00:00,130 --> 00:00:00,980 In this and the next 2 00:00:01,240 --> 00:00:02,030 video I want to work 3 00:00:02,140 --> 00:00:03,650 through a detailed example, showing 4 00:00:04,530 --> 00:00:05,920 how a neural network can compute 5 00:00:06,220 --> 00:00:07,740 a complex nonlinear function of 6 00:00:07,970 --> 00:00:09,780 the input and hopefully, this will 7 00:00:09,950 --> 00:00:10,950 give you a good sense of why 8 00:00:11,510 --> 00:00:12,470 Neural Networks can be used 9 00:00:13,050 --> 00:00:14,810 to learn complex, nonlinear hypotheses. 10 00:00:16,790 --> 00:00:18,210 Consider the following problem where 11 00:00:18,900 --> 00:00:20,560 we have input features x1 12 00:00:20,770 --> 00:00:21,680 and x2 that are binary 13 00:00:22,310 --> 00:00:23,760 values, so either zero or one. 14 00:00:23,990 --> 00:00:25,320 So x1 and x2 can each 15 00:00:25,510 --> 00:00:27,160 take on only one of two possible values. 16 00:00:28,660 --> 00:00:29,670 In this example, I've drawn 17 00:00:29,990 --> 00:00:31,420 only two positive examples and 18 00:00:31,530 --> 00:00:33,240 two negative examples, but you 19 00:00:33,320 --> 00:00:34,370 can think of this as a 20 00:00:34,540 --> 00:00:36,210 simplified version of a 21 00:00:36,290 --> 00:00:37,710 more complex learning problem where 22 00:00:37,920 --> 00:00:38,910 we may have a bunch 23 00:00:39,030 --> 00:00:40,320 of positive examples in the upper 24 00:00:40,480 --> 00:00:41,350 right and the lower left and 25 00:00:41,470 --> 00:00:43,090 a bunch of negative examples to notify 26 00:00:43,580 --> 00:00:46,110 the circles, and what 27 00:00:46,140 --> 00:00:46,900 we'd like to do is learn a nonlinear, you know, 28 00:00:48,330 --> 00:00:50,090 decision boundary that we 29 00:00:50,210 --> 00:00:52,210 need to separate the positive and the negative examples. 30 00:00:53,750 --> 00:00:54,590 So how can a neural 31 00:00:55,070 --> 00:00:56,160 network do this and rather than 32 00:00:56,710 --> 00:00:57,550 use an example on the 33 00:00:57,600 --> 00:00:59,260 right. I'm going to use this, maybe easier 34 00:00:59,680 --> 00:01:01,670 to examine example on the left. 35 00:01:02,620 --> 00:01:03,940 Concretely, what this is 36 00:01:04,110 --> 00:01:05,570 is really computing the target 37 00:01:05,990 --> 00:01:09,810 label y equals x1 XOR x2. 38 00:01:10,070 --> 00:01:11,650 Or this is actually the 39 00:01:11,910 --> 00:01:13,880 x1 XNOR x2 function 40 00:01:14,700 --> 00:01:15,750 where XNOR is the alternative 41 00:01:16,400 --> 00:01:18,420 notation for "not x1 or x2". 42 00:01:19,350 --> 00:01:20,730 So x1, XOR or 43 00:01:20,760 --> 00:01:22,730 x2 - that's true only 44 00:01:23,210 --> 00:01:24,820 if exactly one of 45 00:01:25,190 --> 00:01:27,900 x1 or x2 is equal to 1. 46 00:01:27,960 --> 00:01:29,160 It turns out that the specific 47 00:01:29,450 --> 00:01:30,680 example I'm going to use works out 48 00:01:30,810 --> 00:01:32,840 a little bit better if we 49 00:01:33,020 --> 00:01:35,000 use the XNOR example, instead. 50 00:01:35,460 --> 00:01:36,290 These two are the same, of course. 51 00:01:36,720 --> 00:01:38,540 It means not x1 XOR 52 00:01:38,780 --> 00:01:40,140 x2, and so we're going 53 00:01:40,320 --> 00:01:42,360 to have positive examples 54 00:01:42,950 --> 00:01:44,150 if either both are true or 55 00:01:44,530 --> 00:01:46,470 both are false and we'll 56 00:01:46,620 --> 00:01:49,600 have that's y equals 1, y equals 1 and 57 00:01:49,990 --> 00:01:51,480 we're going to have y equals 0 if 58 00:01:51,860 --> 00:01:52,650 only one of them is 59 00:01:52,760 --> 00:01:53,830 true and we want 60 00:01:54,000 --> 00:01:54,710 to figure out if we can 61 00:01:54,860 --> 00:01:57,210 get a neural network to fit to this sort of training set. 62 00:01:59,160 --> 00:02:00,200 In order to build up 63 00:02:00,450 --> 00:02:01,610 to a network that fits the 64 00:02:02,080 --> 00:02:04,900 XNOR example, we're going 65 00:02:05,350 --> 00:02:06,590 to start to a slightly simpler one 66 00:02:07,050 --> 00:02:09,710 and show a network that fits the AND function. 67 00:02:10,760 --> 00:02:12,150 Concretely, lets say we 68 00:02:12,310 --> 00:02:14,070 have inputs x1 and 69 00:02:14,240 --> 00:02:17,190 x2 that are again binary. So, it's either zero or one. 70 00:02:17,820 --> 00:02:18,680 And let's say our target 71 00:02:18,760 --> 00:02:20,980 labels y are you know, is 72 00:02:21,910 --> 00:02:23,470 equal to x1 and x2. 73 00:02:23,860 --> 00:02:24,870 This is a logical AND. 74 00:02:30,740 --> 00:02:31,820 So can we get a 75 00:02:32,060 --> 00:02:34,330 one unit network to compute 76 00:02:35,060 --> 00:02:36,120 this logical AND function? 77 00:02:37,400 --> 00:02:38,530 In order to do so, I'm 78 00:02:38,690 --> 00:02:40,000 going to actually draw in 79 00:02:40,580 --> 00:02:42,780 the bias unit as well, the plus one unit. 80 00:02:45,030 --> 00:02:46,500 Now, let me just assign some 81 00:02:46,770 --> 00:02:48,050 values to the weights or 82 00:02:48,160 --> 00:02:50,130 the parameters of this network. 83 00:02:50,450 --> 00:02:52,220 I am going to write down the parameters on this diagram. 84 00:02:52,820 --> 00:02:54,090 Write minus 30 here 85 00:02:56,360 --> 00:02:57,740 plus 20 and plus 86 00:02:58,710 --> 00:02:59,600 20 and what this means 87 00:02:59,970 --> 00:03:01,320 is that I'm assigning a value 88 00:03:01,860 --> 00:03:03,790 of minus thirty to the 89 00:03:04,120 --> 00:03:05,770 value associated with x0. 90 00:03:06,120 --> 00:03:07,230 This is plus 1 going 91 00:03:07,530 --> 00:03:08,840 to this unit and a 92 00:03:09,420 --> 00:03:10,890 parameter value of plus 20 93 00:03:11,250 --> 00:03:12,960 that multiplies in x1 in 94 00:03:13,070 --> 00:03:14,300 a value of plus 20 for 95 00:03:14,680 --> 00:03:15,980 the parameter that multiplies into x2. 96 00:03:17,190 --> 00:03:18,860 So, concretely, this is saying 97 00:03:19,060 --> 00:03:20,340 that my hypotheses h of 98 00:03:20,420 --> 00:03:21,780 x is equal to 99 00:03:22,410 --> 00:03:24,500 g of -30 + 20x1 + 20x2. 100 00:03:25,490 --> 00:03:31,390 So sometimes it's just 101 00:03:31,640 --> 00:03:33,240 convenient to draw these 102 00:03:33,810 --> 00:03:34,880 weights and draw these parameters 103 00:03:35,620 --> 00:03:38,250 up here, you know, in the diagram of the neural network. 104 00:03:38,790 --> 00:03:40,230 And of course this minus 30 105 00:03:40,390 --> 00:03:42,500 this is actually theta 1 106 00:03:43,670 --> 00:03:44,830 of 1,0. 107 00:03:45,290 --> 00:03:47,390 This is theta 1 108 00:03:47,600 --> 00:03:50,550 of 1,1 and that's theta 109 00:03:51,560 --> 00:03:52,990 1 of 1,2 110 00:03:53,290 --> 00:03:54,320 but it's just easier think about 111 00:03:54,590 --> 00:03:56,660 it as, you know, associating these 112 00:03:56,840 --> 00:03:58,430 parameters with the edges of the network. 113 00:04:01,170 --> 00:04:04,170 Let's look at what this little single neuron network will compute. 114 00:04:05,050 --> 00:04:06,290 Just to remind you, the sigmoid 115 00:04:06,720 --> 00:04:08,820 activation function g of z looks like this. 116 00:04:09,110 --> 00:04:10,810 It starts from 0, rises 117 00:04:11,160 --> 00:04:12,270 smoothly, crosses 0.5, and 118 00:04:12,750 --> 00:04:14,720 then it asymptotes at one. 119 00:04:15,730 --> 00:04:16,510 And to give you some landmarks, 120 00:04:17,350 --> 00:04:18,850 if the horizontal axis value 121 00:04:19,460 --> 00:04:21,770 z is equal to 4.6, then 122 00:04:23,840 --> 00:04:25,910 the sigmoid function is equal to 0.99. 123 00:04:26,220 --> 00:04:27,950 This is very close 124 00:04:28,150 --> 00:04:29,560 to 1 and kind of symmetrically 125 00:04:30,350 --> 00:04:32,270 if it is negative 4.6, then 126 00:04:33,090 --> 00:04:34,970 the sigmoid function there is 127 00:04:35,080 --> 00:04:37,820 equal to 0.01 which is very close to 0. 128 00:04:39,440 --> 00:04:40,700 Let's look at the four possible input 129 00:04:41,040 --> 00:04:41,680 values for x1 and x2 130 00:04:41,730 --> 00:04:43,470 and look at whether the hypothesis will 131 00:04:43,620 --> 00:04:47,090 open in that case. 132 00:04:47,220 --> 00:04:47,910 If x1 and x2 are both 133 00:04:48,150 --> 00:04:49,160 equal to 0 - if 134 00:04:49,460 --> 00:04:50,560 you look at this, if 135 00:04:50,710 --> 00:04:51,650 x1 and x2 are both equal 136 00:04:52,010 --> 00:04:54,780 to 0 then the hypotheses of point g of -30. 137 00:04:55,120 --> 00:04:56,790 So, it's like very 138 00:04:57,290 --> 00:04:58,510 far to the left of this diagram. 139 00:04:58,750 --> 00:05:01,380 This will be very close to 0. 140 00:05:01,590 --> 00:05:03,160 If x1 equals 0 and 141 00:05:03,330 --> 00:05:05,100 x2 equals 1 then this 142 00:05:05,550 --> 00:05:07,610 formula here evaluates to 143 00:05:07,830 --> 00:05:09,470 g, thus the sigmoid function applied 144 00:05:09,890 --> 00:05:12,000 to -10 and again, 145 00:05:12,450 --> 00:05:13,640 that's, you know, to the far left 146 00:05:13,880 --> 00:05:14,970 of this plot and so, 147 00:05:15,150 --> 00:05:16,540 that's again very close to 0. 148 00:05:16,660 --> 00:05:19,180 This is also g of -10. 149 00:05:19,270 --> 00:05:21,320 That is if x1 150 00:05:22,000 --> 00:05:24,110 is equal to 1 and 151 00:05:24,560 --> 00:05:26,110 x(2)0, this is -30 plus 20, which is -10. 152 00:05:26,230 --> 00:05:28,450 And finally if 153 00:05:28,590 --> 00:05:29,940 x1 equals 1, x2 equals 154 00:05:30,670 --> 00:05:31,970 1, then you have g of 155 00:05:32,770 --> 00:05:34,020 -30 +20 +20, 156 00:05:34,190 --> 00:05:35,370 so that's g of 157 00:05:35,440 --> 00:05:36,480 +10, which is 158 00:05:36,710 --> 00:05:38,140 therefore very close to 1. 159 00:05:39,040 --> 00:05:40,210 And if you look 160 00:05:40,490 --> 00:05:42,700 in this column, this is 161 00:05:43,010 --> 00:05:45,280 exactly the logical "and" function. 162 00:05:45,820 --> 00:05:47,790 So, this is computing h of 163 00:05:47,890 --> 00:05:49,870 x is, you know, 164 00:05:50,260 --> 00:05:54,910 approximately x1 and x2. 165 00:05:55,200 --> 00:05:56,210 In other words, it outputs 166 00:05:56,650 --> 00:05:57,820 1 if and only 167 00:05:58,270 --> 00:05:59,470 if x1 and x2 are 168 00:06:00,950 --> 00:06:02,410 both equal to 1. 169 00:06:03,360 --> 00:06:04,840 So by writing out our little 170 00:06:05,320 --> 00:06:07,070 truth table like this, 171 00:06:07,780 --> 00:06:09,060 we manage to figure out what's 172 00:06:09,350 --> 00:06:11,170 the logical function that our 173 00:06:11,650 --> 00:06:12,870 neural network computes. 174 00:06:16,990 --> 00:06:18,350 This network shown here computes 175 00:06:18,880 --> 00:06:20,280 the OR function just to 176 00:06:20,370 --> 00:06:21,810 show you how I worked that out. 177 00:06:22,530 --> 00:06:23,230 If you are to write out 178 00:06:23,680 --> 00:06:25,240 the hypotheses you find that 179 00:06:25,360 --> 00:06:26,690 it's computing g of 180 00:06:27,110 --> 00:06:29,980 -10 +20 x1 181 00:06:30,170 --> 00:06:32,040 +20 x2. And so 182 00:06:32,270 --> 00:06:33,380 if you fill in these values you 183 00:06:33,520 --> 00:06:35,110 find that's g of 184 00:06:35,460 --> 00:06:37,080 -10 which is approximately 0, 185 00:06:37,820 --> 00:06:38,840 g of 10 which is 186 00:06:39,040 --> 00:06:40,550 approximately 1, and so on. 187 00:06:40,930 --> 00:06:42,650 These are approximately 1, and approximately 188 00:06:43,550 --> 00:06:45,410 1, and these numbers is 189 00:06:46,160 --> 00:06:47,650 essentially the logical OR 190 00:06:47,860 --> 00:06:50,210 function. So, hopefully 191 00:06:50,590 --> 00:06:52,010 with this, you now understand how 192 00:06:52,350 --> 00:06:53,930 single neurons in a 193 00:06:54,020 --> 00:06:54,980 neural network can be used 194 00:06:55,180 --> 00:06:58,390 to compute logical functions like AND and OR and so on. 195 00:06:59,000 --> 00:07:00,280 In the next video, we'll continue 196 00:07:00,790 --> 00:07:03,870 building on these examples and work through a more complex example. 197 00:07:04,730 --> 00:07:05,610 We'll get to show you how 198 00:07:06,170 --> 00:07:07,570 a neural network, now with 199 00:07:07,820 --> 00:07:09,780 multiple layers of units can 200 00:07:09,960 --> 00:07:10,960 be used to compute more complex 201 00:07:11,400 --> 00:07:13,870 functions like the XOR function or the XNOR function.