In this video, I'd like to tell you about learning curves. Learning curves are often a very useful thing to plot, either if you want to sanity check that your algorithm is working correctly, or if you want to improve its performance. And learning curves are a tool that I actually use very often to try to diagnose whether a learning algorithm may be suffering from a bias problem, a variance problem, or a bit of both.

Here's what a learning curve is. To plot a learning curve, what I usually do is plot Jtrain, which is, say, the average squared error on my training set, and Jcv, which is the average squared error on my cross validation set. And I'm going to plot that as a function of m, that is, as a function of the number of training examples I have. Now, m is usually a constant; maybe I just have, say, 100 training examples. But what I'm going to do is artificially reduce my training set size: I deliberately limit myself to using only, say, 10 or 20 or 30 or 40 training examples, and plot what the training error and the cross validation error are for these smaller training set sizes.

So let's see what these plots may look like. Suppose I have only one training example, like that shown in this first example here, and let's say I'm fitting a quadratic function. Well, with only one training example, I'm going to be able to fit it perfectly, right? Just fit the quadratic function, and I'm going to have zero error on the one training example. If I have two training examples, well, the quadratic function can also fit those very well: even if I am using regularization, I can probably fit them quite well, and if I am using no regularization, I'm going to fit them perfectly.
And if I have three training examples, again I can fit a quadratic function perfectly. So if m equals 1 or m equals 2 or m equals 3, my training error on my training set is going to be zero, assuming I'm not using regularization, or it may be slightly larger than zero if I am using regularization.

And by the way, if I have a large training set and I'm artificially restricting its size in order to plot Jtrain: if I set m equals 3, say, and train on only three examples, then for this figure I'm going to measure my training error only on the three examples that I actually fit my parameters to. So even if I have, say, 100 training examples, when I want to plot what my training error is at m equals 3, I measure the training error on the three examples I've actually fit my hypothesis to, and not on all the other examples that I have deliberately omitted from the training process.

So just to summarize, what we've seen is that if the training set size is small, then the training error is going to be small as well, because with a small training set it's going to be very easy to fit it very well, maybe even perfectly. Now say we have m equals 4. Well, then a quadratic function can no longer fit this data set perfectly, and if I have m equals 5, then maybe a quadratic function fits it only so-so. So as my training set gets larger, it becomes harder and harder to find a quadratic function that passes through all of my examples perfectly.
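Just to make that procedure concrete, here is a minimal sketch of how the learning-curve points could be computed. It assumes a linear hypothesis whose feature matrix already includes a bias column and trains by regularized normal equations; the function names are purely illustrative, not anything from the lecture.

```python
import numpy as np

def avg_squared_error(theta, X, y):
    # J(theta) = (1 / (2m)) * sum((X @ theta - y) ** 2)
    residual = X @ theta - y
    return residual @ residual / (2 * len(y))

def fit_regularized(X, y, lam=0.0):
    # Regularized normal equations; pinv keeps this well-behaved
    # even when m is tiny and X'X is singular.
    reg = lam * np.eye(X.shape[1])
    reg[0, 0] = 0.0                      # don't penalize the bias term
    return np.linalg.pinv(X.T @ X + reg) @ (X.T @ y)

def learning_curve(X_train, y_train, X_cv, y_cv, lam=0.0):
    j_train, j_cv = [], []
    for m in range(1, len(y_train) + 1):
        theta = fit_regularized(X_train[:m], y_train[:m], lam)
        # Training error is measured only on the m examples actually used...
        j_train.append(avg_squared_error(theta, X_train[:m], y_train[:m]))
        # ...but the cross validation error always uses the full CV set.
        j_cv.append(avg_squared_error(theta, X_cv, y_cv))
    return j_train, j_cv
```

Plotting j_train and j_cv against m gives the curves described next.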
So in fact, as the training set size grows, what you find is that my average training error actually increases. And so if you plot this figure, what you find is that the training set error, that is, the average error of your hypothesis, grows as m grows. Just to repeat, the intuition is that when m is small, when you have very few training examples, it's pretty easy to fit every single one of your training examples perfectly, and so your error is going to be small; whereas when m is larger, it gets harder to fit all the training examples perfectly, and so your training set error becomes larger.

Now, how about the cross validation error? Well, the cross validation error is my error on the cross validation set, data the hypothesis hasn't been fit to. So when I have a very small training set, I'm not going to generalize well; I'm just not going to do well on that set. Right, this hypothesis here doesn't look like a good one, and it's only when I get a larger training set that I start to get hypotheses that maybe fit the data somewhat better. So your cross validation error and your test set error will tend to decrease as your training set size increases, because the more data you have, the better you do at generalizing to new examples: the more data you have, the better the hypothesis you fit. So if you plot Jtrain and Jcv, this is the sort of thing that you get.

Now let's look at what the learning curves may look like if we have either high bias or high variance problems. Suppose your hypothesis has high bias, and to explain this I'm going to use as an example fitting a straight line to data that can't really be fit well by a straight line.
So we end up with a hypothesis that maybe looks like that. Now let's think about what would happen if we were to increase the training set size. So instead of the five examples I've drawn there, imagine that we have a lot more training examples. Well, what happens if you fit a straight line to this? What you find is that you end up with pretty much the same straight line: a straight line just cannot fit this data, and getting a ton more data isn't going to change that much. This is the best possible straight-line fit to this data, but the straight line just can't fit this data set that well.

So if you plot the cross validation error, this is what it will look like. On the left, if you have a minuscule training set size, like maybe just one training example, you're not going to do well. But by the time you have reached a certain number of training examples, you have almost fit the best possible straight line, and even if you end up with a much larger training set size, a much larger value of m, you're basically getting the same straight line. And so the cross validation error (let me label that) or test set error will plateau out, or flatten out, pretty soon once you've reached beyond a certain number of training examples, because you've pretty much fit the best possible straight line.

And how about the training error? Well, the training error will again start off small. And what you find in the high bias case is that the training error ends up close to the cross validation error, because you have so few parameters and so much data, at least when m is large: the performance on the training set and on the cross validation set will be very similar. And so, this is what your learning curves will look like if you have an algorithm that has high bias.
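Here's a small, hedged demo of that signature, reusing the learning_curve sketch from above on made-up synthetic data (my own illustration, not from the lecture): the target is really quadratic, but the hypothesis only gets straight-line features, so both errors plateau at a similar, high value.

```python
rng = np.random.default_rng(0)

def make_straight_line_dataset(m):
    # Quadratic target, but only straight-line features [1, x]:
    # a deliberately high-bias (underfitting) setup.
    x = rng.uniform(-3.0, 3.0, size=m)
    y = x ** 2 + rng.normal(scale=0.5, size=m)
    X = np.column_stack([np.ones(m), x])
    return X, y

X_train, y_train = make_straight_line_dataset(100)
X_cv, y_cv = make_straight_line_dataset(100)
j_train, j_cv = learning_curve(X_train, y_train, X_cv, y_cv)

# For large m, Jtrain and Jcv end up high and close together.
print(j_train[-1], j_cv[-1])
```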
And finally, the problem with high bias is reflected in the fact that both the cross validation error and the training error are high, so you end up with a relatively high value of both Jcv and Jtrain.

This also implies something very interesting, which is that if a learning algorithm has high bias, then as we get more and more training examples, that is, as we move to the right of this figure, we'll notice that the cross validation error isn't going down much; it has basically flattened out. And so if a learning algorithm is really suffering from high bias, getting more training data by itself will actually not help that much. As in our example in the figure on the right: there we had only five training examples and we fit a certain straight line, and when we had a ton more training data, we still ended up with roughly the same straight line. So if the learning algorithm has high bias, giving it a lot more training data doesn't actually help you get a much lower cross validation error or test set error.

So knowing whether your learning algorithm is suffering from high bias seems like a useful thing, because it can prevent you from wasting a lot of time collecting more training data that might just not end up being helpful. Next, let's look at the setting of a learning algorithm that may have high variance.
Let's just look at the training error first. If you have a very small training set, like the five training examples shown in the figure on the right, and if we're fitting, say, a very high order polynomial (I've written a hundredth-degree polynomial here, which really no one uses, but it's just for illustration), and if we're using a fairly small value of lambda, maybe not zero but fairly small, then we'll end up fitting this data very well, with a function that overfits it.

So if the training set size is small, our training error, that is, Jtrain of theta, will be small. And as the training set size increases a bit, we may still be overfitting the data a little, but it also becomes slightly harder to fit the data set perfectly. So as the training set size increases, we'll find that Jtrain increases, because it is just a little harder to fit the training set perfectly when we have more examples, but the training set error will still be pretty low.

Now, how about the cross validation error? Well, in the high variance setting, the hypothesis is overfitting, and so the cross validation error will remain high, even as we get a moderate number of training examples, so maybe the cross validation error looks like that. And the indicative diagnostic that we have a high variance problem is the fact that there's this large gap between the training error and the cross validation error.
And looking at this figure, if we think about adding more training data, that is, taking this figure and extrapolating to the right, we can kind of tell that the two curves, the blue curve and the magenta curve, are converging to each other. So if we were to extrapolate this figure to the right, then it seems likely that the training error will keep on going up and the cross validation error will keep on coming down. And the thing we really care about is the cross validation error, or the test set error, right? So in this sort of figure, we can tell that if we keep on adding training examples and extrapolate to the right, our cross validation error will keep on coming down. And so, in the high variance setting, getting more training data is indeed likely to help. And so again, this seems like a useful thing to know: if your learning algorithm is suffering from a high variance problem, that tells you, for example, that it may well be worth your while to see if you can go and get some more training data.

Now, on the previous slide and this slide, I've drawn fairly clean, fairly idealized curves. If you plot these curves for an actual learning algorithm, sometimes you will actually see pretty much curves like what I've drawn here, although sometimes you see curves that are a little bit noisier and a little bit messier than this. But plotting learning curves like these can often help you figure out whether your learning algorithm is suffering from bias, or variance, or even a little bit of both.
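As a rough, hedged rule of thumb for reading the tail of the two curves (the threshold values here are arbitrary placeholders, not anything from the lecture, and would need tuning for a real problem), you could summarize the diagnosis like this:

```python
def diagnose(j_train, j_cv, gap_threshold=0.5, high_threshold=1.0):
    # Look only at the errors for the largest training set size tried.
    train_tail, cv_tail = j_train[-1], j_cv[-1]
    if cv_tail - train_tail > gap_threshold:
        # Low training error, much higher CV error: overfitting,
        # so more training data is likely to help.
        return "high variance"
    if train_tail > high_threshold and cv_tail > high_threshold:
        # Both errors high and close together: underfitting,
        # so more data by itself probably will not help much.
        return "high bias"
    return "looks okay"

print(diagnose(j_train, j_cv))
```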
So when I'm trying to improve the performance of a learning algorithm, one thing that I'll almost always do is plot these learning curves, and usually this will give you a better sense of whether there is a bias or variance problem. And in the next video we'll see how this can help suggest specific actions to take, or not to take, in order to try to improve the performance of your learning algorithm.