In this video, I'd like to tell you about learning curves. Learning curves are often a very useful thing to plot, either if you want to sanity check that your algorithm is working correctly, or if you want to improve its performance. And learning curves are a tool that I actually use very often to try to diagnose whether a learning algorithm may be suffering from a bias problem, a variance problem, or a bit of both.

Here's what a learning curve is. To plot a learning curve, what I usually do is plot Jtrain, which is, say, the average squared error on my training set, and Jcv, which is the average squared error on my cross validation set. And I'm going to plot that as a function of m, that is, as a function of the number of training examples I have. Now, m is usually a constant; maybe I just have, say, 100 training examples. But what I'm going to do is artificially reduce my training set size: I deliberately limit myself to using only, say, 10 or 20 or 30 or 40 training examples, and plot what the training error and the cross validation error are for these smaller training set sizes.

So let's see what these plots may look like. Suppose I have only one training example, like that shown in this first example here, and let's say I'm fitting a quadratic function. Well, with only one training example, I'm going to be able to fit it perfectly, right? Just fit the quadratic function, and I'm going to have zero error on the one training example. If I have two training examples, well, the quadratic function can also fit those very well: even if I am using regularization, I can probably fit them quite well, and if I am using no regularization, I'm going to fit them perfectly.
And if I have three training examples, again I can fit a quadratic function perfectly. So if m equals 1 or m equals 2 or m equals 3, my training error on my training set is going to be zero, assuming I'm not using regularization, or it may be slightly larger than zero if I am using regularization.

And by the way, if I have a large training set and I'm artificially restricting its size in order to plot Jtrain: if I set m equals 3, say, and train on only three examples, then for this figure I'm going to measure my training error only on the three examples that I actually fit my parameters to. So even if I have, say, 100 training examples, when I want to plot what my training error is at m equals 3, I measure the training error on the three examples I've actually fit my hypothesis to, and not on all the other examples that I have deliberately omitted from the training process.

So just to summarize, what we've seen is that if the training set size is small, then the training error is going to be small as well, because with a small training set it's going to be very easy to fit it very well, maybe even perfectly. Now say we have m equals 4. Well, then a quadratic function can no longer fit this data set perfectly, and if I have m equals 5, then maybe a quadratic function fits it only so-so. So as my training set gets larger, it becomes harder and harder to find a quadratic function that passes through all of my examples perfectly.
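Just to make that procedure concrete, here is a minimal sketch of how the learning-curve points could be computed. It assumes a linear hypothesis whose feature matrix already includes a bias column and trains by regularized normal equations; the function names are purely illustrative, not anything from the lecture.

```python
import numpy as np

def avg_squared_error(theta, X, y):
    # J(theta) = (1 / (2m)) * sum((X @ theta - y) ** 2)
    residual = X @ theta - y
    return residual @ residual / (2 * len(y))

def fit_regularized(X, y, lam=0.0):
    # Regularized normal equations; pinv keeps this well-behaved
    # even when m is tiny and X'X is singular.
    reg = lam * np.eye(X.shape[1])
    reg[0, 0] = 0.0                      # don't penalize the bias term
    return np.linalg.pinv(X.T @ X + reg) @ (X.T @ y)

def learning_curve(X_train, y_train, X_cv, y_cv, lam=0.0):
    j_train, j_cv = [], []
    for m in range(1, len(y_train) + 1):
        theta = fit_regularized(X_train[:m], y_train[:m], lam)
        # Training error is measured only on the m examples actually used...
        j_train.append(avg_squared_error(theta, X_train[:m], y_train[:m]))
        # ...but the cross validation error always uses the full CV set.
        j_cv.append(avg_squared_error(theta, X_cv, y_cv))
    return j_train, j_cv
```

Plotting j_train and j_cv against m gives the curves described next.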
So in fact, as the training set size grows, what you find is that my average training error actually increases. And so if you plot this figure, what you find is that the training set error, that is, the average error of your hypothesis, grows as m grows. Just to repeat, the intuition is that when m is small, when you have very few training examples, it's pretty easy to fit every single one of your training examples perfectly, and so your error is going to be small; whereas when m is larger, it gets harder to fit all the training examples perfectly, and so your training set error becomes larger.

Now, how about the cross validation error? Well, the cross validation error is my error on the cross validation set, data the hypothesis hasn't been fit to. So when I have a very small training set, I'm not going to generalize well; I'm just not going to do well on that set. Right, this hypothesis here doesn't look like a good one, and it's only when I get a larger training set that I start to get hypotheses that maybe fit the data somewhat better. So your cross validation error and your test set error will tend to decrease as your training set size increases, because the more data you have, the better you do at generalizing to new examples: the more data you have, the better the hypothesis you fit. So if you plot Jtrain and Jcv, this is the sort of thing that you get.

Now let's look at what the learning curves may look like if we have either high bias or high variance problems. Suppose your hypothesis has high bias, and to explain this I'm going to use as an example fitting a straight line to data that can't really be fit well by a straight line.
So we end up with a hypothesis that maybe looks like that. Now let's think about what would happen if we were to increase the training set size. So instead of the five examples I've drawn there, imagine that we have a lot more training examples. Well, what happens if you fit a straight line to this? What you find is that you end up with pretty much the same straight line: a straight line just cannot fit this data, and getting a ton more data isn't going to change that much. This is the best possible straight-line fit to this data, but the straight line just can't fit this data set that well.

So if you plot the cross validation error, this is what it will look like. On the left, if you have a minuscule training set size, like maybe just one training example, you're not going to do well. But by the time you have reached a certain number of training examples, you have almost fit the best possible straight line, and even if you end up with a much larger training set size, a much larger value of m, you're basically getting the same straight line. And so the cross validation error (let me label that) or test set error will plateau out, or flatten out, pretty soon once you've reached beyond a certain number of training examples, because you've pretty much fit the best possible straight line.

And how about the training error? Well, the training error will again start off small. And what you find in the high bias case is that the training error ends up close to the cross validation error, because you have so few parameters and so much data, at least when m is large: the performance on the training set and on the cross validation set will be very similar. And so, this is what your learning curves will look like if you have an algorithm that has high bias.
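Here's a small, hedged demo of that signature, reusing the learning_curve sketch from above on made-up synthetic data (my own illustration, not from the lecture): the target is really quadratic, but the hypothesis only gets straight-line features, so both errors plateau at a similar, high value.

```python
rng = np.random.default_rng(0)

def make_straight_line_dataset(m):
    # Quadratic target, but only straight-line features [1, x]:
    # a deliberately high-bias (underfitting) setup.
    x = rng.uniform(-3.0, 3.0, size=m)
    y = x ** 2 + rng.normal(scale=0.5, size=m)
    X = np.column_stack([np.ones(m), x])
    return X, y

X_train, y_train = make_straight_line_dataset(100)
X_cv, y_cv = make_straight_line_dataset(100)
j_train, j_cv = learning_curve(X_train, y_train, X_cv, y_cv)

# For large m, Jtrain and Jcv end up high and close together.
print(j_train[-1], j_cv[-1])
```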
And finally, the problem with high bias is reflected in the fact that both the cross validation error and the training error are high, so you end up with a relatively high value of both Jcv and Jtrain.

This also implies something very interesting, which is that if a learning algorithm has high bias, then as we get more and more training examples, that is, as we move to the right of this figure, we'll notice that the cross validation error isn't going down much; it has basically flattened out. And so if a learning algorithm is really suffering from high bias, getting more training data by itself will actually not help that much. As in our example in the figure on the right: there we had only five training examples and we fit a certain straight line, and when we had a ton more training data, we still ended up with roughly the same straight line. So if the learning algorithm has high bias, giving it a lot more training data doesn't actually help you get a much lower cross validation error or test set error.

So knowing whether your learning algorithm is suffering from high bias seems like a useful thing, because it can prevent you from wasting a lot of time collecting more training data that might just not end up being helpful. Next, let's look at the setting of a learning algorithm that may have high variance.
Let's just look at the training error first. If you have a very small training set, like the five training examples shown in the figure on the right, and if we're fitting, say, a very high order polynomial (I've written a hundredth-degree polynomial here, which really no one uses, but it's just for illustration), and if we're using a fairly small value of lambda, maybe not zero but fairly small, then we'll end up fitting this data very well, with a function that overfits it.

So if the training set size is small, our training error, that is, Jtrain of theta, will be small. And as the training set size increases a bit, we may still be overfitting the data a little, but it also becomes slightly harder to fit the data set perfectly. So as the training set size increases, we'll find that Jtrain increases, because it is just a little harder to fit the training set perfectly when we have more examples, but the training set error will still be pretty low.

Now, how about the cross validation error? Well, in the high variance setting, the hypothesis is overfitting, and so the cross validation error will remain high, even as we get a moderate number of training examples, so maybe the cross validation error looks like that. And the indicative diagnostic that we have a high variance problem is the fact that there's this large gap between the training error and the cross validation error.
And looking at this figure, if we think about adding more training data, that is, taking this figure and extrapolating to the right, we can kind of tell that the two curves, the blue curve and the magenta curve, are converging to each other. So if we were to extrapolate this figure to the right, then it seems likely that the training error will keep on going up and the cross validation error will keep on coming down. And the thing we really care about is the cross validation error, or the test set error, right? So in this sort of figure, we can tell that if we keep on adding training examples and extrapolate to the right, our cross validation error will keep on coming down. And so, in the high variance setting, getting more training data is indeed likely to help. And so again, this seems like a useful thing to know: if your learning algorithm is suffering from a high variance problem, that tells you, for example, that it may well be worth your while to see if you can go and get some more training data.

Now, on the previous slide and this slide, I've drawn fairly clean, fairly idealized curves. If you plot these curves for an actual learning algorithm, sometimes you will actually see pretty much curves like what I've drawn here, although sometimes you see curves that are a little bit noisier and a little bit messier than this. But plotting learning curves like these can often help you figure out whether your learning algorithm is suffering from bias, or variance, or even a little bit of both.
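As a rough, hedged rule of thumb for reading the tail of the two curves (the threshold values here are arbitrary placeholders, not anything from the lecture, and would need tuning for a real problem), you could summarize the diagnosis like this:

```python
def diagnose(j_train, j_cv, gap_threshold=0.5, high_threshold=1.0):
    # Look only at the errors for the largest training set size tried.
    train_tail, cv_tail = j_train[-1], j_cv[-1]
    if cv_tail - train_tail > gap_threshold:
        # Low training error, much higher CV error: overfitting,
        # so more training data is likely to help.
        return "high variance"
    if train_tail > high_threshold and cv_tail > high_threshold:
        # Both errors high and close together: underfitting,
        # so more data by itself probably will not help much.
        return "high bias"
    return "looks okay"

print(diagnose(j_train, j_cv))
```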
So when I'm trying to improve the performance of a learning algorithm, one thing that I'll almost always do is plot these learning curves, and usually this will give you a better sense of whether there is a bias or variance problem. And in the next video we'll see how this can help suggest specific actions to take, or not to take, in order to try to improve the performance of your learning algorithm.