In the previous video, we talked about evaluation metrics. In this video, I'd like to switch tracks a bit and touch on another important aspect of machine learning system design, which will often come up, which is the issue of how much data to train on. Now, in some earlier videos, I had cautioned against blindly going out and just spending lots of time collecting lots of data, because it's only sometimes that that would actually help. But it turns out that under certain conditions, and I will say in this video what those conditions are, getting a lot of data and training a certain type of learning algorithm on it can be a very effective way to get a learning algorithm with very good performance. And this arises often enough that if those conditions hold true for your problem, and if you're able to get a lot of data, this could be a very good way to get a very high performance learning algorithm. So in this video, let's talk more about that.

Let me start with a story. Many, many years ago, two researchers that I know, Michele Banko and Eric Brill, ran the following fascinating study. They were interested in studying the effect of using different learning algorithms versus trying them out on different training set sizes. They were considering the problem of classifying between confusable words. So for example, in the sentence "for breakfast I ate ___ eggs," should it be to, two, or too? Well, for this example, it's "for breakfast I ate two eggs." So this is one example of a set of confusable words, and there are other sets like it. So they took machine learning problems like these, supervised learning problems, to try to categorize what is the appropriate word to go into a certain position in an English sentence.
They took a few different learning algorithms, ones that were, you know, considered state of the art back when they ran the study in 2001. So they took a variant, roughly a variant of logistic regression, called the perceptron. They also took some algorithms that were fairly popular back then but are somewhat less used now: a winnow algorithm, which is again very similar to logistic regression but different in some ways, and used somewhat less now; a memory-based learning algorithm, again used somewhat less now, but I'll talk a little bit about that later; and they used a Naive Bayes algorithm, which is something we'll actually talk about in this course. The exact details of these algorithms aren't important. Think of this as, you know, just picking four different classification algorithms; really, the exact algorithms aren't important.

But what they did was vary the training set size and try out these learning algorithms on a range of training set sizes, and that's the result they got. And the trends are very clear. First, most of these algorithms give remarkably similar performance. And second, as the training set size increases, on the horizontal axis is the training set size in millions, going from, you know, a hundred thousand up to a thousand million, that is, a billion training examples, the performance of the algorithms all pretty much monotonically increases. And the upshot is that if you pick any algorithm, maybe pick an "inferior algorithm," but if you give that "inferior algorithm" more data, then from these examples it looks like it will most likely beat even a "superior algorithm."
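To make the setup concrete, here is a minimal Python sketch of the kind of experiment just described: take a few off-the-shelf classifiers and measure their test accuracy as the training set grows. The data and features are placeholders (the original study used confusable-word data with context features), and the k-nearest-neighbors model stands in for a memory-based learner; none of this is the authors' actual code.

```python
from sklearn.linear_model import Perceptron, LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier  # stand-in for a memory-based learner
from sklearn.metrics import accuracy_score

def learning_curves(X_train, y_train, X_test, y_test, sizes):
    """Train each classifier on growing prefixes of the data and record test accuracy."""
    models = {
        "perceptron": Perceptron(),
        "logistic regression": LogisticRegression(max_iter=1000),
        "naive Bayes": MultinomialNB(),          # assumes non-negative, count-style features
        "memory-based (k-NN)": KNeighborsClassifier(),
    }
    results = {name: [] for name in models}
    for m in sizes:
        for name, model in models.items():
            model.fit(X_train[:m], y_train[:m])  # train on the first m examples only
            acc = accuracy_score(y_test, model.predict(X_test))
            results[name].append((m, acc))
    return results
```

Plotting `results` for sizes spanning, say, 10^5 to 10^9 examples would reproduce the shape of the curves described above: all four classifiers improving roughly monotonically with more data.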
So since this original study, which was very influential, there have been a range of many different studies showing similar results: that many different learning algorithms can sometimes, depending on details, give pretty similar ranges of performance, but what can really drive performance is giving the algorithm a ton of training data. And results like these have led to a saying in machine learning, that often it's not who has the best algorithm that wins, it's who has the most data. So when is this true and when is this not true? Because if we have a learning algorithm for which this is true, then getting a lot of data is often maybe the best way to ensure that we have an algorithm with very high performance, rather than, you know, debating and worrying about exactly which of these algorithms to use.

Let's try to lay out a set of assumptions under which having a massive training set we think will be able to help. Let's assume that in our machine learning problem, the features x have sufficient information with which we can predict y accurately. For example, if we take the confusable words problem that we had on the previous slide, let's say that the features x capture the surrounding words around the blank that we're trying to fill in. So the features capture the context, say, "For breakfast I have ___ eggs." Then yes, that is pretty much enough information to tell me that the word I want in the middle is TWO, and that it's not the word TO, and it's not the word TOO.
So if the features capture, you know, these surrounding words, then that gives me enough information to pretty unambiguously decide what the label y is, or in other words, what word I should be using to fill in that blank out of this set of three confusable words. So that's an example where the features x have sufficient information to predict y.

For a counterexample, consider the problem of predicting the price of a house from only the size of the house and from no other features. So imagine I tell you that a house is, you know, 500 square feet, but I don't give you any other features. I don't tell you that the house is in an expensive part of the city. I don't tell you the number of rooms in the house, or how nicely furnished the house is, or whether the house is new or old. If I don't tell you anything other than that this is a 500 square foot house, well, there are so many other factors that affect the price of a house other than just its size that if all you know is the size, it's actually very difficult to predict the price accurately. So that would be a counterexample to this assumption that the features have sufficient information to predict the price to the desired level of accuracy.

The way I think about testing this assumption, one way I often think about it, is to ask myself: given the input features x, given the same information available to the learning algorithm, if we were to go to a human expert in this domain, can that human expert confidently predict the value of y? For this first example, we can go to, you know, an expert human English speaker.
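As a rough illustration of what "the features x capture the surrounding words" might mean, here is a small Python sketch that turns each occurrence of a confusable word into a (features, label) pair, using the words in a fixed window around the blank as the features. The confusion set, window size, and feature naming are illustrative choices, not the representation used in the study.

```python
CONFUSION_SET = {"to", "two", "too"}

def context_examples(sentence, window=2):
    """Yield (features, label) pairs for each confusable word in a sentence."""
    tokens = sentence.lower().split()
    for i, tok in enumerate(tokens):
        if tok in CONFUSION_SET:
            left = tokens[max(0, i - window):i]          # words just before the blank
            right = tokens[i + 1:i + 1 + window]          # words just after the blank
            features = {f"L{j}={w}" for j, w in enumerate(reversed(left), 1)}
            features |= {f"R{j}={w}" for j, w in enumerate(right, 1)}
            yield features, tok                           # the label y is the word that filled the blank

# Example:
# list(context_examples("for breakfast i ate two eggs"))
# -> [({"L1=ate", "L2=i", "R1=eggs"}, "two")]
```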
If you go to someone who speaks English well, then a human expert in English, really most people like you and me, would probably be able to predict what word should go in here; a good English speaker can predict this well, and so this gives me confidence that x allows us to predict y accurately. But in contrast, if we go to an expert in housing prices, like maybe an expert realtor, someone who sells houses for a living, and I just tell them the size of a house and ask them what the price is, well, even an expert in pricing or selling houses wouldn't be able to tell me. And so this confirms that for the housing price example, knowing only the size doesn't give me enough information to predict the price of the house.

So, let's say this assumption holds. Let's see then when having a lot of data could help. Suppose the features have enough information to predict the value of y, and let's suppose we use a learning algorithm with a large number of parameters, so maybe logistic regression or linear regression with a large number of features. Or, one thing that I often do actually, a neural network with many hidden units; that would be another learning algorithm with a lot of parameters. So these are all powerful learning algorithms with a lot of parameters that can fit very complex functions. So I'm going to think of these as low-bias algorithms, because, you know, since we have a very powerful learning algorithm, they can fit very complex functions.
Chances are, if we run these algorithms on the data set, they will be able to fit the training set well, and so hopefully the training error will be small. Now let's say we use a massive, massive training set. In that case, if we have a huge training set, then hopefully, even though we have a lot of parameters, if the training set is much larger than the number of parameters, then hopefully these algorithms will be unlikely to overfit. Right, because we have such a massive training set, and by "unlikely to overfit" what that means is that the training error will hopefully be close to the test error. Finally, putting these two together, that the training set error is small and that the test set error is close to the training error, what these two together imply is that hopefully the test set error will also be small.

Another way to think about this is that in order to have a high performance learning algorithm, we want it not to have high bias and not to have high variance. So the bias problem we're going to address by making sure we have a learning algorithm with many parameters, and that gives us a low-bias algorithm; and by using a very large training set, this ensures that we don't have a variance problem here. So hopefully our algorithm will have low variance, and so by pulling these two together, we end up with a low bias and a low variance learning algorithm, and this allows us to do well on the test set.
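Here is a minimal sketch of that train-versus-test argument on synthetic data: fit a high-capacity ("low bias") model on a large training set, then check that the training error is small and that the test error is close to it. The model, data, and sizes below are illustrative assumptions, not from the lecture.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_train, n_test, n_features = 100_000, 20_000, 20
X = rng.normal(size=(n_train + n_test, n_features))
# Construct a label that the features genuinely determine (the "sufficient information" assumption).
y = (X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=n_train + n_test) > 0).astype(int)

# Many hidden units -> many parameters -> a low-bias, high-capacity model.
model = MLPClassifier(hidden_layer_sizes=(50,), max_iter=200, random_state=0)
model.fit(X[:n_train], y[:n_train])

train_err = 1 - model.score(X[:n_train], y[:n_train])
test_err = 1 - model.score(X[n_train:], y[n_train:])
print(f"training error ~ {train_err:.3f}, test error ~ {test_err:.3f}")
# With a training set much larger than the number of parameters, the two errors should be close,
# so a small training error implies a small test error as well.
```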
And fundamentally, the key ingredients are, first, assuming that the features have enough information and that we have a rich class of functions; that's what guarantees low bias. And second, having a massive training set; that's what guarantees low variance.

So this gives us a set of conditions, or rather, hopefully some understanding of what's the sort of problem where, if you have a lot of data and you train a learning algorithm with a lot of parameters, that might be a good way to get a high performance learning algorithm. And really, I think the key tests that I often ask myself are, first, can a human expert look at the features x and confidently predict the value of y? Because that's sort of a certification that y can be predicted accurately from the features x. And second, can we actually get a large training set, and train a learning algorithm with a lot of parameters on that training set? If you can do both, then more often than not, that will give you a very high performance learning algorithm.