In the previous video, we talked about evaluation metrics. In this video, I'd like to switch tracks a bit and touch on another important aspect of machine learning system design, which will often come up, which is the issue of how much data to train on. Now, in some earlier videos, I had cautioned against blindly going out and just spending lots of time collecting lots of data, because it's only sometimes that that would actually help. But it turns out that under certain conditions, and I will say in this video what those conditions are, getting a lot of data and training a certain type of learning algorithm on it can be a very effective way to get a learning algorithm with very good performance. And this arises often enough that if those conditions hold true for your problem, and if you're able to get a lot of data, this could be a very good way to get a very high performance learning algorithm. So in this video, let's talk more about that.

Let me start with a story. Many, many years ago, two researchers that I know, Michele Banko and Eric Brill, ran the following fascinating study. They were interested in studying the effect of using different learning algorithms versus trying them out on different training set sizes. They were considering the problem of classifying between confusable words. So for example, in the sentence "for breakfast I ate ___ eggs," should it be to, two, or too? Well, for this example, it's "for breakfast I ate two eggs." So this is one example of a set of confusable words, and there are other sets like it. So they took machine learning problems like these, supervised learning problems, to try to categorize what is the appropriate word to go into a certain position in an English sentence.
They took a few different learning algorithms, ones that were, you know, considered state of the art back when they ran the study in 2001. So they took a variant, roughly a variant of logistic regression, called the perceptron. They also took some algorithms that were fairly popular back then but are somewhat less used now: a winnow algorithm, which is again very similar to logistic regression but different in some ways, and used somewhat less now; a memory-based learning algorithm, again used somewhat less now, but I'll talk a little bit about that later; and they used a Naive Bayes algorithm, which is something we'll actually talk about in this course. The exact details of these algorithms aren't important. Think of this as, you know, just picking four different classification algorithms; really, the exact algorithms aren't important.

But what they did was vary the training set size and try out these learning algorithms on a range of training set sizes, and that's the result they got. And the trends are very clear. First, most of these algorithms give remarkably similar performance. And second, as the training set size increases, on the horizontal axis is the training set size in millions, going from, you know, a hundred thousand up to a thousand million, that is, a billion training examples, the performance of the algorithms all pretty much monotonically increases. And the upshot is that if you pick any algorithm, maybe pick an "inferior algorithm," but if you give that "inferior algorithm" more data, then from these examples it looks like it will most likely beat even a "superior algorithm."
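To make the setup concrete, here is a minimal Python sketch of the kind of experiment just described: take a few off-the-shelf classifiers and measure their test accuracy as the training set grows. The data and features are placeholders (the original study used confusable-word data with context features), and the k-nearest-neighbors model stands in for a memory-based learner; none of this is the authors' actual code.

```python
from sklearn.linear_model import Perceptron, LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier  # stand-in for a memory-based learner
from sklearn.metrics import accuracy_score

def learning_curves(X_train, y_train, X_test, y_test, sizes):
    """Train each classifier on growing prefixes of the data and record test accuracy."""
    models = {
        "perceptron": Perceptron(),
        "logistic regression": LogisticRegression(max_iter=1000),
        "naive Bayes": MultinomialNB(),          # assumes non-negative, count-style features
        "memory-based (k-NN)": KNeighborsClassifier(),
    }
    results = {name: [] for name in models}
    for m in sizes:
        for name, model in models.items():
            model.fit(X_train[:m], y_train[:m])  # train on the first m examples only
            acc = accuracy_score(y_test, model.predict(X_test))
            results[name].append((m, acc))
    return results
```

Plotting `results` for sizes spanning, say, 10^5 to 10^9 examples would reproduce the shape of the curves described above: all four classifiers improving roughly monotonically with more data.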
So since this original study, which was very influential, there have been a range of many different studies showing similar results: that many different learning algorithms can sometimes, depending on details, give pretty similar ranges of performance, but what can really drive performance is giving the algorithm a ton of training data. And results like these have led to a saying in machine learning, that often it's not who has the best algorithm that wins, it's who has the most data. So when is this true and when is this not true? Because if we have a learning algorithm for which this is true, then getting a lot of data is often maybe the best way to ensure that we have an algorithm with very high performance, rather than, you know, debating and worrying about exactly which of these algorithms to use.

Let's try to lay out a set of assumptions under which having a massive training set we think will be able to help. Let's assume that in our machine learning problem, the features x have sufficient information with which we can predict y accurately. For example, if we take the confusable words problem that we had on the previous slide, let's say that the features x capture the surrounding words around the blank that we're trying to fill in. So the features capture the context, say, "For breakfast I have ___ eggs." Then yes, that is pretty much enough information to tell me that the word I want in the middle is TWO, and that it's not the word TO, and it's not the word TOO.
So if the features capture, you know, these surrounding words, then that gives me enough information to pretty unambiguously decide what the label y is, or in other words, what word I should be using to fill in that blank out of this set of three confusable words. So that's an example where the features x have sufficient information to predict y.

For a counterexample, consider the problem of predicting the price of a house from only the size of the house and from no other features. So imagine I tell you that a house is, you know, 500 square feet, but I don't give you any other features. I don't tell you that the house is in an expensive part of the city. I don't tell you the number of rooms in the house, or how nicely furnished the house is, or whether the house is new or old. If I don't tell you anything other than that this is a 500 square foot house, well, there are so many other factors that affect the price of a house other than just its size that if all you know is the size, it's actually very difficult to predict the price accurately. So that would be a counterexample to this assumption that the features have sufficient information to predict the price to the desired level of accuracy.

The way I think about testing this assumption, one way I often think about it, is to ask myself: given the input features x, given the same information available to the learning algorithm, if we were to go to a human expert in this domain, can that human expert confidently predict the value of y? For this first example, we can go to, you know, an expert human English speaker.
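As a rough illustration of what "the features x capture the surrounding words" might mean, here is a small Python sketch that turns each occurrence of a confusable word into a (features, label) pair, using the words in a fixed window around the blank as the features. The confusion set, window size, and feature naming are illustrative choices, not the representation used in the study.

```python
CONFUSION_SET = {"to", "two", "too"}

def context_examples(sentence, window=2):
    """Yield (features, label) pairs for each confusable word in a sentence."""
    tokens = sentence.lower().split()
    for i, tok in enumerate(tokens):
        if tok in CONFUSION_SET:
            left = tokens[max(0, i - window):i]          # words just before the blank
            right = tokens[i + 1:i + 1 + window]          # words just after the blank
            features = {f"L{j}={w}" for j, w in enumerate(reversed(left), 1)}
            features |= {f"R{j}={w}" for j, w in enumerate(right, 1)}
            yield features, tok                           # the label y is the word that filled the blank

# Example:
# list(context_examples("for breakfast i ate two eggs"))
# -> [({"L1=ate", "L2=i", "R1=eggs"}, "two")]
```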
If you go to someone who speaks English well, then a human expert in English, really most people like you and me, would probably be able to predict what word should go in here; a good English speaker can predict this well, and so this gives me confidence that x allows us to predict y accurately. But in contrast, if we go to an expert in housing prices, like maybe an expert realtor, someone who sells houses for a living, and I just tell them the size of a house and ask them what the price is, well, even an expert in pricing or selling houses wouldn't be able to tell me. And so this confirms that for the housing price example, knowing only the size doesn't give me enough information to predict the price of the house.

So, let's say this assumption holds. Let's see then when having a lot of data could help. Suppose the features have enough information to predict the value of y, and let's suppose we use a learning algorithm with a large number of parameters, so maybe logistic regression or linear regression with a large number of features. Or, one thing that I often do actually, a neural network with many hidden units; that would be another learning algorithm with a lot of parameters. So these are all powerful learning algorithms with a lot of parameters that can fit very complex functions. So I'm going to think of these as low-bias algorithms, because, you know, since we have a very powerful learning algorithm, they can fit very complex functions.
Chances are, if we run these algorithms on the data set, they will be able to fit the training set well, and so hopefully the training error will be small. Now let's say we use a massive, massive training set. In that case, if we have a huge training set, then hopefully, even though we have a lot of parameters, if the training set is much larger than the number of parameters, then hopefully these algorithms will be unlikely to overfit. Right, because we have such a massive training set, and by "unlikely to overfit" what that means is that the training error will hopefully be close to the test error. Finally, putting these two together, that the training set error is small and that the test set error is close to the training error, what these two together imply is that hopefully the test set error will also be small.

Another way to think about this is that in order to have a high performance learning algorithm, we want it not to have high bias and not to have high variance. So the bias problem we're going to address by making sure we have a learning algorithm with many parameters, and that gives us a low-bias algorithm; and by using a very large training set, this ensures that we don't have a variance problem here. So hopefully our algorithm will have low variance, and so by pulling these two together, we end up with a low bias and a low variance learning algorithm, and this allows us to do well on the test set.
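Here is a minimal sketch of that train-versus-test argument on synthetic data: fit a high-capacity ("low bias") model on a large training set, then check that the training error is small and that the test error is close to it. The model, data, and sizes below are illustrative assumptions, not from the lecture.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_train, n_test, n_features = 100_000, 20_000, 20
X = rng.normal(size=(n_train + n_test, n_features))
# Construct a label that the features genuinely determine (the "sufficient information" assumption).
y = (X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=n_train + n_test) > 0).astype(int)

# Many hidden units -> many parameters -> a low-bias, high-capacity model.
model = MLPClassifier(hidden_layer_sizes=(50,), max_iter=200, random_state=0)
model.fit(X[:n_train], y[:n_train])

train_err = 1 - model.score(X[:n_train], y[:n_train])
test_err = 1 - model.score(X[n_train:], y[n_train:])
print(f"training error ~ {train_err:.3f}, test error ~ {test_err:.3f}")
# With a training set much larger than the number of parameters, the two errors should be close,
# so a small training error implies a small test error as well.
```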
And fundamentally, the key ingredients are, first, assuming that the features have enough information and that we have a rich class of functions; that's what guarantees low bias. And second, having a massive training set; that's what guarantees low variance.

So this gives us a set of conditions, or rather, hopefully some understanding of what's the sort of problem where, if you have a lot of data and you train a learning algorithm with a lot of parameters, that might be a good way to get a high performance learning algorithm. And really, I think the key tests that I often ask myself are, first, can a human expert look at the features x and confidently predict the value of y? Because that's sort of a certification that y can be predicted accurately from the features x. And second, can we actually get a large training set, and train a learning algorithm with a lot of parameters on that training set? If you can do both, then more often than not, that will give you a very high performance learning algorithm.