In earlier videos, I've said over and over that, when you're developing a machine learning system, one of the most valuable resources is your time as the developer, in terms of picking what to work on next. Or, if you have a team of developers or a team of engineers working together on a machine learning system, again, one of the most valuable resources is the time of the engineers or developers working on the system. And what you really want to avoid is that you or your colleagues or your friends spend a lot of time working on some component, only to realize, after weeks or months of effort, that all that work just doesn't make a big difference to the performance of the final system. In this video, what I'd like to do is talk about something called ceiling analysis. When you or your team are working on a pipelined machine learning system, this can sometimes give you a very strong signal, a very strong guidance, on what parts of the pipeline might be the best use of your time to work on. To talk about ceiling analysis, I'm going to keep using the example of our photo OCR pipeline.
As I mentioned earlier, each of these boxes here, text detection, character segmentation, character recognition, each of these machine learning components could be the work of even a small team of engineers, or maybe the whole system could be built by just one person. But the question is, where should you allocate scarce resources? Which of these components, or which one or two or maybe all three of them, is most worth your time to try to improve the performance of? So here's the idea of ceiling analysis. As in the development process for other machine learning systems as well, in order to make decisions about what to do when developing the system, it is going to be very helpful to have a single real-number evaluation metric for this learning system. So let's say we pick character-level accuracy: given a test set image, what fraction of the characters in the test set images do we recognize correctly? Or you can pick some other single real-number evaluation metric if you want. But let's say that, whatever evaluation metric we pick, we find that the overall system currently has 72% accuracy. So, in other words, we have some set of test set images, and we run each test set image through text detection, then character segmentation, then character recognition, and we find that on our test set the overall accuracy of the entire system was 72% on whatever metric we chose.
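Just to make that metric concrete, here is a minimal sketch, in Python, of what a character-level accuracy computation might look like. The function name and the way predictions are represented are my own assumptions for illustration, not part of the actual pipeline, and a real system would usually align the predicted and true strings (for example with edit distance) rather than comparing them position by position.

    def character_accuracy(predicted_strings, true_strings):
        """Fraction of ground-truth characters recognized correctly,
        compared position by position (a simplifying assumption)."""
        correct, total = 0, 0
        for pred, truth in zip(predicted_strings, true_strings):
            total += len(truth)
            correct += sum(p == t for p, t in zip(pred, truth))
        return correct / total

    # character_accuracy(["he1lo"], ["hello"]) == 0.8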
Now, here's the idea behind ceiling analysis: we're going to go to, let's say, the first module of our machine learning pipeline, text detection, and what we're going to do is monkey around with the test set. We're going to go to the test set, and for every test example we're just going to provide it the correct text detection outputs. In other words, we go to the test set and just manually tell the algorithm where the text is in each of the test examples. So, in other words, we're going to simulate what happens if we had a text detection system with 100% accuracy, for the purpose of detecting text in an image. And really, the way you do that is very simple, right? Instead of letting your learning algorithm detect the text in the images, you instead go to the images and manually label the location of the text in each test set image. You then let these correct, ground-truth labels of where the text is be part of your test set, and you use these ground-truth labels as what you feed in to the next stage of the pipeline, to the character segmentation stage. So, just to say it again, by putting a checkmark over here, what I mean is that I'm going to go to my test set and just give it the correct answers, give it the correct labels, for the text detection part of the pipeline, so that it's as if I had a perfect text detection system on my test set. Having done that, I run this data through the rest of the pipeline, through character segmentation and character recognition, and then use the same evaluation metric as before to measure the overall accuracy of the entire system. And with perfect text detection, hopefully the performance goes up. In this example, let's say it goes up to 89%. And then we're going to keep going: next, let's go to the next stage of the pipeline, to character segmentation, and again I'm going to go to my test set.
And now I'm going to give it the correct text detection output and also the correct character segmentation output, so I manually label the correct segmentations of the text into individual characters, and see how much that helps. And let's say it goes up to 90% accuracy for the overall system. Alright, so as always, this is the accuracy of the overall system: whatever the final output of the character recognition system is, whatever the final output of the overall pipeline is, we're measuring the accuracy of that. And then finally I go to the last stage, character recognition, and give that the correct labels as well. And if I do that too, then, no surprise, I should get 100% accuracy. Now, the nice thing about having done this ceiling analysis is that we can now understand what the upside potential is for improving each of these components. So we see that if we get perfect text detection, our performance went up from 72 to 89 percent, so that's a 17 percent performance gain. This means that if we take our current system and spend a lot of time improving text detection, we could potentially improve our system's performance by up to 17 percent. That seems like it's well worth our while. Whereas, in contrast, when we went from perfect text detection to also giving it perfect character segmentation, performance went up by only one percent. So that's a more sobering message: it means that no matter how much time you spend on character segmentation, the upside potential is going to be pretty small, and maybe you do not want to have a large team of engineers working on character segmentation. This sort of analysis shows that even when you give it perfect character segmentation, your performance goes up by only one percent.
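To make the procedure itself concrete, here is a rough sketch, in Python, of how you might script a ceiling analysis over a staged pipeline like this one. Every name here, the stage functions, the ground-truth lookup, the evaluate function, is a hypothetical placeholder for whatever your system actually provides; the point is just the pattern of substituting ground-truth outputs for one more stage at a time and re-measuring the single real-number metric on the test set.

    def ceiling_analysis(test_set, stages, ground_truth, evaluate):
        """stages: list of (name, function) pairs, applied in pipeline order.
        ground_truth[name][example_id]: the correct output of that stage.
        evaluate: the single real-number metric over the final predictions."""
        results = []
        for k in range(len(stages) + 1):      # k = number of stages given ground truth
            predictions = []
            for example in test_set:
                x = example["image"]
                for i, (name, stage_fn) in enumerate(stages):
                    if i < k:
                        x = ground_truth[name][example["id"]]  # simulate a perfect component
                    else:
                        x = stage_fn(x)                        # the learned component
                predictions.append(x)
            results.append((k, evaluate(predictions, test_set)))
        return results

    # With the numbers from this example, the accuracies would come out as
    # 72% (baseline), 89% (perfect text detection), 90% (plus perfect character
    # segmentation), and 100% (plus perfect character recognition). The gap
    # between consecutive entries is the most that perfecting that component
    # could have bought you: 17%, 1%, and 10%.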
This sort of number really estimates what the ceiling is, that is, an upper bound on how much you can improve the performance of your system by working on one of these components. And finally, going to character recognition: when we gave it perfect character recognition, the performance went up by ten percent. So, again, you can decide whether a ten percent improvement is worth the effort. It tells you that maybe, with more effort spent on the last stage of the pipeline, you can improve the performance of the system further as well. Another way of thinking about this is that, by going through this sort of analysis, you're trying to figure out what the upside potential is of improving each of these components, or how much you could possibly gain if one of these components became absolutely perfect, and that places an upper bound on how much working on that component can help the overall system. So, the idea of ceiling analysis is pretty important. Let me illustrate this idea again, with a different and more complex example. Let's say that you want to do face recognition from images, so you want to look at a picture and recognize whether or not the person in the picture is a particular friend of yours, trying to recognize the person shown in the image. This is a slightly artificial example; this isn't actually how face recognition is done in practice, but I want to step through an example of what a pipeline might look like, to give you another example of how a ceiling analysis process might go. So, we have a camera image, and let's say we design a pipeline as follows. The first thing we want to do is pre-processing of the image: take an image like the one shown on the upper right and remove the background, so through pre-processing the background disappears.
Next, we want to detect the face of the person; that's usually done with a learning algorithm, so we'll run a sliding windows classifier to draw a box around the person's face. Having detected the face, it turns out that if you want to recognize people, the eyes are a highly useful cue. In terms of recognizing your friends, the appearance of their eyes is actually one of the most important cues you use. So let's run another classifier to detect the eyes of the person and segment out the eyes, since that gives us useful features for recognizing a person, and then segment out other parts of the face of particular interest: maybe segment out the nose, segment out the mouth. Then, having found the eyes, the nose, and the mouth, all of these give us useful features to feed into, say, a logistic regression classifier, and it's the job of that classifier to give us the overall label, to find the label for who we think is the identity of this person. So this is a kind of complicated pipeline; it's actually probably more complicated than you should be using if you really want to recognize people, but it's an illustrative example that's useful to think about for ceiling analysis. So how do you go through ceiling analysis for this pipeline? Well, we'll step through these pieces one at a time. Let's say your overall system has 85 percent accuracy. The first thing I do is go to my test set and manually give it the ground-truth foreground and background segmentation: go to the test set, use Photoshop or something to just tell it where the background is and manually remove it, so we have the ground-truth background removed, and see how much the accuracy changes.
In this example, the accuracy goes up by 0.1%, so this is a strong sign that even if you had perfect background segmentation, even perfect background removal, the performance of your system isn't going to go up that much. So it's maybe not worth a huge effort to work on pre-processing, on background removal. Then, we go back to the test set and give it the correct face detection locations, and then, again, step through the eye, nose, and mouth segmentations in some order: pick an order, give it the correct location of the eyes, the correct location of the nose, the correct location of the mouth, and then, finally, if I also give it the correct overall label, I get 100% accuracy. And so, as I go through the system and give more and more components the correct labels in the test set, the performance of the overall system goes up, and you can look at how much the performance went up at the different steps. So, from giving it perfect face detection, it looks like the overall performance of the system went up by 5.9 percent. That's a pretty big jump; it means that maybe it's worth quite a bit of effort on better face detection. Performance went up four percent there, one percent there, one percent there, and three percent there. So it looks like the components most worth our while are these: when I gave it perfect face detection, the system's performance went up by 5.9 percent; when I gave it perfect eye segmentation, it went up by 4 percent; and then for my final logistic regression classifier there's another 3 percent gap. And so this tells us which components are maybe the most worth our while working on.
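Piecing the numbers from this example together, a quick back-of-the-envelope calculation of the per-component gains might look like the following sketch; I'm assuming the two unnamed one-percent gains belong to the nose and mouth segmentation steps, since those are the remaining stages in this pipeline.

    # Cumulative accuracy as each component, in pipeline order, is replaced
    # by its ground-truth output (numbers from the face recognition example).
    cumulative = [
        ("baseline system",     85.0),
        ("background removal",  85.1),
        ("face detection",      91.0),
        ("eye segmentation",    95.0),
        ("nose segmentation",   96.0),
        ("mouth segmentation",  97.0),
        ("final classifier",   100.0),
    ]

    for (name, acc), (_, prev) in zip(cumulative[1:], cumulative):
        print(f"{name:20s} +{acc - prev:.1f}%")

    # background removal   +0.1%   <- probably not worth a big effort
    # face detection       +5.9%   <- the largest single ceiling
    # eye segmentation     +4.0%
    # nose segmentation    +1.0%
    # mouth segmentation   +1.0%
    # final classifier     +3.0%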
And by the way, I want to tell you a true cautionary story. The reason I put this pre-processing, background-removal step in is because I actually know of a true story where a research team literally had two people spend about a year and a half, 18 months, working on better background removal. I'm obscuring the details for obvious reasons, but there was a computer vision application where a team of two engineers literally spent, I think, about a year and a half working on better background removal. They actually worked out really complicated algorithms and ended up publishing, I think, one research paper. But after all that work, they found that it just did not make a huge difference to the overall performance of the actual application they were working on. And if only someone had done a ceiling analysis beforehand, maybe they could have realized this. One of them said to me afterward that if only they had done this sort of analysis, maybe they could have realized, before those 18 months of work, that they should have spent their effort focusing on some different component, rather than literally spending 18 months working on background removal. So to summarize: pipelines are pretty pervasive in complex machine learning applications. And when you're working on a big machine learning application, your time as a developer is so valuable, so just don't waste it working on something that ultimately isn't going to matter.
And in this video, we talked about this idea of ceiling analysis, which I've often found to be a very good tool for identifying the component where, if you put in a focused effort and make it much better, it would actually have a big effect on the overall performance of your final system. So, over the years working with machine learning, I've actually learned not to trust my own gut feeling about which component to work on. Very often, when you've worked with machine learning for a long time and you look at some machine learning problem, you may have a gut feeling about, oh, let's jump on that component and just spend more time on that. But over the years I've come to not trust those gut feelings that much, and instead, if you have a machine learning problem where it's possible to structure things this way, doing a ceiling analysis is often a much better and much more reliable way of deciding where to put a focused effort, which component's performance to really try to improve, so that you can be fairly sure that, when you do that, it will actually have a big effect on the final performance of your overall system.