1 00:00:00,000 --> 00:00:02,910 It's not always easy to combine all the things you care 2 00:00:02,910 --> 00:00:06,265 about into a single real number evaluation metric. 3 00:00:06,265 --> 00:00:09,150 In those cases I've found it sometimes useful to set up 4 00:00:09,150 --> 00:00:12,390 satisficing as well as optimizing metrics. 5 00:00:12,390 --> 00:00:13,950 Let me show you what I mean. 6 00:00:13,950 --> 00:00:16,410 Let's say that you've decided you care about 7 00:00:16,410 --> 00:00:20,694 the classification accuracy of your cat classifier, 8 00:00:20,694 --> 00:00:25,470 this could have been F1 score or some other measure of accuracy, 9 00:00:25,470 --> 00:00:29,610 but let's say that in addition to accuracy you also care about the running time. 10 00:00:29,610 --> 00:00:35,050 So how long it takes to classify an image, and classifier A takes 80 milliseconds, 11 00:00:35,050 --> 00:00:36,690 B takes 95 milliseconds, 12 00:00:36,690 --> 00:00:39,325 and C takes 1,500 milliseconds, 13 00:00:39,325 --> 00:00:42,150 that's 1.5 seconds to classify an image. 14 00:00:42,150 --> 00:00:45,000 So one thing you could do is combine accuracy 15 00:00:45,000 --> 00:00:48,075 and running time into an overall evaluation metric. 16 00:00:48,075 --> 00:00:57,898 So the cost could be, say, overall cost equals accuracy minus 0.5 times running time. 17 00:00:57,898 --> 00:01:01,460 But maybe it seems a bit artificial to combine 18 00:01:01,460 --> 00:01:05,265 accuracy and running time using a formula like this, 19 00:01:05,265 --> 00:01:08,805 like a linear weighted sum of these two things.
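To make the weighted-sum idea concrete, here is a minimal Python sketch. The running times are the ones mentioned above, but the accuracy values are hypothetical (the transcript does not state them), and `weighted_cost` and `best_by_weighted_cost` are assumed helper names. The score is treated as "higher is better". The sketch illustrates one reason the formula can feel artificial: the winner depends on the unit chosen for running time.

```python
# Running times (ms) are from the lecture; the accuracies are hypothetical.
classifiers = {
    "A": {"accuracy": 0.90, "running_time_ms": 80},
    "B": {"accuracy": 0.92, "running_time_ms": 95},
    "C": {"accuracy": 0.95, "running_time_ms": 1500},
}


def weighted_cost(accuracy, running_time, weight=0.5):
    """Overall score = accuracy - weight * running_time (higher is better)."""
    return accuracy - weight * running_time


def best_by_weighted_cost(models, time_scale=1.0):
    """Return the name with the highest score.

    time_scale converts milliseconds into another unit (1e-3 gives seconds).
    """
    return max(
        models,
        key=lambda name: weighted_cost(
            models[name]["accuracy"],
            models[name]["running_time_ms"] * time_scale,
        ),
    )


best_in_ms = best_by_weighted_cost(classifiers)                   # time in ms
best_in_s = best_by_weighted_cost(classifiers, time_scale=1e-3)   # time in s
print(best_in_ms, best_in_s)
```

With these assumed numbers, measuring time in milliseconds and measuring it in seconds pick different classifiers, so the weight of 0.5 only means something once the units are fixed.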
20 00:01:08,805 --> 00:01:11,090 So here's something else you could do instead which is 21 00:01:11,090 --> 00:01:13,841 that you might want to choose a classifier 22 00:01:13,841 --> 00:01:26,470 that maximizes accuracy but subject to the constraint that the running time, 23 00:01:26,470 --> 00:01:28,584 that is the time it takes to classify an image, 24 00:01:28,584 --> 00:01:36,325 has to be less than or equal to 100 milliseconds. 25 00:01:36,325 --> 00:01:40,170 So in this case we would say that accuracy is an 26 00:01:40,170 --> 00:01:44,460 optimizing metric because you want to maximize accuracy. 27 00:01:44,460 --> 00:01:48,195 You want to do as well as possible on accuracy but 28 00:01:48,195 --> 00:01:53,845 that running time is what we call a satisficing metric. 29 00:01:53,845 --> 00:01:55,580 Meaning that it just has to be good enough, 30 00:01:55,580 --> 00:02:00,285 it just needs to be less than 100 milliseconds and beyond that you don't really care, 31 00:02:00,285 --> 00:02:04,280 or at least you don't care that much. 32 00:02:04,280 --> 00:02:07,340 So this would be a pretty reasonable way to trade off or to put 33 00:02:07,340 --> 00:02:11,705 together accuracy as well as running time. 34 00:02:11,705 --> 00:02:16,015 And it may be the case that so long as the running time is less than 100 milliseconds, 35 00:02:16,015 --> 00:02:18,465 your users won't care that much whether it's 36 00:02:18,465 --> 00:02:21,855 100 milliseconds or 50 milliseconds or even faster. 37 00:02:21,855 --> 00:02:26,380 And by defining optimizing as well as satisficing metrics, 38 00:02:26,380 --> 00:02:30,475 this gives you a clear way to pick the, quote, best classifier, 39 00:02:30,475 --> 00:02:34,450 which in this case would be classifier B because of all the ones with 40 00:02:34,450 --> 00:02:39,865 a running time better than 100 milliseconds it has the best accuracy.
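The selection rule just described can be sketched in a few lines of Python. This is only an illustration: the running times are the ones from the lecture, the accuracy values are hypothetical (chosen so that B beats A, as the lecture states), and `pick_best` is an assumed helper name.

```python
# Running times (ms) are from the lecture; the accuracies are hypothetical.
candidates = [
    {"name": "A", "accuracy": 0.90, "running_time_ms": 80},
    {"name": "B", "accuracy": 0.92, "running_time_ms": 95},
    {"name": "C", "accuracy": 0.95, "running_time_ms": 1500},
]


def pick_best(models, max_running_time_ms=100):
    """Maximize accuracy subject to running time <= max_running_time_ms."""
    # Satisficing metric: keep only the models that are fast enough.
    feasible = [m for m in models if m["running_time_ms"] <= max_running_time_ms]
    if not feasible:
        return None  # no model meets the threshold
    # Optimizing metric: among those, take the most accurate one.
    return max(feasible, key=lambda m: m["accuracy"])


best = pick_best(candidates)
print(best["name"])
```

Classifier C has the highest assumed accuracy but is filtered out by the 100 millisecond threshold, so the choice is made between A and B on accuracy alone.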
41 00:02:39,865 --> 00:02:45,220 So more generally, if you have N metrics that you care 42 00:02:45,220 --> 00:02:50,830 about it's sometimes reasonable to pick one of them to be optimizing. 43 00:02:50,830 --> 00:02:54,005 So you want to do as well as possible on that one. 44 00:02:54,005 --> 00:02:57,515 And then N minus 1 to be satisficing, 45 00:02:57,515 --> 00:02:59,380 meaning that so long as they reach 46 00:02:59,380 --> 00:03:02,730 some threshold such as running time faster than 100 milliseconds, 47 00:03:02,730 --> 00:03:04,405 so long as they reach some threshold, 48 00:03:04,405 --> 00:03:06,520 you don't care how much better they are than that threshold, 49 00:03:06,520 --> 00:03:09,455 but they have to reach that threshold. 50 00:03:09,455 --> 00:03:11,350 Here's another example. 51 00:03:11,350 --> 00:03:15,280 Let's say you're building a system to detect wake words, 52 00:03:15,280 --> 00:03:19,030 also called trigger words. 53 00:03:19,030 --> 00:03:22,900 So this refers to voice-controlled devices like 54 00:03:22,900 --> 00:03:25,780 the Amazon Echo, which you wake up by saying 55 00:03:25,780 --> 00:03:29,020 Alexa, or some Google devices which you wake up 56 00:03:29,020 --> 00:03:35,095 by saying okay Google, or some Apple devices which you wake up by saying Hey Siri, 57 00:03:35,095 --> 00:03:42,300 or some Baidu devices which you wake up by saying ni hao Baidu. 58 00:03:42,300 --> 00:03:46,390 Oh, I guess if you want to read the Chinese, that's ni hao Baidu. 59 00:03:46,390 --> 00:03:51,560 Right, so these are the wake words you use to 60 00:03:51,560 --> 00:03:54,350 tell one of these voice-controlled devices 61 00:03:54,350 --> 00:03:56,990 to wake up and listen to something you want to say. 62 00:03:56,990 --> 00:04:02,090 And these are the Chinese characters for ni hao Baidu. 63 00:04:02,090 --> 00:04:07,935 So you might care about the accuracy of your trigger word detection system.
64 00:04:07,935 --> 00:04:10,325 So when someone says one of these trigger words, 65 00:04:10,325 --> 00:04:13,525 how likely are you to actually wake up your device, 66 00:04:13,525 --> 00:04:16,970 and you might also care about the number of false positives. 67 00:04:16,970 --> 00:04:19,891 So when no one actually said this trigger word, 68 00:04:19,891 --> 00:04:23,294 how often does it randomly wake up? 69 00:04:23,294 --> 00:04:27,770 So in this case maybe one reasonable way of 70 00:04:27,770 --> 00:04:33,275 combining these two evaluation metrics might be to maximize accuracy, 71 00:04:33,275 --> 00:04:35,165 so when someone says one of the trigger words, 72 00:04:35,165 --> 00:04:37,565 maximize the chance that your device wakes up. 73 00:04:37,565 --> 00:04:39,215 And subject to that, 74 00:04:39,215 --> 00:04:48,815 you have at most one false positive every 24 hours 75 00:04:48,815 --> 00:04:51,070 of operation, right? 76 00:04:51,070 --> 00:04:53,760 So that your device randomly wakes up only once 77 00:04:53,760 --> 00:04:57,270 per day on average when no one is actually talking to it. 78 00:04:57,270 --> 00:05:00,900 So in this case accuracy is the 79 00:05:00,900 --> 00:05:05,505 optimizing metric and the number of false positives every 24 hours 80 00:05:05,505 --> 00:05:09,870 is the satisficing metric where you'd be satisfied so long as there 81 00:05:09,870 --> 00:05:14,490 is at most one false positive every 24 hours. 82 00:05:14,490 --> 00:05:17,100 To summarize, if there are multiple things you care 83 00:05:17,100 --> 00:05:19,920 about, you can set one as the optimizing metric 84 00:05:19,920 --> 00:05:22,530 that you want to do as well as possible on and one or 85 00:05:22,530 --> 00:05:25,475 more as satisficing metrics where you'll be satisfied 
86 00:05:25,475 --> 00:05:29,430 as long as each does better than some threshold. You now have 87 00:05:29,430 --> 00:05:32,310 an almost automatic way of quickly 88 00:05:32,310 --> 00:05:35,864 looking at multiple classifiers and picking the, quote, best one. 89 00:05:35,864 --> 00:05:39,000 Now these evaluation metrics must be 90 00:05:39,000 --> 00:05:44,095 evaluated or calculated on a training set or a development set or maybe on the test set. 91 00:05:44,095 --> 00:05:46,935 So one of the things you also need to do is set up training, 92 00:05:46,935 --> 00:05:50,100 dev or development, as well as test sets. 93 00:05:50,100 --> 00:05:52,800 In the next video, I want to share with you some guidelines for 94 00:05:52,800 --> 00:05:55,800 how to set up training, dev, and test sets. 95 00:05:55,800 --> 00:05:57,470 So let's go on to the next video.
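As a recap of the general recipe (one optimizing metric, one or more satisficing metrics), here is a minimal Python sketch applied to the wake word example. Everything in it is illustrative: the three detectors and their numbers are made up, and `select_model` along with the metric key names are assumptions rather than anything from the lecture.

```python
def select_model(models, optimizing, satisficing):
    """Pick the model that maximizes `optimizing` among those passing all thresholds.

    models:      list of dicts mapping metric names to values
    optimizing:  name of the metric to maximize
    satisficing: dict mapping a metric name to a predicate that returns
                 True when the value is good enough
    """
    feasible = [
        m for m in models
        if all(is_ok(m[metric]) for metric, is_ok in satisficing.items())
    ]
    if not feasible:
        return None  # nothing meets every threshold
    return max(feasible, key=lambda m: m[optimizing])


# Hypothetical wake word detectors; all numbers are invented for illustration.
wake_word_models = [
    {"name": "small", "accuracy": 0.93, "false_positives_per_24h": 0.4},
    {"name": "medium", "accuracy": 0.96, "false_positives_per_24h": 0.9},
    {"name": "large", "accuracy": 0.98, "false_positives_per_24h": 3.0},
]

chosen = select_model(
    wake_word_models,
    optimizing="accuracy",
    # At most one false positive every 24 hours of operation.
    satisficing={"false_positives_per_24h": lambda value: value <= 1.0},
)
print(chosen["name"])
```

Adding another satisficing metric, such as a running time limit, only requires one more entry in the `satisficing` dictionary, while the optimizing metric stays a single key.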