1 00:00:00,000 --> 00:00:04,182 If you're working on a brand new machine learning application, 2 00:00:04,182 --> 00:00:06,840 one of the pieces of advice I often give people is that, 3 00:00:06,840 --> 00:00:11,005 I think you should build your first system quickly and then iterate. 4 00:00:11,005 --> 00:00:14,895 Let me show you what I mean. I've worked on speech recognition for many years. 5 00:00:14,895 --> 00:00:18,385 And if you're thinking of building a new speech recognition system, 6 00:00:18,385 --> 00:00:20,430 there's actually a lot of directions you could 7 00:00:20,430 --> 00:00:22,976 go and a lot of things you could prioritize. 8 00:00:22,976 --> 00:00:25,590 For example, there are specific techniques for making 9 00:00:25,590 --> 00:00:29,235 speech recognition systems more robust to noisy backgrounds. 10 00:00:29,235 --> 00:00:32,340 And noisy background could mean cafe noise, 11 00:00:32,340 --> 00:00:35,430 like a lot of people talking in the background, or car noise, 12 00:00:35,430 --> 00:00:38,930 the sounds of cars and highways, or other types of noise. 13 00:00:38,930 --> 00:00:43,440 There are ways to make a speech recognition system more robust to accented speech. 14 00:00:43,440 --> 00:00:48,311 There are specific problems associated with speakers that are far from the microphone; 15 00:00:48,311 --> 00:00:50,705 this is called far-field speech recognition. 16 00:00:50,705 --> 00:00:53,610 Young children's speech poses special challenges, 17 00:00:53,610 --> 00:00:56,535 both in terms of how they pronounce individual words as well 18 00:00:56,535 --> 00:00:59,820 as their choice of words and the vocabulary they tend to use. 
19 00:00:59,820 --> 00:01:07,130 And if sometimes the speaker stutters or if they use nonsensical phrases like oh, ah, 20 00:01:07,130 --> 00:01:09,960 um, there are different choices and 21 00:01:09,960 --> 00:01:12,940 different techniques for making the transcript that you output 22 00:01:12,940 --> 00:01:15,310 still read more fluently. 23 00:01:15,310 --> 00:01:17,880 So, there are these and 24 00:01:17,880 --> 00:01:22,710 many other things you could do to improve a speech recognition system. 25 00:01:22,710 --> 00:01:26,693 And more generally, for almost any machine learning application, 26 00:01:26,693 --> 00:01:30,030 there could be 50 different directions you could go in 27 00:01:30,030 --> 00:01:34,650 and each of these directions is reasonable and would make your system better. 28 00:01:34,650 --> 00:01:35,955 But the challenge is, 29 00:01:35,955 --> 00:01:38,990 how do you pick which of these to focus on? 30 00:01:38,990 --> 00:01:42,970 And even though I've worked in speech recognition for many years, 31 00:01:42,970 --> 00:01:46,075 if I'm building a new system for a new application domain, 32 00:01:46,075 --> 00:01:48,730 I would still find it maybe a little bit difficult to 33 00:01:48,730 --> 00:01:52,625 pick without spending some time thinking about the problem. 34 00:01:52,625 --> 00:01:54,550 So what we recommend you do, 35 00:01:54,550 --> 00:01:58,570 if you're starting on building a brand new machine learning application, 36 00:01:58,570 --> 00:02:02,277 is to build your first system quickly and then iterate. 37 00:02:02,277 --> 00:02:04,720 What I mean by that is I recommend that 38 00:02:04,720 --> 00:02:08,470 you first quickly set up a dev/test set and metric. 39 00:02:08,470 --> 00:02:12,360 So this is really deciding where to place your target. 40 00:02:12,360 --> 00:02:14,560 And if you get it wrong, you can always move it later, 41 00:02:14,560 --> 00:02:16,695 but just set up a target somewhere. 
42 00:02:16,695 --> 00:02:20,920 And then I recommend you build an initial machine learning system quickly. 43 00:02:20,920 --> 00:02:23,248 Find the training set, train it and see. 44 00:02:23,248 --> 00:02:25,180 Start to see and understand how well you're 45 00:02:25,180 --> 00:02:29,475 doing against your dev/test set and your evaluation metric. 46 00:02:29,475 --> 00:02:32,633 When you build your initial system, 47 00:02:32,633 --> 00:02:37,180 you'll then be able to use bias/variance analysis, which we talked about 48 00:02:37,180 --> 00:02:42,470 earlier, as well as error analysis, which we talked about just in the last several videos, 49 00:02:42,470 --> 00:02:45,260 to prioritize the next steps. 50 00:02:45,260 --> 00:02:49,320 In particular, if error analysis 51 00:02:49,320 --> 00:02:52,780 causes you to realize that a lot of the errors are 52 00:02:52,780 --> 00:02:55,675 from the speaker being very far from the microphone, 53 00:02:55,675 --> 00:02:58,342 which causes special challenges to speech recognition, 54 00:02:58,342 --> 00:03:03,990 then that will give you a good reason to focus on techniques to address this, called 55 00:03:03,990 --> 00:03:06,640 far-field speech recognition, which 56 00:03:06,640 --> 00:03:10,865 basically means handling when the speaker is very far from the microphone. 
57 00:03:10,865 --> 00:03:14,693 A lot of the value of building this initial system, 58 00:03:14,693 --> 00:03:16,737 it can be a quick and dirty implementation, 59 00:03:16,737 --> 00:03:18,120 you know, don't overthink it, 60 00:03:18,120 --> 00:03:22,690 but a lot of the value of the initial system is that having some learned system, 61 00:03:22,690 --> 00:03:26,497 having some trained system, allows you to localize bias/variance, 62 00:03:26,497 --> 00:03:28,255 to try to prioritize what to do next, 63 00:03:28,255 --> 00:03:30,270 allows you to do error analysis, 64 00:03:30,270 --> 00:03:31,480 look at some mistakes, 65 00:03:31,480 --> 00:03:34,630 to figure out, of all the different directions you can go in, 66 00:03:34,630 --> 00:03:37,822 which ones are actually the most worthwhile. 67 00:03:37,822 --> 00:03:44,125 So to recap, what I recommend you do is build your first system quickly, then iterate. 68 00:03:44,125 --> 00:03:47,050 This advice applies less strongly if you're working on 69 00:03:47,050 --> 00:03:52,300 an application area in which you have significant prior experience. 70 00:03:52,300 --> 00:03:56,080 It also applies less strongly if there's a significant body of 71 00:03:56,080 --> 00:03:58,480 academic literature that you can draw on 72 00:03:58,480 --> 00:04:01,425 for pretty much the exact same problem you're building. 73 00:04:01,425 --> 00:04:05,810 So, for example, there's a large academic literature on face recognition. 74 00:04:05,810 --> 00:04:08,185 And if you're trying to build a face recognizer, 75 00:04:08,185 --> 00:04:11,725 it might be okay to build a more complex system from the get-go 76 00:04:11,725 --> 00:04:16,595 by building on this large body of academic literature. 77 00:04:16,595 --> 00:04:19,990 But if you are tackling a new problem for the first time, 78 00:04:19,990 --> 00:04:23,235 then I would encourage you to really not 79 00:04:23,235 --> 00:04:27,010 overthink or not make your first system too complicated. 
80 00:04:27,010 --> 00:04:30,070 Rather, just build something quick and dirty and then use that 81 00:04:30,070 --> 00:04:33,447 to help you prioritize how to improve your system. 82 00:04:33,447 --> 00:04:36,670 So I've seen a lot of machine learning projects and I've 83 00:04:36,670 --> 00:04:40,465 seen some teams overthink the solution and build something too complicated. 84 00:04:40,465 --> 00:04:44,335 I've also seen some teams underthink and then build something maybe too simple. 85 00:04:44,335 --> 00:04:46,240 But on average, I've seen a lot more 86 00:04:46,240 --> 00:04:49,315 teams overthink and build something too complicated 87 00:04:49,315 --> 00:04:52,275 than I've seen teams build something too simple. 88 00:04:52,275 --> 00:04:53,920 So I hope this helps, 89 00:04:53,920 --> 00:04:58,583 and if you are applying machine learning algorithms to a new application, 90 00:04:58,583 --> 00:05:01,840 and if your main goal is to build something that works, 91 00:05:01,840 --> 00:05:04,720 as opposed to if your main goal is to invent 92 00:05:04,720 --> 00:05:08,020 a new machine learning algorithm, which is a different goal, 93 00:05:08,020 --> 00:05:11,075 then your main goal is to get something that works really well. 94 00:05:11,075 --> 00:05:13,360 I'd encourage you to build something quick and dirty. 95 00:05:13,360 --> 00:05:14,600 Use that to do bias/variance analysis, 96 00:05:14,600 --> 00:05:17,890 use that to do error analysis and 97 00:05:17,890 --> 00:05:23,510 use the results of those analyses to help you prioritize where to go next.
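
As a rough illustration of the prioritization step described in the video, here is a minimal Python sketch. It is not code from the lecture. The function names `rank_error_categories` and `bias_variance_gaps`, the category tags, and the idea of comparing training error against a baseline such as human-level error are all assumptions made for this example. The sketch assumes you have already built a quick first system, hand-tagged its dev-set mistakes, and measured its training and dev error.

```python
from collections import Counter


def rank_error_categories(tagged_errors):
    """Rank hand-assigned error categories by the share of dev-set errors they touch.

    tagged_errors: one list of category tags per misrecognized dev example
    (hypothetical format). An example may carry several tags, so shares can
    sum to more than 1.
    """
    total = len(tagged_errors)
    if total == 0:
        return []
    # Count each tag at most once per example.
    counts = Counter(tag for tags in tagged_errors for tag in set(tags))
    return [(tag, n / total) for tag, n in counts.most_common()]


def bias_variance_gaps(baseline_error, train_error, dev_error):
    """Split the error into two gaps (names are illustrative, not from the lecture).

    bias_gap:     training error above an assumed baseline (e.g. human-level)
    variance_gap: dev error above training error
    """
    return {
        "bias_gap": train_error - baseline_error,
        "variance_gap": dev_error - train_error,
    }


if __name__ == "__main__":
    # Made-up tags for four misrecognized dev utterances.
    errors = [
        ["far_field"],
        ["far_field", "noisy_background"],
        ["accent"],
        ["far_field"],
    ]
    for tag, share in rank_error_categories(errors):
        print(f"{tag}: {share:.0%} of dev errors")
    print(bias_variance_gaps(baseline_error=0.05, train_error=0.08, dev_error=0.15))
```

With these made-up numbers, far-field cases account for most of the tagged errors and the variance gap is larger than the bias gap, which is the kind of signal you could use to decide what to work on next.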