1 00:00:00,300 --> 00:00:02,290 By now you have seen a lot of different learning algorithms. 2 00:00:03,330 --> 00:00:04,450 And if you've been following along 3 00:00:04,770 --> 00:00:06,030 with these videos, you should consider 4 00:00:06,770 --> 00:00:09,530 yourself an expert on many state-of-the-art machine learning techniques. 5 00:00:09,730 --> 00:00:12,310 But even among 6 00:00:12,560 --> 00:00:14,460 people who know a certain learning algorithm, 7 00:00:15,250 --> 00:00:16,830 there's often a huge difference between 8 00:00:17,090 --> 00:00:18,240 someone who really knows how 9 00:00:18,410 --> 00:00:20,130 to powerfully and effectively apply 10 00:00:20,450 --> 00:00:22,270 that algorithm, versus someone who's 11 00:00:22,950 --> 00:00:24,090 less familiar with some of 12 00:00:24,160 --> 00:00:25,080 the material that I'm about 13 00:00:25,420 --> 00:00:26,900 to teach and who doesn't really 14 00:00:27,080 --> 00:00:28,090 understand how to apply these 15 00:00:28,250 --> 00:00:29,180 algorithms and can end up 16 00:00:29,570 --> 00:00:30,760 wasting a lot of 17 00:00:30,870 --> 00:00:33,320 their time trying things out that don't really make sense. 18 00:00:34,380 --> 00:00:35,180 What I would like to do is 19 00:00:35,340 --> 00:00:36,350 make sure that if you 20 00:00:36,560 --> 00:00:37,830 are developing machine learning systems, 21 00:00:38,600 --> 00:00:39,780 you know how to choose 22 00:00:40,400 --> 00:00:42,900 one of the most promising avenues to spend your time pursuing. 23 00:00:43,890 --> 00:00:45,050 And in this and the next 24 00:00:45,190 --> 00:00:46,530 few videos I'm going to 25 00:00:46,750 --> 00:00:47,890 give a number of practical 26 00:00:48,380 --> 00:00:51,150 suggestions, advice, guidelines on how to do that. 
27 00:00:51,520 --> 00:00:53,410 And concretely what we'll 28 00:00:53,600 --> 00:00:54,460 focus on is the problem 29 00:00:54,940 --> 00:00:56,380 of, suppose you are 30 00:00:56,580 --> 00:00:57,760 developing a machine learning system 31 00:00:58,390 --> 00:00:59,390 or trying to improve the performance 32 00:00:59,950 --> 00:01:01,810 of a machine learning system, how 33 00:01:02,000 --> 00:01:03,630 do you go about deciding what are 34 00:01:03,700 --> 00:01:05,260 the promising avenues to try 35 00:01:07,620 --> 00:01:07,620 next? 36 00:01:09,300 --> 00:01:11,200 To explain this, let's continue using 37 00:01:11,670 --> 00:01:13,210 our example of learning to 38 00:01:13,350 --> 00:01:15,280 predict housing prices. 39 00:01:15,570 --> 00:01:17,760 And let's say you've implemented regularized linear regression, 40 00:01:18,700 --> 00:01:20,090 thus minimizing that cost function 41 00:01:20,520 --> 00:01:22,870 J. Now suppose that 42 00:01:23,130 --> 00:01:24,310 after you take your learned parameters, 43 00:01:24,820 --> 00:01:26,570 if you test your hypothesis on 44 00:01:26,720 --> 00:01:28,360 a new set of houses, suppose you 45 00:01:28,540 --> 00:01:29,470 find that it is making huge 46 00:01:29,860 --> 00:01:31,770 errors in its predictions of the housing prices. 47 00:01:33,220 --> 00:01:34,490 The question is what should 48 00:01:34,670 --> 00:01:37,600 you then try next in order to improve the learning algorithm? 49 00:01:39,000 --> 00:01:40,000 There are many things that one 50 00:01:40,210 --> 00:01:41,460 can think of that could improve 51 00:01:41,950 --> 00:01:43,660 the performance of the learning algorithm. 52 00:01:44,800 --> 00:01:47,510 One thing you could try is to get more training examples. 
53 00:01:48,060 --> 00:01:49,240 And concretely, you can imagine, maybe, you 54 00:01:49,600 --> 00:01:51,150 know, setting up phone surveys, going 55 00:01:51,570 --> 00:01:52,820 door to door, to try to 56 00:01:52,930 --> 00:01:54,050 get more data on how much 57 00:01:54,310 --> 00:01:56,660 different houses sell for. 58 00:01:57,730 --> 00:01:58,770 And the sad thing is I've seen 59 00:01:59,010 --> 00:02:00,060 a lot of people spend a 60 00:02:00,200 --> 00:02:01,400 lot of time collecting more training 61 00:02:01,760 --> 00:02:03,270 examples, thinking, oh, if we have 62 00:02:03,760 --> 00:02:04,780 twice as much or ten times 63 00:02:05,050 --> 00:02:07,250 as much training data, that is certainly going to help, right? 64 00:02:07,990 --> 00:02:09,020 But sometimes getting more training 65 00:02:09,380 --> 00:02:10,680 data doesn't actually help, 66 00:02:11,240 --> 00:02:11,920 and in the next few videos 67 00:02:12,430 --> 00:02:13,670 we will see why, and we 68 00:02:13,720 --> 00:02:15,270 will see how you 69 00:02:15,500 --> 00:02:16,780 can avoid spending a lot 70 00:02:16,950 --> 00:02:18,160 of time collecting more training data 71 00:02:18,910 --> 00:02:20,660 in settings where it is just not going to help. 72 00:02:22,370 --> 00:02:23,510 Another thing you might try is 73 00:02:23,730 --> 00:02:25,830 to, well, maybe try a smaller set of features. 74 00:02:26,470 --> 00:02:27,270 So if you have some set 75 00:02:27,450 --> 00:02:29,030 of features such as x1, 76 00:02:29,270 --> 00:02:30,330 x2, x3 and so on, 77 00:02:30,680 --> 00:02:31,840 maybe a large number of features, 78 00:02:32,570 --> 00:02:33,460 maybe you want to spend time 79 00:02:33,860 --> 00:02:35,240 carefully selecting some small 80 00:02:35,590 --> 00:02:37,410 subset of them to prevent overfitting. 81 00:02:38,670 --> 00:02:40,730 Or maybe you need to get additional features. 
82 00:02:41,330 --> 00:02:42,390 Maybe the current set of features 83 00:02:42,570 --> 00:02:44,740 isn't informative enough and you 84 00:02:44,840 --> 00:02:47,460 want to collect more data in the sense of getting more features. 85 00:02:48,510 --> 00:02:49,590 And once again this is the 86 00:02:49,730 --> 00:02:50,900 sort of project that can scale 87 00:02:51,180 --> 00:02:52,260 up to huge projects. You can 88 00:02:52,580 --> 00:02:54,110 imagine running phone 89 00:02:54,350 --> 00:02:55,280 surveys to find out more about 90 00:02:55,490 --> 00:02:57,230 houses, or extra land 91 00:02:57,640 --> 00:02:58,620 surveys to find out more 92 00:02:58,800 --> 00:03:01,130 about the pieces of land and so on, so a huge project. 93 00:03:01,690 --> 00:03:02,820 And once again it would be 94 00:03:02,930 --> 00:03:04,140 nice to know in advance if 95 00:03:04,330 --> 00:03:05,210 this is going to help before we 96 00:03:05,760 --> 00:03:07,690 spend a lot of time doing something like this. 97 00:03:07,920 --> 00:03:09,390 We can also try 98 00:03:10,360 --> 00:03:12,100 adding polynomial features, things like 99 00:03:12,180 --> 00:03:13,100 x1 squared, x2 squared, and product 100 00:03:13,860 --> 00:03:14,700 features x1 x2. 101 00:03:14,930 --> 00:03:16,040 We can still spend quite a 102 00:03:16,180 --> 00:03:17,830 lot of time thinking about that, and 103 00:03:18,270 --> 00:03:19,340 we can also try other things like 104 00:03:19,540 --> 00:03:21,390 decreasing lambda, the regularization parameter, or increasing lambda. 105 00:03:23,840 --> 00:03:25,160 Given a menu of options 106 00:03:25,520 --> 00:03:26,680 like these, some of which 107 00:03:26,970 --> 00:03:28,240 can easily scale up to 108 00:03:28,950 --> 00:03:30,000 six-month or longer projects, 109 00:03:31,310 --> 00:03:32,660 unfortunately, the most common 110 00:03:32,760 --> 00:03:34,010 method that people use to 111 00:03:34,170 --> 00:03:36,040 pick one of these is to go by gut feeling. 
112 00:03:36,520 --> 00:03:37,670 That is, what many people 113 00:03:38,170 --> 00:03:39,520 will do is sort of randomly 114 00:03:39,940 --> 00:03:41,100 pick one of these options and 115 00:03:41,250 --> 00:03:43,050 maybe say, "Oh, let's go and get more training data." 116 00:03:43,980 --> 00:03:45,480 And easily spend six months collecting 117 00:03:45,880 --> 00:03:47,540 more training data, or maybe someone 118 00:03:47,780 --> 00:03:48,860 else would rather say, "Well, 119 00:03:49,430 --> 00:03:51,810 let's go collect a lot more features on these houses in our data set." 120 00:03:52,780 --> 00:03:54,010 And I have, a lot 121 00:03:54,220 --> 00:03:55,870 of times, sadly seen people spend, you know, 122 00:03:56,630 --> 00:03:58,360 literally six months pursuing one 123 00:03:58,530 --> 00:03:59,680 of these avenues that they picked 124 00:04:00,240 --> 00:04:01,810 sort of at random, only to 125 00:04:01,920 --> 00:04:03,220 discover six months later that 126 00:04:03,460 --> 00:04:05,610 that really wasn't a promising avenue to pursue. 127 00:04:07,090 --> 00:04:08,170 Fortunately, there is a 128 00:04:08,310 --> 00:04:10,650 pretty simple technique that can 129 00:04:10,930 --> 00:04:12,640 let you very quickly rule 130 00:04:12,900 --> 00:04:14,190 out half of the things 131 00:04:14,500 --> 00:04:16,440 on this list as being potentially 132 00:04:16,970 --> 00:04:17,990 promising things to pursue. 133 00:04:18,390 --> 00:04:19,310 And there is a very simple technique 134 00:04:19,830 --> 00:04:21,080 that, if you run it, can easily 135 00:04:21,710 --> 00:04:22,820 rule out many of these options, 136 00:04:24,120 --> 00:04:25,470 and potentially save you 137 00:04:25,580 --> 00:04:28,600 a lot of time pursuing something that just is not going to work. 138 00:04:29,610 --> 00:04:30,950 In the next two videos 139 00:04:31,320 --> 00:04:32,450 after this, I'm going to 140 00:04:32,560 --> 00:04:35,420 first talk about how to evaluate learning algorithms. 
141 00:04:36,540 --> 00:04:37,810 And in the next few 142 00:04:38,080 --> 00:04:39,770 videos after that, I'm 143 00:04:40,070 --> 00:04:41,130 going to talk about these techniques, 144 00:04:42,470 --> 00:04:44,270 which are called machine learning diagnostics. 145 00:04:46,690 --> 00:04:47,980 And what a diagnostic is, is 146 00:04:48,120 --> 00:04:49,080 a test you can run 147 00:04:49,900 --> 00:04:52,240 to get insight into what 148 00:04:52,430 --> 00:04:53,740 is or isn't working with 149 00:04:54,130 --> 00:04:55,810 an algorithm, and which will 150 00:04:56,070 --> 00:04:57,720 often give you insight as to 151 00:04:57,940 --> 00:04:59,360 what are promising things to try 152 00:04:59,920 --> 00:05:01,100 to improve a learning algorithm's 153 00:05:03,910 --> 00:05:03,910 performance. 154 00:05:04,730 --> 00:05:07,140 We'll talk about specific diagnostics later in this video sequence. 155 00:05:08,050 --> 00:05:09,230 But I should mention in advance 156 00:05:09,440 --> 00:05:10,780 that diagnostics can take 157 00:05:11,100 --> 00:05:12,280 time to implement and can sometimes, 158 00:05:12,820 --> 00:05:14,300 you know, take quite a 159 00:05:14,340 --> 00:05:15,610 lot of time to implement and 160 00:05:15,740 --> 00:05:17,120 understand, but doing so 161 00:05:17,410 --> 00:05:18,330 can be a very good use 162 00:05:18,610 --> 00:05:19,380 of your time when you are 163 00:05:19,660 --> 00:05:21,460 developing learning algorithms, because they 164 00:05:21,560 --> 00:05:22,660 can often save you from 165 00:05:22,880 --> 00:05:24,670 spending many months pursuing an 166 00:05:24,840 --> 00:05:26,580 avenue that you could 167 00:05:26,870 --> 00:05:29,460 have found out much earlier just was not going to be fruitful. 
168 00:05:32,220 --> 00:05:33,070 So in the next few 169 00:05:33,250 --> 00:05:34,250 videos, I'm going to first 170 00:05:34,570 --> 00:05:36,220 talk about how to evaluate your 171 00:05:36,450 --> 00:05:38,210 learning algorithms, and after 172 00:05:38,410 --> 00:05:39,210 that I'm going to talk 173 00:05:39,300 --> 00:05:41,490 about some of these diagnostics, which will hopefully 174 00:05:41,810 --> 00:05:42,950 let you much more 175 00:05:43,110 --> 00:05:44,470 effectively select more of the 176 00:05:44,770 --> 00:05:45,880 useful things to try next 177 00:05:46,560 --> 00:05:48,200 if your goal is to improve 178 00:05:48,760 --> 00:05:50,430 the machine learning system.
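
The scenario described earlier in this video (fit regularized linear regression, then test the hypothesis on a new set of houses) can be sketched roughly as below. This is only an illustration under stated assumptions, not code from the lecture: the toy house data, the single-feature hypothesis, the function names, the learning rate, and the exact form of the cost J (squared error plus a lambda-weighted penalty on the slope, all divided by 2m) are assumed here.

```python
# Hypothetical toy data, NOT from the lecture:
# (size in 1000 sq ft, price in $1000s).
train = [(1.0, 200.0), (1.5, 290.0), (2.0, 410.0), (2.5, 500.0), (3.0, 610.0)]
test = [(1.2, 250.0), (2.2, 430.0), (2.8, 570.0)]


def predict(theta, x):
    """Linear hypothesis h(x) = theta0 + theta1 * x."""
    return theta[0] + theta[1] * x


def squared_error(theta, data):
    """Average squared error 1/(2m) * sum (h(x) - y)^2, without the penalty."""
    m = len(data)
    return sum((predict(theta, x) - y) ** 2 for x, y in data) / (2 * m)


def fit(data, lam, alpha=0.05, iters=20000):
    """Batch gradient descent on the assumed regularized cost
    J = 1/(2m) * (sum (h(x) - y)^2 + lam * theta1^2).
    The intercept theta0 is not penalized. The default alpha is tuned for
    this toy data and small lam; a larger lam would need a smaller alpha.
    """
    m = len(data)
    theta = [0.0, 0.0]
    for _ in range(iters):
        errs = [predict(theta, x) - y for x, y in data]
        grad0 = sum(errs) / m
        grad1 = (sum(e * x for e, (x, _) in zip(errs, data)) + lam * theta[1]) / m
        theta = [theta[0] - alpha * grad0, theta[1] - alpha * grad1]
    return theta


# Compare error on the training houses with error on houses not used for fitting.
for lam in (0.0, 1.0, 10.0):
    theta = fit(train, lam)
    print(lam, squared_error(theta, train), squared_error(theta, test))
```

Comparing the two error columns is only a starting point; the diagnostics promised for the following videos are what would indicate which of the listed options is worth the time.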