Okay, well in summary, we've really learned a lot in this module. We've thought about the task of feature selection, and we've described ways of searching over, first, all possible sets of features that we might want to include to come up with the best model. We talked about the computational challenges of that, then we turned to thinking about greedy algorithms, and then we discussed the regularized regression approach of lasso for addressing the feature selection task.

So we've really covered a lot of ground, and these are really important concepts in machine learning. And this lasso regularized regression approach, although really, really simple, has dramatically transformed the fields of machine learning, statistics, and engineering. It's shown its utility in a variety of different applied domains.

But I want to mention a really important issue, which we kind of alluded to: with feature selection, not just lasso but feature selection in general, you have to be really careful about interpreting the features that you selected. Some reasons for this include the fact that the features you selected are always just in the context of what you provided as the set of possible features to choose from to begin with. Likewise, the set of selected features is really sensitive to correlations between features, and in those cases small changes in the data can lead to different features being included, too; the little sketch below illustrates this. So to say that one feature is important and another isn't, you have to be careful with statements like that. And also, of course, the set of selected features depends on which algorithm you use. We especially saw this when we talked about those greedy algorithms, like the forward stepwise procedure.

But I did want to mention that there are some nice theoretical guarantees for lasso under very specific conditions.
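To make that instability point concrete, here's a minimal sketch. It isn't something from the lectures: the data, the duplicated feature x2, and the choice of the penalty strength alpha are all hypothetical, and it just uses scikit-learn's off-the-shelf Lasso rather than anything we derived in this module.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical setup: two nearly identical features, but only x1
# actually drives the response.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # strongly correlated copy of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(size=n)

# Refit lasso on slightly perturbed versions of the data and watch which
# feature ends up carrying the nonzero coefficient.
for seed in range(5):
    y_perturbed = y + np.random.default_rng(seed).normal(scale=0.5, size=n)
    coef = Lasso(alpha=0.1).fit(X, y_perturbed).coef_
    print(f"perturbation {seed}: coefficients = {np.round(coef, 2)}")
```

With two nearly identical features, lasso tends to put most of the weight on just one of them, and refitting on perturbed data can change which one that is. That's exactly why a statement like "x2 wasn't selected, so it doesn't matter" is risky.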
So, in conclusion, here's a very long list of things that you can do now that you've completed this module: everything from thinking about searching over the discrete set of possible models to do feature selection, using all subsets or greedy algorithms, to formulating a regularized regression approach, lasso, that implicitly does this feature selection by searching over a continuous space indexed by this tuning parameter lambda. We talked about formulating the objective, and about geometric interpretations of why the lasso objective leads to sparsity. And we talked about using coordinate descent as an algorithm for solving lasso; there's a small sketch of that update at the end of this summary. Coordinate descent itself is an algorithm that generalizes well beyond lasso, so that was an important concept that we got out of this module as well. And finally, if you watched the optional video, we talked about some really technical concepts relating to subgradients.

To conclude this module, we talked about some of the challenges associated with lasso, as well as some of the potential impact that this method has, because it's really quite an important tool. And like I've mentioned, it's really shown a lot of promise in many different domains.
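As promised above, here's a minimal sketch of cyclic coordinate descent for a lasso-style objective, 0.5 * ||y - Xw||^2 + lambda * ||w||_1. It's an illustration under my own simplifying assumptions (no intercept, nonzero feature columns, a fixed number of passes instead of a convergence check), not the exact formulation or stopping rule from the lectures.

```python
import numpy as np

def soft_threshold(rho, lam):
    # Closed-form solution of the one-dimensional lasso problem; this is
    # where the subgradient of the absolute value shows up.
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_passes=100):
    """Cyclic coordinate descent for 0.5 * ||y - Xw||^2 + lam * ||w||_1.

    Assumes every column of X is nonzero; no intercept, no convergence
    check -- just the core update, for illustration.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_passes):
        for j in range(d):
            # Residual that ignores feature j's current contribution.
            r_j = y - X @ w + X[:, j] * w[j]
            rho_j = X[:, j] @ r_j
            z_j = X[:, j] @ X[:, j]
            # Minimize the objective exactly in coordinate j, holding the rest fixed.
            w[j] = soft_threshold(rho_j, lam) / z_j
    return w
```

The soft_threshold step is where the subgradient analysis from the optional video pays off: whenever rho_j falls inside [-lambda, lambda], coordinate j is set exactly to zero, which is the sparsity behavior we kept coming back to in this module.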