As Emily discussed, we're going to see machine learning through the lens of a wide range of case studies in different areas that really ground the concepts behind them.

Other machine learning classes you might take out there are really a laundry list of algorithms and methods: things like support vector machines, kernels, logistic regression, neural networks, and so on. The problem with that approach is that, since you start from the algorithms, you end up with really simplistic use cases, and the applications are disconnected from reality. So we're doing things very differently in this specialization, and we've done this for quite a while here. Emily and I created a course at the University of Washington on machine learning at scale for big data, where we pioneered this use-case approach to teaching machine learning. In that course, we saw a lot of positive feedback from folks really understanding and grounding the concepts.

So we're going to start from the use cases in this first course. By starting from use cases, you're really going to be able to grasp the key concepts and techniques that allow you to build an intelligent application, measure its quality, and understand whether it's working well or not. And in the end, you're going to build a bunch of these intelligent applications.

To build such an intelligent application, you typically have to think about what task you're going to solve, say, a sentiment analysis problem; what machine learning models you're going to use, things like support vector machines or regression; and what methods you'll use to optimize the parameters of that model. Then you ask questions like: is this really providing the intelligence I'm hoping for? How do we measure the quality of the system?

So in this specialization, we're going to defer the core pieces of how to describe a model and how to optimize it to the follow-on courses. This first course is focused on helping us figure out what task we're trying to solve, what machine learning methods make sense, and how to measure them.
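To make that loop concrete, here is a minimal sketch in Python of the task / model / evaluation cycle, treating the learning algorithm as a black box. It assumes scikit-learn is available, and the tiny sentiment dataset is made up purely for illustration; the specialization's own tools and datasets may differ.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # Task: sentiment analysis -- classify short reviews as positive (1) or negative (0).
    reviews = [
        "loved it, absolutely great",
        "terrible, would not recommend",
        "great product, works really well",
        "awful experience, very bad",
        "really great and very useful",
        "bad quality, broke quickly",
    ]
    labels = [1, 0, 1, 0, 1, 0]

    # Model: bag-of-words features fed into a black-box classifier.
    X = CountVectorizer().fit_transform(reviews)
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.5, random_state=0, stratify=labels)
    model = LogisticRegression().fit(X_train, y_train)

    # Quality: is the application providing the intelligence we hoped for?
    print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

The point is the shape of the workflow, not the particular classifier: the same task / model / evaluation questions apply whatever algorithm sits in the black box.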
And with that, using the algorithms as black boxes, we're going to be able to build a wide range of really cool intelligent applications together. We'll actually code them, build them, and demonstrate them in a wide range of ways.

Now, the follow-on courses, there are going to be four of those plus a capstone, really go into depth in different areas. So let me give you a few quick examples of the kind of depth we're going to see throughout this specialization.

The regression course is going to talk about various models for predicting a real value, for example a house price from the features of the house. We're going to discuss linear regression techniques, as well as advanced techniques like ridge regression and the lasso that allow you to select which features are most appropriate for your problem. We're going to talk about optimization techniques like gradient descent and coordinate descent for fitting the parameters of those models, as well as some key machine learning concepts like loss functions, the bias-variance tradeoff, and cross-validation, things you need to know to really take these methods, improve them, develop them, and build applications with them.

In the second course, on classification, we're going to build, for example, the sentiment analysis use case that Emily talked about, and cover a range of classifiers: from linear classifiers to more advanced ones like logistic regression and support vector machines, and then kernels and decision trees, which allow you to deal with non-linear, complex features. We'll talk about optimization methods for applying these techniques at scale, and about building ensembles of them with something called boosting, along with the underlying machine learning concepts that really help you grasp a classifier, scale it up, and apply it to different problems.

Now, in the next course, we're going to focus on clustering and retrieval, especially in the context of documents. We're going to talk about basic techniques like nearest neighbors, as well as more advanced clustering techniques like mixtures of Gaussians, and even latent Dirichlet allocation, an advanced text-analysis clustering technique.
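To give a flavor of the retrieval side, here is a small sketch of brute-force nearest-neighbor document retrieval over TF-IDF vectors. It assumes scikit-learn, and the documents and query are made up for illustration; the KD-trees discussed next are one way to avoid this exhaustive comparison against every document.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # A tiny made-up document collection and query.
    docs = [
        "the mariners won the baseball game",
        "the seahawks played a great football game",
        "new deep learning model improves image recognition",
        "researchers train neural networks on large datasets",
    ]
    query = ["a neural network model for image classification"]

    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(docs)   # one TF-IDF vector per document
    query_vector = vectorizer.transform(query)

    # Brute force: score the query against every document and keep the best match.
    scores = cosine_similarity(query_vector, doc_vectors).ravel()
    best = scores.argmax()
    print("nearest document: %r (score %.3f)" % (docs[best], scores[best]))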
We're going to talk about the algorithms that underpin these techniques and how to scale them up with approaches like KD-trees, sampling, and expectation maximization. The core concepts here are really around how to scale these things up, how to measure their quality, and how to write them as distributed algorithms using techniques like MapReduce, which is implemented in systems like Hadoop that you might have learned about. So in the fourth course, you're actually going to write some MapReduce code for distributed machine learning.

In the final technical course, we're going to focus on techniques for matrix factorization and dimensionality reduction, which are widely applicable, but in particular for recommender systems, for recommending products. These are things like collaborative filtering, matrix factorization, and PCA, along with the underlying techniques for optimizing them, like coordinate descent, eigendecomposition, and the SVD. And then there's a wide variety of core machine learning concepts that are really useful, especially in the recommender domain, like how to pick a diverse set of recommendations and how to scale them up to large problems. (A tiny sketch of the factorization idea appears at the end of this transcript.)

Now, the capstone is going to be really exciting, and towards the end of this module I'm going to come back and tell you quite a bit more about it. But just to give you a little hint: you're going to build something extremely cool that you can show to all your friends and potential employers. You'll see that you can build a really smart, intelligent application around recommenders that combines text data, image data, sentiment analysis, and deep learning. It's going to be really cool.
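Here is that tiny NumPy sketch of the matrix factorization idea: approximate a made-up ratings matrix with a rank-2 SVD and read off a predicted rating. This is an illustration only; real collaborative filtering fits the factors on observed ratings, for example with the coordinate descent methods mentioned above, rather than treating missing entries as zeros.

    import numpy as np

    # A tiny made-up user-by-product ratings matrix; 0 stands in for "unrated".
    R = np.array([
        [5.0, 4.0, 0.0, 1.0],
        [4.0, 5.0, 1.0, 0.0],
        [1.0, 0.0, 5.0, 4.0],
        [0.0, 1.0, 4.0, 5.0],
    ])

    # Rank-2 approximation via the SVD: R is approximated by U_k * diag(s_k) * Vt_k.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    k = 2
    R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # Read off a predicted score for an unrated entry, e.g. user 0, product 2.
    print("predicted rating:", round(float(R_hat[0, 2]), 2))

The low-rank structure is what lets the model generalize: users and products are each described by k latent factors, and a predicted rating is just their inner product.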