Now, we've talked about that initial deployment: taking a model we learned for recommender systems and deploying it as a service that your website can query. But there's more to that deployment process, and to machine learning in production. There's the deployment piece, but there's also the management of models, the evaluation, and the monitoring and collection of metrics. So let's talk about those last three pieces.

Those pieces are really about taking the models we've learned and seeing how they perform in practice, not just in the batch offline process, but with real users. We then use that information to train new models, deploy new models, and update the models as we gather more information about the world.

If we go back to our pipeline, which involved a batch process and a real-time process, the feedback piece, where the user bought or didn't buy a product they were recommended, gets fed back into both the real-time data and the historical data, and that's going to be very useful for us. We're going to use that feedback to go back and learn new models. For example, now that we have more historical data, I might learn a second model. Let's call it Model 2 for the recommender, and suppose I think it's better. I want to start serving that model in production and figure out: is Model 2 really better than the old Model 1 I had? Which one is better? How do I figure that out? Those are some of the key questions around managing models in production. So we'll figure out when it's worth updating to Model 2, and how to choose between models.

This is really about monitoring models in production with real users and understanding what those usage patterns look like. And the key piece of monitoring models is evaluating them in production. That means combining the predictions we're making with metrics: what are users doing in real time with our system? So the first question you need to address with deployed models is: what data are you collecting from users? Not just the data you started with, but the data you collect from the real-time interaction, such as whether users are buying or not.
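To make that collection step concrete, here's a minimal sketch of how those real-time interactions might be logged. The `log_event` helper and its fields are hypothetical choices, not a prescribed schema; the point is that each event records which model made the recommendation and what the user did, so the feedback can later be folded back into the historical training data.

```python
import json
import time

def log_event(path, user_id, product_id, model_version, purchased):
    """Append one interaction event to a newline-delimited JSON log.

    These events are the raw feedback: which model made the
    recommendation, and whether the user actually bought the product.
    """
    event = {
        "timestamp": time.time(),
        "user_id": user_id,
        "product_id": product_id,
        "model_version": model_version,  # e.g. "model_1" or "model_2"
        "purchased": purchased,          # the feedback signal
    }
    with open(path, "a") as f:
        f.write(json.dumps(event) + "\n")

# Example: a user was shown a recommendation from Model 1 and bought it.
log_event("interactions.jsonl", user_id=42, product_id="B0017",
          model_version="model_1", purchased=True)
```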
The second question is what metrics you're going to use to measure whether those interactions are good: whether you're getting the kind of response you're hoping for, and whether the machine learning is actually working for you in the system you've built.

Now, if we go back to our pipeline, you can imagine saying: okay, I'm going to collect the data and use it to measure the metrics I used to train my model. For example, when we talked about the recommender system, we discussed one such metric: minimizing the sum of squared errors. Is that the right metric to evaluate in production? It's a good metric for optimizing a model offline, but in production you really care about whether people buy the product or not, whether the machine learning model is getting your users more engaged with your website, whether it's helping people use their smartphones better, or their wearable watches, or whatever technology is using machine learning in the background. So sum of squared errors and other offline training metrics are really about optimizing the model offline, figuring out whether a model is good, and perhaps whether it can be updated. The online metrics, say who's buying, the usage metrics, how the bottom line for my business is changing, are what's really useful for choosing whether the old model is better than the new model I've created. So let's talk a little bit about what that process looks like.

The question here is: should I update my old model with a new one learned now that I have new data? And there are many questions around this. Why should I update? Why should I take what I've done before and replace it with something new? This has to do with trends: the world changes, new products come in, users' tastes change. A fad like the chewy giraffe we've talked about goes out of fashion, and nobody wants it anymore. So we want to change the model, to update it. That's why we should update. But when do we update? When do we say, okay, it's time to take that old model, switch it out, and put in a new one? This is about tracking real-world statistics. It's not about intuition that this sounds like the right time, or talking to somebody who isn't looking at data, or some kind of gut-feel business analysis. This is really about data: tracking the metrics we measure, those statistics, and coming up with a quantitative criterion that says things have changed and it's time to update the model. That decision combines the offline metrics we used when we trained the model with the online metrics we're capturing.
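Here's a minimal sketch of what such a quantitative trigger might look like, assuming we track a rolling purchase rate over live traffic. The window size and threshold are purely illustrative; a real system would tune them for its traffic volume and back the decision with proper statistics.

```python
from collections import deque

class MetricMonitor:
    """Track a rolling purchase rate and flag when it degrades."""

    def __init__(self, window=1000, threshold=0.15):
        self.events = deque(maxlen=window)  # 1 = purchased, 0 = not
        self.threshold = threshold          # illustrative cutoff

    def record(self, purchased):
        self.events.append(1 if purchased else 0)

    def rate(self):
        return sum(self.events) / len(self.events) if self.events else 0.0

    def should_retrain(self):
        # Only trust the rate once the window is full.
        full = len(self.events) == self.events.maxlen
        return full and self.rate() < self.threshold

monitor = MetricMonitor()
# ... call monitor.record(purchased) for each live interaction ...
if monitor.should_retrain():
    print("Purchase rate dropped below threshold: time to update the model.")
```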
So let's talk about how online metrics get used. One example of how to choose between models using online metrics is the idea of A/B testing. Say you have two models, Model 1 and Model 2, and I want to figure out which one is better, which one I should serve from my system. What I can do is give some of my population, call them group A, say people from a certain geographic region like the United States, Model 1. And people from a different geographic region, say Canada, get Model 2. Then you look at the behavior under those two models and capture some metrics. Let's say Model 1 does worse: it only has a 10% click-through rate (CTR), which here means that only 10% of the time do people buy the recommended product. Model 2, on the other hand, is amazing: 30% of the time people buy the product, so its CTR is 30%. What you do after running this test long enough, once you've collected enough samples, is say: okay, I'm going to start serving Model 2 instead of Model 1.

Now, there are many other issues and caveats around the ideas we've talked about so far. A/B testing, deciding when it's time to switch models, how much data you have to collect, what to do next: it's all very tricky, and it requires a lot of thought. We'll talk more about it toward the capstone, but it's really something you need to think about quite deeply.
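One standard way to judge whether you've collected enough samples is a two-proportion z-test on the observed rates. This is a sketch, not the course's prescribed procedure; the counts are illustrative (100 of 1,000 conversions for Model 1, 300 of 1,000 for Model 2, matching the 10% and 30% figures above).

```python
from math import sqrt

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z-statistic for comparing two conversion rates (pooled)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p = (success_a + success_b) / (n_a + n_b)     # pooled rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))  # standard error
    return (p_b - p_a) / se

# Illustrative counts: Model 1 converts 100/1000, Model 2 converts 300/1000.
z = two_proportion_z(100, 1000, 300, 1000)
print(f"z = {z:.2f}")  # |z| > 1.96 suggests significance at the 5% level
```

With counts this lopsided the z-statistic is huge, but with closer rates or less traffic you'd need far more samples before switching, which is exactly why the "when to switch" question is tricky.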
Also, talking about just two versions of a model, Model 1 and Model 2, is a simplification. Typically you have many data scientists building their own models with their own ideas, and the question is how you keep track of all of that. How do you know what data was used to train the different models? How do you keep track of how they're performing, which ones are doing well and which ones aren't? Is a difference due to some fluke, or to a real property of the data? How do you build monitoring dashboards? How do you produce reports that say: this is what's happening, this is what the machine learning is doing and what difference it's making? All of that can be quite complicated. So it's very important to think not just about how you use machine learning algorithms, how you write your own methods, and how you pick your features, but also about how you keep track of everything and make sure your models are working and providing the value you want for the system you've built.
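As a final sketch, here's one very simple way that bookkeeping could start: a toy in-memory registry mapping each model version to its training data and metrics. The field names, file names, and numbers are purely illustrative stand-ins for a real model store and dashboard.

```python
registry = {}  # model_version -> metadata (stand-in for a real model store)

def register_model(version, training_data, offline_rss):
    """Record what data a model was trained on and its offline metric."""
    registry[version] = {
        "training_data": training_data,
        "offline_rss": offline_rss,
        "online_ctr": None,  # filled in once the model serves real traffic
    }

def report():
    """A tiny stand-in for a monitoring dashboard or report."""
    for version, meta in registry.items():
        print(version, meta)

# Hypothetical entries: two model versions trained on growing history.
register_model("model_1", training_data="history_q1.csv", offline_rss=152.3)
register_model("model_2", training_data="history_q1_q2.csv", offline_rss=141.7)
registry["model_1"]["online_ctr"] = 0.10
registry["model_2"]["online_ctr"] = 0.30
report()
```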