[MUSIC] Now, we've talked about that initial deployment: taking a model that we learned for recommender systems and deploying it as a service that your website can query. But there's more to that deployment process, and to machine learning in production. There is the deployment piece, but there's also the management of models, the evaluation, and the monitoring and collection of metrics. So let's talk about those last three pieces. They're really about taking the models that we've learned and seeing how they perform in practice. Not just in the batch offline process, but with real users. And then using that information to train new models, deploy new models, and update the models as we gather more information about the world.

If we go back to our pipeline, which involved the batch process and the real-time process, the feedback piece, where the user maybe bought the product or didn't buy the product they were recommended, gets fed back into both the real-time data and the historical data, and that's going to be very useful for us. We're going to use that feedback to go back and learn new models. For example, now that we have more historical data, I might learn a second model. Let's call it Model 2 for recommendations. I think it's better, and I want to start serving it in production. But is this Model 2 really better than the old Model 1 that I had? Which one is better? How do I figure that out? These are some of the key questions around managing models in production. We'll figure out when it's worth updating to Model 2, and how to choose between models. And this is really about monitoring the models in production with real users, and understanding what those usage patterns look like.

The key piece of monitoring models is evaluation of models in production. This is really about combining the predictions that we're making with the metrics: what are users doing in real time with our system? The questions you need to address with deployed models are: what data are you collecting from users? Not just the data you started with, but the data you're collecting from that real-time interaction, whether the users are buying or not. And what metrics are you going to use to measure whether those interactions are good, whether you're getting the kind of response you're hoping for, whether the machine learning is actually working for you in the system that you've built.

Now, if we go back to our pipeline, you can imagine saying, okay, I'm going to collect the data, and I'm going to measure the metrics that I used to train my model. For example, when we talked about the recommender system, we talked about one such metric: minimizing the sum of squared errors. Is this the right metric to evaluate in production? It's a good metric to optimize a model offline, but in production, you really care about whether people buy a product or not, or whether this machine learning model is getting your users more engaged with your website. Whether that model is helping people use their smartphones better, or their wearable watches, or whatever technology is using machine learning in the background. So sum of squared errors and similar offline training metrics are really about optimizing the model offline, figuring out whether the model is good, and perhaps whether it can be updated.
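To make that contrast concrete, here's a minimal sketch in Python of an offline metric computed from held-out data versus an online metric computed from live interaction logs. The function names, data shapes, and log format are illustrative assumptions, not code from this course.

```python
import numpy as np

def offline_sum_squared_error(y_true, y_pred):
    """Offline metric: residual sum of squares on held-out data."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sum((y_true - y_pred) ** 2))

def online_click_through_rate(interaction_log):
    """Online metric: fraction of recommendations that led to a click or purchase.

    `interaction_log` is assumed to be a list of dicts like
    {"user": ..., "item": ..., "clicked": True/False} collected in production.
    """
    if not interaction_log:
        return 0.0
    clicks = sum(1 for event in interaction_log if event["clicked"])
    return clicks / len(interaction_log)

# Example usage with made-up numbers:
print(offline_sum_squared_error([4, 3, 5], [3.5, 3.2, 4.8]))   # offline quality
print(online_click_through_rate([
    {"user": 1, "item": "giraffe", "clicked": True},
    {"user": 2, "item": "giraffe", "clicked": False},
]))                                                             # online quality
```

The offline number tells you how well the model fits historical data; the online number tells you whether real users are actually responding to what the deployed model recommends.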
Now, the online metrics, let's say who's buying, the usage metrics, how they're changing, the bottom line for my business, those are great for choosing whether the old model is better than a new model I've created. So let's talk a little bit about what that process looks like. The question here is: should I update my old model with a new one that I learned, now that I have new data? And there are many questions around this. Why should I update? Why should I take what I've done before and replace it with something new? This has to do with trends in the world changing: new products come in, users' tastes change. A fad like the chewy giraffe that we've talked about goes out of fashion. Nobody wants it anymore. So we want to change the model, or update it.

That's why we should update it. But when do we update it? When do we say, okay, it's time to take that old model, switch it out, and put in a new one? This is about tracking real-world statistics. It's not about intuition, "this sounds like the right time," or talking to some person who's not looking at data, maybe some kind of intuitive business analysis. This is really about data. It's about tracking the metrics that we measure, those statistics, and coming up with a quantitative measure of quality to say: things have changed, it's time to update the model, and this is what's going to happen when we update the model. And this combines the offline metrics that we used to train the model with the online metrics that we're capturing.

So let's talk about how online metrics get used. One example of how to choose between models using online metrics is the idea of A/B testing (there's a small sketch of this below). Let's say you have two models, Model 1 and Model 2, and I want to figure out which one is better, which one I should give to my system. What I can do is give some of my population, call them group A, let's say some of the people, or people from a certain geographic region, say people from the United States, Model 1. And people from a different geographic region, say people from Canada, get Model 2. Then you look at the behavior under those two models and capture some metrics. Let's say that Model 1 does worse. It only has a 10% click-through rate, or CTR. That means only 10% of the time, people are buying the product. While with Model 2, it's amazing: 30% of the time, people are buying the product, so the CTR is 30%. What you do after you've run this test is say, okay, I've done the test long enough, I've collected enough samples, and now I'm going to start serving Model 2 instead of Model 1.

Now, there are many other issues and caveats around the ideas we've talked about so far. A/B testing, deciding when it's time to switch a model, how much data you have to collect, what to do: it's very tricky. It requires a lot of thought, and we will talk more about it towards the capstone, but it's really something that you need to think about quite deeply. Also, talking about one model versus another, Model 1 and Model 2, is a simplification. Typically you have many data scientists creating their own models with their own ideas, and the question is: how do you keep track of that? How do you know what data was used to train different models? How do you keep track of how they are performing, which ones are performing well and which ones aren't? Is it because of some fluke, or because of some real property of the data?
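Here's the A/B testing sketch mentioned above. It assumes, for illustration, a hash-based split of users into two groups rather than the geographic split in the example; the function and field names are hypothetical, not from the course.

```python
import hashlib

def assign_group(user_id, test_name="model1_vs_model2"):
    """Deterministically split users into group 'A' or 'B' using a stable hash."""
    digest = hashlib.md5(f"{test_name}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

def click_through_rate(events, group):
    """CTR for one group: fraction of shown recommendations that were clicked/bought."""
    shown = [e for e in events if e["group"] == group]
    if not shown:
        return 0.0
    return sum(e["clicked"] for e in shown) / len(shown)

# Hypothetical log of interactions collected while the test was running.
events = [
    {"user": 1, "group": assign_group(1), "clicked": False},
    {"user": 2, "group": assign_group(2), "clicked": True},
    {"user": 3, "group": assign_group(3), "clicked": True},
    {"user": 4, "group": assign_group(4), "clicked": False},
]

ctr_a = click_through_rate(events, "A")   # users served Model 1
ctr_b = click_through_rate(events, "B")   # users served Model 2
print(f"Model 1 CTR: {ctr_a:.0%}, Model 2 CTR: {ctr_b:.0%}")
# Only after collecting enough samples (and checking the difference is not a
# fluke) would you switch all traffic to the better-performing model.
```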
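Keeping track of many models, the data they were trained on, and how they perform is part of the same management problem. Here's an equally rough sketch of recording that metadata; the registry structure, field names, and example values are assumptions for illustration, not a specific tool.

```python
from datetime import datetime, timezone

# A very simple in-memory "model registry": one record per trained model version.
model_registry = []

def register_model(name, version, training_data_snapshot, offline_metric):
    """Record which data a model was trained on and its offline quality."""
    model_registry.append({
        "name": name,
        "version": version,
        "trained_at": datetime.now(timezone.utc).isoformat(),
        "training_data_snapshot": training_data_snapshot,  # e.g. a date range or file path
        "offline_sum_squared_error": offline_metric,
        "online_ctr": None,  # filled in later from production monitoring
    })

def record_online_metric(name, version, ctr):
    """Attach the online metric observed in production to the right model version."""
    for record in model_registry:
        if record["name"] == name and record["version"] == version:
            record["online_ctr"] = ctr

# Hypothetical usage: two versions of the recommender, trained on different snapshots.
register_model("recommender", 1, "purchases_snapshot_v1", offline_metric=152.3)
register_model("recommender", 2, "purchases_snapshot_v2", offline_metric=140.7)
record_online_metric("recommender", 1, ctr=0.10)
record_online_metric("recommender", 2, ctr=0.30)
print(model_registry)
```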
How do you monitor all of this with dashboards? How do you come up with reports that say, okay, this is what's happening, this is what the machine learning is doing, and this is the difference it's making? All of that can be quite complicated. And so, it's very important for you to think about not just how you use machine learning algorithms, how you write your own methods, or how you pick your features, but also how you keep track of your models, and make sure they are working and providing the value that you want for the system that you've built. [MUSIC]