Now, we've talked about that initial deployment: taking a model we learned for recommender systems and deploying it as a service that your website can query. But there's more to that deployment process, and to machine learning in production. There's the deployment piece, but there's also the management of models, the evaluation, and the monitoring and collection of metrics. So let's talk about those last three pieces.

Those pieces are really about taking the models we've learned and seeing how they perform in practice, not just in the batch offline process, but with real users. We then use that information to train new models, deploy new models, and update the models as we gather more information about the world.

If we go back to our pipeline, which involved a batch process and a real-time process, the feedback piece, where the user bought or didn't buy a product they were recommended, gets fed back into both the real-time data and the historical data, and that's going to be very useful for us. We're going to use that feedback to go back and learn new models. For example, now that we have more historical data, I might learn a second model. Let's call it Model 2 for the recommender, and suppose I think it's better. I want to start serving that model in production and figure out: is Model 2 really better than the old Model 1 I had? Which one is better? How do I figure that out? Those are some of the key questions around managing models in production. So we'll figure out when it's worth updating to Model 2, and how to choose between models.

This is really about monitoring models in production with real users and understanding what those usage patterns look like. And the key piece of monitoring models is evaluating them in production. That means combining the predictions we're making with metrics: what are users doing in real time with our system? So the first question you need to address with deployed models is: what data are you collecting from users? Not just the data you started with, but the data you collect from the real-time interaction, such as whether users are buying or not.
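To make that collection step concrete, here's a minimal sketch of how those real-time interactions might be logged. The `log_event` helper and its fields are hypothetical choices, not a prescribed schema; the point is that each event records which model made the recommendation and what the user did, so the feedback can later be folded back into the historical training data.

```python
import json
import time

def log_event(path, user_id, product_id, model_version, purchased):
    """Append one interaction event to a newline-delimited JSON log.

    These events are the raw feedback: which model made the
    recommendation, and whether the user actually bought the product.
    """
    event = {
        "timestamp": time.time(),
        "user_id": user_id,
        "product_id": product_id,
        "model_version": model_version,  # e.g. "model_1" or "model_2"
        "purchased": purchased,          # the feedback signal
    }
    with open(path, "a") as f:
        f.write(json.dumps(event) + "\n")

# Example: a user was shown a recommendation from Model 1 and bought it.
log_event("interactions.jsonl", user_id=42, product_id="B0017",
          model_version="model_1", purchased=True)
```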
The second question is what metrics you're going to use to measure whether those interactions are good: whether you're getting the kind of response you're hoping for, and whether the machine learning is actually working for you in the system you've built.

Now, if we go back to our pipeline, you can imagine saying: okay, I'm going to collect the data and use it to measure the metrics I used to train my model. For example, when we talked about the recommender system, we discussed one such metric: minimizing the sum of squared errors. Is that the right metric to evaluate in production? It's a good metric for optimizing a model offline, but in production you really care about whether people buy the product or not, whether the machine learning model is getting your users more engaged with your website, whether it's helping people use their smartphones better, or their wearable watches, or whatever technology is using machine learning in the background. So sum of squared errors and other offline training metrics are really about optimizing the model offline, figuring out whether a model is good, and perhaps whether it can be updated. The online metrics, say who's buying, the usage metrics, how the bottom line for my business is changing, are what's really useful for choosing whether the old model is better than the new model I've created. So let's talk a little bit about what that process looks like.

The question here is: should I update my old model with a new one learned now that I have new data? And there are many questions around this. Why should I update? Why should I take what I've done before and replace it with something new? This has to do with trends: the world changes, new products come in, users' tastes change. A fad like the chewy giraffe we've talked about goes out of fashion, and nobody wants it anymore. So we want to change the model, to update it. That's why we should update. But when do we update? When do we say, okay, it's time to take that old model, switch it out, and put in a new one? This is about tracking real-world statistics. It's not about intuition that this sounds like the right time, or talking to somebody who isn't looking at data, or some kind of gut-feel business analysis. This is really about data: tracking the metrics we measure, those statistics, and coming up with a quantitative criterion that says things have changed and it's time to update the model. That decision combines the offline metrics we used when we trained the model with the online metrics we're capturing.
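Here's a minimal sketch of what such a quantitative trigger might look like, assuming we track a rolling purchase rate over live traffic. The window size and threshold are purely illustrative; a real system would tune them for its traffic volume and back the decision with proper statistics.

```python
from collections import deque

class MetricMonitor:
    """Track a rolling purchase rate and flag when it degrades."""

    def __init__(self, window=1000, threshold=0.15):
        self.events = deque(maxlen=window)  # 1 = purchased, 0 = not
        self.threshold = threshold          # illustrative cutoff

    def record(self, purchased):
        self.events.append(1 if purchased else 0)

    def rate(self):
        return sum(self.events) / len(self.events) if self.events else 0.0

    def should_retrain(self):
        # Only trust the rate once the window is full.
        full = len(self.events) == self.events.maxlen
        return full and self.rate() < self.threshold

monitor = MetricMonitor()
# ... call monitor.record(purchased) for each live interaction ...
if monitor.should_retrain():
    print("Purchase rate dropped below threshold: time to update the model.")
```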
So let's talk about how online metrics get used. One example of how to choose between models using online metrics is the idea of A/B testing. Say you have two models, Model 1 and Model 2, and I want to figure out which one is better, which one I should serve from my system. What I can do is give some of my population, call them group A, say people from a certain geographic region like the United States, Model 1. And people from a different geographic region, say Canada, get Model 2. Then you look at the behavior under those two models and capture some metrics. Let's say Model 1 does worse: it only has a 10% click-through rate (CTR), which here means that only 10% of the time do people buy the recommended product. Model 2, on the other hand, is amazing: 30% of the time people buy the product, so its CTR is 30%. What you do after running this test long enough, once you've collected enough samples, is say: okay, I'm going to start serving Model 2 instead of Model 1.

Now, there are many other issues and caveats around the ideas we've talked about so far. A/B testing, deciding when it's time to switch models, how much data you have to collect, what to do next: it's all very tricky, and it requires a lot of thought. We'll talk more about it toward the capstone, but it's really something you need to think about quite deeply.
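One standard way to judge whether you've collected enough samples is a two-proportion z-test on the observed rates. This is a sketch, not the course's prescribed procedure; the counts are illustrative (100 of 1,000 conversions for Model 1, 300 of 1,000 for Model 2, matching the 10% and 30% figures above).

```python
from math import sqrt

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z-statistic for comparing two conversion rates (pooled)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p = (success_a + success_b) / (n_a + n_b)     # pooled rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))  # standard error
    return (p_b - p_a) / se

# Illustrative counts: Model 1 converts 100/1000, Model 2 converts 300/1000.
z = two_proportion_z(100, 1000, 300, 1000)
print(f"z = {z:.2f}")  # |z| > 1.96 suggests significance at the 5% level
```

With counts this lopsided the z-statistic is huge, but with closer rates or less traffic you'd need far more samples before switching, which is exactly why the "when to switch" question is tricky.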
Also, talking about just two versions of a model, Model 1 and Model 2, is a simplification. Typically you have many data scientists building their own models with their own ideas, and the question is how you keep track of all of that. How do you know what data was used to train the different models? How do you keep track of how they're performing, which ones are doing well and which ones aren't? Is a difference due to some fluke, or to a real property of the data? How do you build monitoring dashboards? How do you produce reports that say: this is what's happening, this is what the machine learning is doing and what difference it's making? All of that can be quite complicated. So it's very important to think not just about how you use machine learning algorithms, how you write your own methods, and how you pick your features, but also about how you keep track of everything and make sure your models are working and providing the value you want for the system you've built.
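As a final sketch, here's one very simple way that bookkeeping could start: a toy in-memory registry mapping each model version to its training data and metrics. The field names, file names, and numbers are purely illustrative stand-ins for a real model store and dashboard.

```python
registry = {}  # model_version -> metadata (stand-in for a real model store)

def register_model(version, training_data, offline_rss):
    """Record what data a model was trained on and its offline metric."""
    registry[version] = {
        "training_data": training_data,
        "offline_rss": offline_rss,
        "online_ctr": None,  # filled in once the model serves real traffic
    }

def report():
    """A tiny stand-in for a monitoring dashboard or report."""
    for version, meta in registry.items():
        print(version, meta)

# Hypothetical entries: two model versions trained on growing history.
register_model("model_1", training_data="history_q1.csv", offline_rss=152.3)
register_model("model_2", training_data="history_q1_q2.csv", offline_rss=141.7)
registry["model_1"]["online_ctr"] = 0.10
registry["model_2"]["online_ctr"] = 0.30
report()
```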