[MUSIC] And this is an example of an online learning problem. Data is arriving over time. You see an input x_t and you need to make a prediction, y hat t. So the input might be the text of the web page and information about you, and y hat t might be a prediction of which ad you're likely to click on. And then, given what happens in the real world, whether you click an ad, in which case y_t might be ad 2, or you don't click on anything, in which case y_t would be none of the above, no ad was good for you, whatever that is gets fed into a machine learning algorithm that improves its coefficients, so it can improve its performance over time.

The question is, how do we design a machine learning algorithm that behaves like this? What's a good example of a machine learning algorithm that can improve its performance over time in an online fashion like this? It turns out that we've seen one: stochastic gradient. Stochastic gradient is a learning algorithm that can be used for online learning, so let's review it. You give me some initial set of coefficients, say everything is equal to zero. At every time step, you get some input x_t. You make a prediction y hat t based on your current estimate of the coefficients. And then you're given the true label, y_t, and you feed those into the algorithm. Stochastic gradient will take those inputs, use them to compute the gradient, and then just update the coefficients: w_j at time t + 1 is going to be w_j at time t, plus eta times the gradient, which is computed from these observed quantities in the real world.

So, online learning is a different kind of learning that we haven't talked about at all in the specialization, but it's really important in practice: data arrives over time, you need to make a decision right away about what to do with it, and based on that decision you get some feedback, you update the parameters immediately, and you keep going. This online learning approach, where you update the parameters immediately as you see some information in the real world, can be extremely useful. For example, your model is always up to date; it's always based on the latest data, the latest information in the world. It can have lower computational cost, because you can use techniques like stochastic gradient that don't have to look at all the data. And in fact, you don't even have to store all the data if it's too massive. However, most people do store the data because they might want to use it later, so that's a side note, but you don't have to.

However, it has some really difficult practical properties. The system that you have to build, the actual design of how the data interacts with the world, where the data gets stored, where the coefficients get stored, and all of that, is really complex and complicated, and it's hard to maintain. If you have oscillations in your machine learning algorithm, it can do really stupid things, and nobody wants their website to do stupid things. And you don't necessarily trust those noisy stochastic gradient updates; sometimes they can give you bad predictions. So, in practice, most companies don't do something like this. What they do instead is save their data for a little while and update their models with the data from the last hour, or the last day, or the last week. So it's very common, for example, for a large retailer to update its recommender system every night by running a big batch job on that day's data.
And you can think of that as an extreme version of the mini-batches we talked about earlier in this module, but now the batch is all the data from the whole day. In our example, that would be those 5 billion page views. [MUSIC]
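To make the online update described above concrete, here is a minimal sketch in Python, assuming a logistic-regression-style click predictor. The function names, the feature vectors, and the learning rate eta = 0.1 are illustrative assumptions, not from any particular library; the point is just that the coefficients are updated immediately after each input and label arrive.

```python
import numpy as np

def predict_prob(w, x):
    """P(y = +1 | x, w) under a logistic model."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

def online_update(w, x_t, y_t, eta=0.1):
    """One stochastic gradient ascent step on the log likelihood.

    x_t : feature vector observed at time t
    y_t : observed label at time t (1 if the ad was clicked, 0 otherwise)
    eta : learning rate (illustrative value, not prescribed by the lecture)
    """
    # Gradient of the single-point log likelihood:
    # (indicator[y_t = 1] - P(y = 1 | x_t, w)) * x_t
    gradient = (y_t - predict_prob(w, x_t)) * x_t
    # w_j at time t+1 = w_j at time t + eta * gradient_j, for every coefficient
    return w + eta * gradient

# Tiny usage example with made-up data: coefficients start at zero and are
# updated as each (input, label) pair streams in.
w = np.zeros(3)
stream = [(np.array([1.0, 0.5, -0.2]), 1),
          (np.array([1.0, -1.0, 0.3]), 0)]
for x_t, y_t in stream:
    y_hat = predict_prob(w, x_t) >= 0.5   # prediction made before seeing y_t
    w = online_update(w, x_t, y_t)        # coefficients improved right away
```

The nightly-retraining approach mentioned above would instead collect the day's stream and run updates like these (or a full batch gradient step) over all of it at once.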