[MUSIC] And this is an example of an online learning problem. Data is arriving over time. You see an input x_t and you need to make a prediction, y hat t. So the input might be the text of the web page and information about you, and y hat t might be a prediction of which ad you're likely to click on. And then, given what happens in the real world, whether you click an ad, in which case y_t might be ad 2, or you don't click on anything, in which case y_t would be none of the above, no ad was good for you, whatever that is gets fed into a machine learning algorithm that improves its coefficients, so it can improve its performance over time.

The question is, how do we design a machine learning algorithm that behaves like this? What's a good example of a machine learning algorithm that can improve its performance over time in an online fashion like this? It turns out that we've seen one: stochastic gradient. Stochastic gradient is a learning algorithm that can be used for online learning, so let's review it. You give me some initial set of coefficients, say everything is equal to zero. At every time step, you get some input x_t. You make a prediction y hat t based on your current estimate of the coefficients. And then you're given the true label, y_t, and you feed those into the algorithm. Stochastic gradient will take those inputs, use them to compute the gradient, and then just update the coefficients: w_j at time t + 1 is going to be w_j at time t, plus eta times the gradient, which is computed from these observed quantities in the real world.

So, online learning is a different kind of learning that we haven't talked about at all in the specialization, but it's really important in practice: data arrives over time, you need to make a decision right away about what to do with it, and based on that decision you get some feedback, you update the parameters immediately, and you keep going. This online learning approach, where you update the parameters immediately as you see some information in the real world, can be extremely useful. For example, your model is always up to date; it's always based on the latest data, the latest information in the world. It can have lower computational cost, because you can use techniques like stochastic gradient that don't have to look at all the data. And in fact, you don't even have to store all the data if it's too massive. However, most people do store the data because they might want to use it later, so that's a side note, but you don't have to.

However, it has some really difficult practical properties. The system that you have to build, the actual design of how the data interacts with the world, where the data gets stored, where the coefficients get stored, and all of that, is really complex and complicated, and it's hard to maintain. If you have oscillations in your machine learning algorithm, it can do really stupid things, and nobody wants their website to do stupid things. And you don't necessarily trust those noisy stochastic gradient updates; sometimes they can give you bad predictions. So, in practice, most companies don't do something like this. What they do instead is save their data for a little while and update their models with the data from the last hour, or the last day, or the last week. So it's very common, for example, for a large retailer to update its recommender system every night by running a big batch job on that day's data.
And you can think of that as an extreme version of the mini-batches we talked about earlier in this module, but now the batch is all the data from the whole day. In our example, that would be those 5 billion page views. [MUSIC]
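To make the online update described above concrete, here is a minimal sketch in Python, assuming a logistic-regression-style click predictor. The function names, the feature vectors, and the learning rate eta = 0.1 are illustrative assumptions, not from any particular library; the point is just that the coefficients are updated immediately after each input and label arrive.

```python
import numpy as np

def predict_prob(w, x):
    """P(y = +1 | x, w) under a logistic model."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

def online_update(w, x_t, y_t, eta=0.1):
    """One stochastic gradient ascent step on the log likelihood.

    x_t : feature vector observed at time t
    y_t : observed label at time t (1 if the ad was clicked, 0 otherwise)
    eta : learning rate (illustrative value, not prescribed by the lecture)
    """
    # Gradient of the single-point log likelihood:
    # (indicator[y_t = 1] - P(y = 1 | x_t, w)) * x_t
    gradient = (y_t - predict_prob(w, x_t)) * x_t
    # w_j at time t+1 = w_j at time t + eta * gradient_j, for every coefficient
    return w + eta * gradient

# Tiny usage example with made-up data: coefficients start at zero and are
# updated as each (input, label) pair streams in.
w = np.zeros(3)
stream = [(np.array([1.0, 0.5, -0.2]), 1),
          (np.array([1.0, -1.0, 0.3]), 0)]
for x_t, y_t in stream:
    y_hat = predict_prob(w, x_t) >= 0.5   # prediction made before seeing y_t
    w = online_update(w, x_t, y_t)        # coefficients improved right away
```

The nightly-retraining approach mentioned above would instead collect the day's stream and run updates like these (or a full batch gradient step) over all of it at once.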