1 00:00:00,000 --> 00:00:04,848 [MUSIC] 2 00:00:04,848 --> 00:00:07,940 Now we've seen stochastic gradient, which is really exciting. 3 00:00:07,940 --> 00:00:09,180 A simple algorithm, 4 00:00:09,180 --> 00:00:12,990 a simple modification to gradient, which really speeds things up in practice. 5 00:00:12,990 --> 00:00:16,590 It has many practical challenges, and we talked about several of those, and 6 00:00:16,590 --> 00:00:17,860 how to address them. 7 00:00:17,860 --> 00:00:21,500 But now, I would like to step back and think about a broader question, 8 00:00:21,500 --> 00:00:26,140 what's called online learning: how do we learn from streaming data? 9 00:00:26,140 --> 00:00:30,590 And we'll see that stochastic gradient is one way to learn from data that arrives over time, or 10 00:00:30,590 --> 00:00:31,300 streaming data. 11 00:00:32,700 --> 00:00:34,990 Let's define the idea of online learning. 12 00:00:34,990 --> 00:00:37,170 But first, let's look at what we've been doing so far. 13 00:00:37,170 --> 00:00:39,980 What we've been doing so far in this course, and 14 00:00:39,980 --> 00:00:42,420 in the regression course, is what's called batch learning. 15 00:00:42,420 --> 00:00:44,440 I'm given the full data set. 16 00:00:44,440 --> 00:00:47,520 And I'm going to run some machine learning algorithm over this data set, 17 00:00:47,520 --> 00:00:49,740 maybe gradient, and do many passes over the data. 18 00:00:49,740 --> 00:00:55,030 And finally output my best guess, my best estimate, 19 00:00:55,030 --> 00:00:58,540 for the coefficients, and we're going to call that w hat, and we're done. 20 00:00:58,540 --> 00:01:00,280 That's batch learning. 21 00:01:00,280 --> 00:01:01,660 Online learning is something different. 22 00:01:01,660 --> 00:01:04,380 Actually, what you are doing here is online learning. 23 00:01:04,380 --> 00:01:06,470 But that's a different kind of online learning. 24 00:01:06,470 --> 00:01:10,080 What we're talking about here is online machine learning. 
25 00:01:10,080 --> 00:01:15,200 And in online machine learning, data arrives over time, one data point at a time. 26 00:01:15,200 --> 00:01:19,711 So, for example, as we'll see next, serving ads on web pages 27 00:01:19,711 --> 00:01:22,740 is an example where things are arriving one data point at a time. 28 00:01:22,740 --> 00:01:25,130 And so, that's where data is coming in. 29 00:01:25,130 --> 00:01:29,160 And your machine learning algorithm sees a little slice of that data, 30 00:01:29,160 --> 00:01:30,010 one little bit. 31 00:01:30,010 --> 00:01:33,048 Let's say, at timestamp one, it takes it in, and 32 00:01:33,048 --> 00:01:36,320 makes an estimate of the coefficients, say w hat 1. 33 00:01:37,610 --> 00:01:40,310 At timestamp two, it sees another little bit of the data, 34 00:01:40,310 --> 00:01:44,550 and makes another estimate of the coefficients, w hat 2. 35 00:01:44,550 --> 00:01:47,748 At timestamp three, it makes another estimate, w hat 3. 36 00:01:47,748 --> 00:01:51,470 At timestamp four, a little more data, and it makes an estimate, w hat 4. 37 00:01:51,470 --> 00:01:56,338 So at every timestamp it is making a new estimate, so it can make new predictions. 38 00:01:56,338 --> 00:02:01,850 To make these ideas concrete, let's look at a really practical real-world 39 00:02:01,850 --> 00:02:06,690 example of where online learning makes a huge difference, and that's ad targeting. 40 00:02:06,690 --> 00:02:10,710 So let's say you're navigating the web and you hit a particular website, 41 00:02:10,710 --> 00:02:13,740 what's happening behind the scenes when you're shown ads? 
42 00:02:13,740 --> 00:02:18,410 Well, some information about you, like your age, or the websites you've visited, and 43 00:02:18,410 --> 00:02:22,980 some of the information about the website, like the text of the website, are fed 44 00:02:22,980 --> 00:02:27,190 into a machine learning algorithm that's going to use some set of coefficients, 45 00:02:27,190 --> 00:02:32,090 w hat t, to figure out what are the best ads to show you. 46 00:02:32,090 --> 00:02:34,990 And we're going to call that y hat, the suggested ads. 47 00:02:34,990 --> 00:02:38,050 It might show you ad 1, ad 2, ad 3, and so on. 48 00:02:38,050 --> 00:02:39,320 And then you look at the website. 49 00:02:39,320 --> 00:02:41,710 You're like, cool, that's a really interesting ad. 50 00:02:41,710 --> 00:02:43,860 And you go and you click on ad two. 51 00:02:43,860 --> 00:02:46,173 Well, when you click on ad two, 52 00:02:46,173 --> 00:02:51,320 the machine learning algorithm figures out that you clicked on ad two, 53 00:02:51,320 --> 00:02:56,510 and assigns the true label for this website: ad two. 54 00:02:56,510 --> 00:02:58,156 That's what you clicked on. 55 00:02:58,156 --> 00:03:02,786 And then the machine learning algorithm takes that label and 56 00:03:02,786 --> 00:03:07,070 updates its coefficients from w hat t to w hat t plus 1. 57 00:03:07,070 --> 00:03:09,530 And what we've described so 58 00:03:09,530 --> 00:03:14,410 far is really how ad systems work; a lot of them work this way in practice. 59 00:03:14,410 --> 00:03:19,238 So this is really something that makes a big difference in 60 00:03:19,238 --> 00:03:20,190 the real world. 61 00:03:20,190 --> 00:03:24,169 [MUSIC]
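The loop described in the lecture (predict with w hat t, observe the click, update to w hat t plus 1) can be sketched in a few lines of Python. This is only an illustrative sketch under several assumptions that are not in the lecture: a logistic regression click model, a single stochastic gradient step per data point, a fixed step size of 0.1, and a tiny made-up stream of feature vectors. The names `predict_click_probability`, `online_update`, and `stream` are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_click_probability(w, x):
    """Predicted probability of a click under an assumed logistic model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


def online_update(w, x, y, step_size=0.1):
    """One stochastic gradient step on the log likelihood of a single
    data point (x, y), where y is 1 for a click and 0 for no click."""
    error = y - predict_click_probability(w, x)
    return [wj + step_size * error * xj for wj, xj in zip(w, x)]


# Hypothetical stream of (features, clicked) pairs arriving one at a time.
# Features here are [constant, made-up page score, made-up user flag].
stream = [
    ([1.0, 0.2, 1.0], 1),
    ([1.0, 0.9, 0.0], 0),
    ([1.0, 0.3, 1.0], 1),
    ([1.0, 0.8, 0.0], 0),
]

w = [0.0, 0.0, 0.0]  # initial coefficients, before any data is seen
history = []         # the estimates after timestamps 1, 2, 3, ...

for t, (x, y) in enumerate(stream, start=1):
    # Predict with the current coefficients before the label is known.
    y_hat = predict_click_probability(w, x)
    # The label arrives (click or no click); update the coefficients.
    w = online_update(w, x, y)
    history.append(list(w))
    print(f"t={t} predicted={y_hat:.3f} clicked={y} w={[round(v, 3) for v in w]}")
```

Unlike batch learning, the loop never revisits earlier data points: each one is used for a single update and then discarded, so a new estimate is available at every timestamp.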