[MUSIC] In this module we're going to address a really important problem in machine learning today: how to scale the algorithms we've discussed to really large data sets. The ideas we discuss today are broadly applicable. They apply to all of classification, and they also apply to regression, the second course of this specialization. We're going to talk about a technique called stochastic gradient, and we'll relate it to something called online learning, where you learn from data streams in which one data point arrives at a time.

Let's take a moment to review how gradient ascent works in the context of classification, and how it's impacted by the size of the data set. Suppose we have a very large data set and a set of coefficients w(t) that we're hoping to update. With gradient ascent, we compute the gradient on this data set, which requires us to make a pass, or scan, over the data, computing the contribution of each data point to the gradient. Once we have the gradient, we update the coefficients and get w(t+1). Then we have to go back to the data set and make another full pass, visiting every single data point, to compute a new gradient and update the coefficients again, getting w(t+2). So in this process, every single coefficient update requires a full scan, or full pass, over the entire data set, which can be really slow if the data set is really big.

And these days data sets are getting huge. You can think about the 4.8 billion webpages out there on the web. You can think about the fact that Twitter, for example, is generating 500 million tweets a day. That's a lot. By the way, follow me on Twitter. [LAUGH] Or you can think about how the world is getting embedded with sensors, which is something we today call the Internet of Things, where you have devices throughout our homes and devices we carry with us, like the smart watch here, all connected to each other and generating tons and tons of data. And you can think about specific websites like YouTube, where 300 hours of video are uploaded every minute, far more than anyone could ever watch. This is a really fundamental problem for machine learning: how to tackle these huge, massive data sets.

Let's use YouTube as an example. We have tons of videos being uploaded, and a billion users visiting the website. YouTube makes money from ad revenue, by showing the right ad to each one of its users. Now, the number you should really be thinking about is not the 300 hours of video a minute or the 1 billion users, but the roughly 4 billion video views they serve every day. For each one of these views they have to serve ads: they have to figure out which ads to pair with those videos, and they have to go back and retrain their learning algorithm. In other words, they need a machine learning algorithm that can deal with these billions of events per day, and that is fast enough to predict which ad to show within milliseconds, as you're about to watch a video. [MUSIC]
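To make the contrast concrete, here is a minimal sketch in NumPy of the two update schemes mentioned above: full-batch gradient ascent for logistic regression, where every coefficient update requires a complete pass over the data, and a stochastic variant, where each data point triggers its own cheap update. The function names, step size, and synthetic data are illustrative assumptions, not the course's official implementation.

# Minimal sketch (assumed names and settings): full-batch vs. stochastic
# gradient ascent for logistic regression.
import numpy as np

def predict_probability(features, coefficients):
    # P(y = +1 | x, w) under the logistic model.
    scores = features.dot(coefficients)
    return 1.0 / (1.0 + np.exp(-scores))

def batch_gradient_ascent(features, labels, step_size, num_passes):
    # Every coefficient update requires a full pass over ALL data points.
    coefficients = np.zeros(features.shape[1])
    indicator = (labels == +1)
    for _ in range(num_passes):
        # Full scan: the gradient sums contributions from every data point.
        errors = indicator - predict_probability(features, coefficients)
        gradient = features.T.dot(errors)
        coefficients += step_size * gradient      # one update per full pass
    return coefficients

def stochastic_gradient_ascent(features, labels, step_size, num_passes):
    # One coefficient update per data point: many cheap, noisy updates.
    coefficients = np.zeros(features.shape[1])
    indicator = (labels == +1)
    n = features.shape[0]
    for _ in range(num_passes):
        for i in np.random.permutation(n):        # stream data one point at a time
            error_i = indicator[i] - predict_probability(features[i], coefficients)
            gradient_i = features[i] * error_i    # contribution of a single point
            coefficients += step_size * gradient_i  # update immediately
    return coefficients

# Tiny usage example on synthetic data (assumed, for illustration only).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 2))])  # intercept + 2 features
    true_w = np.array([0.5, 2.0, -1.0])
    y = np.where(rng.random(200) < 1.0 / (1.0 + np.exp(-X.dot(true_w))), +1, -1)

    w_batch = batch_gradient_ascent(X, y, step_size=1e-2, num_passes=200)
    w_sgd = stochastic_gradient_ascent(X, y, step_size=1e-2, num_passes=10)
    print("batch:", w_batch)
    print("sgd:  ", w_sgd)

Under these assumed settings, the stochastic version typically reaches comparable coefficients after far fewer passes over the data, which is exactly why this style of update scales to the data set sizes discussed in this module.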