In this video, I'd like to talk
about a new large-scale
machine learning setting called
the online learning setting.
The online learning setting
allows us to model problems
where we have a continuous flood
or a continuous stream of data
coming in and we would like
an algorithm to learn from that.
Today, many of the largest
websites, or many of the largest
website companies use different
versions of online learning
algorithms to learn from
the flood of users that keep
on coming to, back to the website.
Specifically, if you have
a continuous stream of data
generated by a continuous
stream of users coming to
your website, what you can
do is sometimes use an
online learning algorithm to learn
user preferences from the
stream of data and use that
to optimize some of the
decisions on your website.
Suppose you run a shipping service,
so, you know, users come and ask
you to help ship their package from
location A to location B and suppose
you run a website, where users
repeatedly come and they
tell you where they want
to send the package from, and
where they want to send it to
(so the origin and destination) and
your website offers to ship the package
for some asking price,
so I'll ship your package for $50,
I'll ship it for $20.
And based on the price
that you offer to the users,
the users sometimes chose to use a shipping service;
that's a positive example and
sometimes they go away and
they do not choose to
purchase your shipping service.
So let's say that we want
a learning algorithm to help us
to optimize what is the asking
price that we want to offer to our users.
And specifically, let's say we
come up with some sort of features
that capture properties of the users.
If we know anything about the demographics,
they capture, you know, the origin and
destination of the package, where they want to ship the package.
And what is the price
that we offer to them for shipping the package.
and what we want to do
is learn what is the
probability that they will
elect to ship the
package, using our
shipping service given these features, and
again just as a reminder these
features X also captures the price that we're asking for.
And so if we could
estimate the chance that they'll
agree to use our service
for any given price, then we
can try to pick
a price so that they
have a pretty high probability of
choosing our website while simultaneously
hopefully offering us a
fair return, offering us
a fair profit for shipping their package.
So if we can learn this property
of y equals 1 given
any price and given the other
features we could really
use this to choose appropriate
prices as new users come to us.
So in order to model
the probability of y equals 1,
what we can do is use
logistic regression or neural
network or some other algorithm like that.
But let's start with logistic regression.
Now if you have a
website that just runs continuously,
here's what an online learning algorithm would do.
I'm gonna write repeat forever.
This just means that our website
is going to, you know, keep on
staying up.
What happens on the website is
occasionally a user
will come and for
the user that comes we'll get
some x,y pair corresponding to
a customer or to a user on the website.
So the features x are, you
know, the origin and destination specified
by this user and the price
that we happened to offer to
them this time around, and
y is either one or
zero depending one whether or
not they chose to
use our shipping service.
Now once we get this {x,y}
pair, what an online
learning algorithm does is then
update the parameters theta
using just this example
x,y, and in particular
we would update my parameters theta
as Theta j get updated as Theta j
minus the learning rate alpha times
my usual gradient descent
rule for logistic regression.
So we do this for j
equals zero up to n,
and that's my close curly brace.
So, for other learning algorithms
instead of writing X-Y, right, I
was writing things like Xi,
Yi but
in this online learning setting
where actually discarding the notion
of there being a fixed training
set instead we have an algorithm.
Now what happens as we get
an example and then we
learn using that example like
so and then we throw that example away.
We discard that example and we
never use it again and
so that's why we just look at one example at a time.
We learn from that example.
We discard it.
Which is why, you know, we're
also doing away with this
notion of there being this
sort of fixed training set indexed by i.
And, if you really run
a major website where you
really have a continuous stream
of users coming, then this
sort of online learning algorithm
is actually a pretty reasonable algorithm.
Because of data is essentially
free if you have so
much data, that data
is essentially unlimited then there
is really may be no
need to look at a
training example more than once.
Of course if we had only
a small number of users then
rather than using an online learning
algorithm like this, you might
be better off saving away all
your data in a fixed training
set and then running some algorithm over that training set.
But if you really have a continuous
stream of data, then an
online learning algorithm can be very effective.
I should mention also that one
interesting effect of this sort
of online learning algorithm is
that it can adapt to changing user preferences.
And in particular, if over
time because of changes in
the economy maybe users
start to become more price
sensitive and willing to pay,
you know, less willing to pay high prices.
Or if they become less price sensitive and they're willing to pay higher prices.
Or if different things
become more important to users,
if you start to have new
types of users coming to your website.
This sort of online learning algorithm
can also adapt to changing
user preferences and kind
of keep track of what your
changing population of users
may be willing to pay for.
And it does that because if
your pool of users changes,
then these updates to your
parameters theta will just slowly adapt
your parameters to whatever your
latest pool of users looks like.
Here's another example of a
sort of application to which you might apply online learning.
this is an application in product
search in which we want to
apply learning algorithm to learn
to give good search listings to a user.
Let's say you run an online
store that sells phones - that
sells mobile phones or sells cell phones.
And you have a user interface
where a user can come to
your website and type in the
query like "Android phone 1080p camera".
So 1080p is a type
of a specification for a
video camera that you might
have on a phone, a cell phone, a mobile phone.
Suppose, suppose we have a hundred phones in our store.
And because of the way our
website is laid out, when
a user types in a query,
if it was a search query, we
would like to find a
choice of ten different phones to
show what to offer to the user.
What we'd like to do is have
a learning algorithm help us figure
out what are the ten phones
out of the 100 we
should return the user in response to
a user-search query like the one here.
Here's how we can go about the problem.
For each phone and given
a specific user query; we
can construct a feature vector
X. So the feature
vector X might capture different properties of the phone.
It might capture things like,
how similar the user search query is in the phones.
We capture things like how many
words in the user search
query match the name of
the phone, how many words
in the user search query match the description of the phone and so on.
So the features x capture
properties of the phone and
it captures things about how
similar or how well
the phone matches the user query along different dimensions.
What we like to do is
estimate the probability that a
user will click on the
link for a specific phone,
because we want to show
the user phones that they
are likely to want to
buy, want to show the user
phones that they have high
probability of clicking on in the web browser.
So I'm going to define y equals
one if the user clicks on
the link for a phone and
y equals zero otherwise and
what I would like to do is
learn the probability the user
will click on a specific
phone given, you know,
the features x, which capture properties
of the phone and how well the query matches the phone.
To give this problem a name
in the language of
people that run websites like
this, the problem of learning this is
actually called the problem of
learning the predicted click-through rate, the predicted CTR.
It just means learning the probability
that the user will click on
the specific link that you
offer them, so CTR is
an abbreviation for click through rate.
And if you can estimate the
predicted click-through rate for any
particular phone, what we
can do is use this to
show the user the ten phones
that are most likely to click on,
because out of the hundred phones,
we can compute this for
each of the 100 phones and
just select the 10 phones
that the user is most likely to click on,
and this will be a pretty reasonable
way to decide what ten results to show to the user.
Just to be clear, suppose that
every time a user does
a search, we return ten results
what that will do is it
will actually give us ten
x,y pairs, this actually
gives us ten training examples every
time a user comes to
our website because, because for
the ten phone that we chose
to show the user, for each
of those 10 phones we get
a feature vector X, and
for each of those 10 phones we
show the user we will also
get a value for y, we
will also observe the value
of y, depending on whether
or not we clicked on that
url or not and
so, one way to run a
website like this would be to
continuously show the user,
you know, your ten best guesses for
what other phones they might like
and so, each time a user
comes you would get ten
examples, ten x,y pairs,
and then use an online
learning algorithm to update the
parameters using essentially 10
steps of gradient descent on these
10 examples, and then
you can throw the data away, and
if you really have a continuous
stream of users coming to
your website, this would be
a pretty reasonable way to learn
parameters for your algorithm
so as to show the ten phones
to your users that may
be most promising and the most likely to click on.
So, this is a product search
problem or learning to rank
phones, learning to search for phones example.
So, I'll quickly mention a few others.
One is, if you have
a website and you're trying to
decide, you know, what special
offer to show the user,
this is very similar to phones,
or if you have a
website and you show different users different news articles.
So, if you're a news aggregator
website, then you can
again use a similar system to
select, to show to
the user, you know, what
are the news articles that they
are most likely to be interested
in and what are the news articles that they are most likely to click on.
Closely related to special offers, will we profit from recommendations.
And in fact, if you have
a collaborative filtering system, you
can even imagine a collaborative filtering
system giving you additional
features to feed into a
logistic regression classifier to try
to predict the click through
rate for different products that you might recommend to a user.
Of course, I should say that
any of these problems could also
have been formulated as a
standard machine learning problem, where you have a fixed training set.
Maybe, you can run your
website for a few days and
then save away a training set,
a fixed training set, and run
a learning algorithm on that.
But these are the actual
sorts of problems, where you do
see large companies get so
much data, that there's really
maybe no need to save away
a fixed training set, but instead
you can use an online learning algorithm to just learn continuously.
from the data that users are generating on your website.
So, that was the online
learning setting and as we
saw, the algorithm that we apply to
it is really very similar
to this schotastic gradient descent
algorithm, only instead of
scanning through a fixed
training set, we're instead getting
one example from a user,
learning from that example, then
discarding it and moving on.
And if you have a continuous
stream of data for some application,
this sort of algorithm may be
well worth considering for your application.
And of course, one advantage of
online learning is also that
if you have a changing pool
of users, or if the
things you're trying to predict are
slowly changing like your user
taste is slowly changing, the online
learning algorithm can slowly
adapt your learned hypothesis to
whatever the latest sets of
user behaviors are like as well.