So what we're trying to discern is intent: what meaning is the writer, the speaker, or the actor trying to convey? We'll return to the business of figuring out what meaning is being conveyed next week, when we look at extracting this information from documents. For the moment, let's turn to other predictors of intent, which may have nothing to do with the content of the document being read, but rather with the actions taken to retrieve that document by searching for it.

For example, suppose you are searching using some combination of the keywords flower, red, gift, and cheap, and the web property, Google for example, needs to decide whether or not to show you some ads. In other words, are you a surfer or a shopper? Are you interested in buying something, or are you just browsing? Let's see what I get when I search for cheap flowers. Clearly, it thinks I'm trying to find some flowers to buy. On the other hand, if I just search for flowers and red, it shows me a whole bunch of red flowers and no ads. Somehow, Google has figured out that if I'm searching for red flowers, I'm most likely trying to find some information about flowers rather than to buy some. How might it have figured this out? One way is by looking at past history: what do people searching for red flowers normally do? Do they buy stuff or not? And people searching for cheap flowers, or for gifts using flowers, do they buy or don't they?

Learning from past experience is something we do all our lives. The field of machine learning is all about teaching computers how to learn from their past experience, or from past data, and as you all know, this is the heart of web intelligence and is what big data analytics is really all about. We'll introduce machine learning very simply, in a very basic manner, by looking at the simple example of how one might guess whether or not to show an ad, based on the past behavior of many, many searchers using just these four keywords.

So let's suppose we have all the historical data. For example, somebody who used the keywords red, flower, and gift, but not cheap, did not buy anything. Somebody else used the terms red and cheap and did buy something. And at the same time, there may be people using exactly the same combination, such as red, flower, gift, where in one case they bought something and in another they didn't. From all this data, you want to learn whether or not you should show an ad, given that somebody is searching with some combination of these four keywords.

In the language of probability theory, what we really want to figure out is the probability of a buy action, given values, yes or no, for whether each of the words red, flower, gift, and cheap is present or absent in the query. In other words, we're trying to find the conditional probability of a buy given the values of these four random variables. Let's see what we really need to do. For each combination of keywords being present or absent, for example all yes, or three yes and one no, or two no and two yes, we want the probability that there is a buy, and also, for that same combination, the probability that there is not a buy. Let's see how we might do this using the data that we have from past history.
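To make this concrete, here is a minimal sketch, in Python, of how one might estimate such a conditional probability by simple counting over a historical query log. The field names and the toy rows are hypothetical, invented for illustration; this is not the course's actual data or code.

```python
# Hypothetical sketch: estimate P(buy = yes | a given keyword pattern)
# by counting over a historical query log. Toy data, made-up field names.

# Each historical query records which of the four keywords were present
# and whether the searcher went on to buy something.
history = [
    {"red": True,  "flower": True,  "gift": True,  "cheap": False, "buy": False},
    {"red": True,  "flower": False, "gift": False, "cheap": True,  "buy": True},
    {"red": True,  "flower": True,  "gift": True,  "cheap": False, "buy": True},
    # ... in practice, millions of rows
]

def p_buy_given(history, **keywords):
    """Estimate P(buy = yes | the given keyword pattern) by counting."""
    matching = [q for q in history
                if all(q[k] == v for k, v in keywords.items())]
    if not matching:
        return None  # no history at all for this combination
    buys = sum(q["buy"] for q in matching)
    return buys / len(matching)

# Probability of a buy when the query was "red flower gift" but not "cheap"
print(p_buy_given(history, red=True, flower=True, gift=True, cheap=False))
```

The point of the sketch is just that each entry of the conditional probability table is a ratio of two counts taken over the matching slice of the history.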
Notice that this is a summary table with one entry for each combination, whereas our historical data probably has millions of entries per combination. So each probability is computed by adding up, appropriately, the data from our historical transactions. Let's look at this pictorially. Suppose we have N instances, that is, N historical queries. In R cases the query had the keyword red, in F cases the keyword flower, and similarly G for gift and C for cheap. In K cases there was actually a buy action, and in N - K cases there was not.

Now let's find the conditional probability of a buy action given that the query had all the keywords present. Such a query lies in the piece of the diagram where all the ovals overlap. Say I is the number of those all-yes cases in which there was a buy, and J is the number of transactions with exactly the same combination, R, F, G, and C all yes, in which there was no buy. The denominator is the set of all transactions that had a yes for R, F, G, and C, which is I + J. So I / (I + J) is the conditional probability of a buy given that R, F, G, and C are all yes, and J / (I + J) is the conditional probability that buy = no given that same all-yes combination.

So it appears that all we need to do is figure out these values for every possible combination, and we should be able to decide whether or not to show an ad for a particular query. The trouble is that the number of such combinations can be quite large. How many do you think there are for just four keywords? Obviously, just sixteen. But suppose we had 1,000 keywords; that suddenly becomes a very, very large number. Even for a few hundred keywords it becomes extremely difficult to compute all these conditional probabilities. More importantly, when the number of keywords gets really large, there will be many combinations for which you never actually have any history. For example, you're quite unlikely to find a query that has red, flower, and, say, map-reduce and web intelligence all in one query; you'll just not have such history. Nobody would be searching for that combination. So even if you had infinite computing power, you simply wouldn't have enough history to compute all the entries in this conditional probability table.
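Here is a small, hypothetical sketch of why the full table becomes infeasible: the number of rows grows as 2^n with the number of keywords, and with any realistic log, most of those rows have no supporting history at all. The observed patterns below are made up purely for illustration.

```python
# Hypothetical sketch: the conditional probability table has 2**n rows,
# and almost all of them may have zero supporting transactions.
from itertools import product

def table_size(n_keywords):
    """Number of keyword combinations (rows) in the full table."""
    return 2 ** n_keywords

print(table_size(4))      # 16 combinations: easy
print(table_size(1000))   # roughly 1e301 combinations: hopeless

# Even with just four keywords, suppose the log only ever contained
# these two (made-up) keyword patterns, ordered (red, flower, gift, cheap):
observed = {(True, True, True, False), (True, False, False, True)}
n = 4
unseen = [combo for combo in product([True, False], repeat=n)
          if combo not in observed]
print(len(unseen), "of", 2 ** n, "combinations have no history at all")
```

With 4 keywords the gaps are already visible; with hundreds or thousands of keywords, nearly every row of the table would have an empty count, no matter how much computing power you had.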