So what we're trying to discern is intent: what meaning is the writer, the speaker, or the actor trying to convey? We'll return to the business of figuring out what meaning is being conveyed next week, when we look at extracting this information from documents. For the moment, let's turn to other predictors of intent, which may have nothing to do with the content of the document being read, but rather with the actions taken to retrieve that document by searching for it.

For example, suppose you are searching using some combination of the keywords flower, red, gift, and cheap, and the web property, Google for example, needs to decide whether or not to show you some ads. In other words, are you a surfer or a shopper? Are you interested in buying something, or are you just browsing? Let's see what I get when I search for cheap flowers. Clearly, it thinks I'm trying to find some flowers to buy. On the other hand, if I just search for flowers and red, it shows me a whole bunch of red flowers and no ads. Somehow, Google has figured out that if I'm searching for red flowers, I'm most likely trying to find some information about flowers rather than to buy some. How might it have figured this out? One way is by looking at past history: what do people searching for red flowers normally do? Do they buy stuff or not? And people searching for cheap flowers, or for gifts using flowers, do they buy or don't they?

Learning from past experience is something we do all our lives. The field of machine learning is all about teaching computers how to learn from their past experience, or from past data, and as you all know, this is the heart of web intelligence and is what big data analytics is really all about. We'll introduce machine learning very simply, in a very basic manner, by looking at the simple example of how one might guess whether or not to show an ad, based on the past behavior of many, many searchers using just these four keywords.

So let's suppose we have all the historical data. For example, somebody who used the keywords red, flower, and gift, but not cheap, did not buy anything. Somebody else used the terms red and cheap and did buy something. And at the same time, there may be people using exactly the same combination, such as red, flower, gift, where in one case they bought something and in another they didn't. From all this data, you want to learn whether or not you should show an ad, given that somebody is searching with some combination of these four keywords.

In the language of probability theory, what we really want to figure out is the probability of a buy action, given values, yes or no, for whether each of the words red, flower, gift, and cheap is present or absent in the query. In other words, we're trying to find the conditional probability of a buy given the values of these four random variables. Let's see what we really need to do. For each combination of keywords being present or absent, for example all yes, or three yes and one no, or two no and two yes, we want the probability that there is a buy, and also, for that same combination, the probability that there is not a buy. Let's see how we might do this using the data that we have from past history.
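To make this concrete, here is a minimal sketch, in Python, of how one might estimate such a conditional probability by simple counting over a historical query log. The field names and the toy rows are hypothetical, invented for illustration; this is not the course's actual data or code.

```python
# Hypothetical sketch: estimate P(buy = yes | a given keyword pattern)
# by counting over a historical query log. Toy data, made-up field names.

# Each historical query records which of the four keywords were present
# and whether the searcher went on to buy something.
history = [
    {"red": True,  "flower": True,  "gift": True,  "cheap": False, "buy": False},
    {"red": True,  "flower": False, "gift": False, "cheap": True,  "buy": True},
    {"red": True,  "flower": True,  "gift": True,  "cheap": False, "buy": True},
    # ... in practice, millions of rows
]

def p_buy_given(history, **keywords):
    """Estimate P(buy = yes | the given keyword pattern) by counting."""
    matching = [q for q in history
                if all(q[k] == v for k, v in keywords.items())]
    if not matching:
        return None  # no history at all for this combination
    buys = sum(q["buy"] for q in matching)
    return buys / len(matching)

# Probability of a buy when the query was "red flower gift" but not "cheap"
print(p_buy_given(history, red=True, flower=True, gift=True, cheap=False))
```

The point of the sketch is just that each entry of the conditional probability table is a ratio of two counts taken over the matching slice of the history.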
Notice that this is a summary table with one entry for each combination, whereas our historical data probably has millions of entries per combination. So each probability is computed by adding up, appropriately, the data from our historical transactions. Let's look at this pictorially. Suppose we have N instances, that is, N historical queries. In R cases the query had the keyword red, in F cases the keyword flower, and similarly G for gift and C for cheap. In K cases there was actually a buy action, and in N - K cases there was not.

Now let's find the conditional probability of a buy action given that the query had all the keywords present. Such a query lies in the piece of the diagram where all the ovals overlap. Say I is the number of those all-yes cases in which there was a buy, and J is the number of transactions with exactly the same combination, R, F, G, and C all yes, in which there was no buy. The denominator is the set of all transactions that had a yes for R, F, G, and C, which is I + J. So I / (I + J) is the conditional probability of a buy given that R, F, G, and C are all yes, and J / (I + J) is the conditional probability that buy = no given that same all-yes combination.

So it appears that all we need to do is figure out these values for every possible combination, and we should be able to decide whether or not to show an ad for a particular query. The trouble is that the number of such combinations can be quite large. How many do you think there are for just four keywords? Obviously, just sixteen. But suppose we had 1,000 keywords; that suddenly becomes a very, very large number. Even for a few hundred keywords it becomes extremely difficult to compute all these conditional probabilities. More importantly, when the number of keywords gets really large, there will be many combinations for which you never actually have any history. For example, you're quite unlikely to find a query that has red, flower, and, say, map-reduce and web intelligence all in one query; you'll just not have such history. Nobody would be searching for that combination. So even if you had infinite computing power, you simply wouldn't have enough history to compute all the entries in this conditional probability table.
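Here is a small, hypothetical sketch of why the full table becomes infeasible: the number of rows grows as 2^n with the number of keywords, and with any realistic log, most of those rows have no supporting history at all. The observed patterns below are made up purely for illustration.

```python
# Hypothetical sketch: the conditional probability table has 2**n rows,
# and almost all of them may have zero supporting transactions.
from itertools import product

def table_size(n_keywords):
    """Number of keyword combinations (rows) in the full table."""
    return 2 ** n_keywords

print(table_size(4))      # 16 combinations: easy
print(table_size(1000))   # roughly 1e301 combinations: hopeless

# Even with just four keywords, suppose the log only ever contained
# these two (made-up) keyword patterns, ordered (red, flower, gift, cheap):
observed = {(True, True, True, False), (True, False, False, True)}
n = 4
unseen = [combo for combo in product([True, False], repeat=n)
          if combo not in observed]
print(len(unseen), "of", 2 ** n, "combinations have no history at all")
```

With 4 keywords the gaps are already visible; with hundreds or thousands of keywords, nearly every row of the table would have an empty count, no matter how much computing power you had.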