Now let's ask how fast the procedure that we just described for index creation actually is. Lets assume there are n documents, m words and on the average, w words per document. For example, in the case of the web, n will be a really large number, maybe in the millions, billions or trillions. M will be another large number, perhaps not as large as n. But quite large nevertheless. And W may or may not be very large because every document may have a few thousand or a few hundred thousand words but certainly not billions of words. Let's see now what the answer is. We have an index structure here, we have our documents and we assume we have N documents M total possible words and on the average twenty words per document. First thing we notice is that every word in each document needs to be read at-least once, so the complexity is at-least order nw. Additionally as each word is read, we need to look up this sort of structure of words to figure out whether or not it has been already inserted before. The cost of looking this kind of a structure up is, log in, as we've seen earlier. For example, if a balanced binary tree is used to store these words. Further, we must insert the URL of the document against the word if we happen to find it already in the index. Else, we must insert an entry for that word if we don't find it. And then insert the document against it. Now we claim that each of these is just a constant cost per word. Obviously the second part which is inserting a blank entry for a word like worm with... With just one fresh document is clearly a constant cost. Once we know where to insert the word, worm. But suppose the word already exist and there are thousands of document already against it, then it is not that clear that it is constant cost to insert. The fresh ID in a long list of IDs. So there's an important assumption. That this is going be a constant cost. And the reason for that assumption we'll come to in a later lecture. For the moment we'll go ahead with that assumption. If either of you missed it, in the homework, and got confused that's okay because you were thinking pretty well. But for the moment we're gonna make this assumption. And therefore the complexity of our procedure is order N W log M. Using a balance bynatry destroy the words and assuming that entering a fresh document in an existing list against some word is a constant cost.