An equally interesting question might be
how we figure out what is the first word
that comes to mind.
For example, what is the first word
starting with an a?
Many of you might say apple.
Are some words more important than others?
Is it just the common words which are more
important than others?
In this context, if we are talking about
this course and this lecture, what's the
first word that comes to mind starting
with G?
I bet many of you think Google.
Does this have anything to do with how
Google figures out which topics to
include in the top ten documents it
displays matching the query
Clinton plays in their carts?
The importance of search results in Google,
as many of you may have read, is determined
by an algorithm called PageRank, which we
will describe in a minute.
What we also want to ask is: is there
anything deeper?
We will come to that just after.
So let's look at PageRank.
The web consists of documents which are
linked to each other through hyperlinks.
And this was the initial structure of the
web, in a way, when it first became very
popular in the late 90s.
And it remains so today, but we will come
to that shortly, as to how it might be
changing.
So what Brin and Page, the founders of
Google, imagined while they were at
Stanford was: suppose there was a random
surfer who hops from page to page,
hyperlink to hyperlink, at random.
So at a page the surfer chooses at random
any of the links that go out from that
page.
So the surfer is going from page to page,
and the question that Sergey Brin and
Larry Page asked was: what is the relative
probability of visiting a particular page?
So of all the pages on the web, which
pages are more likely to be visited by
such a random surfer than others?
And that probability, relative to all the
other pages on the web, is the PageRank of
that page.
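
To make the random surfer concrete, here is a
minimal sketch in Python. It is only an
illustration, not how Google implements it.
The three-page toy_web, the function name
simulate_surfer, and the step count are all
made up for this example, and the sketch
assumes every page has at least one outgoing
link.

```python
import random


def simulate_surfer(links, steps=100_000, seed=0):
    """Let a random surfer hop along hyperlinks and record visit frequencies.

    links maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link.
    """
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    page = next(iter(links))           # start on an arbitrary page
    visits = {p: 0 for p in links}
    for _ in range(steps):
        visits[page] += 1
        # At each page, choose one of its outgoing links at random.
        page = rng.choice(links[page])
    # Fraction of time spent on each page.
    return {p: count / steps for p, count in visits.items()}


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
frequencies = simulate_surfer(toy_web)
for page, freq in frequencies.items():
    print(page, round(freq, 3))
```

For this toy web the long-run frequencies
should settle near 0.4 for A, 0.2 for B and
0.4 for C, and those relative visit
probabilities are what the lecture calls the
PageRank of each page.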
Now, it might appear that the number of
hyperlinks going into a page is sufficient
to compute its PageRank.
Obviously, the more links point to a page,
the more likely it is that this random
surfer will reach there.
The question is, is this enough?
It turns out that this is not enough.
And the answer is no because, even if a
page doesn't have many incoming links, a
surfer can revisit a page because of
cycles in the graph. The surfer will go
and come back to the page through a
variety of different routes, maybe
traversing the same link again and again.
Because there are so many cycles which
return to the same page, a particular page
can become important even if it doesn't
have a lot of incoming hyperlinks.
The point is that PageRank is a global
property of this web graph and cannot be
computed simply by looking at the number
of links into each page.
This is the second major computation that
a search engine like Google has to do,
computing the PageRank of each page, the
first of course being indexing the web as
it grows.
The PageRank of each page is computed
iteratively, continuously, and in parallel
on, as we shall see, thousands and
thousands of servers.
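
As a rough illustration of both points, the
iterative computation and the fact that
counting incoming links is not enough, here is
a small single-machine sketch in Python. The
five-page toy_web and the names in_degrees and
iterate_ranks are invented for this example.
It assumes every page has at least one
outgoing link, and it leaves out the
refinements a real search engine would need.

```python
def in_degrees(links):
    """Count the hyperlinks pointing into each page."""
    counts = {p: 0 for p in links}
    for outgoing in links.values():
        for target in outgoing:
            counts[target] += 1
    return counts


def iterate_ranks(links, steps=200):
    """Repeatedly pass each page's probability along its outgoing links."""
    ranks = {p: 1.0 / len(links) for p in links}   # start uniform
    for _ in range(steps):
        updated = {p: 0.0 for p in links}
        for page, outgoing in links.items():
            # The surfer picks each outgoing link with equal probability.
            share = ranks[page] / len(outgoing)
            for target in outgoing:
                updated[target] += share
        ranks = updated
    return ranks


# Hypothetical web: "target" has one incoming link, "many" has three.
toy_web = {
    "hub": ["target"],
    "target": ["hub", "a", "b", "many"],
    "a": ["many"],
    "b": ["many"],
    "many": ["hub"],
}
degrees = in_degrees(toy_web)
ranks = iterate_ranks(toy_web)
for page in sorted(toy_web, key=ranks.get, reverse=True):
    print(page, "in-links:", degrees[page], "rank:", round(ranks[page], 3))
```

In this toy web the page called target has
only one incoming link, yet it ends up with a
higher rank than the page called many, which
has three. The surfer keeps cycling back
through hub, and hub's only link leads to
target.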
For those of you who are slightly more
mathematically minded, PageRank is related
to the eigenvector of a particular
adjacency matrix.
But we're not going to go into that math
in this course.