1
00:00:00,000 --> 00:00:06,115
An equally interesting question might be
how we figure out what is the first word

2
00:00:06,115 --> 00:00:10,142
that comes to mind.
For example, what is the first word

3
00:00:10,142 --> 00:00:14,560
starting with an a?
Many of you might say apple.

4
00:00:15,860 --> 00:00:21,976
Are some words more important than others.
Is it just the common words which are more

5
00:00:21,976 --> 00:00:26,908
important than others.
In this context if you are talking about

6
00:00:26,908 --> 00:00:32,631
this course in this lecture, what's the
first word that comes to mind starting

7
00:00:32,631 --> 00:00:35,420
with G.
I bet many of you think Google.

8
00:00:38,660 --> 00:00:43,180
Does this, have anything to do with how
Google figures out?

9
00:00:43,680 --> 00:00:49,988
Which topics to include, in the top ten
documents in displays matching the query

10
00:00:49,988 --> 00:00:56,218
Clinton plays in their carts.
Importance of search results in google, as

11
00:00:56,218 --> 00:01:02,764
many of you may have read, is because of
an algorithm called pagerank which we will

12
00:01:02,764 --> 00:01:07,811
describe in a minute.
What we also want to ask is there anything

13
00:01:07,811 --> 00:01:11,360
deeper?
Which we will come to just, after that.

14
00:01:11,860 --> 00:01:18,403
So let's look at page [inaudible].
The web consists of documents which are

15
00:01:18,403 --> 00:01:24,761
linked to each other, through hyper-links.
And this was the initial structure of the

16
00:01:24,761 --> 00:01:29,586
web, in a way when it first became very
popular in the late 90s.

17
00:01:29,586 --> 00:01:35,561
And it remains so today, but we will come
to, come to that shortly, as to how it

18
00:01:35,561 --> 00:01:42,175
might be changing.
So what Brandon Page at Google, the

19
00:01:42,175 --> 00:01:50,310
founders of Google imagined while they
must have stand for was suppose there was

20
00:01:50,310 --> 00:01:58,224
a random surfer, who hops from page to
page, hyperlink to hyperlink At random.

21
00:01:58,224 --> 00:02:04,903
So at a page the surfer chooses at random
any of the links that go out from that

22
00:02:04,903 --> 00:02:08,696
page.
So the surfer is going from page to page,

23
00:02:08,696 --> 00:02:15,376
and the question that Sergie brings and
every page asks was; what is the relative

24
00:02:15,376 --> 00:02:21,890
probability of visiting a particular page?
So of all the pages on the web, which

25
00:02:21,890 --> 00:02:27,580
pages are more likely to visited by such a
random surfer than others?

26
00:02:28,100 --> 00:02:34,540
And that probability across all the pages
on the web is the page rank of that page.

27
00:02:35,020 --> 00:02:40,553
No.
It might appear that the number of

28
00:02:40,553 --> 00:02:47,400
hyperlinks going into a page.
Is sufficient to compute its page rank.

29
00:02:48,000 --> 00:02:57,178
Obviously if more links point to a page.
More likely it is that this random circle

30
00:02:57,178 --> 00:03:01,200
will reach there.
Question is, is this enough?

31
00:03:02,080 --> 00:03:10,400
It turns out that this is not enough.
And the answer is null because.

32
00:03:11,080 --> 00:03:16,560
Even if a page doesn't have many incoming
links.

33
00:03:16,840 --> 00:03:25,405
A surfer can revisit a page because of
cycles in the graph, so that The surfer

34
00:03:25,405 --> 00:03:31,306
will go and come back to the page through
variety of different roots maybe

35
00:03:31,306 --> 00:03:37,915
traversing the same link again and again
but because there are so many cycles which

36
00:03:37,915 --> 00:03:40,448
return back.
To the same page.

37
00:03:40,448 --> 00:03:46,586
A particular page can become important
even if it doesn't have a lot of incoming

38
00:03:46,586 --> 00:03:54,309
hyperlinks.
The point is that page rank is a global

39
00:03:54,309 --> 00:04:02,323
property of this web graph and cannot be
computed simply by looking at the number

40
00:04:02,323 --> 00:04:07,706
links of each page.
This is the second major computation that

41
00:04:07,706 --> 00:04:13,313
a search engine like Google has to do,
computing the page rank of each page, the

42
00:04:13,313 --> 00:04:16,600
first of course being indexing the web as
it grows.

43
00:04:17,620 --> 00:04:24,856
This page rank of each page is computed
iteratively continuously and in parallel

44
00:04:24,856 --> 00:04:29,680
on, as we shall see, thousands and
thousands of servers.

45
00:04:30,600 --> 00:04:35,561
For those of you who are slightly more
mathematically minded.

46
00:04:35,561 --> 00:04:41,255
The page rank is related to the
eigenvector of a particular adjacency

47
00:04:41,255 --> 00:04:45,159
matrix.
But we're not going to go into that math

48
00:04:45,159 --> 00:04:46,380
in this course.