1
00:00:00,000 --> 00:00:06,640
This week we begin to talk about Connect,
which is how we connect the dots and make

2
00:00:06,640 --> 00:00:11,800
sense of the world.
How to go beyond learning to reasoning,

3
00:00:11,800 --> 00:00:18,489
and why reasoning is needed, beyond the simple
learning that we covered last week.

4
00:00:18,489 --> 00:00:23,608
This leads us into logic,
as well as its limits, both fundamental as

5
00:00:23,608 --> 00:00:29,305
well as those arising from the uncertain
nature of the facts and rules that we

6
00:00:29,305 --> 00:00:35,074
learn about the world. So we will talk
about reasoning under uncertainty in some

7
00:00:35,074 --> 00:00:38,938
detail this week.
And then, come full circle, back to

8
00:00:38,938 --> 00:00:44,992
learning, where some of the techniques
that we'll study, back to Bayes' rule and

9
00:00:44,992 --> 00:00:50,581
things like that again, will help us to
learn better, this time from text.

10
00:00:50,581 --> 00:00:54,849
So here we go.
To motivate why we might need to connect

11
00:00:54,849 --> 00:01:01,059
the dots and go beyond mere learning and
search, consider the following question.

12
00:01:01,059 --> 00:01:06,779
Who is the leader of the USA?
And consider asking this question of

13
00:01:06,779 --> 00:01:11,000
a search engine or any web intelligence
system.

14
00:01:12,940 --> 00:01:18,926
The system might be aware of some facts,
such as X is the prime minister of some

15
00:01:18,926 --> 00:01:22,668
country, C.
X is the president of another country, c.

16
00:01:22,668 --> 00:01:26,560
And many such facts for different values
of X and C.

17
00:01:27,680 --> 00:01:32,135
But, there is no such fact that X is the
leader of the USA.

18
00:01:32,135 --> 00:01:37,743
For example, we might have learned many
such facts by looking at text and

19
00:01:37,743 --> 00:01:43,889
extracting them from textual documents,
something that we'll come to towards the

20
00:01:43,889 --> 00:01:48,421
end of this week.
But, for the moment assume that we do have

21
00:01:48,421 --> 00:01:54,260
many such facts, but there is no such
fact for X being the leader of the USA.

22
00:01:55,620 --> 00:02:02,817
Somehow we haven't learned this because we
only learn the facts about specific posts

23
00:02:02,817 --> 00:02:09,761
like president or prime minister. So now
what? Well, if X is the president of C, then

24
00:02:09,761 --> 00:02:17,109
X is the leader of C.
The system might know such facts or rules

25
00:02:17,109 --> 00:02:25,104
which constitute its knowledge.
As a result, by combining this with facts such as

26
00:02:25,104 --> 00:02:31,717
Obama is the President of the U.S.A., the
system might be able to conclude that

27
00:02:31,717 --> 00:02:38,110
Obama is the leader of the U.S.A.
This is an example of reasoning.
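The inference just described can be sketched in a few lines of Python. This is only a minimal illustration, assuming facts are stored as (relation, person, country) triples; the function names apply_president_rule and leaders_of are hypothetical and not part of any system mentioned in the lecture.

```python
# Facts as (relation, person, country) triples. The representation is an assumption.
facts = {("president", "Obama", "USA")}

def apply_president_rule(facts):
    """Rule from the lecture: if X is the president of C, then X is the leader of C."""
    derived = {("leader", x, c) for (rel, x, c) in facts if rel == "president"}
    return facts | derived  # original facts plus the newly derived ones

def leaders_of(facts, country):
    """Look up every X for which the fact (leader, X, country) is known."""
    return sorted(x for (rel, x, c) in facts if rel == "leader" and c == country)

known = apply_president_rule(facts)
print(leaders_of(known, "USA"))  # ['Obama']
```

Before the rule is applied, leaders_of(facts, "USA") returns an empty list, which is exactly the gap that reasoning fills.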

28
00:02:38,110 --> 00:02:45,068
Taking facts and knowledge, which is rules,
and combining facts and knowledge to come

29
00:02:45,068 --> 00:02:49,640
up with new facts.
But reasoning can be pretty tricky.

30
00:02:50,720 --> 00:02:54,560
Manmohan Singh, for example, is the prime
minister of India.

31
00:02:55,940 --> 00:02:58,973
But Pranab Mukherjee is the President of
India.

32
00:02:58,973 --> 00:03:02,200
India has a prime minister as well as a
president.

33
00:03:03,180 --> 00:03:09,036
So who's the leader of India?
You need more facts, and rules, to figure

34
00:03:09,036 --> 00:03:13,640
this out.
Much more knowledge is therefore needed;

35
00:03:13,640 --> 00:03:20,444
for example one might need to know that in
India the president is a ceremonial post

36
00:03:20,444 --> 00:03:26,763
whereas the prime minister is the leader.
In other countries like France it is the

37
00:03:26,763 --> 00:03:32,190
president who is the leader.
So knowledge is not necessarily static and

38
00:03:32,190 --> 00:03:38,589
can lead to confusions if one doesn't
understand the semantics of knowledge, so

39
00:03:38,589 --> 00:03:42,640
reasoning is not as simple as it appears
at first.
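One way to picture the extra knowledge needed here is a per-country rule saying which post actually leads. The sketch below is an assumption-laden illustration, not a method from the lecture: the LEADING_POST table and the leader_of function are hypothetical names, and the facts are only the ones mentioned above.

```python
# Hypothetical extra knowledge: which post is the real leader in each country.
LEADING_POST = {"India": "prime minister", "France": "president", "USA": "president"}

# Facts mentioned in the lecture, as (post, person, country) triples.
facts = {
    ("prime minister", "Manmohan Singh", "India"),
    ("president", "Pranab Mukherjee", "India"),
    ("president", "Obama", "USA"),
}

def leader_of(country, facts, leading_post=LEADING_POST):
    """Return the holder of the leading post, or None if knowledge is missing."""
    post = leading_post.get(country)
    if post is None:
        return None  # no rule for this country
    for rel, person, c in facts:
        if rel == post and c == country:
            return person
    return None  # rule exists, but no matching fact

print(leader_of("India", facts))
```

With the rule for India in place, the query returns the prime minister rather than the president. For France the rule exists but no fact does, so the sketch returns None instead of guessing.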

40
00:03:43,480 --> 00:03:49,627
Let's take a look at a few more examples,
to really understand how deep the problems

41
00:03:49,627 --> 00:03:56,331
with reasoning can actually become.
We've seen this example a few weeks ago.

42
00:03:56,331 --> 00:04:01,540
Book me an American flight to New York, as
soon as possible.

43
00:04:01,940 --> 00:04:08,412
Does the questioner or requester want a
flight on American Airlines or on any

44
00:04:08,412 --> 00:04:13,142
American carrier?
It might depend on where that person is.

45
00:04:13,142 --> 00:04:20,113
If he's in London, any American carrier. But
if he's in New York, or rather not in New

46
00:04:20,113 --> 00:04:26,420
York but in Boston, he might definitely
mean an American Airlines flight.

47
00:04:28,280 --> 00:04:34,098
This New Yorker, who fought at the Battle
of Gettysburg, was once considered the

48
00:04:34,098 --> 00:04:38,296
inventor of baseball.
This is a question posed to the IBM

49
00:04:38,296 --> 00:04:42,200
program Watson during the Jeopardy
challenge of 2009.

50
00:04:43,060 --> 00:04:47,122
There are two possible answers if you look
at the web.

51
00:04:47,122 --> 00:04:51,110
Alexander Cartwright, who wrote the rules
of baseball.

52
00:04:51,110 --> 00:04:55,999
Or Abner Doubleday.
It turns out that it's Abner Doubleday,

53
00:04:55,999 --> 00:05:01,645
because this person actually fought at
Gettysburg, and Watson got it right.

54
00:05:01,645 --> 00:05:07,368
So Watson had to reason over many different
facts, including the fact that Abner

55
00:05:07,368 --> 00:05:13,319
Doubleday also contributed to the rules of
baseball, and in addition, fought at

56
00:05:13,319 --> 00:05:16,752
Gettysburg.
So these two things had to be put

57
00:05:16,752 --> 00:05:20,415
together.
Watson had to connect the dots, put two

58
00:05:20,415 --> 00:05:25,680
and two together to reach this conclusion
and get this question right.

59
00:05:26,040 --> 00:05:31,162
Think of a more difficult question, like:
who is the Dhoni of USA?

60
00:05:31,162 --> 00:05:38,322
For those of you who are not from India,
Dhoni is the cricket captain of India, so

61
00:05:38,322 --> 00:05:45,414
this question is really asking a very deep
question, in terms of, who is the

62
00:05:45,414 --> 00:05:52,980
equivalent of the cricket captain in the USA.
Cricket is not really played in the US.

63
00:05:53,280 --> 00:05:58,360
So what's the equivalent of cricket
anyway? Baseball, probably.

64
00:06:00,100 --> 00:06:04,120
So, this is an example of analogical
reasoning.

65
00:06:04,120 --> 00:06:09,888
So, X is to the U.S.A. what cricket is to
India, would give us baseball.

66
00:06:09,888 --> 00:06:15,127
But the trouble is,
there is no US baseball team.

67
00:06:15,127 --> 00:06:20,707
So, given that the first step of
reasoning doesn't seem to work, one

68
00:06:20,707 --> 00:06:24,844
needs to go beyond
deductive reasoning to what is called

69
00:06:24,844 --> 00:06:29,051
abductive reasoning.
In the sense that one needs to find out

70
00:06:29,051 --> 00:06:33,679
the best possible answer.
Who is the most popular sportsman in the

71
00:06:33,679 --> 00:06:36,975
USA?
And there may be many popular sportsmen in

72
00:06:36,975 --> 00:06:42,514
the USA, so one is trying to find the best
possible answer from a probabilistic

73
00:06:42,514 --> 00:06:46,300
perspective.
This is an example of abductive reasoning,

74
00:06:46,300 --> 00:06:51,700
as opposed to deductive reasoning, and
we'll come across this later this week.

75
00:06:53,120 --> 00:06:57,510
Further, this is an example of
reasoning under uncertainty.

76
00:06:57,510 --> 00:07:02,207
"Most popular" is not given in any one web
page or any one statement.

77
00:07:02,207 --> 00:07:08,165
One needs to come to a conclusion based on
a probabilistic assessment of who appears

78
00:07:08,165 --> 00:07:14,247
to be most popular, using some measures.
So this is an example of uncertain reasoning

79
00:07:14,247 --> 00:07:18,072
as well.
The idea of adding reasoning to the web,

80
00:07:18,072 --> 00:07:24,369
or web intelligence systems, is credited
to Tim Berners-Lee who, if you remember,

81
00:07:24,369 --> 00:07:30,824
is credited as being the
inventor of the web in the first place way

82
00:07:30,824 --> 00:07:36,005
back in the early '90s.
In 2000, Tim Berners-Lee came out with his

83
00:07:36,005 --> 00:07:39,990
vision for a semantic web, where instead
of having

84
00:07:39,990 --> 00:07:46,074
simple pages of text which could only be
understood by human readers, one would

85
00:07:46,074 --> 00:07:49,164
have
linked data on the web.

86
00:07:49,164 --> 00:07:53,473
So it's not just text, but data which are
facts.

87
00:07:53,473 --> 00:07:57,600
Like, Obama is the President of the
U.S.A., or

88
00:07:58,200 --> 00:08:03,472
President of the U.S.A.
implies that someone is also the leader of

89
00:08:03,472 --> 00:08:06,109
the U.S.A.,
and things like that.

90
00:08:06,109 --> 00:08:12,997
So you'd have data which is linked to
other data through inference rules as well

91
00:08:12,997 --> 00:08:20,141
as engines or systems that could perform
reasoning and therefore answer complicated

92
00:08:20,141 --> 00:08:26,860
queries like, who is the Dhoni of U.S.A.
or who is the leader of the U.S.A.?

93
00:08:27,480 --> 00:08:35,148
We'll come back to the vision that Tim
Berners-Lee espoused in 2000 in a little

94
00:08:35,148 --> 00:08:39,583
while.
For the moment, let's take a closer look at

95
00:08:39,583 --> 00:08:44,479
the concept of reasoning with a basic
study of logic

96
00:08:44,479 --> 00:08:51,736
and how reasoning can be modeled formally.
From there we'll go and study reasoning in

97
00:08:51,736 --> 00:08:55,505
more detail.
And finally, towards the end of this

98
00:08:55,505 --> 00:09:01,787
week's lecture, we'll get back to how
facts and rules required for reasoning can

99
00:09:01,787 --> 00:09:07,520
be extracted from large volumes of text,
such as are available on the web.