1 00:00:00,000 --> 00:00:06,640 This week we begin to talk about Connect, which is how we connect the dots and make 2 00:00:06,640 --> 00:00:11,800 sense of the world. How to go beyond learning to reasoning, 3 00:00:11,800 --> 00:00:18,489 and why reasoning is needed, beyond simple learning as we covered last week. 4 00:00:18,489 --> 00:00:23,608 This leads us into logic, as well as its limits, both fundamental as 5 00:00:23,608 --> 00:00:29,305 well as those arising from the uncertain nature of the facts and rules that we 6 00:00:29,305 --> 00:00:35,074 learn about the world. So we will talk about reasoning under uncertainty in some 7 00:00:35,074 --> 00:00:38,938 detail this week. And then, come full circle, back to 8 00:00:38,938 --> 00:00:44,992 learning, where some of the techniques that we'll study, back to Bayes' rule and 9 00:00:44,992 --> 00:00:50,581 things like that again, will help us to learn better, this time from text. 10 00:00:50,581 --> 00:00:54,849 So here we go. To motivate why we might need to connect 11 00:00:54,849 --> 00:01:01,059 the dots and go beyond mere learning and search, consider the following question. 12 00:01:01,059 --> 00:01:06,779 Who is the leader of the USA? And consider asking this question of 13 00:01:06,779 --> 00:01:11,000 a search engine or any web intelligence system. 14 00:01:12,940 --> 00:01:18,926 The system might be aware of some facts, such as X is the prime minister of some 15 00:01:18,926 --> 00:01:22,668 country, C. X is the president of another country, C. 16 00:01:22,668 --> 00:01:26,560 And many such facts for different values of X and C. 17 00:01:27,680 --> 00:01:32,135 But there is no such fact that X is the leader of the USA. 18 00:01:32,135 --> 00:01:37,743 For example, we might have learned many such facts by looking at text and 19 00:01:37,743 --> 00:01:43,889 extracting them from textual documents, something that we'll come to towards the 20 00:01:43,889 --> 00:01:48,421 end of this week.
But for the moment, assume that we do have 21 00:01:48,421 --> 00:01:54,260 many such facts, but there is no such fact for X being the leader of the USA. 22 00:01:55,620 --> 00:02:02,817 Somehow we haven't learned this, because we only learn the facts about specific posts 23 00:02:02,817 --> 00:02:09,761 like president or prime minister. So now what? Well, if X is the president of C, then 24 00:02:09,761 --> 00:02:17,109 X is the leader of C. The system might know such facts or rules, 25 00:02:17,109 --> 00:02:25,104 which constitute its knowledge. As a result, by combining such rules with facts, such as 26 00:02:25,104 --> 00:02:31,717 Obama is the President of the U.S.A., the system might be able to conclude that 27 00:02:31,717 --> 00:02:38,110 Obama is the leader of the U.S.A. This is an example of reasoning: 28 00:02:38,110 --> 00:02:45,068 taking facts and knowledge, which is rules, and combining facts and knowledge to come 29 00:02:45,068 --> 00:02:49,640 up with new facts. But reasoning can be tricky. 30 00:02:50,720 --> 00:02:54,560 Manmohan Singh, for example, is the prime minister of India. 31 00:02:55,940 --> 00:02:58,973 But Pranab Mukherjee is the President of India. 32 00:02:58,973 --> 00:03:02,200 India has a prime minister as well as a president. 33 00:03:03,180 --> 00:03:09,036 So who's the leader of India? You need more facts, and rules, to figure 34 00:03:09,036 --> 00:03:13,640 this out. Much more knowledge is therefore needed. 35 00:03:13,640 --> 00:03:20,444 For example, one might need to know that in India the president is a ceremonial post, 36 00:03:20,444 --> 00:03:26,763 whereas the prime minister is the leader. In other countries, like France, it is the 37 00:03:26,763 --> 00:03:32,190 president who is the leader. So knowledge is not necessarily static and 38 00:03:32,190 --> 00:03:38,589 can lead to confusion if one doesn't understand the semantics of knowledge. So 39 00:03:38,589 --> 00:03:42,640 reasoning is not as simple as it appears at first.
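The president-and-leader example can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used by any particular web intelligence system: the triple layout, the rule format, and the names FACTS, NAIVE_RULES, REFINED_RULES, forward_chain and leaders are all assumptions made for this example. It applies rules to facts until no new facts appear, and shows how the naive rules give India two leaders while country-specific rules give one.

```python
# Facts are (relation, person, country) triples from the lecture's examples.
FACTS = {
    ("president", "Obama", "USA"),
    ("prime_minister", "Manmohan Singh", "India"),
    ("president", "Pranab Mukherjee", "India"),
}

# A rule is (premise relation, concluded relation, countries it applies to).
# None means the rule applies to every country (hypothetical rule format).
NAIVE_RULES = [
    ("president", "leader", None),
    ("prime_minister", "leader", None),
]

# Refined rules encode the extra knowledge mentioned in the lecture:
# the president leads in the USA and France, the prime minister in India.
REFINED_RULES = [
    ("president", "leader", {"USA", "France"}),
    ("prime_minister", "leader", {"India"}),
]


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion, countries in rules:
            for relation, person, country in list(derived):
                applies = countries is None or country in countries
                if relation == premise and applies:
                    new_fact = (conclusion, person, country)
                    if new_fact not in derived:
                        derived.add(new_fact)
                        changed = True
    return derived


def leaders(facts, country):
    """Return the sorted names of everyone derived to be leader of country."""
    return sorted(p for r, p, c in facts if r == "leader" and c == country)


if __name__ == "__main__":
    naive = forward_chain(FACTS, NAIVE_RULES)
    refined = forward_chain(FACTS, REFINED_RULES)
    print("USA (naive):", leaders(naive, "USA"))
    print("India (naive):", leaders(naive, "India"))
    print("India (refined):", leaders(refined, "India"))
```

With the naive rules the system derives both Manmohan Singh and Pranab Mukherjee as leaders of India, which is the confusion described above. The refined rules resolve it only because the extra knowledge about each country has been written in by hand.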
40 00:03:43,480 --> 00:03:49,627 Let's take a look at a few more examples, to really understand how deep the problems 41 00:03:49,627 --> 00:03:56,331 with reasoning can actually become. We saw this example a few weeks ago. 42 00:03:56,331 --> 00:04:01,540 Book me an American flight to New York, as soon as possible. 43 00:04:01,940 --> 00:04:08,412 Does the questioner or requester want a flight on American Airlines or on any 44 00:04:08,412 --> 00:04:13,142 American carrier? It might depend on where that person is. 45 00:04:13,142 --> 00:04:20,113 If he's in London, any American carrier, but if he's in New York, or rather not in New 46 00:04:20,113 --> 00:04:26,420 York but in Boston, he might definitely mean the American Airlines flight. 47 00:04:28,280 --> 00:04:34,098 This New Yorker, who fought at the Battle of Gettysburg, was once considered the 48 00:04:34,098 --> 00:04:38,296 inventor of baseball. This is a question posed to the IBM 49 00:04:38,296 --> 00:04:42,200 program Watson during the Jeopardy challenge of 2009. 50 00:04:43,060 --> 00:04:47,122 There are two possible answers if you look at the web: 51 00:04:47,122 --> 00:04:51,110 Alexander Cartwright, who wrote the rules of baseball, 52 00:04:51,110 --> 00:04:55,999 or Abner Doubleday. It turns out that it's Abner Doubleday, 53 00:04:55,999 --> 00:05:01,645 because this person actually fought at Gettysburg, and Watson got it right. 54 00:05:01,645 --> 00:05:07,368 So Watson had to reason about many different facts, including the fact that Abner 55 00:05:07,368 --> 00:05:13,319 Doubleday also contributed to the rules of baseball and, in addition, fought at 56 00:05:13,319 --> 00:05:16,752 Gettysburg. So these two things had to be put 57 00:05:16,752 --> 00:05:20,415 together. Watson had to connect the dots, put two 58 00:05:20,415 --> 00:05:25,680 and two together, to reach this conclusion and get this question right.
59 00:05:26,040 --> 00:05:31,162 Or think of a more difficult question, like: who is the Dhoni of USA? 60 00:05:31,162 --> 00:05:38,322 For those of you who are not from India, Dhoni is the cricket captain of India, so 61 00:05:38,322 --> 00:05:45,414 this question is really asking a very deep question, in terms of, who is the 62 00:05:45,414 --> 00:05:52,980 equivalent of the cricket captain of USA? Cricket is not really played in the US. 63 00:05:53,280 --> 00:05:58,360 So what's the equivalent of cricket there? Baseball, probably. 64 00:06:00,100 --> 00:06:04,120 So this is an example of analogical reasoning. 65 00:06:04,120 --> 00:06:09,888 So, X is to the U.S.A. what cricket is to India, would give us baseball. 66 00:06:09,888 --> 00:06:15,127 But the trouble is, there is no US baseball team. 67 00:06:15,127 --> 00:06:20,707 So, given that the first step of reasoning doesn't seem to work, one 68 00:06:20,707 --> 00:06:24,844 needs to go beyond deductive reasoning to what is called 69 00:06:24,844 --> 00:06:29,051 abductive reasoning, in the sense that one needs to find out 70 00:06:29,051 --> 00:06:33,679 the best possible answer. Who is the most popular sportsman in the 71 00:06:33,679 --> 00:06:36,975 USA? And there may be many popular sportsmen in 72 00:06:36,975 --> 00:06:42,514 the USA, so one is trying to find the best possible answer from a probabilistic 73 00:06:42,514 --> 00:06:46,300 perspective. This is an example of abductive reasoning, 74 00:06:46,300 --> 00:06:51,700 as opposed to deductive reasoning, and we'll come across this later this week. 75 00:06:53,120 --> 00:06:57,510 Further, this is an example of reasoning under uncertainty. 76 00:06:57,510 --> 00:07:02,207 Most popular is not given in any one web page or any one statement. 77 00:07:02,207 --> 00:07:08,165 One needs to come to a conclusion based on a probabilistic assessment of who appears 78 00:07:08,165 --> 00:07:14,247 to be most popular, using some measures.
So this is an example of uncertain reasoning 79 00:07:14,247 --> 00:07:18,072 as well. The idea of adding reasoning to the web, 80 00:07:18,072 --> 00:07:24,369 or web intelligence systems, is credited to Tim Berners-Lee who, if you remember, 81 00:07:24,369 --> 00:07:30,824 is credited as being the inventor of the web in the first place, way 82 00:07:30,824 --> 00:07:36,005 back in the early '90s. In 2000, Tim Berners-Lee came out with his 83 00:07:36,005 --> 00:07:39,990 vision for a semantic web, where instead of having 84 00:07:39,990 --> 00:07:46,074 simple pages of text which could only be understood by human readers, one would 85 00:07:46,074 --> 00:07:49,164 have linked data on the web. 86 00:07:49,164 --> 00:07:53,473 So it's not just text, but data which are facts, 87 00:07:53,473 --> 00:07:57,600 like, Obama is the President of the U.S.A., or, 88 00:07:58,200 --> 00:08:03,472 President of the U.S.A. implies that someone is also the leader of 89 00:08:03,472 --> 00:08:06,109 the U.S.A., and things like that. 90 00:08:06,109 --> 00:08:12,997 So you'd have data which is linked to other data through inference rules, as well 91 00:08:12,997 --> 00:08:20,141 as engines or systems that could perform reasoning and therefore answer complicated 92 00:08:20,141 --> 00:08:26,860 queries like, who is the Dhoni of U.S.A., or who is the leader of the U.S.A.? 93 00:08:27,480 --> 00:08:35,148 We'll come back to the vision that Tim Berners-Lee espoused in 2000 in a little 94 00:08:35,148 --> 00:08:39,583 while. For the moment, let's take a closer look at 95 00:08:39,583 --> 00:08:44,479 the concept of reasoning with a basic study of logic, 96 00:08:44,479 --> 00:08:51,736 and how reasoning can be modeled formally. From there we'll go and study reasoning in 97 00:08:51,736 --> 00:08:55,505 more detail.
And finally, towards the end of this 98 00:08:55,505 --> 00:09:01,787 week's lecture, we'll get back to how facts and rules required for reasoning can 99 00:09:01,787 --> 00:09:07,520 be extracted from large volumes of text, such as are available on the web.