Still, I think there is something missing. Even if we have a neural architecture which appears to predict time series very well, and even if we have a variety of techniques for prediction and classification on the one hand, and reasoning and rules on the other, the link between these is still missing.

For example, suppose you are trying to predict how other players, or pedestrians on the road, will move, or to predict the consequences of a decision, because when one takes a decision, one imagines the future: if I did x, then y would happen; if I did z, then a would happen. We are continuously imagining the future by playing things out in our heads.

The missing element is that symbolic reasoning, optimization, and planning appear very different from the regression, neural learning, sequence prediction, or naive Bayes classification techniques, which are essentially the data-driven predictions we have seen. Reasoning requires one to learn rules and classes, and then to reason in a symbolic way about them. The link between how data-driven, bottom-up techniques eventually give rise to higher-level symbolic reasoning in an architecture like the brain is the missing link. We still don't know how that happens. Hierarchical temporal memory promises that we will learn about this, but that has not been demonstrated yet.

So, in the absence of that link, there are other ways to put these different techniques together in practical systems. The most popular one is called the blackboard architecture. It is a very old technique, going back to the 1950s, and it is now increasingly used in complex AI systems which need to employ many different techniques: some bottom-up and data-driven, some top-down and oriented toward symbolic reasoning. This is how the blackboard architecture works.

The blackboard architecture consists of a blackboard where knowledge, that is, what one learns about the world, is posted. This knowledge is posted by knowledge sources, which can be of many types. They could be bottom-up feature learners, clustering algorithms, sequence miners like HTM, or classifiers, that is, things which learn from the data directly; or they could be symbolic rule engines or decision engines which do planning or reasoning. They all operate on a common blackboard. The lower-level, data-driven knowledge sources might learn something about the world, such as which features to look for, what the classes are, or what the rules are. Higher-level rule engines might then operate on these rules to perform reasoning, do planning, and take decisions. A controller looks at the blackboard and tries to figure out, based on what is available there, which knowledge sources are most applicable to what is currently posted.
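To make this concrete, here is a minimal Python sketch of a blackboard system. It is an illustration under assumed names (the blackboard dict, KnowledgeSource, controller, is_applicable, execute), not any particular library's API; real blackboard systems differ mainly in how the controller scores and schedules knowledge sources.

```python
# Minimal blackboard sketch; all names here are illustrative assumptions.

class KnowledgeSource:
    """A knowledge source inspects the blackboard and may post new knowledge."""
    def is_applicable(self, bb):
        raise NotImplementedError
    def execute(self, bb):
        raise NotImplementedError

class PhonemeDetector(KnowledgeSource):
    # Bottom-up, data-driven source: posts low-level hypotheses.
    def is_applicable(self, bb):
        return "audio" in bb and "phonemes" not in bb
    def execute(self, bb):
        bb["phonemes"] = ["h", "e", "l", "o"]  # stand-in for a real classifier

class WordRecognizer(KnowledgeSource):
    # Higher-level, symbolic source: reasons over what lower levels posted.
    def is_applicable(self, bb):
        return "phonemes" in bb and "words" not in bb
    def execute(self, bb):
        bb["words"] = ["hello"]  # stand-in for a lexicon or rule engine

def controller(bb, sources, max_steps=10):
    """Repeatedly pick an applicable knowledge source and run it."""
    for _ in range(max_steps):
        applicable = [s for s in sources if s.is_applicable(bb)]
        if not applicable:
            break
        applicable[0].execute(bb)
    return bb

print(controller({"audio": "..."}, [PhonemeDetector(), WordRecognizer()]))
# {'audio': '...', 'phonemes': ['h', 'e', 'l', 'o'], 'words': ['hello']}
```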
So, this is a way of putting different types of machine learning techniques and reasoning techniques together in one architecture. It is a hierarchical system, and some blackboard systems are also Bayesian, in the sense that if two elements are already on the blackboard, placing a third element might make the probability of one of the older elements, which was already deemed to be true, decrease through something like the explaining-away effect. Those are called Bayesian blackboards.

One of the earliest examples of blackboards is speech recognition: the first speech recognition systems used blackboard reasoning. The lower levels of the blackboard would detect things like phonemes, higher levels would detect words, and even higher levels would deal with sentences. At each level, one is not only going bottom-up, but also using the predictions at the higher layers to drive the reasoning, or the classification, at the lower layers. The likelihood of the next word being a particular one is driven by what the previous word is, as we saw a few lectures back, and that in turn drives which phonemes to look for. So, lower-level classifiers are adjusted based on which words are most likely in the current higher-level context. That is how speech recognition systems have used this hierarchical reasoning fairly effectively.
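Here is a small sketch of that top-down adjustment with invented numbers: a phoneme classifier's bottom-up likelihoods are reweighted by a word-level prior coming from the language context, which is just Bayes' rule applied across two blackboard levels.

```python
# Top-down reweighting sketch: word-level context adjusts phoneme-level scores.
# All probabilities below are invented for illustration.

# Bottom-up: acoustic likelihoods P(audio | phoneme) from a low-level classifier.
acoustic = {"b": 0.40, "p": 0.35, "d": 0.25}

# Top-down: prior P(phoneme | preceding words) from a higher-level language model.
context_prior = {"b": 0.10, "p": 0.30, "d": 0.60}

# Bayes' rule up to normalization: posterior ∝ likelihood × prior.
unnormalized = {ph: acoustic[ph] * context_prior[ph] for ph in acoustic}
total = sum(unnormalized.values())
posterior = {ph: round(v / total, 3) for ph, v in unnormalized.items()}

print(posterior)  # {'b': 0.136, 'p': 0.356, 'd': 0.508}
# The bottom-up winner "b" loses to "d" once top-down context is folded in.
```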
There are other systems which deal with analogical reasoning, which are essentially ways of trying to mimic analogy, like asking: who is the Dhoni of the USA? How do you map frames of reference to different contexts through analogies?

I'd like to show you an example of an analogical reasoning system, or at least one that tries to mimic analogical reasoning. This one is due to a student of Hofstadter called Melanie Mitchell. Hofstadter, if you remember, was the author of the Pulitzer Prize-winning book Gödel, Escher, Bach, which many of you might have read; it's an old book, now more than 30 years old. Melanie Mitchell, his student, has recently written a book called Complexity, which is also a very interesting exposition of a variety of areas in artificial intelligence and complex systems.

Let's look at how analogical reasoning works in Melanie Mitchell's Copycat program. The analogy one is trying to mimic is this: given a transformation which takes a, b, c to a, b, d, it's like a puzzle; what would you deem to be the analogous transformation of i, j, k? Think about it, and let's see what the system does. It is reasoning: trying to find out what the analogy between the two strings is, and then applying that same analogy to this particular string. It figures out that the analogy is "replace the letter category of the rightmost letter by its successor", and it comes out with i, j, l.

Let's try it again. This time we give it the problem a, b, c goes to b, b, c, and see what it comes up with. The blackboard architecture is reasoning: different types of rules are being applied in a hierarchy, and each rule affects what to look at next. It comes up with j, j, k: replace the letter category of the leftmost letter by its successor. It has learned the analogy.
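The following is a toy sketch of the kind of rule induction Copycat performs on these letter strings; it is not Mitchell's actual program, which uses a blackboard-like workspace, a concept network (the slipnet), and stochastic codelets. This version simply tests a couple of hand-written candidate rules against the example pair and applies whichever one fits.

```python
# Toy letter-string analogy solver, sketched after Copycat's task
# (not Mitchell's actual program, which is stochastic and far richer).

def successor(ch):
    """Next letter in the alphabet: 'c' -> 'd'."""
    return chr(ord(ch) + 1)

# Candidate rules: (description, transformation) pairs.
RULES = [
    ("replace rightmost letter by its successor",
     lambda s: s[:-1] + successor(s[-1])),
    ("replace leftmost letter by its successor",
     lambda s: successor(s[0]) + s[1:]),
]

def solve(src, dst, target):
    """Find a rule mapping src -> dst, then apply it to target."""
    for description, rule in RULES:
        if rule(src) == dst:
            return description, rule(target)
    return None, None

print(solve("abc", "abd", "ijk"))  # ('replace rightmost letter ...', 'ijl')
print(solve("abc", "bbc", "ijk"))  # ('replace leftmost letter ...', 'jjk')
```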
So, as we can see, blackboard systems are extremely powerful. They form a way of marrying bottom-up, data-driven reasoning with top-down symbolic reasoning, and of allowing both to influence each other, just as Bayesian networks and hierarchical temporal memory also include this element of top-down and bottom-up reasoning working together.

So, we will now end the course with a recap. I hope you've enjoyed this lecture. I've tried to cover many exciting things, at least things which I find extremely exciting and promising. We covered a few things in a little detail, like linear regression and the ability to predict values using regression, and maybe even other techniques, like logistic regression and SVMs, if you use packages; and then some more speculative AI aspects, and how they come together for big data analytics.