Still, I think there is something missing. Even if we have a neural architecture which appears to predict time series very well, and even if we have a variety of techniques for prediction and classification on the one hand, and reasoning and rules on the other, the link between these is still missing.

For example, suppose you are trying to predict how other players, or pedestrians on the road, will move, or to predict the consequences of a decision, because when one takes a decision, one imagines the future: if I did x, then y would happen; if I did z, then a would happen. We are continuously imagining the future by playing things out in our heads.

The missing element is that symbolic reasoning, optimization, and planning appear very different from the regression, neural learning, sequence prediction, or naive Bayes classification techniques, which are essentially the data-driven predictions we have seen. Reasoning requires one to learn rules and classes, and then to reason in a symbolic way about them. The link between how data-driven, bottom-up techniques eventually give rise to higher-level symbolic reasoning in an architecture like the brain is the missing link. We still don't know how that happens. Hierarchical temporal memory promises that we will learn about this, but that has not been demonstrated yet.

So, in the absence of that link, there are other ways to put these different techniques together in practical systems. The most popular one is called the blackboard architecture. It is a very old technique, going back to the 1950s, and it is now increasingly used in complex AI systems which need to employ many different techniques: some bottom-up and data-driven, some top-down and oriented toward symbolic reasoning. This is how the blackboard architecture works.

The blackboard architecture consists of a blackboard where knowledge, that is, what one learns about the world, is posted. This knowledge is posted by knowledge sources, which can be of many types. They could be bottom-up feature learners, clustering algorithms, sequence miners like HTM, or classifiers, that is, things which learn from the data directly; or they could be symbolic rule engines or decision engines which do planning or reasoning. They all operate on a common blackboard. The lower-level, data-driven knowledge sources might learn something about the world, such as which features to look for, what the classes are, or what the rules are. Higher-level rule engines might then operate on these rules to perform reasoning, do planning, and take decisions. A controller looks at the blackboard and tries to figure out, based on what is available there, which knowledge sources are most applicable to what is currently posted.
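To make this concrete, here is a minimal Python sketch of a blackboard system. It is an illustration under assumed names (the blackboard dict, KnowledgeSource, controller, is_applicable, execute), not any particular library's API; real blackboard systems differ mainly in how the controller scores and schedules knowledge sources.

```python
# Minimal blackboard sketch; all names here are illustrative assumptions.

class KnowledgeSource:
    """A knowledge source inspects the blackboard and may post new knowledge."""
    def is_applicable(self, bb):
        raise NotImplementedError
    def execute(self, bb):
        raise NotImplementedError

class PhonemeDetector(KnowledgeSource):
    # Bottom-up, data-driven source: posts low-level hypotheses.
    def is_applicable(self, bb):
        return "audio" in bb and "phonemes" not in bb
    def execute(self, bb):
        bb["phonemes"] = ["h", "e", "l", "o"]  # stand-in for a real classifier

class WordRecognizer(KnowledgeSource):
    # Higher-level, symbolic source: reasons over what lower levels posted.
    def is_applicable(self, bb):
        return "phonemes" in bb and "words" not in bb
    def execute(self, bb):
        bb["words"] = ["hello"]  # stand-in for a lexicon or rule engine

def controller(bb, sources, max_steps=10):
    """Repeatedly pick an applicable knowledge source and run it."""
    for _ in range(max_steps):
        applicable = [s for s in sources if s.is_applicable(bb)]
        if not applicable:
            break
        applicable[0].execute(bb)
    return bb

print(controller({"audio": "..."}, [PhonemeDetector(), WordRecognizer()]))
# {'audio': '...', 'phonemes': ['h', 'e', 'l', 'o'], 'words': ['hello']}
```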
So, this is a way of putting different types of machine learning techniques and reasoning techniques together in one architecture. It is a hierarchical system, and some blackboard systems are also Bayesian, in the sense that if two elements are already on the blackboard, placing a third element might make the probability of one of the older elements, which was already deemed to be true, decrease through something like the explaining-away effect. Those are called Bayesian blackboards.

One of the earliest examples of blackboards is speech recognition: the first speech recognition systems used blackboard reasoning. The lower levels of the blackboard would detect things like phonemes, higher levels would detect words, and even higher levels would deal with sentences. At each level, one is not only going bottom-up, but also using the predictions at the higher layers to drive the reasoning, or the classification, at the lower layers. The likelihood of the next word being a particular one is driven by what the previous word is, as we saw a few lectures back, and that in turn drives which phonemes to look for. So, lower-level classifiers are adjusted based on which words are most likely in the current higher-level context. That is how speech recognition systems have used this hierarchical reasoning fairly effectively.
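Here is a small sketch of that top-down adjustment with invented numbers: a phoneme classifier's bottom-up likelihoods are reweighted by a word-level prior coming from the language context, which is just Bayes' rule applied across two blackboard levels.

```python
# Top-down reweighting sketch: word-level context adjusts phoneme-level scores.
# All probabilities below are invented for illustration.

# Bottom-up: acoustic likelihoods P(audio | phoneme) from a low-level classifier.
acoustic = {"b": 0.40, "p": 0.35, "d": 0.25}

# Top-down: prior P(phoneme | preceding words) from a higher-level language model.
context_prior = {"b": 0.10, "p": 0.30, "d": 0.60}

# Bayes' rule up to normalization: posterior ∝ likelihood × prior.
unnormalized = {ph: acoustic[ph] * context_prior[ph] for ph in acoustic}
total = sum(unnormalized.values())
posterior = {ph: round(v / total, 3) for ph, v in unnormalized.items()}

print(posterior)  # {'b': 0.136, 'p': 0.356, 'd': 0.508}
# The bottom-up winner "b" loses to "d" once top-down context is folded in.
```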
There are other systems which deal with analogical reasoning, which are essentially ways of trying to mimic analogy, like asking: who is the Dhoni of the USA? How do you map frames of reference to different contexts through analogies?

I'd like to show you an example of an analogical reasoning system, or at least one that tries to mimic analogical reasoning. This one is due to a student of Hofstadter called Melanie Mitchell. Hofstadter, if you remember, was the author of the Pulitzer Prize-winning book Gödel, Escher, Bach, which many of you might have read; it's an old book, now more than 30 years old. Melanie Mitchell, his student, has recently written a book called Complexity, which is also a very interesting exposition of a variety of areas in artificial intelligence and complex systems.

Let's look at how analogical reasoning works in Melanie Mitchell's Copycat program. The analogy one is trying to mimic is this: given a transformation which takes a, b, c to a, b, d, it's like a puzzle; what would you deem to be the analogous transformation of i, j, k? Think about it, and let's see what the system does. It is reasoning: trying to find out what the analogy between the two strings is, and then applying that same analogy to this particular string. It figures out that the analogy is "replace the letter category of the rightmost letter by its successor", and it comes out with i, j, l.

Let's try it again. This time we give it the problem a, b, c goes to b, b, c, and see what it comes up with. The blackboard architecture is reasoning: different types of rules are being applied in a hierarchy, and each rule affects what to look at next. It comes up with j, j, k: replace the letter category of the leftmost letter by its successor. It has learned the analogy.
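The following is a toy sketch of the kind of rule induction Copycat performs on these letter strings; it is not Mitchell's actual program, which uses a blackboard-like workspace, a concept network (the slipnet), and stochastic codelets. This version simply tests a couple of hand-written candidate rules against the example pair and applies whichever one fits.

```python
# Toy letter-string analogy solver, sketched after Copycat's task
# (not Mitchell's actual program, which is stochastic and far richer).

def successor(ch):
    """Next letter in the alphabet: 'c' -> 'd'."""
    return chr(ord(ch) + 1)

# Candidate rules: (description, transformation) pairs.
RULES = [
    ("replace rightmost letter by its successor",
     lambda s: s[:-1] + successor(s[-1])),
    ("replace leftmost letter by its successor",
     lambda s: successor(s[0]) + s[1:]),
]

def solve(src, dst, target):
    """Find a rule mapping src -> dst, then apply it to target."""
    for description, rule in RULES:
        if rule(src) == dst:
            return description, rule(target)
    return None, None

print(solve("abc", "abd", "ijk"))  # ('replace rightmost letter ...', 'ijl')
print(solve("abc", "bbc", "ijk"))  # ('replace leftmost letter ...', 'jjk')
```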
So, as we can see, blackboard systems are extremely powerful. They form a way of marrying bottom-up, data-driven reasoning with top-down symbolic reasoning, and of allowing both to influence each other, just as Bayesian networks and hierarchical temporal memory also include this element of top-down and bottom-up reasoning working together.

So, we will now end the course with a recap. I hope you've enjoyed this lecture. I've tried to cover many exciting things, at least things which I find extremely exciting and promising. We covered a few things in a little detail, like linear regression and the ability to predict values using regression, and maybe even other techniques, like logistic regression and SVMs, if you use packages; and then some more speculative AI aspects, and how they come together for big data analytics.