We're now going to talk a little bit about an issue that's of interest to cognitive scientists, but may not be of much interest to engineers. So if you're an engineer, you can just ignore this video.

In cognitive science, there's been a debate going on for maybe a hundred years about the relationship between feature vector representations of concepts and representations of concepts by their relations to other concepts. And the learning algorithm we've just seen for family trees has a lot to say about that debate.

We're now going to make a brief diversion into cognitive science. There's been a long debate between two rival theories of what it means to have a concept. The feature theory says a concept is a big set of semantic features. This is good for explaining similarities between concepts, and it's convenient for things like machine learning, because we like to deal with vectors of activities. The structuralist theory says that the meaning of a concept lies in its relationships to other concepts, so conceptual knowledge is best expressed not as a big vector but as a relational graph. In the early 1970s, Marvin Minsky used the limitations of perceptrons as evidence against feature vectors and in favor of relational graph representations.

My belief is that both sides in this debate are wrong, because both sides believe that the two theories are rivals, and they're not rivals at all: a neural net can use vectors of semantic features to implement a relational graph.

In the neural network that learns family trees, we can think of explicit inference as: I give you person one and I give you a relationship, and then you tell me person two. To arrive at that conclusion, the neural net doesn't follow a whole bunch of rules of inference. It just passes information forward through the net. As far as the neural net is concerned, the answer is intuitively obvious. Now, if you look at the details of what's happening, there are lots of probabilistic features that are influencing each other. We call these microfeatures to emphasize that they're not like explicit conscious features. In a real brain there might be millions of them, and millions of interactions, and as a result of all these interactions we can make one step of explicit inference. That's what we believe is involved in just seeing the answer to something.
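As a rough illustration of that kind of forward pass, here is a minimal sketch in Python. The names, the layer sizes, and the random weights are all illustrative assumptions, not the actual trained family-trees network from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary; names and sizes are invented for illustration only.
people = ["Colin", "Charlotte", "James", "Victoria"]
relations = ["father", "mother", "aunt"]

n_people, n_rel = len(people), len(relations)
n_embed, n_hidden = 6, 12

# Distributed representations: each concept is a pattern over many
# units, and each unit takes part in representing many concepts.
W_person = rng.normal(0.0, 0.1, (n_people, n_embed))  # person -> feature vector
W_rel = rng.normal(0.0, 0.1, (n_rel, n_embed))        # relation -> feature vector
W_hidden = rng.normal(0.0, 0.1, (2 * n_embed, n_hidden))
W_out = rng.normal(0.0, 0.1, (n_hidden, n_people))    # scores over person two

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

def infer(person1, relation):
    """One step of 'explicit inference' as a single forward pass: no
    rules of inference are consulted, information just flows forward."""
    p = one_hot(people.index(person1), n_people) @ W_person
    r = one_hot(relations.index(relation), n_rel) @ W_rel
    h = np.tanh(np.concatenate([p, r]) @ W_hidden)
    scores = h @ W_out
    return people[int(np.argmax(scores))]

# With random weights the answer is arbitrary; training would shape
# the microfeatures so that the right person two just "pops out".
print(infer("Colin", "father"))
```

The point of the sketch is only that inference here is one sweep of multiplications and nonlinearities: nothing in the computation looks like consulting a rule.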
There are no intervening conscious steps, but nevertheless there's a lot of computation going on in the interactions of neurons. So we may use explicit rules for conscious, deliberate reasoning, but a lot of our common-sense reasoning, particularly analogical reasoning, works by just seeing the answer, with no conscious intervening steps. And even when we do conscious reasoning, we have to have some way of just seeing which rules apply, in order to avoid an infinite regress.

Many people, when they think about implementing a relational graph in a neural net, just assume that you should make a neuron correspond to a node in the relational graph, and a connection between two neurons correspond to a binary relationship. But this method doesn't work. For a start, relationships come in different flavors: there are different kinds of relationship, like "mother of" or "aunt of", and a connection in a neural net only has a strength; it doesn't come in different types. Also, we need to deal with ternary relationships, like "A is between B and C".

We still don't know for sure the right way to implement relational knowledge in a neural net. But it seems very probable that many neurons are used for representing each of the concepts we know, and that each of those neurons is involved in representing many different concepts. This is called a distributed representation: a many-to-many mapping between concepts and neurons.
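To make that many-to-many mapping concrete, here is a toy sketch; the vectors are invented purely for illustration. Each concept's identity is spread across all the units, each unit contributes to all the concepts, and concepts with related meanings come out with similar activity patterns, which is what makes feature vectors good at capturing similarity:

```python
import numpy as np

# Toy distributed representation: 4 concepts spread over 5 units.
# The numbers are made up for illustration.
concepts = ["mother", "father", "aunt", "between"]
E = np.array([
    [0.9, 0.1, 0.8, 0.2, 0.7],  # mother
    [0.8, 0.2, 0.9, 0.1, 0.6],  # father  (pattern close to mother)
    [0.7, 0.3, 0.6, 0.4, 0.5],  # aunt
    [0.1, 0.9, 0.2, 0.8, 0.1],  # between (a different kind of concept)
])

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Every concept uses every unit, and every unit serves every concept;
# similar concepts end up with similar activity patterns.
for a in range(len(concepts)):
    for b in range(a + 1, len(concepts)):
        print(f"{concepts[a]:8s} ~ {concepts[b]:8s}: {cosine(E[a], E[b]):.2f}")
```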