We're now going to talk a little bit about an issue that's of interest to cognitive scientists, but may not be of much interest to engineers. So if you're an engineer, you can just ignore this video.

In cognitive science, there's been a debate going on for maybe a hundred years about the relationship between feature vector representations of concepts and representations of concepts by their relations to other concepts. And the learning algorithm we've just seen for family trees has a lot to say about that debate.

We're now going to make a brief diversion into cognitive science. There's been a long debate between two rival theories of what it means to have a concept. The feature theory says a concept is a big set of semantic features. This is good for explaining similarities between concepts, and it's convenient for things like machine learning, because we like to deal with vectors of activities. The structuralist theory says that the meaning of a concept lies in its relationships to other concepts. So conceptual knowledge is best expressed not as a big vector, but as a relational graph. In the early 1970s, Marvin Minsky used the limitations of perceptrons as evidence against feature vectors and in favor of relational graph representations.

My belief is that both sides in this debate are wrong, because both sides believe that the two theories are rivals, and they're not rivals at all. A neural net can use vectors of semantic features to implement a relational graph.

In the neural network that learns family trees, we can think of explicit inference as: I give you person one and I give you a relationship, and then you tell me person two. And to arrive at that conclusion, the neural net doesn't follow a whole bunch of rules of inference. It just passes information forward through the net. As far as the neural net is concerned, the answer is intuitively obvious.

Now, if you look at the details of what's happening, there are lots of probabilistic features that are influencing each other. We call these microfeatures to emphasize that they're not like explicit, conscious features. In a real brain, there might be millions of them and millions of interactions, and as a result of all these interactions, we can make one step of explicit inference. That's what we believe is involved in just seeing the answer to something. There are no intervening conscious steps, but nevertheless there's a lot of computation going on in the interactions of neurons.

So we may use explicit rules for conscious, deliberate reasoning, but a lot of our commonsense reasoning, particularly analogical reasoning, works by just seeing the answer, with no conscious intervening steps. And even when we do conscious reasoning, we have to have some way of just seeing which rules apply, in order to avoid an infinite regress.

So, many people, when they think about implementing a relational graph in a neural net, just assume that you should make a neuron correspond to a node in the relational graph, and a connection between two neurons correspond to a binary relationship. But this method doesn't work. For a start, relationships come in different flavors. There are different kinds of relationships, like "mother of" or "aunt of," and a connection in a neural net only has a strength. It doesn't come in different types. Also, we need to deal with ternary relationships, like "A is between B and C." We still don't know for sure the right way to implement relational knowledge in a neural net.
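To make the "just passes information forward" idea concrete, here is a minimal sketch of a family-trees-style forward pass. The layer sizes and weights are illustrative assumptions (untrained random weights, toy dimensions), not the exact architecture from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

N_PEOPLE, N_RELS = 24, 12              # toy sizes, as in the family-trees task
D_PERSON, D_REL, D_HIDDEN = 6, 6, 12   # sizes of the distributed "microfeature" layers

# Learned distributed encodings: each person and each relationship is mapped
# to a small dense vector of microfeatures instead of a one-hot code.
W_person = rng.normal(0, 0.1, (N_PEOPLE, D_PERSON))
W_rel    = rng.normal(0, 0.1, (N_RELS, D_REL))
W_hidden = rng.normal(0, 0.1, (D_PERSON + D_REL, D_HIDDEN))
W_out    = rng.normal(0, 0.1, (D_HIDDEN, N_PEOPLE))

def infer_person2(person1: int, relationship: int) -> np.ndarray:
    """One forward pass: person1 + relationship in, distribution over person2 out.

    No rules of inference are applied; the microfeatures just interact
    through the weights, and the answer pops out at the output layer.
    """
    h_in = np.concatenate([W_person[person1], W_rel[relationship]])
    h = np.tanh(h_in @ W_hidden)                  # microfeatures interacting
    logits = h @ W_out
    return np.exp(logits) / np.exp(logits).sum()  # softmax over all people

probs = infer_person2(person1=3, relationship=7)
print(probs.argmax())  # after training, this would be the "intuitively obvious" person2
```

Notice that the relationship is not a typed connection: it enters as its own vector of activities, so "mother of" and "aunt of" differ as patterns of microfeatures rather than as different kinds of weights. That is one way a net can sidestep the problem that a single connection only has a strength.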
But it seems very probable that many neurons are used for representing each of the concepts we know, and each of those neurons is probably involved in dealing with many different concepts. This is called a distributed representation. It's a many-to-many mapping between concepts and neurons.
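As a minimal sketch of what that many-to-many mapping means, here is a tiny hand-made example. The readable neuron labels are an illustrative assumption; real microfeatures would be learned and would not be this interpretable:

```python
import numpy as np

concepts = ["mother", "father", "aunt", "uncle"]
neurons  = ["female", "male", "parent-generation", "sibling-of-parent"]

# Rows are concepts, columns are neurons.
E = np.array([
    [1, 0, 1, 0],   # mother
    [0, 1, 1, 0],   # father
    [1, 0, 1, 1],   # aunt
    [0, 1, 1, 1],   # uncle
], dtype=float)

# Many-to-many: each concept is spread over several neurons...
for c, row in zip(concepts, E):
    print(c, "->", [n for n, v in zip(neurons, row) if v])

# ...and each neuron is shared across several concepts.
for j, n in enumerate(neurons):
    print(n, "<-", [c for c, v in zip(concepts, E[:, j]) if v])
```

Each row shows one concept distributed over many neurons, and each column shows one neuron serving many concepts; that shared structure is also why similar concepts end up with overlapping patterns of activity.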