In this video, I'm going to describe how a recurrent neural network solves a toy problem. It's a problem chosen to demonstrate what you can do with recurrent neural networks that you cannot do conveniently with feed-forward neural networks. The problem is adding up two binary numbers. After the recurrent neural network has learned to solve the problem, it's interesting to look at its hidden states and see how they relate to the hidden states in a finite state automaton that's solving the same problem.

So consider the problem of adding up two binary numbers. We could train a feed-forward neural network to do that, and the diagram on the right shows a network that gets some inputs and produces some outputs. But there are problems with using a feed-forward neural network. We have to decide in advance what the maximum number of digits is, both for the two input numbers and for the output number. And more importantly, the processing that we apply to the different bits of the input numbers doesn't generalize. That is, when we learn how to add up the last two digits and deal with the carries, that knowledge is in some weights. As we go to a different part of a long binary number, the knowledge will have to be in different weights, so we won't get automatic generalization. As a result, although you can train a feed-forward neural network, and it will eventually learn to do binary addition on fixed-length numbers, it's not an elegant way to solve the problem.

This is a picture of the algorithm for binary addition. The states shown here are like the states in a hidden Markov model, except they're not really hidden. The system is in one state at a time. When it enters a state it performs an action: it either prints a one or prints a zero. When it's in a state, it gets some input, which is the two bits in the next column, and that input causes it to go into a new state. So if you look on the top right, it's in the carry state and it's just printed a one. If it sees a one and a one, it goes back into the same state and prints another one. If, however, it sees a one and a zero or a zero and a one, it stays in the carry state but prints a zero. If it sees a zero and a zero, it goes into the no-carry state and prints a one. And so on.

A recurrent neural net for binary addition needs to have two input units and one output unit. It's given two input digits at each time step, and it also has to produce an output at each time step. The output is the output for the column that it took in as input two time steps ago. The reason we need a delay of two time steps is that it takes one time step to update the hidden units based on the inputs, and another time step to produce the output from the hidden state.

So the net looks like this. I only gave it three hidden units. That's sufficient to do the job. It would learn faster with more hidden units, but it can do it with three. The three hidden units are fully interconnected, and they have connections in both directions that don't necessarily have the same weight; in fact, in general they don't have the same weight. The connections between hidden units allow the pattern of activity at one time step to determine the pattern of activity at the next time step. The input units have feed-forward connections to the hidden units, and that's how the net sees the two digits in a column. Similarly, the hidden units have feed-forward connections to the output unit, and that's how it produces its output.
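To make the automaton concrete, here's a minimal Python sketch of the algorithm described above (a sketch, not code from the lecture; the function names and the least-significant-bit-first representation are my own choices). Each of the four nodes in the diagram pairs a carry status with the bit printed on entering it:

```python
def step(carry, x, y):
    """One transition of the automaton: given the current carry and
    the input column (x, y), return the next node as a
    (carry, printed_bit) pair."""
    total = x + y + carry
    return total // 2, total % 2

def add_binary(a_bits, b_bits):
    """Add two binary numbers given as lists of bits, least
    significant bit first, and return the sum bits (LSB first)."""
    n = max(len(a_bits), len(b_bits)) + 1   # room for a final carry
    a = a_bits + [0] * (n - len(a_bits))
    b = b_bits + [0] * (n - len(b_bits))
    carry, out = 0, []
    for x, y in zip(a, b):
        carry, bit = step(carry, x, y)      # enter a node...
        out.append(bit)                     # ...and print its bit
    return out

# 3 + 1 = 4:  [1, 1] + [1, 0] -> [0, 0, 1]  (all LSB first)
print(add_binary([1, 1], [1, 0]))
```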
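And here's a rough sketch of the forward pass for the net just described: two input units, three fully interconnected hidden units with logistic activations, and one output unit. The weights here are random placeholders rather than trained values, and all the names are mine; a trained network would set the weights so that the hidden activity vectors implement the carry automaton.

```python
import numpy as np

rng = np.random.default_rng(0)
W_in = rng.normal(size=(3, 2))   # input -> hidden connections
W_hh = rng.normal(size=(3, 3))   # hidden -> hidden; W_hh[i, j] need
                                 # not equal W_hh[j, i]
W_out = rng.normal(size=(1, 3))  # hidden -> output connections

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run(columns):
    """Feed a sequence of (x, y) input columns (LSB first) through
    the net, producing one output per time step. Per the lecture,
    the training target at step t would be the sum bit for the
    column presented two time steps earlier."""
    h = np.zeros(3)
    outputs = []
    for x in columns:
        h = sigmoid(W_in @ np.asarray(x, float) + W_hh @ h)
        outputs.append(float(sigmoid(W_out @ h)[0]))
    return outputs

print(run([(1, 1), (1, 0), (0, 0)]))
```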
It's interesting to look at what the recurrent neural network learns. It learns four distinct patterns of activity in its three hidden units, and these patterns correspond to the nodes in the finite state automaton for binary addition. We must not confuse the units in a neural network with the nodes in a finite state automaton. The nodes in the finite state automaton correspond to the activity vectors of the recurrent neural network. The automaton is restricted to being in exactly one state at each time, and similarly, the hidden units of the recurrent neural network are restricted to having exactly one activity vector at each time.

So a recurrent neural network can emulate a finite state automaton, but it's exponentially more powerful in its representation. With N hidden neurons, it has 2^N possible binary activity vectors. Of course, it only has about N^2 weights, so it can't necessarily make full use of all that representational power. But if the bottleneck is in the representation, a recurrent neural network can do much better than a finite state automaton. This is important when the input stream has two separate things going on at once. A finite state automaton needs to square its number of states in order to deal with the fact that there are two things going on at once; a recurrent neural network only needs to double its number of hidden units. By doubling the number of units, it does of course square the number of binary vector states that it has.
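To make that counting argument concrete, here's a tiny sketch (the loop and formatting are mine, not from the lecture) comparing the number of possible binary activity vectors with the rough number of recurrent weights as N grows. Note how doubling N squares the state count, e.g. 2^6 = (2^3)^2:

```python
for n in (3, 6, 12, 24):
    # 2**n possible binary activity vectors, ~n*n recurrent weights
    print(f"N={n:2d}  activity vectors=2^{n} = {2**n:>8}  weights~{n*n}")
```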