So, after sparse patterns, the next important piece of hierarchical temporal memory is that it doesn't just learn static images; many neural networks learn images. Pattern classification was one of the first applications of neural networks, and we ran into trouble with problems that were not linearly separable. Interest then waned, and different techniques for image and pattern recognition were developed over the years. What hierarchical temporal memory does, and this is its other most important feature, is learn sequences of patterns.

So let's look at this particular image; again, this is taken from Jeff Hawkins' talk. After a second or so, or a few milliseconds, you see another image which is a slightly shifted version of the first, and then perhaps another one with a slightly different pattern. So the pattern that one is seeing changes over time. The important part is that each neuron not only gets triggered by its inputs and selected if it falls in the sparse pattern; it also keeps track of which other neurons were triggered just before it fired. Obviously, it can't keep track of every neuron, only a few, and it tracks those few via the connections its dendrites make to other neurons; not necessarily physically nearby ones, but ones that sit near it on its dendrites. So each cell tracks the previous configuration, again sparsely: it doesn't remember all the cells that were active in the previous time step, only a few, and it does this via synaptic connections made along its dendrites. Each cell is potentially connected to a number of other cells via these dendrites, and the synapses are the actual connections, which are not permanent but get learned over time; this is how the learning takes place.

Assume that the predicted values are the ones in yellow, arising from a variety of different predictions by all the cells that are actually firing, and that the ones which actually occur at the next time step are the red ones. Then those neurons which predicted the red values based on their previous inputs get the synapses corresponding to the actual red values strengthened. So a neuron saw a pattern and, based on its history, predicted that another pattern would occur at the next step. There are many possible predictions, but only some of them come true, and the synapses that predicted the ones which came true get reinforced and strengthened.

A neuron needs to make predictions, but it has to store those predictions somewhere, and obviously it can only store one value in one layer. So instead of one layer, each particular cell position consists of a column of neurons, and the column holds the predictions for that particular position in different contexts over time. This is how it works: think about the sparse pattern consisting of 40 active bits out of 2,000, and suppose there are ten cells per column. Then there are ten to the 40 ways to represent the same input in different contexts, and ten to the 40 is a very large number. Each context corresponds to a particular set of neighboring cells firing, and which set that is depends on which synapses along the dendrite segments are capturing that context.
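To make this concrete, here is a minimal sketch in Python of the learning step just described: each cell keeps a sparse set of lateral synapses to cells that were active at the previous time step, and the synapses of cells whose predictions come true get strengthened. The cell count, permanence threshold, and increment values are illustrative assumptions chosen for readability, not parameters of any actual HTM implementation.

import random

NUM_CELLS = 100          # toy population of cells (real HTM regions are far larger)
CONNECTED = 0.5          # permanence above this counts as a connected synapse
INC, DEC = 0.05, 0.02    # learning increment / decrement (made-up values)

# Each cell keeps a sparse set of lateral synapses: {presynaptic cell: permanence}.
synapses = {c: {} for c in range(NUM_CELLS)}

def predicted_cells(prev_active):
    """Cells whose connected synapses see enough of the previously active set."""
    predicted = set()
    for cell, syns in synapses.items():
        overlap = sum(1 for pre, perm in syns.items()
                      if perm >= CONNECTED and pre in prev_active)
        if overlap >= 2:                       # toy activation threshold
            predicted.add(cell)
    return predicted

def learn(prev_active, active_now):
    """Reinforce synapses that correctly anticipated the current input."""
    for cell in active_now:
        syns = synapses[cell]
        # Grow a few synapses to a sparse sample of the previously active cells.
        for pre in random.sample(sorted(prev_active), min(3, len(prev_active))):
            syns.setdefault(pre, 0.3)
        # Strengthen synapses to previously active cells, weaken the rest.
        for pre in list(syns):
            if pre in prev_active:
                syns[pre] = min(1.0, syns[pre] + INC)
            else:
                syns[pre] = max(0.0, syns[pre] - DEC)

# Feed a short repeating sequence of sparse activations.
sequence = [{1, 5, 9, 23}, {2, 6, 10, 24}, {3, 7, 11, 25}]
for epoch in range(20):
    for prev, curr in zip(sequence, sequence[1:]):
        learn(prev, curr)

print(predicted_cells({1, 5, 9, 23}))   # now largely predicts {2, 6, 10, 24}

After a few passes over the repeating sequence, the cells that follow a given pattern end up with connected synapses to the cells of the preceding pattern, which is exactly the kind of prediction being described here.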
Depending on which context the cell is firing in, a particular cell in that column gets activated: not just the lowest one, but the one corresponding to that context, and similarly for every column in the pattern. As a result, the same pattern can be stored in ten to the 40 different contexts, and so one is able to remember sequences using this representation. Sequence learning, then, is the second most important part of hierarchical temporal memory.

And finally, what we just saw was only one tiny region of the model of the neocortex. Each region is connected to other regions in a hierarchy, and each region consists of many, many columns of cells, like the 2,000 columns of ten cells each that we just mentioned; there are many, many regions in the overall model. Each region is activated by bottom-up sensory input, either directly from the measurements taken of a system, a visual system, or whatever one is measuring, or from the previous layer in the hierarchy, as well as by top-down feedback, because every layer is also making predictions, and those predictions flow downwards as well as upwards. This is actually a lot like how the brain works. In fact, neurological studies have shown that more than 75% of the connections go back down towards the senses, as opposed to the roughly 25% that come up from the senses. So what one is seeing is, to a large extent, what one is imagining; it's not purely bottom-up, data-driven perception. One sees some pixels but interprets them much more strongly, and those downward predictions are far stronger than the upward ones in the actual brain. Hierarchical temporal memory mimics some of these aspects.

The interesting thing is that hierarchical temporal memory, even though it's a neural model, has been shown to be mathematically equivalent to a deep belief network, which is a probabilistic graphical model. Equally interestingly, hierarchical temporal memory is not just an abstract model of how the brain works for purely scientific purposes; it has been shown to work on real applications. For example, the applications that Jeff Hawkins talks about are very much big data analytics applications, involving large volumes of data streams picked up from many, many devices all over the web. Hierarchical temporal memory based models are then able to predict future values of a data stream, detect anomalies, and possibly, in the future, control actions based on these models. Some examples are energy pricing, energy demand, product forecasting, machine efficiency, ad network returns, and server loads; all of these have been shown to actually work.

An example that he uses in his talk is regional energy load during different parts of the day. As you can see, these are weekends and these are weekdays; the blue curves are the predicted values, the red ones are the actual values, and you can see that the prediction, learned from the data all by itself, is fairly accurate. Think about a linear regression trying to predict this, or even a complicated function f being fitted to predict this kind of time series. So HTM, in my opinion, represents a fairly interesting area where neural networks are coming back, getting mathematically modeled as deep belief networks, which have shown great promise in many kinds of prediction and learning, as we've seen.
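To see how such a model is used on a data stream, here is a toy sketch of that usage pattern: learn the stream online, predict the next value, and flag values that were poorly predicted as anomalies. The model below is deliberately simple, a first-order transition table over discretized values standing in for an HTM region; it is not Numenta's NuPIC API, and the bucket size, warm-up cutoff, and anomaly threshold are made-up parameters.

from collections import defaultdict

class ToySequenceModel:
    def __init__(self, bucket_size=5.0):
        self.bucket_size = bucket_size
        # transitions[previous bucket][next bucket] = how often that step was seen
        self.transitions = defaultdict(lambda: defaultdict(int))
        self.prev = None

    def _bucket(self, value):
        return round(value / self.bucket_size)

    def step(self, value):
        """Return (predicted_next_value, anomaly_score), learning online as we go."""
        b = self._bucket(value)
        anomaly = 1.0
        if self.prev is not None:
            seen = self.transitions[self.prev]
            total = sum(seen.values())
            if total:
                # Anomaly = 1 minus how often this particular transition occurred before.
                anomaly = 1.0 - seen[b] / total
            seen[b] += 1
        # Predict the most frequent successor of the current bucket, if any.
        nxt = self.transitions[b]
        predicted = max(nxt, key=nxt.get) * self.bucket_size if nxt else None
        self.prev = b
        return predicted, anomaly

# Example: a repeating daily-load-like pattern with one spike injected near the end.
stream = [10, 20, 40, 30, 10] * 6 + [10, 20, 90, 30, 10]
model = ToySequenceModel()
for t, value in enumerate(stream):
    predicted, anomaly = model.step(value)
    if t > 10 and anomaly > 0.8:
        print(f"t={t}: value {value} was poorly predicted (anomaly {anomaly:.2f})")

An actual HTM system would replace the transition table with the sparse sequence memory sketched earlier, but the surrounding loop of predict, compare, and score is the same idea behind the energy-load and server-load examples mentioned above.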
At the same time, the HTM architecture is a uniform, very plastic architecture; there are no complicated special-purpose techniques, just a very uniform learning system, and it's able to learn a wide variety of time-series patterns. This is much like the brain's plasticity, which many people have observed through actual clinical examples: for instance, parts of the brain that sighted people use to see are used by blind people to augment their hearing, and that has been demonstrated through MRI experiments. So the same architecture learning many different types of patterns is really what we're looking for in a future web intelligence architecture, and HTM certainly points the way towards some of these areas. However, there is something missing, and we'll come to that in the next section. HTM doesn't appear to solve all the problems; far from it. Very important pieces are still missing, and they remain open problems, and we'll talk about those.