So, after sparse patterns, the next important piece of hierarchical temporal memory is that it doesn't just learn images. Many neural networks learn images; pattern classification was one of the first applications of neural networks, and we ran into trouble because of situations that were not linearly separable. Interest then waned for a while, and different techniques for image and pattern recognition were found over the years. What hierarchical temporal memory does, in addition, is learn sequences of patterns, and that is its other most important feature.

So let's look at this particular image, again taken from Jeff Hawkins' talk. After a second, or a few milliseconds, you see another image which is a slightly shifted version of this one. Then you might see another one with a slightly different pattern, so the pattern one is seeing is changing over time. The important part is that each neuron not only gets triggered by its inputs and gets selected if it is chosen in the sparse pattern; it also keeps track of which other neurons were triggered just before it fired. Obviously it can't keep track of every neuron, but it keeps track of a few, and it keeps track of those few based on the connections it makes to other neurons along its dendrites: not necessarily neurons that are physically nearby, but those that are nearby on its dendrites.

So each cell tracks the previous configuration, again sparsely. It doesn't keep track of all the cells that were active in the previous time step, only a few, and it does this via synaptic connections made along these dendrites. Each cell is potentially connected to a number of other cells via a dendrite, and the synapses are the actual connections; these are not permanent but get learned over time, and this is how the learning takes place.

Assume that the predicted values are the ones in yellow, which come from a variety of different predictions made by all the cells that are actually firing, and the ones that actually occur in the next time step are the red ones. Then those neurons which predicted the red values based on their previous inputs would get the synapses that correspond to the actual red values strengthened.
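To make that learning rule concrete, here is a minimal Python sketch. It is not Numenta's implementation: the class name Cell, the methods predictive and reinforce, and all the numeric parameters (the 0.5 permanence threshold, the increment and decrement sizes, the sample of five cells) are illustrative choices of mine. It only shows the core idea described above: a cell keeps a sparse set of distal synapses to cells that were active on the previous time step, uses them to predict its own activity, and strengthens the ones whose prediction came true.

```python
import random

class Cell:
    """Toy cell that learns which previously active cells tend to precede it."""
    def __init__(self):
        # Sparse set of distal synapses: other cell id -> permanence in [0, 1].
        self.distal = {}

    def predictive(self, prev_active, threshold=0.5, min_matches=2):
        """Predict firing if enough strong synapses point at cells that
        were active on the previous time step."""
        matches = sum(1 for c, p in self.distal.items()
                      if p >= threshold and c in prev_active)
        return matches >= min_matches

    def reinforce(self, prev_active, inc=0.1, dec=0.05, sample=5):
        """Called when the cell actually fires: strengthen synapses to cells
        that were just active, weaken the others, and grow a few new ones."""
        for c in list(self.distal):
            if c in prev_active:
                self.distal[c] = min(1.0, self.distal[c] + inc)
            else:
                self.distal[c] = max(0.0, self.distal[c] - dec)
        # Grow synapses to a small random sample of the previously active cells.
        for c in random.sample(sorted(prev_active), min(sample, len(prev_active))):
            self.distal.setdefault(c, 0.3)

# Toy usage: cell learns that cells {1, 2, 3} tend to fire just before it does.
cell = Cell()
for _ in range(10):
    cell.reinforce(prev_active={1, 2, 3})
print(cell.predictive(prev_active={1, 2, 3}))   # True once synapses are strong
print(cell.predictive(prev_active={8, 9}))      # False: wrong context
```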
So the neuron saw a pattern and, based on its history, it predicted that another pattern would take place in the next step. There are many possible predictions, but some of them actually come true, and based on what comes true, those synapses which predicted the ones that came true get reinforced and strengthened.

So a neuron needs to make predictions, but it needs to store those predictions somewhere, and obviously it can only store one value if there is just a single layer of cells. So instead of one layer, each particular cell position consists of a column of neurons, and the column actually holds the predictions for that particular cell position over time. Here is how this works. Think about the sparse pattern consisting of 40 active bits out of 2,000, and suppose that there are ten cells per column. Then there are ten to the 40 ways to represent the same input in different contexts, and ten to the 40 is a large number. Each context corresponds to a particular set of neighboring cells firing; which set it is depends on which synapses along the dendrite segments are capturing that context. Depending on which context the cell is firing in, a particular cell in the column gets activated, not just the lowest one but that particular one, and similarly for every cell in the pattern. As a result, this pattern can be stored in ten to the 40 different contexts, and so one is able to remember sequences using this representation. Sequence learning is the second most important part of hierarchical temporal memory.

And finally, what we just saw was only one tiny region of the model of the neocortex of the brain. Each region itself is connected to other regions in a hierarchy. Each region consists of many, many columns of cells, like the 2,000 columns of ten cells each that we just mentioned, and there are many, many regions in the overall model. Each region is activated by bottom-up sensory input, either directly from the measurements taken of a system (a visual system, or whatever one is measuring) or from the previous layer in the hierarchy, as well as by top-down feedback, because every layer is also making predictions, and the predictions go upwards as well as downwards. This is actually a lot like how the brain works.
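As a quick sanity check on those numbers (the 2,000 columns, the 40 active bits, and the ten cells per column are the figures quoted above), the combinatorics can be worked out directly. The snippet below is plain arithmetic, not part of the HTM algorithm itself.

```python
from math import comb

n_columns = 2000       # columns in the region
n_active = 40          # active columns per input (the sparse pattern)
cells_per_column = 10  # cells stacked in each column

# Number of distinct 40-of-2000 sparse patterns the columns alone can represent.
patterns = comb(n_columns, n_active)

# For one fixed pattern of 40 active columns, each column can express its
# activity through any one of its 10 cells, so the same input can appear
# in 10^40 different temporal contexts.
contexts_per_pattern = cells_per_column ** n_active

print(f"{patterns:.3e} possible 40-of-2000 patterns")
print(f"{contexts_per_pattern:.3e} contexts for any one pattern")
```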
In fact, neurological studies have shown that more than 75 percent of the connections go back down towards the senses, as opposed to the 25 percent that come up from the senses. So what one is actually seeing is, to a large extent, what one is imagining; it is not purely bottom-up, data-driven perception. One sees some pixels and interprets them much more strongly, and those downward predictions are far stronger than the upward ones in the actual brain. Hierarchical temporal memory mimics some of these aspects.

The interesting thing is that hierarchical temporal memory, even though it's a neural model, has been shown to be mathematically equivalent to a deep belief network, which is a probabilistic graphical model. Equally interestingly, hierarchical temporal memory is not just an abstract model of how the brain works for purely scientific purposes; it has been shown to work on real applications. For example, the applications that Jeff Hawkins talks about are very much big data analytics applications, where one is dealing with large volumes of data streams picked up from many, many devices all over the web. Hierarchical temporal memory based models are then able to predict future values of a data stream, detect anomalies, and possibly, in the future, control actions based on these models. Some examples are energy pricing, energy demand, product forecasting, machine efficiency, ad network returns, and server loads; all of these have been shown to actually work.

An example that he uses in his talk is the regional energy load during different parts of the day. As you can see, these are weekends and these are weekdays; the blues are the predicted values and the reds are the actual values, and you can see that the predicted value, which it has learned from the data all by itself, is fairly accurate. Think about a linear regression trying to predict this, or even a complicated function f being fitted to predict this kind of time series.

So HTM, in my opinion, represents a fairly interesting area where neural networks are coming back and getting mathematically modeled as deep belief networks, which have shown great promise in many areas of prediction and learning, as we've seen.
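To illustrate the shape of that task, here is a deliberately simplified Python sketch of streaming prediction and anomaly scoring. This is not how HTM works internally: the function name, the window of 48 readings, and the "predict the value seen one cycle ago" rule are my own stand-ins for a crude seasonal model. What it shares with the applications described above is only the contract: a stream of values comes in, and a prediction plus an anomaly score comes out for each new reading.

```python
import math
import random
from collections import deque

def stream_anomaly_scores(values, window=48):
    """Toy online predictor: predict each value as the value from one
    'window' ago, and score each point by how surprising its prediction
    error is compared to recent errors."""
    history = deque(maxlen=window)
    errors = deque(maxlen=window)
    for value in values:
        if len(history) == window:
            predicted = history[0]            # value one full cycle ago
            error = abs(value - predicted)
            baseline = (sum(errors) / len(errors)) if errors else error
            score = error / (baseline + 1e-9)  # >1 means more surprising than usual
            errors.append(error)
            yield predicted, score
        history.append(value)

# Example: a noisy daily cycle with one injected spike as the anomaly.
data = [10 + 5 * math.sin(2 * math.pi * t / 48) + random.gauss(0, 0.3)
        for t in range(480)]
data[300] += 20
for t, (_, score) in enumerate(stream_anomaly_scores(data)):
    if score > 5:
        print(f"t={t + 48}: anomaly score {score:.1f}")
```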
At the same time, the HTM architecture is uniform and very plastic: there are no complicated techniques apart from a very uniform learning system, and yet it is able to learn a wide variety of time-series patterns, much like the brain's plasticity. As many people have found through actual clinical examples, parts of the brain which we all use to see are used by blind people to augment their hearing, and that has been shown through MRI experiments. So the same architecture learning many different types of patterns is really what we're looking for in a future web intelligence architecture, and HTM certainly points the way towards some of these areas.

However, there is something missing, and we'll come to that in the next section. HTM doesn't appear to solve all the problems; in fact, it is far from it. Very important pieces are still missing, they remain open problems, and we'll talk about those.