So, after sparse patterns, the next important piece of hierarchical temporal memory is that it doesn't just learn images. Many neural networks learn images; pattern classification was one of the first applications of neural networks, and we ran into trouble because of situations that were not linearly separable. Interest then waned for a while, and different techniques for image and pattern recognition were found over the years. What hierarchical temporal memory does, in addition, is learn sequences of patterns, and that is its other most important feature.

So let's look at this particular image, again taken from Jeff Hawkins' talk. After a second, or a few milliseconds, you see another image which is a slightly shifted version of this one. Then you might see another one with a slightly different pattern, so the pattern one is seeing is changing over time. The important part is that each neuron not only gets triggered by its inputs and gets selected if it is chosen in the sparse pattern; it also keeps track of which other neurons were triggered just before it fired. Obviously it can't keep track of every neuron, but it keeps track of a few, and it keeps track of those few based on the connections it makes to other neurons along its dendrites: not necessarily neurons that are physically nearby, but those that are nearby on its dendrites.

So each cell tracks the previous configuration, again sparsely. It doesn't keep track of all the cells that were active in the previous time step, only a few, and it does this via synaptic connections made along these dendrites. Each cell is potentially connected to a number of other cells via a dendrite, and the synapses are the actual connections; these are not permanent but get learned over time, and this is how the learning takes place.

Assume that the predicted values are the ones in yellow, which come from a variety of different predictions made by all the cells that are actually firing, and the ones that actually occur in the next time step are the red ones. Then those neurons which predicted the red values based on their previous inputs would get the synapses that correspond to the actual red values strengthened.
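To make that learning rule concrete, here is a minimal Python sketch. It is not Numenta's implementation: the class name Cell, the methods predictive and reinforce, and all the numeric parameters (the 0.5 permanence threshold, the increment and decrement sizes, the sample of five cells) are illustrative choices of mine. It only shows the core idea described above: a cell keeps a sparse set of distal synapses to cells that were active on the previous time step, uses them to predict its own activity, and strengthens the ones whose prediction came true.

```python
import random

class Cell:
    """Toy cell that learns which previously active cells tend to precede it."""
    def __init__(self):
        # Sparse set of distal synapses: other cell id -> permanence in [0, 1].
        self.distal = {}

    def predictive(self, prev_active, threshold=0.5, min_matches=2):
        """Predict firing if enough strong synapses point at cells that
        were active on the previous time step."""
        matches = sum(1 for c, p in self.distal.items()
                      if p >= threshold and c in prev_active)
        return matches >= min_matches

    def reinforce(self, prev_active, inc=0.1, dec=0.05, sample=5):
        """Called when the cell actually fires: strengthen synapses to cells
        that were just active, weaken the others, and grow a few new ones."""
        for c in list(self.distal):
            if c in prev_active:
                self.distal[c] = min(1.0, self.distal[c] + inc)
            else:
                self.distal[c] = max(0.0, self.distal[c] - dec)
        # Grow synapses to a small random sample of the previously active cells.
        for c in random.sample(sorted(prev_active), min(sample, len(prev_active))):
            self.distal.setdefault(c, 0.3)

# Toy usage: cell learns that cells {1, 2, 3} tend to fire just before it does.
cell = Cell()
for _ in range(10):
    cell.reinforce(prev_active={1, 2, 3})
print(cell.predictive(prev_active={1, 2, 3}))   # True once synapses are strong
print(cell.predictive(prev_active={8, 9}))      # False: wrong context
```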
So the neuron saw a pattern and, based on its history, it predicted that another pattern would take place in the next step. There are many possible predictions, but some of them actually come true, and based on what comes true, those synapses which predicted the ones that came true get reinforced and strengthened.

So a neuron needs to make predictions, but it needs to store those predictions somewhere, and obviously it can only store one value if there is just a single layer of cells. So instead of one layer, each particular cell position consists of a column of neurons, and the column actually holds the predictions for that particular cell position over time. Here is how this works. Think about the sparse pattern consisting of 40 active bits out of 2,000, and suppose that there are ten cells per column. Then there are ten to the 40 ways to represent the same input in different contexts, and ten to the 40 is a large number. Each context corresponds to a particular set of neighboring cells firing; which set it is depends on which synapses along the dendrite segments are capturing that context. Depending on which context the cell is firing in, a particular cell in the column gets activated, not just the lowest one but that particular one, and similarly for every cell in the pattern. As a result, this pattern can be stored in ten to the 40 different contexts, and so one is able to remember sequences using this representation. Sequence learning is the second most important part of hierarchical temporal memory.

And finally, what we just saw was only one tiny region of the model of the neocortex of the brain. Each region itself is connected to other regions in a hierarchy. Each region consists of many, many columns of cells, like the 2,000 columns of ten cells each that we just mentioned, and there are many, many regions in the overall model. Each region is activated by bottom-up sensory input, either directly from the measurements taken of a system (a visual system, or whatever one is measuring) or from the previous layer in the hierarchy, as well as by top-down feedback, because every layer is also making predictions, and the predictions go upwards as well as downwards. This is actually a lot like how the brain works.
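As a quick sanity check on those numbers (the 2,000 columns, the 40 active bits, and the ten cells per column are the figures quoted above), the combinatorics can be worked out directly. The snippet below is plain arithmetic, not part of the HTM algorithm itself.

```python
from math import comb

n_columns = 2000       # columns in the region
n_active = 40          # active columns per input (the sparse pattern)
cells_per_column = 10  # cells stacked in each column

# Number of distinct 40-of-2000 sparse patterns the columns alone can represent.
patterns = comb(n_columns, n_active)

# For one fixed pattern of 40 active columns, each column can express its
# activity through any one of its 10 cells, so the same input can appear
# in 10^40 different temporal contexts.
contexts_per_pattern = cells_per_column ** n_active

print(f"{patterns:.3e} possible 40-of-2000 patterns")
print(f"{contexts_per_pattern:.3e} contexts for any one pattern")
```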
In fact, neurological studies have shown that more than 75 percent of the connections go back down towards the senses, as opposed to the 25 percent that come up from the senses. So what one is actually seeing is, to a large extent, what one is imagining; it is not purely bottom-up, data-driven perception. One sees some pixels and interprets them much more strongly, and those downward predictions are far stronger than the upward ones in the actual brain. Hierarchical temporal memory mimics some of these aspects.

The interesting thing is that hierarchical temporal memory, even though it's a neural model, has been shown to be mathematically equivalent to a deep belief network, which is a probabilistic graphical model. Equally interestingly, hierarchical temporal memory is not just an abstract model of how the brain works for purely scientific purposes; it has been shown to work on real applications. For example, the applications that Jeff Hawkins talks about are very much big data analytics applications, where one is dealing with large volumes of data streams picked up from many, many devices all over the web. Hierarchical temporal memory based models are then able to predict future values of a data stream, detect anomalies, and possibly, in the future, control actions based on these models. Some examples are energy pricing, energy demand, product forecasting, machine efficiency, ad network returns, and server loads; all of these have been shown to actually work.

An example that he uses in his talk is the regional energy load during different parts of the day. As you can see, these are weekends and these are weekdays; the blues are the predicted values and the reds are the actual values, and you can see that the predicted value, which it has learned from the data all by itself, is fairly accurate. Think about a linear regression trying to predict this, or even a complicated function f being fitted to predict this kind of time series.

So HTM, in my opinion, represents a fairly interesting area where neural networks are coming back and getting mathematically modeled as deep belief networks, which have shown great promise in many areas of prediction and learning, as we've seen.
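To illustrate the shape of that task, here is a deliberately simplified Python sketch of streaming prediction and anomaly scoring. This is not how HTM works internally: the function name, the window of 48 readings, and the "predict the value seen one cycle ago" rule are my own stand-ins for a crude seasonal model. What it shares with the applications described above is only the contract: a stream of values comes in, and a prediction plus an anomaly score comes out for each new reading.

```python
import math
import random
from collections import deque

def stream_anomaly_scores(values, window=48):
    """Toy online predictor: predict each value as the value from one
    'window' ago, and score each point by how surprising its prediction
    error is compared to recent errors."""
    history = deque(maxlen=window)
    errors = deque(maxlen=window)
    for value in values:
        if len(history) == window:
            predicted = history[0]            # value one full cycle ago
            error = abs(value - predicted)
            baseline = (sum(errors) / len(errors)) if errors else error
            score = error / (baseline + 1e-9)  # >1 means more surprising than usual
            errors.append(error)
            yield predicted, score
        history.append(value)

# Example: a noisy daily cycle with one injected spike as the anomaly.
data = [10 + 5 * math.sin(2 * math.pi * t / 48) + random.gauss(0, 0.3)
        for t in range(480)]
data[300] += 20
for t, (_, score) in enumerate(stream_anomaly_scores(data)):
    if score > 5:
        print(f"t={t + 48}: anomaly score {score:.1f}")
```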
At the same time, the HTM architecture is uniform and very plastic: there are no complicated techniques apart from a very uniform learning system, and yet it is able to learn a wide variety of time-series patterns, much like the brain's plasticity. As many people have found through actual clinical examples, parts of the brain which we all use to see are used by blind people to augment their hearing, and that has been shown through MRI experiments. So the same architecture learning many different types of patterns is really what we're looking for in a future web intelligence architecture, and HTM certainly points the way towards some of these areas.

However, there is something missing, and we'll come to that in the next section. HTM doesn't appear to solve all the problems; in fact, it is far from it. Very important pieces are still missing, they remain open problems, and we'll talk about those.