Prediction, decision, control: lots of techniques, many different ways of stringing them together, and deciding which techniques to use sounds very complicated. And thinking about how to implement a system like a global traffic management system with lots of self-driving cars certainly is extremely complex.

But there's actually a machine which is doing many more complex things every day, and there are about seven billion of them on this planet already. Obviously, I'm talking about the human brain. Not only does it do all the tasks that we have discussed in this course, it does them with a fairly uniform architecture.

This plastic, fairly uniform architecture of the brain looks something like this. There are a whole bunch of neurons, about a hundred billion neurons in the brain. If you look at the topmost layer of the brain, a few millimeters thick, it's called the neocortex, which is believed to be responsible for much of our conscious thought. The way it's organized is in bunches of neurons arranged in columns, so there's a vertical arrangement of neurons, and these neurons have connections vertically as well as horizontally across columns. That's very important; that's one element of the structure.

Each neuron, on the other hand, looks something like this. Most of the white matter in one's brain consists of the connections between neurons; the neuron bodies themselves make up only a small fraction of that, the gray matter in the brain. The connections between neurons look something like this: these are called dendrites, and this is the axon. A dendrite connects to other neurons via synapses. Synapses, we have learned in the past decade, form very rapidly between dendrites and axons which are close to each other; they can form in minutes and can also decay over time quite rapidly. So that's another important feature of the brain.

Hierarchical temporal memory, which is what we're going to talk about, is a model of the brain propounded by Jeff Hawkins, and it uses an abstract model of the neuron. He talks about this in a recent talk at the International Symposium on Computer Architecture, funnily enough just this June.
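To make the column-and-synapse picture a bit more concrete, here is a minimal Python sketch of such a structure; the class names and the permanence update rule are assumptions of this illustration, not the actual HTM model. It just captures the two ideas above: cells grouped into columns, and synapses whose strength can grow quickly and decay over time.

```python
# Toy sketch (assumed details, not the real HTM code): cells arranged in
# columns, with synapses whose "permanence" can strengthen quickly and decay.

from dataclasses import dataclass, field

@dataclass
class Synapse:
    target_cell: int          # index of the cell this synapse connects to
    permanence: float = 0.2   # connection strength in [0, 1] (assumed scale)

    def reinforce(self, amount: float = 0.1) -> None:
        # Synapses strengthen rapidly when the connected cells are active
        # together (rough analogue of "forming in minutes").
        self.permanence = min(1.0, self.permanence + amount)

    def decay(self, amount: float = 0.01) -> None:
        # Unused synapses weaken over time and may effectively disappear.
        self.permanence = max(0.0, self.permanence - amount)

@dataclass
class Cell:
    synapses: list = field(default_factory=list)   # vertical and lateral connections

@dataclass
class Column:
    cells: list = field(default_factory=list)      # the vertical arrangement of cells

# A tiny "sheet" of cortex: columns side by side, each with a few cells.
sheet = [Column(cells=[Cell() for _ in range(4)]) for _ in range(10)]
```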
So the model that he uses is an abstract model of the neuron which looks a lot like a neural element in a neural network. However, it has some very important differences, and we'll explain this structure in the next few minutes as we go along.

The first important feature of hierarchical temporal memory is that it relies on sparse representations, in particular sparse distributed representations. These are closely related to the sparse distributed memory that we discussed way back during the Locke lecture. Remember the properties of very long bit sequences of zeros and ones: in particular, if we have patterns a thousand bits in length, we learned that there is a very low chance that two random patterns differ in fewer than 450 places. So most of them are far apart in this sense; most patterns chosen at random are far apart.

Now, consider special types of patterns. This time we'll take 2,000 bits, because that's the example that Jeff Hawkins uses in his lecture. So we have a pattern of 2,000 bits, but it's forced to have only 40 ones: only two percent of the bits are ones, and we force that in a particular way, which we'll describe shortly.

If we do this, let's see what happens. There's a very low chance of a random sparse pattern, that is, another random pattern with only 40 ones, matching a significant number of these 40 ones. Just imagine: if twenty of these ones have to match, the chance is something like one in 2,000 multiplied by itself twenty times, an astronomically small number.

Even if we drop all but ten random positions out of these 40, the same thing holds. Say we have a pattern which has 40 ones, but we decide to retain only ten of them at random, and think about another sparse pattern of 40 ones to start with. The chance that, after dropping all but ten from that pattern and all but ten from our first pattern, any of these ten match is again very small. Again, it's a one-in-2,000 kind of probability multiplied by itself many times, and you can work that out. Please try to work it out, because I might even ask a question on this at some point. It's fairly simple arithmetic, not even as complex as what we did for the zero-one sequences in sparse memory.

Now, this particular feature is exploited by hierarchical temporal memory in the following way.
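As a rough sketch of this arithmetic (an illustration using standard hypergeometric counting rather than the quick one-in-2,000 figure quoted above, with the 2,000-bit and 40-one parameters taken from the example), the following Python snippet computes how likely two independently chosen sparse patterns are to overlap heavily, and how likely an unrelated pattern is to cover a ten-bit subsample. The exact constants differ from the back-of-the-envelope figure, but the conclusion is the same: the probabilities are astronomically small.

```python
# Back-of-the-envelope check of the sparse-pattern argument:
# 2,000-bit patterns with exactly 40 ones; how likely is a large overlap
# between two such patterns chosen independently at random?
from math import comb

N, W = 2000, 40          # pattern length and number of ones (from the lecture)

def prob_overlap_at_least(k, n=N, w=W):
    """P(two random w-of-n sparse patterns share >= k one-positions).
    Exact hypergeometric sum, no simulation needed."""
    total = comb(n, w)
    return sum(comb(w, j) * comb(n - w, w - j) for j in range(k, w + 1)) / total

def prob_cover_at_least(k, kept=10, n=N, w=W):
    """P(a random w-of-n pattern covers >= k of a fixed set of `kept` positions),
    i.e. the subsampling case where only 10 of the 40 ones are retained."""
    total = comb(n, w)
    return sum(comb(kept, j) * comb(n - kept, w - j) for j in range(k, kept + 1)) / total

print(prob_overlap_at_least(20))   # ~1e-23: astronomically unlikely
print(prob_overlap_at_least(10))   # still vanishingly small, on the order of 1e-9
print(prob_cover_at_least(10))     # all 10 retained bits matched by chance: ~3e-18
```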
Consider a bunch of neurons in a sheet of neurons in the brain, say, or in a model of the brain, and say these neurons are being activated by light intensity in a particular pattern. Obviously, a light-intensity pattern won't have just two percent of it bright; many more pixels will be bright in that kind of an image. What the sparse representation does is choose roughly the 40 brightest, although not necessarily the 40 brightest in a strict sense: if a set of neurons is bright in one area, they will be inhibited by neurons which are bright nearby, so only one of the neurons out of a bunch which are all bright will end up firing. By forcing neurons to turn off because their neighborhood is also equally bright, we get a sparse representation for an otherwise fairly dense image.

Now, it's not clear why this is important right now, but it'll become clearer in a few minutes. What happens is this: say a particular scene produced a particular sparse pattern. If you view a similar scene, it will give a very similar sparse pattern, even after subsampling. So even after we get rid of 30 of the 40 ones, we'll still get a similar sparse pattern from two different instances of the same scene, or even from similar scenes. However, if you see completely different scenes, the chance that the sparse patterns we get match in a large number of positions is very small.

That's the very important part of these kinds of representations. The key is the number of bits that we start with, the fact that we choose only a small number of them as ones, and that we randomly drop some of them each time. All these things put together make it very unlikely that dissimilar scenes will produce the same sparse patterns, but similar scenes will very likely give the same sparse patterns, or sparse patterns that match each other.
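Here is a small Python sketch of this local-inhibition idea, with assumed details (the neighborhood size, the one-winner-per-neighborhood rule, and the noise model are choices made for the illustration, not the actual HTM mechanism). Within each neighborhood only the brightest unit fires, which turns a dense brightness pattern into a two-percent-sparse one; a slightly perturbed version of the same scene then produces a heavily overlapping sparse code, while an unrelated scene almost never does.

```python
# Illustrative sketch (assumed details, not the HTM algorithm itself):
# local inhibition turns a dense brightness pattern into a sparse one,
# and similar scenes end up with heavily overlapping sparse patterns.
import numpy as np

rng = np.random.default_rng(0)

def sparsify(image, block=50):
    """Within each block of `block` units, only the brightest unit fires."""
    out = np.zeros_like(image, dtype=bool)
    for start in range(0, len(image), block):
        chunk = slice(start, start + block)
        out[start + np.argmax(image[chunk])] = True
    return out

scene = rng.random(2000)                       # a dense "brightness" pattern
similar = scene + 0.02 * rng.random(2000)      # the same scene, slightly perturbed
different = rng.random(2000)                   # an unrelated scene

a, b, c = sparsify(scene), sparsify(similar), sparsify(different)
print(a.sum())         # 40 active units out of 2,000 (2% sparsity)
print((a & b).sum())   # large overlap: similar scenes give similar sparse codes
print((a & c).sum())   # tiny overlap (typically 0 or 1): different scenes rarely collide
```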