1 00:00:06,000 --> 00:00:10,900 Now we're going to look at another form of data called an arbitrary arity tree. 2 00:00:10,900 --> 00:00:15,125 And as we'll see, this form of data is characterized by having two cycles in the 3 00:00:15,125 --> 00:00:20,068 type comments. It'll be the most complex form of data 4 00:00:20,068 --> 00:00:24,123 we've seen so far. But don't worry, a pretty incredible 5 00:00:24,123 --> 00:00:28,792 thing is going to happen here. Because of everything we've learned about 6 00:00:28,792 --> 00:00:32,574 type comments, and references, and templates, and trusting the natural 7 00:00:32,574 --> 00:00:36,418 recursion, desinging functions that operate on arbitrary arity trees is 8 00:00:36,418 --> 00:00:42,930 actually going to be very simple. It's only to be a very little bit harder. 9 00:00:42,930 --> 00:00:44,830 Than designing functions that operate on lists. 10 00:00:46,860 --> 00:00:49,250 That's what you'll see, as we work through these next couple of videos. 11 00:00:51,070 --> 00:00:54,840 So here's what I mean by file system, and again this could just as well be a photo 12 00:00:54,840 --> 00:00:58,552 organizer or something like that, but when I look at the files I'm organizing 13 00:00:58,552 --> 00:01:02,660 for this class. There's a desktop. 14 00:01:02,660 --> 00:01:06,244 On my desktop there's these folders, and there's a folder called Mook, and in 15 00:01:06,244 --> 00:01:09,772 there there's a folder called Current, and in there there's a whole bunch of 16 00:01:09,772 --> 00:01:15,450 other folder including one called Starters, and then there's week three. 17 00:01:15,450 --> 00:01:20,775 And here is all the starters that you had a little earlier in the course. 18 00:01:20,775 --> 00:01:24,991 So what we need to do is to come up with one or more data definitions, that can 19 00:01:24,991 --> 00:01:29,869 help us represent an information structure like this. 20 00:01:31,140 --> 00:01:35,126 We're going to simplify it a little bit. We're not going to worry about the actual 21 00:01:35,126 --> 00:01:38,730 contents of the files, we're just going to worry about a directory 22 00:01:38,730 --> 00:01:43,310 structure like this, that has files in it. 23 00:01:43,310 --> 00:01:46,958 And for the actual contents of the files, let's just assume they hold a single 24 00:01:46,958 --> 00:01:50,570 number, a single integer is all each file holds. 25 00:01:51,740 --> 00:01:55,304 If I redraw it this way then it looks like, or at least to a computer 26 00:01:55,304 --> 00:02:00,584 scientist, it looks like a tree. This is an arbitrary arity tree and I'll, 27 00:02:00,584 --> 00:02:05,080 I'll say why it's that in a minute. First let me say why it looks to me like 28 00:02:05,080 --> 00:02:07,811 a tree. It's because computer scientists draw 29 00:02:07,811 --> 00:02:10,506 trees upside down. We're kind of drawing the roots or 30 00:02:10,506 --> 00:02:14,206 something. And let me just point something out about 31 00:02:14,206 --> 00:02:17,055 this. What I'm going to say about this tree, or 32 00:02:17,055 --> 00:02:22,870 this model of an arbitrary area tree, is that each element can have sub elements. 33 00:02:22,870 --> 00:02:27,095 So for example, mooc has a sub element or a sub folder current, and current has a 34 00:02:27,095 --> 00:02:32,542 number of sub folders. And Starters has sub-folders. 35 00:02:32,542 --> 00:02:37,340 But Aisle-Starter just has data. It's a file that just has data. 36 00:02:37,340 --> 00:02:40,298 And to simplify things for now we're going to say that the only data it can 37 00:02:40,298 --> 00:02:44,264 have is just a single integer. We're not going to model storing the 38 00:02:44,264 --> 00:02:50,440 actual contents of the files in here. Now why is it called arbitrary arity? 39 00:02:50,440 --> 00:02:54,710 Well unlike lists which were arbitrarily long in one dimension or arbitrarily wide 40 00:02:54,710 --> 00:02:58,248 in one dimension, this arbitrary arity tree is arbitrary sized in two 41 00:02:58,248 --> 00:03:03,056 dimensions. And given folder or directory like w03 42 00:03:03,056 --> 00:03:07,270 can have an arbitrary number of elements in it. 43 00:03:07,270 --> 00:03:11,046 Current has a number of them, starters has a number of them, w03 has a number of 44 00:03:11,046 --> 00:03:15,450 them. So it can be arbitrarily wide. 45 00:03:15,450 --> 00:03:18,480 In addition, the structure can be arbitrarily deep. 46 00:03:18,480 --> 00:03:23,705 This particular one is one, two, three, four, five deep but it could be deeper. 47 00:03:23,705 --> 00:03:27,658 Remember arbitrary doesn't mean large and it doesn't mean random, it just means 48 00:03:27,658 --> 00:03:31,930 that before we start out, we don't know how big it will be. 49 00:03:31,930 --> 00:03:35,950 Now, in order to deal with this arbitrary size in two dimensions, structure that 50 00:03:35,950 --> 00:03:39,446 this has. The key thing that we're about to see, is 51 00:03:39,446 --> 00:03:43,976 that it's going to require two cycles in the type reference graph. 52 00:03:43,976 --> 00:03:47,920 Remember, in order to have an arbitrarily-sized list, we had one self 53 00:03:47,920 --> 00:03:51,730 reference cycle in the type reference graph. 54 00:03:51,730 --> 00:03:53,733 Here we're going to see that we end up having two of them. 55 00:03:55,540 --> 00:04:01,020 So now let's turn to DrRacket and start looking at that. 56 00:04:01,020 --> 00:04:05,484 Now I'm in fs-starter.rkt, and what I have here is the beginning of two data 57 00:04:05,484 --> 00:04:12,060 definitions that can be used to represent arbitrary [UNKNOWN] like this one. 58 00:04:13,270 --> 00:04:17,428 So first is the structure definition, elt, which has three fields: main, data, 59 00:04:17,428 --> 00:04:20,066 and subs. And of course we have to keep reading to 60 00:04:20,066 --> 00:04:23,671 know what that's about. And there's a type element and it says, 61 00:04:23,671 --> 00:04:28,696 element is a make element, string integer, list of element. 62 00:04:28,696 --> 00:04:32,433 An element is an element in the file system. 63 00:04:32,433 --> 00:04:36,525 So that's going to be something like from our picture [UNKNOWN] or current or 64 00:04:36,525 --> 00:04:41,798 lecture exercises or starters. And what I'm saying here, because we said 65 00:04:41,798 --> 00:04:46,334 in the analysis that an element could either have sub-elements, or it could 66 00:04:46,334 --> 00:04:50,139 have data. I'm saying here that there's data which 67 00:04:50,139 --> 00:04:53,665 can be an integer. And there's subs, which is going to be an 68 00:04:53,665 --> 00:04:57,020 arbitrary number of elements, and it says if the data is zero, then subs is 69 00:04:57,020 --> 00:05:02,264 considered to be a list of sub elements. If data is not zero, then subs is 70 00:05:02,264 --> 00:05:06,242 ignored. There are other ways to do this, where, 71 00:05:06,242 --> 00:05:09,714 instead of having an element have both fields, in other words both data and 72 00:05:09,714 --> 00:05:13,430 subs. You make two different kinds of elements. 73 00:05:13,430 --> 00:05:16,200 I thought this was simpler for our purposes for now. 74 00:05:16,200 --> 00:05:20,050 And then, what's list development, well, list development is just an ordinary list 75 00:05:20,050 --> 00:05:23,490 of type like we've seen several times now. 76 00:05:23,490 --> 00:05:27,910 It's just a list of file system elements. Now let's see if this can build up the 77 00:05:27,910 --> 00:05:32,486 structure we want. So what I'm saying here, is f1 is an 78 00:05:32,486 --> 00:05:36,590 element called f one. It has a nonzero data. 79 00:05:36,590 --> 00:05:41,954 So its like an ordinary file. That's why I called it f and its subs are 80 00:05:41,954 --> 00:05:45,641 supposed to be empty. Its subs are ignored basically so I just 81 00:05:45,641 --> 00:05:49,546 put empty. So that's F1. 82 00:05:49,546 --> 00:05:52,690 F2 is just like F1. It's another file. 83 00:05:52,690 --> 00:05:55,564 It happens to have the contents with the date of 2. 84 00:05:55,564 --> 00:05:59,682 And F3 is the same. Now, what's D4? 85 00:05:59,682 --> 00:06:03,762 Well, D4 has 0 for the data, so that means we're going to pay attention to the 86 00:06:03,762 --> 00:06:08,130 sub. And what subs does it have? 87 00:06:08,130 --> 00:06:11,540 Well, it has two subs, F1 and F2. So that looks like this. 88 00:06:11,540 --> 00:06:21,430 And then, D5 has a single sub, F3, and then D6 has both D4 and D5. 89 00:06:21,430 --> 00:06:25,918 So that little set of examples there. Runs up this directory structure here, 90 00:06:25,918 --> 00:06:29,812 and I think we could see now pretty clearly, it looks like we can build an 91 00:06:29,812 --> 00:06:35,055 arbitrary arity tree here. Let's see why that is. 92 00:06:35,055 --> 00:06:40,390 And if we look at list development, list development has a self reference in it. 93 00:06:40,390 --> 00:06:44,410 And that's what's allowing a directory's list of sub elements to be arbitrarily 94 00:06:44,410 --> 00:06:47,435 long. That's what lets this list have F1 and F2 95 00:06:47,435 --> 00:06:52,690 and this one have D4 and D5 and clearly those could be as long as they want. 96 00:06:52,690 --> 00:06:57,485 Now, in addition to that self reference. Notice that there's a reference here from 97 00:06:57,485 --> 00:07:02,046 the middle of the list of element type back to the element type. 98 00:07:02,046 --> 00:07:07,684 We've seen that kind of reference before. But hang on, 'cuz there's something else 99 00:07:07,684 --> 00:07:13,130 going on here. There's also a reference from the element 100 00:07:13,130 --> 00:07:19,328 type down to list development. Now, those two references together form 101 00:07:19,328 --> 00:07:23,280 what's called a mutual references. So we're going to label them, MR. 102 00:07:23,280 --> 00:07:29,570 Because you can go from the definition of list development, up to element. 103 00:07:30,740 --> 00:07:35,717 And the definition of element down to list development, and there's a cycle 104 00:07:35,717 --> 00:07:41,418 there, a mutual reference cycle. So, in addition to the soft reference 105 00:07:41,418 --> 00:07:45,770 cycle, which is letting one directory have an arbitrary number of immediate 106 00:07:45,770 --> 00:07:50,858 subdirectories. The mutual reference cycle between list 107 00:07:50,858 --> 00:07:55,045 development and element, is letting that be arbitrarily deep. 108 00:07:55,045 --> 00:08:00,652 So, D6 can have an arbitrary number of sub-directories, D4 and 5, and each of 109 00:08:00,652 --> 00:08:07,370 those can have an arbitrary number of sub-directories. 110 00:08:07,370 --> 00:08:11,122 And so on, and so on, and so on. And so what you can see here, is that 111 00:08:11,122 --> 00:08:15,802 these two cycles in the type comments, the self reference cycle and the mutual 112 00:08:15,802 --> 00:08:21,120 reference cycle, support an arbitrary arity tree. 113 00:08:21,120 --> 00:08:24,630 A tree that can be arbitrarily deep and arbitrarily wide at any point. 114 00:08:25,700 --> 00:08:30,410 One quick note about these arrows and the way you label mutual reference arrows. 115 00:08:30,410 --> 00:08:33,710 The way I do it is, first I draw all the arrows. 116 00:08:33,710 --> 00:08:35,810 Then I label all the self reference arrows. 117 00:08:35,810 --> 00:08:40,088 Then I put my finger on an arrow, and I start following it around through the 118 00:08:40,088 --> 00:08:43,220 types. And if I ever come back to that same 119 00:08:43,220 --> 00:08:46,575 arrow, then all those arrows along the way were part of a mutual reference 120 00:08:46,575 --> 00:08:50,628 cycle. And then any arrows that aren't part of 121 00:08:50,628 --> 00:08:53,488 the usual reference cycle and aren't part of the self reference cycle are just 122 00:08:53,488 --> 00:08:57,646 ordinary reference arrows. In this case, when there's just two types 123 00:08:57,646 --> 00:09:00,148 it's easy. But sometimes you can have mutual 124 00:09:00,148 --> 00:09:04,240 reference through three or four or five or six or more types. 125 00:09:04,240 --> 00:09:06,150 And then it's a little harder to get the arrows right. 126 00:09:06,150 --> 00:09:10,190 It's very important to get them right. We're going to see it helps us a lot with 127 00:09:10,190 --> 00:09:13,810 the templating that we'll do in the next video.