1 00:00:01,010 --> 00:00:02,860 Welcome to the course Introduction to Databases. 2 00:00:04,000 --> 00:00:05,650 I'm Jennifer Widom from Stanford University. 3 00:00:06,940 --> 00:00:07,830 In this course we'll be learning 4 00:00:08,130 --> 00:00:09,440 about databases and the use 5 00:00:09,760 --> 00:00:11,710 of database management systems, primarily 6 00:00:12,330 --> 00:00:13,580 from the viewpoint of the designer, 7 00:00:14,280 --> 00:00:16,310 user and developer of database applications. 8 00:00:18,120 --> 00:00:20,640 I'm going to start by describing in 9 00:00:21,010 --> 00:00:22,690 one very long sentence what 10 00:00:22,860 --> 00:00:25,450 a database management system provides for applications. 11 00:00:27,550 --> 00:00:28,830 It provides a means of handling large amounts 12 00:00:29,120 --> 00:00:31,120 of data primarily, but let's look at it in a little more detail. 13 00:00:32,160 --> 00:00:33,240 What it provides, in a 14 00:00:33,420 --> 00:00:35,340 long sentence, is efficient, reliable, 15 00:00:36,560 --> 00:00:39,170 convenient and safe multi-user 16 00:00:40,320 --> 00:00:41,800 storage of and access to 17 00:00:41,940 --> 00:00:44,240 massive amounts of persistent data. 18 00:00:45,340 --> 00:00:46,480 So, I'm going to go 19 00:00:46,630 --> 00:00:47,740 into each one of those adjectives in 20 00:00:47,820 --> 00:00:49,620 a little bit more detail in a moment. 21 00:00:49,810 --> 00:00:50,860 But I did want to mention that database 22 00:00:51,290 --> 00:00:53,930 systems are extremely prevalent in the world today. 23 00:00:54,750 --> 00:00:56,110 They sit behind many websites 24 00:00:56,710 --> 00:00:57,990 that will run your banking systems, 25 00:00:58,700 --> 00:01:01,270 your telecommunications, deployments of 26 00:01:01,430 --> 00:01:04,250 sensors, scientific experiments and much, much more. 27 00:01:04,770 --> 00:01:05,190 Highly prevalent. 
28 00:01:05,700 --> 00:01:06,620 So let's talk a little 29 00:01:06,730 --> 00:01:08,130 bit about why database systems are 30 00:01:08,250 --> 00:01:11,360 so popular and so prevalent by looking at these seven adjectives. 31 00:01:13,240 --> 00:01:14,550 The first aspect of database 32 00:01:15,030 --> 00:01:16,100 systems is that they handle 33 00:01:16,560 --> 00:01:17,840 data at a massive scale. 34 00:01:19,010 --> 00:01:20,390 So if you think about 35 00:01:20,670 --> 00:01:21,910 the amount of data that is 36 00:01:22,020 --> 00:01:23,920 being produced today, database systems 37 00:01:24,040 --> 00:01:25,520 are handling terabytes of data, 38 00:01:25,930 --> 00:01:28,290 sometimes even terabytes of data every day. 39 00:01:29,410 --> 00:01:30,100 And one of the critical 40 00:01:30,560 --> 00:01:31,440 aspects is that the data 41 00:01:31,740 --> 00:01:33,440 that's handled by database management systems 42 00:01:33,590 --> 00:01:35,400 is much larger than can 43 00:01:35,590 --> 00:01:37,520 fit in the memory of a typical computing system. 44 00:01:38,380 --> 00:01:39,470 So memories are indeed growing 45 00:01:39,750 --> 00:01:41,220 very, very fast, but the 46 00:01:41,430 --> 00:01:42,350 amount of data in the world 47 00:01:42,730 --> 00:01:43,720 and data to be handled by 48 00:01:43,850 --> 00:01:45,460 database systems is growing much faster. 49 00:01:46,430 --> 00:01:48,100 So database systems are 50 00:01:48,210 --> 00:01:51,520 designed to handle data that is residing outside of memory. 51 00:01:52,760 --> 00:01:54,280 Secondly, the data that's 52 00:01:54,450 --> 00:01:56,670 handled by database management systems is typically persistent. 53 00:01:58,090 --> 00:01:59,080 And what I mean by that is 54 00:01:59,160 --> 00:02:00,060 that the data in the database 55 00:02:00,660 --> 00:02:03,210 outlives the programs that execute on that data. 
56 00:02:04,180 --> 00:02:06,130 So if you run 57 00:02:06,350 --> 00:02:07,850 a typical computer program the program 58 00:02:08,350 --> 00:02:11,160 will start, the variables will be created. 59 00:02:11,750 --> 00:02:13,020 There will be data that's operated on 60 00:02:13,140 --> 00:02:15,760 by the program, the program will finish and the data will go away. 61 00:02:16,940 --> 00:02:17,630 It's sort of the other way with databases. 62 00:02:18,230 --> 00:02:19,550 The data is what sits there 63 00:02:20,150 --> 00:02:21,300 and then the program will start 64 00:02:21,700 --> 00:02:22,630 up, it will operate on the 65 00:02:22,690 --> 00:02:25,160 data, the program will stop and the data will still be there. 66 00:02:25,390 --> 00:02:27,360 Very often actually multiple programs 67 00:02:27,850 --> 00:02:29,220 will be operating on the same data. 68 00:02:31,020 --> 00:02:31,380 Next, safety. 69 00:02:32,700 --> 00:02:34,300 So database systems, since 70 00:02:34,490 --> 00:02:36,080 they run critical applications such as 71 00:02:36,170 --> 00:02:37,820 telecommunications and banking systems, 72 00:02:39,110 --> 00:02:40,500 have to have guarantees that 73 00:02:40,560 --> 00:02:41,660 the data managed by the system 74 00:02:42,120 --> 00:02:43,050 will stay in a consistent 75 00:02:44,040 --> 00:02:45,330 state, it won't be lost or 76 00:02:45,470 --> 00:02:46,850 overwritten when there are 77 00:02:47,190 --> 00:02:49,210 failures, and there can be hardware failures. 78 00:02:50,210 --> 00:02:51,490 There can be software failures. 79 00:02:53,560 --> 00:02:54,540 Even simple power outages. 80 00:02:55,560 --> 00:02:57,140 You don't want your bank 81 00:02:57,410 --> 00:02:58,560 balance to change because the 82 00:02:58,780 --> 00:03:00,070 power went out at your bank branch. 83 00:03:00,840 --> 00:03:01,910 And of course there is the problem 84 00:03:02,300 --> 00:03:04,450 of malicious users that may try to corrupt data. 
85 00:03:05,160 --> 00:03:06,240 So database systems have a 86 00:03:06,480 --> 00:03:08,130 number of built in mechanisms that 87 00:03:08,270 --> 00:03:09,530 ensure that the data remains consistent, 88 00:03:10,150 --> 00:03:11,270 regardless of what happens. 89 00:03:12,880 --> 00:03:14,810 Next multi-user. So I 90 00:03:14,980 --> 00:03:18,160 mentioned that multiple programs may operate on the same database. 91 00:03:18,710 --> 00:03:20,290 And even with one program operating 92 00:03:20,760 --> 00:03:22,280 on a database, that program may 93 00:03:22,440 --> 00:03:23,710 allow many different users or 94 00:03:23,780 --> 00:03:25,960 applications to access the data concurrently. 95 00:03:27,270 --> 00:03:28,430 So when you have 96 00:03:28,650 --> 00:03:30,210 multiple applications working on 97 00:03:30,350 --> 00:03:31,640 the same data, the system 98 00:03:32,030 --> 00:03:33,580 has to have some mechanisms, again, 99 00:03:33,890 --> 00:03:35,580 to ensure that the data stays consistent. 100 00:03:36,530 --> 00:03:37,370 That you don't have, for example, 101 00:03:37,850 --> 00:03:38,930 half of a data item 102 00:03:39,480 --> 00:03:40,840 overwritten by one person and 103 00:03:41,390 --> 00:03:42,690 the other half overwritten by another. 104 00:03:43,240 --> 00:03:44,770 So there's mechanisms in database 105 00:03:45,270 --> 00:03:46,690 systems called concurrency control. 106 00:03:48,220 --> 00:03:49,660 And the idea there is 107 00:03:49,850 --> 00:03:52,910 that we control the way multiple users access the database. 108 00:03:53,480 --> 00:03:55,220 Now we don't control it by 109 00:03:55,400 --> 00:03:56,820 only having one user have 110 00:03:57,190 --> 00:03:58,320 exclusive access to the database 111 00:03:58,800 --> 00:04:00,480 or the performance would slow down considerably. 112 00:04:01,480 --> 00:04:03,070 So the control actually occurs at 113 00:04:03,150 --> 00:04:04,750 the level of the data items in the database. 
114 00:04:05,300 --> 00:04:06,540 So many users might be operating 115 00:04:07,030 --> 00:04:08,930 on the same database but be 116 00:04:09,030 --> 00:04:10,870 operating on different individual data items. 117 00:04:11,230 --> 00:04:12,190 It's a little bit similar 118 00:04:12,600 --> 00:04:14,290 to, say, file system concurrency or 119 00:04:14,380 --> 00:04:15,840 even variable concurrency in programs, 120 00:04:16,660 --> 00:04:19,120 except it's more centered around the data itself. 121 00:04:21,220 --> 00:04:23,860 The next adjective is convenience, and 122 00:04:24,340 --> 00:04:25,940 convenience is actually one of the 123 00:04:26,350 --> 00:04:27,800 critical features of database systems. 124 00:04:28,530 --> 00:04:29,650 They really are designed to make 125 00:04:29,770 --> 00:04:30,700 it easy to work with large 126 00:04:31,060 --> 00:04:31,940 amounts of data and to 127 00:04:32,040 --> 00:04:34,840 do very powerful and interesting processing on that data. 128 00:04:35,910 --> 00:04:38,290 So there's a couple levels at which that happens. 129 00:04:39,170 --> 00:04:41,880 There's a notion in databases called Physical Data Independence. 130 00:04:44,130 --> 00:04:45,150 It's kind of a mouthful, but 131 00:04:45,300 --> 00:04:46,560 what that's saying is that 132 00:04:47,280 --> 00:04:48,600 the way that data is actually 133 00:04:49,230 --> 00:04:51,140 stored and laid out on 134 00:04:51,390 --> 00:04:53,370 disk is independent of the 135 00:04:53,440 --> 00:04:55,790 way that programs think about the structure of the data. 136 00:04:56,570 --> 00:04:57,610 So you could have a program that 137 00:04:57,940 --> 00:04:59,540 operates on a database and 138 00:04:59,680 --> 00:05:00,640 underneath there could be a 139 00:05:00,680 --> 00:05:02,050 complete change in the 140 00:05:02,120 --> 00:05:03,780 way the data is stored, yet 141 00:05:04,070 --> 00:05:05,680 the program itself would not have to be changed. 
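As a small illustration of physical data independence, here is a minimal sketch using Python's built-in sqlite3 module. The student table, its sample rows and the index name student_gpa_idx are assumptions invented for this example, not something from the lecture. The same query text runs before and after the storage layout gains an index, and the program-visible result does not change.

```python
import sqlite3

# Hypothetical example data: a tiny in-memory student table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Amy", 3.9), (2, "Bob", 3.4), (3, "Cal", 3.7)],
)

# The program only ever knows this query text.
QUERY = "SELECT name FROM student WHERE gpa > 3.5 ORDER BY name"


def high_gpa_names(connection):
    """Run the unchanged query and return the matching names."""
    return [row[0] for row in connection.execute(QUERY)]


before = high_gpa_names(conn)

# Change how the data is physically organized by adding an index on gpa.
conn.execute("CREATE INDEX student_gpa_idx ON student (gpa)")

after = high_gpa_names(conn)

print(before, after)
```

Adding an index is only one kind of physical change; the point of the sketch is that the function high_gpa_names needs no edits when it happens.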
142 00:05:06,000 --> 00:05:07,200 So the operations on the 143 00:05:07,270 --> 00:05:10,020 data are independent from the way the data is laid out. 144 00:05:11,220 --> 00:05:12,390 And somewhat related to 145 00:05:12,480 --> 00:05:14,370 that is the notion of high level query languages. 146 00:05:15,830 --> 00:05:17,630 So, the databases are 147 00:05:17,710 --> 00:05:19,580 usually queried by languages 148 00:05:20,280 --> 00:05:23,210 that are relatively compact 149 00:05:23,680 --> 00:05:24,590 to describe, really at a 150 00:05:24,620 --> 00:05:27,090 very high level what information you want from the database. 151 00:05:28,040 --> 00:05:31,670 Specifically, they obey a 152 00:05:31,710 --> 00:05:33,420 notion that's called declarative, and what 153 00:05:33,970 --> 00:05:35,570 declarative is saying is that 154 00:05:36,120 --> 00:05:37,350 in the query, you describe 155 00:05:37,740 --> 00:05:38,720 what you want out of the 156 00:05:38,780 --> 00:05:39,880 database but you don't need 157 00:05:40,160 --> 00:05:42,590 to describe the algorithm to 158 00:05:42,630 --> 00:05:44,230 get the data out, and that's a really nice feature. 159 00:05:44,610 --> 00:05:45,640 It allows you to write queries in 160 00:05:45,710 --> 00:05:46,960 a very simple way, and then 161 00:05:47,130 --> 00:05:48,470 the system itself will find 162 00:05:48,870 --> 00:05:50,880 the algorithm to get that data out efficiently. 163 00:05:52,030 --> 00:05:54,360 And speaking of efficiency, that's 164 00:05:54,910 --> 00:05:55,860 number six, but certainly not 165 00:05:56,160 --> 00:05:58,570 sixth in importance. 
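To make the declarative idea concrete, the sketch below contrasts a hand-written loop with a SQL query that asks for the same thing. It uses Python's built-in sqlite3 module; the student table and its rows are assumptions invented for this illustration. In the loop we spell out the algorithm ourselves, while in the query we only state what we want and leave the choice of algorithm to the system.

```python
import sqlite3

# Hypothetical sample records: (id, name, gpa).
rows = [(1, "Amy", 3.9), (2, "Bob", 3.4), (3, "Cal", 3.7)]


def top_student_procedural(records):
    """Procedural version: we describe HOW to scan for the best GPA."""
    best = None
    for record in records:
        if best is None or record[2] > best[2]:
            best = record
    return best[1]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT, gpa REAL)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", rows)

# Declarative version: we describe WHAT we want, not how to find it.
top_student_declarative = conn.execute(
    "SELECT name FROM student ORDER BY gpa DESC LIMIT 1"
).fetchone()[0]

procedural_result = top_student_procedural(rows)
print(procedural_result, top_student_declarative)
```

Both versions name the same student here; the difference is who is responsible for picking the access strategy.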
There's in 166 00:05:59,190 --> 00:06:00,100 real estate as a little 167 00:06:00,240 --> 00:06:01,790 aside here, an old saying 168 00:06:02,270 --> 00:06:03,080 that when you have a piece of 169 00:06:03,150 --> 00:06:04,730 property, the most important three 170 00:06:05,040 --> 00:06:06,710 aspects of the property are 171 00:06:06,770 --> 00:06:09,320 the location of the property, the location and the location. 172 00:06:10,940 --> 00:06:11,860 And people say the same 173 00:06:12,210 --> 00:06:13,220 thing about databases, a similar 174 00:06:13,610 --> 00:06:14,830 parallel joke, which is that the 175 00:06:15,100 --> 00:06:16,920 three most important things in 176 00:06:17,510 --> 00:06:19,200 a database system are first 177 00:06:19,570 --> 00:06:22,120 performance, second performance and again performance. 178 00:06:23,080 --> 00:06:24,630 So database systems have 179 00:06:24,900 --> 00:06:26,970 to do really thousands of queries 180 00:06:28,030 --> 00:06:29,130 or updates per second. 181 00:06:31,360 --> 00:06:33,430 These are not simple queries necessarily. 182 00:06:34,170 --> 00:06:35,520 These may be very complex operations. 183 00:06:36,990 --> 00:06:38,750 So, constructing a 184 00:06:39,070 --> 00:06:39,890 database system that can execute 185 00:06:40,920 --> 00:06:42,320 queries, complex queries, at that 186 00:06:42,510 --> 00:06:44,480 rate, over gigantic amounts of 187 00:06:44,550 --> 00:06:45,920 data, terabytes of data, is no 188 00:06:46,150 --> 00:06:47,350 simple task, and that is 189 00:06:47,480 --> 00:06:49,380 one of the major features also provided 190 00:06:49,800 --> 00:06:51,480 by a database management system. 191 00:06:51,950 --> 00:06:54,700 And lastly, but again not last in importance, is reliability. 
192 00:06:55,880 --> 00:06:56,800 Again, looking back at say 193 00:06:56,950 --> 00:06:57,990 your banking system or your telecommunications 194 00:06:58,940 --> 00:07:00,340 system, it's critically important 195 00:07:00,830 --> 00:07:02,450 that those are up all the time. 196 00:07:03,240 --> 00:07:06,470 So 99.99999% uptime 197 00:07:07,120 --> 00:07:08,560 is the type of guarantee that 198 00:07:08,680 --> 00:07:11,290 database management systems are making for their applications. 199 00:07:13,350 --> 00:07:14,120 So that gives us an idea 200 00:07:14,470 --> 00:07:16,960 of all the terrific things that a database system provides. 201 00:07:17,390 --> 00:07:18,490 I hope you're already convinced that 202 00:07:18,990 --> 00:07:20,980 if you have an application you 203 00:07:21,080 --> 00:07:22,680 want to build that involves data, it 204 00:07:22,830 --> 00:07:23,720 would be great to have all 205 00:07:23,970 --> 00:07:26,010 of these features provided for you in a database system. 206 00:07:27,370 --> 00:07:28,710 Now let me mention a few 207 00:07:28,990 --> 00:07:30,470 of the aspects surrounding database 208 00:07:30,730 --> 00:07:31,680 systems and scope a little 209 00:07:31,800 --> 00:07:33,580 bit what we're going to be covering in this course. 210 00:07:34,670 --> 00:07:36,000 When people build database applications, 211 00:07:37,020 --> 00:07:39,220 sometimes they program them with what's known as a framework. 212 00:07:40,170 --> 00:07:41,160 Currently at the time of 213 00:07:41,300 --> 00:07:42,240 this video, some of the 214 00:07:42,350 --> 00:07:43,630 popular frameworks are Django 215 00:07:44,270 --> 00:07:45,840 or Ruby on Rails, and these 216 00:07:46,100 --> 00:07:47,560 are environments that help you 217 00:07:48,360 --> 00:07:49,590 develop your programs, and help 218 00:07:49,860 --> 00:07:50,850 you generate, say, the calls 219 00:07:51,290 --> 00:07:52,920 to the database system. 
We're 220 00:07:53,120 --> 00:07:54,350 not, in this set of 221 00:07:54,420 --> 00:07:55,250 videos, going to be talking 222 00:07:55,540 --> 00:07:56,760 about the frameworks, but rather we're 223 00:07:56,870 --> 00:07:57,880 going to be talking about the database 224 00:07:58,140 --> 00:08:00,350 system itself and how it is used and what it provides. 225 00:08:02,060 --> 00:08:03,630 Second of all, database systems are 226 00:08:04,100 --> 00:08:06,360 often used in conjunction with what's known as middleware. 227 00:08:07,680 --> 00:08:08,400 Again, at the time of this 228 00:08:08,550 --> 00:08:09,940 video, typical middleware might 229 00:08:10,210 --> 00:08:11,910 be application servers, web servers, 230 00:08:12,810 --> 00:08:14,070 so this middleware helps 231 00:08:14,590 --> 00:08:15,990 applications interact with database 232 00:08:16,420 --> 00:08:17,960 systems in certain types of ways. 233 00:08:18,710 --> 00:08:20,620 Again, that's sort of outside the scope of the course. 234 00:08:20,950 --> 00:08:22,710 We won't be talking about middleware in the course. 235 00:08:24,210 --> 00:08:25,510 Finally, it's not the 236 00:08:25,800 --> 00:08:27,470 case that every application that 237 00:08:27,600 --> 00:08:29,090 involves data necessarily uses 238 00:08:29,410 --> 00:08:32,120 a database system, so historically, 239 00:08:33,100 --> 00:08:34,060 a lot of data has been stored 240 00:08:34,330 --> 00:08:36,930 in files, I think that's a little bit less so these days. 241 00:08:37,380 --> 00:08:40,040 Still, there's a lot of data out there that's simply sitting in files. 
242 00:08:40,730 --> 00:08:42,980 Excel spreadsheets is another 243 00:08:43,890 --> 00:08:45,160 domain where there's a lot 244 00:08:45,360 --> 00:08:46,860 of data sitting out there, and 245 00:08:47,010 --> 00:08:49,380 it's useful in certain ways, and the 246 00:08:49,860 --> 00:08:50,930 processing of data is not always 247 00:08:51,180 --> 00:08:54,280 done through query languages associated with database systems. 248 00:08:54,700 --> 00:08:56,600 For example, Hadoop is 249 00:08:56,980 --> 00:08:58,910 a processing framework for running 250 00:08:59,730 --> 00:09:01,350 operations on data that's stored in files. 251 00:09:02,160 --> 00:09:04,080 Again, in this set of 252 00:09:04,170 --> 00:09:05,140 videos we're going to focus 253 00:09:05,570 --> 00:09:07,080 on the database management system 254 00:09:07,260 --> 00:09:08,820 itself and on storing 255 00:09:09,800 --> 00:09:12,220 and operating on data through a database management system. 256 00:09:13,770 --> 00:09:16,250 So there are four key concepts that we're going to cover for now. 257 00:09:16,920 --> 00:09:17,890 The first one is the data model. 258 00:09:18,660 --> 00:09:20,050 The data model is a 259 00:09:20,170 --> 00:09:22,700 description of, in general, how the data is structured. 260 00:09:23,770 --> 00:09:24,610 One of the most common 261 00:09:24,960 --> 00:09:26,210 data models is the relational 262 00:09:26,450 --> 00:09:28,280 data model, we'll spend quite a bit of time on that. 263 00:09:28,490 --> 00:09:29,730 In the relational data model 264 00:09:30,040 --> 00:09:32,440 the data in the database is thought of as a set of records. 
265 00:09:33,880 --> 00:09:35,210 Now another popular way to 266 00:09:35,270 --> 00:09:36,690 store data is for example, 267 00:09:37,050 --> 00:09:38,760 in XML documents, so, an XML 268 00:09:39,000 --> 00:09:40,470 document captures data, instead 269 00:09:40,810 --> 00:09:41,920 of a set of records, as a 270 00:09:42,340 --> 00:09:44,850 hierarchical structure of labeled values. 271 00:09:45,890 --> 00:09:47,460 Another possible data model 272 00:09:47,760 --> 00:09:49,090 would be a graph data model, where 273 00:09:49,230 --> 00:09:51,880 all data in the database is in the form of nodes and edges. 274 00:09:52,770 --> 00:09:53,880 So again, a data model is 275 00:09:54,080 --> 00:09:55,780 telling you the general form of 276 00:09:55,970 --> 00:09:57,370 data that's going to be stored in the database. 277 00:09:58,880 --> 00:10:01,420 Next is the concept of schema versus data. 278 00:10:02,530 --> 00:10:03,510 One can think of this kind 279 00:10:03,980 --> 00:10:06,440 of like types and variables in a programming language. 280 00:10:07,100 --> 00:10:08,960 The schema sets up 281 00:10:09,550 --> 00:10:10,470 the structure of the database. 282 00:10:11,190 --> 00:10:12,350 Maybe I'm going to have information about 283 00:10:12,590 --> 00:10:14,250 students with IDs and 284 00:10:15,150 --> 00:10:16,480 GPAs, or about colleges, 285 00:10:17,640 --> 00:10:18,260 and it's just going to tell 286 00:10:18,430 --> 00:10:19,190 me the structure of the database, 287 00:10:19,670 --> 00:10:20,750 whereas the data is the actual 288 00:10:21,960 --> 00:10:24,010 data stored within the schema. 289 00:10:25,300 --> 00:10:26,180 Again, in a program, you 290 00:10:26,340 --> 00:10:27,090 set up types and then you 291 00:10:27,220 --> 00:10:28,720 have variables of those types, we'll 292 00:10:28,940 --> 00:10:29,720 set up a schema, and then 293 00:10:29,840 --> 00:10:32,180 we will have a whole bunch of data that adheres to that schema. 
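The split between schema and data can be sketched in a few lines with Python's built-in sqlite3 module. The student table with id, name and gpa columns follows the lecture's example of students with IDs and GPAs, but the exact column names, types and sample rows are assumptions made for illustration. The schema is declared once, and afterwards only the data changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: declares the structure, a bit like defining a type.
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL, gpa REAL)"
)

# Data: rows that adhere to the schema, a bit like variables of that type.
conn.executemany(
    "INSERT INTO student (id, name, gpa) VALUES (?, ?, ?)",
    [(1, "Amy", 3.9), (2, "Bob", 3.4)],
)

# The data changes over time; the schema stays as it was.
conn.execute("UPDATE student SET gpa = 3.5 WHERE id = 2")

# Read the schema back: PRAGMA table_info lists one row per column,
# with the column name in position 1.
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]

# Read the data back.
data = conn.execute("SELECT id, name, gpa FROM student ORDER BY id").fetchall()

print(columns)
print(data)
```

Here the CREATE TABLE statement plays the role of the schema, and the INSERT and UPDATE statements only ever touch the data stored within it.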
294 00:10:32,940 --> 00:10:34,600 Typically the schema is set 295 00:10:34,760 --> 00:10:35,900 up at the beginning, and doesn't change 296 00:10:36,210 --> 00:10:37,650 very much, whereas the data changes rapidly. 297 00:10:39,790 --> 00:10:40,660 Now to set up the schema, 298 00:10:41,050 --> 00:10:43,550 one normally uses what's known as a data definition language. 299 00:10:44,830 --> 00:10:45,980 Sometimes people use higher level design 300 00:10:46,480 --> 00:10:47,820 tools that help them think 301 00:10:48,010 --> 00:10:49,320 about the design and then from 302 00:10:49,560 --> 00:10:51,120 there go to the data definition language. 303 00:10:52,360 --> 00:10:53,600 But it's used in general to set up 304 00:10:53,980 --> 00:10:56,000 a schema or structure for a particular database. 305 00:10:57,140 --> 00:10:58,320 Once the schema has been set up 306 00:10:58,650 --> 00:11:00,090 and data has been loaded, then 307 00:11:00,250 --> 00:11:01,260 it's possible to start querying 308 00:11:01,720 --> 00:11:03,180 and modifying the data and 309 00:11:03,300 --> 00:11:04,590 that's typically done with what's 310 00:11:04,830 --> 00:11:06,710 known as the data manipulation language, 311 00:11:07,650 --> 00:11:09,430 so for querying and modifying the database. 312 00:11:15,460 --> 00:11:16,250 Okay, so those are some key concepts, 313 00:11:16,830 --> 00:11:17,580 certainly we're going to get into 314 00:11:17,650 --> 00:11:20,080 much more detail in later videos about each of these concepts. 315 00:11:21,400 --> 00:11:22,130 Now let's talk about the 316 00:11:22,270 --> 00:11:24,920 people that are involved in a database system. So 317 00:11:25,150 --> 00:11:26,220 the first person we'll mention 318 00:11:26,620 --> 00:11:27,950 is the person who implements the 319 00:11:28,010 --> 00:11:29,790 database system itself, the database implementer. 
320 00:11:31,060 --> 00:11:32,190 That's the person who builds the 321 00:11:32,270 --> 00:11:35,640 system, that's not going to be the focus of this course. 322 00:11:35,990 --> 00:11:37,200 We're going to be focusing more on 323 00:11:37,390 --> 00:11:38,230 the types of things that are 324 00:11:38,310 --> 00:11:40,640 done by the other three people that I'm going to describe. 325 00:11:41,620 --> 00:11:42,870 The next one is the database designer. 326 00:11:43,800 --> 00:11:45,190 So the database designer is the 327 00:11:45,260 --> 00:11:46,900 person who establishes the schema 328 00:11:47,700 --> 00:11:48,080 for a database. 329 00:11:48,930 --> 00:11:50,390 So, let's suppose we have an application. 330 00:11:51,090 --> 00:11:51,850 We know there's going to be a 331 00:11:51,960 --> 00:11:53,070 lot of data involved in the 332 00:11:53,450 --> 00:11:54,540 application and we want to 333 00:11:54,680 --> 00:11:55,490 figure out how we are gonna structure 334 00:11:55,920 --> 00:11:56,900 that data before we build 335 00:11:57,120 --> 00:11:59,470 the application. That's the job of the database designer. 336 00:11:59,930 --> 00:12:01,480 It's a surprisingly difficult job 337 00:12:01,790 --> 00:12:03,030 when you have very complex 338 00:12:03,640 --> 00:12:04,980 data involved in an application. 339 00:12:05,350 --> 00:12:07,460 Once you've established the 340 00:12:07,530 --> 00:12:08,410 structure of the database 341 00:12:08,680 --> 00:12:09,750 then it's time to build the 342 00:12:10,190 --> 00:12:11,520 applications or programs that 343 00:12:11,660 --> 00:12:13,090 are going to run on the 344 00:12:13,150 --> 00:12:14,890 database, often interfacing between 345 00:12:15,290 --> 00:12:16,340 the eventual user and the 346 00:12:16,410 --> 00:12:17,950 data itself, and that's 347 00:12:18,020 --> 00:12:19,530 the job of the application developer, 348 00:12:20,030 --> 00:12:22,050 so those are the programs that operate on the database. 
349 00:12:26,500 --> 00:12:27,910 And again I've mentioned already 350 00:12:28,750 --> 00:12:29,680 that you can have a database 351 00:12:29,890 --> 00:12:32,640 with many different programs that operate on it; that's very common. 352 00:12:33,030 --> 00:12:34,430 You might, for example, have a 353 00:12:34,850 --> 00:12:37,030 sales database where some applications 354 00:12:37,710 --> 00:12:39,100 are actually inserting the sales 355 00:12:39,480 --> 00:12:41,520 as they happen, while others are analyzing the sales. 356 00:12:41,890 --> 00:12:43,190 So it's not necessary to have 357 00:12:43,330 --> 00:12:45,280 a one-to-one coupling between programs and databases. 358 00:12:46,870 --> 00:12:48,980 And the last person is the database administrator. 359 00:12:50,090 --> 00:12:51,500 So the database administrator is the 360 00:12:51,550 --> 00:12:52,510 person who loads the data, 361 00:12:53,290 --> 00:12:55,830 sort of gets the whole thing running and keeps it running smoothly. 362 00:12:57,140 --> 00:12:58,960 So, this actually turns 363 00:12:59,210 --> 00:13:00,480 out to be a very important job 364 00:13:00,860 --> 00:13:01,980 for large database applications. 365 00:13:03,090 --> 00:13:04,380 For better or worse, database systems 366 00:13:04,770 --> 00:13:05,760 do tend to have a 367 00:13:06,060 --> 00:13:07,480 number of tuning parameters 368 00:13:07,680 --> 00:13:09,310 associated with them, and getting 369 00:13:09,610 --> 00:13:10,930 those tuning parameters right can 370 00:13:11,010 --> 00:13:12,660 make a significant difference in the 371 00:13:12,980 --> 00:13:14,810 all-important performance of the database system. 
372 00:13:15,760 --> 00:13:17,050 So database administrators are 373 00:13:17,540 --> 00:13:20,210 actually highly valued, very important, highly 374 00:13:20,530 --> 00:13:21,600 paid as a matter of fact, 375 00:13:22,240 --> 00:13:23,900 and are, for large deployments, 376 00:13:24,570 --> 00:13:26,230 an important person in the entire process. 377 00:13:26,930 --> 00:13:28,360 So those are the people that 378 00:13:28,510 --> 00:13:29,600 are involved, again, in this 379 00:13:29,700 --> 00:13:31,520 class we'll be focusing mostly on 380 00:13:31,810 --> 00:13:33,060 designing and developing applications, 381 00:13:33,850 --> 00:13:35,850 a little bit on administration, but in 382 00:13:36,070 --> 00:13:37,810 general thinking about databases and 383 00:13:37,920 --> 00:13:39,440 the use of database management systems 384 00:13:40,010 --> 00:13:42,650 from the perspective of the application builder and user. 385 00:13:43,710 --> 00:13:45,200 To conclude, we're going to 386 00:13:45,240 --> 00:13:46,770 be learning about databases and whether 387 00:13:47,110 --> 00:13:48,030 you know it or not, you're 388 00:13:48,240 --> 00:13:50,130 already using a database every day. 389 00:13:50,600 --> 00:13:51,690 In fact, more likely than not 390 00:13:52,270 --> 00:13:53,730 you're using a database every hour.