1 00:00:00,670 --> 00:00:03,050 In this video, we're going to learn about querying relational databases. 2 00:00:04,080 --> 00:00:06,710 We're not going to focus on a specific query language, we'll do that later. 3 00:00:07,320 --> 00:00:09,710 We're just going to talk about querying relational databases in general. 4 00:00:11,400 --> 00:00:12,280 Let's start by talking about 5 00:00:12,430 --> 00:00:13,540 the basic steps in creating 6 00:00:14,070 --> 00:00:15,280 and using a relational database. 7 00:00:16,700 --> 00:00:17,850 So, by the way, I 8 00:00:17,900 --> 00:00:19,190 should mention that database people 9 00:00:19,590 --> 00:00:20,780 have this habit of drawing databases 10 00:00:21,310 --> 00:00:22,700 and database systems as gigantic 11 00:00:23,440 --> 00:00:25,370 disks. So, I'll be using that same habit. 12 00:00:26,790 --> 00:00:28,030 So, the first step is to 13 00:00:28,360 --> 00:00:29,570 design the schema of the 14 00:00:29,640 --> 00:00:30,640 database and then create 15 00:00:30,960 --> 00:00:32,450 the schema using a data definition language. 16 00:00:33,130 --> 00:00:34,120 So as we discussed in previous 17 00:00:34,520 --> 00:00:35,640 videos in a relational database 18 00:00:36,120 --> 00:00:37,960 the schema consists 19 00:00:38,560 --> 00:00:40,200 of the structure of 20 00:00:40,330 --> 00:00:42,160 the relations and the attributes of those relations. 21 00:00:43,150 --> 00:00:44,880 So we set those up inside our big disk. 22 00:00:45,810 --> 00:00:46,920 Once that's ready, the next 23 00:00:47,400 --> 00:00:49,630 step is to load up the database with the initial data. 24 00:00:50,450 --> 00:00:51,940 So it's fairly common for the 25 00:00:52,080 --> 00:00:53,210 database to be initially loaded 26 00:00:53,510 --> 00:00:55,030 from data that comes from an outside source. 27 00:00:55,350 --> 00:00:56,240 Maybe the data is just stored 28 00:00:56,490 --> 00:00:58,390 in files of some type, and 29 00:00:58,540 --> 00:01:00,300 then that data could be loaded into the database. 30 00:01:01,810 --> 00:01:02,770 Once the data is loaded, then 31 00:01:02,880 --> 00:01:03,830 we have a bunch of tuples in 32 00:01:03,900 --> 00:01:05,280 our relation. Now, we're ready 33 00:01:05,580 --> 00:01:07,800 for the fun part which is to query and modify the data. 34 00:01:08,310 --> 00:01:09,570 And so that happens continuously 35 00:01:10,320 --> 00:01:12,570 over time as long as the database is in existence. 36 00:01:13,500 --> 00:01:14,510 So let's just say for now 37 00:01:14,730 --> 00:01:15,470 that we're going to have human 38 00:01:15,780 --> 00:01:17,470 users that are directly querying the database. 39 00:01:17,690 --> 00:01:19,290 In reality, that typically happens 40 00:01:19,760 --> 00:01:21,820 through say an application or a website. 41 00:01:22,660 --> 00:01:23,720 So, a user will come along and 42 00:01:23,910 --> 00:01:26,090 we'll ask a question of the database and we will get an answer. 43 00:01:27,120 --> 00:01:28,210 He might come along and 44 00:01:28,330 --> 00:01:30,850 ask another question Q2 and he'd get another answer back. 45 00:01:31,700 --> 00:01:32,690 The same human or maybe a 46 00:01:32,710 --> 00:01:35,190 different human might ask to modify the database. 47 00:01:35,790 --> 00:01:37,050 So, they might want 48 00:01:37,310 --> 00:01:38,430 to insert new data or 49 00:01:38,590 --> 00:01:39,270 update some of the data 50 00:01:39,870 --> 00:01:40,680 and the database will come back 51 00:01:40,900 --> 00:01:42,510 and say, "Okay, I made that change for you." 52 00:01:43,420 --> 00:01:44,610 So that's the basic paradigm 53 00:01:45,340 --> 00:01:47,320 of querying and updating relational databases. 54 00:01:49,040 --> 00:01:50,590 Relational databases support ad 55 00:01:50,860 --> 00:01:52,350 hoc queries and high-level languages. 56 00:01:53,620 --> 00:01:54,750 By ad hoc, I mean that 57 00:01:54,850 --> 00:01:56,780 you can pose queries that you didn't think of in advance. 58 00:01:57,360 --> 00:02:00,140 So it's not necessary to write long programs for specific queries. 59 00:02:00,920 --> 00:02:01,860 Rather the language can be 60 00:02:02,020 --> 00:02:02,720 used to pose a query 61 00:02:03,130 --> 00:02:04,330 as you think about what you want to ask. 62 00:02:05,140 --> 00:02:06,280 And as mentioned in previous videos 63 00:02:06,750 --> 00:02:07,930 the languages supported by relational 64 00:02:08,480 --> 00:02:10,090 systems are high level, meaning 65 00:02:10,330 --> 00:02:11,310 you can write in a fairly compact 66 00:02:12,430 --> 00:02:14,050 fashion rather complicated queries 67 00:02:14,410 --> 00:02:15,360 and you don't have to write the 68 00:02:15,570 --> 00:02:17,590 algorithms that get the data out of the database. 69 00:02:18,320 --> 00:02:19,440 So, let's look at an example of 70 00:02:19,560 --> 00:02:22,190 a few queries. Let's go to again to our 71 00:02:22,360 --> 00:02:24,670 imaginary database of students who are applying to colleges. 72 00:02:25,410 --> 00:02:26,900 And here's just three examples of the types of things 73 00:02:27,240 --> 00:02:28,960 that you might ask of a relational database. 74 00:02:29,840 --> 00:02:31,040 You might want to get all 75 00:02:31,450 --> 00:02:32,630 students whose GPA is greater 76 00:02:32,880 --> 00:02:34,100 than 3.7 who are applying 77 00:02:34,780 --> 00:02:35,890 to Stanford and MIT only. 78 00:02:37,000 --> 00:02:37,800 You might want to get all 79 00:02:38,040 --> 00:02:39,450 engineering departments in California 80 00:02:39,740 --> 00:02:41,610 with fewer than 500 applicants or 81 00:02:41,990 --> 00:02:42,820 you might ask for the 82 00:02:42,900 --> 00:02:44,020 college with the highest average 83 00:02:44,320 --> 00:02:45,980 accept rate over the last five years. 84 00:02:46,740 --> 00:02:47,980 Now these might seem 85 00:02:48,180 --> 00:02:49,560 like a fairly complicated queries 86 00:02:49,980 --> 00:02:50,860 but all of these can be 87 00:02:51,140 --> 00:02:52,130 written in a few lines 88 00:02:52,740 --> 00:02:54,130 in say the SQL language or 89 00:02:54,350 --> 00:02:56,150 a pretty simple expression in relational algebra. 90 00:02:57,190 --> 00:02:58,660 So, some queries are 91 00:02:58,900 --> 00:03:00,700 easier to pose than others, that's certainly true. 92 00:03:00,870 --> 00:03:02,190 Though the 3 queries you 93 00:03:02,280 --> 00:03:04,110 see here are as I said pretty easy to pose. 94 00:03:04,970 --> 00:03:06,100 Now some queries are easier 95 00:03:06,510 --> 00:03:08,720 for the database system to execute efficiently than others. 96 00:03:09,310 --> 00:03:10,740 And interestingly it's not necessarily. 97 00:03:12,120 --> 00:03:13,390 These two things aren't necessarily correlated. 98 00:03:13,990 --> 00:03:15,110 There are some queries that are easy 99 00:03:15,350 --> 00:03:16,520 to post but hard to execute 100 00:03:16,740 --> 00:03:18,620 efficiently and some that are vice-versa. 101 00:03:20,250 --> 00:03:21,600 Now, just a bit about terminology. 102 00:03:22,490 --> 00:03:23,820 Frequently, people talk about the 103 00:03:24,060 --> 00:03:25,760 query language of the database system. 104 00:03:26,080 --> 00:03:27,860 That's usually used sort 105 00:03:28,040 --> 00:03:29,450 of synonymously with the DML 106 00:03:29,970 --> 00:03:31,720 or Data Manipulation Language which 107 00:03:32,070 --> 00:03:34,570 usually includes not only querying but also data modifications. 108 00:03:36,500 --> 00:03:38,250 In all relational query languages, when 109 00:03:38,350 --> 00:03:39,600 you ask a query over a 110 00:03:39,770 --> 00:03:41,660 set of relations, you get a relation as a result. 111 00:03:42,120 --> 00:03:43,060 So let's run a query 112 00:03:43,200 --> 00:03:44,230 cue say over these three 113 00:03:44,460 --> 00:03:45,820 relations shown here and what 114 00:03:46,280 --> 00:03:48,230 we'll get back is another relation. 115 00:03:50,340 --> 00:03:51,810 When you get back the 116 00:03:51,880 --> 00:03:53,100 same type of object that 117 00:03:53,190 --> 00:03:55,220 you query, that's known as closure of the language. 118 00:03:55,500 --> 00:03:56,600 And it really is a nice feature. 119 00:03:57,640 --> 00:03:58,860 For example, when I want 120 00:03:59,100 --> 00:04:00,470 to run another query, say Q2, 121 00:04:01,340 --> 00:04:02,420 that query could be posed over 122 00:04:02,540 --> 00:04:03,530 the answer of my first query 123 00:04:04,040 --> 00:04:05,220 and could even combine that answer 124 00:04:05,460 --> 00:04:07,640 with some of the existing relations in the database. 125 00:04:08,850 --> 00:04:10,150 That's known as compositionality, the 126 00:04:10,380 --> 00:04:11,590 ability to run a query 127 00:04:12,070 --> 00:04:13,550 over the result of our previous query. 128 00:04:15,140 --> 00:04:17,260 Now, let me talk briefly about two query languages. 129 00:04:18,030 --> 00:04:19,030 We'll be learning these languages in 130 00:04:19,260 --> 00:04:20,410 detail later, but I'm just 131 00:04:20,540 --> 00:04:22,550 going to give the basic flavor of the languages here. 132 00:04:23,330 --> 00:04:25,000 Relational algebra is a formal language. 133 00:04:25,840 --> 00:04:28,010 Well, it's an algebra as you can tell by its name. 134 00:04:28,550 --> 00:04:30,810 So it's very theoretically well-grounded. 135 00:04:31,970 --> 00:04:34,170 SQL by contrast is 136 00:04:34,350 --> 00:04:36,850 what I'll call an actual language or an implemented language. 137 00:04:37,330 --> 00:04:40,300 That 's the one you're going to run on an actual deployed database application. 138 00:04:41,460 --> 00:04:44,290 But the SQL language does have 139 00:04:44,590 --> 00:04:46,270 as its foundation relational algebra. 140 00:04:46,340 --> 00:04:49,020 That's how the semantics of the SQL language are defined. 141 00:04:50,310 --> 00:04:51,340 Now let me just give you 142 00:04:51,660 --> 00:04:53,070 a flavor of these two languages and 143 00:04:53,320 --> 00:04:56,100 I'm going to write one query in each of the two languages. 144 00:04:57,280 --> 00:04:59,620 So, let me get rid of this little line here. 145 00:05:00,530 --> 00:05:01,720 Let's start in relational algebra. 146 00:05:02,590 --> 00:05:03,780 So we're looking for the 147 00:05:03,960 --> 00:05:05,480 ID's of students whose GPA 148 00:05:05,540 --> 00:05:07,740 is greater than 3.7 and they've applied to Stanford. 149 00:05:09,180 --> 00:05:10,420 In relational algebra, the basic 150 00:05:10,860 --> 00:05:12,200 operators language are Greek symbols. 151 00:05:12,600 --> 00:05:13,710 Again, we'll learn the details later, 152 00:05:13,980 --> 00:05:15,440 but this particular expression will 153 00:05:15,590 --> 00:05:18,060 be written by a Phi followed by a Sigma. 154 00:05:19,130 --> 00:05:20,010 The Phi says we're going to 155 00:05:20,120 --> 00:05:20,860 get the ID, the Sigma 156 00:05:21,290 --> 00:05:22,510 says we want students whose 157 00:05:22,740 --> 00:05:24,110 GPA is greater than 3.7 158 00:05:25,830 --> 00:05:27,360 and the college that the 159 00:05:27,420 --> 00:05:29,490 students have applied to is Stanford. 160 00:05:31,480 --> 00:05:32,510 And then that will operate 161 00:05:33,250 --> 00:05:35,120 on what's called the 162 00:05:35,220 --> 00:05:37,290 natural join of the 163 00:05:37,500 --> 00:05:40,710 student relation with the apply relation. 164 00:05:43,660 --> 00:05:45,500 Again, we'll learn the details of that in a later video. 165 00:05:46,710 --> 00:05:48,360 Now, here's the same query in SQL. 166 00:05:49,570 --> 00:05:50,390 And this is something that you would 167 00:05:50,500 --> 00:05:51,490 actually run on a deployed 168 00:05:51,810 --> 00:05:52,880 database system, and the SQL 169 00:05:53,250 --> 00:05:54,690 query is, in fact, directly 170 00:05:55,170 --> 00:05:56,790 equivalent to the relational algebra query. 171 00:05:58,150 --> 00:05:59,610 Now, pedagogically, I would 172 00:05:59,750 --> 00:06:00,940 highly recommend that you learn 173 00:06:01,210 --> 00:06:02,460 the relational algebra by watching 174 00:06:02,830 --> 00:06:04,400 the relational algebra videos before 175 00:06:04,730 --> 00:06:06,110 you move on to the SQL videos, 176 00:06:06,820 --> 00:06:08,250 but I'm not going to absolutely require 177 00:06:08,680 --> 00:06:10,090 that. So, if you're in a big hurry 178 00:06:10,640 --> 00:06:11,690 to learn SQL right away 179 00:06:11,990 --> 00:06:13,760 you may move ahead to the SQL videos. 180 00:06:14,500 --> 00:06:15,460 If you're interested in the formal 181 00:06:15,760 --> 00:06:16,880 foundations and a deeper understanding, 182 00:06:17,580 --> 00:06:19,760 I recommend moving next to the relational algebra video.