1 00:00:00,380 --> 00:00:01,550 In this video, we'll talk about 2 00:00:01,670 --> 00:00:02,690 the second major type of machine 3 00:00:03,010 --> 00:00:05,030 learning problem, called Unsupervised Learning. 4 00:00:06,300 --> 00:00:08,500 In the last video, we talked about Supervised Learning. 5 00:00:09,250 --> 00:00:10,700 Back then, recall data sets 6 00:00:11,020 --> 00:00:12,670 that look like this, where each 7 00:00:12,890 --> 00:00:15,150 example was labeled either 8 00:00:15,610 --> 00:00:16,900 as a positive or negative example, 9 00:00:17,530 --> 00:00:19,800 whether it was a benign or a malignant tumor. 10 00:00:20,850 --> 00:00:21,920 So for each example in Supervised 11 00:00:22,410 --> 00:00:24,270 Learning, we were told explicitly what 12 00:00:24,440 --> 00:00:25,760 is the so-called right answer, 13 00:00:26,490 --> 00:00:27,580 whether it's benign or malignant. 14 00:00:28,550 --> 00:00:30,210 In Unsupervised Learning, we're given 15 00:00:30,540 --> 00:00:31,720 data that looks different, 16 00:00:31,950 --> 00:00:32,910 data that looks like 17 00:00:33,190 --> 00:00:34,600 this, that doesn't have 18 00:00:34,720 --> 00:00:35,920 any labels, or that all 19 00:00:36,130 --> 00:00:37,460 has the same label, or really no labels. 20 00:00:39,680 --> 00:00:40,740 So we're given the data set and 21 00:00:40,980 --> 00:00:42,460 we're not told what to 22 00:00:42,560 --> 00:00:43,290 do with it and we're not 23 00:00:43,640 --> 00:00:44,800 told what each data point is. 24 00:00:45,290 --> 00:00:47,190 Instead we're just told, here is a data set. 25 00:00:47,870 --> 00:00:49,650 Can you find some structure in the data? 26 00:00:50,480 --> 00:00:51,670 Given this data set, an 27 00:00:52,350 --> 00:00:53,940 Unsupervised Learning algorithm might decide that 28 00:00:54,060 --> 00:00:56,090 the data lives in two different clusters. 
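As a rough illustration of what "finding two clusters" in unlabeled data could look like, here is a minimal sketch in Python. The lecture does not name a specific algorithm at this point, so the choice of k-means, the sample points, and the function name `kmeans` are all assumptions made for the example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20):
    """Plain k-means: alternate between assigning points and moving centroids."""
    # Assumption: the first k points serve as deterministic starting centroids.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Hypothetical unlabeled data: two loose groups of 2-D points, no labels given.
data = [(1.0, 1.2), (0.8, 0.9), (1.3, 1.1), (5.0, 5.2), (5.3, 4.9), (4.8, 5.1)]
labels, centroids = kmeans(data, k=2)
print(labels)     # cluster index chosen for each point
print(centroids)  # one centroid per discovered cluster
```

Note that the number of clusters is supplied by hand here; the algorithm only decides which points belong together, and the cluster indices themselves carry no meaning.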
29 00:00:56,800 --> 00:00:57,960 And so there's one cluster 30 00:00:59,120 --> 00:00:59,910 and there's a different cluster. 31 00:01:01,110 --> 00:01:02,710 And yes, the Unsupervised Learning algorithm may 32 00:01:03,040 --> 00:01:05,070 break this data into these two separate clusters. 33 00:01:06,410 --> 00:01:08,000 So this is called a clustering algorithm. 34 00:01:08,860 --> 00:01:10,310 And this turns out to be used in many places. 35 00:01:11,930 --> 00:01:13,310 One example where clustering 36 00:01:13,530 --> 00:01:14,860 is used is in Google 37 00:01:15,060 --> 00:01:16,160 News and if you have not 38 00:01:16,360 --> 00:01:17,320 seen this before, you can actually 39 00:01:18,210 --> 00:01:19,040 go to this URL news.google.com 40 00:01:19,830 --> 00:01:20,460 to take a look. 41 00:01:21,280 --> 00:01:22,970 What Google News does is every day 42 00:01:23,480 --> 00:01:24,220 it goes and looks at tens 43 00:01:24,470 --> 00:01:25,430 of thousands or hundreds of 44 00:01:25,720 --> 00:01:26,740 thousands of news stories on the 45 00:01:26,800 --> 00:01:29,410 web and it groups them into cohesive news stories. 46 00:01:30,730 --> 00:01:31,690 For example, let's look here. 47 00:01:33,380 --> 00:01:35,370 The URLs here link 48 00:01:35,910 --> 00:01:37,260 to different news stories 49 00:01:38,010 --> 00:01:40,110 about the BP Oil Well story. 50 00:01:41,300 --> 00:01:42,160 So, let's click on 51 00:01:42,260 --> 00:01:43,090 one of these URLs and we'll 52 00:01:43,550 --> 00:01:44,780 click on one of these URLs. 53 00:01:45,100 --> 00:01:46,970 What I'll get to is a web page like this. 
54 00:01:47,210 --> 00:01:48,390 Here's a Wall Street 55 00:01:48,590 --> 00:01:50,180 Journal article about, you know, the BP 56 00:01:51,110 --> 00:01:52,530 Oil Well Spill stories of 57 00:01:52,920 --> 00:01:54,350 "BP Kills Macondo", 58 00:01:54,590 --> 00:01:55,700 which is a name of the 59 00:01:55,980 --> 00:01:57,960 spill and if you 60 00:01:58,020 --> 00:01:59,360 click on a different URL 61 00:02:00,690 --> 00:02:02,500 from that group then you might get a different story. 62 00:02:02,950 --> 00:02:04,760 Here's the CNN story about, 63 00:02:04,820 --> 00:02:06,090 again, the BP Oil Spill, 64 00:02:07,090 --> 00:02:08,180 and if you click on yet 65 00:02:08,740 --> 00:02:10,990 a third link, then you might get a different story. 66 00:02:11,440 --> 00:02:13,380 Here's the UK Guardian story 67 00:02:13,940 --> 00:02:15,510 about the BP Oil Spill. 68 00:02:16,530 --> 00:02:17,790 So what Google News has done 69 00:02:17,990 --> 00:02:19,440 is look at tens of thousands of 70 00:02:19,490 --> 00:02:22,170 news stories and automatically cluster them together. 71 00:02:23,030 --> 00:02:24,660 So, the news stories that are all 72 00:02:25,080 --> 00:02:27,010 about the same topic get displayed together. 73 00:02:27,210 --> 00:02:29,170 It turns out that 74 00:02:29,380 --> 00:02:31,020 clustering algorithms and Unsupervised Learning 75 00:02:31,530 --> 00:02:33,550 algorithms are used in many other problems as well. 76 00:02:35,320 --> 00:02:36,690 Here's one on understanding genomics. 77 00:02:38,270 --> 00:02:40,510 Here's an example of DNA microarray data. 78 00:02:40,990 --> 00:02:42,230 The idea is you have 79 00:02:42,430 --> 00:02:44,360 a group of different individuals and 80 00:02:44,510 --> 00:02:45,590 for each of them, you measure 81 00:02:46,100 --> 00:02:48,580 how much they do or do not have a certain gene. 82 00:02:49,050 --> 00:02:51,640 Technically you measure how much certain genes are expressed. 
83 00:02:52,000 --> 00:02:54,190 So these colors, red, green, 84 00:02:54,930 --> 00:02:56,210 gray and so on, they 85 00:02:56,340 --> 00:02:57,500 show the degree to which 86 00:02:57,780 --> 00:02:59,440 different individuals do or 87 00:02:59,510 --> 00:03:01,270 do not have a specific gene. 88 00:03:02,500 --> 00:03:03,400 And what you can do is then 89 00:03:03,610 --> 00:03:05,070 run a clustering algorithm to group 90 00:03:05,380 --> 00:03:07,140 individuals into different categories 91 00:03:07,780 --> 00:03:08,810 or into different types of people. 92 00:03:10,230 --> 00:03:11,660 So this is Unsupervised Learning because 93 00:03:11,930 --> 00:03:14,010 we're not telling the algorithm in advance 94 00:03:14,590 --> 00:03:15,690 that these are type 1 people, 95 00:03:16,130 --> 00:03:17,420 those are type 2 persons, those 96 00:03:17,560 --> 00:03:18,650 are type 3 persons and so 97 00:03:19,610 --> 00:03:22,390 on and instead what we're saying is, yeah, here's a bunch of data. 98 00:03:23,110 --> 00:03:24,030 I don't know what's in this data. 99 00:03:24,750 --> 00:03:25,870 I don't know who's in what type. 100 00:03:26,150 --> 00:03:26,940 I don't even know what the different 101 00:03:27,260 --> 00:03:28,480 types of people are, but can 102 00:03:28,610 --> 00:03:30,210 you automatically find structure in 103 00:03:30,360 --> 00:03:31,260 the data and automatically 104 00:03:32,180 --> 00:03:33,620 cluster the individuals into these types 105 00:03:33,870 --> 00:03:35,490 that I don't know in advance? 106 00:03:35,890 --> 00:03:37,610 Because we're not giving the algorithm 107 00:03:38,160 --> 00:03:40,140 the right answer for the 108 00:03:40,370 --> 00:03:41,270 examples in my data 109 00:03:41,590 --> 00:03:43,090 set, this is Unsupervised Learning. 110 00:03:44,290 --> 00:03:47,040 Unsupervised Learning or clustering is used for a bunch of other applications. 111 00:03:48,340 --> 00:03:50,340 It's used to organize large computer clusters. 
112 00:03:51,390 --> 00:03:52,530 I had some friends looking at 113 00:03:52,680 --> 00:03:53,970 large data centers, that is 114 00:03:54,180 --> 00:03:55,970 large computer clusters and trying 115 00:03:56,230 --> 00:03:57,470 to figure out which machines tend to 116 00:03:57,590 --> 00:03:59,130 work together and if 117 00:03:59,200 --> 00:04:00,270 you can put those machines together, 118 00:04:01,100 --> 00:04:03,220 you can make your data center work more efficiently. 119 00:04:04,810 --> 00:04:06,820 A second application is in social network analysis. 120 00:04:07,890 --> 00:04:09,230 So given knowledge about which friends 121 00:04:09,630 --> 00:04:10,840 you email the most or 122 00:04:10,880 --> 00:04:12,150 given your Facebook friends or 123 00:04:12,180 --> 00:04:14,150 your Google+ circles, can 124 00:04:14,290 --> 00:04:16,380 we automatically identify which are 125 00:04:16,450 --> 00:04:17,950 cohesive groups of friends, 126 00:04:18,460 --> 00:04:19,420 that is, which are groups of people 127 00:04:20,230 --> 00:04:21,010 that all know each other? 128 00:04:22,540 --> 00:04:22,880 Market segmentation. 129 00:04:24,680 --> 00:04:26,780 Many companies have huge databases of customer information. 130 00:04:27,700 --> 00:04:28,410 So, can you look at this 131 00:04:28,510 --> 00:04:30,000 customer data set and automatically 132 00:04:30,740 --> 00:04:32,340 discover market segments and automatically 133 00:04:33,340 --> 00:04:35,290 group your customers into different 134 00:04:35,820 --> 00:04:37,400 market segments so that 135 00:04:37,710 --> 00:04:39,490 you can automatically and more 136 00:04:39,650 --> 00:04:41,580 efficiently sell or market 137 00:04:41,890 --> 00:04:43,250 to your different market segments? 
138 00:04:44,260 --> 00:04:45,580 Again, this is Unsupervised Learning 139 00:04:45,820 --> 00:04:46,720 because we have all this 140 00:04:46,900 --> 00:04:48,340 customer data, but we don't 141 00:04:48,590 --> 00:04:49,710 know in advance what are the 142 00:04:49,790 --> 00:04:51,270 market segments and for 143 00:04:51,440 --> 00:04:52,570 the customers in our data 144 00:04:52,660 --> 00:04:53,590 set, you know, we don't know in 145 00:04:53,690 --> 00:04:54,700 advance who is in 146 00:04:54,800 --> 00:04:55,840 market segment one, who is 147 00:04:55,940 --> 00:04:57,800 in market segment two, and so on. 148 00:04:57,930 --> 00:05:00,630 But we have to let the algorithm discover all this just from the data. 149 00:05:01,970 --> 00:05:03,140 Finally, it turns out that Unsupervised 150 00:05:03,690 --> 00:05:05,620 Learning is also used for 151 00:05:06,090 --> 00:05:08,060 astronomical data analysis, 152 00:05:08,890 --> 00:05:10,390 and these clustering algorithms give 153 00:05:10,580 --> 00:05:12,440 surprisingly interesting, useful theories 154 00:05:12,900 --> 00:05:15,610 of how galaxies are born. 155 00:05:15,880 --> 00:05:17,620 All of these are examples of clustering, 156 00:05:18,400 --> 00:05:20,550 which is just one type of Unsupervised Learning. 157 00:05:21,530 --> 00:05:22,470 Let me tell you about another one. 158 00:05:23,200 --> 00:05:25,020 I'm gonna tell you about the cocktail party problem. 159 00:05:26,310 --> 00:05:28,270 So, you've been to cocktail parties before, right? 
160 00:05:28,440 --> 00:05:30,080 Well, you can imagine there's a 161 00:05:30,300 --> 00:05:31,690 party, room full of people, all 162 00:05:31,870 --> 00:05:32,930 sitting around, all talking at the 163 00:05:32,970 --> 00:05:34,390 same time and there are 164 00:05:34,480 --> 00:05:36,230 all these overlapping voices because everyone 165 00:05:36,590 --> 00:05:37,920 is talking at the same time, and 166 00:05:38,070 --> 00:05:39,730 it is hard to hear the person in front of you. 167 00:05:40,690 --> 00:05:41,970 So maybe at a 168 00:05:42,020 --> 00:05:43,990 cocktail party with two people, 169 00:05:45,690 --> 00:05:46,670 two people talking at the same 170 00:05:46,770 --> 00:05:48,090 time, and it's a somewhat 171 00:05:48,740 --> 00:05:49,710 small cocktail party. 172 00:05:50,690 --> 00:05:51,630 And we're going to put two 173 00:05:51,890 --> 00:05:53,080 microphones in the room so 174 00:05:54,060 --> 00:05:55,640 there are two microphones, and because 175 00:05:56,050 --> 00:05:57,430 these microphones are at two 176 00:05:57,560 --> 00:05:58,900 different distances from the 177 00:05:58,990 --> 00:06:01,250 speakers, each microphone records 178 00:06:01,830 --> 00:06:04,720 a different combination of these two speaker voices. 179 00:06:05,810 --> 00:06:06,970 Maybe speaker one is a 180 00:06:07,120 --> 00:06:08,320 little louder in microphone one 181 00:06:09,120 --> 00:06:10,680 and maybe speaker two is a 182 00:06:10,800 --> 00:06:12,350 little bit louder on microphone 2 183 00:06:12,560 --> 00:06:14,040 because the 2 microphones are 184 00:06:14,230 --> 00:06:15,950 at different positions relative to 185 00:06:16,400 --> 00:06:19,020 the 2 speakers, but each 186 00:06:19,250 --> 00:06:20,390 microphone would record an overlapping 187 00:06:20,970 --> 00:06:22,590 combination of both speakers' voices. 188 00:06:23,960 --> 00:06:25,500 So here's an actual recording 189 00:06:26,520 --> 00:06:29,280 of two speakers recorded by a researcher. 
190 00:06:29,740 --> 00:06:30,950 Let me play for you the 191 00:06:31,060 --> 00:06:32,760 first, what the first microphone sounds like. 192 00:06:33,560 --> 00:06:34,800 One (uno), two (dos), 193 00:06:35,070 --> 00:06:36,590 three (tres), four (cuatro), five 194 00:06:37,060 --> 00:06:38,550 (cinco), six (seis), seven (siete), 195 00:06:38,990 --> 00:06:40,610 eight (ocho), nine (nueve), ten (y diez). 196 00:06:41,610 --> 00:06:42,650 All right, maybe not the most interesting cocktail 197 00:06:43,000 --> 00:06:44,270 party, there's two people 198 00:06:44,620 --> 00:06:45,670 counting from one to ten 199 00:06:46,010 --> 00:06:47,880 in two languages but you know. 200 00:06:48,870 --> 00:06:49,760 What you just heard was the 201 00:06:49,820 --> 00:06:52,500 first microphone recording, here's the second recording. 202 00:06:57,440 --> 00:06:58,040 Uno (one), dos (two), tres (three), cuatro 203 00:06:58,060 --> 00:06:58,730 (four), cinco (five), seis (six), siete (seven), 204 00:06:59,160 --> 00:07:00,900 ocho (eight), nueve (nine) y diez (ten). 205 00:07:01,860 --> 00:07:02,850 So what we can do is take 206 00:07:03,380 --> 00:07:04,660 these two microphone recordings and give 207 00:07:04,980 --> 00:07:06,480 them to an Unsupervised Learning algorithm 208 00:07:07,010 --> 00:07:08,560 called the cocktail party algorithm, 209 00:07:08,780 --> 00:07:09,910 and tell the algorithm 210 00:07:10,450 --> 00:07:12,140 - find structure in this data for me. 211 00:07:12,250 --> 00:07:14,010 And what the algorithm will do 212 00:07:14,410 --> 00:07:15,730 is listen to these 213 00:07:15,980 --> 00:07:17,990 audio recordings and say, you 214 00:07:18,140 --> 00:07:19,020 know it sounds like the 215 00:07:19,360 --> 00:07:20,950 two audio sources are being 216 00:07:21,240 --> 00:07:22,450 added together or are being 217 00:07:22,670 --> 00:07:25,220 summed together to produce these recordings that we had. 
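The idea that each microphone hears a weighted sum of the two voices can be sketched in a few lines of Python. This is only an illustration of the mixing model, not the cocktail party algorithm itself: the waveforms, the weight matrix `A`, and the helper names are all hypothetical, and the "unmixing" step cheats by using the known weights, which a real separation algorithm has to estimate from the recordings alone.

```python
import math

# Hypothetical stand-ins for the two voices: two different waveforms.
n = 8
speaker1 = [math.sin(0.5 * t) for t in range(n)]
speaker2 = [1.0 if t % 4 < 2 else -1.0 for t in range(n)]

# Assumed mixing weights: speaker 1 is louder at microphone 1,
# speaker 2 is louder at microphone 2.
A = [[0.8, 0.3],
     [0.4, 0.7]]

def mix(A, s1, s2):
    """Each microphone records a weighted sum of both speakers."""
    mic1 = [A[0][0] * a + A[0][1] * b for a, b in zip(s1, s2)]
    mic2 = [A[1][0] * a + A[1][1] * b for a, b in zip(s1, s2)]
    return mic1, mic2

def unmix_with_known_weights(A, mic1, mic2):
    """Invert the 2x2 mixing matrix. A real algorithm does not get to see A."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    s1 = [(A[1][1] * x - A[0][1] * y) / det for x, y in zip(mic1, mic2)]
    s2 = [(-A[1][0] * x + A[0][0] * y) / det for x, y in zip(mic1, mic2)]
    return s1, s2

mic1, mic2 = mix(A, speaker1, speaker2)
rec1, rec2 = unmix_with_known_weights(A, mic1, mic2)

# Largest reconstruction error across both recovered sources.
max_error = max(
    max(abs(a - b) for a, b in zip(speaker1, rec1)),
    max(abs(a - b) for a, b in zip(speaker2, rec2)),
)
print(max_error)
```

The hard part of the real problem is exactly what this sketch skips: recovering the two sources when only `mic1` and `mic2` are available and the weights are unknown.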
218 00:07:25,990 --> 00:07:27,330 Moreover, what the cocktail party 219 00:07:27,710 --> 00:07:29,210 algorithm will do is separate 220 00:07:29,570 --> 00:07:30,810 out these two audio sources 221 00:07:31,480 --> 00:07:32,700 that were being added or being 222 00:07:33,000 --> 00:07:34,240 summed together to form our 223 00:07:34,410 --> 00:07:35,600 recordings and, in fact, 224 00:07:36,200 --> 00:07:38,630 here's the first output of the cocktail party algorithm. 225 00:07:39,790 --> 00:07:41,910 One, two, three, four, 226 00:07:42,590 --> 00:07:46,270 five, six, seven, eight, nine, ten. 227 00:07:47,630 --> 00:07:48,780 So, I separated out the English 228 00:07:49,240 --> 00:07:51,220 voice in one of the recordings. 229 00:07:52,460 --> 00:07:53,300 And here's the second output. 230 00:07:53,380 --> 00:07:55,280 Uno, dos, tres, cuatro, cinco, 231 00:07:55,980 --> 00:07:59,830 seis, siete, ocho, nueve y diez. 232 00:08:00,270 --> 00:08:01,180 Not too bad. To give you 233 00:08:03,810 --> 00:08:05,270 one more example, here's another 234 00:08:05,600 --> 00:08:07,370 recording of another similar situation, 235 00:08:08,060 --> 00:08:09,790 here's the first microphone: One, 236 00:08:10,470 --> 00:08:12,430 two, three, four, five, six, 237 00:08:13,370 --> 00:08:15,710 seven, eight, nine, ten. 238 00:08:16,980 --> 00:08:17,920 OK so the poor guy's gone 239 00:08:18,180 --> 00:08:19,350 home from the cocktail party and 240 00:08:19,420 --> 00:08:21,880 he's now sitting in a room by himself talking to his radio. 241 00:08:23,090 --> 00:08:24,130 Here's the second microphone recording. 242 00:08:28,810 --> 00:08:31,800 One, two, three, four, five, six, seven, eight, nine, ten. 
243 00:08:33,310 --> 00:08:34,160 When you give these two microphone 244 00:08:34,610 --> 00:08:35,530 recordings to the same algorithm, 245 00:08:36,360 --> 00:08:37,790 what it does, is again say, 246 00:08:38,380 --> 00:08:39,470 you know, it sounds like there 247 00:08:39,690 --> 00:08:41,370 are two audio sources, and moreover, 248 00:08:42,410 --> 00:08:43,820 the algorithm says, here is 249 00:08:44,070 --> 00:08:46,010 the first of the audio sources I found. 250 00:08:47,480 --> 00:08:49,300 One, two, three, four, 251 00:08:49,730 --> 00:08:53,430 five, six, seven, eight, nine, ten. 252 00:08:54,650 --> 00:08:56,110 So that wasn't perfect, it 253 00:08:56,340 --> 00:08:57,360 got the voice, but it 254 00:08:57,570 --> 00:08:59,070 also got a little bit of the music in there. 255 00:08:59,890 --> 00:09:01,360 Then here's the second output of the algorithm. 256 00:09:10,020 --> 00:09:11,310 Not too bad, in that second 257 00:09:11,540 --> 00:09:13,300 output it managed to get rid of the voice entirely. 258 00:09:13,760 --> 00:09:14,850 And just, you know, 259 00:09:15,020 --> 00:09:17,380 cleaned up the music, got rid of the counting from one to ten. 260 00:09:18,840 --> 00:09:20,090 So you might look at 261 00:09:20,180 --> 00:09:21,750 an Unsupervised Learning algorithm like 262 00:09:21,950 --> 00:09:23,050 this and ask how 263 00:09:23,250 --> 00:09:25,110 complicated it is to implement this, right? 
264 00:09:25,330 --> 00:09:26,560 It seems like in order to, 265 00:09:26,970 --> 00:09:28,870 you know, build this application, it seems 266 00:09:28,930 --> 00:09:30,550 like to do this audio processing you 267 00:09:30,670 --> 00:09:31,430 need to write a ton of code 268 00:09:32,240 --> 00:09:33,580 or maybe link into like a 269 00:09:33,690 --> 00:09:35,380 bunch of synthesizer Java libraries that 270 00:09:35,470 --> 00:09:37,150 process audio, seems like 271 00:09:37,240 --> 00:09:38,880 a really complicated program, to do 272 00:09:39,060 --> 00:09:41,040 this audio, separating out audio and so on. 273 00:09:42,460 --> 00:09:43,860 It turns out the algorithm, to 274 00:09:44,070 --> 00:09:45,640 do what you just heard, that 275 00:09:45,900 --> 00:09:47,280 can be done with one line 276 00:09:47,530 --> 00:09:49,270 of code - shown right here. 277 00:09:50,640 --> 00:09:52,350 It took researchers a long 278 00:09:52,610 --> 00:09:54,060 time to come up with this line of code. 279 00:09:54,490 --> 00:09:56,090 I'm not saying this is an easy problem, 280 00:09:57,080 --> 00:09:57,980 but it turns out that when you 281 00:09:58,180 --> 00:10:00,330 use the right programming environment, many learning 282 00:10:00,670 --> 00:10:02,060 algorithms can be really short programs. 283 00:10:03,510 --> 00:10:04,700 So this is also why in 284 00:10:04,840 --> 00:10:05,890 this class we're going to 285 00:10:06,010 --> 00:10:07,430 use the Octave programming environment. 286 00:10:08,550 --> 00:10:09,910 Octave is free open source 287 00:10:10,120 --> 00:10:11,620 software, and using a 288 00:10:11,670 --> 00:10:13,130 tool like Octave or Matlab, 289 00:10:14,000 --> 00:10:15,400 many learning algorithms become just 290 00:10:15,690 --> 00:10:17,910 a few lines of code to implement. 
291 00:10:18,380 --> 00:10:19,400 Later in this class, I'll just teach 292 00:10:19,620 --> 00:10:20,570 you a little bit about how to 293 00:10:20,720 --> 00:10:21,920 use Octave and you'll be 294 00:10:22,050 --> 00:10:24,590 implementing some of these algorithms in Octave. 295 00:10:24,980 --> 00:10:26,050 Or if you have Matlab you can use that too. 296 00:10:27,120 --> 00:10:28,500 It turns out that in Silicon Valley, for 297 00:10:28,620 --> 00:10:29,470 a lot of machine learning algorithms, 298 00:10:30,290 --> 00:10:31,310 what we do is first prototype 299 00:10:32,040 --> 00:10:33,900 our software in Octave because software 300 00:10:34,330 --> 00:10:35,250 in Octave makes it incredibly fast 301 00:10:35,540 --> 00:10:36,920 to implement these learning algorithms. 302 00:10:38,230 --> 00:10:39,110 Here each of these functions, 303 00:10:39,720 --> 00:10:41,460 like for example the SVD 304 00:10:41,680 --> 00:10:42,920 function, that stands for singular 305 00:10:43,240 --> 00:10:44,520 value decomposition, and that turns 306 00:10:44,640 --> 00:10:45,690 out to be a 307 00:10:45,820 --> 00:10:48,420 linear algebra routine that is just built into Octave. 308 00:10:49,500 --> 00:10:50,390 If you were trying to do this 309 00:10:50,460 --> 00:10:51,490 in C++ or Java, 310 00:10:51,780 --> 00:10:53,040 this would be many many lines of 311 00:10:53,180 --> 00:10:55,680 code linking complex C++ or Java libraries. 312 00:10:56,440 --> 00:10:57,490 So, you can implement this stuff in 313 00:10:57,680 --> 00:10:58,690 C++ or Java 314 00:10:59,050 --> 00:11:00,090 or Python, it's just much 315 00:11:00,290 --> 00:11:02,090 more complicated to do so in those languages. 
316 00:11:03,750 --> 00:11:05,060 What I've seen after having taught 317 00:11:05,300 --> 00:11:06,980 machine learning for almost a 318 00:11:07,210 --> 00:11:08,680 decade now, is that, you 319 00:11:08,890 --> 00:11:10,340 learn much faster if you 320 00:11:10,480 --> 00:11:11,700 use Octave as your 321 00:11:11,790 --> 00:11:14,070 programming environment, and if 322 00:11:14,250 --> 00:11:15,570 you use Octave as your 323 00:11:16,260 --> 00:11:17,110 learning tool and as your 324 00:11:17,240 --> 00:11:18,640 prototyping tool, it'll let 325 00:11:19,000 --> 00:11:21,280 you learn and prototype learning algorithms much more quickly. 326 00:11:22,640 --> 00:11:23,850 And in fact what many people will 327 00:11:23,990 --> 00:11:25,390 do in the large Silicon 328 00:11:25,730 --> 00:11:27,360 Valley companies is in fact, use 329 00:11:27,560 --> 00:11:29,020 a tool like Octave to first 330 00:11:29,370 --> 00:11:31,110 prototype the learning algorithm, and 331 00:11:31,510 --> 00:11:32,780 only after you've gotten it 332 00:11:32,860 --> 00:11:33,820 to work, then you migrate 333 00:11:34,390 --> 00:11:35,910 it to C++ or Java or whatever. 334 00:11:36,890 --> 00:11:37,960 It turns out that by doing 335 00:11:38,220 --> 00:11:39,070 things this way, you can often 336 00:11:39,400 --> 00:11:40,440 get your algorithm to work much 337 00:11:41,300 --> 00:11:43,050 faster than if you were starting out in C++. 
338 00:11:44,440 --> 00:11:46,010 So, I know that as an 339 00:11:46,100 --> 00:11:47,490 instructor, I get to 340 00:11:47,570 --> 00:11:48,580 say "trust me on 341 00:11:48,730 --> 00:11:49,790 this one" only a finite 342 00:11:50,030 --> 00:11:51,420 number of times, but for 343 00:11:51,560 --> 00:11:52,720 those of you who've never used these 344 00:11:53,330 --> 00:11:54,880 Octave type programming environments before, 345 00:11:55,240 --> 00:11:56,070 I am going to ask you 346 00:11:56,130 --> 00:11:56,970 to trust me on this one, 347 00:11:57,570 --> 00:11:58,950 and say that you, you will, 348 00:11:59,700 --> 00:12:01,180 I think your time, your development 349 00:12:01,700 --> 00:12:03,100 time is one of the most valuable resources. 350 00:12:04,210 --> 00:12:05,570 And having seen lots 351 00:12:05,800 --> 00:12:06,850 of people do this, I think 352 00:12:07,190 --> 00:12:08,460 you as a machine learning 353 00:12:08,850 --> 00:12:09,990 researcher, or machine learning developer 354 00:12:10,830 --> 00:12:12,080 will be much more productive if 355 00:12:12,220 --> 00:12:13,010 you learn to start to prototype, 356 00:12:13,580 --> 00:12:15,250 to start in Octave, rather than in some other language. 357 00:12:17,570 --> 00:12:19,790 Finally, to wrap 358 00:12:20,090 --> 00:12:22,890 up this video, I have one quick review question for you. 359 00:12:24,400 --> 00:12:26,400 We talked about Unsupervised Learning, which 360 00:12:26,700 --> 00:12:27,670 is a learning setting where you 361 00:12:27,760 --> 00:12:28,730 give the algorithm a ton 362 00:12:28,840 --> 00:12:30,120 of data and just ask it 363 00:12:30,240 --> 00:12:32,900 to find structure in the data for us. 
364 00:12:33,160 --> 00:12:35,170 Of the following four examples, which 365 00:12:35,490 --> 00:12:36,410 ones, which of these four 366 00:12:36,870 --> 00:12:37,630 do you think would be 367 00:12:37,720 --> 00:12:39,520 an Unsupervised Learning problem as 368 00:12:40,220 --> 00:12:41,950 opposed to a Supervised Learning problem? 369 00:12:42,730 --> 00:12:43,590 For each of the four 370 00:12:43,860 --> 00:12:44,850 check boxes on the left, 371 00:12:45,640 --> 00:12:46,900 check the ones for which 372 00:12:47,210 --> 00:12:49,400 you think an Unsupervised Learning 373 00:12:49,700 --> 00:12:51,300 algorithm would be appropriate and 374 00:12:51,440 --> 00:12:53,930 then click the button on the lower right to check your answer. 375 00:12:54,690 --> 00:12:57,030 So when the video pauses, please 376 00:12:57,370 --> 00:12:58,750 answer the question on the slide. 377 00:13:01,860 --> 00:13:03,950 So, hopefully, you've remembered the spam folder problem. 378 00:13:04,710 --> 00:13:06,310 If you have labeled data, you 379 00:13:06,450 --> 00:13:07,680 know, with spam and 380 00:13:07,800 --> 00:13:10,470 non-spam e-mail, we'd treat this as a Supervised Learning problem. 381 00:13:11,620 --> 00:13:13,870 The news story example, that's 382 00:13:14,100 --> 00:13:15,370 exactly the Google News example 383 00:13:15,910 --> 00:13:16,600 that we saw in this video, 384 00:13:17,090 --> 00:13:17,950 we saw how you can use 385 00:13:18,080 --> 00:13:19,460 a clustering algorithm to cluster 386 00:13:19,880 --> 00:13:21,980 these articles together so that's Unsupervised Learning. 
387 00:13:23,250 --> 00:13:25,440 The market segmentation example I 388 00:13:25,510 --> 00:13:27,120 talked about a little bit earlier, you 389 00:13:27,220 --> 00:13:29,110 can do that as an Unsupervised Learning problem 390 00:13:29,970 --> 00:13:30,860 because I am just gonna 391 00:13:30,930 --> 00:13:32,340 give my algorithm data and ask 392 00:13:32,500 --> 00:13:34,340 it to discover market segments automatically. 393 00:13:35,610 --> 00:13:37,930 And the final example, diabetes, well, 394 00:13:38,070 --> 00:13:39,080 that's actually just like our 395 00:13:39,350 --> 00:13:41,480 breast cancer example from the last video. 396 00:13:42,190 --> 00:13:43,320 Only instead of, you know, 397 00:13:43,600 --> 00:13:45,280 good and bad cancer tumors or 398 00:13:45,550 --> 00:13:47,390 benign or malignant tumors we 399 00:13:47,550 --> 00:13:49,270 instead have diabetes or 400 00:13:49,330 --> 00:13:50,440 not and so we will 401 00:13:50,700 --> 00:13:51,830 use that as a supervised, 402 00:13:52,370 --> 00:13:53,740 we will solve that as 403 00:13:53,870 --> 00:13:54,670 a Supervised Learning problem just like 404 00:13:54,730 --> 00:13:56,450 we did for the breast tumor data. 405 00:13:58,270 --> 00:13:59,400 So, that's it for Unsupervised 406 00:14:00,100 --> 00:14:01,580 Learning and in the 407 00:14:01,650 --> 00:14:02,940 next video, we'll delve more 408 00:14:03,270 --> 00:14:04,600 into specific learning algorithms 409 00:14:05,550 --> 00:14:06,590 and start to talk about 410 00:14:07,220 --> 00:14:08,750 just how these algorithms work and 411 00:14:08,920 --> 00:14:11,270 how we can, how you can go about implementing them.