1
00:00:02,530 --> 00:00:05,825
Greetings! Welcome back to the deep learning course.

2
00:00:05,825 --> 00:00:08,220
It's been almost a month since we began, and so far you've made

3
00:00:08,220 --> 00:00:10,778
great progress in the field of artificial neural networks for general data.

4
00:00:10,778 --> 00:00:15,100
And particularly, you've studied a lot of advanced architectures for vision.

5
00:00:15,100 --> 00:00:18,025
You've trained your convolutional neural networks to classify images.

6
00:00:18,025 --> 00:00:21,655
You also studied how to perform more advanced tasks based on the images.

7
00:00:21,655 --> 00:00:24,780
However, all those problems lie in the domain of supervised learning.

8
00:00:24,780 --> 00:00:27,850
It's where you have some kind of source data,

9
00:00:27,850 --> 00:00:30,675
some features, and you have reference answers for those features,

10
00:00:30,675 --> 00:00:35,345
whether they are classes like cats or dogs, or some continuous value like, say,

11
00:00:35,345 --> 00:00:40,367
if you give it a person and it predicts the salary

12
00:00:40,367 --> 00:00:42,295
he or she gets hourly.

13
00:00:42,295 --> 00:00:45,645
Unsupervised learning, which we're moving on to now, however,

14
00:00:45,645 --> 00:00:49,650
has no access to this target variable and essentially has a more general,

15
00:00:49,650 --> 00:00:54,130
less defined task of kind of explaining the data, finding the structure.

16
00:00:54,130 --> 00:00:57,700
So if supervised learning finds you a boundary between cats and dogs,

17
00:00:57,700 --> 00:01:02,734
unsupervised learning takes unlabeled pictures of cats and dogs uploaded to,

18
00:01:02,734 --> 00:01:07,260
well, Kaggle, and tries to find whether there are maybe some clusters or

19
00:01:07,260 --> 00:01:12,109
some groups that correspond to one kind of object in this data.

20
00:01:12,109 --> 00:01:14,430
It tries to maybe compress the data.
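The cluster-finding idea just described can be sketched with a toy k-means implementation. This is a minimal sketch in NumPy, not the method the lecture will necessarily use; the two well-separated Gaussian blobs simply stand in for unlabeled images of two kinds of objects:

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Toy k-means: group unlabeled points into k clusters."""
    rng = np.random.default_rng(seed)
    # start from k distinct data points as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign every point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        # (keep a centroid in place if it lost all its points)
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids

# two well-separated blobs standing in for two kinds of unlabeled objects
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
labels, _ = kmeans(X, k=2)
print(labels)
```

Note that no reference answers are used anywhere: the grouping emerges from the geometry of the data alone.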
21
00:01:14,430 --> 00:01:17,745
It tries to find some good feature representation that is useful for other problems.

22
00:01:17,745 --> 00:01:20,820
Of course, finding some hidden structure in the data is not as well defined

23
00:01:20,820 --> 00:01:24,574
a mathematical problem as it would be, for example, for image classification:

24
00:01:24,574 --> 00:01:26,045
you're not learning any particular mapping.

25
00:01:26,045 --> 00:01:30,124
Well, in fact you do, but it's not clear which mapping is better and which is worse.

26
00:01:30,124 --> 00:01:33,090
So there's no definitive criterion, and there are

27
00:01:33,090 --> 00:01:35,490
many unsupervised methods that cover one

28
00:01:35,490 --> 00:01:38,725
or the other approach to what's better and what's worse.

29
00:01:38,725 --> 00:01:43,155
Basically, the unsupervised learning methods are best studied by the way

30
00:01:43,155 --> 00:01:45,120
they help you solve problems, and each problem is

31
00:01:45,120 --> 00:01:47,775
solved with a particular branch of unsupervised learning methods.

32
00:01:47,775 --> 00:01:49,685
Today, we're going to study a lot of them,

33
00:01:49,685 --> 00:01:52,410
but let's quickly recap what problems can

34
00:01:52,410 --> 00:01:57,610
be solved by this kind of hidden structure finding.

35
00:01:57,610 --> 00:02:00,475
For example, imagine you're working with raw data on Kaggle,

36
00:02:00,475 --> 00:02:04,205
so there is yet another competition that makes you work with images, whether

37
00:02:04,205 --> 00:02:08,780
cats and dogs, satellite images, maybe medical scans, or maybe sound data.

38
00:02:08,780 --> 00:02:13,120
And you're usually given this data in the form of raw pixels if it's an image,

39
00:02:13,120 --> 00:02:18,060
or maybe frequency series if it's sound data.
40
00:02:18,060 --> 00:02:23,055
What you actually want to do with this data is to make some kind of high-level decision;

41
00:02:23,055 --> 00:02:24,345
say, if it is an image,

42
00:02:24,345 --> 00:02:27,795
you may want to classify it or maybe detect objects on it.

43
00:02:27,795 --> 00:02:31,080
If it's a sound, you may want to identify the genre or maybe give

44
00:02:31,080 --> 00:02:35,035
some recommendations to people who listen to this particular music, if it's music.

45
00:02:35,035 --> 00:02:38,950
And generally, those tasks are better solved not with the raw piece of data,

46
00:02:38,950 --> 00:02:40,650
but with some higher-level representation,

47
00:02:40,650 --> 00:02:43,435
which is usually learned by your supervised neural networks.

48
00:02:43,435 --> 00:02:44,835
But here's the catch, you see:

49
00:02:44,835 --> 00:02:50,190
you usually get only a scarce amount of labeled data within your context.

50
00:02:50,190 --> 00:02:51,690
So maybe if you're solving

51
00:02:51,690 --> 00:02:54,512
genre classification, or maybe if you're trying to recommend,

52
00:02:54,512 --> 00:02:57,345
there are only so many music tracks that

53
00:02:57,345 --> 00:03:00,850
are available for recommendation that your server knows about.

54
00:03:00,850 --> 00:03:03,085
But there are tonnes of unlabeled music out there.

55
00:03:03,085 --> 00:03:05,990
You can literally download hundreds of mp3s;

56
00:03:05,990 --> 00:03:07,993
maybe it's of questionable legality,

57
00:03:07,993 --> 00:03:10,395
but you can download gigabytes of music.

58
00:03:10,395 --> 00:03:13,785
For example, say you want to take a lot of image data,

59
00:03:13,785 --> 00:03:17,255
maybe gigabytes of human faces, and transfer it over the web.
60
00:03:17,255 --> 00:03:19,800
What you want to do then, instead of transferring the raw pixels, is

61
00:03:19,800 --> 00:03:22,340
to maybe find some compressed representation, say,

62
00:03:22,340 --> 00:03:26,735
a hundred or two hundred float values that describe a face,

63
00:03:26,735 --> 00:03:29,760
compress each face into this representation,

64
00:03:29,760 --> 00:03:33,450
transfer it over the web, and then decompress the codes back into faces.

65
00:03:33,450 --> 00:03:37,928
This is actually possible due to the fact that image data is very redundant,

66
00:03:37,928 --> 00:03:40,740
so even if you as a human were not given the entire image,

67
00:03:40,740 --> 00:03:43,445
if there was some noise or maybe some pixels were missing,

68
00:03:43,445 --> 00:03:47,555
you would easily understand what was there in place of the missing pixels.

69
00:03:47,555 --> 00:03:49,295
You want your neural network to do the same:

70
00:03:49,295 --> 00:03:51,720
to find this optimal representation that does not have

71
00:03:51,720 --> 00:03:55,510
this redundancy, so that you can transmit as little information as possible.

72
00:03:55,510 --> 00:03:58,020
Unsupervised learning is also widely used in

73
00:03:58,020 --> 00:04:01,425
the domain of image retrieval, or any retrieval basically.

74
00:04:01,425 --> 00:04:03,720
For those of you who are hearing this word for the first time,

75
00:04:03,720 --> 00:04:06,330
image retrieval is the thing that search companies do.

76
00:04:06,330 --> 00:04:08,105
So Google does it, Yandex does it.

77
00:04:08,105 --> 00:04:11,385
And retrieval is basically when the user enters a query and you want to

78
00:04:11,385 --> 00:04:15,105
reply with a lot of documents or images that match this query.
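In a learned representation space, that query-matching step can be sketched as nearest-neighbor search over embedding vectors. This is a minimal sketch: the embedding values below are made-up numbers for illustration, whereas in a real system a neural network would produce them from the images and the query:

```python
import numpy as np

def top_k(query_vec, db_vecs, k=2):
    """Rank database items by cosine similarity to the query embedding."""
    # cosine similarity = dot product of L2-normalized vectors
    q = query_vec / np.linalg.norm(query_vec)
    db = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    sims = db @ q
    # indices of the k most similar items, best first
    return np.argsort(-sims)[:k]

# tiny hypothetical database of image embeddings
db = np.array([
    [0.9, 0.1, 0.0],   # 0: a cat on a mat
    [0.8, 0.2, 0.1],   # 1: another cat on a mat
    [0.0, 1.0, 0.2],   # 2: a dog in a park
    [0.1, 0.0, 1.0],   # 3: a satellite image
])
query = np.array([1.0, 0.1, 0.0])  # embedding of a "cat on a mat" query
print(top_k(query, db))  # indices of the two best matches
```

The hard part, of course, is learning embeddings in which semantically similar items really do end up close together; the search itself is just this dot product.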
79
00:04:15,105 --> 00:04:17,220
So, I mean, you search for, say,

80
00:04:17,220 --> 00:04:19,740
a cat that sat on a mat, and you want

81
00:04:19,740 --> 00:04:22,629
to filter out all the images in the database that don't contain

82
00:04:22,629 --> 00:04:28,895
such a cat, and only respond with maybe the top 10 cats that are sitting on a mat.

83
00:04:28,895 --> 00:04:31,770
Trust me, the recent increase in the quality of image search, for

84
00:04:31,770 --> 00:04:35,315
example, is due in large part to the success of deep learning applied

85
00:04:35,315 --> 00:04:37,920
to it, and there is some kind of representation learning, which we're going to

86
00:04:37,920 --> 00:04:42,555
study later this week, at play here.

87
00:04:42,555 --> 00:04:47,325
Finally, you could use unsupervised learning when you try to generate new data.

88
00:04:47,325 --> 00:04:50,660
This is kind of the reverse of the problem you solved last week.

89
00:04:50,660 --> 00:04:54,740
So previously, you wanted to process a cat's image into a label;

90
00:04:54,740 --> 00:04:59,090
now you get a label only, or no data at all, and you want to generate a cat: an image

91
00:04:59,090 --> 00:05:04,160
that resembles a cat, maybe one that is indistinguishable from actual images.

92
00:05:04,160 --> 00:05:08,420
This can also be done with some unsupervised learning methods that we are going to cover.

93
00:05:08,420 --> 00:05:11,050
Now finally, there is exploratory data analysis.

94
00:05:11,050 --> 00:05:13,880
This is the least defined of the methods here,

95
00:05:13,880 --> 00:05:16,820
or of the problems if you wish, because here

96
00:05:16,820 --> 00:05:20,030
you don't actually want to build an algorithm so much as you

97
00:05:20,030 --> 00:05:23,720
want to look into the data and find whether there is some pattern there.
98
00:05:23,720 --> 00:05:26,450
This is especially useful if the data is low level. For example,

99
00:05:26,450 --> 00:05:30,425
you have a lot of brain scans, maybe thousands, and you have no time to

100
00:05:30,425 --> 00:05:36,110
actually look at all of them and process all the information manually;

101
00:05:36,110 --> 00:05:40,340
instead, you want to use data analysis tricks to find whether there is

102
00:05:40,340 --> 00:05:42,134
maybe some particular pattern there that you may

103
00:05:42,134 --> 00:05:45,570
exploit for your problem, say, medical diagnosis.
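One such data analysis trick, which is also the mechanism behind the compress-transfer-decompress idea from earlier in the video, is projecting the data onto a few principal components (PCA, essentially a linear autoencoder). A minimal sketch with NumPy, on synthetic data: 64 "pixels" per sample that secretly live on a 5-dimensional subspace, i.e. highly redundant data:

```python
import numpy as np

rng = np.random.default_rng(0)
# fake "images": 200 samples of 64 pixels lying on a 5-dim subspace plus noise
basis = rng.normal(size=(5, 64))
codes_true = rng.normal(size=(200, 5))
X = codes_true @ basis + 0.01 * rng.normal(size=(200, 64))

mean = X.mean(axis=0)
# principal directions of the centered data via SVD
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
W = Vt[:5]                   # "encoder": 64 pixels -> 5 floats

codes = (X - mean) @ W.T     # the 5 floats you would transfer over the web
X_rec = codes @ W + mean     # "decoder": 5 floats -> 64 pixels

# how much of the data's variance the 5-float codes fail to capture
err = np.mean((X - X_rec) ** 2) / np.mean((X - mean) ** 2)
print(f"code shape: {codes.shape}, relative reconstruction error: {err:.4f}")
```

Because the data really is redundant, the reconstruction from just 5 numbers per sample is nearly perfect; deep autoencoders apply the same principle with nonlinear encoders and decoders.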