1
00:00:00,000 --> 00:00:04,733
[MUSIC]

2
00:00:04,733 --> 00:00:08,339
Now that we've gone through even more material on clustering, let's spend

3
00:00:08,339 --> 00:00:11,900
a little bit of time describing what we didn't cover in this course.

4
00:00:11,900 --> 00:00:12,750
So in this course,

5
00:00:12,750 --> 00:00:15,970
we tried to focus on the methods that we believe are the most practical and

6
00:00:15,970 --> 00:00:21,410
most widely used tools out there for performing retrieval and clustering.

7
00:00:21,410 --> 00:00:24,710
But there were still some other important topics that we didn't cover in

8
00:00:24,710 --> 00:00:25,870
this course.

9
00:00:25,870 --> 00:00:27,240
So for example, in retrieval,

10
00:00:27,240 --> 00:00:30,500
there are lots of other distance metrics that we could have described.

11
00:00:30,500 --> 00:00:32,400
We listed some of these in a module, but

12
00:00:32,400 --> 00:00:35,980
we didn't actually describe these in any detail.

13
00:00:35,980 --> 00:00:39,620
Another thing we didn't cover was something called distance metric learning

14
00:00:39,620 --> 00:00:43,990
where these are procedures that allow you to actually learn

15
00:00:43,990 --> 00:00:46,790
metrics that are useful for the task at hand.

16
00:00:46,790 --> 00:00:51,050
Then for clustering, we didn't talk about things like nonparametric clustering.

17
00:00:51,050 --> 00:00:53,960
Actually our hierarchical clustering methods can be used for

18
00:00:53,960 --> 00:00:57,070
nonparametric clustering, but we didn't describe this very explicitly.

19
00:00:57,070 --> 00:01:03,360
So nonparametric clustering, these are methods where the complexity of the models

20
00:01:03,360 --> 00:01:08,440
or the description of the clustering can grow as you get more and more data points.

21
00:01:08,440 --> 00:01:13,120
Another method is called spectral clustering that can be robust to different

22
00:01:13,120 --> 00:01:18,790
cluster shapes like the Swiss roll type images we showed in the second module,

23
00:01:18,790 --> 00:01:20,210
when we were motivating clustering and

24
00:01:20,210 --> 00:01:23,700
talking about some of the limitations of the methods we were going to describe,

25
00:01:23,700 --> 00:01:27,440
where we have to actually specify the shape of the clusters.

26
00:01:27,440 --> 00:01:30,010
But unfortunately spectral clustering methods don't

27
00:01:30,010 --> 00:01:34,280
tend to have very good scalability properties to large data sets.

28
00:01:34,280 --> 00:01:38,580
And then there are a set of ideas that are related to things that we talked about but

29
00:01:38,580 --> 00:01:42,280
not specifically for retrieval or clustering.

30
00:01:42,280 --> 00:01:46,590
So, as an example, mixtures of Gaussians are commonly used for

31
00:01:46,590 --> 00:01:48,348
something called density estimation.

32
00:01:48,348 --> 00:01:52,720
So we showed this picture of this histogram over the blue intensity of

33
00:01:52,720 --> 00:01:54,990
all the images in our data set and

34
00:01:54,990 --> 00:01:59,910
we talked about the distribution on those intensities.

35
00:01:59,910 --> 00:02:04,040
So we can think of that as a density over intensities, and try and

36
00:02:04,040 --> 00:02:07,950
estimate the form of that density explicitly, rather than

37
00:02:07,950 --> 00:02:12,440
thinking about mixtures of Gaussians as a means of clustering data points.

38
00:02:12,440 --> 00:02:15,519
So they're hand in hand, the same tool, but different tasks.

39
00:02:16,570 --> 00:02:18,940
And then within the context of density estimation,

40
00:02:18,940 --> 00:02:22,270
you can start talking about things like anomaly detection,

41
00:02:22,270 --> 00:02:27,320
where this is a question of you get some new data point, and

42
00:02:27,320 --> 00:02:31,410
does it look significantly different than the data points that you've seen so far.

43
00:02:31,410 --> 00:02:35,355
But with what you've learned in this class you have the tools to go out and

44
00:02:35,355 --> 00:02:36,917
learn these other methods.

45
00:02:36,917 --> 00:02:41,219
[MUSIC]
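
The point about density estimation and anomaly detection can be made concrete with a small sketch. It is not the course's implementation: it fits a one-dimensional mixture of Gaussians with plain EM using only the Python standard library, treats the fitted mixture as a density, and flags a new point as anomalous when its density falls below a cutoff. The function names (`fit_mixture_em`, `density_threshold`, `is_anomaly`), the synthetic "blue intensity" values, the quantile-based initialisation, and the 1% cutoff are all assumptions made for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_density(x, weights, means, variances):
    """Mixture density at x: the weighted sum of the component densities."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def fit_mixture_em(data, k=2, n_iter=100, min_var=1e-6):
    """Fit a 1-D mixture of k Gaussians with plain EM (illustrative, not tuned)."""
    ordered = sorted(data)
    n = len(ordered)
    # Assumed initialisation: place the means at evenly spaced quantiles.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            scores = [w * gaussian_pdf(x, m, v)
                      for w, m, v in zip(weights, means, variances)]
            total = sum(scores) or 1e-300  # guard against division by zero
            resp.append([s / total for s in scores])
        # M-step: re-estimate weights, means, and variances from the soft counts.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)
    return weights, means, variances


def density_threshold(data, params, quantile=0.01):
    """Assumed cutoff: the density value at a low quantile of the training data."""
    densities = sorted(mixture_density(x, *params) for x in data)
    return densities[int(quantile * (len(densities) - 1))]


def is_anomaly(x, params, threshold):
    """A new point is flagged when the fitted density there is below the cutoff."""
    return mixture_density(x, *params) < threshold


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-in for blue intensities: two groups of images.
    intensities = ([random.gauss(0.2, 0.05) for _ in range(200)]
                   + [random.gauss(0.7, 0.05) for _ in range(200)])
    params = fit_mixture_em(intensities, k=2)
    cutoff = density_threshold(intensities, params)
    for new_point in (0.2, 0.45, 0.7):
        print(new_point, mixture_density(new_point, *params),
              is_anomaly(new_point, params, cutoff))
```

The same fitted parameters serve both tasks mentioned in the lecture: the responsibilities computed in the E-step give soft cluster assignments, while `mixture_density` gives the density estimate that the anomaly check relies on. In practice the cutoff would be chosen for the application rather than fixed at a 1% quantile.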