All right. In this final video, let's see some extensions of latent Dirichlet allocation and also summarize what we've seen this week.

Here is our joint distribution over W, Z, and theta. Let's try to work out the interpretation of alpha. If alpha is high, we'll have many topics in each document; however, if alpha is small, we'll have only a few topics in each document. So by varying alpha, we can obtain either sparse or dense documents, in terms of how many topics each one uses. Alpha can also be selected using the maximum likelihood principle, that is, by trying to maximize the probability of the data given alpha. So alpha controls the sparsity of the documents.

We can also think about the sparsity of the topics. To get this, we should make the matrix phi a random variable: it won't be a parameter anymore, but something we model as well. For this, we have to add an extra term to the joint distribution, a product over all topics of a prior on the corresponding word distribution, and we can model it with the Dirichlet distribution again. Now the parameter beta shows how sparse the topics are. If topics are really sparse, each one contains only a few words: for example, we'll have one topic about football and another about hockey. If beta is a bit larger, we'll instead have a single topic about sports, where the football and hockey words appear with roughly equal probabilities.

The second extension is called the correlated topic model. With the Dirichlet distribution, we can't model correlations between topics, yet in some cases they are really useful. For example, if you see a topic about lasers, the document is probably also a bit about physics. There is some correlation here: some topics appear together with certain other topics more frequently than with the rest. This can be modeled using the so-called logistic normal distribution. I will not define it fully; I will just show you some pictures. By varying the covariance matrix sigma, we can obtain correlations between the different components, as shown in these pictures.
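To make both points concrete, here is a minimal sketch in Python with NumPy (the course does not prescribe any particular library; the topic count, alpha values, and covariance are made up for illustration). It samples document-topic vectors from a Dirichlet with a small and a large alpha, and then from a logistic normal whose covariance couples two topics:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5  # number of topics

# Effect of alpha: a small alpha gives sparse document-topic vectors theta
# (most mass on one or two topics); a large alpha gives dense ones.
for alpha in (0.1, 10.0):
    theta = rng.dirichlet(alpha * np.ones(K), size=3)
    print(f"alpha = {alpha}:\n", np.round(theta, 2))

# Logistic normal: sample eta from a multivariate Gaussian with a full
# covariance Sigma, then map it through the softmax. Unlike the Dirichlet,
# the off-diagonal entries of Sigma induce correlations between topics.
Sigma = 0.5 * np.eye(K)
Sigma[0, 1] = Sigma[1, 0] = 0.45  # topics 0 and 1 tend to co-occur
eta = rng.multivariate_normal(np.zeros(K), Sigma, size=3)
theta_ln = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)
print("logistic normal samples:\n", np.round(theta_ln, 2))
```

With alpha = 0.1, most of each row's mass sits on one or two topics, while with alpha = 10 it spreads out evenly; in the logistic normal samples, the first two components tend to be large together because of the off-diagonal covariance.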
Finally, let's look at the dynamic topic model. Suppose, for example, we model a collection of documents spanning a long period of time. Some words were used frequently in the past but are rarely used now, while their synonyms are used now instead. For example, here the word "nerve" was used frequently in the 1880s, but now it is really rare, whereas the word "neuron" is now used very often.

To model this, we can use the following trick. We introduce an extra random variable B, which is obtained using the normal distribution: we start at some position B and then, at each time step, add Gaussian noise, so we do a random walk in the space of B. Then, at each step, we obtain phi as the softmax of B. We apply the softmax just to get valid probability distributions, and with this we have dynamic values for phi, so we are able to track the dynamics of the words in the topics through time (a small numeric sketch of this random walk appears at the end of the section).

All right. We've seen latent Dirichlet allocation in this module. It has a lot of pros and cons. The main pro is that the topics it generates are really interpretable, which makes the model useful for understanding your data. For example, if you train it on a collection of documents, you can find out what the collection was about: if you see a lot of topics about adventure, maybe you had a collection of adventure books. Also, this model works well for rare words. It is fast even for huge text collections, it has really good implementations that allow for multicore and distributed training, and many useful features can be added through the extensions we've seen before.
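Here is the promised sketch of the random-walk construction from the dynamic topic model, again just an illustration in NumPy; the vocabulary size, number of time steps, and noise scale are assumptions chosen for readability, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
V, T, noise_scale = 6, 4, 0.5  # vocabulary size, time steps, step size

def softmax(b):
    e = np.exp(b - b.max())
    return e / e.sum()

# Start at some position B, then take Gaussian steps: B_t = B_{t-1} + noise.
# Applying the softmax at every step turns each B_t into a valid word
# distribution phi_t, so the topic's word probabilities drift over time.
B = rng.normal(size=V)
for t in range(T):
    phi_t = softmax(B)
    print(f"t = {t}: phi =", np.round(phi_t, 2))
    B = B + rng.normal(scale=noise_scale, size=V)
```

Because consecutive B vectors differ only by small Gaussian steps, consecutive phi_t distributions change smoothly, which is exactly what lets the model track a word like "nerve" fading while "neuron" rises.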