All right. In this final video, let's look at some extensions of latent Dirichlet allocation and also summarize what we've seen this week. Here is our joint distribution over W, Z, and theta, and let's try to work out the interpretation of alpha. If alpha is high, each document will contain many topics; if alpha is small, each document will have only a few topics. So by varying alpha, we can obtain either sparse or dense documents. Alpha can also be selected using the maximum likelihood principle, that is, by maximizing the probability of the data given alpha.

That gives us control over the sparsity of the documents, but we can also think about the sparsity of the topics. To do this, we make the matrix phi a random variable: it is no longer a parameter, but something we model as well. For this, we add an extra term to the joint distribution, a product over all topics of a prior on the corresponding word distribution, and we can again use a Dirichlet distribution. Now the parameter beta controls how sparse the topics are. If the topics are really sparse, each topic contains only a few words: for example, we would get one topic about football and another about hockey. If beta is a bit larger, we would instead get a single topic about sports, in which the football and hockey words appear with roughly equal probabilities.

The second extension is called the correlated topic model. With the Dirichlet distribution, we cannot model correlations between topics, but in some cases this is really useful. For example, if a document has a topic about lasers, it is probably also a bit about physics, so there is a correlation: some topics appear together more frequently than others. This can be handled with the so-called logistic normal distribution. I will not define it fully, I will just show you some pictures: by varying the covariance matrix sigma, we can introduce correlations between the different components, as shown in these pictures.

Finally, the dynamic topic model. Suppose we model a collection of documents spanning a long period of time. Some words were used frequently in the past but are rarely used now, while their synonyms are used instead. For example, the word "nerve" was used frequently in the 1880s, but now it is used rarely, whereas the word "neuron" is used very often. To model this, we can use the following trick. We introduce an extra random variable B, which is modeled with a normal distribution: we start at some position B, and at each time step we add Gaussian noise, so we do a random walk in the space of B. Then, at each step, we obtain phi as the softmax of B. We apply the softmax just to get a probability distribution, and this gives us dynamic values of phi, so we can track how the words within the topics change over time.

All right. We've seen latent Dirichlet allocation in this module, and it has its pros and cons. The main pro is that the generated topics are really interpretable, which is very useful for understanding the model. For example, if you have trained it on a collection of documents, you can find out what that collection was about: if you see a lot of topics about adventure, maybe you had a collection of adventure books. Also, this model works well for rare words, and it is fast even for huge text collections.
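To make the role of alpha concrete, here is a minimal sketch, assuming NumPy; the number of topics, the number of documents, and the 0.05 "active" threshold are arbitrary choices, and beta plays the same role for the topic-word distributions phi.

```python
import numpy as np

rng = np.random.default_rng(0)
num_topics, num_docs = 10, 1000

for alpha in [0.1, 1.0, 10.0]:
    # theta_d ~ Dirichlet(alpha, ..., alpha) for every document d
    theta = rng.dirichlet(alpha * np.ones(num_topics), size=num_docs)
    # Average number of topics that get a non-negligible share of a document.
    active = (theta > 0.05).sum(axis=1).mean()
    print(f"alpha = {alpha:4.1f} -> about {active:.1f} active topics per document")

# Small alpha gives sparse documents (a few dominant topics each);
# large alpha gives dense documents (topic proportions close to uniform).
```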
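Similarly, here is a minimal sketch of the logistic normal idea behind the correlated topic model: sample a vector eta from a multivariate normal with covariance sigma and map it to topic proportions with a softmax. The covariance values below are arbitrary and only illustrate how an off-diagonal entry couples two topics.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.zeros(3)                       # e.g. topics: lasers, physics, cooking
Sigma = np.array([[1.0, 0.8, 0.0],     # topics 0 and 1 are positively correlated
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

eta = rng.multivariate_normal(mu, Sigma, size=5000)           # Gaussian draws
theta = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)  # softmax -> topic proportions

print("example theta:", np.round(theta[0], 2))
print(f"corr(eta_0, eta_1) = {np.corrcoef(eta[:, 0], eta[:, 1])[0, 1]:.2f}")
```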
On the practical side, LDA has really good implementations that allow for multicore and distributed training, and many useful features can be added through extensions, as we've seen before.
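And to make the dynamic topic model trick concrete, here is a minimal sketch in which the unnormalized word scores B of a single topic follow a Gaussian random walk over time and are mapped to a word distribution phi at each step with a softmax. The vocabulary, noise scale, and number of steps are arbitrary, and a real model would infer B from data rather than simulate it.

```python
import numpy as np

rng = np.random.default_rng(2)
vocab = ["nerve", "neuron", "brain", "signal"]
num_steps = 5
noise_scale = 0.5

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

B = rng.normal(size=len(vocab))                        # initial position in B-space
for t in range(num_steps):
    phi_t = softmax(B)                                 # the topic's word distribution at time t
    print(f"t={t}: " + ", ".join(f"{w}={p:.2f}" for w, p in zip(vocab, phi_t)))
    B = B + noise_scale * rng.normal(size=len(vocab))  # Gaussian random-walk step
```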