I would say that there are three main groups of methods in NLP. One group would be rule-based approaches; for example, regular expressions would go into this group. Another one would be traditional machine learning. And the last one would be deep learning, which has recently gained a lot of popularity in NLP. In this video, I want to go through all three approaches using the example of one particular task, so that you get some flavor of all of them. The task could be called semantic slot filling. You can see the query at the bottom of the slide, which says "Show me flights from Boston to San Francisco on Tuesday." So you have some sequence of words, and you want to fill some slots. The slots would be things like the destination, the origin, or the date. And to fill those slots you can use different approaches.

This slide is about context-free grammars, so it is a rule-based approach. A context-free grammar gives you the rules that produce the words. For example, you can see that the non-terminal SHOW can produce the words "show me" or "can I see", the ORIGIN non-terminal can produce "from CITY", and the CITY non-terminal can then produce specific cities from a list. Once you have this context-free grammar, you can use it to parse your data: you take the sequence and determine which non-terminals produced which words.

So what are the advantages and disadvantages of this approach? Well, this approach is usually done manually, so you have to write all those rules yourself, or some linguist has to come and write them for you. Obviously, this is very time-consuming. Also, the recall of this approach would not be very good, because you cannot write down all the possible cities; there are so many of them, and language is so variable. The positive thing, though, would be the precision: usually, rule-based approaches have high precision but low recall.

Now, another approach would be to build some machine learning system. To do that, first of all you need some training data, so you need a corpus with some markup. Here, you have a sequence of words and you know that certain phrases have certain tags, like origin, destination, and date. After you have your training data, you need to do some feature engineering. So you create features like, for example, whether the word is capitalized, or whether the word occurs in some list of cities, or something like that. Then you need to define your model. A probabilistic model would, for example, produce the probability of your tags given your words. These can be different kinds of models, and we will explore a lot of them in our course, but generally these models have some parameters and depend on the features that you have just generated. The parameters of the model need to be trained, so you take your training data and fit your model to it: you maximize the probability of what you see with respect to the parameters. This way you fix the parameters of the model, and then you can apply the model to the test data. For the inference, you apply the model and find the most probable tags for your words given the fixed parameters. This stage is called inference, or test, or deployment, or something like that. So this is the general framework: you have some parameters, you train them, and then you apply your model.
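Just to make the rule-based idea a bit more concrete, here is a minimal sketch of how such hand-written slot-filling rules could look as a regular expression in Python. This is not code from the lecture; the city list and the pattern are made up purely for illustration.

```python
import re

# A toy, hand-written pattern in the spirit of the rule-based approach.
# The city list and the phrasing are illustrative assumptions, not real coverage.
CITIES = r"(Boston|San Francisco|Denver|Chicago)"
PATTERN = re.compile(
    rf"show me flights from (?P<origin>{CITIES}) "
    rf"to (?P<destination>{CITIES}) on (?P<date>\w+)",
    re.IGNORECASE,
)

match = PATTERN.search("Show me flights from Boston to San Francisco on Tuesday")
if match:
    print(match.groupdict())
    # {'origin': 'Boston', 'destination': 'San Francisco', 'date': 'Tuesday'}
```

You can already see the recall problem here: any city missing from the hand-written list, or any phrasing that differs from the pattern, will simply not be matched, even though matches that do fire are usually precise.

And here is a small sketch of what the feature engineering step of the traditional machine learning approach could look like, before the features are passed to some probabilistic sequence model. Again, this is only an illustration; the feature names, the helper function, and the city list are assumptions made for this example.

```python
# Illustrative word-level features of the kind mentioned above
# (capitalization, membership in a city list); the helper and the list
# are made up for this sketch.
CITY_LIST = {"boston", "san", "francisco", "denver"}

def word_features(words, i):
    word = words[i]
    return {
        "word.lower": word.lower(),
        "word.is_capitalized": word[0].isupper(),
        "word.in_city_list": word.lower() in CITY_LIST,
        "prev_word": words[i - 1].lower() if i > 0 else "<START>",
    }

query = "Show me flights from Boston to San Francisco on Tuesday".split()
print(word_features(query, 4))
# {'word.lower': 'boston', 'word.is_capitalized': True,
#  'word.in_city_list': True, 'prev_word': 'from'}
```

In practice, such a feature dictionary would be computed for every position in the sequence, and a probabilistic model would then predict the most probable tag sequence from these features.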
A similar thing happens for the deep learning approach. There you also have these stages, but usually you do not have the stage of feature generation. What you do instead is feed your sequence of words as is to some neural network. I will not go into the details of the neural network now; we will have time for those details later. I just want to show you the idea: you feed your words as one-hot vectors, that is, vectors that have only one non-zero element, at the position corresponding to the index of the word in the vocabulary, and zeros everywhere else. You feed these vectors into some neural network that has some complicated architecture and lots of parameters. You fit these parameters, and then you apply the network to your test data to get the tags out of the model.

Deep learning methods perform really well for many tasks in NLP, so sometimes it feels like we forget about traditional approaches, but there are some reasons not to forget about them. Well, the first reason would be that traditional methods perform really well for some applications. For example, for sequence labeling we can do probabilistic modeling, which we will discuss during week two, and get really good performance. Another reason would be that some ideas in deep learning methods are really similar to things that were happening in the area before them. For example, the word2vec method, which is actually not even deep learning but is inspired by neural networks, has really similar ideas to some distributional semantics methods, and in week three of our course we will discuss both of them. Yet another reason would be that we can sometimes use the knowledge we had in traditional approaches to improve models based on deep learning. For example, word alignments in machine translation and attention mechanisms in neural networks are very similar, as we will see during week four.

Deep learning methods are indeed fancy, and we have lots of research publications about them at our current conferences, so it looks like this is where the area will go in the future. Obviously, we need to have them in our course as well. So what do we do? Well, I think we will have both of them in parallel: for every task, we will study traditional and deep learning approaches one by one. And this is all for this video. In the next video, we will see the plan for our next weeks.
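To make the one-hot encoding a bit more concrete, here is a minimal sketch in Python. The tiny vocabulary is built from the example query itself, which is of course an assumption made only for illustration; a real system would use a much larger vocabulary.

```python
import numpy as np

# A minimal illustration of one-hot word vectors: each word becomes a vector
# with a single 1 at the position of that word in the vocabulary.
# The vocabulary here is just the words of the example query.
query = "show me flights from boston to san francisco on tuesday".split()
vocab = {word: idx for idx, word in enumerate(sorted(set(query)))}

def one_hot(word):
    vec = np.zeros(len(vocab))
    vec[vocab[word]] = 1.0
    return vec

print(one_hot("boston"))
# [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]  -- a single non-zero element
# ("boston" happens to come first in this tiny sorted vocabulary)
```

These sparse vectors would then be the input to the first layer of the network, which learns its own internal features instead of relying on hand-engineered ones.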