In this video, we will talk about lexicon utilization in our NLU. Why do we want to utilize a lexicon? Let's take the ATIS dataset as an example. The problem with this dataset is that it has a finite set of cities in training, and we don't know whether the model will work for a new city during testing. The good news is that we can get a list of all cities, from Wikipedia or any other source, and use it to help the model detect new cities. Another example: imagine you need to fill a slot like "music artist," and all music artists are available in a database like musicbrainz.org, which you can download, parse, and use for your NLU.

But how can we use it? Let's add lexicon features to our input words. We will overview an approach from the paper shown in the lower left corner. We match every n-gram of the input text against the entries in our lexicon. Take the n-grams "Take me," "me to," "San," "San Francisco," and all the other possible ones, and match them against the dictionary we have for, say, cities. We say a match is successful when the n-gram matches either a prefix or a postfix of an entry from the dictionary and is at least half the length of that entry, so that we don't get a lot of spurious matches. Let's see which matches we might get: "San" can match "San Antonio" and "San Francisco," and the n-gram "San Francisco" can match the entry "San Francisco." Now we need to decide which of these matches is best. When matches overlap, that is, when one word could be used in different n-grams, we prefer them in the following order. First, we prefer exact matches over partial ones: if the word "San" is used in "San Francisco," which is an exact match, that is preferable to matching "San" with "San Antonio." Second, we prefer longer matches over shorter ones. Third, we prefer earlier matches in the sequence over later ones. These three rules give us a unique assignment of the words to non-overlapping matches with our lexicon.

Let's see how we can use that lexicon-matching information in our model. We will use the so-called BIOES coding, which stands for Begin, Inside, Outside, End, Single. We mark a token with B if it matches the beginning of some entry; we use B and I if tokens match as a prefix; we use I and E if tokens match as a postfix, that is, a token in the middle and a token at the end of the entry; and we use S when a single token matches a whole entry. Here is an example of such a coding for four lexicon dictionaries: location, miscellaneous, organization, and person. Take the utterance "Hayao Tada, commander of the Japanese North China Area Army." We have a match in the person lexicon, which gives us B and E, so we know that is an entity. We also have a full match for "North China Area Army" in the organization lexicon, encoded as B, I, E, I, E. Note that we can get such a full match even if that exact entity is not in our lexicon: say the lexicon contains "North China History Museum" and some other "... Area Army" entries. With those two entries, the postfix from the second and the prefix from the first still give us the same BIOES encoding.
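To make the matching procedure concrete, here is a minimal Python sketch of it for a single lexicon. The `max_n` limit, the lowercasing, and the exact tie-breaking tuple are illustrative choices of mine, not details taken from the paper.

```python
# Minimal sketch: match n-grams against one lexicon and emit BIOES tags.

def match_kind(ngram, entry):
    """Return 'exact', 'prefix', or 'postfix' if the n-gram covers at least
    half of the lexicon entry from one of its ends; otherwise None."""
    n, m = len(ngram), len(entry)
    if n * 2 < m:
        return None
    if ngram == entry:
        return "exact"
    if entry[:n] == ngram:
        return "prefix"
    if entry[-n:] == ngram:
        return "postfix"
    return None

def bioes_for_span(kind, length):
    """BIOES letters for one matched span, following the scheme above."""
    if kind == "exact":
        return ["S"] if length == 1 else ["B"] + ["I"] * (length - 2) + ["E"]
    if kind == "prefix":                  # beginning of an entry: B I ... I
        return ["B"] + ["I"] * (length - 1)
    return ["I"] * (length - 1) + ["E"]   # postfix of an entry: I ... I E

def tag_with_lexicon(tokens, lexicon, max_n=6):
    """Tag tokens with BIOES letters for one lexicon (e.g. cities)."""
    entries = [e.lower().split() for e in lexicon]
    candidates = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_n, len(tokens)) + 1):
            ngram = [t.lower() for t in tokens[start:end]]
            for entry in entries:
                kind = match_kind(ngram, entry)
                if kind:
                    # prefer exact over partial, longer over shorter,
                    # earlier in the sentence over later
                    candidates.append(
                        (kind != "exact", start - end, start, kind, end))
    tags = ["O"] * len(tokens)
    for _, _, start, kind, end in sorted(candidates):
        if all(t == "O" for t in tags[start:end]):   # keep non-overlapping
            tags[start:end] = bioes_for_span(kind, end - start)
    return tags

print(tag_with_lexicon("take me to san francisco".split(),
                       ["San Francisco", "San Antonio"]))
# ['O', 'O', 'O', 'B', 'E']
```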
So, this is pretty cool: we can recognize new entities that we haven't seen before. What we do next is encode these letters as one-hot vectors. Let's see how we can add that lexicon information to our model. Say we have the utterance "We saw paintings of Picasso," and we have a word embedding for every token. To that word embedding we can add some lexicon information, and we do it in the following way. Remember the table from the previous slide? For each word we take the corresponding column, use one-hot encoding to turn its BIOES letters into numbers, and concatenate that vector with the word's embedding vector. The result is used as the input to, say, a bidirectional LSTM, which predicts the tags for our slot tagger (a small code sketch of this step is given at the end of this section). So this is a fairly easy way to embed lexicon information into your model.

Let's see how it works. It was benchmarked on a Named Entity Recognition dataset, and adding the lexicon improves Precision, Recall, and F1 a little bit, by about one percent. So it seems to work, and it seems worth implementing these lexicon features in your real-world dialogue system.

Let's look into some training details. You can sample your lexicon dictionaries during training so that your model learns not only the lexicon features but also the context of the words. For example, in "Take me to San Francisco," the word that comes after the phrase "take me to" is most likely a destination ("to") slot. We want the model to learn such contextual cues as well, because in the real world we will see entities that were not in our vocabulary, and the lexicon features will not fire for them. This sampling procedure gives the model the ability to detect unknown entities during testing, which is pretty cool. When you have lexicon dictionaries, you can also augment your dataset, because you can replace slot values with other values from the same lexicon: "Take me to San Francisco" becomes "Take me to Washington," since you can easily swap the slot value "San Francisco" for "Washington."

Let me summarize. You can add lexicon features to further improve your NLU: they help you detect the entities the user mentions, including unknown and long entities like "North China Area Army." In the next video, we will take a look at the Dialogue Manager.
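To make the feature-concatenation step concrete, here is a minimal PyTorch sketch of a slot tagger that takes word embeddings concatenated with one-hot BIOES lexicon features as the input to a bidirectional LSTM. All dimensions, the single-lexicon setup, and the toy inputs are placeholder assumptions of mine, not the configuration from the paper.

```python
# Minimal sketch: word embedding + one-hot BIOES lexicon feature -> BiLSTM.
import torch
import torch.nn as nn

BIOES = {"B": 0, "I": 1, "O": 2, "E": 3, "S": 4}

def lexicon_features(letters):
    """One-hot encode a sequence of BIOES letters, one 5-dim vector per token."""
    feats = torch.zeros(len(letters), len(BIOES))
    for i, letter in enumerate(letters):
        feats[i, BIOES[letter]] = 1.0
    return feats

class SlotTagger(nn.Module):
    def __init__(self, vocab_size, num_slot_tags, emb_dim=100, hidden=128,
                 num_lexicons=1):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # one 5-dim BIOES block is appended per lexicon dictionary
        self.lstm = nn.LSTM(emb_dim + 5 * num_lexicons, hidden,
                            bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, num_slot_tags)

    def forward(self, token_ids, lex_feats):
        x = torch.cat([self.emb(token_ids), lex_feats], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)                 # per-token slot-tag scores

# Toy usage for "we saw paintings of picasso": a single-token match ("S")
# against a hypothetical person lexicon on the last word.
token_ids = torch.tensor([[1, 2, 3, 4, 5]])
lex_feats = lexicon_features(["O", "O", "O", "O", "S"]).unsqueeze(0)
tagger = SlotTagger(vocab_size=10, num_slot_tags=3)
print(tagger(token_ids, lex_feats).shape)  # torch.Size([1, 5, 3])
```

With several lexicon dictionaries (location, miscellaneous, organization, person), you would simply concatenate one such 5-dimensional block per dictionary, as in the table from the slide.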