In this video, we will talk about lexicon utilization in our NLU. Why do we want to utilize a lexicon? Let's take the ATIS dataset as an example. The problem with this dataset is that it has a finite set of cities in training, and we don't know whether the model will work for a new city during testing. The good news is that we can get a list of all cities, from Wikipedia or some other source, and we can use it to help our model detect new cities. Another example: imagine you need to fill a slot like "music artist", and all music artists are in a database like musicbrainz.org; you can download it, parse it, and use it for your NLU.

But how can we use it? Let's add lexicon features to our input words. We will overview an approach from the paper you can see in the lower left corner. Let's match every n-gram of the input text against the entries in our lexicon. We take n-grams like "Take me", "me to", "San", and "San Francisco", and all the other possible ones, and match them against the lexicon, the dictionary that we have for, let's say, cities. We will say that a match is successful when the n-gram matches either a prefix or a postfix of an entry from the dictionary, and it is at least half the length of that entry, so that we don't get a lot of spurious matches.

Let's see the matches we might have. "San" might match San Antonio and San Francisco, and the "San Francisco" n-gram can match the San Francisco entry. So we get these matches, and we need to decide which of them is best. When we have overlapping matches, meaning that one word is used in different n-grams, we need to decide which one is better, and we prefer them in the following order. First, we prefer exact matches over partial ones: if the word "San" is part of the exact match "San Francisco", that is preferable to, say, the partial match of "San" with San Antonio. Second, we prefer longer matches over shorter ones. Third, we prefer earlier matches in the sequence over later ones. These three rules give us a unique assignment of our words to non-overlapping matches with our lexicon.
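Here is a minimal Python sketch of this matching and overlap-resolution procedure as just described. The function names, the case-insensitive comparison, and the greedy tie-breaking are illustrative assumptions; the paper's exact implementation may differ.

```python
# Sketch of n-gram-to-lexicon matching with the three preference rules.

def find_matches(tokens, lexicon):
    """Return candidate matches as (start, end, entry, exact) tuples.

    An n-gram tokens[start:end] matches an entry if it equals the
    entry's prefix or postfix and is at least half the entry's length.
    """
    matches = []
    entries = [e.split() for e in lexicon]
    for start in range(len(tokens)):
        for end in range(start + 1, len(tokens) + 1):
            ngram = [t.lower() for t in tokens[start:end]]
            for entry in entries:
                ent = [t.lower() for t in entry]
                if 2 * len(ngram) < len(ent):
                    continue  # under half the entry length: too spurious
                exact = ngram == ent
                prefix = ent[:len(ngram)] == ngram
                postfix = ent[-len(ngram):] == ngram
                if exact or prefix or postfix:
                    matches.append((start, end, " ".join(entry), exact))
    return matches

def resolve_overlaps(matches):
    """Keep non-overlapping matches, preferring exact over partial,
    then longer over shorter, then earlier over later."""
    ranked = sorted(matches, key=lambda m: (not m[3], -(m[1] - m[0]), m[0]))
    chosen, used = [], set()
    for start, end, entry, exact in ranked:
        span = set(range(start, end))
        if span & used:
            continue  # overlaps a more preferred match already kept
        chosen.append((start, end, entry, exact))
        used |= span
    return sorted(chosen)

tokens = "Take me to San Francisco".split()
cities = ["San Antonio", "San Francisco"]
print(resolve_overlaps(find_matches(tokens, cities)))
# -> [(3, 5, 'San Francisco', True)]
```

Sorting the candidates by (exact, length, position) and then greedily claiming tokens is one simple way to realize the three preference rules.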
So, let's see how we can use that lexicon matching information in our model. We will use a so-called BIOES coding, which stands for Begin, Inside, Outside, End, and Single. We mark a token with B if it matches the beginning of some entry. We use B and I if tokens match as a prefix. We use I and E if tokens match as a postfix, that is, some token in the middle and some token at the end of the entry. And we use S when a single token matches an entire entry.

Let's see an example of such coding for four lexicon dictionaries: location, miscellaneous, organization, and person. We have an utterance like "Hayao Tada, commander of the Japanese North China Area Army." You can see that we have a match in the person lexicon, which gives us B and E, so we know that it is an entity. We also have a full match of "North China Area Army" with the organization lexicon, encoded as B, I, I, E. And we can actually get that full-match encoding even if the entity itself is not in our lexicon. Let's say the lexicon contains "North China History Museum" and, say, some "<country> Area Army" entries. With those two entries, we can take the prefix match from the first one and the postfix match from the second, and together they still give us the same BIOES encoding. So, this is pretty cool: we can detect new entities that we have never seen before.

Okay, so what we do next is encode these letters as one-hot vectors. Let's see how we can add that lexicon information to our model. Say we have an utterance, "We saw paintings of Picasso," and we have a word embedding for every token. To that word embedding we can actually add some lexicon information, and we do it in the following way. Remember the table from the previous slide?
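To make the encoding concrete, here is a small Python sketch that turns non-overlapping matches for one dictionary into per-token BIOES tags, following the rules above (B and I for prefix matches, I and E for postfix matches, S for single-token exact matches). The match-kind labels and the "Kwantung Area Army" lexicon entry are hypothetical.

```python
def bioes_tags(num_tokens, matches):
    """matches: non-overlapping (start, end, entry, kind) spans,
    where kind is 'exact', 'prefix', or 'postfix'."""
    tags = ["O"] * num_tokens
    for start, end, entry, kind in matches:
        if end - start == 1 and kind == "exact":
            tags[start] = "S"            # a single token matches a whole entry
            continue
        for i in range(start, end):
            tags[i] = "I"                # default: inside a match
        if kind in ("exact", "prefix"):
            tags[start] = "B"            # the match begins an entry
        if kind in ("exact", "postfix"):
            tags[end - 1] = "E"          # the match ends an entry
    return tags

tokens = "Hayao Tada commander of the Japanese North China Area Army".split()
# A prefix match against "North China History Museum" plus a postfix match
# against a hypothetical "<country> Area Army" entry reproduce the same
# encoding as an exact "North China Area Army" match:
org = [(6, 8, "North China History Museum", "prefix"),
       (8, 10, "Kwantung Area Army", "postfix")]
print(bioes_tags(len(tokens), org))
# -> ['O', 'O', 'O', 'O', 'O', 'O', 'B', 'I', 'I', 'E']
```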
Let's take the first two words, take the column of that table corresponding to each word, and use one-hot encoding to turn the BIOES letters into numbers. We concatenate that vector with the embedding vector for the word and use the result as input to, let's say, our bidirectional LSTM, and this network will predict the tags for our slot tagger. So, this is a pretty easy way to embed lexicon information in your model.

Let's see how it works. It was benchmarked on a dataset for named entity recognition, and you can see that adding the lexicon improves precision, recall, and F1 measure a little bit, by about one percent. So it seems to work, and it seems that it would be helpful to implement these lexicon features for your real-world dialogue system.

Let's look into some training details. You can sample your lexicon dictionaries so that your model learns not only the lexicon features but also the context of the words. When I say, "Take me to San Francisco," the word that comes after the phrase "take me to" is most likely a destination ("to") slot. We want the model to learn those contextual cues as well, because in the real world we will see entities that were not in our vocabulary before, and our lexicon features will not fire. So, this sampling procedure gives the model the ability to detect unknown entities during testing, which is a pretty cool property.

When you have lexicon dictionaries, you can also augment your dataset, because you can replace slot values with other values from the same lexicon. For example, "Take me to San Francisco" becomes "Take me to Washington," because you can easily replace the San Francisco slot value with Washington when you have the lexicon dictionary.

So, let me summarize. You can add lexicon features to further improve your NLU, because they help you detect the entities that the user mentions, including unknown and long entities like "North China Area Army." In the next video, we will take a look at the dialogue manager.
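Below is a minimal PyTorch sketch of this architecture, assuming four lexicon dictionaries and the BIOES one-hot scheme above. The class name, dimensions, and feature layout are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

BIOES = ["B", "I", "O", "E", "S"]
NUM_DICTS = 4          # location, miscellaneous, organization, person

class LexiconSlotTagger(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        lex_dim = NUM_DICTS * len(BIOES)   # one one-hot block per dictionary
        self.lstm = nn.LSTM(embed_dim + lex_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids, lex_feats):
        # token_ids: (batch, seq); lex_feats: (batch, seq, lex_dim)
        x = torch.cat([self.embed(token_ids), lex_feats], dim=-1)
        h, _ = self.lstm(x)                # BiLSTM over the concatenated input
        return self.out(h)                 # (batch, seq, num_tags) tag logits

# One-hot lexicon features for a 5-token utterance, "We saw paintings of Picasso":
batch, seq = 1, 5
lex = torch.zeros(batch, seq, NUM_DICTS * len(BIOES))
lex[0, 4, 3 * len(BIOES) + BIOES.index("S")] = 1.0  # "Picasso" = S in person dict
model = LexiconSlotTagger(vocab_size=1000, embed_dim=100, hidden_dim=64, num_tags=10)
logits = model(torch.randint(0, 1000, (batch, seq)), lex)
print(logits.shape)   # torch.Size([1, 5, 10])
```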