Hi, in this video I want to overview what we have done this week.

We have overviewed so-called task-oriented dialog systems, and our dialog system looks like the following. We get speech from the user and convert it to text using ASR, or we get text directly, as in chat bots. Then comes Natural Language Understanding, which gives us intents and slots from that natural language. Then there is a magic box called the Dialog Manager, and it actually does two things: it tracks the dialog state and it learns the dialog policy, that is, what should be done and what the user actually wants. The Dialog Manager can query a backend like Google Maps or Yelp or any other. And then it has to say something to the user, so we need to convert the text from the Dialog Manager to speech with some Natural Language Generation.

The red boxes here are the parts of the system that we don't overview, because that would take a lot of time, and the system can actually work without them. It can take the user input as text, so you will not need ASR. Then you can output your response to the user as text as well, so you don't need Natural Language Generation. And sometimes you don't need a backend action to solve the user's task.
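The text-in/text-out loop described above can be sketched in a few lines. Everything here is a hypothetical stand-in (the rule-based NLU, the hand-crafted policy, the slot names), not a real library; it only shows how the components plug together:

```python
# Minimal sketch of a text-in/text-out task-oriented dialog loop.
# All components are illustrative stand-ins, not a real system.

def nlu(utterance: str) -> dict:
    """Toy NLU: map an utterance to an intent and slots (rule-based stand-in)."""
    if "taxi" in utterance:
        return {"intent": "book_taxi", "slots": {"destination": "airport"}}
    return {"intent": "unknown", "slots": {}}

def dialog_manager(state: dict, nlu_result: dict) -> tuple:
    """Toy Dialog Manager: track the dialog state, then pick an action (the policy)."""
    state.update(nlu_result["slots"])           # state tracking
    if nlu_result["intent"] == "book_taxi":     # hand-crafted policy rule
        # A real system could query a backend (e.g. a maps API) here.
        return state, f"Booking a taxi to {state['destination']}."
    return state, "Sorry, I did not understand."

state, reply = dialog_manager({}, nlu("I need a taxi"))
print(reply)  # -> Booking a taxi to airport.
```

In a full system the `nlu` stub would be a trained intent classifier plus slot tagger, and the `if` in `dialog_manager` would be a learned policy; the loop structure stays the same.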
We have overviewed Natural Language Understanding and the Dialog Manager in detail. Let me remind you: you can train a slot tagger and an intent classifier, which are basically the NLU, and you can train them separately or jointly. Training them jointly yields better results. You can also train the NLU and the Dialog Manager separately or jointly, and joint training gives better results there as well. You can sometimes use hand-crafted rules, for example for the dialog policy or state tracking, but learning from data actually works better if you have time for that.

Let me remind you how we evaluate the NLU and the Dialog Manager. For NLU, we use turn-level metrics like intent accuracy and slot F1. For the Dialog Manager, there are two kinds of metrics. The first is turn-level metrics: after every turn in the dialog, we track, let's say, state accuracy or policy accuracy. And there are dialog-level metrics like success rate, whether the dialog solved the user's problem or not, or what reward we got when we solved that problem. The reward could be the number of turns, and we want to minimize the number of turns so that we solve the task for the user faster.

And here, actually, is the question.
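As an aside, the metrics just listed are simple to compute. Here is a toy sketch with made-up predictions; the slot F1 is micro-averaged over (slot, value) pairs, which is one common convention, though implementations vary:

```python
# Toy computation of turn-level NLU metrics and a dialog-level metric.
# The data below is invented purely for illustration.

def intent_accuracy(true_intents, pred_intents):
    """Fraction of turns whose intent was classified correctly."""
    correct = sum(t == p for t, p in zip(true_intents, pred_intents))
    return correct / len(true_intents)

def slot_f1(true_slots, pred_slots):
    """Micro-averaged F1 over (slot, value) pairs for one turn."""
    tp = len(true_slots & pred_slots)
    precision = tp / len(pred_slots) if pred_slots else 0.0
    recall = tp / len(true_slots) if true_slots else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Turn-level NLU metrics on three toy turns
acc = intent_accuracy(["book", "find", "book"], ["book", "book", "book"])
f1 = slot_f1({("dest", "airport"), ("time", "9am")}, {("dest", "airport")})

# Dialog-level metric: success rate over a set of dialogs
successes = [True, True, False, True]
success_rate = sum(successes) / len(successes)
print(round(acc, 2), round(f1, 2), success_rate)  # -> 0.67 0.67 0.75
```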
We have the NLU and the Dialog Manager, and if we train them separately, we want to understand how the errors of the NLU affect the final quality of our Dialog Manager.

Here, on the left vertical axis, we have success rate, and on the right axis we have the average number of turns in the dialog. We have three colors in the legend: the blue one is when we don't have any NLU errors, the green one is when we have 10% errors in the NLU, and the red one is when we have 20% errors in our NLU. And you can see what happens: when you have a large error in the NLU, the success rate of your task actually decreases, and the number of turns needed to solve the task, where there was a success, actually increases. So it takes more time for the user to solve their task, and the chance of solving that task is lower.

But the NLU actually consists of an intent classifier and a slot tagger, so let's see which one is more important. Let's look at what happens when we change the intent error rate. It looks like it doesn't affect the quality, the success rate, of our dialog that much, and the dialogs don't become that much longer.
So it looks like intent error is not as important as slot tagging, and we will see now why: when you introduce the same amount of error in slot tagging, that actually decreases the success rate of the dialog dramatically. It seems that slot tagging error is the main problem for our success rate, so it looks like we need to concentrate on the slot tagger.

And that can give you some insight when you want to train a joint model, where you have a loss for intent and a loss for slot tagging. You can actually come up with some weights for them, and the intuition is the following: it seems like the slot tagging loss should have a bigger weight, because it is more important for the success of the whole dialog.

Let me summarize. We have overviewed what a task-oriented dialog system looks like, and we have overviewed in depth the NLU component and the Dialog Manager component. This is the basic knowledge that you will need to build your own task-oriented dialog system. So that's it for this week; I wish you good luck with your final project.
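As a final footnote to the joint-training intuition above, the weighted combination of the two losses could be sketched like this. The weight of 2.0 is an illustrative assumption, not a tuned value; the only point is that the slot-tagging term gets the larger weight:

```python
# Sketch of weighting the two loss terms in a joint intent + slot model.
# slot_weight = 2.0 is an arbitrary illustrative choice; in practice it
# would be tuned on validation data.

def joint_loss(intent_loss: float, slot_loss: float,
               slot_weight: float = 2.0) -> float:
    """Weighted sum of the two losses; slot errors hurt dialog success
    more, so the slot-tagging term gets the bigger weight."""
    return intent_loss + slot_weight * slot_loss

print(joint_loss(0.3, 0.5))  # -> 1.3
```

In a real training loop the two scalars would be the per-batch cross-entropy losses of the intent classifier and the slot tagger, and the weighted sum is what you would backpropagate through.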